Patent abstract:
The invention relates to a method of creating a statistical classification model to automatically generate feedback from a set of features describing possible errors and a set of raw data units, each raw data unit comprising at least one response, correction and feedback. The method comprises: a) converting the raw data units into examples composed of one or more differences obtained by comparing the response with the correction, thus assigning a value to features from said set of features, b) deriving a set of classes from said feedback of the raw data units, c) reducing the number of classes in the set based on similarities between classes, d) labeling each example with one of the classes of the reduced set, and e) building the statistical classification model through a machine learning classification tool, using the labeled examples as training data.
Publication number: BE1022627B1
Application number: E2014/5103
Filing date: 2014-12-04
Publication date: 2016-06-20
Inventors: Brecht Stubbe; Ruben Lagatie
Applicant: Televic Education Nv
IPC main classification:
Patent description:

FIELD OF THE INVENTION
The present invention relates to the field of artificial intelligence systems to automatically provide feedback on errors to course participants of, for example, a language course.
Background of the Invention
Feedback is an essential element in a student's learning process. Providing appropriate feedback to the student about the quality of his/her answer, both in terms of grammar and communicative suitability, is a key requirement of any stimulating CALL (Computer Assisted Language Learning) environment. In addition, the creative use of language in a communicatively relevant environment must be encouraged. Fulfilling either of these two requirements on its own is easy. The combination of the two, however, for example when evaluating a student's answer to an open-ended question, is very difficult. The reason for this is that the possible mistakes a student can make in such a situation are virtually endless. It is generally known that it is very difficult to build a system that allows free text input and at the same time gives detailed feedback to students about their language errors.
The ultimate goal of any CALL system is to robustly model the cognitive behavior of people in a specific social role, for example that of a language teacher. To achieve this, important aspects of human cognitive skills must be simulated in a machine. When one does that, one speaks of so-called intelligent CALL systems. Such systems use artificial intelligence (AI) and natural language processing (NLP) techniques to introduce the intelligence needed to emulate human teachers. One way to gain intelligence is by parsing, i.e. a technique that enables a computer to encode complex grammatical knowledge, as people do to construct sentences, recognize errors, and make corrections.
Modern solutions can be categorized into two types: parser-based CALL or statistical CALL.
Parser-based CALL uses parsers to analyze the student's answer. Traditional parsers are unable to cope with incorrect input, so they must be adjusted to be more robust. In order to provide feedback, information about the errors must also be stored and then used to compose an appropriate feedback message. This is sometimes called sensitive parsing. There are some well-known limitations to parsers:
- parsers are not infallible: they fail to detect some errors and they reject some sentences that are grammatically correct,
- parsers rarely go beyond syntax: they are usually unable to detect semantic or pragmatic errors,
- sensitive parsers are very expensive: methods such as error rules and relaxation of constraints have been used, but both require the creation of detailed grammar rules or (weighted) constraints for faulty constructions and they often change the internal operation of the parser, whereby procedural methods are introduced into an inherently declarative algorithm,
- parsers are language and context dependent: the vocabulary and all grammar rules, error rules and constraints that have been carefully designed can only be used for a specific language and domain,
- sensitive parsing is computationally complex: due to the inherent ambiguity of a language and the large number of ways in which a mistake can be made, most sensitive parsing techniques must use methods to avoid an excessive computational burden, thereby reducing overall accuracy and reliability.
Despite their complexity and cost, sensitive parsers are still unable to parse arbitrary ungrammatical input. That is why the research world has increasingly focused on statistical CALL techniques. Such methods only need data, preferably annotated data, and learn language patterns themselves. They are inexpensive to develop and often use readily available classification algorithms to classify input, i.e. (parts of) a student's answer. A number of different classifiers have been used, such as Bayesian classifiers, decision lists, decision trees, transformation-based learning tools, SVMs (Support Vector Machines) and maximum entropy classifiers.
The choice of classification means has no significant impact on performance. More important is the choice of classes and features to be used. Classification can be performed in two different ways. First, a binary classification can be used to build a model that detects a specific type of error. The features can then be selected so that they are relevant to the error type in question. To detect a variety of errors, a model must be created for each error type. Secondly, it is also possible to use multi-class classification by assigning a class to each error type and building a single model that classifies sentences into one of the specified classes. Both methods have some disadvantages.
The biggest problem with binary classification is that detecting multiple errors requires multiple consecutive classifications and this often results in more complex errors not being detected correctly. Memory usage is also less efficient since the same features are often present in multiple models, such as the word type (POS) of the word. Multiple-class classification performs better, but the inclusion of all characteristics in a single model often means that characteristics are relevant to some error types and irrelevant to others. This often leads to the classification model learning incorrect patterns.
The latter method was used in the paper "Automatic Error Detection in Japanese learners' English spoken data" (Izumi et al., ACL '03 Proceedings 41st Annual Meeting on Association for Computational Linguistics, pp. 145-148, July 2003). They describe a method for detecting grammatical and lexical errors of Japanese students learning English. The set of features includes a word type, a grammatical / lexical system and a corrected form. Special tags are provided for some errors that cannot be categorized into any word class.
With sufficiently cleaned data, statistical CALL systems are relatively simple and inexpensive to build, but their accuracy is too low to be of practical use. They usually only detect half of the errors present in a text and about 30% of the errors that they detect are not errors at all. In addition, systems that can identify errors make an incorrect classification in about 50% of the cases. The system proposed in the aforementioned Izumi paper also suffers from these poor performance figures. The most important cause of this poor performance is the choice of features. A lot of attention and money has been spent on searching for better features.
A simple way to generate feedback is to resubmit the answers to the user and mark where, and possibly which, letters (or words) are missing, are redundant, or are wrong. This is an example of repeat feedback with markers. This approach was used in the implementation of the LISC (Language Independent Sequence Comparison) application which, as the name suggests, is language independent because it is only based on approximate string matching.
To be able to give meaningful feedback to students who learn a language by practicing, one must not only state what is wrong, but also why it is wrong. For a grammatical error this means that one must be able to provide feedback explaining the relevant grammar rule. This is sometimes referred to as meta-linguistic feedback. Metalinguistic information generally gives either a form of grammatical meta-language that refers to the nature of the error or a word definition in the case of lexical errors.
Providing meta-linguistic feedback in parser-based CALL is simple: a feedback message describing the error is simply added to each error rule. In the same way, a large number of rules could be defined that attach feedback messages to sets of operations. It goes without saying that one cannot define rules for all possible errors. Even if this were feasible, the rules would be too complex and too abstract to define manually, since the edit operations are not directly linked to the grammar rules. A solution can be found in the domain of machine learning.
Accordingly, there is a need for an improved tool to help create a learning system capable of automatically generating feedback.
Summary of the Invention
It is an object of embodiments of the present invention to provide a computer-implemented method for creating a classification model to automatically generate feedback based on machine learning.
The above object is achieved by the solution according to the present invention.
In a first aspect, the invention relates to a computer-implemented method to create a statistical classification model to automatically generate feedback, starting from a set of features describing potential errors and a set of raw data units, each raw data unit comprising at least an answer, a correction and feedback. The method comprises the steps of: a) converting the raw data units into examples that are composed of one or more differences obtained by comparing the answer with the correction, thus assigning a value to features from said set of features, b) deriving a set of classes from said feedback of said raw data units, c) reducing the number of classes in said set based on similarities between classes, d) labeling each example with one of said classes of said reduced set, and e) building said statistical classification model to automatically generate feedback through a machine learning classification means, using said labeled examples as training data.
The proposed method indeed provides a classification model that is capable of automatically providing feedback. Initially, a set of features is available that describe potential errors, as well as a number of raw data units each including an answer, a correction and a feedback portion. By comparing answer and correction for each raw data unit, one or more differences are determined so that a value can be assigned to features from the set of features for the different raw data units. A number of classes is derived based on the feedback in the raw data units. The number of classes is then reduced on the basis of similarities between classes. This reduction step is necessary because semantically identical but differently formulated feedback has a negative effect on performance and accuracy. The best accuracy is achieved when each error is uniquely identified by a single feedback message. In a labeling step, the various examples are then assigned to a class from said reduced set of classes. The initial feedback message that is assigned to the raw data unit is used to identify the correct feedback message. This step varies in complexity from a simple one-to-one assignment of feedback messages to classes, to a complex categorization of errors, depending on the information in the resulting feedback message. A machine learning classification tool then uses the labeled examples as training data to build the classification model to automatically generate feedback.
In a preferred embodiment, the one or more differences correspond to operations required to convert the response to the correction.
When the raw data units are converted into examples, approximate string matching is preferably used.
In an advantageous embodiment, the machine learning classifier is a decision tree classification model. Preferably, the decision tree classification model is a C4.5 classification model that is adapted to support examples that include multiple operations.
In a preferred embodiment, the characteristics are derived from linguistic knowledge. In one embodiment, the linguistic knowledge is retrieved from a lemmatizer or a word tagger.
In another embodiment, the step of reducing the number of classes is based on similarities in terms of Levenshtein distance or regular expressions.
In certain embodiments of the invention, the features characterize audio properties or relate to mathematical calculations.
In one embodiment, the step of reducing the number of classes is based on external information. The external information is advantageously derived from the wrong classifications from previous training phases.
In another aspect, the invention relates to a program, executable on a programmable device, containing instructions that, when executed, perform the method as described above.
In order to summarize the invention and the advantages achieved over the prior art, certain objects and advantages of the invention have been described above. It goes without saying that not all such objects or advantages are necessarily achieved according to any one specific embodiment of the invention. Thus, for example, persons skilled in the art will recognize that the invention may be embodied or carried out in a manner that achieves or optimizes one advantage or group of advantages as described herein, without necessarily realizing other objects or advantages described or suggested herein.
The above and other aspects of the invention will become apparent from and be further elucidated with reference to the embodiment(s) described below.
Brief description of the drawings
The invention will now be further described, by way of example, with reference to the accompanying drawings, in which like reference numerals refer to like elements in the various figures.
Fig. 1 illustrates a flow chart of an embodiment of the method according to the invention.
Fig. 2 illustrates an example of a decision tree.
Detailed Description of Illustrative Embodiments
The present invention will be described with reference to specific embodiments and with reference to certain drawings, but the invention is not limited thereto; it is only limited by the claims.
Moreover, the terms first, second, etc. are used in the description and in the claims to distinguish between similar elements and not necessarily for describing a sequence, either in time, in space, in importance or in any other way. It is to be understood that the terms used are interchangeable under proper conditions and that the embodiments of the invention described herein are capable of operating in sequences other than those described or illustrated herein.
It is to be noted that the term "comprising" as used in the claims should not be interpreted as being limited to the means specified thereafter; it does not exclude other elements or steps. It must therefore be interpreted as specifying the presence of the listed features, units, steps or components referred to, but it does not exclude the presence or addition of one or more other features, units, steps or components, or groups thereof. Therefore, the scope of the expression "a device comprising means A and B" should not be limited to devices consisting only of components A and B. It means that, with respect to the present invention, the only relevant components of the device are A and B.
References in this specification to "one embodiment" or "an embodiment" mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Occurrences of the phrase "in one embodiment" or "in an embodiment" at different places in this specification do not necessarily all refer to the same embodiment, although they may. Furthermore, the specific features, structures or characteristics may be combined in any suitable manner in one or more embodiments, as will be apparent to those skilled in the art from this disclosure.
In a similar manner, it should be noted that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped in a single embodiment, figure, or description thereof to streamline the disclosure and to facilitate the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as an expression of an intention that the claimed invention requires more features than expressly stated in each claim. As reflected in the following claims, the inventive aspects may lie in less than all the features of a single foregoing disclosed embodiment. Therefore, the claims that follow the detailed description are hereby explicitly included in this detailed description, wherein each claim stands on its own as a separate embodiment of the present invention.
In addition, since some embodiments described herein include some, but not other, features included in other embodiments, combinations of features of different embodiments are intended to fall within the scope of the invention and to form different embodiments, such as will be understood by someone skilled in this field. For example, in the following claims, any of the claimed embodiments can be used in any combination.
It should be noted that the use of particular terminology in describing certain aspects of the invention does not imply that the terminology herein is redefined to be limited to any specific features of the features or aspects of the invention with which that terminology is associated.
Numerous specific details are set forth in the description given here. However, it is understood that embodiments of the invention can be practiced without these specific details. In other cases, well-known methods, structures and techniques have not been shown in detail in order not to obscure the understanding of this description.
The present invention describes a computer-implemented method of teaching a computer to derive a classification model to generate a feedback portion when the computer receives data that is typically entered by a student. The derivation of the model is based on a set of features that describe potential errors and data units including an answer, a correction and a feedback section, as described below.
In the approach of the invention, meta-linguistic feedback is provided using machine learning. A system can be built in such a way that it learns which feedback message is suitable for a certain error based on previous experiences. In essence, the system learns from and offers support to teachers who have provided manual feedback in the past.
The proposed approach applies to a wide range of learning environments. One of the most important application domains is learning a (foreign) language. In the following, the case of learning a language is taken as an example to describe the invention technically in detail. However, it is emphasized that language learning is only one possible field of application. Another area of application is learning mathematics. This and a few other alternatives are described in more detail at the end of this description.
The proposed approach uses better, more detailed features to describe the errors extracted by combining approximate string matching and natural language processing techniques to compare student responses with known, correct solutions. In many applications, such as translation exercises, it is possible to provide the correct solutions. Alternatively, it is sometimes possible to generate the solutions automatically, for example using machine translation in the case of translation exercises or a mathematical reasoning system for math exercises.
In the invention, the object is to teach a computer to recognize language errors and to give feedback about those errors. Language errors are therefore presented to the computer by showing it the wrong sentence and the correction. For each such error, the computer is told what feedback would be shown to the student. The system should then be able to provide the same feedback for similar errors. Suppose the computer sees the following error:
and the computer is told that the corresponding feedback is the following:
The objective is then that, if the student makes a similar mistake such as this one:
the computer recognizes this as the same error and is able to provide the same feedback.
As mentioned above, modern algorithms to generate feedback analyze the student's input to detect, and provide feedback for, a range of language errors. All of these methods achieve this objective by comparing the sentence in question with a predefined language model describing correct language use. For any structural deviation between the input and the model, the most likely meaning is derived. Using the same algorithm, the changes required to transform the input into the correction are then presented as feedback.
A logical first step of any algorithm to generate feedback is to find the closest correction. In many applications, such as post-editing and word processors, however, there is no indication as to whether a sentence is correct or not, nor is the correction of such a false sentence available. Consequently, there is no alternative but to use a complex language model as a comparison.
However, in learning environments, students often answer specific questions, such as translation or fill-in exercises. In this case the teacher cannot predict all the possible wrong answers that the students give, but it is possible to provide the right solutions to such a question. As the solutions are available, it is therefore possible to define more detailed features that are derived from the difference between the error and the correction.
This approach differs significantly from the current state of the art. First, the system does not have to parse the sentence to determine its correctness. It can simply compare the input with the solution. Secondly, detailed features can be derived based on the transformations needed to turn the answer into the correction. Third, because these features are based on operations and because any error can be translated into such operations, the features are universal and able to describe every possible error. Finally, the operations can be described without any linguistic information, for example by means of string operations, which results in a language-independent solution.
Unlike any other method of generating feedback, the proposed approach does not model the correct or incorrect patterns of a language. It also does not attempt to use such a model to determine whether a certain sentence is correct or not. Because the corrections are available, the operations needed to correct the errors can be modeled. In summary, the model used in this invention does not describe language structures, but transformations. Each transformation can then be classified as (part of) an error and corresponding feedback message.
By way of example, the approach to building such a model will be explained on the basis of a set of language errors. However, the same approach can be followed to provide feedback in other areas of application, such as math, science, etc. Fig. 1 illustrates a high-level block diagram of the proposed approach. The different blocks of the diagram are explained in detail in the description below.
Each classification model is built on the basis of training data. In the case of supervised learning, where information is always needed to train a model, each training example is annotated with its associated class. Based on this, the classifier can learn patterns that identify to which specific class an example belongs. It is this collection of patterns (often implemented as rules) that makes up the classification model. The patterns specify these relationships by comparing specific characteristics of an example. Take for example a classification system that, based on an image of an animal, tries to identify the class of the animal (invertebrate, fish, amphibian, ...). Such a system should compare relevant features such as color, shape, size, etc.
The training data set is essentially a table with n rows, one for each example, and k + 1 columns with a value for each of the k characteristics and the corresponding class. This means that both the characteristics and the classes must be defined in advance. As mentioned above, the approach of the invention attempts to classify the differences between the incorrect and the correct sentence into an error or its explanation. An important part of this invention therefore relates to the comparison of input and correction, and the subsequent transformation to features.
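By way of illustration, such a table could be laid out as follows; the animal features and values below are purely illustrative assumptions and are not taken from the invention.

# Toy training table for the animal example above: k = 3 feature columns
# plus one class column; all values are illustrative.
training_data = [
    # colour,  shape,       size,    class
    ("grey",   "finned",    "small", "fish"),
    ("green",  "legged",    "small", "amphibian"),
    ("brown",  "segmented", "tiny",  "invertebrate"),
]
features = [row[:-1] for row in training_data]  # n rows of k feature values
classes = [row[-1] for row in training_data]    # the corresponding classes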
In the case of language errors, a possible comparison technique is approximate string matching (ASM). This technique calculates the correspondence of two strings based on the edit distance, which is defined as the number of string operations needed to convert one string into the other. Depending on which operations are considered, different edit distances have been defined. The Hamming distance only allows substitution and therefore only applies to strings of the same length. The longest common subsequence distance takes only insertions and deletions into account. The Levenshtein distance provides the three basic operations for strings: substitution, insertion and deletion.
An example for clarification: the Levenshtein distance between the strings ADF and AFE is two, because the first string can be transformed into the second by deleting D and adding E. Note that it is also possible to replace D with F and F with E, which results in the same distance value. Thus, although the distance value is deterministic, the operations behind it are not.
The algorithm for calculating the Levenshtein distance between two strings uses dynamic programming by continuously reducing the problem to calculating the distance between smaller substrings until the substrings have a length of 1, making the calculation trivial: the distance between two characters is 0 if they are equal, otherwise the distance is 1. The results of the sub-calculations are stored in an n×m matrix, in which essentially each of the n elements of the first string is compared with the m elements of the second.
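By way of illustration, this dynamic programming calculation can be sketched as follows in Python (a standard textbook formulation, not the implementation of the invention):

def levenshtein_distance(a: str, b: str) -> int:
    """Number of insertions, deletions and substitutions turning a into b."""
    n, m = len(a), len(b)
    # (n+1) x (m+1) matrix of distances between prefixes of a and b
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i characters
    for j in range(m + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[n][m]

print(levenshtein_distance("ADF", "AFE"))  # 2, as in the example above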
The Levenshtein distance is simply a number and can essentially be used as a feature in the classification model. However, it is impossible to differentiate all possible language errors based solely on the number of operations that they produce. Fortunately, the same algorithm can be used to calculate individual operations. This yields much more detailed characteristics.
The matrix composed by the dynamic programming method can then be traced from back to front to find the actual operations. This algorithm is executed twice, once at the level of words and a second time at the level of letters. The first step calculates the words to be added, deleted, or replaced, and for each replacement both words are compared to determine which letters should be added, deleted, or replaced.
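This back-tracing step can be sketched as follows; the function works on any sequence, so it can be applied to a list of words for word-level operations and to the letters of a word for letter-level operations. The (from, to) tuple convention anticipates the representation described in the next paragraph, and the matrix is built exactly as in the previous sketch.

def edit_operations(src, tgt):
    # Trace the Levenshtein matrix back to a list of (from, to) operations.
    # An insertion is (None, x), a deletion (x, None), a substitution (x, y).
    n, m = len(src), len(tgt)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] and src[i - 1] == tgt[j - 1]:
            i, j = i - 1, j - 1                   # match: no operation
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            ops.append((src[i - 1], tgt[j - 1]))  # substitution
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append((src[i - 1], None))        # deletion
            i -= 1
        else:
            ops.append((None, tgt[j - 1]))        # insertion
            j -= 1
    return list(reversed(ops))

print(edit_operations("ADF", "AFE"))  # one possible trace: [('D', 'F'), ('F', 'E')]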
An operation can be represented as a tuple (letter from, letter to). In the case of an addition, the letter from is empty and in the case of a deletion, the letter to is empty. So for both of the above examples, one would have the following attribute values:
However, this set of features, although more detailed than the distance, is still not enough. For example, take the following error:
This very different error naturally results in the same feature values as above. Even if the relative position of the operation in the word were included, one would still have the same values (both letters were added at the end of the word). It is clearly advantageous to include information about the word.
A simple solution would be to include the word from and the word to as features (making the letter operations redundant). However, adding these features has a limiting effect. Not only is there a large number of correct words (word to) in every language, the number of possible errors (word from) is even greater. Very detailed features can help the classification tool, but there is a good chance that they will not be used often. They are only useful when many values are present in the training set.
It is better to add a nominal feature with fewer possible values. One possibility is to use the lemma of the word instead of the word itself. In this way, all conjugations are mapped to the same lemma, which effectively reduces the number of possible values. However, there are still many possible values. Another, more restrictive feature is the word type (POS or part-of-speech) of the word in question. There are traditionally eight types of words in English: verb, noun, pronoun, adjective, adverb, preposition, conjunction and interjection.
To determine the POS of a word, a so-called POS tagger is required. It assigns a tag to every word in the sentence. There are many more tags than POS types because a tag often contains more information, for example the verb tense, the plural of the noun, etc. Various techniques are possible to tag a sentence, but each technique uses classification and they are all trained on an annotated corpus. Some commonly used examples are n-gram taggers (i.e. generalizations of unigram taggers, which look up each individual word in the corpus and store its most frequent tag, by additionally taking n-1 surrounding words into account), the Brill tagger (which uses a tagger, e.g. a unigram or n-gram tagger, for initialization, but then learns a number of transformation rules that improve the tagger), and taggers that use hidden Markov models (HMM) and the Viterbi algorithm.
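Purely as an illustration, off-the-shelf toolkits provide such taggers; the sketch below uses NLTK's default (Penn Treebank style) tagger and WordNet lemmatizer as stand-ins, since the invention does not prescribe any particular tagger or lemmatizer.

# Illustrative only: NLTK's tagger and WordNet lemmatizer stand in for
# whatever POS tagger / lemmatizer an implementation would use. Requires
# the NLTK data packages 'punkt', 'averaged_perceptron_tagger' and 'wordnet'.
import nltk
from nltk.stem import WordNetLemmatizer

sentence = "They have given her flowers"
tokens = nltk.word_tokenize(sentence)
tags = nltk.pos_tag(tokens)  # e.g. [('They', 'PRP'), ('have', 'VBP'), ('given', 'VBN'), ...]

lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(word, pos='v') if tag.startswith('VB')
          else lemmatizer.lemmatize(word)
          for word, tag in tags]  # e.g. 'given' -> 'give'

print(tags)
print(lemmas)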
The features can now be expanded with these POS tags. It makes sense to include both the tag of the word to and the tag of the word from. Simple grammar errors such as tense or plural errors can then be easily detected. Spelling errors can result in an attempt to tag a non-existent word, but such words are typically mapped to nouns and are therefore not a real problem. The exact values of the features are not really important, as long as the same features describe the same error.
A suitable set of features is, for example: {Letter from, Letter to, Position in word, Position in sentence, Word from, Word to, Lemma from, Lemma to, POS tag from, POS tag to}. Another useful feature is the relative index of the operation in the sentence and in the word. The list of features mentioned here is not exhaustive. Other features can be added to improve accuracy.
Now that a set of features has been obtained, the next step is to convert the raw data units into a data set that serves as training input for the classifier. In the case of learning a language, the input and correction sentences are first POS-tagged, then a word-level Levenshtein algorithm is used to find the word operations, and finally the letter operations are extracted for each word replacement. In each subsequent step, a reference to the corresponding word and its POS tag is retained.
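Combining the previous sketches, a single raw data unit could be converted into feature rows roughly as follows. The helper reuses edit_operations() from the earlier sketch and nltk.pos_tag() as an arbitrary POS tagger; the function and its feature names are an illustrative assumption mirroring the set listed above, not the implementation of the invention.

import nltk

def unit_to_examples(answer: str, correction: str):
    # Turn one (answer, correction) pair into word-level feature rows.
    ans_tags = dict(nltk.pos_tag(nltk.word_tokenize(answer)))
    cor_tags = dict(nltk.pos_tag(nltk.word_tokenize(correction)))
    rows = []
    # word-level operations between answer and correction
    for word_from, word_to in edit_operations(answer.split(), correction.split()):
        row = {
            "word_from": word_from, "word_to": word_to,
            "pos_from": ans_tags.get(word_from), "pos_to": cor_tags.get(word_to),
            "letter_ops": [],
        }
        # letter-level operations only when one word is replaced by another
        if word_from and word_to:
            row["letter_ops"] = edit_operations(word_from, word_to)
        rows.append(row)
    return rows

examples = unit_to_examples("They gave her flowers", "They have given her flowers")
# roughly one row for the inserted "have" and one for "gave" -> "given"; the
# feedback of the raw data unit then becomes the class label of these rows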
The Levenshtein traversal algorithm is adjusted to take the lemmata and POS tags into account in order to resolve any ambiguities. Consider, for example, the comparison between "They gave her flowers" and "They have given her flowers". The transformation can be performed in two ways, with the same number of letter operations. The algorithm can either replace "gave" with "have" and add "given", or add "have" and replace "gave" with "given". By using linguistic information such as the POS and the lemmata, the second transformation can be preferred.
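One way to express this preference is to make the substitution cost in the word-level algorithm depend on linguistic agreement, for example cheaper when both words share the same lemma; the weights below are an illustrative assumption, not the cost function of the invention. Plugging such a cost into the dynamic programming step makes the alignment that adds "have" and replaces "gave" with "given" the cheapest one.

def substitution_cost(word_from, word_to, lemma_of):
    # Cheaper substitution when both words are forms of the same lemma.
    if word_from == word_to:
        return 0.0
    if lemma_of(word_from) == lemma_of(word_to):
        return 0.5  # same lemma, different form (assumed weight)
    return 1.0      # unrelated words

# toy lemma lookup for the running example only
toy_lemmas = {"gave": "give", "given": "give", "have": "have"}
lemma_of = lambda w: toy_lemmas.get(w, w)

print(substitution_cost("gave", "given", lemma_of))  # 0.5 -> preferred alignment
print(substitution_cost("gave", "have", lemma_of))   # 1.0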
The following example is considered:
The resulting training data would then be the following
where VBP, VBD and VBN refer, respectively, to a verb in the non-3rd-person singular present tense, a verb in the past tense and a past participle. Given sufficient such data, the classifier would learn that this feedback should be given when a verb is added and another verb is changed from the past tense to the past participle. It may also need to learn that the lemma of the first verb must be "have" and that the lemma of the second verb must remain the same.
Other features can be added if these rows are still insufficient to differentiate all feedback messages. The position in the sentence can be added so that such feedback only applies when the first verb is in the immediate vicinity of the second. In addition, as already mentioned above, one can also add the individual letter operations, which are useful for detecting orthographic errors such as accents.
On the basis of the above example, it is clear that there are often multiple training examples for a single error. When classifying an error, this results in multiple class distributions. The most likely class can then be found by taking the product of the probabilities of each class. This method assumes, similar to a Naïve Bayes approach, that the examples are independent of each other. But as is the case for Naïve Bayes, such an assumption, even if incorrect, is often useful.
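A minimal sketch of this combination step, assuming each operation yields a class distribution as a dictionary of probabilities (the numbers below are invented for illustration):

from math import prod

def most_likely_class(distributions):
    # Combine per-operation class distributions by multiplying probabilities.
    classes = set().union(*distributions)
    scores = {c: prod(d.get(c, 1e-9) for d in distributions) for c in classes}
    return max(scores, key=scores.get)

# two operations, each voting over two feedback classes (toy numbers)
op1 = {"verb tense": 0.7, "spelling": 0.3}
op2 = {"verb tense": 0.6, "spelling": 0.4}
print(most_likely_class([op1, op2]))  # 'verb tense' (0.42 versus 0.12)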
It is possible to introduce dependencies by repeating the set of characteristics of each operation, as shown in the table below. This ensures that there is only one example for each error and takes into account the simultaneous occurrence of operations and the order. For language errors, however, the number of operations and their order is not consistent; the same error can be produced in different ways and there can be several separate errors in a student's response.
Another advantageous approach is to change the classification algorithm to be able to classify examples that are composed of multiple operations. This approach will be illustrated by the C4.5 decision tree.
A decision tree classifier has a number of important advantages. First, it is capable of ignoring features that would reduce the performance of the classifier. Secondly, the resulting model is easy to understand, even by people. Consequently, it is able to explain why it makes a certain prediction. Thirdly, the algorithm is relatively easy to implement and, more importantly, easy to adjust.
An example of a simple decision tree is shown in FIG. 2. Each node in the tree represents a decision point based on an attribute. For each possible value of said attribute, a sub-tree further subdivides the data. The algorithm therefore inherently considers feature dependencies, as opposed to a Naïve Bayes approach. The class distributions of the training data are stored in the leaves.
The C4.5 algorithm trains its decision tree in two phases. First it builds a tree that maps the training data perfectly. This results in overfitting: the tree predicts the training data perfectly, but performs poorly on new test data. To counter this, the algorithm prunes this tree to remove overly specific branches.
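As a rough stand-in (scikit-learn implements CART rather than C4.5 and prunes with cost-complexity pruning instead of C4.5's error-based pruning), the grow-then-prune idea can be sketched as follows; the encoded feature rows and labels are invented for illustration.

from sklearn.tree import DecisionTreeClassifier

# toy, already-encoded feature rows [pos_from, pos_to] with a feedback class each
X = [[1, 2], [1, 2], [3, 3], [3, 4], [1, 2], [3, 4]]
y = ["verb tense", "verb tense", "spelling", "article", "verb tense", "article"]

# first phase: a tree that fits the training data (risk of overfitting)
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
# second phase: a pruned variant that removes overly specific branches
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=0.05).fit(X, y)

print(full_tree.predict([[1, 2]]), pruned_tree.predict([[1, 2]]))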
The C4.5 algorithm can be adapted to support sub-examples by basing a decision on a characteristic of one of its operations. The selection of this operation must be deterministic across examples. A simple selection criterion may be, for example, to select the nth operation, ordered by its position in the sentence. This is identical to the approach described above where the characteristics are repeated in the training example. As stated above, this assumes that the presence of a language error always results in the same number and sequence of operations.
In the case of learning a language, it would be a better approach to select an operation based on one of its characteristics. This would introduce another type of decision that selects an operation based on an attribute value and stores it in a register. Sub-nodes can then further base a decision on an attribute value of the same operation. Take the wrong verb tense example explained above. The use of the wrong verb tense results in two word operations. The decision tree, or at least a part thereof, may resemble that in FIG. 3.
In the previous sections, the features were described, as well as how they should be constructed and possible ways to adjust the classification algorithm to handle multiple operations. As mentioned above, each example of the training data is annotated with a class. In this invention, this class corresponds to a feedback message for the error in question. This can be as simple as a one-to-one correspondence, but more advanced techniques can also be used.
Depending on the raw data, presumably collected from teachers who provided feedback for specific errors, different feedback messages may have been given for the same error. The reason for this is twofold. First, there can be multiple teachers and it is unlikely that two teachers will always give identical feedback for the same errors. Secondly, the feedback can be more specific and contain information about the context of the error.
One of the necessary conditions of each classification algorithm is that there must be sufficient features to distinguish between classes. In this case, however, there may be a different feedback message for exactly the same set of operations. This has a negative influence on the accuracy and must therefore be avoided. In the present invention, this problem can be solved by adding additional features, such as the identity of the teacher. Feedback is also often personalized for the student, so it is advantageous to include details about the student, such as age, gender, level, etc.
Training data has shown that different feedback was often given for the same error to the same user, with no clear features that could explain why. Therefore, a pre-processing step is added that uses an approximate string matching algorithm to identify almost identical feedback messages and remove duplicates. More complex techniques can also be used, ranging from a simple regular expression to specialized systems.
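A minimal sketch of such a pre-processing step, using Python's built-in difflib as the approximate string matcher; the 0.9 similarity threshold is an arbitrary assumption.

from difflib import SequenceMatcher

def deduplicate(messages, threshold=0.9):
    # Keep one representative of each group of near-identical messages.
    kept = []
    for message in messages:
        if not any(SequenceMatcher(None, message, k).ratio() >= threshold for k in kept):
            kept.append(message)
    return kept

feedback = [
    "Pay attention to the verb conjugation!",
    "Pay attention to the verb conjugation.",  # near-duplicate, dropped
    "You have conjugated the verb incorrectly",
]
print(deduplicate(feedback))  # two distinct messages remain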
Take these two feedback messages, for example: "Pay attention to the verb conjugation!" and "You have conjugated the verb incorrectly". Both messages are about the same error, but are expressed differently. To identify such similar feedback messages, for example, string comparison techniques can be used to discover that both strings contain "verb" and the second string contains a word similar to "conjugation". Regular expressions can also be used that describe such similarities.
A more advanced approach is to iteratively train the above classifier and examine the incorrectly classified examples. Based on this, a list of feedback messages can be obtained that the system often classifies incorrectly. This can be useful to help identify potential similarities between classes.
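This idea can be sketched as follows, again with scikit-learn's decision tree as a stand-in for the classifier and invented data: identical feature rows that carry differently worded feedback cannot all be fitted, so they surface as frequently confused class pairs that are candidates for merging.

from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def confused_class_pairs(X, y):
    # (true, predicted) pairs for training examples the model still gets wrong.
    model = DecisionTreeClassifier(random_state=0).fit(X, y)
    predicted = model.predict(X)
    return Counter((t, p) for t, p in zip(y, predicted) if t != p).most_common()

X = [[1, 2], [1, 2], [1, 2], [3, 4]]
y = ["Pay attention to the verb conjugation!",
     "You have conjugated the verb incorrectly",
     "Pay attention to the verb conjugation!",
     "Wrong article"]
print(confused_class_pairs(X, y))
# e.g. [(('You have conjugated the verb incorrectly',
#         'Pay attention to the verb conjugation!'), 1)]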
Some other fields of application of the invention are described briefly below. First, it was demonstrated how the system can be used to provide feedback for natural language errors, but it can also easily be applied to provide feedback for programming languages or even mathematical calculations. Programming languages have less ambiguity and follow fewer and less complex rules. The same applies to mathematical calculations, although the language is more abstract and sometimes requires a numerical approach. For example, take this error when factorizing: ny² + 4y² = 4ny², which should of course be: ny² + 4y² = (4 + n)y². The input can then be compared to the correction and the feedback can be represented as "The coefficients must be added, since the common element of the sum has to be factored out." Here too, it would be insufficient to consider only a string comparison. More advanced features are needed that consider the entire formula.
A second application is that of providing feedback on questions that require specific knowledge about a field, for example geography or history. Take the question: "What is the capital of Belgium?" If the student answers "Paris", appropriate feedback would be: "Paris is the capital of France, the capital of Belgium is Brussels". Providing this feedback solely on the basis that there is a transformation from the word Paris to Brussels is insufficient, unless all exercises involve identifying capitals. Otherwise, an attribute can be added to the set of attributes that identifies the pedagogical task of the question. One can even include domain knowledge characteristics that are similar to the POS tag or word function. In this case, an attribute such as "capital of", to indicate the country of which the element is the capital, can be used to distinguish errors against capitals from other errors.
A final application is that of correcting recorded speech or even music, such as piano playing. The student's input can be compared to the teacher's by comparing audio waves, potentially ignoring pitch, speed, and level differences. Features can then be extracted from this comparison. Take, for example, a piano student who plays a wrong note while playing Für Elise by Beethoven. If this is a common error, appropriate feedback can be given using the same algorithm.
Although the invention has been illustrated and described in detail in the drawings and foregoing description, such illustrations and descriptions are to be considered illustrative or exemplary and not restrictive. The foregoing description explains certain embodiments of the invention in detail. It should be noted, however, that no matter how detailed the foregoing appears in the text, the invention can be practiced in many ways. The invention is not limited to the disclosed embodiments.
Other variations on the disclosed embodiments can be understood and effected by persons skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps and the indefinite article "a" does not exclude a plural. A single processor or other unit can perform the functions of different items recited in the claims. The mere fact that certain measures are listed in mutually different dependent claims does not mean that a combination of those measures cannot be used to advantage. A computer program can be stored/distributed on a suitable medium, such as an optical storage medium or a semiconductor medium, supplied with or as part of other hardware, but can also be distributed in other forms, such as via the internet or other wired or wireless telecommunication systems. Any reference signs in the claims should not be construed as limiting the scope.
Claims:
Claims (12)
[1]
CLAIMS
A computer-implemented method for creating a statistical classification model to automatically generate feedback for an application in a learning environment, comprising the steps of: * taking a set of raw data units related to said application in said learning environment, wherein each raw data unit comprises at least a first portion of data representing a response, a second portion of data representing a correction and a third portion of data representing feedback, and a set of features describing potential errors in said first portions of data, * converting raw data units into examples composed of one or more differences obtained by comparing said first portion of data representing the response with said second portion of data representing the correction, thus assigning a value to features from said set of features, * deriving a set of classes from said third portion of data representing feedback from said raw data units, * reducing the number of classes in said set based on correspondence between classes, * labeling each example with one of said classes of said reduced set, and * compiling said statistical classification model to automatically generate feedback through a machine learning classification tool, using said labeled examples as training data.
[2]
A computer-implemented method of creating a statistical classification model according to claim 1, wherein said one or more differences correspond to operations necessary to transform said first portion of data representing said response into said second portion of data representing said correction.
[3]
A computer-implemented method for creating a statistical classification model according to claim 1 or 2, wherein in the step of transforming said raw data units into examples, approximate string matching is used.
[4]
A computer-implemented method according to any of claims 1 to 3, wherein said machine learning classification means is a decision tree classification model.
[5]
A computer-implemented method according to claim 4, wherein said decision tree classification model is a C4.5 classification model adapted to support examples including multiple operations.
[6]
A computer-implemented method according to any one of the preceding claims, wherein said features are derived from linguistic knowledge.
[7]
A computer-implemented method according to claim 6, wherein said linguistic knowledge is retrieved from a lemmatizer or word tagger.
[8]
A computer-implemented method according to any one of the preceding claims, wherein said step of reducing said number of classes is based on similarities in terms of Levenshtein distance or regular expressions.
[9]
A computer-implemented method according to any of the preceding claims, wherein said features characterize audio properties or are related to mathematical calculations.
[10]
A computer-implemented method according to any of the preceding claims, wherein said step of reducing said number of classes is based on external information.
[11]
A computer-implemented method according to claim 10, wherein said external information is derived from incorrect classifications from previous training phases.
[12]
A program executable on a programmable device comprising instructions that, when executed, perform the method according to any of claims 1 to 11.
Similar technologies:
Publication number | Publication date | Patent title
JP6618735B2|2019-12-11|Question answering system training apparatus and computer program therefor
US9342499B2|2016-05-17|Round-trip translation for automated grammatical error correction
US8548805B2|2013-10-01|System and method of semi-supervised learning for spoken language understanding using semantic role labeling
US7412385B2|2008-08-12|System for identifying paraphrases using machine translation
Sukkarieh et al.2003|Automarking: using computational linguistics to score short 'free-text' responses
Shaalan et al.2015|Analysis and feedback of erroneous Arabic verbs
Hana et al.2014|Building a learner corpus
KR102199835B1|2021-01-07|System for correcting language and method thereof, and method for learning language correction model
CN111428104A|2020-07-17|Epilepsy auxiliary medical intelligent question-answering method based on viewpoint type reading understanding
Mudge2010|The design of a proofreading software service
Becerra-Bonache et al.2014|Linguistic models at the crossroads of agents, learning and formal languages
BE1022627B1|2016-06-20|Method and device for automatically generating feedback
US20220019737A1|2022-01-20|Language correction system, method therefor, and language correction model learning method of system
Bastianelli et al.2015|Using semantic maps for robust natural language interaction with robots
JPWO2004084156A1|2006-06-22|Template-Interactive learning system based on template structure
Brajković et al.2017|Tree and word embedding based sentence similarity for evaluation of good answers in intelligent tutoring system
Chen et al.2003|A new template-template-enhanced ICALL system for a second language composition course
Berleant1995|Engineering “word experts” for word disambiguation
Xiao et al.2018|Automatic generation of multiple-choice items for prepositions based on word2vec
Fong et al.2013|Treebank parsing and knowledge of language
KR20180093791A|2018-08-22|System and method for biometric terminology object recognition
Quan et al.2012|KU Leuven at HOO-2012: a hybrid approach to detection and correction of determiner and preposition errors in non-native English text
Qiu et al.2012|Joint segmentation and tagging with coupled sequences labeling
Ge2010|Learning for semantic parsing using statistical syntactic parsing techniques
Kloppenburg et al.2016|Native-data models for detecting and correcting errors in learners’ Dutch
Patent family:
Publication number | Publication date
EP2884434A1|2015-06-17|
BE1022627A1|2016-06-20|
Cited references:
Publication number | Filing date | Publication date | Applicant | Patent title

JP2001166789A|1999-12-10|2001-06-22|Matsushita Electric Ind Co Ltd|Method and device for voice recognition of chinese using phoneme similarity vector at beginning or end|
CN103548041B|2011-06-28|2016-06-29|International Business Machines Corporation|For determining the information processor of weight of each feature in subjective hierarchical clustering, methods and procedures|
CN110245265B|2019-06-24|2021-11-02|Beijing QIYI Century Science & Technology Co., Ltd.|Object classification method and device, storage medium and computer equipment|
Legal status:
Priority:
Application number | Filing date | Patent title
EP13196364.7|2013-12-10|
EP13196364.7A|EP2884434A1|2013-12-10|2013-12-10|Method and device for automatic feedback generation|