Ask your questions!

Lopez-Gazpio, Iñigo

Researcher in the IXA Group of the EHU

Image: Andrei Kovalev/350RF

Do you remember, from when we were little, what the teacher said after finishing a topic? "Let's see, ask your questions!" That was the moment the game of hide-and-seek began: behind the shoulder of the classmate in front, behind the desk, behind the classroom door... any escape move would do, as long as we avoided the teacher's gaze. To all those shy pupils we bring good news: from today, the computer will bear the embarrassment for us.

In the following lines we will lay out the basics of a system that, starting from a corpus, generates sentence-level questions automatically and in bulk; such systems are known as QG (Question Generation) systems. Just as in the kitchen when preparing a delicious dish, producing questions requires following a few concrete steps. In fact, the steps for creating questions automatically and those for, say, making a pizza are very similar. The process we will follow to implement all this consists of four steps, carried out in sequence:

1. Processing of the input corpus

The first step is to process the input corpus, that is, to analyze it linguistically. In this step we try to extract as much information as possible from the input corpus. Although this kind of information gathering recurs in many areas of language processing, it is a fundamental task if we want to proceed sensibly. To tackle it properly we need various linguistic resources: syntactic analyzers, morphological analyzers, semantic analyzers, lemmatizers, tokenizers, etc. With them we can identify the verbs (in their dictionary form: play, not plays), nouns, adjectives, adverbs... in the original sentences, as well as the words and the noun phrases they form, the declension cases of those noun phrases, the verb groups (simple verbs, compound verbs, auxiliaries...) and the syntactic dependencies between the words of each sentence.
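As a rough illustration of this first step, here is a minimal Python sketch that splits a corpus into sentences, tokenizes them and attaches a lemma and a coarse tag to each token. Everything in it is an assumption made for the example: the tiny TOY_LEXICON, the tag names and the functions split_sentences and analyze are hypothetical, and a real system would rely on proper morphological and syntactic analyzers instead.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon: word form -> (lemma, coarse tag).
# A real system would get this information from linguistic analyzers.
TOY_LEXICON = {
    "played": ("play", "VERB"),
    "plays": ("play", "VERB"),
    "won": ("win", "VERB"),
    "the": ("the", "DET"),
    "in": ("in", "PREP"),
    "he": ("he", "PRON"),
    "prize": ("prize", "NOUN"),
}


@dataclass
class Token:
    text: str   # surface form as it appears in the sentence
    lemma: str  # dictionary form
    tag: str    # coarse category


def split_sentences(corpus: str) -> list:
    """Naive sentence splitter: cut after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", corpus) if s.strip()]


def analyze(sentence: str) -> list:
    """Tokenize one sentence and attach a lemma and a tag to every token."""
    tokens = []
    for word in re.findall(r"\w+|[^\w\s]", sentence):
        if re.fullmatch(r"\d{4}", word):
            tokens.append(Token(word, word, "YEAR"))
        elif word.lower() in TOY_LEXICON:
            lemma, tag = TOY_LEXICON[word.lower()]
            tokens.append(Token(word, lemma, tag))
        elif not word[0].isalnum():
            tokens.append(Token(word, word, "PUNCT"))
        elif word[0].isupper():
            # Crude assumption: unknown capitalized words are proper nouns.
            tokens.append(Token(word, word, "PROPN"))
        else:
            tokens.append(Token(word, word.lower(), "UNK"))
    return tokens


if __name__ == "__main__":
    for sentence in split_sentences("Einstein won the prize in 1921. He played the violin."):
        print([(t.text, t.lemma, t.tag) for t in analyze(sentence)])
```

The point of the sketch is only the shape of the output: every later step works on sentences that have already been enriched with this kind of information.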

In the kitchen too, without realizing it, we carry out a similar kind of initial analysis: we gather information and tools. First we take out the cookbook for the dish we want to prepare, then we read the list of ingredients, count them and look for them, and finally we lay out the utensils we need on the kitchen table so that we can carry out the steps of the recipe without obstacles.

2. Selection of question targets

Once we have listed the ingredients of the recipe (better that they are in our refrigerator than in the store!), the next step is to go and fetch them. Tomato and ham from the refrigerator, cheese from the kitchen table, water from the tap, flour from the shelf, salt from the cupboard... However, instead of using the tomatoes from grandfather's vegetable garden, we will use yesterday's tomato sauce, which turned out delicious and will surely work very well.

In this second step of the question generator we are developing, just as in the kitchen, the goal is to seek out the best question targets, that is, the elements of each sentence that the questions will ask about. Here too we need natural language processing resources, since we must pick the best of the components we classified in the previous step. The selection can draw on different sources of information: semantic networks, dictionaries of animate and inanimate entities, databases of articles, news, institutions, well-known people, etc. In this way, among all the words and phrases we choose those that carry the most information and are therefore most likely to yield interesting questions.
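A minimal Python sketch of this selection step could look as follows. It assumes two tiny hand-made lists (KNOWN_PEOPLE and KNOWN_PLACES) standing in for the external resources mentioned above, plus a simple pattern for years; the names, the categories and the function find_candidates are all hypothetical and only serve to illustrate the idea.

```python
import re

# Hypothetical stand-ins for external resources such as databases of
# well-known people or place names.
KNOWN_PEOPLE = {"Albert Einstein", "Marie Curie"}
KNOWN_PLACES = {"Stockholm", "Paris"}

# Assumption: four-digit years between 1000 and 2099 count as dates.
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20\d{2})\b")


def find_candidates(sentence: str) -> list:
    """Return (text, category) pairs for the elements worth asking about,
    ordered by their position in the sentence."""
    candidates = []
    for name in KNOWN_PEOPLE:
        if name in sentence:
            candidates.append((name, "PERSON"))
    for place in KNOWN_PLACES:
        if place in sentence:
            candidates.append((place, "PLACE"))
    for year in YEAR_PATTERN.findall(sentence):
        candidates.append((year, "DATE"))
    # Sort by first occurrence so the result does not depend on set order.
    return sorted(candidates, key=lambda c: sentence.index(c[0]))


if __name__ == "__main__":
    print(find_candidates("Marie Curie received the Nobel Prize in Stockholm in 1911."))
```

Words that appear in none of the resources are simply ignored here; a fuller system would rank the candidates by how informative they are rather than accept or reject them outright.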

3. Selection of question words

We are almost done: we start heating the oven and we can already smell the dish we are preparing. The ingredients are on the table, along with the utensils we need. So it's time to get our hands dirty, because each ingredient has to be prepared: knead the flour into dough, remove the rind and grate the cheese, take the seeds out of the tomato, remove the plastic wrapping from the ham...

The work to be done in the question generation process is similar. We have analyzed the corpus and identified the elements the questions will ask about, so to move on we need to get our hands dirty, if only virtually, because we have to decide which types of question we will generate. The choice of question type, and with it of the question word, depends on the work done in the previous phases. Thus, if we have managed to identify dates in the input corpus, it would be logical to create a temporal question using the question word “when” or a similar one. If, on the other hand, we have identified well-known people, it would be logical to build a question about a person, in which case the question word “who” would be used. We proceed in this way until a question type has been defined for every candidate identified in the previous phase.
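The mapping just described can be sketched in a few lines of Python. The "when" and "who" cases come from the text above; the "where" case, the category names and the two function names are assumptions added for the example.

```python
from typing import Optional

# Category of the candidate -> question word.
# DATE and PERSON follow the text; PLACE is an extra assumption.
QUESTION_WORD_BY_CATEGORY = {
    "DATE": "When",
    "PERSON": "Who",
    "PLACE": "Where",
}


def choose_question_word(category: str) -> Optional[str]:
    """Return the question word for a category, or None if we have no rule."""
    return QUESTION_WORD_BY_CATEGORY.get(category)


def assign_question_words(candidates: list) -> list:
    """Attach a question word to every (text, category) candidate,
    dropping the candidates for which no question type is defined."""
    assigned = []
    for text, category in candidates:
        question_word = choose_question_word(category)
        if question_word is not None:
            assigned.append((text, category, question_word))
    return assigned


if __name__ == "__main__":
    print(assign_question_words([("1911", "DATE"), ("Marie Curie", "PERSON")]))
```

Keeping the rules in a plain dictionary makes it easy to add new question types as the earlier phases learn to recognize new kinds of candidates.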

4. Construction of questions

Everything is ready and only the home straight remains: bringing together all the work done so far. But we have to be careful, because a last-minute mistake can ruin everything. After checking all the ingredients we have prepared, we get to work: spread the tomato over the pizza dough, scatter the cheese and ham on top and put the dish in the oven. Now we only have to wait about twenty minutes; meanwhile, we set out the plates, cutlery and glasses until the oven timer tells us it is time to take the food out.

In the last step of the question generation system we have to carry out similar tasks. That is, we take the original sentence and build the question using patterns and rules. We also have to make other adjustments, such as removing connectors, adapting punctuation marks, and adapting case and verb markers. And not only that: anything else we can think of is also worth doing, since any work that improves the questions is welcome, for example resolving anaphora and removing information that is irrelevant to the question (in other words, rephrasing).
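To make the idea of patterns and rules concrete, here is a small Python sketch that handles a single pattern: when the element being asked about is the subject at the start of the sentence, it is replaced by the question word and the final punctuation becomes a question mark. A leading connector is removed first. The connector list and the function names are hypothetical, and real systems need many more patterns (plus changes to case and verb forms) than this one.

```python
from typing import Optional

# Hypothetical list of sentence-initial connectors to remove.
LEADING_CONNECTORS = ("However, ", "Therefore, ", "In addition, ")


def strip_connector(sentence: str) -> str:
    """Remove one leading connector and re-capitalize what follows."""
    for connector in LEADING_CONNECTORS:
        if sentence.startswith(connector):
            rest = sentence[len(connector):]
            return rest[:1].upper() + rest[1:]
    return sentence


def build_subject_question(sentence: str, target: str,
                           question_word: str = "Who") -> Optional[str]:
    """Build a question by replacing a sentence-initial target with the
    question word. Returns None when this pattern does not apply."""
    sentence = strip_connector(sentence.strip())
    if not sentence.startswith(target):
        return None  # the target is not in subject position
    body = sentence[len(target):].rstrip(".!")
    return f"{question_word}{body}?"


if __name__ == "__main__":
    print(build_subject_question("However, Marie Curie discovered radium.", "Marie Curie"))
```

Returning None for sentences the pattern cannot handle keeps the generator from producing ungrammatical questions; other patterns would then get their turn.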

Conclusions

As you have seen, implementing a system for the mass generation of questions requires four steps: processing the input corpus, selecting the question targets, selecting the question words and, finally, building the questions. As easy as making a pizza!

Image: created from images on the website http://openclipart.org. Public domain.

The question generation system we have developed, which is a very basic QG system, and other systems capable of generating exercises automatically are resources closely tied to teaching and learning, since they considerably reduce the time teachers spend preparing material. In general, automatic exercise generation systems are integrated into larger e-learning platforms, which makes it possible to have many users taking part and learning in exchange for a small workload.

To finish: to all the shy ones we mentioned at the beginning, you now know that your embarrassment can disappear as easily as making a pizza (although it takes somewhat more than twenty minutes), so come out of your hiding places and ask your questions!

Bibliography

Official website of Wikipedia, http://www.wikipedia.org.
Eihera application area of the IXA group http://ixa.si.ehu.es/Ixa/Produkzioak/1273220198
Eulia application area of the IXA group http://ixa.si.ehu.es/Ixa/Produkzioak/1274694158
Ixati application area of the IXA group http://ixa.si.ehu.es/Ixa/Produkzioak/1273220525
Aldabe, I.; López de Lacalle, M.; Maritxalar, M.; Martínez, E.; Uria, L.: ArikIturri: An Automatic Question Generator Based on Corpora and NLP Techniques. UPV/EHU.
Yao, X.; Bouma, G.; Zhang, Y.: Semantics-based Question Generation and Implementation. Johns Hopkins University, University of Groningen, Saarland University.