Wikidata, a free collaborative database for knowledge

Leturia Azkarate, Igor

Computer scientist and researcher

Elhuyar Hizkuntza eta Teknologia

We all know Wikipedia, the free encyclopedia that users around the world build together and that the Wikimedia Foundation manages and promotes. The Foundation, however, has other, less well-known projects that also aim at the collaborative creation of free knowledge: Commons for images and audiovisual material, Wiktionary for dictionaries, Wikibooks, Wikisource, Wikiversity… One of the newest is Wikidata, a free database for knowledge. Although it has existed since 2012, it has flourished in recent years and has made many interesting things possible.

Wikidata is a free and collaborative database for knowledge. But unlike Wikipedia, which is a collection of text articles and graphic resources, Wikidata is a collection of structured information made up of records with a few short fields. The database stores people's dates and places of birth, the population figures of cities and other such data. It also stores relationships, such as kinship between people, the provinces and territories that places belong to, the taxonomic relationships of species…

Another big difference from Wikipedia is that there is not one Wikidata per language. Because it holds only data, there is a single multilingual Wikidata, in which each data unit can have its name and description in as many languages as needed.

Wikidata structure

All kinds of data and their relationships are stored in Wikidata. But there are really only three types of data: elements (items), properties and expressions (statements).

The element data type is used to represent people, cities, songs, species, abstract concepts, etc. Each element has an identifier in Wikidata, made up of the character “Q” and a number. For example, element Q1 represents the universe and can be accessed at https://www.wikidata.org/wiki/Q1; element Q12256717 refers to the Elhuyar brothers; element Q47588 refers to Euskal Herria... In addition, each element can have a name or label, a description and several aliases or alternative names in each language.
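To make the shape of an element concrete, here is a minimal Python sketch. It is not Wikidata's own code: the `Element` class, its field names and the fallback-to-English rule are assumptions made for illustration, and the labels were typed in by hand rather than fetched from the database.

```python
import re
from dataclasses import dataclass, field

# An element identifier is the letter "Q" followed by a positive number.
ELEMENT_ID = re.compile(r"Q[1-9]\d*")


@dataclass
class Element:
    """Hypothetical in-memory model of one Wikidata element."""
    qid: str
    labels: dict = field(default_factory=dict)        # language code -> name
    descriptions: dict = field(default_factory=dict)  # language code -> text
    aliases: dict = field(default_factory=dict)       # language code -> list of names

    def __post_init__(self):
        if not ELEMENT_ID.fullmatch(self.qid):
            raise ValueError(f"not an element identifier: {self.qid!r}")

    @property
    def url(self):
        # Address of the element's page on the Wikidata site.
        return f"https://www.wikidata.org/wiki/{self.qid}"

    def label(self, lang, fallback="en"):
        # Assumed rule: requested language, then the fallback, then the bare ID.
        return self.labels.get(lang) or self.labels.get(fallback) or self.qid


# Labels entered by hand for illustration only.
universe = Element("Q1", labels={"en": "universe", "eu": "unibertso"})
print(universe.url)
print(universe.label("eu"))
print(universe.label("fr"))
```

The point of the sketch is that the identifier is language-neutral, while names, descriptions and aliases hang off it per language.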

Properties, in turn, define the kinds of information and relations that elements can have. For example, property P31 indicates an element's type, and there are properties for the date of birth (P569), for being part of something (P361), for authorship (P50)…

Finally, expressions add information to elements by linking an element, through a property, to a value or to another element. For example, almost all elements have an expression with the P31 (type) property that links them to their type, and almost all people have a P569 (date of birth) expression. Thus, the expressions Q937 (Einstein) – P31 (type) – Q5 (person) and Q937 (Einstein) – P569 (date of birth) – 1879/03/14 indicate, respectively, that Einstein is a person and that he was born on that date.
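The two Einstein expressions can be written down as plain element, property, value triples. The following Python sketch is only an illustration of that idea; the `Expression` tuple and the `values_of` helper are hypothetical names, and real Wikidata expressions carry more detail (qualifiers, references, ranks) than is modelled here.

```python
from collections import namedtuple

# One expression: an element, a property, and a value or another element.
Expression = namedtuple("Expression", "element prop value")

expressions = [
    Expression("Q937", "P31", "Q5"),           # Einstein - type - person
    Expression("Q937", "P569", "1879-03-14"),  # Einstein - date of birth - date
]


def values_of(element, prop, store):
    """Return every value that `store` records for this element and property."""
    return [e.value for e in store if e.element == element and e.prop == prop]


print(values_of("Q937", "P31", expressions))
print(values_of("Q937", "P569", expressions))
```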

Combining these three types of data makes it possible to record all the information about anything. Currently, Wikidata has about 7,000 properties, almost 100 million elements and 1.4 billion expressions.

Lexicographical information as well

Although Wikidata originally contained only these (elements, properties and expressions), new data types were later added to store lexicographical information as well. Lexemes have identifiers that start with “L” and specify the language, the word and its category (for example, the Basque word “nine”, of the category “noun”, is L74178). A lexeme can take different forms; these are stored in a form data type, identified by appending to the lexeme's identifier a suffix that starts with “F”. Besides the form itself, a form can hold grammatical features and as many expressions as you want. Finally, lexemes can also have different senses, and there is a sense data type to store them.
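As a rough sketch of how lexemes and forms fit together, the Python below models a lexeme whose forms get identifiers such as L424242-F1. The classes, the feature names and the lexeme identifier L424242 are all made up for the example; only the identifier pattern (lexeme ID, a hyphen, then “F” and a number) follows Wikidata's convention.

```python
import re
from dataclasses import dataclass, field

# A form identifier is the lexeme identifier, a hyphen, then "F" and a number.
FORM_ID = re.compile(r"(L[1-9]\d*)-F([1-9]\d*)")


@dataclass
class Form:
    form_id: str
    text: str
    features: list = field(default_factory=list)  # grammatical features


@dataclass
class Lexeme:
    lexeme_id: str
    language: str
    category: str
    lemma: str
    forms: list = field(default_factory=list)

    def add_form(self, text, features=()):
        # Number the forms in the order they are added: F1, F2, ...
        form = Form(f"{self.lexeme_id}-F{len(self.forms) + 1}", text, list(features))
        self.forms.append(form)
        return form


def lexeme_of(form_id):
    """Return the lexeme identifier that a form identifier belongs to."""
    match = FORM_ID.fullmatch(form_id)
    if match is None:
        raise ValueError(f"not a form identifier: {form_id!r}")
    return match.group(1)


# Hypothetical lexeme: the identifier and feature names are invented.
etxe = Lexeme("L424242", language="eu", category="noun", lemma="etxe")
singular = etxe.add_form("etxea", ["absolutive", "singular"])
plural = etxe.add_form("etxeak", ["absolutive", "plural"])
print(singular.form_id, plural.form_id)
print(lexeme_of(plural.form_id))
```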

With this structure, lexicons of any language can be built. In addition, if the senses are linked to Wikidata's concepts (elements), relationships between languages can be established and, therefore, bilingual dictionaries can be built for any pair of languages.

Thousands of uses

And what is a database of this kind good for? For just about anything! It offers thousands of options and opportunities. Any user can download Wikidata and use it for whatever they want. Simple searches can be made in the web interface, but beyond ordinary searches, queries can also be written in the SPARQL language, which allows complex and interesting questions such as “the number of ministers who are children of a minister, per country”.
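A SPARQL query can also be sent from a program. The Python sketch below prepares a deliberately simple query (the type of Q937, using the P31 property mentioned earlier) for the public query service at https://query.wikidata.org/sparql. The function names and the User-Agent string are assumptions for the example, and `run_query` is defined but not called, since calling it needs network access.

```python
import json
import urllib.request
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"

# The wd: and wdt: prefixes are predefined by the query service.
QUERY = """
SELECT ?type WHERE {
  wd:Q937 wdt:P31 ?type .
}
"""


def build_url(query):
    """Build the GET address that asks the service for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def run_query(query, user_agent="example-script/0.1 (illustration only)"):
    """Send the query and return the list of result rows (needs network)."""
    request = urllib.request.Request(build_url(query),
                                     headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    # Standard SPARQL JSON layout: results -> bindings -> one dict per row.
    return data["results"]["bindings"]


url = build_url(QUERY)
print(url)
```

More elaborate questions, such as counts grouped by country, only change the text of the query; the way it is sent stays the same.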

Programs that take advantage of the information can be developed either through the API or from a download. For example, in Wikipedia itself, infoboxes (the tables of information on the right at the beginning of some articles) are no longer edited by hand: programs have been written for this purpose that can be invoked from a Wikipedia article with a single line. The program takes the information from Wikidata and fills in the table and, if the information is modified or updated in Wikidata, the change appears automatically in the article's infobox without anyone having to edit the article. This new infobox system was developed by the Catalan Wikimedia association and the Basque Cultural Association of Wikilaris (EWKE). The company CodeSyntax also uses Wikidata to generate the questions for a once-a-day quiz game.
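The infobox idea can be sketched in a few lines of Python. This is not the program used by Wikipedia; it only shows the principle of filling table rows from an element's data. The `SAMPLE` dictionary is a hand-trimmed imitation of the JSON that Wikidata serves for an element (for instance at https://www.wikidata.org/wiki/Special:EntityData/Q937.json), and the `infobox` and `render_value` functions are hypothetical.

```python
# Hand-trimmed imitation of an element's JSON; the real record is far larger.
SAMPLE = {
    "id": "Q937",
    "labels": {"en": {"language": "en", "value": "Albert Einstein"}},
    "claims": {
        "P31": [{"mainsnak": {
            "snaktype": "value", "property": "P31",
            "datavalue": {"type": "wikibase-entityid",
                          "value": {"entity-type": "item", "id": "Q5"}}}}],
        "P569": [{"mainsnak": {
            "snaktype": "value", "property": "P569",
            "datavalue": {"type": "time",
                          "value": {"time": "+1879-03-14T00:00:00Z"}}}}],
    },
}


def render_value(datavalue):
    """Turn one stored value into a short string for the table."""
    kind, value = datavalue["type"], datavalue["value"]
    if kind == "time":
        return value["time"].lstrip("+").split("T")[0]  # keep only the date
    if kind == "wikibase-entityid":
        return value["id"]  # another element, shown by its identifier
    return str(value)


def infobox(entity, lang, wanted):
    """Build table rows from an element: its name plus the wanted properties."""
    rows = {"name": entity["labels"][lang]["value"]}
    for prop, caption in wanted.items():
        for claim in entity.get("claims", {}).get(prop, []):
            snak = claim["mainsnak"]
            if snak.get("snaktype") == "value":
                rows[caption] = render_value(snak["datavalue"])
                break  # the first usable value is enough for this sketch
    return rows


table = infobox(SAMPLE, "en", {"P31": "type", "P569": "date of birth"})
for caption, value in table.items():
    print(f"{caption}: {value}")
```

Because the rows are computed from the data each time, a correction made in Wikidata shows up in the table without touching the article's text.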

As mentioned, there is a single Wikidata database, which can hold information in every language. So, just as it is essential for the Basque language that Wikipedia be as developed as possible in Basque, it is also very important that Basque names, descriptions and lexicographical information be present in Wikidata. At Elhuyar, on behalf of EWKE and in collaboration with them, we have carried out two projects. On the one hand, we added the definitions from Elhuyar's Encyclopedic Dictionary of Science and Technology to 6,500 scientific and technological elements. On the other hand, in 2019 we added the 10,000 most used nouns in the Elhuyar Student's Dictionary, 65 forms of each, and their senses and definitions. With this work, Basque became the sixth language by number of lexemes, the second by number of word forms and the first by number of expressions. After the additions made since then, we are now ninth in lexemes, with about 23,000, but still second in forms, with about 1,250,000, and first in expressions, with almost 3,000.

Wikidata is therefore a very interesting project, one that is already very useful and will become even more so as it grows.
