Semantic web, existing and necessary technologies

Leturia Azkarate, Igor

Computer scientist and researcher

Elhuyar Hizkuntza eta Teknologia

In the May article, we looked at the problems of the World Wide Web devised by Sir Tim Berners-Lee and of the HTML format that supports it, and we briefly explained the solution proposed by Berners-Lee himself: the semantic web. This article presents the technologies behind the semantic web and some examples of them, as well as the problems that stand in the way of making it a reality.
01/06/2009 | Leturia Azkarate, Igor | Computer scientist and researcher
(Photo: 12RF)

In the semantic web, objects, people... and the relationships between them are described by means of labels. Instead of describing the shape and structure of the page, the labels capture the meaning of the elements on it. This makes it possible to create a network parallel to the HTML web: a knowledge base that machines can understand, encoded in expressive semantic formats. Once machines understand the information, they could process it effectively, opening the way to thousands of applications.

Technologies: RDF, OWL...

However, in order to define all the concepts present on the web, schemas and semantic labeling formats are needed. The W3C has defined these formats in several standards, of which RDF and OWL are the most important and best known.

RDF (Resource Description Framework) is a format for describing resources, most commonly written in XML. It is based on three elements: resources, properties and property values. The resource is the thing being described, and it is identified by a URI (a web identifier or address, such as a URL). A property is a characteristic of the resource being described. The values are the specific values that those characteristics take (see the example below).
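The three elements can be illustrated with a minimal Python sketch. It is not real RDF syntax: the resource addresses, the property names and the values below are all invented for illustration, and a plain named tuple stands in for an RDF statement.

```python
from collections import namedtuple

# One statement: a resource, one of its properties, and that property's value.
Statement = namedtuple("Statement", ["resource", "prop", "value"])

# Hypothetical identifiers and property names, for illustration only.
ISSUE = "http://example.org/magazine/issue-1"
ARTICLE = "http://example.org/magazine/issue-1/article-1"

statements = [
    Statement(ISSUE, "type", "MagazineIssue"),
    Statement(ISSUE, "title", "Example issue"),
    Statement(ISSUE, "hasArticle", ARTICLE),
    Statement(ARTICLE, "title", "Example article"),
]


def describe(resource, statements):
    """Collect every property of one resource into a dictionary of value lists."""
    description = {}
    for s in statements:
        if s.resource == resource:
            description.setdefault(s.prop, []).append(s.value)
    return description


print(describe(ISSUE, statements))
```

Note that the value of one property (`hasArticle`) is itself the identifier of another resource, which is how relationships between things can be expressed with the same three elements.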

In this way we can describe whatever we want. But the labels used to describe each type of thing (people, music groups, books...) have to be agreed upon; otherwise, machines would still not understand them. This is where the OWL language (Web Ontology Language) comes in. OWL makes it possible to define how the objects or entities of a particular area of knowledge or life are to be described.
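Why the labels need to be agreed upon can be sketched in a few lines of Python. This is only an analogy and not OWL itself: the `VOCABULARY` dictionary, its type names and its property names are assumptions made up for the example.

```python
# Hypothetical shared vocabulary: for each type of thing, the agreed property names.
VOCABULARY = {
    "Person": {"name", "birthDate", "memberOf"},
    "MusicGroup": {"name", "genre", "member"},
    "Book": {"title", "author", "isbn"},
}


def unknown_properties(kind, description):
    """Return the property names in a description that the vocabulary does not define."""
    if kind not in VOCABULARY:
        raise ValueError(f"type not in the agreed vocabulary: {kind}")
    return sorted(set(description) - VOCABULARY[kind])


# One description that follows the vocabulary, and one that uses a private label.
agreed = {"title": "Example book", "author": "Example author"}
private = {"title": "Example book", "writtenBy": "Example author"}

print(unknown_properties("Book", agreed))
print(unknown_properties("Book", private))
```

A program that only knows the agreed vocabulary can use the first description directly, but it has no way of knowing that `writtenBy` in the second one means the same as `author`.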

(Figure: how an issue of the Elhuyar magazine and its articles could be described in RDF, shown in a simplified form and with label names other than those used in the real RDF format.)

A real example: the RSS format

A small example of what the semantic web can do has been with us for a long time: the RSS format (Really Simple Syndication), which blogs have used from the beginning and which many other news sites on the Internet use today. In fact, it began as a type of RDF (its original name was RDF Site Summary) specialized in describing news. Blogs were a great innovation, because they allowed users to create content on the Internet without technical knowledge of computing or HTML, and many new people started publishing texts online. But blogs would not have been so successful without the RSS format.

Indeed, if blogs had been published only in HTML, a reader interested in the topics of several blogs would not have found it easy to keep track of them. The reader would have to visit all of them periodically to see whether there was anything new. And, what is more, that effort would often be wasted because there was nothing new, or because the reader could not remember what had been read the last time... In the end, it would only be possible to follow a few blogs.

But blogs, in addition to the HTML version for people, also had an RSS version for machines. This version contained the latest entries or articles, each clearly separated by tags, with the title, author, date, summary, link, etc. of each one well structured so that machines could understand them. This made it possible to create RSS readers, with which everyone can follow the blogs they like. The reader periodically checks the RSS of our favorite blogs and shows the user only what is new since the last visit, which makes it possible to track dozens or hundreds of blogs. Search engines specializing in blogs were also created, as were services for collecting and filtering RSS from blogs, newspaper and magazine websites, social networks, etc. RSS was one of the real "culprits" of the Web 2.0 revolution.
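What an RSS reader does can be sketched with Python's standard library. The feed below is a made-up RSS 2.0 fragment embedded as a string (a real reader would download it from the blog), and `new_items` is a hypothetical helper written for this example, not part of any library.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Made-up feed for illustration; the titles, links and dates are not real.
FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Older entry</title>
      <link>http://example.org/older</link>
      <pubDate>Mon, 04 May 2009 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Newer entry</title>
      <link>http://example.org/newer</link>
      <pubDate>Mon, 01 Jun 2009 10:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, last_visit):
    """Return (title, link) for every feed item published after last_visit."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        # RSS 2.0 dates use the e-mail date format, so the e-mail parser reads them.
        published = parsedate_to_datetime(item.findtext("pubDate"))
        if published > last_visit:
            fresh.append((item.findtext("title"), item.findtext("link")))
    return fresh


# Assume the user last opened the reader in mid-May 2009.
last_visit = datetime(2009, 5, 15, tzinfo=timezone.utc)
print(new_items(FEED, last_visit))
```

Because the title, link and date of each entry sit in their own tags, the program needs no knowledge of the blog's page layout; the same few lines would work for any feed that follows the format.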

If simple semantic labeling for blogs and news has achieved this much, imagine what will happen when other concepts such as people, goods and events are labeled semantically...

Content of the semantic web

However, not everything is good news. The idea of the semantic web arose some years ago, and it is proving hard to make it a reality. It is not an easy task. On the one hand, ontologies must be defined and agreed upon for every concept that exists, and although some of this has already been done, it is a huge job.

On the other hand, and more importantly, content then has to be created in those formats, and that can be very laborious. We cannot expect the people who create web pages to tag them by hand in RDF. Web pages have long been created with tools, and it is those tools that should adapt and generate content in semantic formats, just as blog platforms publish RSS directly. In some cases this can be expected to happen fairly quickly, for example where the content is already quite structured in itself (event calendars, for instance) or where companies have an interest in it (for instance, product descriptions in online stores).

It will be harder to label semantically all the information that currently appears in texts written in natural language. When a text describes people, books, their characteristics, the relationships between them, etc., labeling all of this semantically is a tremendous task, even with the help of visual tools. And it cannot be done automatically, as it can in the case of calendars or store products...

Does the machine understand the text?

Thanks to the semantic web, we could search for the word "sting" and receive only results about the musician.
Eric Miller/W3C

Or perhaps it can. In several experiments, Natural Language Processing (NLP) techniques are being used to extract semantic labels from conventional texts automatically, sometimes successfully. Web tools could integrate this kind of NLP technique and, in the not so distant future, help content creators produce semantic labeling. However, if machines really become able to do this well, the semantic web would no longer be necessary: it would mean that machines are able to "understand" text, and that search engines and other Internet agents could process texts in HTML format directly and effectively.

We do not know which will arrive first: a labeled semantic web, or machines that understand the semantics or meaning of text. And, in the first case, we do not know how much content the semantic web will hold: whether the whole web will be labeled semantically, or just some things (the simplest ones and those of business interest), or something in between... In any case, one way or another, meaning is going to matter more and more on the web, and thanks to semantics we will have more and more services. Sir Tim Berners-Lee himself said in March of this year: "The web is not finished. The current web is just the tip of the iceberg. New, much more powerful technologies will arrive that will allow us to do things we would never have thought of. The best is yet to come." So be it!

Igor Leturia Azkarate. Computer scientist and researcher.
