Computer scientist and researcher
Elhuyar Hizkuntza eta Teknologia
One of the main themes of recent years is personalization, that is, showing each web user content tailored to their tastes, interests and habits. This is already applied in many places. For example, several search engines offer the option of displaying customized results based on our browsing history: searches we performed previously, which of the results we clicked on, which content we chose to share on social networks, and so on. Music streaming services suggest new songs we might like, based on what we or our friends have listened to. Online stores do the same, drawing on our previous purchases or on the purchases of other customers who bought the same items we did. Some media outlets have also started to show each user a different front page, using information about the articles we have read before.
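The "customers who bought this also bought that" idea mentioned above can be sketched very simply: count how often items appear together in purchase histories, then rank co-occurring items. This is a toy sketch with invented data, not how any real store implements it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories; names and items are invented for illustration.
purchases = {
    "ane":   {"book", "lamp", "mug"},
    "jon":   {"book", "mug"},
    "miren": {"lamp", "pen"},
}

# Count how often each pair of items was bought together.
pair_counts = Counter()
for items in purchases.values():
    for a, b in combinations(sorted(items), 2):
        pair_counts[(a, b)] += 1

def recommend(item):
    """Items most often bought together with `item`, best first."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [it for it, _ in scores.most_common()]

print(recommend("book"))  # ['mug', 'lamp'] — "mug" was bought with "book" twice
```

Real recommenders refine this with normalization and user similarity, but the core signal is the same co-occurrence count.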
Although personalization seems a good idea in principle, there are also those who question its suitability. In a highly recommended TED talk, Eli Pariser warns that each of us is in danger of settling comfortably into our own bubble, receiving only limited and partial information about everything in the world. In the past, newspapers and TV channels filtered information for us, and if we read only one kind of newspaper and watched only one kind of channel, we received only one-sided information. The internet opened our eyes, but now personalization filters can damage that diversity again.
On the other hand, in order to display personalized content, these giant companies collect information about us, which raises privacy concerns among many people. The opinion of Tim Berners-Lee, inventor of the web and founder and director of the W3C, given in his keynote address at WWW2012, is very interesting. In his view, using the information companies hold about us to sell it to others, or for other improper purposes, is wrong and should not be done; but he does not believe this practice is very common, because after all it is not the core business of most companies, and the market punishes those who do it in the long run. He does not share the concern about companies using information about us if it is limited to offering a better service. He gave the example of a clothing shop: a good salesperson remembers the size of the trousers we bought last time, so we do not have to try them on again (the maxim of good service has always been "know your customer").
However, he considers that those who prefer companies not to store or use their information at all are entitled to that, and that companies must respect it. To this end, the W3C proposes adding an optional "Do Not Track" header to the HTTP protocol: when a website receives a request containing it, it should not store any data about that customer. Almost all browsers have already implemented this option; it remains to be seen whether companies will respect it...
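The header itself is tiny: a client adds "DNT: 1" to its requests, and a cooperating server checks for it before storing anything. A minimal sketch using only the Python standard library (the URL is a placeholder, and the server-side helper is an invented illustration of the expected behavior):

```python
import urllib.request

# Client side: attach the optional Do Not Track preference to a request.
req = urllib.request.Request(
    "https://example.com/",
    headers={"DNT": "1"},  # "1" means: do not track me
)
# resp = urllib.request.urlopen(req)  # would perform the actual request

# Server side: honoring DNT amounts to checking the header before
# storing anything about the visitor. Hypothetical helper:
def should_store_user_data(headers):
    """Return False when the client asked not to be tracked."""
    return headers.get("DNT") != "1"

print(should_store_user_data({"DNT": "1"}))  # False
print(should_store_user_data({}))            # True
```

Note that nothing in the protocol forces the server to comply; the header only expresses a preference, which is exactly the doubt raised above.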
Defending privacy can also backfire. In the European Union, for example, a law prohibits storing cookies (the mechanism used to store user preferences, among other things) without the user's consent. Without cookies, it is not just that a website cannot be personalized; users cannot even log in! By this law, more than 90% of websites are operating outside it. So far, member states have not enforced the law, but Britain recently began allowing fines of up to half a million pounds, and some British websites have started to display an annoying request for permission to use cookies.
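The consent logic the law implies is straightforward in principle: only emit a Set-Cookie header once the user has explicitly agreed. A minimal sketch with Python's standard `http.cookies` module; the consent flag and the `lang` preference cookie are invented for illustration:

```python
from http.cookies import SimpleCookie

def preference_cookie_header(user_consented: bool):
    """Return a Set-Cookie header value, or None without consent."""
    if not user_consented:
        return None  # the law: no consent, no cookie
    cookie = SimpleCookie()
    cookie["lang"] = "eu"              # a stored user preference
    cookie["lang"]["max-age"] = 86400  # keep it for one day
    return cookie["lang"].OutputString()

print(preference_cookie_header(False))  # None
print(preference_cookie_header(True))   # contains "lang=eu" and "Max-Age=86400"
```

The annoyance mentioned above comes from having to collect that boolean from every visitor before doing anything useful.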
Another topic heard a great deal in recent years is the semantic web, and it was also present at this year's congress. We wrote about it in the May and June 2009 issues. It is a parallel web composed of structured, meaningful information rather than plain text, which, unlike text, machines can understand and process properly. There were many presentations of this kind: advanced intelligent services built on top of it, projects to extract structured content from text, and so on.
The Knowledge Graph product, recently presented by Google, is also based on the semantic web. If we search for a specific person, place or thing, in addition to the usual list of websites it shows a panel with structured, related information.
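The structured information behind such panels is typically modeled as subject-predicate-object triples rather than prose. A toy sketch of the idea with plain Python tuples; the facts shown come from this article, while the predicate names are invented for illustration:

```python
# A tiny knowledge base of subject-predicate-object triples.
triples = [
    ("Tim Berners-Lee", "inventor_of", "the web"),
    ("Tim Berners-Lee", "director_of", "W3C"),
    ("W3C",             "proposes",    "Do Not Track header"),
]

def facts_about(subject):
    """All (predicate, object) pairs recorded for a subject."""
    return [(p, o) for s, p, o in triples if s == subject]

print(facts_about("Tim Berners-Lee"))
# [('inventor_of', 'the web'), ('director_of', 'W3C')]
```

Real semantic-web data uses standardized formats (RDF, with URIs instead of bare strings), but the triple structure is the same, and it is what lets a machine assemble an information panel without reading any running text.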
It is a fashionable topic, and there is no doubt that many services of this kind will appear in the coming months and years. Although it has long been said that the year the semantic web will finally take off is always the next one, and that take-off never quite arrives, it is also true that more and more such services are gradually appearing.
A novelty at this year's congress was the strong presence of natural language processing and language technologies. These technologies, which the Elhuyar R&D group has been working on for a decade and which we have discussed frequently in this section (corpora, machine translation, technologies for better search engines, question-answering systems, dialogue agents...), have traditionally had little space at conferences devoted to the web and search engines. Such topics were discussed at conferences on language and language technologies, but until now the world of the web and search engines did not see the need for them. They treated language very simply: basic stemming, plain keyword search...
One of the main reasons for the current interest in these technologies is that the earlier simple methods have reached their limit, and deeper language analysis is now seen as necessary to improve results. In the case of search engines, for example, they have realized that to return better results they need cross-language and multilingual search, search using synonyms or related words, automatic analysis of people's opinions to build rankings from it, automatic summarization, question answering, and techniques as deep as these.
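The "very simple" treatment of language mentioned above, and the synonym search that improves on it, can both be sketched in a few lines. The suffix list and synonym table below are invented toy examples; real systems use full morphological analyzers and lexical databases:

```python
# Crude suffix stripping (a toy stemmer) plus synonym expansion
# at query time. Both tables are invented for illustration.
SUFFIXES = ("ing", "ed", "es", "s")
SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}

def stem(word):
    """Strip one common English suffix, crudely."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def expand(query_word):
    """Stems of the word and of its synonyms, for index matching."""
    words = {query_word} | SYNONYMS.get(query_word, set())
    return {stem(w) for w in words}

print(stem("searching"))  # 'search'
print(expand("car"))      # {'car', 'automobile'}
```

A query for "car" can now also match documents indexed under "automobile" — a small step beyond plain keyword search, and a hint of why deeper analysis pays off.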
Another reason, paradoxically, is the semantic web itself. In theory, with the semantic web a machine can understand the structured information it contains without having to understand language. But to solve certain problems (for example, answering questions posed by the user in natural language), disambiguation is needed: knowing which specific objects of the semantic web the question refers to, or which specific properties. And if we want to automatically extract structured content from text for the semantic web, language technologies are again necessary.
In addition to the three major topics mentioned, there was much talk about social media. And about the mobile web, HTML5, video, 3D... The web still has many possibilities and paths along which to evolve, and we can be sure that in the coming years this surprising evolution will continue.