Background

Language processing applications (search, mining, writing, translation, speech, etc.) depend on a basic text processing infrastructure. Such infrastructure is tedious and expensive to develop, since the effort multiplies with every language supported. As a result, language technology (LT) products are often limited to a few languages and work well only in English.

Even the availability of language resources and tools, as provided by META-SHARE, has not solved the problem. To draw a comparison with location-based services: only the world’s biggest IT companies could convert resources such as satellite pictures, aerial and terrestrial photos, maps, and cadastral data into global services such as Nokia, Google, or Bing Maps.

We have therefore proposed the creation of a European Language Cloud (ELC), a public infrastructure that provides the basic functionality required to process unstructured content: tokenization, lemmatization, part-of-speech tagging, named entity detection, and similar tasks. The idea is to call the ELC, as you would call Google Translate, for example, and get back a piece of text in any language split into sentences, phrases, words, and tokens, each annotated with metadata such as part of speech, stem form, normalized numbers, dates, and measures, or links to taxonomies.
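To make the idea concrete, here is a minimal sketch of what an annotated result could look like. The ELC interface is not specified in this text, so everything below is an assumption for illustration only: the names `Token`, `Sentence`, and `annotate` are hypothetical, and `annotate` is a toy local stand-in rather than a call to a real service. It only splits text into sentences and tokens with character offsets and normalizes numbers written with English-style thousands separators; part of speech and stem form are left empty.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Numbers (with . or , inside), then words, then single punctuation marks.
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|\w+|[^\w\s]")
# Assumption: "1,500" style thousands separators, as used in English.
THOUSANDS_RE = re.compile(r"\d{1,3}(?:,\d{3})+")


@dataclass
class Token:
    text: str                          # surface form as found in the input
    start: int                         # character offset in the input text
    end: int                           # end offset (exclusive)
    pos: Optional[str] = None          # part of speech (not filled by this toy)
    lemma: Optional[str] = None        # stem form (not filled by this toy)
    normalized: Optional[str] = None   # e.g. "1500" for "1,500"


@dataclass
class Sentence:
    start: int
    end: int
    tokens: List[Token] = field(default_factory=list)


def annotate(text: str) -> List[Sentence]:
    """Toy stand-in for a hypothetical ELC call: sentences, tokens, offsets."""
    sentences = []
    # Naive sentence split: a run of text up to and including ., ! or ?
    for s in re.finditer(r"[^.!?]+[.!?]*", text):
        chunk = s.group()
        if not chunk.strip():
            continue  # skip whitespace-only remainders
        lead = len(chunk) - len(chunk.lstrip())
        sent = Sentence(start=s.start() + lead,
                        end=s.start() + len(chunk.rstrip()))
        # Tokenize within the sentence span; offsets stay absolute.
        for t in TOKEN_RE.finditer(text, s.start(), s.end()):
            tok = Token(text=t.group(), start=t.start(), end=t.end())
            if THOUSANDS_RE.fullmatch(tok.text):
                tok.normalized = tok.text.replace(",", "")
            sent.tokens.append(tok)
        sentences.append(sent)
    return sentences


if __name__ == "__main__":
    for sentence in annotate("It costs 1,500 euros. Ship it!"):
        print([(t.text, t.normalized) for t in sentence.tokens])
```

Keeping character offsets on every token and sentence lets a client map annotations back onto the original text, whatever additional metadata a real service would attach.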


The European Language Cloud in One Sentence

For software companies that work on text, not numbers, the European Language Cloud is a web-based set of APIs that provides the basic functionality to build products for the 24 languages of the Digital Single Market, as well as those of Europe’s main trading partners, at the same base quality and under the same favorable terms.


Your Input is Needed

The CEC has launched MLi, a Coordination and Support Action, to design a Multilingual Data & Services Infrastructure. Such an infrastructure might have an impact on your business, customers, and competitors. Your input is therefore essential for understanding its requirements, scope, design, and business impact. Please take the time (approx. 10-15 minutes) to give us your input on this initiative, which is crucial for our domain. Thank you so much!

