The ICT R&D activities at LUCENSE encompass a wide range of topics.

Text Analytics

As digital communication becomes prevalent and document workflows increasingly move to the digital domain, the automatic management of documents becomes ever more important.
LUCENSE makes use of a wide array of Text Analytics technologies, ranging from text structure analysis (syntactic and morphological analysis) to in-depth, content-based document processing.



Syntax Analysis

To process textual content automatically, its constituents must first be identified: words, sentences, parts of speech, morphological variations, phrase structure, and so on. LUCENSE uses a full “NLP pipeline” for this purpose, applying it as required as an enabling technology for higher-level analysis components.
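As an illustration of the first stages of such a pipeline, the sketch below splits a text into sentences and then into word and punctuation tokens. It is a deliberately naive, regex-based example using only the Python standard library; the function names are hypothetical and it is not LUCENSE's actual pipeline, which also covers part-of-speech, morphology and phrase structure.

```python
import re

def split_sentences(text):
    """Naively split after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Return word tokens and single punctuation marks, in order."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def analyze(text):
    """Run the two stages in sequence: sentences first, then tokens."""
    return [tokenize(sentence) for sentence in split_sentences(text)]

result = analyze("Documents arrive daily. Are they sorted?")
# result == [['Documents', 'arrive', 'daily', '.'], ['Are', 'they', 'sorted', '?']]
```

Higher-level components would consume this output; a real pipeline would also have to handle abbreviations, numbers and language-specific rules that this sketch ignores.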




Topic-level categorization

The most basic, yet powerful, form of content-based processing is categorization: assigning documents to a set of predefined categories, usually, but not exclusively, according to their topic. Categorization enables effective organization and retrieval of documents, helps pinpoint relevant content, and is also useful for automating document flow processes. The categorization software developed by LUCENSE is entirely based on machine learning and is designed to lower adoption and customization costs, so that it can be deployed in a wide range of application contexts without heavy and costly adaptation phases.
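To make the idea of machine-learned categorization concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with Laplace smoothing. The choice of Naive Bayes, the class name and the tiny training set are assumptions made for illustration only; the text does not say which learning algorithm LUCENSE uses.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesCategorizer:
    """Toy multinomial Naive Bayes over whitespace-separated, lowercased words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of documents
        self.vocabulary = set()

    def train(self, text, category):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, float("-inf")
        for category, n_docs in self.doc_counts.items():
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[category].values())
            for word in words:
                count = self.word_counts[category][word] + 1  # Laplace smoothing
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_category, best_score = category, score
        return best_category

categorizer = NaiveBayesCategorizer()
categorizer.train("invoice payment due amount", "finance")
categorizer.train("payment received bank transfer", "finance")
categorizer.train("server outage network restart", "it")
categorizer.train("network password reset server", "it")

print(categorizer.classify("bank payment overdue"))  # finance
print(categorizer.classify("server reset"))          # it
```

Because the model is learned from labelled examples, moving to a new set of categories only requires new training documents rather than new code, which is the kind of low-cost adaptation the paragraph above describes.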



Discovery of emerging themes

As the various digital communication channels deliver a continuous flow of information, the ability to automatically identify the most actively discussed topics is very appealing, because it provides a timely snapshot of the hottest themes. LUCENSE employs modern topic discovery, clustering and summarisation technologies to distill the hottest topics out of large numbers of contributions (posts, articles, tweets…), assigning self-explanatory labels for easy interpretation. The result is useful both as a synthesis of the topics discussed and as an effective means of exploring the “discussion pool”.




Which documents in a collection are related to a given one? While the task is simply stated, it cannot easily be achieved with standard keyword-based search. LUCENSE therefore tackles it by means of “latent semantic indexing”, which yields an effective system for retrieval “by affinity” rather than by keyword match. Correlation-based retrieval is very useful for surfacing content that is of interest to the user but that the user might be unaware of.
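For reference, the textbook formulation of latent semantic indexing can be summarised as follows. This is the standard truncated-SVD construction, not a description of LUCENSE's specific implementation; the symbols (A, k, the hatted vectors) are introduced here for illustration.

```latex
% A: term-by-document matrix (m terms, n documents); a_j is its j-th column
% and e_j the j-th standard basis vector. Keep only the k largest singular values:
A \approx A_k = U_k \Sigma_k V_k^{\top}, \qquad k \ll \min(m, n)

% Representation of document j in the k-dimensional latent space:
\hat{d}_j = U_k^{\top} a_j = \Sigma_k V_k^{\top} e_j

% Affinity between documents i and j (cosine similarity in the latent space):
\operatorname{sim}(i, j) =
  \frac{\hat{d}_i^{\top} \hat{d}_j}{\lVert \hat{d}_i \rVert \, \lVert \hat{d}_j \rVert}
```

Two documents can score a high affinity even when they share no keywords, provided their terms tend to co-occur elsewhere in the collection; this is what distinguishes retrieval “by affinity” from keyword match.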




When several users interact with a system, much information about their behavior can be collected and exploited by means of collective intelligence techniques. The result is a model of user behavior derived from observing the community. Such a model can be used to identify, among the items the system offers, those likely to interest a user, based on how similar she is to other users. The model can also be applied the other way around, to suggest prospective buyers of a specific product within the user community. Apart from the obvious application to personalized ads, this system can also be used to proactively suggest new content or items, thereby providing an information service to the user.
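Both directions described above can be sketched with user-based collaborative filtering. The example below is a minimal illustration under stated assumptions: the ratings, user names, item names and function names are all invented, and cosine similarity over sparse rating dictionaries is just one common choice of similarity measure.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse {item: rating} dictionaries."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score_items(ratings, user):
    """Score items the user has not rated, weighted by similarity to other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], other_ratings)
        if similarity <= 0:
            continue  # dissimilar users contribute nothing
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return scores

def recommend(ratings, user, top_n=3):
    """Items of likely interest for one user, best first."""
    scores = score_items(ratings, user)
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

def prospective_users(ratings, item, top_n=3):
    """The reverse direction: users most likely to be interested in one item."""
    scores = {}
    for user in ratings:
        if item not in ratings[user]:
            score = score_items(ratings, user).get(item, 0.0)
            if score > 0:
                scores[user] = score
    return sorted(scores, key=lambda user: (-scores[user], user))[:top_n]

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"book": 5, "lamp": 3},
    "bob": {"book": 4, "lamp": 3, "desk": 5},
    "cat": {"pen": 5, "desk": 1},
    "dan": {"pen": 4},
}

print(recommend(ratings, "ann"))           # ['desk']
print(prospective_users(ratings, "desk"))  # ['ann', 'dan']
```

Here "ann" resembles "bob", who rated the desk highly, so the desk is suggested to her; conversely, asking who might want the desk ranks "ann" ahead of "dan", whose only similar user rated it poorly.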



Graph analysis

Objects and their relations define a graph structure that is flexible enough to describe very different realms, from computer networks to roads, from concept maps to social networks. This abstraction allows the application of many mathematical models that help identify nodes, edges or subgraphs with characteristics of interest, and these characteristics often translate naturally into the actual application scenario. For example, it is possible to determine the connection strength between two nodes, the shortest path between them, the centrality of a node in the graph, how crucial a node is to the connection between two parts of the graph, and so on. In the application domain, this might mean identifying prominent users in a social network, the importance of a road junction, or the most critical element in a data network.
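Three of the measures mentioned above can be sketched on a small undirected graph stored as an adjacency list. This is a minimal illustration with hypothetical function names: it uses breadth-first search for shortest paths, degree centrality as the simplest centrality measure, and a brute-force test for cut vertices (nodes whose removal disconnects the graph). It assumes an unweighted, connected graph with at least two nodes.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n_others = len(graph) - 1
    return {node: len(neighbors) / n_others for node, neighbors in graph.items()}

def reachable(graph, start, removed):
    """Nodes reachable from start when the node 'removed' is deleted."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def cut_vertices(graph):
    """Nodes whose removal splits the (initially connected) graph into parts."""
    result = []
    for node in graph:
        others = [n for n in graph if n != node]
        if others and len(reachable(graph, others[0], node)) < len(others):
            result.append(node)
    return result

# A triangle a-b-c with a tail c-d-e attached.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

print(shortest_path(graph, "a", "e"))  # ['a', 'c', 'd', 'e']
print(degree_centrality(graph)["c"])   # 0.75
print(cut_vertices(graph))             # ['c', 'd']
```

In a data network, the nodes returned by `cut_vertices` would be single points of failure; in a social network, the highest-centrality node would be a candidate prominent user. Large graphs would call for more efficient algorithms and richer measures such as betweenness centrality.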



Sentiment Analysis

Sentiment Analysis is the ability to capture the mood expressed in a piece of text: positive, negative, or neutral. Annotating texts with this information provides an extra perspective on collections of documents: for example, by employing this technique together with the analysis of emerging themes, the general sentiment towards hot topics can be monitored at a glance. The applications are many, from monitoring a brand, a city or a restaurant, to gauging the general sentiment towards events or people.




All technologies developed by LUCENSE are designed with particular care for multilingual application: machine learning was used whenever possible, so that the same patterns of usage can be reapplied to other languages (provided that models for the target language can be built). An interesting application of Text Analytics is monitoring discussions that take place in languages we do not necessarily speak fluently. What is the perception of my brand in China? What is the general response of the Russian population to the Crimean situation? These are immediate examples of application to diverse languages.