The challenges facing modern library services today are twofold. On the one hand, users must be offered quick access to relevant information. On the other hand, given the exponential growth of document collections, libraries must continually optimize their document management procedures. This subproject provides both technical solutions for finding information quickly and solutions for consolidating management procedures. Its high degree of innovation stems from the use of the most advanced language processing technologies and from the new types of services it provides. As a result, a single new structure in Romania will bring together technological services for structuring information and public services designed for various user communities.
Technological services are aimed at library staff and offer suitable tools for document classification, cataloguing and conservation. Our project will focus on applying such tools to the digital document repository of the four Central University Libraries, created under project no. 2.
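As a rough illustration of what an automatic cataloguing tool might do, the sketch below maps a record onto the 15 elements of the Dublin Core Metadata Element Set (version 1.1) and fills `dc:subject` with a simple keyword-overlap classifier. The category names and cue words in `CATEGORIES`, as well as the functions `classify` and `to_dublin_core`, are hypothetical; a production system would more likely use a trained classification model than hand-written keyword lists.

```python
import re
from typing import Dict, Optional, Set

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Hypothetical predefined categories with illustrative cue words.
CATEGORIES: Dict[str, Set[str]] = {
    "Computer Science": {"algorithm", "ontology", "semantic", "software"},
    "History": {"manuscript", "medieval", "archive", "chronicle"},
}


def classify(text: str, categories: Dict[str, Set[str]] = CATEGORIES) -> Optional[str]:
    """Return the category sharing the most cue words with the text, or None."""
    tokens = set(re.findall(r"\w+", text.lower()))
    scores = {name: len(tokens & cues) for name, cues in categories.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)  # ties resolve to the first category listed
    return best if scores[best] > 0 else None


def to_dublin_core(record: Dict[str, str]) -> Dict[str, str]:
    """Prefix fields with 'dc:' and add an automatic dc:subject if none is given."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    dc = {f"dc:{name}": value for name, value in record.items()}
    subject = classify(record.get("title", "") + " " + record.get("description", ""))
    if subject is not None:
        dc.setdefault("dc:subject", subject)  # never overwrite a cataloguer's subject
    return dc
```

For example, `to_dublin_core({"title": "Medieval manuscript archive", "language": "ro"})` yields the two prefixed fields plus `"dc:subject": "History"`. Keeping any subject already supplied by a librarian reflects the assumption that automatic labels should only fill gaps.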
Public services are aimed at users and cover two aspects: search assistance (full-text search) and recommendations of new reading material based on users' research and information needs.
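Full-text search of this kind is usually built on an inverted index that maps each word to the documents containing it. The following is a minimal sketch with boolean AND semantics and no ranking, stemming or diacritic folding; `build_index` and `search` are hypothetical names, not part of any tool named in the project.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every token to the set of document ids that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # intersect posting lists
    return result
```

For instance, `search(build_index(docs), "library semantic")` returns the ids of the documents that contain both words. Intersecting posting lists keeps each query proportional to the size of the matching lists rather than to the whole collection.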
The objectives of the Smart Search project are:
- Document systematization - various algorithms will be introduced for automatically categorizing documents and clustering them into semantically similar groups; furthermore, a classification model based on the Digital Library Reference Model and the Dublin Core Metadata Initiative (DCMI) will be added in order to automatically label resources according to predefined categories;
- Creation of a semantic repository for the domain ontologies developed within the project;
- Search for relevant documents and exploration of intertextual links between different document collections, based on semantic knowledge-representation models (e.g. latent semantic analysis, latent Dirichlet allocation, word2vec);
- Resource recommendations using ontology-based algorithms or social recommendations. To this end, ontologies will be created for the 17 domains and used for the automatic semantic annotation of texts. The ontologies will be built either manually (using Protégé) or semi-automatically (using Text2Onto or semantic models trained without supervision on large text collections).
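The clustering step of the document systematization objective can be approximated, for illustration, with TF-IDF vectors and cosine similarity. This sketch substitutes a threshold-based single-link grouping for whatever clustering algorithms the project will actually use, and plain TF-IDF for the latent semantic analysis, LDA or word2vec representations listed above; the default `threshold` of 0.3 is an arbitrary assumption. The same cosine score could also serve as a crude signal for intertextual links between collections.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Sparse TF-IDF vectors with smoothed idf = ln((1 + n) / (1 + df)) + 1."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))  # document frequency: count each term once per doc
    return {
        doc_id: {
            term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in Counter(tokens).items()
        }
        for doc_id, tokens in tokenized.items()
    }


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(docs: Dict[str, str], threshold: float = 0.3) -> List[Set[str]]:
    """Single-link grouping: documents with similarity >= threshold share a group."""
    vectors = tfidf_vectors(docs)
    ids = sorted(vectors)
    parent = {doc_id: doc_id for doc_id in ids}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for doc_id in ids:
        groups.setdefault(find(doc_id), set()).add(doc_id)
    return sorted(groups.values(), key=min)
```

Single-link grouping needs no preset number of clusters, which suits collections whose topic count is unknown, but it can chain loosely related documents together; k-means or hierarchical clustering over LSA or word2vec vectors would be the more usual choices at scale.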
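The ontology-based recommendation idea can be sketched as follows, assuming a toy ontology in which each concept has a single is-a parent and a few lowercase surface labels. Texts are annotated by whole-word label matching, each annotation is expanded with its ancestor concepts, and candidate documents are ranked by the Jaccard overlap between their concepts and those of a user profile. The ontology, its labels and every function name here are hypothetical; a real system would load an OWL ontology (for instance one built in Protégé) instead of a Python dictionary.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy ontology: concept -> (parent concept or None, lowercase labels).
ONTOLOGY: Dict[str, Tuple[Optional[str], Set[str]]] = {
    "KnowledgeRepresentation": (None, {"knowledge representation"}),
    "Ontology": ("KnowledgeRepresentation", {"ontology", "ontologies"}),
    "TopicModel": ("KnowledgeRepresentation", {"topic model", "latent dirichlet allocation"}),
    "Conservation": (None, {"conservation", "preservation"}),
}


def annotate(text: str, ontology=ONTOLOGY) -> Set[str]:
    """Return the concepts whose labels occur as whole words in the text."""
    normalized = " " + " ".join(re.findall(r"\w+", text.lower())) + " "
    return {
        concept
        for concept, (_, labels) in ontology.items()
        if any(f" {label} " in normalized for label in labels)
    }


def with_ancestors(concepts: Set[str], ontology=ONTOLOGY) -> Set[str]:
    """Add every is-a ancestor of the given concepts."""
    expanded: Set[str] = set()
    for concept in concepts:
        current: Optional[str] = concept
        while current is not None and current not in expanded:
            expanded.add(current)
            current = ontology[current][0]
    return expanded


def recommend(profile: str, docs: Dict[str, str], k: int = 5) -> List[str]:
    """Rank documents by Jaccard overlap of expanded concepts with the profile."""
    wanted = with_ancestors(annotate(profile))
    scored = []
    for doc_id, text in docs.items():
        concepts = with_ancestors(annotate(text))
        union = wanted | concepts
        score = len(wanted & concepts) / len(union) if union else 0.0
        if score > 0:
            scored.append((-score, doc_id))  # negate so higher scores sort first
    return [doc_id for _, doc_id in sorted(scored)[:k]]
```

With a profile mentioning "ontology", a document about latent Dirichlet allocation still receives a score of 1/3 because both share the ancestor `KnowledgeRepresentation`; this is the benefit of expanding annotations along the ontology hierarchy rather than matching labels alone.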