The challenges facing modern library services are twofold. On the one hand, from the user's perspective, libraries must ensure quick access to relevant information. On the other hand, given the exponential growth of their document holdings, libraries must continuously optimize their document management procedures. This subproject provides both technical solutions for fast information retrieval and solutions for consolidating management procedures. Its innovation lies both in the use of state-of-the-art language processing technologies and in the novelty of the services offered: a single structure, currently non-existent in Romania, that integrates technological services for structuring information with public services for different user communities.
Technological services are aimed at library staff and provide suitable tools for document classification, cataloguing and conservation. Our project will focus on applying these tools to the four Central University Libraries' digital document repository, created under project no. 2.
Public services are aimed at users and cover search assistance (full-text search) and recommendations of new readings based on users' research and information needs.
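Full-text search is typically backed by an inverted index that maps each word to the documents containing it. The following is a minimal in-memory illustration, assuming simple lowercase word tokenization and Boolean AND queries; the function names `build_index` and `search` and the sample documents are hypothetical, and a production catalogue would more likely rely on a dedicated search engine with stemming and relevance ranking.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (Unicode-aware, so diacritics survive)."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: map every token to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the sorted ids of documents containing every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # keep only documents that also contain this term
    return sorted(result)


# Hypothetical sample collection.
docs = {
    "d1": "Digital preservation of rare manuscripts",
    "d2": "Manuscripts cataloguing with Dublin Core metadata",
    "d3": "Ontologies for semantic annotation",
}
index = build_index(docs)
print(search(index, "manuscripts"))              # ['d1', 'd2']
print(search(index, "Dublin Core manuscripts"))  # ['d2']
```

Intersecting posting sets keeps each query cheap; ranking the matches (for example by TF-IDF) would be a natural next step.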
The objectives of the Smart Search project are:
- Document systematization - various algorithms will be introduced for the automatic categorization and clustering of documents into semantically similar groups; in addition, a classification model based on the Digital Library Reference Model and the Dublin Core Metadata Initiative (DCMI) will be added to automatically label resources according to predefined categories;
- Creation of a semantic repository for the domain ontologies developed within the project;
- Search for relevant documents and exploration of intertextual links between different document collections, based on semantic knowledge-representation models (e.g. latent semantic analysis, latent Dirichlet allocation, word2vec);
- Resource recommendations using ontology-based algorithms or social recommendation. To this end, ontologies will be created for the 17 domains and used for the automatic semantic annotation of texts. The ontologies will be created either manually (using Protege) or semi-automatically (using Text2Onto or semantic models trained without supervision on large text collections).
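The objectives name automatic clustering of documents into semantically similar groups without fixing an algorithm. One common baseline, sketched here under that assumption, represents each document as a TF-IDF vector and groups the vectors with spherical k-means (k-means driven by cosine similarity). The helper names, the deterministic initialisation from the first `k` documents and the sample titles are illustrative only; in the project, the TF-IDF vectors could be swapped for latent semantic analysis, LDA or word2vec representations, and the same `cosine` function could score candidate intertextual links between documents from different collections.

```python
import math
import re
from collections import Counter


def normalise(vec: dict[str, float]) -> dict[str, float]:
    """Scale a sparse vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build sparse, unit-length TF-IDF vectors (term -> weight), one per document."""
    tokenized = [re.findall(r"\w+", d.lower()) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so terms present in every document keep a small weight.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        vectors.append(normalise(vec))
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two unit-length sparse vectors, i.e. their dot product."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def spherical_kmeans(vectors: list[dict[str, float]], k: int, iterations: int = 10) -> list[int]:
    """Assign each vector to one of k clusters using cosine similarity to unit centroids."""
    centroids = [dict(v) for v in vectors[:k]]  # deterministic init: first k documents
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the normalised sum of its members.
        for c in range(k):
            summed: Counter = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    summed.update(v)
            if summed:  # keep the previous centroid if the cluster is empty
                centroids[c] = normalise(dict(summed))
    return labels


# Hypothetical document titles.
docs = [
    "medieval manuscripts preservation",
    "ontology semantic annotation",
    "preservation of medieval manuscripts and maps",
    "semantic ontology for annotation of texts",
]
vectors = tfidf_vectors(docs)
labels = spherical_kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

Spherical k-means is used because cosine similarity, rather than Euclidean distance, is the usual closeness measure for text vectors; the number of clusters `k` would still have to be chosen per collection.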
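For labelling resources according to predefined categories, the text does not specify the classifier. This sketch assumes a multinomial Naive Bayes model with add-one smoothing, trained on a few hypothetical labelled titles, and writes the predicted category into the `dc:subject` element of a minimal Dublin Core-style record. The category names, training data, the `NaiveBayes` class and the `to_dublin_core` helper are invented for illustration; they do not reflect the Digital Library Reference Model or any actual DCMI tooling.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)  # documents per category (for the prior)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        """Return the category with the highest log-posterior for the text."""
        total_docs = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def to_dublin_core(title: str, subject: str) -> dict[str, str]:
    """Hypothetical helper: a minimal Dublin Core-style record with two elements filled."""
    return {"dc:title": title, "dc:subject": subject}


# Hypothetical labelled training titles and categories.
train_texts = [
    "history of medieval Romanian principalities",
    "archives and chronicles of medieval history",
    "neural networks for natural language processing",
    "machine learning algorithms for text classification",
]
train_labels = ["History", "History", "Computer Science", "Computer Science"]

model = NaiveBayes().fit(train_texts, train_labels)
title = "Chronicles of medieval principalities"
record = to_dublin_core(title, model.predict(title))
print(record)  # {'dc:title': 'Chronicles of medieval principalities', 'dc:subject': 'History'}
```

Naive Bayes is chosen here only because it is small and transparent; any trained text classifier could fill the same `dc:subject` slot.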