Analysis and extraction of information from scanned documents

Despite the numerous policy documents issued by government authorities and by professional associations in the field, there is still no coherent strategy for digitizing the document collections held by libraries. The eLibrary Builder subproject aims to convert a document collection of approximately 4 million pages to electronic form while preserving its original appearance. Indexing and intelligent search capabilities will also be added to the digitized collection. In this way, valuable original documents will no longer be damaged by handling and will be immediately available to an unlimited number of users. The project has the following priority objectives:

  • Creation of a single digital repository shared by the four Central University Libraries, which will become a genuine National Digital Educational Library;
  • Development of a document quality optimization system, aimed especially at documents with particular orthographic features;
  • Construction of efficient algorithms for recognizing page characteristics;
  • Establishment of good-practice norms for the digitization field, bringing together the technical protocols on document formats and selection criteria.

The innovation of the project lies in the four objectives listed above and in the scanning technology it uses. A fully automated scanning system with a capacity of over 2000 pages/hour will be purchased by the consortium leader; it will meet the processing requirements for both old and more recent documents in different formats. The system will be equipped with up-to-date software applications for recognizing text that is otherwise difficult to search.
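To give a sense of scale, the stated capacity can be set against the collection size of approximately 4 million pages. The sketch below is only a rough estimate: it treats 2000 pages/hour as a sustained rate, and the 8-hour working day is an assumption made for illustration, not a figure from the project. The function names are likewise hypothetical.

```python
import math


def scanning_hours(total_pages: int, pages_per_hour: int) -> float:
    """Return the machine hours needed to scan total_pages at a constant rate."""
    if pages_per_hour <= 0:
        raise ValueError("pages_per_hour must be positive")
    return total_pages / pages_per_hour


def working_days(hours: float, hours_per_day: float) -> int:
    """Return the number of working days needed, rounded up to whole days."""
    if hours_per_day <= 0:
        raise ValueError("hours_per_day must be positive")
    return math.ceil(hours / hours_per_day)


TOTAL_PAGES = 4_000_000   # approximate collection size stated in the text
PAGES_PER_HOUR = 2_000    # stated minimum scanner capacity
HOURS_PER_DAY = 8         # assumption: one scanner, one 8-hour shift per day

hours = scanning_hours(TOTAL_PAGES, PAGES_PER_HOUR)
days = working_days(hours, HOURS_PER_DAY)
print(f"{hours:.0f} scanner hours, about {days} working days")
```

Because the stated capacity is a lower bound ("over 2000 pages/hour"), the result is an upper bound on pure scanning time; document preparation, quality control, and text recognition are not included.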

The documents that will initially populate the digital repository will be manuscripts, archival documents, multimedia document texts, serials, and books, selected from the following subject categories:

  1. General information: Information Science; Bibliology; Library Science; Standardization; Civilization and Culture; reference works (encyclopedias, dictionaries, biographies, bibliographies, biobibliographies, bibliographic research);
  2. Public Administration: Social Assistance, Military Sciences;
  3. Theology;
  4. Art;
  5. Legal Sciences;
  6. Economic Sciences;
  7. History: Archaeology, Archival Records;
  8. Philosophy; Psychology;
  9. Politics;
  10. Literature;
  11. Linguistics, Philology;
  12. Sociology: Demographics, Statistics;
  13. Ethnography: Folklore;
  14. Pedagogy;
  15. Natural Sciences: Geology, Geography, Biology;
  16. Exact Sciences: Mathematics, Physics, Chemistry;
  17. Applied Sciences: Technical Sciences, Engineering, Agronomy, Medicine, Pharmacology.