
OCR : Unleash the hidden information

Jääskeläinen, Anssi; Uosukainen, Liisa (2018)

dc.contributor.author: Jääskeläinen, Anssi
dc.contributor.author: Uosukainen, Liisa
dc.date.accessioned: 2018-06-26T10:45:06Z
dc.date.available: 2018-06-26T10:45:06Z
dc.date.issued: 2018
dc.identifier.uri: URN:NBN:fi:amk-2018062614235
dc.identifier.uri: http://www.theseus.fi/handle/10024/151769
dc.description.abstract (en): Most of us, even though it is not very rational, commonly take pictures of text. At a conference it is common to see participants taking pictures of presentation slides. Similarly, national archives scan documents without performing OCR (Optical Character Recognition). The resulting image, regardless of its resolution, quality or file format, is not searchable by its content unless someone types in a large amount of metadata, for example according to Dublin Core. While this is acceptable practice in the archival world, an average person is willing to fill in at most five fields. Therefore there is a clear need for an easy and, most importantly, free way to make pictures, scanned documents etc. fully searchable. The Digitalia research center has been working to create an effective workflow that automatically analyzes the document content, generates OCR information and extracts the most relevant keywords for the content. Furthermore, the workflow produces an archival-grade PDF/A file if requested by the user. This workflow has been fully integrated into our Citizen Archive solution to handle everything automatically in the background. With this sophisticated solution, the usability, findability and reusability of the preserved content will be greatly increased. In short, this means a better archival user experience and less manual work for both the archivist and the end user.
dc.language.iso: eng
dc.publisher: Society for Imaging Science and Technology
dc.relation.ispartof: Archiving 2018 Final Program and Proceedings
dc.rights: IS&T: The Society for Imaging Science and Technology
dc.subject: text recognition
dc.subject: information retrieval
dc.subject: search terms
dc.subject: metadata
dc.subject: citizens
dc.subject: electronic archiving
dc.subject: personal history
dc.title (en): OCR : Unleash the hidden information
dc.type: publication
dc.identifier.dscollection: 10024/145097
dc.organization: Kaakkois-Suomen ammattikorkeakoulu
dc.identifier.doi: 10.2352/issn.2168-3204.2018.1.0.19
dc.contributor.organization: Kaakkois-Suomen ammattikorkeakoulu
dc.type.okm: A4
dc.format.pagerange: 83-87
dc.relation.ispartofjournal: Archiving
dc.relation.issn: 2168-3204
dc.relation.issn: 2161-8798
dc.okm.selfarchived: fi=Rinnakkaistallennettu|sv=parallellpublicerad|en=self-archived version|

