Conference Publication Details
Mandatory Fields
Mattia Egloff; Alessandro Adamou; and Davide Picca
Third Workshop on Humanities in the Semantic Web (WHiSe 2020)
Enabling Ontology-Based Data Access to Project Gutenberg
Optional Fields
Digital Libraries; Ontology-based data access; Project Gutenberg; Linked Data
Alessandro Adamou; Enrico Daga; and Albert Meroño-Peñuela
Free and open digital libraries have been gaining steady momentum as key resources to support practice in Digital Humanities. Project Gutenberg is one of the oldest repositories of this kind. The DHTK Python library can retrieve content from Gutenberg by querying the RDF metadata that Gutenberg itself publishes regularly. However, this process is hampered because these metadata form a dataset that lacks a documented ontology, is largely unlinked, and is significantly bloated with redundant RDF triples. In this paper we detail the processes that were put in place to improve ontology-based data access to Gutenberg via DHTK, including (a) bottom-up extraction of the Gutenberg Ontology; (b) cleanup, linking and shrinking of the Gutenberg metadata set; (c) refactoring and alignment of said ontology with common vocabularies; and (d) incorporation of these enhancements into the DHTK access routines. Early results show that we were able to reduce the size of the Gutenberg metadata set by nearly 29% whilst linking it with Library of Congress datasets, DBpedia and others.
Grant Details
Publication Themes
Humanities in Context, Informatics, Physical and Computational Sciences