Natural language solution for building the knowledge base of publication collections

Hornyák Zsuzsanna Éva
Budapesti Műszaki és Gazdaságtudományi Egyetem

Mészáros Tamás
Budapesti Műszaki és Gazdaságtudományi Egyetem

Keyword-based searches over digitally available publications give shallow results that are not always relevant to the searcher's intent. To make intelligent, content-based search possible, documents should include detailed semantic information that machines can process. Although semantic web technologies offer various solutions to this problem, most digital libraries have not yet adopted them.

Our current research focuses on developing software that helps ordinary users create semantic representations of publications without having to learn the underlying technologies. With the software, users write natural language statements that summarize a document's content, producing a so-called semantic abstract in the process.

At first glance, a semantic abstract looks the same as the regular abstracts we are used to; however, its content is restricted by a controlled grammar. Based on this grammar, the natural language abstract can be parsed into a formal logical representation of the content. The grammars include rules about the allowed sentence structures and lexical forms, and they can be extended dynamically at runtime. They can also be linked with ontologies describing the taxonomy of a specific field of study, which helps the parsing process identify unique entities and link them to other available data. To help users write controlled natural language sentences easily, our software provides a predictive text editor.
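
As an illustration of how a controlled grammar can turn a restricted sentence into a formal statement, the following is a minimal sketch in Python. It is not the grammar or parser used in our software: the single sentence pattern, the `VERBS` lexicon, and the function names `add_verb` and `parse_sentence` are hypothetical and chosen only for the example. It shows one allowed sentence structure, a small lexicon, and runtime extension of that lexicon.

```python
import re

# Hypothetical toy lexicon: allowed verb forms mapped to predicate names.
VERBS = {"proposes": "proposes", "uses": "uses", "evaluates": "evaluates"}


def add_verb(surface, predicate):
    """Extend the lexicon at runtime with a new allowed verb form."""
    VERBS[surface] = predicate


def parse_sentence(sentence):
    """Parse 'The <subject> <verb> <object>.' into a (subject, predicate, object)
    triple. Return None if the sentence is outside the controlled grammar."""
    # Build the verb alternatives from the current lexicon (longest first).
    verbs = "|".join(re.escape(v) for v in sorted(VERBS, key=len, reverse=True))
    pattern = rf"^The (?P<subj>[a-z]+) (?P<verb>{verbs}) (?P<obj>[A-Za-z -]+)\.$"
    match = re.match(pattern, sentence)
    if match is None:
        return None
    return (match["subj"], VERBS[match["verb"]], match["obj"])


if __name__ == "__main__":
    print(parse_sentence("The paper proposes a predictive editor."))
    print(parse_sentence("The paper extends Zotero."))  # verb not yet allowed
    add_verb("extends", "extends")
    print(parse_sentence("The paper extends Zotero."))
```

A realistic controlled grammar would cover many sentence structures and would resolve the subject and object against ontology entities rather than keeping them as plain strings.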

Our goal is to provide a tool with which researchers and editors can easily summarize the most important information contained in publications, thereby creating a semantic knowledge base that machines can understand. This knowledge base enables more complex searches among the publications, based on logical statements.
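
To make the idea of statement-based search concrete, here is a small Python sketch under simplifying assumptions. The knowledge base is modelled as a mapping from document identifiers to sets of (subject, predicate, object) triples; the data, the `search` function, and the use of `None` as a wildcard are invented for illustration and do not describe our actual implementation.

```python
# Made-up example knowledge base: document id -> set of triples.
KB = {
    "doc1": {("paper", "proposes", "controlled grammar"),
             ("paper", "uses", "ontology")},
    "doc2": {("paper", "uses", "ontology")},
}


def search(kb, *patterns):
    """Return the sorted ids of documents that satisfy every pattern.

    A pattern is a (subject, predicate, object) tuple in which None
    matches any value."""
    def matches(triple, pattern):
        return all(p is None or p == t for p, t in zip(pattern, triple))

    return sorted(
        doc
        for doc, triples in kb.items()
        if all(any(matches(t, p) for t in triples) for p in patterns)
    )


if __name__ == "__main__":
    # Documents that use an ontology.
    print(search(KB, (None, "uses", "ontology")))
    # Documents that use an ontology AND propose something.
    print(search(KB, (None, "uses", "ontology"), (None, "proposes", None)))
```

Unlike a keyword search, the second query combines two logical conditions and returns only the documents whose statements satisfy both.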

We integrated our solution into Zotero, a popular publication organizer. In our presentation, we will demonstrate the process of creating semantic abstracts through examples.