Linking, Searching, and Visualizing Entities in Wikipedia

Research output: Chapter in Book/Report/Conference proceeding; Paper in conference proceeding; peer-review

Abstract

In this paper, we describe a new system to extract, index, search, and visualize entities in Wikipedia. To carry out the entity extraction, we designed a high-performance, multilingual entity linker and we used a document model to store the resulting linguistic annotations. The entity linker, HEDWIG, extracts the mentions from text using a string-matching engine and links them to entities with a combination of statistical rules and PageRank. The document model, Docforia (Klang and Nugues, 2017), consists of layers, where each layer is a sequence of ranges describing a specific annotation, here the entities. We evaluated HEDWIG with the TAC 2016 data and protocol (Ji and Nothman, 2016) and reached CEAFm scores of 70.0 on English, 64.4 on Chinese, and 66.5 on Spanish. We applied the entity linker to the whole collection of English and Swedish Wikipedia articles, used Lucene to index the layers, and built a search module to interactively retrieve all the concordances of an entity in Wikipedia. The user can select and visualize the concordances in the articles or paragraphs. Contrary to classic text indexing, this system does not use strings to identify the entities but unique identifiers from Wikidata.
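
The layered document model and identifier-based retrieval described in the abstract can be illustrated with a minimal Python sketch. This is not the Docforia or HEDWIG API, and a plain in-memory dictionary stands in for the Lucene index; the names `Range`, `Document`, `build_entity_index`, and `concordances`, the layer name "entity", and the two example sentences are all hypothetical and chosen only for illustration. The sketch assumes annotations are stored as character ranges labeled with a Wikidata identifier.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Range:
    """A stand-off annotation: a character span plus a label."""
    start: int
    end: int
    label: str  # here, a Wikidata identifier such as "Q84"


@dataclass
class Document:
    """Text plus named layers; each layer is a sequence of ranges."""
    doc_id: str
    text: str
    layers: dict = field(default_factory=lambda: defaultdict(list))

    def annotate(self, layer, start, end, label):
        self.layers[layer].append(Range(start, end, label))


def build_entity_index(documents, layer="entity"):
    """Map each entity identifier to the (doc_id, range) pairs where it occurs."""
    index = defaultdict(list)
    for doc in documents:
        for rng in doc.layers[layer]:
            index[rng.label].append((doc.doc_id, rng))
    return index


def concordances(index, documents, entity_id, width=20):
    """Return (doc_id, left context, mention, right context) for an entity."""
    by_id = {doc.doc_id: doc for doc in documents}
    rows = []
    for doc_id, rng in index.get(entity_id, []):
        text = by_id[doc_id].text
        left = text[max(0, rng.start - width):rng.start]
        right = text[rng.end:rng.end + width]
        rows.append((doc_id, left, text[rng.start:rng.end], right))
    return rows


# Illustrative data: two different surface strings linked to the same identifier.
doc1 = Document("d1", "London is the capital of the UK.")
doc1.annotate("entity", 0, 6, "Q84")
doc2 = Document("d2", "He moved to the British capital in 1999.")
doc2.annotate("entity", 16, 31, "Q84")

docs = [doc1, doc2]
index = build_entity_index(docs)
hits = concordances(index, docs, "Q84")
```

Looking up "Q84" returns both mentions even though their surface strings differ, which is the contrast with string-based indexing that the abstract draws.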
Original language: English
Title of host publication: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Pages: 3426-3432
ISBN (Electronic): 979-10-95546-00-9
Publication status: Published - 2018 May
Event: Language Resources and Evaluation Conference (LREC) - Miyazaki, Japan
Duration: 2018 May 7 to 2018 May 12

Conference

Conference: Language Resources and Evaluation Conference (LREC)
Country/Territory: Japan
City: Miyazaki
Period: 2018/05/07 to 2018/05/12

Subject classification (UKÄ)

  • Computer Science
