Linking, Searching, and Visualizing Entities for the Swedish Wikipedia

Anton Södergren, Marcus Klang, Pierre Nugues

Research output: Contribution to conference, conference paper (not in proceedings/not published by a publisher), peer-reviewed

Abstract

In this paper, we describe a new system to extract, index, search, and visualize entities in Wikipedia. To carry out the extraction, we designed a high-performance entity linker and used a document model to store the resulting linguistic annotations. The entity linker, HERD, extracts the mentions from text using a string matching engine and links them to entities with a combination of rules, PageRank, and feature vectors based on the Wikipedia categories. The document model, Docforia, consists of layers, where each layer is a sequence of ranges describing a specific annotation, here the entities. We evaluated HERD with the ERD’14 protocol (Carmel et al., 2014) and reached a competitive F1-score of 0.746 on the English development set. We applied HERD to the whole collection of Swedish Wikipedia articles, used Lucene to index the layers, and built a search module to interactively retrieve articles and metadata given a title, a phrase, or a property. The user can then select an entity and visualize its concordances in articles or paragraphs. A demonstration of the entity search and visualization is available for Swedish at this address: http://vilde.cs.lth.se:9001/sv-herd/.
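The layered document model described in the abstract can be illustrated with a minimal sketch. This is not Docforia's actual API: the `Range` and `Document` classes, the `annotate` and `concordance` methods, the layer name `"entities"`, and the `entity:...` labels are all hypothetical names chosen for illustration. The sketch only assumes what the abstract states, namely that a document holds named layers and that each layer is a sequence of ranges over the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Range:
    """One annotation: a character span plus a label (hypothetical structure)."""
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # e.g. an identifier of the linked entity


@dataclass
class Document:
    """Text plus named annotation layers; each layer is a sorted list of ranges."""
    text: str
    layers: dict[str, list[Range]] = field(default_factory=dict)

    def annotate(self, layer: str, start: int, end: int, label: str) -> None:
        if not (0 <= start < end <= len(self.text)):
            raise ValueError("range lies outside the text")
        ranges = self.layers.setdefault(layer, [])
        ranges.append(Range(start, end, label))
        ranges.sort(key=lambda r: (r.start, r.end))

    def concordance(self, layer: str, label: str, width: int = 10) -> list[str]:
        """Return each labeled mention with `width` characters of context per side."""
        snippets = []
        for r in self.layers.get(layer, []):
            if r.label == label:
                lo = max(0, r.start - width)
                hi = min(len(self.text), r.end + width)
                snippets.append(self.text[lo:hi])
        return snippets


# Usage: mark two mentions by hand (a real linker would produce these ranges).
doc = Document("Lund is a city in Sweden.")
for mention, label in [("Lund", "entity:Lund"), ("Sweden", "entity:Sweden")]:
    start = doc.text.index(mention)
    doc.annotate("entities", start, start + len(mention), label)

print(doc.concordance("entities", "entity:Sweden", width=3))
```

Keeping annotations as offsets into an unchanged text, rather than as inline markup, lets further layers be added independently and makes a concordance view a simple slice around each range.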
Original language: English
Status: Published - 2016
Event: Sixth Swedish Language Technology Conference (SLTC 2016) - Umeå University, Umeå, Sweden
Duration: 17 Nov 2016 to 18 Nov 2016

Conference

Conference: Sixth Swedish Language Technology Conference (SLTC 2016)
Country/Territory: Sweden
City: Umeå
Period: 2016/11/17 to 2016/11/18

Subject classification (UKÄ)

  • Computer Science
