Identifying the Authors’ National Variety of English in Social Media Texts

Vasiliki Simaki, Panagiotis Simakis, Carita Paradis, Andreas Kerren

Research output: Chapter in Book/Report/Conference proceeding › Paper in conference proceeding › peer-review

Abstract

In this paper, we present a study on identifying authors’ national variety of English in texts from social media. In data from Facebook and Twitter, information about each author’s social profile is annotated, and the national English variety (US, UK, AUS, CAN, NNS) that each author uses is attributed. We tested four feature types: formal linguistic features, POS features, lexicon-based features related to the different varieties, and data-based features from each English variety. We used various machine learning algorithms for the classification experiments, and we implemented a feature selection process. When the 31 highest-ranked features were used, the classification accuracy reached up to 77.32%. The experimental results are evaluated, and the efficacy of the ranked features is discussed.
Original language: English
Title of host publication: Recent Advances in Natural Language Processing
Subtitle of host publication: Proceedings
Editors: Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova
Place of Publication: Varna
Publisher: Association for Computational Linguistics
Pages: 671-678
ISBN (Electronic): 978-954-452-049-6
ISBN (Print): 978-954-452-048-9
Publication status: Published - 2017
Event: The 11th Biennial Conference on Recent Advances In Natural Language Processing (RANLP '17), 2-8 September 2017, Varna, Bulgaria
Duration: 2017 Sept 2 to 2017 Sept 8
http://lml.bas.bg/ranlp2017/start.php

Conference

Conference: The 11th Biennial Conference on Recent Advances In Natural Language Processing (RANLP '17), 2-8 September 2017, Varna, Bulgaria
Abbreviated title: RANLP '17
Country/Territory: Bulgaria
City: Varna
Period: 2017/09/02 to 2017/09/08

Subject classification (UKÄ)

  • General Language Studies and Linguistics
