Identifying the Authors’ National Variety of English in Social Media Texts

Research output: Chapter in Book/Report/Conference proceeding > Paper in conference proceeding

Standard

Identifying the Authors’ National Variety of English in Social Media Texts. / Simaki, Vasiliki; Simakis, Panagiotis; Paradis, Carita; Kerren, Andreas.

Recent Advances in Natural Language Processing: Proceedings. ed. / Galia Angelova; Kalina Bontcheva; Ruslan Mitkov; Ivelina Nikolova; Irina Temnikova. Varna : Association for Computational Linguistics, 2017. p. 671-678.

Harvard

Simaki, V, Simakis, P, Paradis, C & Kerren, A 2017, Identifying the Authors’ National Variety of English in Social Media Texts. in G Angelova, K Bontcheva, R Mitkov, I Nikolova & I Temnikova (eds), Recent Advances in Natural Language Processing: Proceedings. Association for Computational Linguistics, Varna, pp. 671-678, The 11th Biennial Conference on Recent Advances In Natural Language Processing (RANLP '17), 2-8 September 2017, Varna, Bulgaria, 2017/09/02. https://doi.org/10.26615/978-954-452-049-6_086

APA

Simaki, V., Simakis, P., Paradis, C., & Kerren, A. (2017). Identifying the Authors’ National Variety of English in Social Media Texts. In G. Angelova, K. Bontcheva, R. Mitkov, I. Nikolova, & I. Temnikova (Eds.), Recent Advances in Natural Language Processing: Proceedings (pp. 671-678). Association for Computational Linguistics. https://doi.org/10.26615/978-954-452-049-6_086

CBE

Simaki V, Simakis P, Paradis C, Kerren A. 2017. Identifying the Authors’ National Variety of English in Social Media Texts. Angelova G, Bontcheva K, Mitkov R, Nikolova I, Temnikova I, editors. In Recent Advances in Natural Language Processing: Proceedings. Varna: Association for Computational Linguistics. pp. 671-678. https://doi.org/10.26615/978-954-452-049-6_086

MLA

Simaki, Vasiliki et al. "Identifying the Authors’ National Variety of English in Social Media Texts". Angelova, Galia; Bontcheva, Kalina; Mitkov, Ruslan; Nikolova, Ivelina; Temnikova, Irina (editors). Recent Advances in Natural Language Processing: Proceedings. Varna: Association for Computational Linguistics. 2017, 671-678. https://doi.org/10.26615/978-954-452-049-6_086

Vancouver

Simaki V, Simakis P, Paradis C, Kerren A. Identifying the Authors’ National Variety of English in Social Media Texts. In Angelova G, Bontcheva K, Mitkov R, Nikolova I, Temnikova I, editors, Recent Advances in Natural Language Processing: Proceedings. Varna: Association for Computational Linguistics. 2017. p. 671-678. https://doi.org/10.26615/978-954-452-049-6_086

Author

Simaki, Vasiliki ; Simakis, Panagiotis ; Paradis, Carita ; Kerren, Andreas. / Identifying the Authors’ National Variety of English in Social Media Texts. Recent Advances in Natural Language Processing: Proceedings. editor / Galia Angelova ; Kalina Bontcheva ; Ruslan Mitkov ; Ivelina Nikolova ; Irina Temnikova. Varna : Association for Computational Linguistics, 2017. pp. 671-678

RIS

TY - GEN

T1 - Identifying the Authors’ National Variety of English in Social Media Texts

AU - Simaki, Vasiliki

AU - Simakis, Panagiotis

AU - Paradis, Carita

AU - Kerren, Andreas

PY - 2017

Y1 - 2017

N2 - In this paper, we present a study for the identification of authors’ national variety of English in texts from social media. In data from Facebook and Twitter, information about the author’s social profile is annotated, and the national English variety (US, UK, AUS, CAN, NNS) that each author uses is attributed. We tested four feature types: formal linguistic features, POS features, lexicon-based features related to the different varieties, and data-based features from each English variety. We used various machine learning algorithms for the classification experiments, and we implemented a feature selection process. The classification accuracy achieved, when the 31 highest ranked features were used, was up to 77.32%. The experimental results are evaluated, and the efficacy of the ranked features discussed.

AB - In this paper, we present a study for the identification of authors’ national variety of English in texts from social media. In data from Facebook and Twitter, information about the author’s social profile is annotated, and the national English variety (US, UK, AUS, CAN, NNS) that each author uses is attributed. We tested four feature types: formal linguistic features, POS features, lexicon-based features related to the different varieties, and data-based features from each English variety. We used various machine learning algorithms for the classification experiments, and we implemented a feature selection process. The classification accuracy achieved, when the 31 highest ranked features were used, was up to 77.32%. The experimental results are evaluated, and the efficacy of the ranked features discussed.

U2 - 10.26615/978-954-452-049-6_086

DO - 10.26615/978-954-452-049-6_086

M3 - Paper in conference proceeding

SN - 978-954-452-048-9

SP - 671

EP - 678

BT - Recent Advances in Natural Language Processing

A2 - Angelova, Galia

A2 - Bontcheva, Kalina

A2 - Mitkov, Ruslan

A2 - Nikolova, Ivelina

A2 - Temnikova, Irina

PB - Association for Computational Linguistics

CY - Varna

T2 - The 11th Biennial Conference on Recent Advances In Natural Language Processing (RANLP '17), 2-8 September 2017, Varna, Bulgaria

Y2 - 2 September 2017 through 8 September 2017

ER -