Abstract
In this article, we address the problem of identifying the age of Twitter users from their online text. We used a set of text mining, sociolinguistics-based, and content-related text features, and we evaluated a number of well-known and widely used machine learning classification algorithms to examine their suitability for this task. The experimental results showed that the Random Forest algorithm offered the best performance, achieving an accuracy of 61%. We ranked the classification features by informativeness using the ReliefF algorithm, and we analyzed the results in terms of the sociolinguistic principles of age-related linguistic variation.
Original language | English |
---|---|
Title of host publication | Computational Linguistics and Intelligent Text Processing |
Subtitle of host publication | 17th International Conference, CICLing 2016, Konya, Turkey, April 3–9, 2016, Revised Selected Papers, Part II |
Editors | Alexander Gelbukh |
Place of Publication | Cham |
Publisher | Springer |
Pages | 385-395 |
ISBN (Electronic) | 978-3-319-75487-1 |
ISBN (Print) | 978-3-319-75486-4 |
Publication status | Published - 2018 |
Externally published | Yes |
Event | 17th International Conference on Intelligent Text Processing and Computational Linguistics: CICLing 2016 - Konya, Turkey. Duration: 2016 Apr 3 → 2016 Apr 9. http://www.cicling.org/2016/ |
Publication series
Name | Lecture Notes in Computer Science (LNCS) |
---|---|
Volume | 9624 |
Conference
Conference | 17th International Conference on Intelligent Text Processing and Computational Linguistics |
---|---|
Country/Territory | Turkey |
City | Konya |
Period | 2016/04/03 → 2016/04/09 |
Internet address | http://www.cicling.org/2016/ |
Subject classification (UKÄ)
- General Language Studies and Linguistics
Free keywords
- Text mining
- Age identification
- Text classification
- Computational Sociolinguistics
- Sociolinguistics