Age Identification of Twitter Users: Classification Methods and Sociolinguistic Analysis

Vasiliki Simaki, Iosif Mporas, Vasileios Megalooikonomou

Research output: Chapter in Book/Report/Conference proceeding > Paper in conference proceeding > peer-review

Abstract

In this article, we address the problem of identifying the age of Twitter users from their online text. We used a set of text mining, sociolinguistic and content-related text features, and we evaluated a number of well-known and widely used machine learning classification algorithms to examine their suitability for this task. The experimental results showed that the Random Forest algorithm outperformed the other evaluated algorithms, achieving an accuracy of 61%. We ranked the classification features by their informativeness using the ReliefF algorithm, and we analyzed the results in terms of the sociolinguistic principles of age-related linguistic variation.
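ReliefF scores each feature by how much its values differ between an instance and its nearest neighbours from other classes (misses), compared with its nearest neighbours from the same class (hits). The abstract does not give the authors' settings, so what follows is only a generic sketch of Kononenko's ReliefF weight update under stated assumptions: numeric features normalised by their range, Manhattan distance, k nearest hits and misses per class, and a pass over every instance instead of random sampling so the result is deterministic. The function names `relieff` and `rank_features` and the two-feature toy data are hypothetical and not taken from the paper.

```python
from collections import Counter


def relieff(X, y, k=2):
    """Return one ReliefF weight per feature (higher = more informative).

    Sketch assumptions (not the paper's configuration): numeric features,
    diff() normalised by each feature's range, Manhattan distance, and every
    instance used once instead of m randomly sampled ones.
    """
    n, d = len(X), len(X[0])
    # Range of each feature; a constant feature gets 1.0 to avoid dividing by 0.
    span = [(max(r[a] for r in X) - min(r[a] for r in X)) or 1.0 for a in range(d)]

    def diff(a, r1, r2):
        # Normalised difference of feature a between two instances, in [0, 1].
        return abs(r1[a] - r2[a]) / span[a]

    def dist(r1, r2):
        # Manhattan distance over the normalised feature differences.
        return sum(diff(a, r1, r2) for a in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(n):
        r, cls = X[i], y[i]
        # Other instances grouped by class label, nearest first.
        by_class = {}
        for j in sorted((j for j in range(n) if j != i), key=lambda j: dist(r, X[j])):
            by_class.setdefault(y[j], []).append(X[j])
        # Nearest hits (same class) lower the weight of features that differ.
        for hit in by_class.get(cls, [])[:k]:
            for a in range(d):
                w[a] -= diff(a, r, hit) / (n * k)
        # Nearest misses (each other class) raise it, weighted by class prior.
        for c, rows in by_class.items():
            if c == cls:
                continue
            scale = prior[c] / (1.0 - prior[cls])
            for miss in rows[:k]:
                for a in range(d):
                    w[a] += scale * diff(a, r, miss) / (n * k)
    return w


def rank_features(weights):
    """Feature indices sorted from most to least informative."""
    return sorted(range(len(weights)), key=lambda a: weights[a], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 separates the groups, feature 1 is noise.
    X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.9, 0.4], [1.0, 0.8], [0.8, 0.2]]
    y = ["young", "young", "young", "old", "old", "old"]
    weights = relieff(X, y, k=2)
    print(weights, rank_features(weights))
```

On this toy data, feature 0 receives a clearly positive weight (about 0.65) and the noisy feature 1 a negative one, so `rank_features` returns `[0, 1]`. In a real setting the toy columns would be replaced by the text mining, sociolinguistic and content-related features, and an established implementation would normally be preferred over this illustration.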
Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing
Subtitle of host publication: 17th International Conference, CICLing 2016, Konya, Turkey, April 3–9, 2016, Revised Selected Papers, Part II
Editors: Alexander Gelbukh
Place of publication: Cham
Publisher: Springer
Pages: 385-395
ISBN (Electronic): 978-3-319-75487-1
ISBN (Print): 978-3-319-75486-4
Publication status: Published - 2018
Externally published: Yes
Event: 17th International Conference on Intelligent Text Processing and Computational Linguistics: CICLing 2016 - Konya, Turkey
Duration: 2016 Apr 3 - 2016 Apr 9
http://www.cicling.org/2016/

Publication series

Name: Lecture Notes in Computer Science (LNCS)
Volume: 9624

Conference

Conference: 17th International Conference on Intelligent Text Processing and Computational Linguistics
Country/Territory: Turkey
City: Konya
Period: 2016/04/03 - 2016/04/09

Subject classification (UKÄ)

  • General Language Studies and Linguistics

Free keywords

  • Text mining
  • Age identification
  • Text classification
  • Computational Sociolinguistics
  • Sociolinguistics
