A System for Popular Thai Slang Extraction from Social Media Content with N-Gram Based Tokenization (KST 2016)

Published date
2016-02
Resource type
Publisher
ISBN
ISSN
DOI
Call no.
Other identifier(s)
Edition
Copyrighted date
Language
eng
File type
application/pdf
Extent
5 pages
Other title(s)
Advisor
Other Contributor(s)
Assumption University. Martin de Tours School of Management and Economics
Citation
Proceedings of the 8th International Conference on Knowledge and Smart Technology (KST 2016) – IEEE Xplore, pp. 130-135
Degree name
Degree level
Degree discipline
Degree department
Degree grantor
Abstract
With the increased penetration of smart devices and internet connectivity, many Thais engage more readily in social media, online forums, and chat groups. As consumption of social media content grows, attention shifts away from traditional media, such as broadcast and print, in which formal language is used regularly. Social media posts reflect this trend: posts made by younger generations usually involve slang and informal language that is not typically found in formal dictionaries. Because the Thai population likes to follow trends, many Thai social media users follow the latest popular trends in slang and word usage. As slang changes and evolves over time, it is useful to have an online mining tool that can capture emerging and popular slang. This paper proposes an approach that extracts popular Thai slang by comparing social media posts, using tokenization and a dictionary-based approach to extract unknown words, and then expanding them with an n-gram approach to determine which slang words are currently trending and popular.
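The pipeline outlined in the abstract (tokenize, keep what the dictionary does not cover, expand with n-grams, compare across posts) can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy longest-match segmenter is a stand-in for the paper's tokenizer, the function names (`unknown_spans`, `char_ngrams`, `trending_candidates`) are hypothetical, and the dictionary and posts are romanized toy data written without spaces to mimic unsegmented Thai text.

```python
from collections import Counter


def unknown_spans(text, dictionary):
    """Segment text by greedy longest match against the dictionary and
    return the runs of characters that no dictionary word covers."""
    max_len = max(map(len, dictionary))
    spans, current, i = [], "", 0
    while i < len(text):
        # Try the longest dictionary word that starts at position i.
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in dictionary:
                if current:  # a known word closes the unknown run
                    spans.append(current)
                    current = ""
                i += size
                break
        else:
            # No dictionary word starts here: extend the unknown run.
            current += text[i]
            i += 1
    if current:
        spans.append(current)
    return spans


def char_ngrams(span, n_min, n_max):
    """Yield every character n-gram of span with n_min <= n <= n_max."""
    for n in range(n_min, min(n_max, len(span)) + 1):
        for start in range(len(span) - n + 1):
            yield span[start:start + n]


def trending_candidates(posts, dictionary, n_min=2, n_max=6, min_posts=2):
    """Count in how many posts each n-gram of an unknown span occurs and
    keep those seen in at least min_posts posts."""
    counts = Counter()
    for post in posts:
        seen = set()  # count each n-gram once per post
        for span in unknown_spans(post, dictionary):
            seen.update(char_ngrams(span, n_min, n_max))
        counts.update(seen)
    return [(gram, c) for gram, c in counts.most_common() if c >= min_posts]


# Toy data (assumed for illustration only).
toy_dictionary = {"today", "is", "that", "so", "movie"}
toy_posts = ["todayissolit", "thatmovieislit", "solit"]
candidates = trending_candidates(toy_posts, toy_dictionary)
```

Here the unknown run "lit" surfaces in all three toy posts, so it and its shorter n-grams are reported as candidates. A fuller system would still need to filter overlapping substrings and track frequencies over time to separate emerging slang from established terms.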
Table of contents
Description
Spatial Coverage
Subject(s)
Rights
Access rights
Rights holder(s)
Location