African Evaluation Database

3365

Source

Scopus

Electronic ID

2-s2.0-84942088631

Authors

Omar M., On B.-W., Lee I., Choi G.S.

Title

LDA topics: Representation and evaluation

Publication Year

2015

Source Title

Journal of Information Science

Volume

Issue

Citations

None

DOI

10.1177/0165551515587839

URL

https://www.scopus.com/inward/record.uri?eid=2-s2.0-84942088631&partnerID=40&md5=4be79d6876a28abf5641ee561d0173d8

Affiliations

Department of Information and Communication Engineering, Yeungnam University, 214-1, Dae-dong, Gyeongsan, Gyeongsangbuk, South Korea; Kunsan National University, South Korea; Troy University, United States

Authors with Affiliations

Omar, M., Department of Information and Communication Engineering, Yeungnam University, 214-1, Dae-dong, Gyeongsan, Gyeongsangbuk, South Korea; On, B.-W., Kunsan National University, South Korea; Lee, I., Troy University, United States; Choi, G.S., Department of Information and Communication Engineering, Yeungnam University, 214-1, Dae-dong, Gyeongsan, Gyeongsangbuk, South Korea

Abstract

In recent years many automated topic coherence formulas (using the top-m words of a topic inferred by latent Dirichlet allocation) based on word similarities have been proposed and evaluated against human ratings. We treat a wordy topic as an object and quantitatively describe it via normalized mean values of pair-wise word similarities. Two types of word similarities, thesaurus and local corpus-based, are used as the descriptive features of a topic. We perform topic classification using represented topics as input and bi-level human ratings about topic coherence as class labels. Classification results (precision, recall and accuracy) based on two datasets and three supervised classification algorithms suggest that the novel topic representation is consistent with human ratings. Corpus-based word similarities are positively correlated with human ratings whereas thesaurus-based similarities have negative relations. The proposed representation of topics opens a window for us to investigate the utilization of topics with different perspectives. © Chartered Institute of Library and Information Professionals.
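
Illustrative note (not part of the published abstract): the idea of describing a topic by the normalized mean of its pairwise word similarities can be sketched in Python as below. The abstract does not specify the similarity measures or the normalization, so everything here is an assumption made for illustration: the Jaccard co-occurrence similarity stands in for a "local corpus-based" measure, min-max scaling stands in for the normalization, and all function names and the toy documents are hypothetical.

```python
from itertools import combinations


def mean_pairwise_similarity(words, sim):
    """Average sim(a, b) over all unordered pairs of a topic's top words."""
    pairs = list(combinations(words, 2))
    if not pairs:
        return 0.0
    return sum(sim(a, b) for a, b in pairs) / len(pairs)


def jaccard_cooccurrence(docs):
    """Toy corpus-based similarity (an assumption, not the paper's measure):
    Jaccard overlap of the sets of documents in which each word occurs."""
    index = {}
    for doc_id, doc in enumerate(docs):
        for word in set(doc):
            index.setdefault(word, set()).add(doc_id)

    def sim(a, b):
        docs_a, docs_b = index.get(a, set()), index.get(b, set())
        union = docs_a | docs_b
        return len(docs_a & docs_b) / len(union) if union else 0.0

    return sim


def min_max_normalize(values):
    """Scale values to [0, 1]; assumed here as one possible normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy corpus and two hypothetical topics (top-3 words each).
docs = [
    ["topic", "model", "word"],
    ["topic", "model", "corpus"],
    ["cat", "dog"],
    ["cat", "model"],
]
topics = [["topic", "model", "word"], ["cat", "dog", "corpus"]]

sim = jaccard_cooccurrence(docs)
raw = [mean_pairwise_similarity(t, sim) for t in topics]
features = min_max_normalize(raw)  # one descriptive feature per topic
```

In a setup like the one the abstract describes, a feature of this kind would be computed once per similarity type (thesaurus-based and corpus-based) and the resulting vector passed to a supervised classifier together with the human coherence labels.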

Author Keywords

Latent Dirichlet allocation (LDA); supervised classification; topic coherence; topic evaluation; topic model; topic representation

Index Keywords

Statistics; Thesauri; Latent Dirichlet allocations; Supervised classification; topic evaluation; Topic Modeling; topic representation; Classification (of information)

Funding Details

None