International Journal of New Technology and Research


Automatic Extraction of Topics from Documents: Five Probabilistic Topic Model Tests

(Volume 2, Issue 11, November 2016) Open Access

Sandra Jhean-Larose, Nicolas Leveau, Guy Denhiere, Ba-Linh Nguyen


In this paper, we test the capability of the Topic model to extract topics from documents (Griffiths & Steyvers, 2003, 2004; Griffiths, Steyvers & Tenenbaum, 2007). After presenting the mathematical aspects of the model and demonstrating its behavior on a small corpus, we attempt to falsify the model by manipulating (i) the size of the sub-corpora and the similarity between them, (ii) the relative weight of the sub-corpora, and (iii) the scope and nature of contexts added to a fixed corpus, to test the model's permeability to them. The model passed all five tests, demonstrating first that the extracted topics were relevant and congruent to the content of the corpus, and second that their probabilities appropriately reflected the relative weight of the sub-corpora.



Pages: 62-75
