Micro-blog hot topic detection method based on improved K-means
Author:
Affiliation:

1. Digital Service Center, Guangzhou Radio and Television University, Guangzhou, Guangdong 510000, China; 2. School of Information Science and Technology, Jinan University, Guangzhou, Guangdong 510000, China

    Abstract:

    Micro-blog text data is high-dimensional and exhibits pronounced synonymy and polysemy. The traditional topic detection method, which combines the Vector Space Model (VSM) with K-means, suffers from low accuracy, high computational cost, and difficulty in determining the cluster centers. A VSM method optimized by the Relevance Vector Machine (RVM) is proposed to vectorize the text. First, the dimension of the VSM feature vector is reduced automatically by using the adaptive feature selection ability of RVM. Then Principal Component Analysis (PCA) is applied to determine the cluster centers for the K-means clustering algorithm, which is used to obtain the clustering results. Finally, a heat index is computed for each topic from the numbers of forwards and comments on its micro-blogs, and the topic with the largest heat index is taken as the current hot topic. The results show that, compared with two traditional methods, the proposed method improves accuracy by 7.3% and 1.1% and real-time performance by 45% and 53%, respectively.
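
    The clustering and ranking steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the inputs are already-reduced feature vectors (the RVM-based feature selection is not reproduced), the PCA seeding rule (sort posts by their projection on the first principal component, split them into k equal groups, and use the group means as centers) is one common heuristic chosen for illustration, and the heat index is taken as a plain sum of forwards and comments. All function names are hypothetical.

```python
import math

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def first_principal_component(X, iters=200):
    """Approximate the first principal component by power iteration."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[x - m for x, m in zip(row, mean)] for row in X]  # centered data
    # Non-uniform start vector; a start orthogonal to the component would stall.
    v = [float(j + 1) for j in range(d)]
    for _ in range(iters):
        proj = [sum(c * w for c, w in zip(row, v)) for row in C]           # C v
        nv = [sum(proj[i] * C[i][j] for i in range(n)) for j in range(d)]  # C^T C v
        norm = math.sqrt(sum(x * x for x in nv))
        if norm == 0.0:
            break
        v = [x / norm for x in nv]
    return v, mean

def pca_seeds(X, k):
    """Assumed seeding rule: group means along the first principal component."""
    v, mean = first_principal_component(X)
    n, d = len(X), len(X[0])
    def score(i):
        return sum((x - m) * w for x, m, w in zip(X[i], mean, v))
    order = sorted(range(n), key=score)
    seeds = []
    for g in range(k):
        idx = order[g * n // k:(g + 1) * n // k]  # non-empty when n >= k
        seeds.append([sum(X[i][j] for i in idx) / len(idx) for j in range(d)])
    return seeds

def kmeans(X, centers, iters=50):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(len(centers)), key=lambda c: dist2(x, centers[c]))
                      for x in X]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centers)):
            members = [x for x, l in zip(X, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def hottest_topic(labels, forwards, comments):
    """Assumed heat index: total forwards plus comments per cluster."""
    heat = {}
    for l, f, c in zip(labels, forwards, comments):
        heat[l] = heat.get(l, 0) + f + c
    return max(heat, key=heat.get), heat

# Toy data standing in for reduced VSM vectors of six posts (made up for illustration).
X = [[1.0, 0.1], [0.9, 0.0], [1.1, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 1.1]]
forwards = [5, 3, 4, 50, 40, 30]
comments = [1, 2, 0, 20, 10, 15]

labels, centers = kmeans(X, pca_seeds(X, k=2))
hot, heat = hottest_topic(labels, forwards, comments)
print("labels:", labels)
print("hottest cluster:", hot, "heat:", heat[hot])
```

    On this toy data the last three posts fall into one cluster, and that cluster is reported as the hot topic because its forwards and comments far exceed those of the other. Seeding from the principal component makes the result deterministic, in contrast to random initialization.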

Get Citation

陈阳键, 温秋华. Micro-blog hot topic detection method based on improved K-means[J]. Journal of Terahertz Science and Electronic Information Technology, 2023, 21(3): 378-383

History
  • Received: September 14, 2020
  • Revised: February 09, 2021
  • Online: March 31, 2023