Micro-blog hot topic detection method based on improved K-means
Author:

CHEN Yangjian (陈阳键), WEN Qiuhua (温秋华)

Affiliation:

1. Digital Service Center, Guangzhou Open University (Guangzhou Radio and Television University), Guangzhou, Guangdong 510000, China; 2. School of Information Science and Technology, Jinan University, Guangzhou, Guangdong 510000, China

Author biography:

CHEN Yangjian (b. 1980), male, master's degree, experimentalist, mainly engaged in educational information technology and equipment. Email: Chenyangjian688@163.com.
WEN Qiuhua (b. 1980), male, doctoral candidate, senior experimentalist, mainly engaged in the informatization of overseas Chinese affairs and the automated processing of financial reports.

Funding:

Ninth Batch of Education and Teaching Reform Fund Projects for Universities in Guangzhou, Guangdong Province (2017F10)

    Abstract:

    Micro-blog text is high-dimensional and rich in synonymy and polysemy. Traditional hot topic detection methods that combine the Vector Space Model (VSM) with K-means suffer from low accuracy, high computational complexity, and difficulty in determining the cluster centers. This paper proposes a micro-blog text vectorization method in which a Relevance Vector Machine (RVM) optimizes the VSM. First, the adaptive feature selection ability of the RVM is used to reduce the dimension of the VSM feature vectors. Principal Component Analysis (PCA) is then applied to determine the initial cluster centers of the K-means algorithm, and K-means is run to obtain the clustering result. Finally, a heat index is defined from the numbers of reposts, comments, and high-influence users of each topic, and the topic with the largest heat index is taken as the current hot topic. Experiments on a real micro-blog text data set show that, compared with two traditional methods, the proposed method improves accuracy by 7.3% and 1.1% and real-time performance by 45% and 53%, respectively.
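
The clustering and ranking steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: the abstract does not say how PCA yields the initial centers, so the sketch orders the points along the first principal axis, cuts them into k equal groups, and uses each group mean as a seed. The heat index is shown as a plain weighted sum with hypothetical weights, since the actual formula is not given here. The toy 2-D vectors stand in for RVM-reduced VSM features, and all function names are illustrative.

```python
import math
import random


def first_principal_axis(rows, iters=100, seed=0):
    """Estimate the mean and first principal axis by power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    axis = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        # Apply X^T X to the axis without building the covariance matrix.
        proj = [sum(c[j] * axis[j] for j in range(d)) for c in centred]
        nxt = [sum(proj[i] * centred[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        axis = [v / norm for v in nxt]
    return mean, axis


def pca_seed_centres(rows, k):
    """Assumed seeding rule: sort along the first axis, cut into k groups, average.

    Requires len(rows) >= k so that no group is empty.
    """
    mean, axis = first_principal_axis(rows)
    d = len(rows[0])
    order = sorted(
        range(len(rows)),
        key=lambda i: sum((rows[i][j] - mean[j]) * axis[j] for j in range(d)),
    )
    centres = []
    for g in range(k):
        group = order[g * len(order) // k:(g + 1) * len(order) // k]
        centres.append([sum(rows[i][j] for i in group) / len(group) for j in range(d)])
    return centres


def kmeans(rows, centres, iters=50):
    """Plain Lloyd iterations starting from the supplied centres."""
    d = len(rows[0])
    labels = []
    for _ in range(iters):
        labels = [
            min(range(len(centres)),
                key=lambda c: sum((r[j] - centres[c][j]) ** 2 for j in range(d)))
            for r in rows
        ]
        updated = []
        for c in range(len(centres)):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            # Keep the old centre if a cluster loses all its members.
            updated.append(
                [sum(m[j] for m in members) / len(members) for j in range(d)]
                if members else centres[c]
            )
        if updated == centres:
            break
        centres = updated
    return labels, centres


def heat_index(reposts, comments, influential_users, weights=(1.0, 1.0, 1.0)):
    """Hypothetical weighted sum; the paper's actual formula is not given here."""
    return weights[0] * reposts + weights[1] * comments + weights[2] * influential_users


# Toy 2-D vectors standing in for RVM-reduced VSM features (made-up data).
docs = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [5.0, 5.1], [5.1, 4.9], [4.9, 5.0]]
labels, centres = kmeans(docs, pca_seed_centres(docs, k=2))

# Made-up per-topic counts: (reposts, comments, high-influence users).
topic_stats = {"topic_a": (120, 45, 3), "topic_b": (900, 310, 12)}
hottest = max(topic_stats, key=lambda t: heat_index(*topic_stats[t]))
```

Seeding from the principal axis makes the starting centers deterministic for a given data set, which is one plausible way to address the center-selection difficulty the abstract attributes to standard K-means.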

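Cite this article

CHEN Yangjian, WEN Qiuhua. Micro-blog hot topic detection method based on improved K-means[J]. Journal of Terahertz Science and Electronic Information Technology (太赫兹科学与电子信息学报), 2023, 21(3): 378-383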

History
  • Received: 2020-09-14
  • Last revised: 2021-02-09
  • Published online: 2023-03-31