A network traffic classification method based on Hadoop platform CloudSVM

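Funding:

Scientific Research Project of the Hunan Provincial Department of Education (15C0081); Scientific Research Project of the Hunan Provincial Department of Education (14C0064); Scientific Research Project of the Hunan Provincial Department of Education (19C0103)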

    Abstract:

    Large-scale NetFlow training datasets are a prerequisite for building high-quality, highly stable network traffic classifiers. However, as the dimensionality of flow features and the size of the datasets grow, neither the analysis and processing of network flows nor the training of classifier models based on the support vector machine (SVM) can produce useful results within an acceptable time. This paper uses MapReduce on the Hadoop cloud computing platform to perform distributed learning and training of an SVM network traffic classifier, yielding the CloudSVM network traffic classifier. Nearly 2 TB of network traffic trace files, captured from a mirror of the campus network egress link, were stored and processed in a distributed manner, and sample datasets extracted from them were classified. The experiments confirm the efficiency of distributed storage and parallel processing of large-scale network datasets on the Hadoop platform. They also show that the CloudSVM classifier converges quickly to its optimum without loss of classification accuracy, and that, as the number of large-scale flow samples grows, SVM training time levels off.
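The abstract describes CloudSVM only at a high level: SVM training is distributed with MapReduce and the partial results are merged until the classifier converges. As an illustration, the following is a minimal single-process Python sketch of one common iterative scheme, assuming (not confirmed by the abstract) that each "map" task trains an SVM on its data split plus the current global support vectors, that the "reduce" step takes the union of the local support vectors, and that iteration stops once that set no longer changes. The functions `train_linear_svm`, `cloud_svm` and `predict`, the dual coordinate descent solver, the value `C=1.0`, and the toy 2-D data are all hypothetical; no Hadoop API is used, and the paper's real features come from NetFlow records.

```python
# Hypothetical single-process sketch of iterative MapReduce-style SVM training
# (support-vector exchange between partitions). Not the paper's code; no Hadoop API is used.
from typing import List, Set, Tuple

Point = Tuple[float, ...]
Example = Tuple[Point, int]  # (feature vector, label in {-1, +1})


def train_linear_svm(data: List[Example], C: float = 1.0,
                     epochs: int = 200) -> Tuple[List[float], Set[Example]]:
    """Linear L1-loss SVM via dual coordinate descent.

    The bias is folded in as a constant feature 1.0, so the last weight is the bias.
    Returns the weights and the support vectors (examples with alpha > 0).
    """
    xs = [list(x) + [1.0] for x, _ in data]
    ys = [y for _, y in data]
    w = [0.0] * len(xs[0])
    alpha = [0.0] * len(data)
    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(xs, ys)):
            q = sum(v * v for v in x)  # diagonal entry Q_ii of the dual Hessian
            grad = y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
            new_alpha = min(max(alpha[i] - grad / q, 0.0), C)  # project onto [0, C]
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                w = [wj + delta * y * xj for wj, xj in zip(w, x)]
                alpha[i] = new_alpha
    support = {data[i] for i, a in enumerate(alpha) if a > 1e-12}
    return w, support


def predict(w: List[float], x: Point) -> int:
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


def cloud_svm(partitions: List[List[Example]], C: float = 1.0,
              max_rounds: int = 10) -> Tuple[List[float], Set[Example]]:
    global_sv: Set[Example] = set()
    for _ in range(max_rounds):
        # "Map": each partition trains on its own split plus the current global SVs.
        local_svs = [train_linear_svm(sorted(set(part) | global_sv), C)[1]
                     for part in partitions]
        # "Reduce": the union of local support vectors becomes the new global set.
        merged = set().union(*local_svs)
        if merged == global_sv:  # support-vector set unchanged: treat as converged
            break
        global_sv = merged
    # Final model is trained on the merged support vectors only.
    w, _ = train_linear_svm(sorted(global_sv), C)
    return w, global_sv


# Demo on toy, linearly separable 2-D data (invented for illustration).
pos = [(2.0, 1.0), (2.0, 2.0), (3.0, 1.5), (1.5, 2.5), (3.0, 3.0), (2.5, 3.0)]
data = [(p, 1) for p in pos] + [((-p[0], -p[1]), -1) for p in pos]
partitions = [data[i::3] for i in range(3)]  # stand-ins for three HDFS splits
w, sv = cloud_svm(partitions)
acc = sum(predict(w, x) == y for x, y in data) / len(data)
print("weights:", w)
print("support vectors:", sorted(sv))
print("training accuracy:", acc)
```

In this sketch only support vectors cross partition boundaries, which is what keeps the merged training set small as the sample count grows. On a real cluster the list comprehension over `partitions` would roughly correspond to map tasks over input splits and the set union to a single reducer, and a production version would likely use a library SVM solver (possibly with a kernel) instead of this toy linear one.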

Cite this article

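DENG He, TANG Yitao, HE Zongmei, YUAN Aiping. A network traffic classification method based on Hadoop platform CloudSVM[J]. Journal of Terahertz Science and Electronic Information Technology, 2020, 18(5): 918-923.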

History
  • Received: 2019-10-20
  • Last revised: 2019-11-29
  • Published online: 2020-11-02