
[1] WANG Jun, LIU Sanmin*, LIU Tao. Integrated Classification Method for Dynamic Data Flows with Noise[J]. Journal of Neijiang Normal University (Natural Science), 2017, 08: 51-55.

Integrated Classification Method for Dynamic Data Flows with Noise (PDF)

Journal of Neijiang Normal University (Natural Science) [ISSN: 1671-1785 / CN: 51-1521/Z]

Issue:
No. 08, 2017
Pages:
51-55
Column:
Publication date:
2017-09-05

Article Info

Title:
Integrated Classification Method for Dynamic Data Flows with Noise
Article ID:
1671-1785(2017)08-0051-05
Author(s):
WANG Jun (王军), LIU Sanmin* (刘三民), LIU Tao (刘涛)
College of Computer and Information, Anhui Polytechnic University, Wuhu, Anhui 241000, China
Keywords (Chinese):
noise; dynamic data stream; latest base classifier; ensemble classification
Keywords (English):
data stream classification; concept drift; integrated learning; similarity; weighted majority voting; differences
CLC number:
TP391
DOI:
10.13603/j.cnki.51-1621/z.2017.08.012
Document code:
A
Abstract:
A data stream classification method is proposed that combines similarity-based weighting of classifiers with diversity-based ensemble pruning. The most recent base classifier serves as the reference classifier, representing the concept about to appear in the data stream. Relative to this classifier, the similarity between base classifiers is computed with Gower's similarity coefficient, and that similarity is used as each base classifier's weight in weighted majority voting. At the same time, the Q-statistic is used to measure the diversity between the reference classifier and the other base classifiers, and the weaker base classifiers are eliminated according to this diversity so that the ensemble model stays diverse. The resulting ensemble model was evaluated in simulation experiments on standard simulation datasets. The results show that, when classifying dynamic data streams containing noise, the classification accuracy of the method is about 11% higher than that of traditional ensemble classification methods, indicating good classification accuracy and stable noise resistance.
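The three ingredients named in the abstract (Gower similarity to the reference classifier, the Q-statistic, and weighted majority voting) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the page does not give the exact formulation, so it assumes that each classifier is described by its predicted labels on the most recent data chunk, that those labels are treated as nominal features (for which Gower's coefficient is the fraction of matching entries), and that the Q-statistic is the standard Yule's Q over correct/incorrect outcomes. The function names and the zero-denominator convention are hypothetical.

```python
from collections import defaultdict


def gower_similarity(preds_a, preds_b):
    """Gower similarity for nominal features: fraction of matching labels.

    Assumption: both classifiers are represented by their predicted labels
    on the same non-empty chunk of samples.
    """
    matches = sum(1 for a, b in zip(preds_a, preds_b) if a == b)
    return matches / len(preds_a)


def q_statistic(preds_a, preds_b, truth):
    """Yule's Q from the correct/incorrect patterns of two classifiers.

    Q near +1: the two classifiers tend to be right and wrong together.
    Q near -1: they tend to err on different samples (high diversity).
    """
    n11 = n00 = n10 = n01 = 0
    for a, b, y in zip(preds_a, preds_b, truth):
        ok_a, ok_b = (a == y), (b == y)
        if ok_a and ok_b:
            n11 += 1
        elif not ok_a and not ok_b:
            n00 += 1
        elif ok_a:
            n10 += 1
        else:
            n01 += 1
    denom = n11 * n00 + n01 * n10
    # Assumed convention: report 0.0 when Q is undefined.
    return 0.0 if denom == 0 else (n11 * n00 - n01 * n10) / denom


def weighted_majority_vote(votes, weights):
    """Return the label with the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(votes, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


if __name__ == "__main__":
    truth = [1, 0, 1, 1, 0, 1]          # labels of the latest chunk
    reference = [1, 0, 1, 1, 0, 0]      # newest base classifier
    older = [[1, 0, 1, 0, 0, 0], [0, 1, 1, 1, 1, 1]]

    # The reference classifier gets weight 1.0; older members are
    # weighted by their Gower similarity to it.
    weights = [1.0] + [gower_similarity(reference, p) for p in older]
    diversity = [q_statistic(reference, p, truth) for p in older]
    print("weights:", weights)
    print("Q vs reference:", diversity)
    print("vote:", weighted_majority_vote([1, 1, 0], weights))
```

How the Q values feed into pruning (for example, which threshold or ensemble size triggers the removal of a member) is not stated on this page, so the sketch only computes them and leaves the elimination rule open.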


Memo

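Received: 2017-03-18
Funding: National Natural Science Foundation of China (61300170); Key Project for Outstanding Talents in Anhui Provincial Universities (2013SQRL034ZD); Anhui Provincial Natural Science Foundation (1608085MF147)
About the author: WANG Jun (b. 1992), male, from Bozhou, Anhui; master's student at Anhui Polytechnic University. Research interests: machine learning, data mining.
* Corresponding author: LIU Sanmin (b. 1978), male, from Anqing, Anhui; associate professor at Anhui Polytechnic University, Ph.D. Research interests: pattern recognition, machine learning, data mining, etc.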
Last Update: 2017-09-06