By Junjie Wu
Nearly everyone in the fields of data mining and business intelligence knows the K-means algorithm. However, ever-emerging data with extremely complex characteristics bring new challenges to this "old" algorithm. This book addresses these challenges and makes novel contributions: establishing theoretical frameworks for K-means distances and K-means-based consensus clustering, identifying the "dangerous" uniform effect and zero-value problem of K-means, adapting appropriate measures for cluster validity, and integrating K-means with SVMs for rare-class analysis. The book not only enriches clustering and optimization theory, but also provides good guidance for the practical use of K-means, especially for important tasks such as network intrusion detection and credit fraud prediction. The thesis on which this book is based received the 2010 National Excellent Doctoral Dissertation Award, the highest honor, granted to no more than one hundred PhD theses per year in China.
Read or Download Advances in K-means Clustering: A Data Mining Thinking PDF
Best data mining books
Big Data Imperatives focuses on resolving the key questions on everyone's mind: Which data matters? Do you have enough data volume to justify its use? How do you want to process this amount of data? How long do you really need to keep it active for your analytics, marketing, and BI applications?
Biometric System and Data Analysis: Design, Evaluation, and Data Mining brings together aspects of statistics and machine learning to provide a comprehensive guide to evaluating, interpreting, and understanding biometric data. This professional book naturally leads to topics including data mining and prediction, widely applied in other fields but not rigorously in biometrics.
Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data (Princeton Series in Modern Observational Astronomy). As telescopes, detectors, and computers grow ever more powerful, the volume of data at the disposal of astronomers and astrophysicists will enter the petabyte domain, providing accurate measurements for billions of celestial objects.
This contributed volume aims to explicate and address the issues and challenges involved in the seamless integration of two core disciplines of computer science, i.e., computational intelligence and data mining. Data mining aims at the automatic discovery of underlying non-trivial knowledge from datasets through the application of intelligent analysis techniques.
Additional info for Advances in K-means Clustering: A Data Mining Thinking
Probability and Statistics, 3rd edn. Addison Wesley, Upper Saddle River (2001)
6. : Applied Numerical Linear Algebra. Soc. Ind. App. Math. 32, 206–216 (1997)
7. : A new shared nearest neighbor clustering algorithm and its applications. In: Proceedings of the Workshop on Clustering High Dimensional Data and its Applications at the 2nd SIAM International Conference on Data Mining (2002)
8. : A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.
The right-hand side of Eq. (2.6) is also equal to $d(C_1, C_1)$, as there is no cross-cluster term, so Eq. (2.6) holds for $k = 1$. When $k = 2$, by Eq. (2.2), proving Eq. (2.6) is equivalent to proving the following equation:

$$2\,d(C_1, C_2) \;=\; \frac{n_2}{n_1}\, d(C_1, C_1) \;+\; \frac{n_1}{n_2}\, d(C_2, C_2) \;+\; 2 n_1 n_2 \,\|m_1 - m_2\|^2 .$$

If we substitute $m_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_i$, $m_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} y_i$, and

$$d(C_1, C_1) \;=\; 2 \sum_{1 \le i < j \le n_1} \|x_i - x_j\|^2 \;=\; 2(n_1 - 1) \sum_{i=1}^{n_1} \|x_i\|^2 \;-\; 4 \sum_{1 \le i < j \le n_1} x_i^{\top} x_j ,$$

$$d(C_2, C_2) \;=\; 2 \sum_{1 \le i < j \le n_2} \|y_i - y_j\|^2 \;=\; 2(n_2 - 1) \sum_{i=1}^{n_2} \|y_i\|^2 \;-\; 4 \sum_{1 \le i < j \le n_2} y_i^{\top} y_j ,$$

$$2\,d(C_1, C_2) \;=\; 2 \sum_{\substack{1 \le i \le n_1 \\ 1 \le j \le n_2}} \|x_i - y_j\|^2 \;=\; 2 n_2 \sum_{i=1}^{n_1} \|x_i\|^2 \;+\; 2 n_1 \sum_{i=1}^{n_2} \|y_i\|^2 \;-\; 4 \sum_{\substack{1 \le i \le n_1 \\ 1 \le j \le n_2}} x_i^{\top} y_j$$

into Eq. (2.6), …
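The identity above can be checked numerically. The following sketch (not from the book; the helper `pair_sum` and the random test data are mine) computes the within-cluster sums $d(C_1,C_1)$, $d(C_2,C_2)$ over ordered pairs, the cross-cluster sum $d(C_1,C_2)$, and the two centroids, and verifies that both sides of the equation agree:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, dim = 5, 8, 3
X = rng.normal(size=(n1, dim))  # points x_i of cluster C1
Y = rng.normal(size=(n2, dim))  # points y_i of cluster C2

def pair_sum(A, B):
    """Sum of squared Euclidean distances over all pairs (a in A, b in B)."""
    return sum(np.sum((a - b) ** 2) for a in A for b in B)

# d(C, C) runs over ordered pairs, so it equals 2 * (sum over i < j);
# diagonal terms (i = j) contribute zero.
d11 = pair_sum(X, X)   # d(C1, C1)
d22 = pair_sum(Y, Y)   # d(C2, C2)
d12 = pair_sum(X, Y)   # d(C1, C2), one term per cross-cluster pair

m1, m2 = X.mean(axis=0), Y.mean(axis=0)  # cluster centroids

lhs = 2 * d12
rhs = (n2 / n1) * d11 + (n1 / n2) * d22 \
      + 2 * n1 * n2 * np.sum((m1 - m2) ** 2)

print(np.isclose(lhs, rhs))  # True
```

The check passes for any point sets, since the identity is an algebraic consequence of expanding the squared distances around the centroids.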