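5. What Makes a Good Clustering Method? A good clustering method produces high-quality clusters, which have two properties:
High intra-cluster similarity
Low inter-cluster similarity
The quality of a clustering result depends on both the similarity measure the method uses and how the method is implemented.
The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.
2018/10/24, Data Mining: Concepts and Techniques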
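The two properties can be checked numerically. The following is a minimal sketch, assuming Euclidean distance as the (dis)similarity measure and a tiny hand-made data set; the function names and the sample points are illustrative and do not come from the slides.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def mean_intra_cluster_distance(clusters):
    """Average pairwise distance between points that share a cluster."""
    pairs = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs)


def mean_inter_cluster_distance(clusters):
    """Average distance between points that lie in different clusters."""
    pairs = [
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    ]
    return sum(pairs) / len(pairs)


# Hypothetical toy data: two well-separated groups of 2-D points.
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
]

intra = mean_intra_cluster_distance(clusters)
inter = mean_inter_cluster_distance(clusters)

# A good clustering has small distances inside clusters (high similarity)
# and large distances between clusters (low similarity).
print(f"intra={intra:.3f} inter={inter:.3f} good={intra < inter}")
```

A small intra-cluster distance relative to the inter-cluster distance corresponds to high intra-cluster similarity and low inter-cluster similarity. Other similarity measures would give different numbers, which is why the result depends on the measure chosen.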
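6. Requirements of Clustering in Data Mining
Scalability
Ability to handle different types of attributes
Ability to discover clusters of arbitrary shape
Minimal need for domain knowledge when choosing input parameters
Ability to handle noise and outliers
Insensitivity to the order of the input records
Ability to handle high-dimensional data
Ability to produce good clusterings that satisfy user-specified constraints
Results that are interpretable, comprehensible, and usable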
10. Types of Data in Cluster Analysis
Interval-scaled variables
Binary variables
Nominal, ordinal, and ratio variables
Variables of mixed types
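Each data type usually calls for its own dissimilarity computation. The sketch below shows common textbook choices for two of the types listed: standardizing an interval-scaled variable by its mean absolute deviation, and comparing binary vectors with the simple matching and Jaccard coefficients. These formulas are assumptions added for illustration; the slide itself only names the types, and the function names are hypothetical.

```python
def standardize(values):
    """Rescale an interval-scaled variable using the mean absolute deviation."""
    mean = sum(values) / len(values)
    mad = sum(abs(v - mean) for v in values) / len(values)
    return [(v - mean) / mad for v in values]


def simple_matching_dissimilarity(x, y):
    """Symmetric binary (or nominal) variables: fraction of mismatched attributes."""
    mismatches = sum(a != b for a, b in zip(x, y))
    return mismatches / len(x)


def jaccard_dissimilarity(x, y):
    """Asymmetric binary variables: joint absences (0, 0) are ignored."""
    mismatches = sum(a != b for a, b in zip(x, y))
    both_present = sum(a == 1 and b == 1 for a, b in zip(x, y))
    return mismatches / (mismatches + both_present)


# Hypothetical sample values.
z = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
x = [1, 0, 1, 0, 0, 0]
y = [1, 0, 0, 0, 0, 0]
smd = simple_matching_dissimilarity(x, y)
jd = jaccard_dissimilarity(x, y)
print(z, smd, jd)
```

The two binary measures disagree on the same pair of vectors because the Jaccard coefficient discards the four attributes that are absent in both objects, which matters when a 1 is much more informative than a 0.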
25. Major Clustering Approaches
Partitioning algorithms: Construct various partitions and then evaluate them by some criterion
Hierarchical algorithms: Create a hierarchical decomposition of the set of data (or objects) using some criterion
Density-based algorithms: Based on connectivity and density functions
Grid-based algorithms: Based on a multiple-level granularity structure
Model-based algorithms: A model is hypothesized for each of the clusters, and the idea is to find the best fit of that model to the data
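As an illustration of the partitioning approach, here is a minimal sketch of k-means, one well-known partitioning method (the slide does not name a specific algorithm, so this choice is an assumption). The evaluation criterion is the sum of squared errors (SSE). Taking the first k points as initial centroids is a simplification for reproducibility, not a recommended initialization, and the sample points are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=20):
    """Partition points into k clusters by alternating assignment and update."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters


def sse(centroids, clusters):
    """Criterion for a partition: sum of squared distances to the centroids."""
    return sum(
        dist(p, c) ** 2
        for c, cluster in zip(centroids, clusters)
        for p in cluster
    )


# Hypothetical toy data: two groups of 2-D points.
points = [
    (0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
    (1.0, 0.0), (10.0, 11.0), (11.0, 10.0),
]
centroids, clusters = kmeans(points, k=2)
error = sse(centroids, clusters)
print(centroids, error)
```

A lower SSE indicates a tighter partition for a given k, so candidate partitions (for example, from different initializations) can be compared by this criterion.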
26. Thank you! http://www.cs.sfu.ca/~han