1. What is clustering?
Clustering is the process of grouping data into classes, or clusters, so that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters.
2. What do you mean by cluster analysis?
Cluster analysis is the process of analyzing the various clusters to organize the different objects into meaningful and descriptive groups.
3. What are the requirements of clustering?
Scalability
Ability to deal with different types of attributes
Ability to deal with noisy data
Minimal requirements for domain knowledge to determine input parameters
Constraint based clustering
Interpretability and usability
4. State the categories of clustering methods.
Partitioning methods
Hierarchical methods
Density based methods
Grid based methods
Model based methods
5. What are the requirements of cluster analysis?
The basic requirements of cluster analysis are:
Dealing with different types of attributes.
Dealing with noisy data.
Constraints on clustering.
Dealing with arbitrary shapes.
High dimensionality
Ordering of input data
Interpretability and usability
Determining input parameters
Scalability
6. What are the different types of data used for cluster analysis?
The different types of data used for cluster analysis are interval-scaled, binary, nominal, ordinal and ratio-scaled data.
7. What are interval-scaled variables?
Interval-scaled variables are continuous measurements on a linear scale. Examples include height and weight, weather temperature, and the coordinates of a cluster. The dissimilarity between objects described by such variables is commonly computed using the Euclidean distance or the Minkowski distance.
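As a small illustration of the distance measures named above, the sketch below computes the Minkowski distance of order p between two objects. The function name and the sample objects are assumptions made for this example; p = 2 gives the Euclidean distance and p = 1 gives the Manhattan distance.

```python
def minkowski_distance(x, y, p=2):
    """Minkowski distance of order p between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Two hypothetical objects described by (height in cm, weight in kg).
a = (170.0, 65.0)
b = (173.0, 61.0)

print(minkowski_distance(a, b, p=2))  # Euclidean distance: 5.0
print(minkowski_distance(a, b, p=1))  # Manhattan distance: 7.0
```

In practice the variables are often standardized first, so that a variable measured in large units does not dominate the distance.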
8. Define binary variables. What are the two types of binary variables?
A binary variable has only two states, 0 and 1: state 0 means the variable is absent and state 1 means it is present.
There are two types of binary variables, symmetric and asymmetric. A binary variable is symmetric if both of its states are equally valuable and carry the same weight. It is asymmetric if the two states are not equally important, such as the positive and negative outcomes of a disease test.
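The difference between the two types shows up when dissimilarity is computed. The sketch below is an illustration under assumed names, not a prescribed method: for symmetric variables it uses the simple matching approach, where matches on 0 count as agreement, and for asymmetric variables it uses the Jaccard approach, where matches on 0 are ignored.

```python
def binary_dissimilarity(x, y, symmetric=True):
    """Dissimilarity between two objects described by 0/1 variables.

    symmetric=True  : mismatches / all variables        (simple matching)
    symmetric=False : mismatches / (all minus 0-0 pairs) (Jaccard)
    """
    if len(x) != len(y):
        raise ValueError("objects must have the same number of variables")
    both_present = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    both_absent = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    mismatches = sum(1 for a, b in zip(x, y) if a != b)
    denominator = both_present + mismatches
    if symmetric:
        denominator += both_absent
    if denominator == 0:
        return 0.0  # nothing to compare, treat the objects as identical
    return mismatches / denominator


# Hypothetical test results for two patients (1 = positive, 0 = negative).
patient_a = [1, 0, 1, 0, 0, 0]
patient_b = [1, 0, 1, 0, 1, 0]

print(binary_dissimilarity(patient_a, patient_b, symmetric=True))   # 1 mismatch out of 6
print(binary_dissimilarity(patient_a, patient_b, symmetric=False))  # 1 mismatch out of 3
```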
9. Define nominal, ordinal and ratio-scaled variables.
A nominal variable is a generalization of the binary variable: it can take more than two states. For example, a nominal variable color may have four states: red, green, yellow, or black. The total number of states of a nominal variable is N, and the states are denoted by letters, symbols or integers.
An ordinal variable also has more than two states, but the states are ordered in a meaningful sequence. A ratio-scaled variable makes positive measurements on a non-linear scale, such as an exponential scale.
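One common way to prepare such variables for distance computation is sketched below, as an assumption for illustration rather than something stated in these notes: an ordinal state is replaced by its rank and rescaled to the range 0 to 1, and a ratio-scaled value is log-transformed so that it behaves more like an interval-scaled value. The function names and the sample states are hypothetical.

```python
import math


def ordinal_to_unit_interval(value, ordered_states):
    """Map an ordinal state to [0, 1] using its rank among the ordered states."""
    m = len(ordered_states)
    if m < 2:
        raise ValueError("need at least two ordered states")
    rank = ordered_states.index(value) + 1  # ranks run from 1 to m
    return (rank - 1) / (m - 1)


def log_transform(value):
    """Log-transform a positive ratio-scaled measurement."""
    if value <= 0:
        raise ValueError("ratio-scaled values must be positive")
    return math.log(value)


# Hypothetical ordinal variable with three ordered states.
medals = ["bronze", "silver", "gold"]
print([ordinal_to_unit_interval(m, medals) for m in medals])  # [0.0, 0.5, 1.0]
```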
10. What do you mean by the partitioning method?
In the partitioning method, a partitioning algorithm arranges all the objects into partitions, where the number of partitions does not exceed the number of objects. Each partition represents a cluster. Two commonly used partitioning methods are k-means and k-medoids.
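To make the idea concrete, here is a minimal k-means sketch in plain Python. It is a simplified illustration, not a production implementation: the function names, the random choice of initial centers and the sample points are all assumptions for this example.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iter=100, seed=0):
    """Partition points into k clusters; returns (centers, assignment)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k random objects
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each object joins the cluster with the nearest center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no object changed cluster, so the partition is stable
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, assignment


# Hypothetical 2-D objects forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, assignment = k_means(points, k=2)
print(assignment)
```

k-medoids follows the same pattern but represents each cluster by one of its actual objects instead of the mean, which makes it less sensitive to outliers.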
11. Define CLARA and CLARANS.
CLARA stands for Clustering LARge Applications. Instead of using the whole data set, CLARA works on a representative sample of it, so its efficiency depends on the size of the representative data set. CLARA cannot find the best clustering if the selected representative data set does not contain the best k medoids. To overcome this drawback, a new algorithm, Clustering Large Applications based upon RANdomized Search (CLARANS), was introduced. CLARANS works like CLARA; the difference lies in the search process: instead of confining the search to a fixed sample, CLARANS examines a random sample of neighboring medoid sets at each step of the search.
12. What is the hierarchical method?
The hierarchical method groups all the objects into a tree of clusters arranged in hierarchical order. It works either bottom-up or top-down.
13. Differentiate agglomerative and divisive hierarchical clustering.
Agglomerative hierarchical clustering works bottom-up. Each object starts in its own cluster. These single-object clusters are merged into larger clusters, and merging continues until all the clusters have been merged into one big cluster that contains all the objects.
Divisive hierarchical clustering works top-down. All the objects start in one big cluster, which is repeatedly divided into smaller clusters until each cluster contains a single object.
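The bottom-up direction can be sketched in a few lines. The example below is a simplified illustration that assumes single linkage (the distance between two clusters is the distance between their closest members); the function names and the sample points are hypothetical, and an optional stopping point num_clusters is added for convenience.

```python
def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def single_link(c1, c2):
    """Distance between two clusters: their closest pair of members."""
    return min(euclidean(p, q) for p in c1 for q in c2)


def agglomerative(points, num_clusters=1):
    """Merge the two closest clusters until num_clusters remain."""
    clusters = [[p] for p in points]  # every object starts as its own cluster
    merges = []                       # record of (cluster_a, cluster_b, distance)
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical 1-D objects.
points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
clusters, merges = agglomerative(points, num_clusters=2)
print(clusters)  # [[(20.0,)], [(0.0,), (1.0,), (5.0,), (6.0,)]]
```

A divisive method would run in the opposite direction, starting from the single all-inclusive cluster and splitting it step by step.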
14. What is CURE?
CURE stands for Clustering Using REpresentatives. Most clustering algorithms favor clusters that are spherical and similar in size. CURE overcomes this limitation and is more robust with respect to outliers.
15. Define the Chameleon method.
Chameleon is another hierarchical clustering method; it uses dynamic modeling. Chameleon was introduced to overcome the drawbacks of the CURE method. In this method, two clusters are merged if the interconnectivity and closeness between them are high relative to the internal interconnectivity and closeness of the objects within the clusters.
16. Define the density-based method.
The density-based method can discover clusters of arbitrary shape. Clusters are formed from regions in which the density of objects is high, separated by regions of low density.
17. What is DBSCAN?
DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It is a density-based clustering method that grows regions of sufficiently high density into clusters of arbitrary shape and size. DBSCAN defines a cluster as a maximal set of density-connected points.
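The following sketch shows how such density-connected sets can be grown. It is a simplified, brute-force illustration: the parameter names eps (neighborhood radius) and min_pts (minimum number of points in a neighborhood, counting the point itself), the NOISE label and the sample points are assumptions for this example.

```python
NOISE = -1


def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (including idx itself)."""
    return [j for j, q in enumerate(points) if euclidean(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
        cluster_id += 1
    return labels


# Hypothetical 2-D objects: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point never reaches the density threshold and is not reachable from any core point, so it stays labeled as noise.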
18. What do you mean by the grid-based method?
In this method, the object space is represented by a multi-resolution grid data structure. All the objects are quantized into a finite number of cells, and the collection of cells forms the grid structure. The clustering operations are performed on that grid structure. This method is widely used because its processing time is fast and is independent of the number of objects; it depends only on the number of cells in each dimension of the quantized space.
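The quantization step can be sketched as follows. This is a minimal illustration under assumed names: cell_size (the width of a cell in every dimension) and min_count (a hypothetical density threshold) are not taken from any particular grid-based algorithm.

```python
from collections import Counter


def grid_cell_counts(points, cell_size):
    """Quantize each point into a grid cell and count the points per cell."""
    counts = Counter()
    for p in points:
        cell = tuple(int(coord // cell_size) for coord in p)
        counts[cell] += 1
    return counts


def dense_cells(points, cell_size, min_count):
    """Cells holding at least min_count points."""
    counts = grid_cell_counts(points, cell_size)
    return {cell for cell, n in counts.items() if n >= min_count}


# Hypothetical 2-D objects.
points = [(0.2, 0.3), (0.8, 0.9), (0.5, 0.1), (3.1, 3.7), (3.9, 3.2), (7.5, 0.5)]
print(grid_cell_counts(points, cell_size=1.0))
print(dense_cells(points, cell_size=1.0, min_count=2))
```

Once the counts are built, later steps only look at the cells, which is why their cost depends on the number of cells rather than on the number of objects.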
19. What is STING?
STING stands for STatistical INformation Grid. It is a grid-based multi-resolution clustering method. In STING, all the objects are placed into rectangular cells; these cells exist at several levels of resolution, and the levels are arranged in a hierarchical structure.
20. Define WaveCluster.
WaveCluster is a grid-based multi-resolution clustering method. All the objects are represented by a multidimensional grid structure, and a wavelet transformation is applied to find the dense regions. Each grid cell summarizes the information of the group of objects that map into that cell.
21. What is the model-based method?
Model-based methods attempt to optimize the fit between a given data set and a mathematical model. They assume that the data are generated by underlying probability distributions. There are two basic approaches:
Statistical approach
Neural network approach
22. Name some of the data mining applications.
Data mining for Biomedical and DNA data analysis
Data mining for financial data analysis
Data mining for the Retail industry
Data mining for the Telecommunication industry
23. Define outlier.
Very often, there exist data objects that do not comply with the general behavior or model of the data. Such data objects, which are grossly different from or inconsistent with the remaining set of data, are called outliers.
24. What are the types of outlier detection methods?
Statistical Distribution-Based Outlier Detection
Distance-Based Outlier Detection
Density-Based Local Outlier Detection
Deviation-Based Outlier Detection
25. What is Statistical Distribution-Based Outlier Detection?
The statistical distribution-based approach to outlier detection assumes a distribution or probability model for the given data set and then identifies outliers with respect to the model using a discordancy test. Applying the test requires knowledge of the data set parameters (such as the assumed data distribution), knowledge of the distribution parameters (such as the mean and variance), and the expected number of outliers.
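As a rough illustration of this idea, the sketch below assumes the data follow a normal distribution and flags values whose distance from the mean exceeds a chosen number of standard deviations. This z-score rule is a simplified stand-in for a formal discordancy test; the function name, the threshold and the sample values are assumptions for this example.

```python
import statistics


def z_score_outliers(values, threshold):
    """Values lying more than `threshold` sample standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical measurements with one value far from the rest.
values = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 25.0]

# With only 8 values, no z-score can exceed about 2.47, so a threshold of 3
# would never fire here; 2.0 is used purely for illustration.
print(z_score_outliers(values, threshold=2.0))  # [25.0]
```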
26. What is Density-Based Local Outlier Detection?
Statistical and distance-based outlier detection both depend on the overall, or "global", distribution of the given set of data points, D. However, data are usually not uniformly distributed, and these methods encounter difficulties when analyzing data with rather different density distributions. Density-based local outlier detection addresses this by judging each object relative to the density of its local neighborhood.
27. What is Deviation-Based Outlier Detection?
Deviation-based outlier detection does not use statistical tests or distance-based measures to identify exceptional objects. Instead, it identifies outliers by examining the main characteristics of objects in a group.