Geo Analysis - Cluster Analysis

Output From Cluster Analysis



The user selects an R mode or a Q mode type of analysis. R mode produces a dendrogram of variables; Q mode produces a dendrogram of samples. The goal is to determine whether there are groups of similar variables (or samples).

The user also selects a measure of similarity. Not all measures are suitable for a particular analysis or data type, and each measure has a built-in definition of similarity. For example, if the variance-covariance matrix is used in an R mode analysis, the pair of variables with the largest positive covariance is selected as the most similar. If the matrix of correlation coefficients is used, the most similar pair is the one with the largest positive correlation coefficient. Expect different clusters when different measures of similarity are used.
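The effect of the similarity measure can be seen in a small sketch. The data below are invented for illustration (they are not the Iris data), and the function names are hypothetical helpers, not part of any package used in the exercise. Variable 2 is measured on a much larger scale than the others, so it dominates the covariances even though variables 1 and 3 track each other most closely.

```python
from itertools import combinations
from math import sqrt

# Hypothetical data: rows are samples, columns are variables 1, 2 and 3.
# These numbers are invented for illustration only.
data = [
    [1.0, 10.0, 1.0],
    [2.0, 30.0, 2.1],
    [3.0, 20.0, 2.9],
    [4.0, 50.0, 4.2],
    [5.0, 40.0, 4.8],
]

def column(rows, j):
    """Return column j (0-based) of a list of rows."""
    return [row[j] for row in rows]

def covariance(x, y):
    """Sample covariance of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

def most_similar_pair(rows, measure):
    """Return the pair of variables (numbered from 1) with the largest similarity."""
    n_vars = len(rows[0])
    i, j = max(
        combinations(range(n_vars), 2),
        key=lambda p: measure(column(rows, p[0]), column(rows, p[1])),
    )
    return (i + 1, j + 1)

print("covariance picks ", most_similar_pair(data, covariance))
print("correlation picks", most_similar_pair(data, correlation))
```

With these invented numbers the covariance matrix selects variables 2 and 3 as the first pair, while the correlation matrix selects variables 1 and 3, so the two dendrograms would start differently.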

An R mode dendrogram for the Iris data with the matrix of correlation coefficients as the similarity matrix is given below.

The first piece of output summarizes the sequence of linking. Once a pair of variables is linked, or fused, it cannot be separated.

  1. 3 4 0.96
  2. 1 3 0.83
  3. 1 2 -0.25

Variables 3 and 4 link at the highest level, 0.96. At 0.83 variable 1 links with the pair 3-4, and at -0.25 variable 2 links with the triplet 1-3-4. The output is reproduced below. The sequence of linking can be used to redraw the computer printout in a more satisfactory form.
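A linking sequence of this kind can be produced by repeatedly fusing the two most similar clusters. The sketch below is only an illustration: the similarity values are invented, and the rule for the similarity between clusters (the mean over all cross-cluster pairs, often called average linkage) is an assumption, since these notes do not say which rule the program uses. It is not expected to reproduce the Iris levels listed above.

```python
from itertools import combinations

# Hypothetical similarity values (correlation-like) for four variables.
# Invented for illustration; not the Iris correlation matrix.
similarity = {
    (1, 2): 0.10, (1, 3): 0.80, (1, 4): 0.70,
    (2, 3): -0.20, (2, 4): -0.30, (3, 4): 0.95,
}

def pair_similarity(i, j):
    """Look up the similarity of two variables, in either order."""
    return similarity[(min(i, j), max(i, j))]

def cluster_similarity(a, b):
    """Mean similarity over all cross-cluster pairs (assumed linkage rule)."""
    values = [pair_similarity(i, j) for i in a for j in b]
    return sum(values) / len(values)

def linking_sequence(items):
    """Fuse the most similar pair of clusters until one cluster remains."""
    clusters = [frozenset([i]) for i in items]
    steps = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: cluster_similarity(*pair))
        level = cluster_similarity(a, b)
        # Label each cluster by its lowest-numbered member, as in the listing above.
        first, second = sorted((min(a), min(b)))
        steps.append((first, second, round(level, 2)))
        # Fused clusters are never separated again.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return steps

for step, (first, second, level) in enumerate(linking_sequence([1, 2, 3, 4]), start=1):
    print(f"{step}. {first} {second} {level:.2f}")
```

The printed lines have the same layout as the program's summary: step number, the two linked labels, and the level at which they link.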

Dendrograms

Q mode analysis is handled the same way. The difficult part is deciding where to draw the line separating "individual entities" from clusters. In the example above you could treat 3 and 4 as a fused pair and 1 and 2 as individuals. Or you could treat 3-4 and 1-3-4 as clusters and 2 as an individual. Within the same dendrogram, be consistent about where you draw the boundaries.
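One way to stay consistent is to pick a single similarity level and apply it across the whole dendrogram. The sketch below uses the linking sequence from the R mode example; the function name and the cutoff values 0.90 and 0.50 are hypothetical choices for illustration, not recommendations.

```python
# Linking sequence from the R mode example: (variable, variable, level).
links = [(3, 4, 0.96), (1, 3, 0.83), (1, 2, -0.25)]

def clusters_at(links, variables, cutoff):
    """Group variables joined by links at or above the chosen cutoff."""
    parent = {v: v for v in variables}

    def find(v):
        # Follow parent pointers up to the representative of v's group.
        while parent[v] != v:
            v = parent[v]
        return v

    for a, b, level in links:
        if level >= cutoff:
            root_a, root_b = find(a), find(b)
            # Keep the lowest-numbered variable as the representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = {}
    for v in variables:
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

print(clusters_at(links, [1, 2, 3, 4], 0.90))  # 3-4 fused, 1 and 2 individuals
print(clusters_at(links, [1, 2, 3, 4], 0.50))  # 1-3-4 a cluster, 2 an individual
```

Drawing the line at 0.90 gives the first reading described above, and drawing it at 0.50 gives the second.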

I would begin a Q mode analysis by doing an R mode analysis first. If I found three clusters in the R mode, I would look for three in the Q mode. Keep in mind that you may have samples or variables that are outliers and just don't seem to fit.
