Geo Analysis - Chi Squared Example



The null hypothesis is that the distribution of the values being examined can be described by a normal distribution (or by any other distribution that you wish to test). The data are first converted to "z-scores" by subtracting the mean from each value and dividing by the standard deviation, so that the resulting values have a mean of 0.0 and a standard deviation of 1.0.
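The conversion to z-scores can be sketched in a few lines of Python. This is only an illustration: the function name `z_scores` and the sample values are made up, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which form is intended.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values: subtract the mean, divide by the standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]

# Made-up measurements, for illustration only.
sample = [4.9, 5.1, 5.8, 6.3, 7.0]
z = z_scores(sample)

# After standardizing, the mean is 0 and the standard deviation is 1
# (up to floating-point rounding).
print(round(mean(z), 6), round(stdev(z), 6))
```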

If the data follow a normal distribution, then values from the standardized normal distribution (Table 3.8, p. 87) can be used to determine what percentage of the values should fall within a given range. For example, the table shows that 25% of the area under the curve lies between minus infinity and a z-score of -0.67, so 25% of the values should have z-scores in this interval. If the distribution is normal, the Expected Values and the Observed Values should agree closely.
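The z-score boundaries that split the standard normal curve into equal areas can also be computed rather than read from a table. The sketch below uses `NormalDist` from the Python standard library; the helper name `equal_area_cuts` is hypothetical.

```python
from statistics import NormalDist

def equal_area_cuts(n_bins):
    """Interior z-score boundaries that split the standard normal
    distribution into n_bins regions of equal area."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(k / n_bins) for k in range(1, n_bins)]

# Four bins: boundaries near -0.67, 0.00 and +0.67, so each bin
# should hold 25% of the values if the data are normal.
cuts = equal_area_cuts(4)
print([round(c, 2) for c in cuts])
```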

Typically, the "bins" are chosen so that the same percentage of the values should fall in each bin. The Sepal Length and Petal Width measurements are analyzed below. Each distribution was divided into four equal areas, so each bin should include 37.5 samples (25% of the 150 values) if the distribution were in fact normal. For each bin, the difference between the observed and expected numbers of values is squared and divided by the expected number. These values are added for all of the bins to give the U-squared statistic. If U-squared is greater than the tabulated value from the Chi-squared table (3.16, p. 118) with (Number of Bins - 3) degrees of freedom at the selected level of significance, the null hypothesis is rejected. Three degrees of freedom are lost: one because the bin counts must sum to the total, and two because the mean and standard deviation are estimated from the data.
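A minimal sketch of the test procedure follows. It assumes the standard chi-squared form of the statistic, in which each squared difference is divided by the expected count; this is the form that reproduces the U-squared values reported in this example. The function names and the toy counts are hypothetical. The critical value is obtained without a table by using the fact that a chi-squared variable with 1 d.f. is the square of a standard normal variable, so the shortcut holds only for 1 d.f.

```python
from statistics import NormalDist

def u_squared(observed, expected):
    """Sum over bins of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def critical_value_1df(alpha=0.05):
    """Chi-squared critical value for 1 d.f. only: the square of the
    two-sided standard normal critical value."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z

# Toy counts for illustration only: 100 values in 4 equal-area bins.
observed = [22, 28, 27, 23]
expected = [25.0] * 4

dof = len(observed) - 3            # Number of Bins - 3
u2 = u_squared(observed, expected)
reject = u2 > critical_value_1df(0.05)
print(dof, round(u2, 2), reject)
```

For more than four bins the degrees of freedom exceed 1, and the critical value would have to come from a chi-squared table or a statistics library instead.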


Sepal Length

Interval          Expected  Observed
-INF to -0.67     37.5      47
-0.67 to 0.00     37.5      34
0.00 to +0.67     37.5      27
+0.67 to +INF     37.5      42

The U-squared statistic is computed to be 6.21. The tabulated Chi-squared statistic with 1 d.f. and a significance level of 5% is 3.84. Because 6.21 exceeds 3.84, a normal distribution of Sepal Lengths is rejected at the 95% confidence level; that is, there is a 5% chance that a correct hypothesis has been rejected. Note that the "true" nature of the distribution remains unknown.
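The value 6.21 can be checked by hand from the Sepal Length table, dividing each squared difference by the expected count of 37.5. The symbols O_i and E_i (observed and expected counts in bin i) are notation introduced here for the check:

```latex
U^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}
    = \frac{(47 - 37.5)^2 + (34 - 37.5)^2 + (27 - 37.5)^2 + (42 - 37.5)^2}{37.5}
    = \frac{90.25 + 12.25 + 110.25 + 20.25}{37.5}
    = \frac{233}{37.5} \approx 6.21
```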


Petal Width

Interval          Expected  Observed
-INF to -0.67     37.5      50
-0.67 to 0.00     37.5      10
0.00 to +0.67     37.5      44
+0.67 to +INF     37.5      46

The U-squared statistic is computed to be 27.39. The tabulated Chi-squared statistic with 1 d.f. and a significance level of 5% is 3.84. Because 27.39 exceeds 3.84, the null hypothesis that Petal Widths are normally distributed is rejected.
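Comparing against the 5% critical value gives only a reject/do-not-reject answer. As a supplementary sketch (not part of the procedure described above), the tail probability for each U-squared value can be computed directly when there is 1 d.f., because the chi-squared upper tail then reduces to the complementary error function. The function name `p_value_1df` is hypothetical.

```python
from math import erfc, sqrt

def p_value_1df(u2):
    """Upper-tail probability of a chi-squared variable with 1 d.f.
    Valid for 1 d.f. only: P(Z^2 > u2) = erfc(sqrt(u2 / 2))."""
    return erfc(sqrt(u2 / 2.0))

p_sepal = p_value_1df(6.21)    # Sepal Length statistic from the text
p_petal = p_value_1df(27.39)   # Petal Width statistic from the text

# Both probabilities fall below 0.05, matching the rejections above;
# the Petal Width result is far more extreme.
print(p_sepal < 0.05, p_petal < 0.05, p_petal < p_sepal)
```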

You will have the opportunity to try different transformations. For example, if the logarithms of the raw data are normally distributed, the distribution is said to be lognormal.
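A transformation can be tried by applying it to the raw data and then repeating the binning on the transformed values. The sketch below assumes natural logarithms, strictly positive data, and the sample standard deviation; the helper name `quartile_bin_counts` and the raw values are made up for illustration.

```python
from bisect import bisect_right
from math import log
from statistics import NormalDist, mean, stdev

def quartile_bin_counts(values):
    """Count how many z-scores fall in each of four equal-area bins
    of the standard normal distribution."""
    m, s = mean(values), stdev(values)  # assumption: sample standard deviation
    cuts = [NormalDist().inv_cdf(p) for p in (0.25, 0.50, 0.75)]
    counts = [0, 0, 0, 0]
    for v in values:
        z = (v - m) / s
        counts[bisect_right(cuts, z)] += 1
    return counts

# Made-up, strongly right-skewed raw data (must be positive to take logs).
raw = [1, 2, 4, 8, 16, 32, 64, 128]
logged = [log(v) for v in raw]

log_counts = quartile_bin_counts(logged)
print(log_counts)
```

The resulting counts would then be compared with the expected count per bin (here 8 / 4 = 2) using the U-squared statistic, exactly as for the untransformed data.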
