Geo Analysis - Chi Squared

Regression Analysis


Several authors (including our own Dr. WW) have speculated that the depth to an active subduction zone influences the chemical composition of liquids produced by partial melting above the subducting slab. Data given below were gathered as follows. Suites of igneous rocks from known subduction zone complexes were analyzed for major elements. For each suite a binary plot was made of K2O versus SiO2 and a best fit straight line drawn by eye through the points. The amount of K2O at 57.5% SiO2 was determined for each suite. These values and the depths to the subduction zone appear below for 17 suites. Depth is given in km and K2O in weight percent. Ultimately, the goal of these authors was to be able to measure the K2O in a volcanic rock produced by partial melting near a subduction zone and then estimate the depth to the subduction zone. This would allow the reconstruction of the geometry of subducting slabs that no longer exist.

Simple linear regression models are available on Data Desk.

Suite Depth (km) K2O (wt %)

  1. 175 2.29
  2. 170 1.23
  3. 215 3.27
  4. 185 1.95
  5. 190 2.18
  6. 195 1.65
  7. 165 1.38
  8. 180 2.15
  9. 185 0.86
  10. 140 0.24
  11. 155 1.08
  12. 140 1.62
  13. 180 0.98
  14. 185 1.74
  15. 200 2.39
  16. 150 0.71
  17. 190 1.37

    Prepare histograms, normal probability plots, and a scatter diagram of the two variables. For the scatter diagram use a range of 0 to 4 for K2O and 100 to 250 for Depth.

  1. What is the correlation between the two variables? Interpret this (especially the sign) in terms of the model noted above. Does this make "geological sense"?

  2. Which variable do you think should be selected as the independent variable? Why?

  3. Read chapter 22 in the Data Desk manual. Do a regression analysis using your choice of independent variable as X. Write the equation of the best fit straight line in words. On an appropriate scatter diagram (using the scales given above) plot the regression line and print a relatively large copy. Label the line as YonX.

  4. The regression summary table contains the s.e. (standard error) of each coefficient; each estimates the standard deviation that coefficient would show in a sampling experiment in which many random samples were drawn from the same target population.

    The column t-ratio holds the test statistics for assessing whether the slope coefficient or the constant term is 0.0. Compare the values of t-ratio with the appropriate critical value of the t-statistic (page 95). If the null is rejected, the value differs significantly from 0.0. If, for example, you accept the null hypothesis for the slope, you are claiming that knowledge of X tells you nothing about the value of Y. In this case the best estimate of Y is the mean of all of the Y values. As the absolute value of the correlation coefficient decreases, the likelihood of accepting the null hypothesis increases. Evaluate the two t-ratios for this example. These are two-tailed tests: the null is that the coefficient equals 0.0 and the alternative is that it does not equal 0.0. Select a significance level of 5%.

    The middle part of the table holds the results of an Analysis of Variance. See page 22-7 for an explanation of the ANOVA part of the regression summary.
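As a rough illustration of where the s.e. and t-ratio columns come from, the Python sketch below fits a least-squares line and divides each coefficient by its standard error. The data are a small made-up set (hypothetical values, not the suite data above), and the function name `regression_t_ratios` is an assumption made for this example only.

```python
import math

def regression_t_ratios(x, y):
    """Least-squares fit of y on x; returns coefficients, s.e., and t-ratios."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance uses n - 2 degrees of freedom (two fitted coefficients).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)

    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))

    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_slope,
        "se_intercept": se_intercept,
        # Each t-ratio tests the null that the coefficient equals 0.0.
        "t_slope": slope / se_slope,
        "t_intercept": intercept / se_intercept,
    }

# Hypothetical toy data, chosen only to exercise the function.
x_demo = [1, 2, 3, 4, 5]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = regression_t_ratios(x_demo, y_demo)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

In this toy set the slope t-ratio is large and the constant's t-ratio is small, so only the slope would be judged different from 0.0; each t-ratio is compared in absolute value with the tabled t value.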

  5. Interpret the computed value of R2.

  6. From the scatter diagram (the open triangle) plot the residuals versus the predicted values. What is the correlation between predicted and residuals? Interpret based on reading 22/10.

  7. Recall that for each pair of variables there are two potential regression lines. You have computed one of them. Now, by switching the X and Y labels, produce the other. Plot this line on the hard copy you made of your first regression line and label it XonY. Define the point of intersection of the two lines and estimate the angle between them. As the correlation coefficient increases (approaches 1.0) you expect this angle to ________.

  8. Calculating the equation of the line of organic correlation

    The form of the equation is Yi = Gx + BxXi where Gx is the intercept and Bx the slope. The user picks one variable to be designated as X and the other as Y. There is NO implication as to dependence versus independence as there is with regression analysis.

    The absolute value of Bx is given by the ratio sy/sx (where s is the standard deviation). The sign of the slope is given by the sign of the correlation coefficient.

    Gx = Ybar - Bx*Xbar, where Bx is defined as above and Ybar and Xbar are the means of the variables designated as Y and X respectively.

    Example:

    Y: sy = 0.79 and mean = 21.4

    X: sx = 1.38 and mean = 43.4

    rx,y is negative

    |Bx| = (.79/1.38) = .572; as r is -, Bx = -.572.

    Gx = 21.4-(-.572*43.4) = 46.2

    Therefore, Yi = 46.2 - 0.572Xi

    If you reverse your assignment of X and Y, you should obtain the same line (the new equation, solved for the original Y, is identical) - try it and see.

    Plot this line on your hard copy and label it as the line of organic correlation. This line should pass through the point of intersection of XonY and YonX.
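The arithmetic in the example for item 8 can be checked with a short Python sketch. The function name `organic_correlation_line` and its argument names are assumptions made for this illustration; the numbers are the ones given in the example.

```python
def organic_correlation_line(sx, sy, mean_x, mean_y, r_sign):
    """Return (intercept Gx, slope Bx) of the line of organic correlation.

    |Bx| = sy / sx, the sign of Bx follows the sign of the correlation
    coefficient, and Gx = mean_y - Bx * mean_x.
    """
    if r_sign == 0:
        raise ValueError("the sign of r is needed to fix the sign of the slope")
    slope = (sy / sx) if r_sign > 0 else -(sy / sx)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Values from the worked example: r is negative.
gx, bx = organic_correlation_line(sx=1.38, sy=0.79,
                                  mean_x=43.4, mean_y=21.4, r_sign=-1)
print(f"Y = {gx:.1f} + ({bx:.3f}) X")

# Reverse the roles of X and Y, then solve the new equation for the old Y.
gy, by = organic_correlation_line(sx=0.79, sy=1.38,
                                  mean_x=21.4, mean_y=43.4, r_sign=-1)
slope_back = 1.0 / by          # X = gy + by*Y  ->  Y = -gy/by + (1/by) X
intercept_back = -gy / by
print(f"Y = {intercept_back:.1f} + ({slope_back:.3f}) X")
```

Both print statements describe the same line, which is the point of the "try it and see" suggestion: unlike the two regression lines, the line of organic correlation does not depend on which variable is called X.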

  9. Testing the Correlation Coefficient against a null of 0.0

    tx = r * sqrt((N-2)/(1-r^2))

    If the absolute value of tx is greater than the two-tailed Student's t with (N-2) d.f., the null is rejected.

    Evaluate the correlation between this pair of variables.
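A minimal Python sketch of the test in item 9 follows. The function names, the value r = 0.5, and the variable names are hypothetical and chosen only for illustration; the critical value 2.131 is the standard two-tailed 5% Student's t for 15 d.f. (N - 2 with N = 17) as read from a t table.

```python
import math

def correlation_t(r, n):
    """t statistic for testing a correlation coefficient against a null of 0.0."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def rejects_null(t_stat, t_critical):
    """Two-tailed decision: reject when |t| exceeds the tabled value."""
    return abs(t_stat) > t_critical

r_demo = 0.5       # hypothetical correlation, not computed from the suite data
n_demo = 17        # number of suites in this exercise
t_table = 2.131    # two-tailed 5% Student's t for 15 d.f., from a t table

t_demo = correlation_t(r_demo, n_demo)
print(f"t = {t_demo:.3f}, reject null: {rejects_null(t_demo, t_table)}")
```

Substitute the correlation obtained in question 1 for `r_demo` to carry out the evaluation asked for here.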

Write up your evaluation of the relationship(s) between K2O and Depth to the Subduction Zone. Make sure that your brief report is documented with appropriate figures and that you answer the questions raised above. At the conclusion of your report address the following issue. Experiments show that the extreme values can have a strong influence on the regression equation (remember the "dumb-bell" effect). Examine the scatter diagrams. Is there something "suspicious"? Remove the two highest and two lowest pairs of variables from the data set.