Geo Analysis - Multiple Regression
Multiple Regression
William Krumbein conducted a study that illustrates how several so-called independent variables can be sorted out according to their usefulness in predicting a dependent variable. The data set waves can be input into DataDesk.
At each location, the mean grain size, wave period, deep-water wave height, angle of wave approach, and water depth were measured, along with the shoaling-zone bottom slope. The latter was selected as the dependent variable, and the objective was to determine whether the bottom slope can be satisfactorily predicted from some combination of the independent variables. It was also of interest to assess the sequence in which the independent variables were added to the predicting equation.
We have learned Dr. W.W. is engaged in an analysis of the effect of the shoaling zone bottom slope on the maneuverability of amphibious landing craft. We suspect that he is under contract with someone who may be planning an invasion. Clearly, it would be difficult and expensive to visit those beaches of prime interest.
- Explore the relationships by using various plots to become familiar with these data. Compute the matrix of correlations using Bottom Slope Angle as the Y variable (dependent). Look at the scatter diagrams of the two strongest independent variables against the dependent variable. Do you notice anything surprising?
- Compute a full-model regression by adding all of the independent variables and the dependent variable to the model. Generalize about the ability of the independent variables to predict the dependent variable. The full model is described in chapter 23 and has the form:
Y = constant + b_1 x_1 + b_2 x_2 + ... + b_n x_n.
Note that when there is only one independent variable, the equation is that of a straight line, and we measured significance by the extent to which the observed values fall along a line. When there are two independent variables, the equation is that of a plane, and so on.
Is the full-model regression significant? How can you tell?
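As a rough illustration of what the full-model fit computes, here is a minimal Python sketch. It assumes NumPy is available and uses synthetic stand-in data, not the waves data set; the sample size, the coefficients, and the function name fit_full_model are all made up for the example. It fits the equation by least squares and reports R² together with the overall F ratio, which is one common way to judge whether the regression as a whole is significant (compare F with a tabulated value for the stated degrees of freedom).

```python
import numpy as np

# Synthetic stand-in data: NOT Krumbein's beach measurements.
rng = np.random.default_rng(0)
n_obs, n_vars = 40, 5                             # hypothetical sizes
X = rng.normal(size=(n_obs, n_vars))              # five made-up independent variables
true_b = np.array([1.5, -0.8, 0.0, 0.0, 0.0])     # made-up coefficients
y = 2.0 + X @ true_b + rng.normal(scale=0.5, size=n_obs)


def fit_full_model(X, y):
    """Least-squares fit of y = constant + b1*x1 + ... + bn*xn."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # the column of ones carries the constant
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    predicted = A @ coef
    residual = y - predicted
    ss_res = float(residual @ residual)           # unexplained sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())   # total sum of squares about the mean
    r2 = 1.0 - ss_res / ss_tot
    # Overall F ratio: explained mean square over residual mean square.
    f_ratio = ((ss_tot - ss_res) / p) / (ss_res / (n - p - 1))
    return coef, r2, f_ratio, (p, n - p - 1)


coef, r2, f_ratio, dof = fit_full_model(X, y)
print("constant and slopes:", np.round(coef, 3))
print("R^2 =", round(r2, 3), " F =", round(f_ratio, 2), " degrees of freedom =", dof)
```

With real data, X would hold the five measured independent variables and y the bottom slope; DataDesk reports the same quantities in its regression summary table.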
Look at the plot of predicted versus residuals. Although their correlation is zero (which is imposed by the model), there is some sort of pattern. Think about this as you proceed.
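The point that a zero correlation does not rule out a pattern can be checked numerically. The sketch below (Python with NumPy, on a deliberately curved synthetic example that has nothing to do with the waves data) fits a straight line to a curved response. The residuals are uncorrelated with the predicted values, as least squares with a constant term guarantees, yet they are strongly related to the squared distance of the predicted value from its mean, which is the numerical signature of a bowed residual plot. All variable names are illustrative.

```python
import numpy as np

# Synthetic example: a curved response fitted with a straight line.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 3.0, 60)                     # hypothetical predictor
y = x**2 + rng.normal(scale=0.2, size=x.size)     # deliberately nonlinear response

A = np.column_stack([np.ones_like(x), x])         # constant + one independent variable
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
predicted = A @ coef
residual = y - predicted

# Linear correlation between residuals and predictions: zero by construction.
r_linear = np.corrcoef(predicted, residual)[0, 1]

# Correlation with the squared, centered prediction picks up the curvature.
centered_sq = (predicted - predicted.mean()) ** 2
r_curved = np.corrcoef(centered_sq, residual)[0, 1]

print("corr(residual, predicted)            =", round(r_linear, 6))
print("corr(residual, centered predicted^2) =", round(r_curved, 3))
```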
If you click on the open triangle at the top of the model window you can compute predicted and residual values and a number of other quantities. Experiment.
- It would be worthwhile to see whether some subset of the independent variables can do just as "good" a job as the full model; this is part of the sorting-out aspect. Select the "best" independent variable, compute the regression, and record the R² value.
There are stepwise procedures that sequentially add variables to the regression equation until adding another variable no longer accomplishes anything "significant."
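One such procedure, forward selection, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses NumPy, synthetic stand-in data rather than the waves data, hypothetical function names, and an arbitrary F-to-enter cutoff of 4.0. It is not a description of how DataDesk or Krumbein carried out the analysis. At each step the variable that most reduces the residual sum of squares is tried, and it is kept only if its partial F ratio exceeds the cutoff.

```python
import numpy as np


def residual_ss(X, y, cols):
    """Residual sum of squares for a model with a constant plus the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)


def forward_stepwise(X, y, f_to_enter=4.0):
    """Add variables one at a time while the partial F ratio exceeds f_to_enter."""
    n, p = X.shape
    chosen = []
    ss_current = residual_ss(X, y, chosen)        # constant-only model
    while len(chosen) < p:
        candidates = [j for j in range(p) if j not in chosen]
        trials = {j: residual_ss(X, y, chosen + [j]) for j in candidates}
        best = min(trials, key=trials.get)        # largest drop in residual SS
        ss_new = trials[best]
        dof = n - (len(chosen) + 1) - 1           # residual degrees of freedom after adding
        partial_f = (ss_current - ss_new) / (ss_new / dof)
        if partial_f < f_to_enter:                # nothing "significant" gained: stop
            break
        chosen.append(best)
        ss_current = ss_new
    return chosen


# Synthetic stand-in data: NOT Krumbein's beach measurements.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=40)

print("order of entry (column indices):", forward_stepwise(X, y))
```

The order in which columns enter corresponds to the "sequence in which the independent variables were added" mentioned at the start of the exercise.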
- Select the second "best" independent variable. How much does R² increase with the addition of a second independent variable? How do the two variables compare with the full model? Are you convinced that no other pair would do as good a job of predicting the dependent variable? Note that you can start with the best pair and add variables by dragging their icons into the model window. If you click on the name of a variable in the window, you have the option of removing it. In this way you can explore various options.
- Select the best triplet. How much improvement does the third variable add? Is this addition significant (look at the t-ratios)?
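The search for the best single variable, pair, and triplet can also be done exhaustively. The following Python sketch (NumPy plus the standard library, synthetic stand-in data, hypothetical function names) tries every subset of a given size, keeps the one with the highest R², and reports the t-ratio of each slope, computed as the coefficient divided by its standard error. It is meant as a cross-check on the drag-and-drop exploration, not as a replacement for it.

```python
import itertools

import numpy as np


def fit(X, y, cols):
    """Return (R^2, t-ratios of the slopes) for a constant plus the given columns."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    s2 = ss_res / (n - A.shape[1])                # residual variance estimate
    se = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))
    return 1.0 - ss_res / ss_tot, (coef / se)[1:]  # drop the constant's t-ratio


def best_subset(X, y, size):
    """Column indices of the subset of the given size with the highest R^2."""
    subsets = itertools.combinations(range(X.shape[1]), size)
    return max(subsets, key=lambda cols: fit(X, y, cols)[0])


# Synthetic stand-in data: NOT Krumbein's beach measurements.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=40)

for size in (1, 2, 3):
    cols = best_subset(X, y, size)
    r2, t = fit(X, y, cols)
    print("size", size, "columns", cols, "R^2", round(r2, 3), "t-ratios", np.round(t, 2))
```

Because the best subset of each size contains the information in some smaller subset, R² can only stay the same or rise as variables are added; the t-ratios indicate whether the rise is more than noise.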
Krumbein notes that this example illustrates a substantive search for geologically meaningful variables rather than a search for a formal predictor equation. He was surprised that the angle of wave approach was not an important addition to the regression equation, because the angle sets the pattern of wave refraction.
- What observations have you made that would question the validity of the full model for predicting the dependent variable?
- Select the dependent variable and the two "best" independent variables and request a rotating plot. Note that you can position the points in several ways: Plot > Plot Options > Rotating. The bottom right tool (below the knife) allows you to expand the cloud of points. Describe the structure. Recall that you can use the lasso to select clustered points and color them. This may help you see whether any structure disappears during a rotation.
Which samples make up which clusters? (Use the ? tool.) Prepare histograms, compute summary statistics, and try anything else you can think of. This is what data exploration is about. Don't forget to look at the matrix of the raw data. In your write-up, comment on the structure in the data.
In your write-up, find a way to summarize the results of your experiments and include pertinent graphics.