A matrix is an array of values; for example, a table of data.
The data table can be thought of as N row vectors (the samples or objects) and M column vectors (the variables). These vectors can be interpreted algebraically or geometrically.
The simple summary statistics already introduced (mean, standard deviation, etc.) can be thought of as algebraic descriptors of the properties of a column vector. Covariances and correlation coefficients are algebraic measures that describe the pairwise behavior of column vectors.
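As a minimal sketch of these column-wise descriptors, the Python below computes the mean and standard deviation of one column and the covariance and correlation of a pair of columns. The values are the Quartz and Feldspar columns of the four-sample table given in this section. The function names are illustrative, and the use of n - 1 in the denominator (the sample form) is an assumption, since the text does not say which form it intends.

```python
import math

# Rows are samples, columns are variables (values from the table in this section).
X = [
    [50.0, 10.0, 30.0, 10.0],
    [40.0, 20.0, 10.0, 10.0],
    [50.0, 20.0, 10.0, 20.0],
    [60.0, 10.0, 30.0, 0.0],
]

def column(matrix, j):
    """Return column j (0-based) of the matrix as a list."""
    return [row[j] for row in matrix]

def mean(v):
    return sum(v) / len(v)

def covariance(u, v):
    """Sample covariance (assumed form: n - 1 in the denominator)."""
    mean_u, mean_v = mean(u), mean(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)

def std_dev(v):
    # The variance of a vector is its covariance with itself.
    return math.sqrt(covariance(v, v))

def correlation(u, v):
    return covariance(u, v) / (std_dev(u) * std_dev(v))

quartz = column(X, 0)
feldspar = column(X, 1)

print("mean quartz:", mean(quartz))
print("std dev quartz:", std_dev(quartz))
print("covariance quartz/feldspar:", covariance(quartz, feldspar))
print("correlation quartz/feldspar:", correlation(quartz, feldspar))
```

The correlation is negative here because, in these four samples, the samples with more quartz have less feldspar.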
One of the problems with using a computer application is that the machine will happily carry out a computation that does not make sense. Suppose, for example, that you had a matrix consisting of the long axes of 100 pebbles (measured in centimeters) and their weights (measured in grams). For each variable (column) you could compute the summary statistics introduced previously. You could also compute the mean of a row vector by adding the two values in each row and dividing by 2, but would this make sense given that the units of measurement are different: 20 cm + 98 grams = ?? You must always consider the units of measurement before subjecting data to a transformation or computation that could be inappropriate.
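A short sketch makes the point concrete. It uses the single pebble from the example above (20 cm, 98 grams); the variable names are illustrative. Expressing the same length in millimeters instead of centimeters changes the row "mean", even though the pebble itself has not changed.

```python
# One pebble from the example in the text: long axis 20 cm, weight 98 g.
long_axis_cm = 20.0
weight_g = 98.0

# The row "mean" is computable, but it mixes centimeters and grams.
row_mean_cm = (long_axis_cm + weight_g) / 2.0

# The same pebble with its length expressed in millimeters.
long_axis_mm = long_axis_cm * 10.0
row_mean_mm = (long_axis_mm + weight_g) / 2.0

print(row_mean_cm)  # 59.0
print(row_mean_mm)  # 149.0
```

A quantity that depends on an arbitrary choice of units in this way does not describe the object.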
Variable Space
A vector is a directed line segment. If the columns (variables) are selected as the axes of reference, the objects can be located with respect to their coordinates in 4-dimensional space. Clearly, this calls for dealing with abstract space and, as long as the user does not insist on a picture of this space in 2-d or 3-d, the geometrical and algebraic concepts that follow hold.
Comparing vectors requires deciding on a basis for comparison. For example, we could elect to compare the vectors in the following diagram on the basis of the distance between their end points. The distance between vectors 2 and 3 is shorter than the distance between vectors 3 and 4. Therefore, on this basis, vectors 2 and 3 are more similar than vectors 3 and 4.
We could also compare vectors on the basis of the angle (theta) between them. A very short vector might point in nearly the same direction as a very long vector. The two vectors would be very similar on the basis of the angle between them but very different on the basis of the distance between their end points. In general, the investigator must decide which measure of similarity is appropriate. As noted previously, however, some transformations are inappropriate for data measured in different units. These considerations will be taken up later in multivariate applications.
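The two measures of similarity can be sketched in a few lines of Python. This sketch assumes that vectors 2, 3 and 4 in the diagram correspond to rows 2, 3 and 4 of the data table in this section; that correspondence is an assumption, although the resulting distances do agree with the ordering stated above. The vector `short` is a hypothetical copy of sample 1 scaled to one tenth of its length, added only to illustrate a short and a long vector that point in the same direction.

```python
import math

# Rows 2, 3 and 4 of the data table (assumed to be vectors 2, 3 and 4).
vec2 = [40.0, 20.0, 10.0, 10.0]
vec3 = [50.0, 20.0, 10.0, 20.0]
vec4 = [60.0, 10.0, 30.0, 0.0]

def distance(u, v):
    """Euclidean distance between the end points of u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def angle_degrees(u, v):
    """Angle between u and v, from the dot product and the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))

print("distance 2-3:", distance(vec2, vec3))
print("distance 3-4:", distance(vec3, vec4))
print("angle 2-3:", angle_degrees(vec2, vec3))
print("angle 3-4:", angle_degrees(vec3, vec4))

# Hypothetical short vector: sample 1 scaled to one tenth of its length.
sample1 = [50.0, 10.0, 30.0, 10.0]
short = [5.0, 1.0, 3.0, 1.0]
print("angle short-sample1:", angle_degrees(short, sample1))
print("distance short-sample1:", distance(short, sample1))
```

The last two lines show the contrast described above: the angle between `short` and `sample1` is zero, while the distance between their end points is large.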
Object Space
If the rows (samples or objects) are selected as the axes of reference, the variables can be located with respect to their coordinates in 4-dimensional space.
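Thus, a data matrix can be viewed as either a set of N objects (row vectors) located in variable space or a set of M variables (column vectors) located in object space.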
Geometrical properties of vectors are useful in working with multivariate statistical applications.
Sample  Quartz  Feldspar  Rock Fragments  Matrix
1       50.0    10.0      30.0            10.0
2       40.0    20.0      10.0            10.0
3       50.0    20.0      10.0            20.0
4       60.0    10.0      30.0             0.0
The columns of the table contain the values of the variables and the rows contain the samples studied. The matrix is labeled X and is described by giving its dimensions, that is, the number of rows and columns. X is a 4 by 4 matrix. The individual entries in the matrix are the elements of the matrix. A particular element is specified by giving its coordinates (row, column). X(2,4) is 10.0: the value of the variable Matrix (the 4th column) in the second sample (the 2nd row).
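The notation can be mirrored in code. The sketch below stores X as a list of rows and looks up an element by its (row, column) coordinates. The helper `element` is illustrative; it exists only because Python lists are indexed from 0, whereas the X(row, column) notation used here counts from 1.

```python
# The data table X: rows are samples, columns are
# Quartz, Feldspar, Rock Fragments, Matrix.
X = [
    [50.0, 10.0, 30.0, 10.0],
    [40.0, 20.0, 10.0, 10.0],
    [50.0, 20.0, 10.0, 20.0],
    [60.0, 10.0, 30.0, 0.0],
]

# Dimensions of the matrix: number of rows, number of columns.
n_rows = len(X)
n_cols = len(X[0])

def element(matrix, row, col):
    """Return matrix(row, col) using the 1-based coordinates of the text."""
    return matrix[row - 1][col - 1]

print(n_rows, n_cols)    # 4 4
print(element(X, 2, 4))  # 10.0
```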
In multivariate statistics we will have occasions in which we want to work in either or both spaces.
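Moving between the two spaces amounts to reading the same matrix by rows or by columns. As a minimal sketch, using the data table from this section and illustrative names, the rows give the objects as vectors in variable space and the transposed matrix gives the variables as vectors in object space.

```python
# The data table X: rows are samples, columns are variables.
X = [
    [50.0, 10.0, 30.0, 10.0],
    [40.0, 20.0, 10.0, 10.0],
    [50.0, 20.0, 10.0, 20.0],
    [60.0, 10.0, 30.0, 0.0],
]

# Variable space: each object (row) is a vector whose coordinates
# are its values on the four variables.
objects = X

# Object space: each variable (column) is a vector whose coordinates
# are its values in the four samples. zip(*X) transposes the matrix.
variables = [list(col) for col in zip(*X)]

print(objects[1])    # sample 2 in variable space
print(variables[0])  # Quartz in object space
```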