This chapter will help you to:

  1. Explain the goals of the general linear model
  2. Describe the general linear model as an equation for a line
  3. List which tests of statistical inference fit in the general linear model

Overview

The general linear model (or GLM for short) is a popular approach to statistical inference because of its versatility. The term “general” suggests that it can be applied in most cases. We’ll discuss which cases are appropriate for the general linear model, how it implements the tests of significance, and the goals of the approach.

In this course, we will use the GLM procedure in SPSS for each test of significance. This will limit the number of procedures and menus to learn and facilitate the inclusion of chart creation and assumption checking.

Goals of General Linear Model

The main goal of GLM is to form population estimates of the strength and direction of relationships among predictor variables and outcome variables. The estimates are usually in the form of change. That is, they tell us how much the outcome variable changes when the values of the predictor variables change.

Of course, anytime we form an estimate for the population, we must account for sampling error. We want to know if our estimate is reliable (i.e., likely to be found in other samples of similar size) or just a fluke. We can form confidence intervals and perform null hypothesis significance testing on these estimates to help with this.

In addition to testing each estimate, we can assess the fit of the full model. We can summarize the extent to which our model correctly predicts the variability in the outcome variables. Although beyond the scope of this course, this feature of GLM allows us to compare different models so that we can choose the best fitting model before moving to interpretation and application of the model.
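To make "fit of the full model" concrete, here is a minimal sketch of one common summary, the proportion of outcome variability the model accounts for (often written R²). The chapter does not name a specific fit statistic, so choosing R² is an assumption here, and the actual and predicted values below are made up for illustration.

```python
def r_squared(actual, predicted):
    """Proportion of variability in the outcome that the predictions account for."""
    mean_actual = sum(actual) / len(actual)
    # Variability left over after using the model's predictions
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variability of the outcome around its own mean
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total


# Hypothetical outcome values and the values some model predicted for them
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

fit = r_squared(actual, predicted)
print(round(fit, 3))
```

A value near 1 means the predictions sit close to the observed outcomes; a value near 0 means the model does little better than guessing the mean for everyone.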

What Is A Statistical Model?

The general linear model is a way to state the direction and strength of linear relationships among variables. We call it a model because it is a guess about how the population values are related that is built from sample data. Just as an engineer might construct a small-scale model to test hypotheses, so too does a statistician construct a model to test hypotheses about a larger population. There are many ways to state predicted relationships among variables, but perhaps the most popular model is the linear model, which follows a specific form.

\[ Y_i = B X_i+\epsilon_i \]

This is the equation for a line where \(Y_i\) is the outcome variable value for the \(i^{\textrm{th}}\) participant, B is the slope of the line, \(X_i\) is the predictor variable value for the \(i^{\textrm{th}}\) participant, and \(\epsilon_i\) is the error term (more on this shortly).

Slope is the change in the y-variable over the change in the x-variable. That is, it represents the direction (positive or negative) and strength (amount of change) of the relationship between the two variables. The slope is similar to the correlation coefficient, except that the correlation coefficient is standardized and thus does not have units. If you are using the general linear model for regression, the unstandardized slope lets you easily plug in a value of x to get a value of y. If you are interested in testing statistically significant change from one group to another, a standardized value may be easier to work with. A standardized slope is represented as \(\beta\).
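The difference between the slope and its standardized version can be sketched with a few lines of Python. This is only an illustration: the data are invented (hours studied and exam points), and the function centers both variables first so that a line with no separate intercept, like the equation above, is a reasonable description. That centering step is an assumption of this sketch, not something the chapter prescribes.

```python
from math import sqrt


def slope_and_beta(x, y):
    """Return the unstandardized slope (B) and the standardized slope (beta)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Center both variables so the fitted line passes through the origin
    xc = [xi - mean_x for xi in x]
    yc = [yi - mean_y for yi in y]
    sxy = sum(a * b for a, b in zip(xc, yc))
    sxx = sum(a * a for a in xc)
    syy = sum(b * b for b in yc)
    b = sxy / sxx                 # change in y per one-unit change in x
    beta = b * sqrt(sxx / syy)    # same slope after rescaling by the two spreads
    return b, beta


# Hypothetical data: hours studied and exam points
hours = [1, 2, 3, 4, 5]
points = [52, 60, 61, 70, 72]

b_hours, beta_hours = slope_and_beta(hours, points)

# Same predictor expressed in minutes instead of hours
minutes = [h * 60 for h in hours]
b_minutes, beta_minutes = slope_and_beta(minutes, points)

print(b_hours, b_minutes)        # the slope depends on the units of x
print(beta_hours, beta_minutes)  # the standardized slope does not
```

With one predictor, the standardized slope computed this way is the same number as the correlation coefficient, which is why the two are so closely related.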

The error term is the difference between what the model predicts (e.g., multiplying \(X_i\) by the slope) and the actual outcome variable value.
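As a small illustration of the error term, the sketch below multiplies each participant's predictor value by a slope and subtracts that prediction from the observed outcome. The slope of 5.0 and the (already centered) data values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical slope and centered values for five participants
slope = 5.0
x_values = [-2, -1, 0, 1, 2]
y_values = [-11, -3, -2, 7, 9]

# What the model predicts for each participant: slope times X_i
predicted = [slope * x for x in x_values]

# Error term for each participant: actual outcome minus predicted outcome
errors = [y - p for y, p in zip(y_values, predicted)]

for x, y, p, e in zip(x_values, y_values, predicted, errors):
    print(f"X={x:>2}  Y={y:>3}  predicted={p:>5}  error={e:>4}")
```

Participants whose error is positive scored higher than the line predicts, and those with a negative error scored lower.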

In null hypothesis significance testing, the null hypothesis is that the slope is 0 (i.e., no change in the outcome variable across values of the predictor variable). We’ll highlight the similarity of this approach with t-tests in the next section.

Although the linear regression model pre-dates the general linear model, it turns out to be just one specific case or implementation of the general linear model. With the more general form, we can include multiple predictors and multiple outcomes. The formula looks very similar but the letters represent something different.

\[ Y = B X + U \]

Here’s what changed. Rather than \(Y_i\), which represented the observations of a single outcome variable, \(Y\) is a matrix of observations for multiple outcome variables. Similarly, \(X_i\) is replaced with the more general matrix of predictor variable values, \(X\). \(U\) is the matrix of errors.

A matrix is a way to organize numbers into columns and rows where the columns typically represent some grouping factor. In statistics, the columns often represent variables.
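Here is a rough sketch of the matrix version with two predictors and two outcomes. All numbers are invented, and the layout is an assumption: each row is a participant and each column is a variable. With that layout the product is computed as X times B, which describes the same model as the equation above with the rows and columns swapped.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical predictor matrix: 3 participants (rows) x 2 predictors (columns)
X = [[1, 2],
     [2, 0],
     [3, 1]]

# Hypothetical slopes: 2 predictors (rows) x 2 outcomes (columns)
B = [[1.0, 0.5],
     [2.0, -1.0]]

# Hypothetical observed outcomes: 3 participants (rows) x 2 outcomes (columns)
Y = [[5.5, -1.0],
     [1.5, 1.0],
     [5.0, 1.0]]

# Predicted outcomes for every participant and every outcome variable
Y_hat = matmul(X, B)

# Matrix of errors: observed minus predicted, cell by cell
U = [[y - p for y, p in zip(y_row, p_row)] for y_row, p_row in zip(Y, Y_hat)]

print(Y_hat)
print(U)
```

Each column of B holds the slopes for one outcome variable, so adding an outcome simply adds a column to Y, B, and U.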

GLM for Common Analyses

By changing the number of and type of variables that give rise to \(Y\) and \(X\), we can derive the various models that we’ll cover in this course. Each of the following can be modeled in the regression formula of the general linear model

We will not need to worry about how to change the general formula to fit each analysis in this course because we will follow similar steps each time and SPSS will set up the equations for us. However, if you are particularly curious about the magic behind the scenes, this site gives a nice overview of a few of the simpler tests. It also relates these to “non-parametric” versions of these tests (in case you do not have normally distributed data).

GLM in SPSS

SPSS provides many ways to reach the same end for analyses. Take an independent samples t-test, for example. You could run the “independent samples t-test” analysis from the “Compare Means” menu. You could dummy code your data and enter it into the “Linear” analysis from the “Regression” menu. As we’ll focus on this semester, you can also use the “Univariate” analysis under the “General Linear Model” menu. You will get different output from each of these, but the interpretation and conclusion will be the same in each case.
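To see why these routes agree, here is a minimal sketch (in Python rather than SPSS, with made-up scores) that computes the classic pooled-variance t statistic and then the t statistic for the slope of a regression on a dummy-coded group variable. The regression here includes an intercept, which is an assumption added for this sketch; it plays the role of the mean of the group coded 0.

```python
from math import sqrt


def pooled_t(group0, group1):
    """Independent samples t statistic with pooled variance."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = sum(group0) / n0, sum(group1) / n1
    ss0 = sum((v - m0) ** 2 for v in group0)
    ss1 = sum((v - m1) ** 2 for v in group1)
    pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
    return (m1 - m0) / sqrt(pooled_var * (1 / n0 + 1 / n1))


def dummy_regression(group0, group1):
    """Slope and its t statistic when group is dummy coded as 0 or 1."""
    x = [0] * len(group0) + [1] * len(group1)
    y = list(group0) + list(group1)
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(ss_res / (n - 2) / sxx)
    return slope, slope / se_slope


# Hypothetical scores for a control group and a treatment group
control = [4, 5, 6, 5]
treatment = [7, 8, 6, 7]

t_classic = pooled_t(control, treatment)
slope, t_regression = dummy_regression(control, treatment)

print(slope)                    # equals the difference between the group means
print(t_classic, t_regression)  # the two t statistics match
```

The slope is the difference between the two group means, and testing whether it differs from 0 gives the same t statistic as the t-test, so the conclusion is the same either way.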

I prefer to use and teach GLM in SPSS because it nicely bundles together assumption checking, figure plotting, and tests of significance and confidence intervals into one set of steps. Furthermore, once you learn the basics of how to use GLM in SPSS, you can fit most data obtained through experiments. To be clear, GLM cannot handle all data. We’ll need to meet for a different course to discuss the generalized linear model and mixed effects modeling.

For the rest of the chapters, I’ll provide you with a step-by-step guide on how to use the GLM in each circumstance. I’m sure that, by the end, it will seem a bit redundant. That’s a good thing!