The General Linear Model


Overview

The general linear model (or GLM for short) is a popular approach to statistical inference because of its versatility. The term “general” suggests that it can be applied in most cases. We’ll discuss which cases are appropriate for the general linear model, how it implements the tests of significance, and the goals of the approach.

In this course, we will use the GLM-based procedures in jamovi for each test of significance. This will limit the number of procedures and menus to learn and make it easier to include chart creation and assumption checking.

Goals of General Linear Model

The main goal of GLM is to form population estimates of the strength and direction of relationships among predictor variables and outcome variables. The estimates are usually expressed as change: how much does the outcome variable change when the values of the predictor variables change?

Of course, anytime we form an estimate for the population, we must account for sampling error. We want to know if our estimate is reliable (i.e., likely to be found in other samples of similar size) or just a fluke. We can form confidence intervals and perform null hypothesis significance testing on these estimates to help with this.

In addition to testing each estimate, we can assess the fit of the full model. We can summarize the extent to which our model correctly predicts the variability in the outcome variables. Although beyond the scope of this course, this feature of GLM allows us to compare different models so that we can choose the best fitting model before moving to interpretation and application of the model.

What Is A Statistical Model?

The general linear model is a way to state the direction and strength of linear relationships among variables. We call it a model because it is a guess, built from sample data, about how the population values are related. Just as an engineer might construct a small-scale model to test hypotheses, so too does a statistician construct a model to test hypotheses about a larger population. There are many ways to state predicted relationships among variables, but perhaps the most popular model is the linear model, which follows a specific form.

$Y_i = B X_i + \epsilon_i$

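$Y_i$ is the outcome variable value for the $i$th individual, $B$ is the slope, $X_i$ is the predictor variable value for the $i$th individual, and $\epsilon_i$ is the error term (more on this shortly).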

Slope is the change in the y-variable over the change in the x-variable. That is, it represents the direction (positive or negative) and strength (amount of change) of the relationship between the two variables. The slope is similar to the correlation coefficient, except that the correlation coefficient is standardized and thus does not have units. If you are using the general linear model for regression, the unstandardized slope lets you plug in a value of x to get a value of y. If you are interested in testing statistically significant change from one group to another, a standardized value may be easier to work with. A standardized slope is represented as $\beta$.
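The link between the slope and the correlation coefficient can be checked with a few lines of arithmetic. The sketch below uses made-up data and assumes a single predictor with both variables measured as deviations from their means; the function names are illustrative, not part of any statistics package. Under those assumptions, rescaling the slope by the ratio of standard deviations gives the same number as the correlation coefficient.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up illustration data (not from the text).
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [10.0, 14.0, 13.0, 20.0, 22.0]

def slope(x, y):
    """Unstandardized slope: change in y per one-unit change in x."""
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

def pearson_r(x, y):
    """Correlation coefficient: a unit-free measure of linear association."""
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
    return num / den

b = slope(x, y)                    # carries the units of y per unit of x
beta = b * stdev(x) / stdev(y)     # standardized slope, unit-free
r = pearson_r(x, y)

print(f"slope b = {b:.3f}, standardized slope = {beta:.3f}, r = {r:.3f}")
```

With more than one predictor, the standardized slopes generally no longer equal the simple correlations, so this equality should be read as a single-predictor special case.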

The error term is the difference between the predicted value (the product of $X_i$ and the slope) and the actual outcome variable value.
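As a rough illustration of the error term, the sketch below fits the slope by least squares for the model exactly as written in the text (no intercept), then computes each individual's error as the actual value minus the predicted value. The data and the helper name `fit_slope` are invented for the example.

```python
# Made-up illustration data (not from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_slope(x, y):
    """Least-squares slope B for Y_i = B * X_i + e_i (line through the origin)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

B = fit_slope(x, y)

# Predicted value: the product of X_i and the slope.
predicted = [B * xi for xi in x]

# Error term: actual outcome minus predicted outcome, one per individual.
errors = [yi - pi for yi, pi in zip(y, predicted)]

for xi, yi, pi, ei in zip(x, y, predicted, errors):
    print(f"x={xi:.1f}  actual={yi:.2f}  predicted={pi:.2f}  error={ei:+.2f}")
```

Statistical software normally adds an intercept to this model; it is left out here only to match the formula above.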

In null hypothesis significance testing, we assume that the slope is 0 (i.e., no change in outcome variable across values of the predictor variable). We’ll highlight the similarity of this approach with t-tests in the next section.

Although the linear regression model pre-dates the general linear model, it turns out to be just one specific case or implementation of the general linear model. With the more general form, we can include multiple predictors and multiple outcomes. The formula looks very similar but the letters represent something different.

$Y = B X + U$

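Instead of a single outcome value $Y_i$, $Y$ is a matrix of outcome variables. Instead of a single predictor value $X_i$, $X$ is a matrix of predictor variables. $B$ is the matrix of slopes, and $U$ is the matrix of errors.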

Matrix: a way to organize numbers into columns and rows where the columns typically represent some grouping factor. In statistics, the columns often represent variables.
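To make the matrix version concrete, here is a minimal sketch with two predictors and two outcomes for four individuals. It assumes the layout described above (rows are individuals, columns are variables); in that layout the product is computed as X times B, which differs from the order in the formula only by layout convention. The data and helper functions are invented for illustration, and the slopes are estimated with the standard least-squares solution.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Rows: 4 individuals. Columns: 2 predictor variables (made-up values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
# Rows: the same 4 individuals. Columns: 2 outcome variables (made-up values).
Y = [[2.1, 0.4], [-0.9, 3.2], [1.0, 3.4], [3.1, 4.1]]

# Least-squares estimate of the slope matrix: (X'X)^-1 X'Y.
Xt = transpose(X)
B_hat = matmul(matmul(inverse_2x2(matmul(Xt, X)), Xt), Y)

# Matrix of errors: actual outcomes minus predicted outcomes.
fitted = matmul(X, B_hat)
U = [[y - f for y, f in zip(y_row, f_row)] for y_row, f_row in zip(Y, fitted)]

print("slopes:", B_hat)
print("errors:", U)
```

Each column of `B_hat` holds the slopes for one outcome variable, so adding an outcome adds a column rather than requiring a separate model.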

GLM for Common Analyses

By changing the number and type of variables in $Y$ and $X$, we can derive the various models that we’ll cover in this course. Each of the following can be modeled in the regression formula of the general linear model:

Independent samples t-test
Paired samples t-test
One-way Between-Subjects ANOVA
Factorial Between-Subjects ANOVA
One-Way Within-Subjects ANOVA
Factorial Within-Subjects ANOVA
Mixed Factorial ANOVA
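As one example of how a test in this list fits the regression formula, the sketch below treats an independent samples t-test as a linear model whose single predictor is group membership coded 0 or 1. It assumes made-up scores, equal variances (the pooled t-test), and a model with an intercept, which the simplified formula earlier omits. Under those assumptions the slope equals the difference between the group means, and the t statistic for the slope matches the classic t statistic.

```python
from math import sqrt
from statistics import mean

# Made-up scores for two independent groups (not from the text).
group_a = [4.0, 5.0, 6.0, 5.5]
group_b = [6.5, 7.0, 8.0, 7.5, 9.0]

def pooled_t(a, b):
    """Classic independent samples t statistic with pooled variance."""
    ma, mb = mean(a), mean(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    sp2 = ss / (len(a) + len(b) - 2)
    return (mb - ma) / sqrt(sp2 * (1 / len(a) + 1 / len(b)))

def slope_t(x, y):
    """Slope of y on x (with intercept) and its t statistic."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = sqrt(sse / (len(x) - 2) / sxx)
    return slope, slope / se

# Dummy-code group membership: 0 for group A, 1 for group B.
x = [0.0] * len(group_a) + [1.0] * len(group_b)
y = group_a + group_b

slope, t_regression = slope_t(x, y)
t_classic = pooled_t(group_a, group_b)

print(f"slope = {slope:.3f}, mean difference = {mean(group_b) - mean(group_a):.3f}")
print(f"t from regression = {t_regression:.3f}, t from t-test = {t_classic:.3f}")
```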

We will not need to worry about how to change the general formula to fit each analysis in this course because we will follow similar steps each time and jamovi will set up the equations for us. However, if you are particularly curious about the magic behind the scenes, this site gives a nice overview of a few of the simpler tests. It also relates these to “non-parametric” versions of these tests (in case you do not have normally distributed data).

GLM in jamovi

Like many statistical applications, jamovi has incorporated different analytical traditions. Experimental psychology has favored the Analysis of Variance approach. You'll read more about this next week, but here is a quick introduction. Analysis of variance takes all the differences in scores in an outcome variable and determines how much of the difference (i.e., variance) is due to different ways that we can group the data (i.e., predictor variables). As mentioned before, ANOVA is one of the approaches that can be modeled with a linear equation with categorical predictors.
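The idea of dividing up variance can be sketched with a small one-way example. The scores and group names below are invented; the code splits the total sum of squared differences into a part explained by group membership and a part left over within groups, then forms the F ratio from the two.

```python
from statistics import mean

# Made-up scores for three groups (not from the text).
groups = {
    "low": [3.0, 4.0, 5.0],
    "medium": [5.0, 6.0, 7.0],
    "high": [8.0, 9.0, 10.0],
}

scores = [v for g in groups.values() for v in g]
grand_mean = mean(scores)

# Total variability: every score's squared distance from the grand mean.
ss_total = sum((v - grand_mean) ** 2 for v in scores)

# Variability due to grouping: group means' distance from the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())

# Leftover variability: scores' distance from their own group mean.
ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

df_between = len(groups) - 1
df_within = len(scores) - len(groups)
f_ratio = (ss_between / df_between) / (ss_within / df_within)

# Proportion of the total variability accounted for by the grouping.
eta_squared = ss_between / ss_total

print(f"SS total = {ss_total:.2f} = {ss_between:.2f} (between) + {ss_within:.2f} (within)")
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}, eta squared = {eta_squared:.3f}")
```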

jamovi has an "ANOVA" button that contains different types of ANOVA.

One-way ANOVA (Between-subjects design with one outcome and one predictor)
ANOVA (Between-subjects design with one outcome and multiple predictors)
Repeated Measures ANOVA (Within-subjects and between-subjects design with multiple outcomes and multiple predictors)
ANCOVA (Analysis of Covariance) (Between-subjects design with one outcome and categorical and continuous predictors)
MANCOVA (Multivariate Analysis of Covariance) (Between-subjects design with multiple outcomes and categorical and continuous predictors)

For this course, we will focus on the "ANOVA" and "Repeated Measures ANOVA" options for all the analyses we'll cover. This will limit the number of steps we'll need to learn and will standardize some of the output we'll encounter.