Mixed Factorial Analysis of Variance (ANOVA)

Combining Between- and Within-Subjects Factors

The great benefit of the general linear model is its ability to accommodate a variety of designs within a single statistical model. For this course, the pinnacle of this generality is the “mixed factorial ANOVA.” As the term “factorial” implies, we will be including multiple independent variables. The term “mixed” tells us that we will be combining both within- and between-subjects factors.

The best thing about ending the course on this topic is that it is almost all review. We really are just combining the two types of factorial ANOVAs already discussed.

We can see this with the assumptions.

Assumptions

The mixed factorial ANOVA has three assumptions that we’ll need to verify.

  1. Normality of DV within each combination of levels of IVs. This was a common assumption for both between- and within-subjects designs.
  2. Homogeneity of variance for between-subjects IVs. This will appear slightly differently than before because we will have to check this assumption on the DV scores in each level of the within-subjects factor.
  3. Sphericity for within-subjects IVs. As in the original within-subjects ANOVA, we’ll want to verify that the variances of the difference scores between each pair of levels are roughly equal.

A Mixed Factorial Example

Here is an example data set that involves the changes to SAT scores across the number of test attempts and the impact of a music education background. Table 1 shows a sample of a data set to illustrate the design.

Table 1

Split Plot Design of Example

                    Test Attempt
Music Background?   First        Second       Third
Yes                 P1 = 1261    P1 = 1349    P1 = 1415
                    P2 = 1275    P2 = 1361    P2 = 1430
                    P3 = 1232    P3 = 1316    P3 = 1384
                    P4 = 1314    P4 = 1396    P4 = 1470
                    P5 = 1238    P5 = 1326    P5 = 1390
No                  P6 = 1216    P6 = 1221    P6 = 1223
                    P7 = 1179    P7 = 1181    P7 = 1182
                    P8 = 1188    P8 = 1190    P8 = 1192
                    P9 = 1181    P9 = 1186    P9 = 1185
                    P10 = 1203   P10 = 1210   P10 = 1211

Note. Scores in table represent SAT scores.

Pay particular attention to the participants. The same participants are in all three attempts for having a music education background, but another group of participants is in the three attempts for having no music education background. As such, "Music Background" is a between-subjects variable (with two levels) and "Test Attempt" is a within-subjects variable (with three levels). This is a 2 x 3 mixed factorial design.

The difference in participants will be important when we calculate (or rather, have jamovi calculate) the error terms.
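To see what "the error terms" means here, below is a minimal sketch (not jamovi's own code) that partitions the sums of squares for the Table 1 data by hand. It assumes a balanced design with one between-subjects and one within-subjects factor. The function name `split_plot_anova` and the dictionary keys are my own labels for illustration.

```python
# SAT scores from Table 1: one inner list per participant (First, Second, Third).
scores = {
    "Yes": [[1261, 1349, 1415], [1275, 1361, 1430], [1232, 1316, 1384],
            [1314, 1396, 1470], [1238, 1326, 1390]],
    "No": [[1216, 1221, 1223], [1179, 1181, 1182], [1188, 1190, 1192],
           [1181, 1186, 1185], [1203, 1210, 1211]],
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def split_plot_anova(groups):
    """Partition sums of squares for a balanced one-between, one-within design."""
    rows = [row for group in groups.values() for row in group]
    n_subjects, k, a = len(rows), len(rows[0]), len(groups)
    n_per_group = n_subjects // a  # assumption: equal group sizes
    grand = mean(x for row in rows for x in row)

    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_subjects = k * sum((mean(row) - grand) ** 2 for row in rows)

    group_means = [mean(x for row in g for x in row) for g in groups.values()]
    ss_group = n_per_group * k * sum((m - grand) ** 2 for m in group_means)

    level_means = [mean(row[j] for row in rows) for j in range(k)]
    ss_level = n_subjects * sum((m - grand) ** 2 for m in level_means)

    cell_means = [mean(row[j] for row in g) for g in groups.values() for j in range(k)]
    ss_cells = n_per_group * sum((m - grand) ** 2 for m in cell_means)
    ss_interaction = ss_cells - ss_group - ss_level

    # Error term 1: participants within groups (tests the between-subjects effect).
    ss_error_between = ss_subjects - ss_group
    df_error_between = n_subjects - a
    # Error term 2: level x participants within groups (tests within-subjects effects).
    ss_error_within = ss_total - ss_subjects - ss_level - ss_interaction
    df_error_within = (n_subjects - a) * (k - 1)

    ms_between = ss_error_between / df_error_between
    ms_within = ss_error_within / df_error_within
    return {
        "ss_group": ss_group, "ss_level": ss_level, "ss_interaction": ss_interaction,
        "ss_error_between": ss_error_between, "df_error_between": df_error_between,
        "ss_error_within": ss_error_within, "df_error_within": df_error_within,
        "F_group": (ss_group / (a - 1)) / ms_between,
        "F_level": (ss_level / (k - 1)) / ms_within,
        "F_interaction": (ss_interaction / ((a - 1) * (k - 1))) / ms_within,
    }


result = split_plot_anova(scores)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The point of the sketch is that the between-subjects effect (music background) is tested against variability among participants within each group, while the within-subjects effects (test attempt and the interaction) are tested against a separate error term.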

Using jamovi Repeated Measures ANOVA for the Mixed Factorial ANOVA

The Data Set

For the final time in this course, I'll be using the same example data set I used in the other sections. Please follow along using the data set I provided you.

The Research Question

For this mixed factorial ANOVA example, we'll need one between-subjects variable and one within-subjects variable. I will investigate how country of origin (the between-subjects variable with three levels) and timing of mindfulness training (the within-subjects variable with two levels) impact happiness ratings.

Checking Assumptions

Normality

I'll continue to check for normally distributed outcome scores within each sample using skewness and kurtosis values coupled with Q-Q plots. Start a new descriptive statistics section by clicking on the "Analyses" tab, then the "Exploration" menu and selecting "Descriptives" (see figure 1).

Figure 1

Descriptives from the Exploration Menu in the Analyses Tab

We will need to combine our approaches for between- and within-subjects designs. Drag the variables that contain the outcome scores for the levels of the within-subjects variable to the "Variables" box, then drag the between-subjects variable to the "Split by" box. Turn on the "Skewness" and "Kurtosis" options under the "Distribution" area of the "Statistics" section. Lastly, turn on the "Q-Q" option in the "Q-Q Plots" area of the "Plots" section. Figure 2 shows the completed panel setup.

Figure 2

Completed Descriptives Panel Setup

Table 1

Skewness and Kurtosis Values for Happiness Ratings by Timing of Training and Country

Timing of Training   Country   Skewness   SE      Kurtosis   SE
Before               Canada     0.496     0.337   -0.0844    0.662
                     Mexico     0.208     0.337   -1.0248    0.662
                     US         0.661     0.337    0.1296    0.662
After                Canada    -0.487     0.337   -0.5664    0.662
                     Mexico     0.440     0.337    0.0999    0.662
                     US        -0.630     0.337    0.2317    0.662

It seems that all values are within the 2 SE range for acceptable results under the assumption of normality. The Q-Q plots (see figure 3) reinforce this as most of the data are close to the reference line, except for Mexico after training, but that was not severe enough to overturn our assumption.
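If you want to double-check the 2 SE rule outside of jamovi, here is a small sketch. It uses the common large-sample formulas for the standard errors of skewness and excess kurtosis, and it takes n = 50 per country from the write-up later in this section. The function names are mine, and I am assuming (not confirming) that jamovi uses these same formulas; they do reproduce the SE values in the table.

```python
import math


def se_skewness(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample excess kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


n = 50  # participants per country, as reported in the write-up
se_skew = se_skewness(n)
se_kurt = se_kurtosis(n)

# (skewness, kurtosis) values copied from the table above.
cells = {
    ("Before", "Canada"): (0.496, -0.0844),
    ("Before", "Mexico"): (0.208, -1.0248),
    ("Before", "US"): (0.661, 0.1296),
    ("After", "Canada"): (-0.487, -0.5664),
    ("After", "Mexico"): (0.440, 0.0999),
    ("After", "US"): (-0.630, 0.2317),
}

# A cell passes when both values fall within 2 SE of zero.
within_2se = {
    cell: abs(skew) < 2 * se_skew and abs(kurt) < 2 * se_kurt
    for cell, (skew, kurt) in cells.items()
}

print(f"SE skewness = {se_skew:.3f}, SE kurtosis = {se_kurt:.3f}")
for cell, ok in within_2se.items():
    print(cell, "within 2 SE" if ok else "outside 2 SE")
```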

Figure 3

Q-Q Plots for Happiness Ratings by Timing of Training and Country

A) Before Training

Q-Q Plot by Country Before Training

B) After Training

Q-Q Plot by Country After Training

Homogeneity of Variance

We'll need to wait to check this assumption as the test is an option in the Repeated Measures ANOVA setup.

Sphericity

This is an assumption we don't have to worry about this time because the within-subjects variable only has two levels. As such, sphericity does not apply.
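A quick way to convince yourself: sphericity compares the variances of the difference scores for every pair of levels, so we can simply count the pairs. This is only an illustrative sketch, and the helper name `difference_score_pairs` is hypothetical.

```python
from itertools import combinations


def difference_score_pairs(levels):
    """List every pair of within-subjects levels that yields a set of difference scores."""
    return list(combinations(levels, 2))


two_levels = difference_score_pairs(["Before", "After"])
three_levels = difference_score_pairs(["First", "Second", "Third"])

print(len(two_levels), two_levels)      # one set of difference scores: nothing to compare
print(len(three_levels), three_levels)  # three sets: their variances can be compared
```

With only one set of difference scores, there is no second variance to compare it against, which is why the assumption does not apply to a two-level factor.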

Setting up the Repeated Measures ANOVA

Setting up the Variables

We are going to follow the same setup as we had for the factorial within-subjects ANOVA but add the between-subjects variable into the mix. Start by labeling the within-subjects factor in the "Repeated Measures Factors" box. Add a label for each level in the spaces below. You can now add the variables that contain the outcome scores for each level of the within-subjects variable to the "Repeated Measures Cells" box. Next, drag your between-subjects variable to the "Between-subjects Factors" box. Be sure to provide an appropriate label in the "Default Variable Label" box and to turn on "Partial η²" under the "Effect Size" options. The completed top portion of the Repeated Measures ANOVA panel is in figure 4.

Figure 4

Completed Top Portion of Repeated Measures ANOVA Panel

Assumption Checks

In the "Assumption Checks" section, you'll only need to enable the "Homogeneity test" option as sphericity does not apply.

Post Hoc Tests

Next, move to the "Post Hoc Tests" section to determine which samples are reliably different from other samples. You'll want to drag over any effects that include more than two samples. In this example, that will be the between-subjects factor and the interaction term.

You may remember that for within-subjects variables, we use the "Bonferroni" correction but for the between-subjects variables, we use the "Tukey" correction. Simply enable both to get the correct output for each test. We will continue to read the "Tukey" p-value for the between-subjects tests but we'll use the "Bonferroni" p-value for the interaction effect (because it has within-subjects samples). Figure 5 depicts the completed post hoc tests section.
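As a reminder of what the Bonferroni correction does, here is a minimal sketch under the usual definition: multiply each unadjusted p-value by the number of comparisons in the family and cap the result at 1. The raw p-values below are made up for illustration and are not from our data set.

```python
def bonferroni(p_values):
    """Bonferroni-adjust a family of p-values: p * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted p-values for three comparisons.
raw_p = [0.01, 0.04, 0.5]
adjusted_p = bonferroni(raw_p)

for raw, adj in zip(raw_p, adjusted_p):
    print(f"raw p = {raw:.3f} -> Bonferroni p = {adj:.3f}")
```

The cap at 1 is why a clearly non-significant comparison can show an adjusted p-value of exactly 1.000 in the output.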

Figure 5

Completed Post Hoc Tests Section

Estimated Marginal Means

To get the details (i.e., means and 95% CI) for each sample, we'll need to set up our marginal means. Drag the within-subjects variable to the box under "Term 1." Click "Add New Term" then drag the between-subjects variable to the box under "Term 2." Click "Add New Term" one more time and drag both variables to the box under "Term 3." Turn on "Marginal means tables" in addition to "Marginal means plots," which should already be enabled. See figure 6 for the completed "Estimated Marginal Means" section.

Figure 6

Completed Estimated Marginal Means Section

We have finished the setup of the mixed factorial ANOVA, so let's check out the results.

Interpreting the Results

Checking Assumptions

We left off with our assumption of homogeneity of variance. In a mixed factorial design, we'll need to check that assumption within each level of the within-subjects variable. That is, we want to check whether the variance is roughly equal across the levels of the between-subjects variable (country) at each level of the within-subjects variable (before and after training). Table 2 contains the two Levene's tests of homogeneity of variance.

Table 2

Levene's Tests of Homogeneity of Variance

Timing of Training   F       df1   df2   p
Before               0.957   2     147   0.386
After                4.625   2     147   0.011

It seems that we have a violation in happiness ratings after training but we can maintain the assumption for happiness before training. This is likely stemming from the issue with kurtosis in happiness ratings in Mexico after training (see figure 3, panel B). However, because we have large samples and equal sample sizes, our ANOVA is quite robust against this violation.
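If you are curious where the p-values in Table 2 come from, they can be recovered from the F statistics and degrees of freedom. The sketch below uses a closed-form shortcut that only holds when the numerator degrees of freedom equal 2, which is the case here. The function name is my own.

```python
def f_upper_tail_df1_is_2(f_value, df2):
    """Upper-tail p-value of an F statistic with numerator df = 2 and denominator df = df2."""
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


# F statistics and denominator df copied from Table 2.
p_before = f_upper_tail_df1_is_2(0.957, 147)
p_after = f_upper_tail_df1_is_2(4.625, 147)

alpha = 0.05
print(f"Before: p = {p_before:.3f}, violation = {p_before < alpha}")
print(f"After:  p = {p_after:.3f}, violation = {p_after < alpha}")
```

For any other numerator df, you would need a general F distribution function from a statistics library rather than this shortcut.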

The ANOVA Tables

For a mixed factorial ANOVA in jamovi, we will have two ANOVA tables, one for the within-subjects effects (including any interaction terms that involve a within-subjects factor) and one for the between-subjects effects. Although they are separated, we'll read them in the same manner. The "within-subjects effects" table can be found in table 3 and the "between-subjects effects" table can be found in table 4.

Table 3

Within-Subjects Effects

Source                         Sum of Squares   df    Mean Square   F       p        η²p
Timing of Training             466              1     466.3         40.05   < .001   0.214
Timing of Training ✻ Country   163              2     81.6          7.01    0.001    0.087
Residual                       1712             147   11.6

Table 4

Between-Subjects Effects

Source     Sum of Squares   df    Mean Square   F      p        η²p
Country    90808            2     45404.2       2488   < .001   0.971
Residual   2682             147   18.2

These tables indicate that all three effects (two main effects and one interaction) are statistically significant. The effects involving the within-subjects term are much smaller than the between-subjects effect, which is very large. We'll need some more details to interpret each of these effects. For the main effect of timing of training, we only need to explore the marginal means because the ANOVA tells us that happiness before and after the training is reliably different. We will need post hoc comparisons for country and the interaction term (country x timing) because there are more than two samples involved in each of these effects.
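To connect the columns of the two ANOVA tables, here is a short sketch that recomputes F and partial η² from the sums of squares and degrees of freedom. Because the inputs are the rounded table values, the F ratios come out close to, but not exactly equal to, the values jamovi reports. The dictionary layout and function names are my own.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F = mean square of the effect divided by mean square of its error term."""
    return (ss_effect / df_effect) / (ss_error / df_error)


def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared = SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


# (SS effect, df effect, SS error, df error) copied from Tables 3 and 4.
# Within-subjects effects use the Table 3 residual; country uses the Table 4 residual.
effects = {
    "Timing of Training": (466, 1, 1712, 147),
    "Timing of Training x Country": (163, 2, 1712, 147),
    "Country": (90808, 2, 2682, 147),
}

summary = {}
for name, (ss, df, ss_err, df_err) in effects.items():
    summary[name] = {
        "F": f_ratio(ss, df, ss_err, df_err),
        "eta_p2": partial_eta_squared(ss, ss_err),
    }
    print(f"{name}: F = {summary[name]['F']:.2f}, partial eta squared = {summary[name]['eta_p2']:.3f}")
```

Notice that each effect is paired with the residual from its own table, which is the practical consequence of having two error terms in a mixed design.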

Post Hoc Tests

We had a significant interaction effect so we'll need to interpret that first. This table (table 5) is a big one because it contains comparisons of each sample involved to each other sample.

Table 5

Bonferroni Post Hoc Comparisons

Timing of Training   Country   Timing of Training   Country   Mean Difference   SE      df    t         p (Bonferroni)
Before               Canada    Before               Mexico    -16.120           0.825   147   -19.549   < .001
                               Before               US         27.200           0.825   147    32.986   < .001
                               After                Canada     -3.840           0.682   147    -5.627   < .001
                               After                Mexico    -16.560           0.773   147   -21.418   < .001
                               After                US         24.000           0.773   147    31.041   < .001
                     Mexico    Before               US         43.320           0.825   147    52.534   < .001
                               After                Canada     12.280           0.773   147    15.882   < .001
                               After                Mexico     -0.440           0.682   147    -0.645   1.000
                               After                US         40.120           0.773   147    51.889   < .001
                     US        After                Canada    -31.040           0.773   147   -40.146   < .001
                               After                Mexico    -43.760           0.773   147   -56.597   < .001
                               After                US         -3.200           0.682   147    -4.689   < .001
After                Canada    After                Mexico    -12.720           0.718   147   -17.714   < .001
                               After                US         27.840           0.718   147    38.769   < .001
                     Mexico    After                US         40.560           0.718   147    56.483   < .001

It seems that all samples are reliably different from all other samples with the exception of Mexico before training and Mexico after training. That is a succinct summary but it doesn't tell the story of the data very well. When we look at the estimated marginal means, we'll refocus to make a more compelling story.
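One way to refocus a table this large is to keep only the comparisons that answer the question we care about, such as "did happiness change from before to after training within each country?" The sketch below filters a handful of rows copied from Table 5 and recomputes t as the mean difference divided by its SE. The tuple layout is my own, and the recomputed t values differ slightly from the table because the inputs are rounded.

```python
# (timing A, country A, timing B, country B, mean difference, SE) from Table 5.
comparisons = [
    ("Before", "Canada", "Before", "Mexico", -16.120, 0.825),
    ("Before", "Canada", "After", "Canada", -3.840, 0.682),
    ("Before", "Mexico", "After", "Mexico", -0.440, 0.682),
    ("Before", "US", "After", "US", -3.200, 0.682),
    ("After", "Canada", "After", "Mexico", -12.720, 0.718),
]

# Keep only before-vs-after comparisons within the same country.
within_country = [
    row for row in comparisons
    if row[1] == row[3] and row[0] != row[2]
]

t_values = {}
for timing_a, country, timing_b, _, diff, se in within_country:
    t_values[country] = diff / se
    print(f"{country}: {timing_a} - {timing_b} = {diff:.2f}, t = {t_values[country]:.2f}")
```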

Let's also check post hoc comparisons for the main effect of country (see table 6). I've removed the Bonferroni p-value column because it was not applicable for these comparisons.

Table 6

Tukey HSD Post Hoc Comparisons

Country A   Country B   Mean Difference   SE      df    t       p (Tukey)
Canada      Mexico      -14.4             0.604   147   -23.9   < .001
            US           27.5             0.604   147    45.6   < .001
Mexico      US           41.9             0.604   147    69.4   < .001

These comparisons suggest that each country had reliably different happiness ratings than each other country, on average. I write "on average" because we are collapsing across "before" and "after" levels of the within-subjects variable of timing of training. We'll check the estimated marginal means for the pattern of results.

Estimated Marginal Means

Now that we know what is different and what is not, we'll want to look at our interaction plot (figure 7) to determine the best way to tell the story of our data.

Figure 7

Interaction Plot of Timing of Training and Country on Happiness Ratings

Note. Error bars represent 95% CI.

Perhaps the easiest thing to see in this figure is that there seem to be three bands, a lower, a mid-range, and an upper band. These correspond to the countries (our main effect). We can start with this description, but there is more to the story. Within two of the countries (US and Canada), we see an increase in happiness from before training to after training. This is not the case for Mexico. This change in the relationship between Timing of Training and Happiness across Countries is the interaction effect we detected in the ANOVA.

What about the main effect of timing of training? Although the marginal means (see figure 8) suggest a reliable increase in happiness ratings after training compared to before training, we should not report this because it is not true for all countries (see "Mexico" data in figure 7).

Figure 8

Main Effect of Timing of Training

Note. Error bars represent 95% CI.

I suggest including the estimated marginal means tables for the interaction (table 7) and for the main effect of country (table 8) for reference in the write-up.

Table 7

Estimated Marginal Means for Interaction Effect

Country   Timing of Training   Mean   SE      95% CI Lower   95% CI Upper
Canada    Before               49.2   0.583   48.0           50.4
          After                53.0   0.508   52.0           54.0
Mexico    Before               65.3   0.583   64.2           66.5
          After                65.8   0.508   64.8           66.8
US        Before               22.0   0.583   20.8           23.2
          After                25.2   0.508   24.2           26.2

Table 8

Estimated Marginal Means for Main Effect of Country

Country   Mean   SE      95% CI Lower   95% CI Upper
Canada    51.1   0.427   50.3           52.0
Mexico    65.5   0.427   64.7           66.4
US        23.6   0.427   22.8           24.4
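The two tables are directly related. With equal sample sizes, each country's marginal mean is just the average of its "before" and "after" cell means, and the change within each country is their difference. Here is a small sketch using the rounded cell means from table 7; the variable names are my own, and the results can differ from jamovi's in the last digit because of rounding.

```python
# Cell means from Table 7: country -> (before, after).
cell_means = {
    "Canada": (49.2, 53.0),
    "Mexico": (65.3, 65.8),
    "US": (22.0, 25.2),
}

# Marginal mean per country: average over the two levels of timing
# (valid as a simple average because the design is balanced).
country_means = {c: (before + after) / 2 for c, (before, after) in cell_means.items()}

# Simple effect of training within each country: after minus before.
change = {c: after - before for c, (before, after) in cell_means.items()}

for country in cell_means:
    print(f"{country}: marginal mean = {country_means[country]:.2f}, "
          f"change after training = {change[country]:.2f}")
```

The much smaller change for Mexico compared to Canada and the US is the interaction pattern we saw in figure 7.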

The Write-Up

One last reminder to include the three components of each part of the write-up:

  1. State the test
  2. Interpret the results
  3. Provide statistical evidence

To determine the impact of timing of mindfulness training and country of origin on happiness ratings, I performed a mixed factorial analysis of variance. The assumptions of this analysis are normality of the outcome variable values within each sample, homogeneity of variance, and sphericity. I tested for normality by checking skewness and kurtosis values, which were all within 2 SE of 0 (see table 1). I checked the assumption of homogeneity of variance using Levene's test. These tests indicated no violation for happiness ratings before training (F[2,147] = 0.957, p = 0.386) but a violation for happiness ratings after training (F[2,147] = 4.625, p = 0.011). This violation is not a concern because the sample size is large and balanced (N = 150, n = 50 per country). The assumption of sphericity does not apply to this design because the within-subjects variable (i.e., timing of training) only has two levels (i.e., before and after).

The analysis of variance revealed a significant interaction effect (F[2,147] = 7.01, p = 0.001, η²p = 0.087) and significant main effects (F_Timing[1,147] = 40.05, p < .001, η²p = 0.214; F_Country[2,147] = 2488, p < .001, η²p = 0.971). The interaction plot (see figure 1) reveals a main effect of country such that Mexico has the highest happiness ratings, on average, followed by Canada and then by the U.S. (see table 2 for means and 95% CI). The interaction seems to be driven by the increase in happiness scores after training for the U.S. and Canada, but not for Mexico (see table 3 for pairwise comparisons and table 4 for means and 95% CI).

Table 1

Skewness and Kurtosis Values for Happiness Ratings by Timing of Training and Country

Timing of Training   Country   Skewness   SE      Kurtosis   SE
Before               Canada     0.496     0.337   -0.0844    0.662
                     Mexico     0.208     0.337   -1.0248    0.662
                     US         0.661     0.337    0.1296    0.662
After                Canada    -0.487     0.337   -0.5664    0.662
                     Mexico     0.440     0.337    0.0999    0.662
                     US        -0.630     0.337    0.2317    0.662

Figure 1

Interaction Plot of Timing of Training and Country on Happiness Ratings

Note. Error bars represent 95% CI.

Table 2

Estimated Marginal Means for Main Effect of Country

Country   Mean   SE      95% CI Lower   95% CI Upper
Canada    51.1   0.427   50.3           52.0
Mexico    65.5   0.427   64.7           66.4
US        23.6   0.427   22.8           24.4

Table 3

Bonferroni Post Hoc Comparisons of Before and After Training Happiness Ratings

Country   Mean Difference   SE      df    t        p (Bonferroni)
Canada    -3.840            0.682   147   -5.627   < .001
Mexico    -0.440            0.682   147   -0.645   1.000
US        -3.200            0.682   147   -4.689   < .001

Note. Only within-country comparisons are shown.

Table 4

Estimated Marginal Means for Interaction Effect

Country   Timing of Training   Mean   SE      95% CI Lower   95% CI Upper
Canada    Before               49.2   0.583   48.0           50.4
          After                53.0   0.508   52.0           54.0
Mexico    Before               65.3   0.583   64.2           66.5
          After                65.8   0.508   64.8           66.8
US        Before               22.0   0.583   20.8           23.2
          After                25.2   0.508   24.2           26.2