This chapter will help you to:

  1. State the goal of the paired samples t-test
  2. Compare the paired samples t-test to the independent samples t-test
  3. State the null hypothesis for the paired samples t-test
  4. Relate the paired samples t-test to within-subjects designs
  5. State the assumptions of the paired samples t-test
  6. Relate the paired samples t-test to the general linear model
  7. Use SPSS to test the assumptions of the paired samples t-test
  8. Use SPSS to conduct a paired samples t-test with the General Linear Model procedure
  9. Interpret the output of the General Linear Model procedure
  10. Write up the results of the paired samples t-test in APA style

The Paired Samples T-test

In the last lesson, we explored the independent samples t-test. Recall that we would want to employ that test when we are judging whether two samples (collected from unrelated groups) come from the same population. We have the same goal for the paired samples t-test, but we will be looking at two samples collected from the same or related groups. The relatedness of the samples gives the paired samples t-test a statistical advantage over the independent samples t-test (I’ll explain this in more detail in a bit). First, it is important to note that, because this approach applies to different designs, it goes by several names, including the dependent samples t-test, the within-subjects t-test, and the repeated measures t-test.

Research Designs for the Paired Samples T-test

As the name suggests, there is something that connects the samples of data. In the simplest version, the samples may be connected because they come from the same group of individuals as in the case with a within-subjects design, which is also known as a repeated-measures design. Recall that in this design, each participant receives all levels of the independent variable. This also means that we would record the dependent variable repeatedly (after each level of the independent variable is administered).

In some studies, we may record our dependent variable for similar individuals. This is called a matched-pairs design. For example, if we wanted to know if a certain educational intervention was effective for improving student reading comprehension, regardless of personal factors such as socioeconomic status, gender, and ethnicity, we could compile two groups that had similar types of individuals. One group would receive the intervention (the experimental group) and the other would not (the control group). Importantly, we would then match each participant in the experimental group with a participant in the control group that had the same socioeconomic status, gender, and ethnicity.

The Null Hypothesis

Regardless of which design is used, the null hypothesis is the same. The null hypothesis is that the two samples come from the same population. However, because of the related nature of the samples, we can more directly compare each set of values. As we’ll see in the next section, we can combine the two samples into one sample of “difference scores.” The null hypothesis is thus reduced to:

\[ H_0: \mu_D = 0 \]

Simply stated, if the samples come from the same population, we expect the average of the difference scores to be zero.

The Formula

The formula for the independent samples t-test was

\[ t = \frac{\textrm{Effect}}{\textrm{Error}}=\frac{M_1 - M_2}{S_{M_1-M_2}} \]

In the same spirit, we will be comparing the effect of the independent variable to the expected sampling error, but this time we will combine our two samples first. The formula for the paired samples t-test becomes

\[ t = \frac{M_D}{S_{M_D}} \]

In our formula, \(M_D\) is the average of the difference scores and \(S_{M_D}\) is the standard error of the difference scores.
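As a quick sketch of the arithmetic, we can compute the paired t-value by hand and check it against SciPy’s `ttest_rel`. The scores below are made up purely for illustration; they are not from the SelfEsteemChange.sav data set.

```python
import numpy as np
from scipy import stats

# Hypothetical before/after scores for the same 8 participants
before = np.array([30, 28, 35, 31, 29, 33, 27, 32])
after  = np.array([34, 30, 38, 36, 33, 35, 30, 37])

d = after - before                      # difference scores
m_d = d.mean()                          # M_D: mean of the differences
s_md = d.std(ddof=1) / np.sqrt(len(d))  # S_MD: standard error of the differences

t_manual = m_d / s_md
t_scipy, p = stats.ttest_rel(after, before)
print(round(t_manual, 3), round(t_scipy, 3))
```

The two values agree because `ttest_rel` performs exactly this computation: it reduces the paired data to one sample of difference scores first.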

Statistical Power Boost

As I mentioned earlier, the paired samples t-test is more statistically powerful than the independent samples t-test. Practically, this means that we can use fewer participants than if we had used a between-subjects design.

Statistical Power refers to the probability of detecting an effect when an effect is present

The real gain in power is derived from the denominator of the formula. \(S_{M_D}\) represents the sampling error associated with the difference scores. Unlike the standard error for the independent samples t-test, the standard error of the difference scores removes the variability in scores due to participants. What is left is just the variability in the change scores. We’ll further explore this concept when we discuss the analysis of variance approach.

Let’s imagine that we completed both a between-subjects design and a within-subjects design testing the effect of a weekly leadership program on youth self-esteem ratings. Let’s assume that we saw a 5-point increase in both designs. Although the effect is the same, we’re more likely to reject the null hypothesis for the within-subjects design because the overall t-value is larger. This is because the denominator is smaller.

\[ t_{\textrm{Between}} = \frac{5}{3} = 1.67 \] \[ t_{\textrm{Within}} = \frac{5}{2} = 2.50 \] With a larger t-value and the same degrees of freedom, the p-value decreases.
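To see why the larger t-value buys us a smaller p-value, we can evaluate both hypothetical t-values against the same t-distribution. The 29 degrees of freedom assumed here are arbitrary, chosen only for illustration.

```python
from scipy import stats

df = 29  # hypothetical degrees of freedom, identical for both designs

# Two-tailed p-values for the two t-values from the example above
p_between = 2 * stats.t.sf(1.67, df)
p_within  = 2 * stats.t.sf(2.50, df)
print(round(p_between, 3), round(p_within, 3))
```

With these assumed degrees of freedom, only the within-subjects result falls below \(\alpha = .05\), even though the raw effect (5 points) is identical in both designs.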

Although statistical power is a great benefit of within-subjects designs, there are some concerns as well. For example, we may be concerned about carryover effects in which the previously experienced levels of the IV may influence later DV. As such, we may not be able to properly distinguish the impact of each level of the IV on the DV. We need to weigh the pros and cons of each design each time we plan a study.

Assumptions

Because we are combining our two samples into one sample of difference scores, our assumptions also reduce. We have just one.

  1. Normally distributed difference scores. The focus shifts to the difference scores because we are testing how far from the assumed difference (i.e., 0) the mean of the difference scores is.

The assumptions may be fewer but, because of the structure of the data in SPSS for within-subjects designs, we’ll find setting up the general linear model in SPSS a little different.

Using SPSS: GLM for Paired Samples T-Test

The Data Set

For this example, we’ll be using the SelfEsteemChange.sav file from Canvas. Figure 6.1 is the variable view of this data set.

Figure 6.1

Variable View of SelfEsteemChange.sav Data Set

Variable View of SelfEsteemChange.sav Data Set


We have two variables in the data set, which seems appropriate for a simple study. Notice the variable names of “SE_Before” and “SE_After.” Neither of these seems to be a clear IV or DV. Let’s check out the data view in Figure 6.2.

Figure 6.2

Selection of the Data View for the SelfEsteemChange.sav Data Set

Selection of the Data View for the SelfEsteemChange.sav Data Set


Which of these is the IV? Which is the DV? Both columns contain what look to be DV scores and neither contains categorical variable levels like we’d expect from an IV. The truth is that the data are all DV scores, but they are the scores under different conditions (i.e., IV levels). What that IV is and what its levels are requires a little sleuthing.

A look back to the variable view for the label field tells us that the scores for SE_Before correspond to those ratings provided by participants before 12 weekly sessions and SE_After are the self-esteem ratings after the sessions. As such, it seems that the IV is “time” with 2 levels (i.e., before and after).

This file format is necessary for testing a within-subjects design.

The Research Question

We now know the IV and DV in our data set by assessing the data file. This matches the research question nicely: do 12 weekly sessions improve self-esteem scores? Is this an appropriate research question for a paired samples t-test? The self-esteem scores appear to be continuous and measured twice. The independent variable is categorical with two levels (i.e., before and after). We’ll have to check our assumption regarding normality of the difference scores before continuing, however.

Checking for Normality of Difference Scores

Before we can check the distribution of difference scores, we need to calculate the difference scores.

Calculating the Difference Scores

SPSS has a built-in “Compute Variable” function that is accessible through the “Transform” menu in the menu bar (see Figure 6.3).

Figure 6.3

Compute Variable in Transform Menu

Compute Variable in Transform Menu


The window that opens (see Figure 6.4) requires that you enter a name for your new variable (referred to as the “target variable”). It is important to note that this is a variable “name” and not a “label.” That means that you cannot have spaces. I recommend entering “Difference” into the “target variable” box.

Figure 6.4

Compute Variable Window

Compute Variable Window


Once you’ve entered the variable name, you’ll want to tell SPSS how to determine the values for the variable. We’ll ask SPSS to subtract each value of “SE_Before” from each corresponding value of “SE_After.” To do this, simply select “SE_After” from the box with variables on the left and drag it to the “Numeric Expression” box on the upper right area of the window. Next, click on the “-” button in the first column of the section of buttons directly below the “numeric expression” box. You should have a numeric expression that looks like that pictured in Figure 6.5.

Figure 6.5

Creating the Numeric Expression

Creating the Numeric Expression


Now you just need to drag over “SE_Before” to the end of the numeric expression you’ve started. If your expression looks like that pictured in Figure 6.6, click “Paste” to generate your syntax. If it does not, you can simply type the expression as it is represented in Figure 6.6. You always have the option to type the expression rather than dragging and clicking if you find it easier or faster.

Figure 6.6

Complete Numeric Expression

Complete Numeric Expression


Check your syntax to ensure that it matches the syntax in Figure 6.7. If so, highlight that syntax and click the “run” button. If not, be sure to retry the steps above to create the difference score.

Figure 6.7

Compute Variable Syntax

Compute Variable Syntax


As when we standardized scores in the last lesson, the output we really care about is in the data view of the data editor. We should find a new column named “Difference” that has values like those pictured in Figure 6.8.

Figure 6.8

Data View of Difference Variable

Data View of Difference Variable


We are now ready to check our assumption regarding normality with these scores. We’ll do so with a Q-Q plot.
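For readers who prefer to script this step, the same difference column can be created outside SPSS. Here is a sketch in pandas, with made-up scores standing in for the real data file.

```python
import pandas as pd

# Hypothetical stand-in for the SelfEsteemChange.sav data
df = pd.DataFrame({
    "SE_Before": [30, 28, 35, 31],
    "SE_After":  [34, 30, 38, 36],
})

# Equivalent of SPSS's COMPUTE Difference = SE_After - SE_Before.
df["Difference"] = df["SE_After"] - df["SE_Before"]
print(df)
```

The subtraction is applied row by row, just as SPSS pairs each participant’s before and after scores.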

Producing the Q-Q Plot

Recall that the Q-Q plot is one way to visually assess normality by examining how far and in what pattern our data deviate from the expected normal distribution line. As with a histogram, we are judging the severity of the deviation to determine if we need to follow up with other, more specific checks (i.e., statistics for skewness and kurtosis). The advantage of the Q-Q plot is that it is fast and can also make possible outliers more easily visible.
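Outside SPSS, the same idea can be sketched with SciPy’s `probplot`, which pairs each observed quantile with its expected normal quantile. The correlation it returns gives a rough numeric companion to the visual check; the simulated difference scores below are just an illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
diffs = rng.normal(loc=5, scale=2, size=30)  # simulated difference scores

# probplot pairs observed quantiles with theoretical normal quantiles;
# r close to 1 means the points hug the reference line.
(osm, osr), (slope, intercept, r) = stats.probplot(diffs, dist="norm")
print(round(r, 3))
```

Because these scores were drawn from a normal distribution, r lands very close to 1; skewed or kurtotic data would pull it lower.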

To make a Q-Q plot, navigate to the “Q-Q Plot” menu option in the “Analyze” menu of the menu bar (see Figure 6.9).

Figure 6.9

Q-Q Plot Menu

Q-Q Plot Menu


The Q-Q Plot window should appear as in Figure 6.10.

Figure 6.10

The Q-Q Plot Window

The Q-Q Plot Window


This plot is very easy to produce. You simply drag your variable of interest (in our case, “Difference”) to the “Variable” box in the center of the window. When your window looks like Figure 6.11, click “Paste” to generate syntax.

Figure 6.11

Producing Q-Q Plot for Difference Scores

Producing Q-Q Plot for Difference Scores


Navigate to the syntax window, highlight the newly created syntax (see Figure 6.12), and click the “run” button.

Figure 6.12

Q-Q Plot Syntax

Q-Q Plot Syntax


If your output window does not open, select it from the “Window” menu in the menu bar.

You should find two Q-Q plots in the output. We will only focus on the first Q-Q Plot: the normal Q-Q Plot (See Figure 6.13).

Figure 6.13

The Normal Q-Q Plot for Difference Scores

The Normal Q-Q Plot for Difference Scores


This plot reveals a nicely normally distributed set of scores. Notice how tightly the dots hang to the expected normal distribution line. Figures 6.14 and 6.15 represent Q-Q plots that are skewed and kurtotic, respectively.

Figure 6.14

Q-Q Plot of Skewed Distribution


A strong skew is represented by a “bowing” from the normal line up or down.

Figure 6.15

Q-Q Plot of Kurtotic Distribution


A strong kurtosis is represented by a “snaking” around the normal line.

With our difference scores presenting as normally distributed, it is time for us to construct our general linear model to test the effect of the sessions on self-esteem scores.

Setting Up the General Linear Model

Step 1 Select the Repeated Measures GLM

Navigate to the “Analyze” menu, go to “General Linear Model,” and then select “Repeated Measures…” (see Figure 6.16).

Figure 6.16

Repeated Measures GLM Menu

Repeated Measures GLM Menu


Step 2 Setting up Factors and Measures

The “Repeated Measures Define Factor(s)” window (see Figure 6.17) will open. In this window, we will need to tell SPSS what our independent variable is and what our dependent variable is. Notice that we do not have the typical list of available variables from which we can choose. We will need to provide the information that we extracted earlier when investigating the data set. We’ll drag our variables to the appropriate places in the next window.

Figure 6.17

Repeated Measures Define Factors Window

Repeated Measures Define Factors Window


In the “Within-Subject Factor Name:” box, you’ll want to enter the name of our independent variable. I would suggest calling it “Time” as the dependent variable is measured separately before and after the sessions (see Figure 6.18).

Figure 6.18

Naming the Within-Subjects Factor

Naming the Within-Subjects Factor


Next, we’ll tell SPSS how many levels are associated with our variable of “Time.” Please type “2” into the “Number of Levels:” box below the “Within-Subjects Factor Name.” Your window should look like Figure 6.19.

Figure 6.19

Setting the Within-Subjects Factor Levels

Setting the Within-Subjects Factor Levels


This procedure allows for factorial designs (i.e., multiple predictors) so we’ll have to click the “Add” button to save the factor we’re setting up. This will move the factor name and levels to the box beside the “Add” button (see Figure 6.20).

Factorial Designs include more than one categorical predictor variable.

We only have one independent variable so we can move on to setting the dependent variable. We’ll enter the name of our DV in the “Measure Name:” box. Using “SelfEsteem” seems apt enough. Remember, no spaces! Click the “Add” button in the “Measure Name:” area.

Figure 6.20

Saving the Within-Subjects Factor

Saving the Within-Subjects Factor


When you are finished adding your within-subjects factor and measure, your window should look like Figure 6.21. Click “Define” to move to the next stage.

Figure 6.21

Adding a Dependent Variable

Adding a Dependent Variable


Step 3 Assigning Variables to Factors

The “Repeated Measures” window will now appear (see Figure 6.22). The box on the left lists all of the available variables in the data set. The “Within-Subjects Variables (Time):” box is populated with placeholders determined by the last stage. The first placeholder, “?(1,SelfEsteem)”, is asking for the dependent variable scores associated with the first level of the independent variable. The second placeholder is asking for the DV scores associated with the second level of the IV.

Figure 6.22

The Repeated Measures Main Window

The Repeated Measures Main Window


Click and drag the “SE_Before” variable over to the first placeholder (see Figure 6.23).

Figure 6.23

Setting the First Set of DV Scores

Setting the First Set of DV Scores


You’ll then need to drag “SE_After” to the other placeholder. Although it is fairly easy to see how the variables align with the placeholders in this case, it can be more complicated for factorial designs. Be sure to pay attention to the combination and order of factors. Also know that each placeholder needs to be filled with a variable for the model to run.

Step 4 Creating a Bar Chart

The repeated measures GLM has the same options available as the univariate GLM (see the buttons on the right side of the “Repeated Measures” window). We’ll ask SPSS to create a bar chart comparing our two samples of self-esteem scores and to include error bars set to 95% confidence intervals.

Click on the “Plots” button to open the “Repeated Measures: Profile Plots” window (see Figure 6.24).

Figure 6.24

Profile Plots Window

Profile Plots Window


As with the univariate GLM profile plot, we’ll want our predictor (Time) moved to the “Horizontal Axis” box before clicking “Add.” After it is added, select the “Bar Chart” option in the “Chart Type” area. You’ll also want to be sure that the “Include Error Bars” option is selected in the “Error Bars” area. The default setting of “Confidence Interval (95.0%)” is exactly what we want. Your window should look like that in Figure 6.25.

Figure 6.25

Completed Profile Plots Window

Completed Profile Plots Window


If everything is correct, click the “Continue” button in the bottom of the window.

Step 5 Getting Means and Confidence Intervals

We’ll need the actual values for the means and confidence interval bounds for the write-up so let’s ask SPSS to supply those. Click the “EM Means” button in the main “Repeated Measures” window to open the “Estimated Marginal Means” window.

Drag the “Time” variable from the “Factor(s) and Factor Interactions:” box to the “Display Means for:” box. Click the “Continue” button to return to the main window. Figure 6.26 shows the completed window.

Figure 6.26

Completed Estimated Marginal Means Window

Completed Estimated Marginal Means Window


Your GLM is now set up and the “Repeated Measures” main window should look like Figure 6.27. If it does, click “Paste” to generate the syntax.

Figure 6.27

Completed Repeated Measures Window

Completed Repeated Measures Window


Step 6 Generate the Output

Navigate to the syntax window and select the syntax associated with the general linear model as reflected in Figure 6.28.

Figure 6.28

Syntax for Repeated Measures GLM

Syntax for Repeated Measures GLM


Click the green “Run” button to generate the output.

Interpreting the Output

We’ve already checked our assumption of normally distributed difference scores so we can get right to the test of the effect. Note that there are several tables before we get to the table of interest (i.e., “Within-Subjects Factors,” “Multivariate Tests,” and “Mauchly’s Test of Sphericity”). We will discuss these when we get to factorial designs.

Test of Within-Subjects Effects

The “Test of Within-Subjects Effects” table is presented in Figure 6.29.

Figure 6.29

Test of Within-Subjects Effects

Test of Within-Subjects Effects


This table is very similar to the “Test of Between-Subjects Effects” table from the last lesson in that they are both ANOVA tables (the columns are identical). There are two notable differences.

The first is that there are several rows associated with both the effect (i.e., Time) and error. These represent adjustments for when the assumption of sphericity is violated. Sphericity IS NOT an assumption for the paired samples t-test. As such, we will only focus on the first row in each section (although, because there is no sphericity to check, all of the values are identical).

The second is that the error term is associated with a within-subjects factor. The term is correctly read as “Error within Time.” Again, because we are conducting the simplest version of the repeated measures GLM, there is only one error term.

To know if we have a significant effect, we’ll check the top line associated with our within-subjects factor of “Time” (see Figure 6.30).

Figure 6.30

Highlighted Row for Within-Subjects Effect

Highlighted Row for Within-Subjects Effect


As with the independent samples t-test, we’ll look at the “Sig.” value to determine statistical significance. The value is “.000,” which is less than \(\alpha = .05\). As such, we would reject the null hypothesis that our two samples came from the same population.
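A useful check on this output: with only two levels, the repeated measures F-value is simply the paired t-value squared. The sketch below (with invented scores, not the SelfEsteemChange.sav data) builds the F-ratio from sums of squares and compares it to \(t^2\) from `ttest_rel`.

```python
import numpy as np
from scipy import stats

# Hypothetical before/after scores for 6 participants
before = np.array([30.0, 28, 35, 31, 29, 33])
after  = np.array([34.0, 30, 38, 36, 33, 35])
scores = np.stack([before, after])      # levels x subjects
n = scores.shape[1]

grand = scores.mean()
level_means = scores.mean(axis=1)       # mean at each level of Time
subj_means  = scores.mean(axis=0)       # each participant's own mean

# Partition: effect of Time vs. the subject-by-time residual (the error term)
ss_time  = n * np.sum((level_means - grand) ** 2)
resid    = scores - level_means[:, None] - subj_means[None, :] + grand
ss_error = np.sum(resid ** 2)

F = (ss_time / 1) / (ss_error / (n - 1))  # df_time = 1, df_error = n - 1
t, p = stats.ttest_rel(after, before)
print(round(F, 6), round(t ** 2, 6))
```

Subtracting each participant’s own mean when forming the residual is exactly how the within-subjects error term removes between-person variability, which is the power advantage discussed earlier.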

We’ll revisit the table for our write-up.

Estimated Marginal Means

The estimated marginal means table provides the mean and 95% confidence interval for each of our two groups (see Figure 6.31).

Figure 6.31

Estimated Marginal Means and Confidence Intervals

Estimated Marginal Means and Confidence Intervals


As we should expect given the result of the test of within-subjects effects, the confidence intervals for the two groups do not overlap. Let’s see this represented visually in the bar chart.

Bar Chart

The bar chart we requested is presented in Figure 6.32.

Figure 6.32

Unstyled Bar Chart Comparing Self-Esteem Scores Before and After Sessions

Unstyled Bar Chart Comparing Self-Esteem Scores Before and After Sessions


A few things to attend to:

  1. The error bars do not overlap. This corroborates our conclusion from the “Test of Within-Subjects Effects” table.
  2. The error bars are quite small. This is because these error bars are based on the standard error after removing the between-subjects variability. You will not get this same result using the Chart Builder.
  3. The axes need to be appropriately labeled. The y-axis should reflect our dependent variable and the x-axis should contain the levels of our independent variable.

Presenting the Results in APA Format

Styling the Bar Chart

To change the axes, you need to click once to select the text then click again to enter the “editing” mode. Once you’ve made the text edits you like, simply click on any part of the chart (outside of the text you’re editing).

Also, remember to remove any titles, notes, and grid lines.

The stylized version of the bar chart is in Figure 6.33.

Figure 6.33

Styled Bar Chart Comparing Self-Esteem Scores Before and After Sessions

Styled Bar Chart Comparing Self-Esteem Scores Before and After Sessions

Note. Error bars represent 95% confidence intervals.

Writing Up the Results

Remember the formula: Test + Interpretation of Results + (Summary of Stats)

For the test, we would write something like: “A repeated measures general linear model was used to perform a paired samples t-test.”

We would interpret the results with something like: “The effect of weekly sessions resulted in an average increase in self-esteem ratings.”

Finally, the statistical summary is going to follow the same format as the independent samples t-test. That is, (F[\(df_{\textrm{effect}},df_{\textrm{error}}\)] = F-value, p = p-value; \(M_{\textrm{Group1}}\) = Mean, 95% CI [LL, UL]; \(M_{\textrm{Group2}}\) = Mean, 95% CI [LL, UL]).

Let’s get the information regarding the F-test from the “Test of Within-Subjects Effects” table. I’ve highlighted the needed information in Figure 6.34.

Figure 6.34

Annotated Test of Within-Subjects Effects

Annotated Test of Within-Subjects Effects


Our F-test write up thus becomes (F[1,29]=680.414, p < .001).

Note that we do not write “p = .000” because the sampling distributions are asymptotic in the tails. That means that they stretch out to \(\infty\) and \(-\infty\). Although it may be very, very unlikely, theoretically we could get any value. That means that no value can have a 100% or 0% chance of occurring. SPSS prints .000, but that is just rounding. What is more accurate is to write that the p-value is less than some number. I suggest rounding the last zero up to 1 and reporting p < .001.
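As a tiny illustration of the rounding issue (the p-value below is an arbitrary placeholder, not from our output):

```python
# A p-value far smaller than .001 still displays as zero at three decimals
p = 3.2e-12           # arbitrary tiny p-value for illustration
print(f"{p:.3f}")     # three-decimal rounding, as in the SPSS table
print("p < .001")     # what we should report instead
```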

We’ll need to consult our estimated marginal means table to get the rest of the information. Figure 6.31 is reproduced for your convenience here.

Figure 6.31

Estimated Marginal Means and Confidence Intervals

Estimated Marginal Means and Confidence Intervals


The write-up for this portion would be as follows: “\(M_{\textrm{Before}}\) = 34.833, 95% CI [33.510, 36.157]; \(M_{\textrm{After}}\) = 39.733, 95% CI [38.309, 41.158]”

When we put everything together, our write-up becomes:

“A repeated measures general linear model was used to perform a paired samples t-test. The effect of weekly sessions resulted in an average increase in self-esteem ratings (F[1,29] = 680.414, p < .001; \(M_{\textrm{Before}}\) = 34.833, 95% CI [33.510, 36.157]; \(M_{\textrm{After}}\) = 39.733, 95% CI [38.309, 41.158]).”

Summary

In this lesson, we’ve:

  1. Compared the paired samples t-test to the independent samples t-test
  2. Related the within-subjects design to the paired samples t-test
  3. Explained why the paired samples t-test has more statistical power than the independent samples t-test
  4. Stated the assumption for the paired samples t-test
  5. Set up the repeated measures GLM for a paired samples t-test
  6. Interpreted the output from the repeated measures GLM
  7. Presented the results of the GLM in APA style

In the next lesson, we’ll expand from comparing two groups to comparing more than two groups (e.g., more than two levels for an IV).