This chapter will help you to:
Humans like patterns. We will often impose patterns when they are not there. Pareidolia is the phenomenology of Gestalt psychology. That is, the experience of seeing faces in our toast is explained by the perceptual properties outlined by the Gestalt psychologists. Remember that “the whole is other than the sum of the parts.” See Figure 3.1 for an example.
Figure 3.1
Pareidolia Toast
What does seeing religious figures in our buttered toast have to do with statistics? Everything! Let me explain. When we receive information from our experiences, it is often an under-sampling of reality (i.e., not the whole picture). Furthermore, we like to put our own interpretation on that information when forming a conclusion. As such, as we summarize the information we receive from the world, we often form biased conclusions. People (statisticians included) are not very good at accurately summarizing reality.
What we need is a framework to help us consolidate the raw information into something that is both accurate and helpful in guiding our understanding. The “descriptive” branch of Statistics does just that. It provides us with a set of tools for summarizing different kinds of distributions. These summarizations are crucial for the next step: inferential statistics.
Before we go on to the actual summaries, let’s review the scales of measurement. How we summarize a distribution is dictated, in part, by the scale of measurement for the variable.
Recall the four scales (ascending in amount of information contained): nominal, ordinal, interval, and ratio.
How might we summarize the values in each of those scales of measurement? How do those summaries differ in the information they provide? Let’s walk through them.
In the nominal scale of measurement, the values differ by kind. That is, the values are qualitatively different. The difference can’t be expressed by change in the amount of some quantity. We could imagine that we had a bag of Reese’s Pieces (I imagine this often…) and we want to summarize information about the color of the pieces. I think, intuitively, we would want to sort the pieces into piles based on their colors. We could then tell others about how many different colors there are. While writing this, I realized that others may not share my exuberant liking of Reese’s products. For those in that camp, please see Figure 3.2, then proceed to your nearest food retailer and treat yourself to some chocolate, peanut butter joy.
Figure 3.2
Reese’s Pieces
Figure 3.2 indicates that we have 3 possible values for the color of a Reese’s Piece. That is a start but it doesn’t tell us a lot about the distribution of those values. That is, it doesn’t tell us much about what the composition of the bags looks like. What we need is a frequency distribution. A frequency distribution is a summary of how many of each possible value is present in a sample. For our example, we would report how many of the pieces from our bag were yellow, how many were brown, and how many were orange.
Frequency Distribution: a collection of value-count pairs that summarizes how often each value of a variable appears in a sample.
Sad story: I do not currently have any Reese’s Pieces to count. However, because we live in a very interesting time in human history, I was able to watch a YouTube video in which a young man and a young woman counted the number of Reese’s Pieces in an 8 oz bag. I skipped to the end to find that they counted 305 in total but they did not break down the number of each color. Although my search for information about the proportion of each color was unsuccessful, we can use the sample provided in Figure 3.2 to determine our counts. I count 8 yellow, 8 brown, and 15 orange candies. We would summarize this frequency distribution in a table like Table 3.1.
Table 3.1
Frequency Distribution of Reese’s Pieces Colors
Color | Count |
---|---|
Yellow | 8 |
Brown | 8 |
Orange | 15 |
Note: Dr. Weaver likes all the colors of Reese’s Pieces as they all taste the same.
We’ve summarized information from 31 Reese’s Pieces down to 3 numbers. What if someone really pressed you for just one number: the best summarizing number? You should tell them to shove off because they are ignoring relevant information about the data. However, if someone asked you for the frequency distribution AND one summarizing number, you could offer the mode. The mode is the most frequently occurring variable value in the sample.
Mode: a summarizing statistic that represents the most commonly (i.e., highest count) occurring variable value in a sample.
The mode is helpful because it offers us a “best guess” about what we are likely to select out of our bag of Reese’s Pieces. That is because, in frequentist approaches, higher frequencies = higher probabilities. In our example, the modal value is “orange” so we would suspect that we would be most likely to randomly select an orange Reese’s Piece from the bag.
To summarize our summarizing of data assessed on a nominal scale, we can count the number of observations that match the different possible variable values and we can refer to the modal value of that frequency distribution.
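If you would like to see the counting spelled out, here is a minimal sketch in Python (not SPSS, which we use in this course). It rebuilds the sample from Figure 3.2 using the counts reported in Table 3.1; the variable names are my own and purely illustrative.

```python
from collections import Counter

# Colors counted from the sample in Figure 3.2 (8 yellow, 8 brown, 15 orange).
pieces = ["yellow"] * 8 + ["brown"] * 8 + ["orange"] * 15

# Frequency distribution: a collection of value-count pairs.
frequencies = Counter(pieces)

# Mode: the value with the highest count.
mode_value, mode_count = frequencies.most_common(1)[0]

# Relative frequency of the mode (our "best guess" probability).
mode_proportion = mode_count / len(pieces)

for color, count in frequencies.items():
    print(f"{color:<7}{count:>3}")
print(f"Mode: {mode_value} ({mode_proportion:.0%} of the sample)")
```

Notice that nothing here requires the colors to be ordered or numeric, which is exactly why these summaries suit nominal data.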
The ordinal scale is familiar to psychologists. It is more informative than the nominal scale and less informative than interval or ratio. If psychologists were on a dating site looking for the perfect scale of measurement, they would really like to be matched up with the ratio scale but those scales are too involved with physics it seems. Those physicists are the ones defining time, weight, and other variables that work so well with ratio scales. They even defined “absolute zero” for temperature!
Psychologists would settle for some attention from interval scales, really. They have that nice structure in their values. We like a scale that has a strong sense of values. Every once in a while, psychologists get time with ratio scales (e.g., reaction time) or interval scales (e.g., standardized test scores) but we are often stuck in that familiar dance with ordinal data.
The ordinal scale of measurement is marked by values that indicate more or less of some concept without providing the exact amount of change between values. Imagine if you were only given the placements of those who were in a race. Pepper O’Knee came in first, Sal Ami came in second, and Beau Loney came in third. Unfortunately, we don’t really know how much faster Pepper is than Sal or Sal is than Beau. We don’t even know how long it took anyone to finish the race.
Cured meat puns aside, psychology is riddled with these measurement issues. I’m an emotions researcher and I have no idea how to know how happy someone is other than asking them to indicate it on some arbitrary scale. Without fixed intervals on this happiness scale, I don’t know how much more happy someone is when they rate themselves as a 7 versus a 9. Furthermore, without an absolute zero on the happiness scale, I can’t really compare individuals. Is all lost? Should psychologists just stay off ScaleMatch.com? All is not lost; we can still make comparisons across groups and we can still try to anchor individuals on these ordinal scales. We can still summarize the patterns on these scales.
We can summarize with a frequency distribution and we can offer the mode. However, because we do have order information, we can incorporate the rank of the values into our summary. In general, we can summarize any number of percentiles. A percentile is the value at which a certain proportion of observations are below that value. For example, I could report that a happiness score of 8 is the 75th percentile. That is, 75% of happiness scores are below 8. If you recall from previous classes, we have a summary statistic that we offer as a central marker of ordinal data. The median is the value at which 50% of observations fall below.
Median: the value at which 50% of observations are below.
We can and should include more summarizing information about ordinal data distributions. The median is our measure of central tendency, or the value which describes the center of the distribution, but that leaves out information about the values that are not at the center.
Central Tendency: values that indicate the center, or bulk, of the data in a distribution. Options for measures of central tendency include the mode, median, and mean.
We need something to summarize how different or spread out the values are and we have some options. The most basic measure of dispersion is the range. The range is the span between the largest value and the smallest value in the data set.
Dispersion: values that indicate spread or variability in the data.
Range: the difference of the largest and the smallest values in the data set.
Not bad, but we can do better! Rather than just incorporate the most extreme values, we can provide a more nuanced summary by looking at the middle 50% of the data. The interquartile range or IQR is the span from the 25th percentile to the 75th percentile. Whereas the range can be very large because of unusual data (for the distribution, that is), the IQR nicely describes how different the bulk of the data are.
Interquartile Range (IQR): The difference of the 75th to the 25th percentiles.
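To make the median, range, and IQR concrete, here is a small Python sketch using made-up happiness ratings (the data are hypothetical, not from any study). One caution: software packages use slightly different rules for locating percentiles, so the `method="inclusive"` choice below is an assumption and SPSS may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical happiness ratings (1-10 scale) from nine people.
ratings = [2, 4, 5, 6, 7, 7, 8, 9, 10]

# Median: half of the observations fall below this value.
median = statistics.median(ratings)

# Range: largest value minus smallest value.
value_range = max(ratings) - min(ratings)

# Quartile cut points (25th, 50th, and 75th percentiles).
q1, q2, q3 = statistics.quantiles(ratings, n=4, method="inclusive")

# IQR: 75th percentile minus 25th percentile.
iqr = q3 - q1

print(median, value_range, q1, q3, iqr)
```

The range here depends entirely on the two people who answered 2 and 10, whereas the IQR describes the spread of the middle half of the ratings.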
The other advantage of the IQR is that, regardless of the shape of the distribution, it is still a useful summary of dispersion. That is, it remains informative even when the data are skewed because of outliers (see Figure 3.3 for an example).
Figure 3.3
Income Distribution in the U.S.
The summary measures associated with ordinal data are informative and useful, especially for skewed distributions. We can provide a frequency distribution, report percentiles (especially the median for central tendency), and the interquartile range (for dispersion).
Quick recap. Psychologists want to work with interval and ratio data (and Pepper O’Knee is the fastest on the charcuterie board) because they have fixed intervals. The main reason that this is a positive characteristic is because of what we can say about the data. When someone scores a 5 on a math test and someone else scores a 7, we know that difference is the same difference as when one person scores a 9 and the other scores an 11. Taking it further, if someone scored a 5 and the other scored a 10, we can claim that student #2 earned twice as many points as student #1. We could not have really said this if we were assessing happiness scores. There is still the problem of whether we are actually assessing math ability or skill with our math test but this chapter isn’t about validity.
With the extra information in each scale, we can utilize some more arithmetic. Namely, we can meaningfully add and subtract values with interval data and we can multiply and divide with ratio data. These operations allow us to summarize the central tendency with the mean and dispersion with variance and standard deviation. The mean is the arithmetic mean, which is calculated by summing the values before dividing by the number of values. We really like to calculate the mean, when appropriate, because it includes all values in its calculation. That means (really not an attempt at a dad joke) it is the most information-rich measure of central tendency we can calculate. For a perfectly normally distributed sample, the mean is equal to the median and is equal to the mode.
Mean: typically the arithmetic mean in which the values are summed before being divided by the number of observations.
Variance is a similarly rich summary of dispersion. Recall that the range and the IQR only used two values from the sample. Variance tells us how different the values are from the mean, on average. The calculation involves subtracting the mean from each value, squaring each difference, adding up all the squared differences, then dividing the sum by the number of values (or by one less than the number of values when estimating from a sample). The problem with variance is that the units are now squared, which is weird. Squared degrees Celsius? Squared thousands of dollars per year? To get around this, we calculate standard deviation by simply taking the square root of variance. The general interpretation is the same: the larger the value, the less similar the scores (i.e., more spread out).
Variance: a summary of how different the values are from the mean, on average.
Standard Deviation: the square root of variance (to return dispersion to normal, unsquared units).
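The steps for the mean, variance, and standard deviation can be sketched by hand in Python. The scores are hypothetical. This version divides by the number of values, as described above; many packages (SPSS included, as far as I know) report the sample version that divides by n - 1, so I show that as well for comparison.

```python
import math
import statistics

# Hypothetical quiz scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)

# Mean: sum the values, then divide by the number of values.
mean = sum(scores) / n

# Variance: average squared deviation from the mean (dividing by n).
squared_deviations = [(x - mean) ** 2 for x in scores]
variance = sum(squared_deviations) / n

# Standard deviation: square root of the variance, back in original units.
std_dev = math.sqrt(variance)

# The standard library's population variance agrees with the by-hand version.
assert math.isclose(variance, statistics.pvariance(scores))

# Sample variance divides by n - 1 instead of n, so it is a bit larger.
sample_variance = statistics.variance(scores)

print(mean, variance, std_dev, sample_variance)
```

Working through it by hand once makes it clear why the variance is in squared units: every deviation is squared before it is averaged.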
Table 3.2 compiles this information succinctly.
Table 3.2
Most informative Summary Statistics by Scale of Measurement
Measurement | Central Tendency | Dispersion |
---|---|---|
Nominal | Mode | Range |
Ordinal | Median | IQR |
Interval | Mean | Std. Dev. |
Ratio | Mean | Std. Dev. |
In the service of our ultimate goal of understanding patterns in data, it is helpful to offer summaries of data. The summaries we can utilize may be dependent on the scale of measurement of our data. In general, we want to offer some value that describes the bulk / center of the data and some value that describes the spread of the data. We can actually describe the data a few more ways.
Whereas the mean tells us where the data are centered on the number line and variance (or standard deviation) tells us how spread out the values are about the mean, we can also describe the shape of the distribution. Skewness refers to the imbalance of the distribution. Figure 3.4 presents positively skewed, symmetrical, and negatively skewed distributions.
Figure 3.4
Skewed and Symmetrical Distributions
Skew summarizes the amount and direction of the imbalance by changing the formula for variance slightly. Whereas variance finds the average squared distance (i.e., all values are positive) of each point from the mean, skew finds the average cubed difference. This leaves the negative values negative and the positive values positive before adding them up. As such, if we have a negative value, there is more distance between the mean and the points below it than between the mean and the points above it. For a positive skew value, the distance between the mean and the points above it is greater than the distance to the points below it. If the skew value is zero, the distances are perfectly balanced. In general, we consider a distribution to be “normal” enough if skew is between -3 and +3.
Skew: summarizes the amount of imbalance in a distribution by adding the cubed differences between points and the mean. A general acceptable range of skewness is -3 to +3.
The other summary for the shape of a distribution is kurtosis. Kurtosis refers to the “tailedness” of a distribution. If you recall the empirical rule, we have certain assumptions about the proportion of observations that should fall in different ranges of our normal distribution. Kurtosis is a measure of deviation from that expectation. Kurtosis is another “moment” of the distribution, the fourth moment. It is calculated by raising the distance between each observation and the mean to the fourth power, averaging those values, and then dividing by the squared variance. All this really means is that this formula will enhance the potential outliers (i.e., values greater than a few standard deviations from the mean). There are three classifications of distributions according to kurtosis: mesokurtic, leptokurtic, and platykurtic. These roughly align to no kurtosis, positive kurtosis (heavy tails), and negative kurtosis (thin tails). The acceptable range for assuming normality for kurtosis is between -8 and +8.
Kurtosis: summarizes the tailedness of a distribution. A general acceptable range of kurtosis is -8 to +8.
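Here is a rough Python sketch of both shape statistics. Two assumptions to flag loudly: I standardize skew by dividing by the cubed standard deviation, and I subtract 3 from kurtosis so that a normal distribution scores zero. Both are common conventions, but SPSS additionally applies small-sample corrections, so its output will not match these simple versions exactly. The function names and data are my own.

```python
def central_moment(values, power):
    """Average of (value - mean) raised to the given power."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** power for x in values) / len(values)


def skewness(values):
    """Third central moment divided by the cubed standard deviation."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (the normal value)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical samples: one balanced, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric))        # balanced: cubed deviations cancel out
print(skewness(right_tailed))     # positive: the 10 pulls the tail right
print(excess_kurtosis(symmetric))
```

Because the deviations are cubed, the single value of 10 dominates the skew calculation, which is the “enhancing outliers” idea described above carried one power lower.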
Another approach to describing the shape of the data is to identify key “mile markers” in the data. That is, we can list the values below which certain proportions of the data are found (i.e., percentiles). We can pick any that we wish but the standard choice is the quartiles. If percentiles are found by dividing the distribution into 100 parts, quartiles divide the distribution into quarters or fourths. As such, the first quartile (Q1) is the value at which 25% of observations fall below, Q2 is the median (50% below), Q3 has 75% below and Q4 is the maximum (100% below).
We can put these into a nice five number summary by including the minimum with the quartiles. Such a summary is presented in Table 3.3 below.
Table 3.3
Five Number Summary
Statistic | Min | Q1 | Median | Q3 | Max |
---|---|---|---|---|---|
Value | -3.24 | -0.7 | -0.06 | 0.65 | 3.63 |
There are variations on this theme, but generally we want to provide useful summaries regarding the center, spread, and shape of a distribution. Table 3.4 contains more of the typical parametric summaries. By parametric, I am referring to values related to the mean.
Table 3.4
Parametric Summaries
Statistic | Min | Mean | SD | Max | Skewness | Kurtosis |
---|---|---|---|---|---|---|
Value | -3.24 | -0.03 | 1 | 3.63 | 0.02 | -0.02 |
So far, we’ve presented our summaries in tables. Tables are great for presenting the values of your statistics. Charts and figures are nice for comparisons and quick judgments. We’ll use different figures to present the values and to make judgments regarding the normality of our distributions. For now, we’ll discuss the following: bar charts, histograms, box plots, and scatter plots.
Bar charts are ubiquitous because they do help to clearly communicate the comparison of groups. An example bar chart can be found in Figure 3.5 below.
Figure 3.5
An Excellent Bar Chart Example
Figure 3.5 is basic in that it only provides information about how often each rating showed up in the sample. We can get more information in a bar chart by including more information about samples. Figure 3.6 includes means and 95% confidence intervals for each sample being compared.
Figure 3.6
Comparing Samples with a Bar Chart
Wow, can you believe that my imaginary sample reported “Reese’s Pieces” to be the most enjoyable?! I can, because they are delicious! What is really nice about these kinds of bar charts is that you can make some quick inferences about reliable differences between samples. Because the 95% confidence error bars do not overlap for the enjoyment ratings of licorice and Reese’s Pieces, we can claim that there is less than a 5% chance that our enjoyment ratings came from the same population. We didn’t even need a t-test for that. Nice!
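For the curious, the error bars in a chart like this can be approximated with a few lines of Python. This is only a sketch under a stated assumption: it uses the normal approximation (a critical value of about 1.96), whereas statistical software typically uses the t distribution, which gives somewhat wider intervals for small samples. The ratings and the `mean_ci` helper are hypothetical.

```python
import math
import statistics


def mean_ci(values, confidence=0.95):
    """Return (mean, lower, upper) using a normal approximation."""
    mean = statistics.mean(values)
    # Standard error: sample standard deviation divided by sqrt(n).
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    # Critical z value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * standard_error, mean + z * standard_error


# Hypothetical enjoyment ratings for two candies.
licorice = [2, 3, 3, 4, 2, 3, 4, 3]
reeses = [8, 9, 7, 9, 8, 10, 9, 8]

licorice_mean, licorice_low, licorice_high = mean_ci(licorice)
reeses_mean, reeses_low, reeses_high = mean_ci(reeses)

# The quick visual check: do the two intervals overlap?
intervals_overlap = licorice_high >= reeses_low and reeses_high >= licorice_low
print(intervals_overlap)
```

Treat the overlap check as a quick screen rather than a replacement for a formal test.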
Histograms are like bar charts but require special data and can only display certain information. Histograms are for continuous data and the bars represent frequency. Histograms are useful for judging the shape (e.g., normality) of a distribution. Figure 3.7 is a histogram.
Figure 3.7
A Boring Histogram
Clearly this sample did not come from my class as I’m sure that all of my students would indicate 0! Fantasies aside, we really use histograms to quickly judge if a distribution is roughly normal or is skewed to the left or right.
Box plots are another way to represent information about a distribution, but they summarize the quartiles rather than the counts of different ranges of variable values. Figure 3.8 demonstrates how box plots can be used to compare sample distributions.
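Behind every histogram is a simple counting step: slice the number line into equal-width bins and count how many observations land in each. Here is a minimal Python sketch with hypothetical data; the `histogram_counts` helper, the bin width, and the starting point are all my own illustrative choices.

```python
from collections import Counter


def histogram_counts(values, bin_width, start):
    """Count how many values fall in each equal-width bin.

    Bins are [start, start + width), [start + width, start + 2 * width), ...
    Bins with no observations are simply left out of the result.
    """
    counts = Counter()
    for x in values:
        bin_index = int((x - start) // bin_width)
        lower_edge = start + bin_index * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


# Hypothetical hours of sleep for ten students.
hours = [4.5, 5.0, 6.2, 6.8, 7.0, 7.1, 7.4, 7.9, 8.3, 9.6]
counts = histogram_counts(hours, bin_width=1, start=4)

# Text-based histogram: one # per observation in the bin.
for lower_edge, count in counts.items():
    print(f"{lower_edge}-{lower_edge + 1}: {'#' * count}")
```

Changing the bin width changes the picture, so it is worth trying a couple of widths before judging the shape of a distribution.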
Figure 3.8
Box Plots to Compare Samples
The middle horizontal line of the box plot represents the median (Q2 or 50th percentile). The bottom horizontal line is the first quartile and the top horizontal line is the third quartile. The “whiskers” are the vertical lines. They extend to the most extreme observations that fall within 1.5 times the interquartile range (IQR) above the 3rd quartile and within 1.5 times the IQR below the 1st quartile. Lastly, the dots represent outliers that are beyond the 1.5 IQR cutoff.
Should you have two continuous variables, you can check the direction and strength of their relationship (i.e., judge correlation) by producing a scatter plot. Scatter plots have the values of one variable along the x-axis and the other variable’s values along the y-axis. A point is then plotted for each entity/individual at the intersection of their values for each variable. See Figure 3.9 for an example.
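The pieces of a box plot can be worked out numerically. The Python sketch below uses hypothetical reaction times and assumes the common convention that whiskers stop at the most extreme observations still inside the 1.5 IQR cutoffs; as with percentiles generally, the quartile rule (`method="inclusive"`) is an assumption and other software may place the box edges slightly differently.

```python
import statistics

# Hypothetical reaction times in milliseconds, with one unusually slow trial.
times = [310, 320, 335, 340, 350, 355, 365, 380, 640]

# Box edges and middle line: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(times, n=4, method="inclusive")
iqr = q3 - q1

# Cutoffs sit 1.5 IQRs beyond the first and third quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the cutoffs are drawn as individual dots.
outliers = [x for x in times if x < lower_fence or x > upper_fence]

# Whiskers stop at the most extreme points that are still inside the cutoffs.
inside = [x for x in times if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(q1, median, q3, whisker_low, whisker_high, outliers)
```

The slow 640 ms trial is flagged as a dot, and it has no influence on where the box or whiskers are drawn.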
Figure 3.9
Scatter Plot Example
It can be difficult to judge reliable relationships from scatter plots alone, so I would recommend testing it formally through testing the correlation coefficient. Scatter plots are important because we are often interested in how two continuous variables are related. Unfortunately, correlation is outside of the scope of this course. If you would like to learn more about correlation and how to examine correlation in SPSS, you can check out this website:
One of the major goals of this course is to learn how to implement APA style when presenting statistical results. As the semester progresses, we will work on specific components required by different analyses, but I wanted to start with some general principles that guide APA writing and the creation of figures and tables.
The most important quality of scientific writing is clarity. To write clearly is to write in an unambiguous and complete manner. A reader should be able to read and understand your writing, with the appropriate background information. That means, in part, that you need to have an idea of what your readers may already know before they start to read your writing but it also means that you need to make clear the assumptions that you have. This is one of the reasons why we psychology professors are always harping on students about citations. It is more than just following the “rules” of APA writing; it is about providing resources for your reader should they need more information.
Here is an example of an unclear write-up:
“The analyses revealed that music impacted retention.”
Here is a clear revision of that write-up:
“The independent samples t-test revealed that those who listened to instrumental classical music retained 70% more information after a one hour retention period on the free-recall task than those who listened to lyrical pop music (t[298] = 5.32, p < .001, d = 1.3).”
The second example is more clear because it contains necessary information about which test was performed, the levels of the IV, details about methodology, how the DV was assessed, and supporting statistical information.
The same concept of clarity can be applied to tables. To have a clear table, you must ensure that
For tables, each column should have an identifying label for the values that are contained below. Each row should label the grouping of the values (e.g., by participant, by level of IV, etc.). If there are any abbreviations or special information, they should be fully explained in the Notes section below the table.
Finally, the table itself needs to be labeled. This has been demonstrated throughout this OER. Each table is preceded by a bold “Table #”, in which the # is updated sequentially for each new table. Separated by a blank line, the next label is the actual title of the table printed in capital case (i.e., each non-functional word is capitalized) and italicized. Each table should be uniquely identifiable by the table number and table title. The title should be a concise description of what the table contains.
I’ll start with a bad example.
Table
Statistics
Mean | SD | n | |
---|---|---|---|
Males | 85 | 2.1 | 57 |
Females | 91 | 2.8 | 60 |
Why is this not a “clear” table? Let’s start at the top!
Below is a corrected version.
Table 1
Descriptive Statistics for Friendliness Ratings by Sex
Sex | Mean | SD | n |
---|---|---|---|
Males | 85 | 2.1 | 57 |
Females | 91 | 2.8 | 60 |
This is still not a “good” table, but it has the labels worked out. Let’s move on to fixing the “readability” issues.
If any group of professionals should understand what makes something easy to read, it is psychologists. Part of the ease of processing written words on a page has to do with the gestalt principles of grouping, especially figure-ground. This is not a cognitive or sensation & perception class so I’ll just move on to telling you how to make this table easier to read.
First, we need to take away the clutter of all the borders. We need to make the text pop by giving it some surrounding empty space. In APA style, we do that by removing all vertical (i.e., | ) lines. We will also remove all the horizontal (i.e., - ) lines from the body of the table, where the values are.
Table 1
Descriptive Statistics for Friendliness Ratings by Sex
Sex | Mean | SD | n |
---|---|---|---|
Males | 85 | 2.1 | 57 |
Females | 91 | 2.8 | 60 |
Wow! That still looks terrible, but less terrible than before. Now we need to increase the distance between the values to help them stand out.
Table 1
Descriptive Statistics for Friendliness Ratings by Sex
Sex | Mean | SD | n |
---|---|---|---|
Males | 85 | 2.1 | 57 |
Females | 91 | 2.8 | 60 |
A little breathing room does a lot for the readability for this table. Lastly, we need to address the colors to enhance contrast.
I like color but it can sometimes run counter to the overarching goal of clarity. Color should serve the purpose of clarifying information; it should not be used for artistic styling of scientific tables and figures. If you do use color, it should have a color-blind friendly palette and should have high contrast.
In our example, the yellow background and purple text do not serve to clarify content. Furthermore, the contrast is too low. The easy and appropriate solution is to remove all the color from the table. That is, the background should be white and the text should be black.
Here is our final table.
Table 1
Descriptive Statistics for Friendliness Ratings by Sex
Sex | Mean | SD | n |
---|---|---|---|
Males | 85 | 2.1 | 57 |
Females | 91 | 2.8 | 60 |
We’ll actually follow the same rules for creating clear figures.
I know the saying (“a picture is worth a thousand words”), but we need actual words in our figures, too. We need a label for each axis.
Figure 3.10
Bad Figure
This will usually be the variable being represented on the axis. Sometimes, however, the y-axis will represent frequency or count in bar charts and histograms. If you have a legend or key, you’ll want to be sure that you have a label for that and what the pattern or colors represent.
What you do not want to have as part of your figure is a figure title. Per APA style guidelines, the title of the figure will be included in the text of your document, not the figure itself.
Figure 3.11 has some better labels.
Figure 3.11
Better Figure with Labels
Easy to read for a figure has several meanings. First, one should be able to actually read the words and values presented in the figure. That means that the font should be a basic font (a sans-serif) and should have an appropriate size. SPSS actually does a good job with the default font and size for readability.
The other interpretation of easy to read is “easy to understand.” That really boils down to simplicity. One should avoid overly busy figures. Consider whether two simpler charts (although that means more figures) would be easier for the reader to understand than one complex chart. Figure 3.12 is an example of a “too busy” chart.
Figure 3.12
Too Busy Figure
This figure has two y-axes, which means that the reader will have to decide which axis corresponds to each element in the chart body. Although the chart is color coded, that is more information that the reader has to decipher. In general, DO NOT USE TWO Y-AXES.
Another suggestion to make the figure easier to read is to utilize “negative space.” Just as we did in tables, we should remove extra lines that clutter the figure. We should, then, turn off any grid lines.
Colors should be used to convey information, not for aesthetics only. As such, you shouldn’t have colored backgrounds, and bars, lines, and dots should only have color when the color indicates meaningful groupings.
By following the rules for easy reading (i.e., color, negative space), we may not need to make any changes for having “high contrast.” To be sure, however, you should have a white background and a medium to dark color for lines, bars, or points. You should ensure that if the chart was printed in grayscale, a reader would be able to gain the same information as if it were in color.
Figure 3.13 is a fully corrected figure.
Figure 3.13
Corrected Figure
Writing concisely does not mean writing briefly at the expense of clarity. Clarity should take priority over being concise. In most cases, however, you do not need to choose just one. This may be one of the few arenas in life in which you get to eat your cake and have it, too! In fact, cake recipes are a great example of how one should be clear and concise. You want a cake recipe that is clear in what ingredients to use, how much of each, and the procedure used to combine and bake those ingredients. As you’re working on your cake, you don’t want to read a short novel about the recipe writer’s experience in baking that cake. Short and sweet should describe their recipe and the cake!
In scientific writing, you should write first for clarity and then revise to be concise. Most of writing involves revision. Ideas evolve and are molded through drafting into something better. To avoid being blinded by self-serving bias (e.g., “this is the best thing ever written!”) or being heavily self-critical (e.g., “maybe I should just stick to reading”), I encourage you to have a friend read over your work. They can point out what they don’t understand or if you have some really awkward, long sentences that seem like the kind the founding fathers wrote in the Declaration of Independence with twenty-eight clauses, which seem so unrelated that it is hard to follow how the sentence is one connected thought. Those can easily sneak by us. We want to avoid such long sentences and overly verbose writing, in general, because the end result of writing is for the reader. We should want to make it easy for the reader to understand our writing. We should want the reader to learn the material as quickly as possible. Writing clearly and concisely helps to accomplish these goals.
What is true for writing is true for tables and figures. We should aim for clarity first, then ensure that we deliver that clarity in a concise manner. We can do this by removing redundant or unnecessary information. We can also consolidate information into common notes below the table or figure.
The following is not a concise table.
Table 2
Correlation Matrix for Selected Variables
Variable | Height | Weight | IQ | Income |
---|---|---|---|---|
Height r | 1.00 | .873 | .020 | .218 |
Height p | .000 | .001 | .599 | .031 |
Height n | 245 | 245 | 245 | 245 |
Weight r | .873 | 1.00 | .015 | -.003 |
Weight p | .001 | .000 | .738 | .119 |
Weight n | 245 | 245 | 245 | 245 |
IQ r | .020 | .015 | 1.00 | .427 |
IQ p | .599 | .738 | .000 | .013 |
IQ n | 245 | 245 | 245 | 245 |
Income r | .218 | -.003 | .427 | 1.00 |
Income p | .031 | .119 | .013 | .000 |
Income n | 245 | 245 | 245 | 245 |
The table seems like it is filled to the brim with useful information. There is a lot of information but some of it is not very useful and there is a lot of repeated information. We see an r value of 1.00 several times. This is an artifact of correlating a value with itself. This is not useful and we should remove the self-correlation. The sample size (n) is useful but it does not need to be repeated for each correlation. The p-value is useful for null hypothesis significance testing. As such, we can focus on the dichotomous decision of rejecting or failing to reject while leaving the specific values out of the table. Perhaps the biggest reduction in redundant information will occur when we remove half of the remaining table. Notice how the values are mirrored across the diagonal of the table that runs from top-left to bottom-right. This is because correlating height with IQ is the same as correlating IQ with height, and so forth for all variables. We don’t need to see each piece of information twice.
Here is the table again, with some revisions for conciseness.
Table 2
Correlation Matrix for Selected Variables
Variable | Height | Weight | IQ | Income |
---|---|---|---|---|
Height | – | .873* | .020 | .218* |
Weight | | – | .015 | -.003 |
IQ | | | – | .427* |
Income | | | | – |
Note. Values represent Pearson product-moment correlation coefficients. N = 245. * p < .05.
That makes a lot of difference. Whereas we had to skim through a host of numbers before deciding which contributed new information and what each meant, we now can easily discern the important values and their meaning. It is true that we added some notes to the bottom of the table that could increase read time but those notes are also concise and apply to the table as a whole.
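The trimming we just did by hand follows a simple rule: keep each pair once, replace the self-correlations with a dash, and flag significant values with an asterisk. Here is a hedged Python sketch of that rule using the values from the full table. The helper names (`format_r`, `upper_triangle`) are my own, and the .05 cutoff matches the table note.

```python
# Correlations (r) and p-values for each unique pair, taken from the full table.
variables = ["Height", "Weight", "IQ", "Income"]
pairs = {
    ("Height", "Weight"): (0.873, 0.001),
    ("Height", "IQ"): (0.020, 0.599),
    ("Height", "Income"): (0.218, 0.031),
    ("Weight", "IQ"): (0.015, 0.738),
    ("Weight", "Income"): (-0.003, 0.119),
    ("IQ", "Income"): (0.427, 0.013),
}


def format_r(r, p, alpha=0.05):
    """Three decimals, no leading zero, starred when p < alpha."""
    text = f"{r:.3f}".replace("0.", ".", 1)
    return text + ("*" if p < alpha else "")


def upper_triangle(variables, pairs):
    """Build table rows that show each correlation only once."""
    rows = []
    for i, row_var in enumerate(variables):
        cells = []
        for j, col_var in enumerate(variables):
            if j < i:
                cells.append("")   # mirrored value, omitted
            elif j == i:
                cells.append("–")  # self-correlation, omitted
            else:
                cells.append(format_r(*pairs[(row_var, col_var)]))
        rows.append([row_var] + cells)
    return rows


rows = upper_triangle(variables, pairs)
for row in rows:
    print(" | ".join(row))
```

The sample size and the meaning of the asterisk are not repeated in any cell; they belong in the single note below the table.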
Being concise in a figure is a little easier because charts are already summaries of tables. The danger with charts is the temptation to add extra information simply because we can. For example, we may want to give each bar in a bar chart a different color to further separate them. This does not add any meaningful information to the chart, however. We can have different colored bars if the colors represent groups (as happens in an interaction plot). You may see some graphs where the value for the height of a bar is printed on top of the bar in a bar chart. This is redundant with the information provided by the height of the bar and should not be included. Don’t get me wrong, complex charts and figures can be fine. We can have an information-dense figure that is still clear and concise. I think a good question to entertain, however, is whether two figures would be clearer and more concise than one denser figure.
In general, a figure should present information only once and should not compete with other information.
Tables are part of the output of the analyses you perform and will be placed in the output window. In some analyses, the table is the result. In other analyses, different tables can be requested as you set up the procedure. In others still, you have to piece together several tables into one. If there is a generalizable rule for creating tables in SPSS, it is this: content then analysis. That is, you should think about what content you need for your table then decide which analysis will get you what you want. You may find that there are several ways to get what you want. For example, if you want the means and standard deviations of a set of variables, you can go through the “descriptive statistics” menu to either the “descriptives” or the “frequencies” options. You could also get the information from the “compare means” or “general linear model” menus. The best way to decide which way to go is probably to consider what other analyses you also need and the stage of your analytical procedure. If you are still checking assumptions and need information about the distributions, I would suggest the “frequencies” submenu of the “descriptive statistics” menu. If you are ready to test a model, use the “general linear model” procedure to request the information. We’ll cover these procedures in more detail later.
Let’s practice with a useful table of descriptive statistics. Download TableFigureExamples.sav. Double-click the downloaded file to open in SPSS. Remember that the SAV file contains the data and variable information so you won’t have to set anything up.
We’ll ask SPSS to give us minimum, maximum, quartiles, mean and standard deviation of weeklyExercise and weeklyCalories. The best menu to get all of this information is the “Frequencies” submenu of the “Descriptive Statistics” menu under “Analyze”.
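Before we click through the menus, it may help to see exactly which summaries we are requesting. Here is a minimal Python sketch using only the standard library. The data values are made up for illustration (they are not from TableFigureExamples.sav), and `describe` is a name I chose. Quartile definitions vary between programs, so small differences from SPSS output are possible.

```python
import statistics


def describe(values):
    """Return the summaries we will request from the Frequencies procedure."""
    # statistics.quantiles with n=4 returns the three quartile cut points
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "Mean": statistics.mean(values),
        "Std. Deviation": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "Minimum": min(values),
        "Maximum": max(values),
        "Q1": q1,
        "Median": median,
        "Q3": q3,
    }


# Hypothetical minutes of exercise per week for seven people
minutes = [30, 60, 90, 120, 150, 180, 210]
summary = describe(minutes)

for label, value in summary.items():
    print(f"{label}: {value:.2f}")
```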
Figure 3.14
Frequencies Submenu
In the main “Frequencies” menu, you’ll need to drag the variables you want to analyze from the box on the left to the box on the right. You can then tell SPSS which values you’d like by clicking on the “Statistics” button on the right-hand side.
Figure 3.15
Frequencies Main Window
You’ll see a few groupings for statistics (see Figure 3.16) including “percentile values”, “dispersion”, “central tendency,” and “characterize posterior distribution.” The last one gets its title from Bayesian statistical analyses but notice that it contains concepts we’re now familiar with (i.e., skewness and kurtosis).
For this example, let’s select quartiles, minimum, maximum, mean, and standard deviation as depicted in Figure 3.16. Click “Continue” to return to the main Frequencies window.
Figure 3.16
Statistics Options in Frequencies Window
Now click “Paste” to generate the syntax. Navigate to the syntax editor if it did not open automatically (remember, you can get there by clicking on “Syntax” in the “Window” menu of the menu bar).
Highlight the recently generated code (see Figure 3.X) then click the big green run button in the toolbar. If the output window does not open automatically, navigate to it to see the new table. Figure 3.17 contains the original table from the output window.
Figure 3.17
Original Table from SPSS Output
Although SPSS does allow us to edit tables, it is often a frustrating experience with lackluster results. For a more fine-grained and controllable experience, we’ll move the table from SPSS to Microsoft Word.
Formatting Tables in Word
Moving the table can occur in a few ways. The easiest, if it works, is to right-click on the table in the output and select “Copy”. You can then paste the table into Word (e.g., CTRL + v or right-click then select “Paste”).
If that does not work, you can right click on the table and select “Export”. This will open the “Export” window (see Figure 3.18). You’ll want to ensure that you are exporting as a “Word” document (select from the drop-down menu) and that you know where your file will be saved.
Figure 3.18
Export Window
In the file name box, click the “Browse” button to choose the folder and name your file. I suggest saving it to your “Documents” or “Desktop” if you don’t have a folder set up specifically for this class. You should always have a descriptive file name (e.g., “Worksheet1_APATable1”).
If it is a large table, you may also want to change some of the export options to help your table fit in a Word document. To do so, click on the “Change Options” button that is directly above the “File Name” box and directly below the “Options” box. In the window that opens (see Figure 3.19), change the option under “Wide Pivot Tables” to “Do not adjust width”. Click “Continue” to return to the main export window.
Figure 3.19
Export Output Options
When everything is set, click the “OK” button to export the table. Navigate to your folder and double-click to open the document in Word.
If you open the document and it is empty, you are likely on a Mac and you’ll need to use a third option: exporting to Excel first.
I’m not sure where the breakdown occurs when SPSS produces a blank word file but one workaround seems to be exporting to Excel, then copying-and-pasting from Excel to Word. Follow the same export procedure as you did for Word (see above) but this time choose to export as an Excel file (*.xlsx). Navigate to the Excel file and double-click to open in Excel. Select the cells that make up the table and press CTRL + c on the keyboard (or choose Edit -> Copy from the menu bar) to copy the table. Navigate to Word and paste the table.
Once we have the table in Word, we need to clear out the formatting from SPSS and ensure that we can clearly see the structure of the table. This will help us to style the table according to APA style guidelines.
To apply formatting changes to the whole table, we’ll need to select the full table. You can do this by hovering over the table with your mouse until the move icon appears in the upper-left corner of the table (see Figure 3.20). Click on this icon to select the full table.
Figure 3.20
Selecting Full Table
With the full table selected, we’ll want to 1) make all backgrounds white, 2) turn on all borders, and 3) make all text black.
To make all the backgrounds white, go to the ribbon at the top of the document where you see “Design” under “Table Tools.” Then click where it says “Shading” and select the white tile. This is shown in Figure 3.21.
Figure 3.21
Selecting White Background Color
Stay on the “Design” tab of the “Table Tools” ribbon but now click on the downward-pointing triangle of the “Borders” button to show the borders menu (See Figure 3.22). Select the “All Borders” option.
Figure 3.22
Turning on All Borders
We’re going to have to go to the “Home” tab in the ribbon to access the option to change font color. Click on the downward-facing arrow on the button with the letter “A” and red rectangle below it. Select the black color tile in the menu that opens (see Figure 3.23).
Figure 3.23
Selecting Black Text Color
Our table (see Figure 3.24) is now ready to be converted to APA style.
Figure 3.24
Completed Unformatted Table
The table will require the following to fit APA guidelines
The only repeated information we have in our table is the sample size (N = 40). We’ll want to delete this from the table and include it in the text below the table as a note.
It is best to delete whole columns or rows in a table. If you try to delete just a cell, your table will become unbalanced (some rows will be wider than others or some columns will be longer than others). We can easily delete the row that contains information about the sample by clicking on the cell that has “N”, then navigating to the “Delete” button in the “Layout” tab of the “Table Tools” ribbon (see Figure 3.25).
Figure 3.25
Delete Table Row
Wait…something weird happened. Figure 3.26 shows the table after deleting the “N” row. It seems that Word only deleted the top part that corresponded to “Valid.”
Figure 3.26
Table after Row Deletion
Note. N = 40.
Split Cells
To help ensure that we are deleting what we intend to delete, I suggest that you split any merged cells. Merged cells are those that span several rows or columns (or both). Look again at our table and you’ll see that “Percentiles” is a merged cell as it spans three rows (those corresponding to the 25th, 50th, and 75th percentiles). To split that cell, click inside it then navigate to the “Layout” tab of the “Table Tools” ribbon and click on the “Split Cells” button (see Figure 3.27).
Figure 3.27
Split Cell Button in Layout Tab
In the window that opens (see Figure 3.28), split the cell so that the number of rows (or columns) equals the number that the cell originally spanned. In our case, we want to split the cell back into 3 rows (and maintain the 1 column).
Figure 3.28
Split Cell Window
Click “OK” to split the cells and return to the table.
We’ll want to repeat this procedure for the rows that have the “Mean”, “Minimum”, “Maximum”, and the empty cell to the left of “Minutes of Exercise per Week”. That is, we want to split each of these cells into 2 columns but maintain the 1 row. If the column does not line up with the rest of the cells (as in Figure 3.29), place your mouse over the border that is misaligned. When the drag cursor icon appears, click and slowly drag the border to align with the others. This should cause the border to snap in place. Now, when you move a column, all the cells should move together.
Figure 3.29
Misaligned Columns
Figure 3.30 shows the table with split cells.
Figure 3.30
Table with Split Cells
Note. N = 40.
Let’s continue to delete unwanted rows by removing the “Missing” row and the title row (i.e., “Statistics”). Figure 3.30 has all unnecessary rows removed.
Figure 3.30
Table after Deleting Rows
Note. N = 40.
Our table is partially labeled. The two variables have their labels at the top of the last two columns but we have two columns that are not labeled. We also have two rows that don’t seem to have labels (i.e., no text in the first cell of those rows). Before we can rectify this, we need to do some movement and clearing out.
First, let’s combine the first two columns by “cutting” the text for “Mean”, “Minimum”, and “Maximum” and pasting it into the cells to the right. Cutting is a combination of copying then deleting. It can be done easily by pressing CTRL + x on the keyboard after selecting text. Figure 3.31 shows the moved table row headers.
Figure 3.31
Moved Table Row Headers
Note. N = 40.
Now we can change the labels for our percentiles to more clearly reflect the goal of our table. Let’s replace the percentiles with their other terms. That is, 25 will now be “Q1” for the “First Quartile,” 50 will now be “Median,” and 75 will be “Q3” for the “Third Quartile.” Figure 3.32 reflects these changes.
Figure 3.32
Renamed Row Headers
Note. N = 40.
With our row headers renamed, we can label the second column “Statistics” as the labels below indicate different summaries of our data. We should also delete the first column. You can see the changes depicted in Figure 3.33.
Figure 3.33
Table with Completed Labels
Note. N = 40.
With all of the content set, we need to work a little more on the presentation. We’ll actually need to undo one of our initial steps by turning off all the borders. Then, we’ll add a few back in. I know this seems ridiculous but after a few years of working with students on formatting these tables in Word, this seems to actually be the most efficient.
Select the whole table (click on the drag icon in the upper-left area of the table), then navigate to the “Design” tab of the “Table Tools” ribbon. Click the “Borders” button and select the “No Border” option.
Now we can add in the APA-suggested borders (top of the table, bottom of the table, and bottom of the header row). With the whole table still selected, click on the “Top Border” option in the border menu. Then click on the “Bottom Border” option. Your table should now look like Figure 3.34.
Figure 3.34
Table with Top and Bottom Borders
Note. N = 40
The last border we need to add should separate the head of the table from the body of the table. That means we’ll need one below the column headers. Select the top row by clicking on one of the column headers, then navigating to the “Layout” tab of the “Table Tools” ribbon and clicking on the “Select Row” option in the “Select” menu (see Figure 3.35).
Figure 3.35
Selecting Table Row
With the top row now selected, click on the “Design” tab of the “Table Tools” ribbon. Click on the borders button again to select “Bottom Border.” The table with completed borders is presented in Figure 3.36.
Figure 3.36
Bordered Table
Note. N = 40.
The last bit is a mix of making the table easier to read and aesthetics. We’ll want to stretch the table out to ensure that the values are easy to read.
Let’s stretch out the columns so that the row headers are only on one line. To do this easily, select the whole table then drag the column alignment tab to the right (see Figure 3.37).
Figure 3.37
Stretch Table Columns
Your table should look something like the table in Figure 3.38.
Figure 3.38
Table with Stretched Columns
Note. N = 40
The last thing we need to do is to fix the alignment of the values in the cells. Although there is no hard rule, it seems to be a convention that numbers in the body of the table are center-aligned and text is left-aligned. To change the alignment of any cell, click on the cell then click the “left”, “center”, or “right” align buttons in the center of the “Home” tab of the ribbon (see Figure 3.39). To speed up the process, you can select multiple cells at once by clicking and dragging across the cells you wish to include in the selection.
Figure 3.39
Align Text Buttons
After aligning the numerically valued cells in the body of the table, our finished table is presented in Figure 3.40.
Figure 3.40
Finished APA Table
Note. N = 40
It is a bit of a hassle to make nicely APA-formatted tables from SPSS. You need to get the table from SPSS into Word, where you can undo and then redo the formatting. You may need to delete rows, split cells, and rearrange values, all before doing the basics of adding the appropriate borders. Although this requires some effort, the resulting readability of the table is worthwhile.
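To see the target layout without any of the clicking, here is a rough Python sketch that prints a plain-text table with only the three horizontal rules we added (top, below the header row, and bottom), left-aligned row labels, and centered values. It is not a substitute for the Word steps above. The function name `apa_table` and all of the numbers in the example are hypothetical. Only the note (N = 40) comes from our actual table.

```python
def apa_table(headers, rows, note=None):
    """Render a plain-text table with top, header, and bottom rules only."""
    table = [list(headers)] + [[str(cell) for cell in row] for row in rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]

    def fmt(row):
        first = row[0].ljust(widths[0])  # row labels (text) are left-aligned
        rest = [cell.center(width) for cell, width in zip(row[1:], widths[1:])]
        return "  ".join([first] + rest).rstrip()  # values are centered

    rule = "-" * (sum(widths) + 2 * (len(widths) - 1))
    lines = [rule, fmt(table[0]), rule]       # top border, header, header border
    lines += [fmt(row) for row in table[1:]]  # body rows, no borders between them
    lines.append(rule)                        # bottom border
    if note:
        lines.append(note)                    # repeated information goes in the note
    return "\n".join(lines)


headers = ["Statistics", "weeklyExercise", "weeklyCalories"]
rows = [  # made-up values, for layout only
    ["Mean", "150.25", "14250.50"],
    ["Std. Deviation", "45.10", "2100.75"],
    ["Minimum", "60.00", "9800.00"],
    ["Maximum", "240.00", "18900.00"],
    ["Q1", "120.00", "12750.00"],
    ["Median", "150.00", "14300.00"],
    ["Q3", "180.00", "15600.00"],
]

print(apa_table(headers, rows, note="Note. N = 40."))
```

There are no vertical lines and no lines between body rows, which is the same end state we reached by turning off all borders and adding three back in.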
Luckily for us, producing figures is much simpler and getting them into APA format requires much less effort.
In most instances, we will create charts directly through the “Chart Builder.” Occasionally, we will ask SPSS to produce some charts as a part of some analysis. To access the chart builder, go to the “Graphs” menu then click on the “Chart Builder” option (see Figure 3.41).
Figure 3.41
Accessing the Chart Builder
SPSS likes to “look out for you” by presenting a very annoying pop-up that asks you about variable properties (see Figure 3.42). Because we always set up our variables first, we can tell SPSS “Don’t show this dialog again” and click “OK”.
Figure 3.42
Variable Properties Pop-up
We will now proceed to the Chart Builder window. Figure 3.43 shows the default Chart Builder Window when you first access it in a new file. The chart builder, like most SPSS procedures, will remember what you’ve done last time you used the procedure. At any point, you can click the “Reset” button to go back to this default screen.
Figure 3.43
Default Chart Builder Window
To build a chart using the Chart Builder, you must first select a type of chart from the Gallery. We’ll create a bar chart so click on “Bar” from the list.
The area beside the list of types of charts will update to show variations of the bar chart. To choose one of these charts, double-click on the thumbnail. The Preview Area will update to fit that template and the Element Properties side panel will open (see Figure 3.44).
Figure 3.44
Updated Preview Area for Bar Chart
Now that SPSS knows which chart we want, it will provide us with some guidance on how we can display our variables. You’ll see some boxes with dotted-line borders in the preview area. Figure 3.44 indicates three such boxes (i.e., “Y-Axis?”, “X-Axis?”, and “Filter?”). Filter is a way to determine which observations will be included in the figure based upon the values of a variable. We won’t use this option. If we want this functionality, we’ll use other menus to select and filter the data.
We will, however, select a variable for the x-axis and another for the y-axis. Given that it is a bar chart, I would suggest using “Attending College?” as the variable on the x-axis. Drag that variable to the “X-Axis?” box. This will update the preview area. NOTE that the preview area is not accurate. It only shows you elements, not specifics, of your chart. If you got just one bar, like I did, don’t be too surprised. Next, drag “Minutes of Exercise” to the y-axis. It should update the y-axis to “Mean Minutes of Exercise…” We want the mean so this is fine. If you would rather have the median, change the “Statistic” option in the Element Properties side panel to “median.”
Your chart builder window should look something like Figure 3.45.
Figure 3.45
Updated Chart with Variables
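It can help to remember what SPSS will compute for this chart: one summary value (here, the mean) of the y-axis variable for each category of the x-axis variable. The Python sketch below shows that idea with made-up records. The field names `college` and `minutes` and the function `group_summary` are hypothetical and are not the variable names in the SAV file.

```python
from collections import defaultdict
from statistics import mean, median


def group_summary(records, group_key, value_key, statistic=mean):
    """Return one summary value per group; these are the bar heights."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {group: statistic(values) for group, values in groups.items()}


# Hypothetical observations: college attendance and weekly exercise minutes
records = [
    {"college": "Yes", "minutes": 120},
    {"college": "Yes", "minutes": 150},
    {"college": "Yes", "minutes": 90},
    {"college": "No", "minutes": 60},
    {"college": "No", "minutes": 100},
    {"college": "No", "minutes": 80},
]

bar_heights = group_summary(records, "college", "minutes")
print(bar_heights)

# Swapping the statistic mirrors changing "Statistic" to median in the side panel
median_heights = group_summary(records, "college", "minutes", statistic=median)
print(median_heights)
```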
When you’re ready to produce your figure, click the “Paste” button. Navigate to the syntax window where you’ll find a large chunk of syntax related to the chart builder. You’ll know it is for the chart because it will start with the comment “* Chart builder” (see Figure 3.46).
Figure 3.46
Syntax for Chart Builder
Navigate to the output window to see the chart you’ve requested. Figure 3.47 presents the results.
Figure 3.47
Unedited Simple Bar Chart
Unlike the table we produced earlier, this bar chart isn’t too far off the mark. We will need to do just a few things to make it compliant.
Before we can do any of that, we need to get the chart into “edit mode” by double-clicking on the chart in the output window. The chart editor (see Figure 3.48) should open automatically.
Figure 3.48
Chart Editor Window
To remove the title, simply click on the title once then press the delete (i.e., “DEL”) key. Note that you need to press the delete key, not the backspace key. If your keyboard does not have a delete key, you likely need to press a modifier key (e.g., FN or CMD) before pressing the backspace key. You can also just right-click and choose “Delete.”
Next, we can easily remove the grid lines by clicking on the “Hide Grid Lines” button in the toolbar (see Figure 3.49).
Figure 3.49
Hide Grid Lines Button
Our labels were already set in the variable view so our figure is finished. Click the “X” button in the top-right corner to close the editor and return to the output viewer. I’m adding the appropriate APA-styled figure number and title in this document.
Figure 1
Mean Minutes of Weekly Exercise by College Attendance Status
Making APA-styled figures in SPSS is a lot simpler than creating an APA-styled table. With a few clicks and some drag-and-drop steps, SPSS can produce nice-looking figures. A few tweaks to remove the title and grid lines will get you 90% of the way. We’ll work on some more complex charts along the way this semester, but the simple bar chart is a good start.
In this chapter, we’ve: