Science and Statistics

This chapter will help you to:

State the steps of the Scientific Method
Explain why Statistics do not yield Truth
Describe the goals of the two branches of Statistics
Compare scientific hypotheses to statistical hypotheses
Compare inferential statistical approaches
Explain how Statistics enhances Science

What is the goal of Science?

Science is the process of building knowledge with the goal of providing answers to questions about the universe and our experience therein.

How does Science accomplish this goal?

Science uses a systematic approach of asking questions, making observations, forming conclusions and testing interpretations. We call this the “scientific method.”

Figure 1.1

The Scientific Method

Scientific Method

Steps of the Scientific Method

Consult Theory
Determine Research Question
Form a hypothesis
Collect Data
Analyze Data
Interpret Results
Revise Theory

This method is cyclical and requires multiple iterations. That is, we are unable to give a full answer to any question through just one study or experiment. Why is this? Let us walk through a prototypical psychological experiment on memory.

Prototypical Memory Experiment

The researcher wants to know how different sensory qualities influence encoding into working memory. From her own experience, she believes that stimuli that are more intense (e.g., louder sounds, brighter lights, etc.), the more memorable the experience. To test her hypothesis that more intense stimuli will be encoded into working memory more easily and leave a longer trace than less intense stimuli, she recruits 100 introductory psychology students to participate in a study that has 4 conditions.

Table 1.1

Set up for prototypical experiment.

Condition	Stimuli	Outcome
Dim Image	15 Images presented at 50% brightness	Number of resentation until perfect recall
Bright Image	15 Images presented at 100% brightness	Number of resentation until perfect recall
Quiet Word	15 Words presented at 50% volume	Number of resentation until perfect recall
Loud Word	15 Images presented at 100% volume	Number of resentation until perfect recall

After running each participant through one condition each (i.e., each condition has 25 participants), the researcher reports that it took the fewest number of presentations for the quiet word group to have perfect recall of the 15 stimuli presented. Figure 1.2 summarizes the findings.

Figure 1.2

Summary of Results

Summary of Results

Although it is clear from this graph that the quiet word group required the fewest number of presentations to achieve perfect recall, it is only clear for that this was true for those in this experiment. What about those who were not in recruited for this study? Would they have the same result?

Science needs a companion

What this simple experiment demonstrates is the specificity of the results of any given experiment. That is, the results (and the conclusions based on the results) are only applicable to the individuals in that study. Although the methodology of each scientific study needs to be vetted for validity, we can claim the results to be true for that group of individuals in that study (if variables were assessed correctly, of course).

Validity: How well a measurment assesses what it is intended to assess.

We are often more concerned with the general truth (i.e., what is true across different groups). How can we go beyond any given sample in a study to have an idea about what happens in the population?

Population: The full collection of possible observations of interest.

Sample: A selection of observations from the population.

Statistics: taking science further

You can think of statistics as a mathematical toolbox scientists can use to better help them describe and model the phenomena they study. To understand how Statistics helps Science to achieve the goal of uncovering what may be happing in a larger population beyond a sample, we’ll need a breif overview of the two branches of Statistics that are used by scientists.

Two Branches of Statistics

Descriptive statistics are the set of procedures used to summarize the information about data that have been observed (e.g., sample data).

Inferential statistics are the set of procedures used to estimate data beyond that which has been observed (e.g., population data).

Descriptive Statistics

Descriptive statistics describe the data you actually have. By data, we mean the collection of observations you have. This typically refers to the sample but it is possible to apply descriptive statistics to a population if you have all possible observations for that population. This is unlikely. Instead, we will later discuss how we use inferential statistics to make guesses about the population.

What can we describe about our data? Generally speaking, we like to provide summaries of our data to help us understand patterns that may be present. Some summarization that we can provide are:

Frequencies: how many times a value appears
Central Tendency: a value that represents the bulk of the data
- Mode: most frequently occurring number.
- Median: value at which 50% of ordered observations are below.
- Mean: value at which observations below are balanced by observations above.
Variability: a value that represents the difference among observations.
- Range: The span (i.e., difference) between the minimum and maximum values.
- Interquartile Range: The span of the middle 50% of the observations.
- Variance: The average difference between each value and the mean
- Standard Deviation: The square root of variance

By providing these kinds of summaries of our data, we are really describing the distribution of the data.

Distribution: a description of how the observations are aligned with the possible values of a variable.

The shape of the distribution of data is something that we must always consider because certain shapes yield reliable patterns across different variables. You have likely heard of values being normally distributed for certain values. The description of “normal” means that the majority of the observations are in the center (i.e., mode = median = mean), that the the values below the center are balanced by those above the center, and that there are an expected amount of observations that fall between certain values. Figure 1.3 is an example of a normal distribution.

Figure 1.3

A normal distribution.

Normal Distribution

The Empirical Rule, so named because it can be checked for any normally distributed data, states that certain percentages of observations will fall into different ranges about the mean. See table 1.2 for a summary of the rule.

Table 1.2

Empirical rule

Standard Deviation Range	Percent of Observations
[ -1, 1 ]	68%
[ -2, 2 ]	95%
[ -3, 3 ]	99.7%

That is, we can expect almost all observations of a continuous variable to fall between 3 standard deviations below and 3 standard deviations above the mean. A value that is more than 3 standard deviations beyond the mean is possible but less likely. We might consider that value to be an outlier.

Outlier : A value that lies outside of an expected range of a variable and should be investigated further.

Standard Deviation in Review

Standard deviation is the square root of variance. We often work with the square root of variance because it make more sense to discuss variability in the same units as the variable (e.g., years, dollars, GPA , etc.). When we refer to a standard deviation, we are referring to a value that is the average distance of all values from the mean. We could just refer to the actual value (e.g., 1.5 years) but by referring to a standard deviation, we can more easily apply the Empirical Rule and related expected percentages.

Knowing how many observations we expect to find between (or beyond) any set of values for a normal distribution is the bedrock of generalizing beyond a sample.

Given our brief overview of descriptive statistics, we may now view Figure 1.1 (that depicts the results of the simple experiment) as a lacking in detail. It does represent the central tendency (i.e., mean) for the different conditions but it does not describe the variability for each group. Without a complete picture of the distributions of observations it is difficult for us to apply inferential techniques.

Inferential Statistics

If descriptive statistics are used to describe what is happening in a sample, inferential statistics are used to infer what is happening in a sample. By infer, I mean to make a guess based on available evidence. We estimate what may be occurring in the population given what we know about our sample.

Getting from sample to population with Sampling Distributions

Recall our dilemma with Science. We want to know what is generally true but we only know what is true for our sample. How can we get from our sample to the whole population? We need to appreciate that our sample was just one of many different possible samples. As such, the truth we uncovered may be different than what is true for other samples. If we could judge how different those other samples were from one another, we might be able to judge if our results were typical or atypical. Better yet, if we could compare all of the different possible samples, we might have a better picture of the population. We can gain some useful insights by consulting a sampling distribution.

Sampling distribution: the distribution of a sample characteristic (e.g., mean, median, variance, etc.) for a given sample size.

To create a sampling distribution, you would need to calculate a sample characteristic for each possible sample of size n for a given population. For example, we might calculate the mean height of each possible sample of 1000 SVSU graduates last year. Given that there were 1,749 graduates last year, we would need to calculate \(8.66 x 10^{516}\) means for the sampling distribution. Read those numbers again. For a population of N = \(1.749x10^3\), we would need to calculate \(8.66x10^{516}\) samples of size 1,000.

This seems ridiculous! Why would we calculate many more means than there are observations in the population? We wouldn’t and we don’t. Sampling distributions are theoretical constructs. We have only constructed them to extract general characteristics that can be applied over and over again. One really interesting characteristic of a sampling distribution is that the center (e.g., the mean) of the sampling distribution is equal to the population value. That is, the mean of all of the sample means is equal to the population mean. Sampling distributions are defined by more than their central tendency; we also need variability. The variability measure for sampling distributions is the same as any other normal distribution, standard deviation. We use a special term, however, to specifcy the relationship to the sampling distribution of sample means. We call it the standard error of the mean. The standard error of the mean is estimated by dividing the sample standard deviation by the square root of the sample size ( \(\frac{s}{\sqrt{n}}\) )

Standard Error of the Mean: the standard deviation of a sampling distribution of sample means estimated by ( \(\frac{s}{\sqrt{n}}\) ).

There are many many versions of sampling distributions but a few common are the z-distribution, t-distributions, F-distributions, \(\chi^2\)-distributons (often pronounced “kie squared” by non-greek speaking folks like myself), etc. We use each under different circumstance, which we will cover in later chapters. For now, know that we consult these distributions we when want to judge the probability of getting some sample value under certain assumptions about the population. Another important trait of these sampling distributions is that the shape stays the same even when we changed the assumed center (e.g., population value).

Let’s review for where we take the next step in explaining how Statistics enhances Science. A study gives a result for just one sample but we want to know about the more general population. How can we know if what we found has any reliability?

Reliability: how well a result can be replicated across samples.

Sampling distributions can give us an idea of other possible results from samples of the same size and how likely different results may be. This only works if we have an expectation for the the true result from the population. What should we expect? There are two perspectives that both lead to the same conclusions for inferential statistics: Null Hypothesis Significance Testing (NHST) and Confidence Intervals (CI)

Null Hypothesis Significance Testing (NHST)

In science, we tend to follow René Descartes who argued that one should start in a place of doubt and look for evidence to overturn that doubt. In statistics, we start with the null hypothesis and look for evidence to suggest that we should reject it. That is, you would expect that there is no effect of a treatment, no relationship between variables, zilch in the population. The other possible hypothesis is the alternative hypothesis, which states that there is some effect or a relationship in the population.

Null Hypothesis: The default assumption used in Null Hypothesis Significance Testing (NHST) that states there is no difference, effect, or relationship in the population.

Alternative Hypothesis: The hypothesis that is endorsed when the null hypothesis is rejected. It states that there is an effect in the popualtion.

Comparing Hypotheses

It is important to note the discrepancy between the statistical hypotheses and a scientific hypothesis. Whereas a scientific hypothesis is a specific expectation of outcomes, the null and alternative hypotheses are general and binary. That is, in statistics, our hypotheses are either that nothing is happening or that something is happening in the population.

If we assume that our population mean height of graduating SVSU seniors is 5 feet 9 inches but we get a mean result of 5 feet 2 inches from our sample, we might want to make claim that our sample did not come from that hypothetical population. That is, we might want to claim that our null hypothesis that the SVSU graduating senior mean height is 5’9". What if we just happened to have sampled from shorter individuals? We could be in error. Table 1.3 outlines the possible decisions we can make regarding the null hypothesis and the possible outcomes.

Table 1.3

Confusion Matrix for NHST.

		Decision Regarding H₀
		Reject	Maintain
True State of H₀	False	Correct Decision (Statistical Power: 1-β)	False Negative (Type II Error: β)
True State of H₀	True	False Positive (Type I Error: α)	Correct Decision

Did we conclude correctly that our sample is somehow different than the rest or is our sample just one of the many possible? We will never really know. That stings a bit. We can, however, try to manage our risk. We will want to minimize the risk of being wrong and maximize the chances of being right. How do we do this?

Focusing on Rejecting \(H_0\)

As indicated in table 1.3, we have only two decisions regarding \(H_0\): reject or maintain. We can be wrong in either case but we try to focus on the advice of Descartes. We should only claim something when there is sufficient evidence. In Null Hypothesis Significance Testing, we claim our results to be statistically significant when we believe our results to be particularly unlikely under the assumption of the null hypothesis.

Statistical Significance: a claim that a result is unlikely to due to random sampling error but reflects some effect in the population.

How unlikely is unlikely enough? That depends on your choice before you start your analyses but psychology has traditionally used α = 5% as an acceptable value. This 5% represents the probability of getting our result or more extreme (either larger or smaller) under the null hypothesis assumption. We judge the probability by referencing our sampling distribution. Most likely, you will look up or be given the probability value (better known as a sig. value or p-value).

Sig. Value or p-value: The probability of obtaining a value or more extreme under the assumption of the null hypothesis.

What the p-value really represents is the probability of making a Type I or False Positive error rate. When we set the acceptable error rate to 5% we are stating that our decision to reject the null hypothesis will have less than a 1 in 20 chance of actually coming from a population of results in which the null hypothesis is actually true. More simply stated: we are okay claiming to have found an effect that has a 5% (or less) chance of being wrong.

These probabilities are determined by the shape of the sampling distribution, which is determined by the center point (the hypothesized population value) and the spread (the standard error of the mean). Because the standard error of the mean is estimated by dividing the the sample standard deviation by the square root of the sample size, our sampling distribution will be tighter when we have larger and less variable samples. The real implication of this information is that we can detect smaller differences from the null hypothesized value with larger, less variable samples. This results in higher statistical power.

Statistical Power: the probability of rejecting the \(H_0\) when it is actually false (e.g., 1 - β)

Summarizing Null Hypothesis Significance Testing Goal:

To determine if the results of a sample reflect a true effect or are just from sampling error.

Procedure:

Compare results to sampling distribution that assumes null hypothesis is true.

Decision:

If the results have less than a 5% chance of coming from the null hypothesis distribution (i.e., p < .05), reject \(H_0\) and claim statistical significance

Confidence Intervals

Another approach to determining the generalizability of our findings is to calculate Confidence Intervals. Confidence intervals allow us to utilize the sampling distribution to generate lower and upper limits of possible population estimates for a given sample size.

Confidence Intervals: the lower and upper limits of expected population estimates from repeated sampling of the same size.

This approach is used in conjunction with the NHST approach because it yields more information. Although we can incorporate NHST into confidence intervals, we are providing possible population values rather than just a statement of whether a sample value is likely under the assumption of the null hypothesis.

Determining Confidence Intervals

Calculating confidence intervals relies on determining some of the characteristics of a sampling distribution. Rather than assuming that the mean (i.e., population value) is zero, we set that value to our sample mean. It may seem presumptious to assume that our sample mean is equal to the population mean, remember that our sample mean is an unbiased estimator of the population mean. That is, our sample mean is just as likely to overestimate the population mean as it is to underestimate the population mean. As such, we can also assume (correctly or not, we’ll never know) that our mean is in the center of our sampling mean.

With the mean of our sampling distribution set, we need to determine the bounds of the interval. Recall that the confidence interval reports the lowest and highest expected population estimate. As such, we need to set a level of expectation. We refer to this as the confidence level. Although you can choose any confidence level you wish, we often choose 95% confidence.

Why 95% and not 100%? Surely we would like to have 100% confidence in our results. Let’s relate that 100% to the possible estimates represented in our sampling distribution. If we want 100% confidence that our interval will include the population mean, we would have to include 100% of the sampling distribution. The sampling distribution is a theoretical distribution whose values range from \(-\infty\) to \(+\infty\). It is not very informative to tell someone that the actual average height of SVSU graduates could be between \(-\infty\) and \(+\infty\).

So why 95%? To answer this, we need to revisit a concept from NHST: \(\alpha\). This is the type I error rate, or the false positive rate, which is set to .05. If the probability of getting a sample mean when assuming a certain population mean (i.e., the null hypothesis) is less than 5%, we claim that the sample is unlikely enough to have come from a population with the hypothesized mean and thus reject \(H_0\). Stated another way, we do not think that values that have less than a 5% chance of coming from a sampling distribution are good candidates for estimates.

Confidence and Precision

There is an inverse relationship between confidence and precision. That is, the more confidence (i.e., certainty) we have that our interval will contain the population value, the less precise (i.e., more wide) our interval.

Now we know that the center of our confidence interval is the sample mean and that the interval should span 95% of our sampling distribution. How do we get the lower and upper limits? We will need to calculate the margin of error for our sampling distribution, which is equal to a critical value times the standard error of the mean.

Critical Value: the value of a sampling distribution that corresponds with \(\alpha\).

We usually get the critical value for a sampling distribution from a table or computer software. If you are working with z-scores, the critical value is 1.96. We would then multiply the critical value by the standard error of the mean, which is the standard deviation of our sample divided by the square root of our sample size (\(\frac{s}{\sqrt{n}}\)).

Margin of Error: the distance between the center of our sampling distribution to the bounds of the confidence interval.

M.E. = \(crit \cdot \frac{s}{\sqrt{n}}\)

To get the lower bound, we subtract the margin of error from the mean. To get the upper bound, we add the margin of error to the mean.

CI Lower Bound: Mean - (\(crit \cdot \frac{s}{\sqrt{n}}\))

CI Upper Bound: Mean + (\(crit \cdot \frac{s}{\sqrt{n}}\))

Interpreting Confidence Intervals

Confidence intervals are usually reported as “95% CI [4.33,8.52],” which should be read as “95% of samples will likely yield a value between 4.33 and 8.52.” That is, if we repeated our sampling (of the same size), we should expect all of the values to fall within that range. We can also use this information to make a judgment regarding \(H_0\). If the null hypothesized value is included in our confidence interval, we would fail to reject \(H_0\). If the null value is not included in our confidence interval, we would reject \(H_0\).

Recall that the 95% CI gives us the expected population estimates. If the null hypothesized value is not included in those expected values, we should not consider it to be a likely population value and should reject it. On the other hand, if it is included, then we would have a greater than 5% chance of making a type I error.

Summarizing Confidence Intervals Goal:

To determine the bounds of expected population estimates from samples of the same size

Procedure:

M.E. = \(Mean \pm(crit \cdot \frac{s}{\sqrt{n}})\)

Decision:

If the null hypothesized value is outside of CI, reject \(H_0\), otherwise, fail to reject \(H_0\)

Science and Statistics

Scientists want to understand the world and they do this by systematically observing how it functions. In each round of observations, scientists can only know what is true for their current study (i.e., a sample) but hope to generalize to similar settings (i.e., the population). Statistics is a branch of mathematics that provides tools for scientists to summarize the data they collect and make estimates about the population data from which they are sampling.

Statistics allow scientists to construct models of the variables they study and test those models using data collected in studies. This course will outline some common models used in experimental paradigms.