Unit 2 – Descriptive Statistics: Theory and Logic

INTRODUCTION

In Unit 1, you read about the difference between descriptive statistics and inferential statistics in Chapter 1 of
your Warner text. For the next two units, we will focus on the theory, logic, and application of descriptive
statistics. This introduction focuses on scales of measurement, measures of central tendency and dispersion, the
visual inspection of histograms, and the detection and processing of outliers.
An important concept in understanding descriptive statistics is the scale of measurement. The Warner (2013)
text defines four scales of measurement—nominal, ordinal, interval, and ratio:
• Nominal data refer to numbers arbitrarily assigned to represent group membership, such as gender
(male = 1; female = 2). Nominal data are useful for comparing groups, but apart from the mode, measures of
central tendency and dispersion are meaningless for them.
• Ordinal data represent ranked data, such as coming in first, second, or third in a marathon. However,
ordinal data do not tell us how much of a difference there is between measurements. The first-place and
second-place finishers could finish 1 second apart, whereas the third-place finisher might arrive 2 minutes later.
Ordinal data lack equal intervals.
• Interval data refer to equal intervals between data points. An example is degrees measured in
Fahrenheit. Interval data lack a “true zero” value: 0 degrees Fahrenheit does not mean an absence of temperature (water freezes at 32 degrees Fahrenheit, not at 0).
• Ratio data do have a true zero, such as heart rate, where “0” represents a heart that is not beating. This
is often seen as “count” data in social research. For example, how many days did an employee miss from
work? Zero is a meaningful unit in this example.
These four scales of measurement are routinely reviewed in introductory statistics textbooks as the classic way
of differentiating measurements. However, the boundaries between the measurement scales are fuzzy. For
example, is intelligence quotient (IQ) measured on the ordinal or interval scale? Recently, researchers have
argued for a simpler dichotomy in terms of selecting an appropriate statistic: categorical versus continuous
measures.
• A categorical variable is a nominal variable. It simply categorizes things according to group membership
(for example, apple = 1, banana = 2, grape = 3).
• A continuous measure represents a difference in magnitude of something, such as a continuum of “low
to high” statistics anxiety. In contrast to categorical variables designated by arbitrary values, a
quantitative measure allows for a variety of arithmetic operations, including equal (=), less than (<),
greater than (>), addition (+), subtraction (−), multiplication (* or ×), and division (/ or ÷). Arithmetic
operations generate a variety of descriptive statistics discussed next.
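To make the distinction concrete, here is a minimal sketch in Python (the fruit codes and anxiety scores are invented for illustration; SPSS is the tool you will actually use in this course):

```python
# Categorical codes only label group membership; continuous scores support arithmetic.
from collections import Counter
from statistics import mean

fruit = [1, 2, 2, 3, 1, 2]            # categorical: 1 = apple, 2 = banana, 3 = grape
anxiety = [2.5, 3.0, 4.5, 1.0, 3.5]   # continuous: statistics anxiety, low to high

print(Counter(fruit))   # counting group membership is meaningful
print(mean(anxiety))    # arithmetic is meaningful for continuous scores: 2.9
# mean(fruit) would also run, but "the average fruit is 1.83" is meaningless.
```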
Measures of Central Tendency and Dispersion
Chapter 2 of Warner (2013) reviews descriptive statistics that measure central tendency (mean, median, mode)
and dispersion (range, sum of squares, variance, standard deviation). To visualize central tendency and
dispersion, refer to Figure 2.5 on page 46 of the Warner text for an illustration of how heart rate data are
represented in a histogram. The horizontal axis represents heart rate (“hr”). The vertical axis represents the total
number of people who were recorded at a particular heart rate (“Frequency”). Measures of centrality summarize
where data clump together at the center of a distribution of scores. (For example, in Figure 2.5 this occurs
around hr = 74.)
To simplify, consider the following measured heart rates: 65, 70, 75, 75, 130.
The simplest measure of central tendency is the mode. It is the most frequent score within a distribution of
scores (for example, two scores of hr = 75). Technically, in a distribution of scores, you can have two or more
modes. An advantage of the mode is that it can be applied to categorical data. It is also not sensitive to
extreme scores.
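If you want to verify the mode outside of SPSS, a one-line check in Python (using the heart rates from the example above) might look like this:

```python
from statistics import multimode

heart_rates = [65, 70, 75, 75, 130]
print(multimode(heart_rates))   # [75]; multimode returns every mode if two or more scores tie
```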
The median is the positional middle of a distribution. All scores are arranged in ascending order, and the score
in the middle is the median. In the five heart rates above, the middle score is 75. If
you have an even number of scores, the average of the two middle scores is used. The median also has the
advantage of not being sensitive to extreme scores.
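A quick sketch of the same logic (Python's statistics module; the even-numbered list is an illustrative subset of the example data):

```python
from statistics import median

print(median([65, 70, 75, 75, 130]))   # 75: the middle score of five ordered values
print(median([65, 70, 75, 75]))        # 72.5: with an even count, the two middle scores are averaged
```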
The mean is probably what most people consider to be an average score. In the example above, the mean
heart rate is (65 + 70 + 75 + 75 + 130) ÷ 5 = 83. Although the mean is more sensitive to extreme scores (such as
130) relative to the mode and median, it can be more stable across samples, and it is the best estimate of the
population mean. It is also used in many of the inferential statistics studied in this course, such as t tests and
analysis of variance (ANOVA).
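The same arithmetic, sketched in Python for the five heart rates above:

```python
heart_rates = [65, 70, 75, 75, 130]
mean_hr = sum(heart_rates) / len(heart_rates)
print(mean_hr)   # 83.0; the extreme score of 130 pulls the mean well above the median of 75
```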
In contrast to measures of central tendency, measures of dispersion summarize how far apart data are spread on
a distribution of scores. The range is a basic measure of dispersion quantifying the distance between the lowest
score and the highest score in a distribution (for example, 130 − 65 = 65). A deviance represents the difference
between an individual score and the mean. For example, the deviance for the first heart rate score (65) is 65 −
83, which is −18. By calculating the deviance for each score above from a mean of 83, we arrive at −18, −13,
−8, −8, and +47. Summing all of the deviances equals 0, which is not a very informative measure of dispersion.
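A short Python sketch showing why the raw deviances are not useful on their own:

```python
heart_rates = [65, 70, 75, 75, 130]
m = sum(heart_rates) / len(heart_rates)              # 83.0
deviations = [x - m for x in heart_rates]
print(deviations)        # [-18.0, -13.0, -8.0, -8.0, 47.0]
print(sum(deviations))   # 0.0: deviations above and below the mean always cancel out
```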
A somewhat more informative measure of dispersion is the sum of squares (SS), which you will see again in Units 9
and 10 in the study of analysis of variance (ANOVA). To get around the problem of summing to zero, the sum of
squares involves calculating the square of each deviation and then summing those squares. In the example
above, SS = (−18)² + (−13)² + (−8)² + (−8)² + (+47)² = 324 + 169 + 64 + 64 + 2209 = 2830. The
problem with SS is that it increases as more data points are added (Field, 2013), and it still is not a very
informative measure of dispersion.
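The same calculation in Python, confirming the value of 2830:

```python
heart_rates = [65, 70, 75, 75, 130]
m = sum(heart_rates) / len(heart_rates)
ss = sum((x - m) ** 2 for x in heart_rates)   # square each deviation, then sum
print(ss)   # 2830.0
```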
This problem is solved by next calculating the sample variance (s²), which is the average squared distance
between the mean and a particular score. Instead of dividing SS by 5 for the example above, we divide by N − 1,
or 4; see pages 56–57 of your Warner text for an explanation. The variance is therefore SS ÷ (N − 1), or 2830 ÷ 4
= 707.5. The problem with interpreting variance is that it is the average distance of “squared units” from the
mean. What is, for example, a “squared” heart rate score?
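Sketched in Python (note that statistics.variance uses the N − 1 denominator described above):

```python
from statistics import variance

heart_rates = [65, 70, 75, 75, 130]
print(variance(heart_rates))   # 707.5, i.e., SS of 2830 divided by N - 1 = 4
```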
The final step is calculating the sample standard deviation (s), which is simply the square root of
the sample variance, or in our example, √707.5 = 26.60. The sample standard deviation represents the average
deviation of scores from the mean. In other words, the average distance of heart rate scores to the mean is 26.6
beats per minute. If the extreme score of 130 is replaced with a score closer to the mean, such as 90, then s =
9.35. Thus, small standard deviations (relative to the mean) represent a small amount of dispersion; large
standard deviations (relative to the mean) represent a large amount of dispersion (Field, 2013). The standard
deviation is an important component of the normal distribution.
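Both standard deviations can be reproduced with a few lines of Python:

```python
from statistics import stdev

print(stdev([65, 70, 75, 75, 130]))   # 26.599..., the square root of the variance 707.5
print(stdev([65, 70, 75, 75, 90]))    # 9.354...; replacing 130 with 90 greatly reduces the dispersion
```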
Visual Inspection of a Distribution of Scores
An assumption of the statistical tests that you will study in this course is that the scores for a dependent variable
are normal (or approximately normal) in shape. This assumption is first checked by examining a histogram of the
distribution. Figure 4.19 in the Warner text (p. 147) represents a distribution of heart rate scores that are
approximately normal in shape and visualized in terms of a bell-shaped curve. Notice that the tails of the
distribution are approximately symmetrical, meaning that they are near mirror images to the left and right of the
mean. This distribution technically has two modes at hr = 70 and hr = 76, but the close proximity of these
modes suggests a unimodal distribution.
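If you want to produce a comparable plot yourself, a minimal sketch (Python with matplotlib; the heart rates here are simulated rather than taken from the Warner data set) is shown below. In SPSS, the same kind of chart can be requested from the Frequencies procedure or the Chart Builder.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
hr = rng.normal(loc=73, scale=8, size=300)   # simulated, roughly bell-shaped heart rates

plt.hist(hr, bins=20, edgecolor="black")     # frequency of scores at each heart rate
plt.xlabel("hr")
plt.ylabel("Frequency")
plt.show()
```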
Departures from normality and symmetry are assessed in terms of skew and kurtosis. Skewness is the tilt, or
extent to which a distribution deviates from symmetry around the mean. A distribution that is positively skewed has a
longer tail extending to the right (the “positive” side of the distribution) as shown in Figure 4.20 of the Warner
text (p. 148). A distribution that is negatively skewed has a longer tail extending to the left (the “negative” side
of the distribution) as shown in Figure 4.21 of the Warner text (p. 149). In contrast to skewness, kurtosis is
defined as the peakedness of a distribution of scores. Figure 4.22 of the Warner text (p. 150) illustrates a
distribution with normal kurtosis, negative kurtosis (a “flat” distribution; platykurtic), and positive kurtosis (a
“sharp” peak; leptokurtic).
The use of these terms is not limited to your description of a distribution following a visual inspection. They are
included in your list of descriptive statistics and should be reported when you analyze your distribution of scores.
Skewness and kurtosis values near zero indicate a shape that is, respectively, symmetric and close to normal in peakedness. Values
of −1 to +1 are considered ideal, whereas values ranging from −2 to +2 are considered acceptable for
psychometric purposes.
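One way to obtain numeric skewness and kurtosis values outside of SPSS is sketched below (scipy is assumed to be available; the data are simulated, and SPSS's bias-corrected formulas will give slightly different numbers):

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(1)
scores = rng.normal(loc=74, scale=10, size=500)   # simulated, approximately normal heart rates

print(skew(scores))       # near 0 for a symmetric distribution
print(kurtosis(scores))   # excess kurtosis: near 0 for a normal peak, negative = flat, positive = sharp
```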
Outliers
Outliers are defined as extreme scores in either the left or right tail of a distribution, and they can influence the
overall shape of that distribution. There are a variety of methods for identifying and adjusting for outliers.
Outliers can be detected by calculating z scores (reviewed in Unit 4) or by inspection of a box plot. Once an
outlier is detected, the researcher must determine how to handle it. The outlier may represent a data entry
error that should be corrected, or the outlier may be a valid extreme score. The outlier can be left alone,
deleted, or transformed. Whatever decision is made regarding an outlier, the researcher must be transparent
and justify his or her decision.
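As one illustration of the z-score approach, the sketch below flags scores more than 1.96 standard deviations from the mean (the cutoff and the data are assumptions chosen for illustration, not values prescribed by the Warner text):

```python
from statistics import mean, stdev

hr = [62, 65, 68, 70, 71, 72, 73, 74, 74, 75, 75, 76, 77, 78, 80, 82, 84, 130]
m, s = mean(hr), stdev(hr)

# Flag any score whose standardized distance from the mean exceeds the cutoff.
flagged = [(x, round((x - m) / s, 2)) for x in hr if abs((x - m) / s) > 1.96]
print(flagged)   # [(130, 3.7)]: the researcher must still decide whether 130 is an error or a valid score
```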
References
Field, A. (2013). Discovering statistics using IBM SPSS (4th ed.). Thousand Oaks, CA: Sage.
Warner, R. M. (2013). Applied statistics: From bivariate through multivariate techniques (2nd ed.). Thousand
Oaks, CA: Sage.
OBJECTIVES
To successfully complete this learning unit, you will be expected to:
1. Analyze the strengths and limitations of descriptive statistics.
2. Identify previous experience with and future applications of descriptive statistics.
3. Analyze the purpose and reporting of confidence intervals.
4. Discuss standard error and confidence intervals.
Unit 2 Study 1 – Readings
Use your Warner text, Applied Statistics: From Bivariate Through Multivariate Techniques , to complete
the following:
• Read Chapter 2, “Basic Statistics, Sampling Error, and Confidence Intervals,” pages 41–80. This
reading addresses the following topics:
◦ Sample mean (M).
◦ Sum of squared deviations (SS).
◦ Sample variance (s²).
◦ Sample standard deviation (s).
◦ Sample standard error (SE).
◦ Confidence intervals (CIs).
• Read Chapter 4, "Preliminary Data Screening," pages 125–184. This reading addresses the following
topics:
◦ Problems in real data.
◦ Identification of errors and inconsistencies.
◦ Missing values.
◦ Data screening for individual variables.
◦ Data screening for bivariate analysis.
◦ Data transformations.
◦ Reporting preliminary data screening.
SOE Learners – Suggested Readings
Young, J. R., Young, J. L., & Hamilton, C. (2014). The use of confidence intervals as a meta-analytic lens to
summarize the effects of teacher education technology courses on preservice teacher TPACK. Journal
of Research on Technology in Education, 46(2), 149–172.
Unit 2 Study 2 – Assignment Preparation
This unit provides context for an upcoming assignment on histograms and descriptive statistics in Unit 3.
Look ahead at the instructions and scoring guide for the Unit 3 assignment so that you have it in mind as you
study the materials and complete the activities in this unit.
Software Installation
Make sure that IBM SPSS Statistics Standard GradPack is fully licensed, installed on your computer, and
running properly. It is important that you have either the Standard or Premium version of SPSS that includes
the full range of statistics. Proper software installation is required in order to complete your first SPSS data
assignment in Unit 3.
Next, click grades.sav to download the file to your computer.
• Important: Do not use the original George and Mallery grades.sav file, as the course room grades.sav
is modified for 7864.
You will use grades.sav throughout the course. The definitions of the variables in the grades.sav data set are
found in Chapter 1 of your IBM SPSS Statistics Step by Step text. Understanding these variable definitions is
necessary for interpreting SPSS output.
Next week, you will define values and scales of measurement for all variables in your grades.sav file.
