Microsoft Excel 2010 : Measuring Variability with the Range

12/15/2014 8:32:25 PM

Just as there are three primary ways to measure the central tendency in a frequency distribution, there’s more than one way to measure variability.

One way of measuring variability is the range: the maximum value in a set minus the minimum value. It’s usually helpful to know the range of the values in a frequency distribution, if only to guard against errors in data entry. For example, suppose you have a list in an Excel worksheet that contains the body temperatures, measured in degrees Fahrenheit, of 100 men. If the calculated range, the maximum temperature minus the minimum temperature, is 888 degrees, you know pretty quickly that someone dropped a decimal point somewhere. Perhaps you entered 986 instead of 98.6.
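The same sanity check can be sketched outside Excel. The following Python fragment is only an illustration: the temperature values and the 10-degree plausibility threshold are assumptions made for the example, not figures from the worksheet described above.

```python
# Hypothetical body temperatures in degrees Fahrenheit.
# One entry (986) is missing its decimal point.
temperatures = [98.0, 98.6, 99.1, 98.2, 986, 98.4]


def value_range(values):
    """Return the maximum value minus the minimum value."""
    return max(values) - min(values)


# Assumed threshold: temperatures in one group of people should not
# differ by more than about 10 degrees.
MAX_PLAUSIBLE_RANGE = 10

temperature_range = value_range(temperatures)
if temperature_range > MAX_PLAUSIBLE_RANGE:
    print(f"Range of {temperature_range:.1f} degrees looks wrong; check the data entry.")
```

A range this large says nothing about which value is wrong, only that it is worth looking at the maximum and the minimum before going any further.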

The range as a statistic has some attributes that make it unsuitable for use in much statistical analysis. Nevertheless, in part because it’s much easier to calculate by hand than other measures of variability, the range can be useful.

Note

Historically, particularly in the area of statistical process control (a technique used in the management of quality in manufacturing), some well-known practitioners have preferred the range as an estimate of variability. They claim, with some justification, that a statistic such as the standard deviation is influenced both by the underlying nature of a manufacturing system and by special events such as human errors that cause a system to go out of control.

It’s true that the standard deviation takes every value into account in calculating the overall variability in a set of numbers. It doesn’t follow, though, that the range is sensitive only to the occasional problems that require detection and correction.

The use of the range as the sole measure of variability in a data set has some drawbacks, but it’s a good idea to calculate it anyway to better understand the nature of your data. For example, Figure 1 shows a frequency distribution that can be sensibly described in part by using the range.

Figure 1. The distribution is approximately symmetric, and the range is a useful descriptor.

Because an appreciable number of the observations appear at each end of the distribution, it’s useful to know that the values span a range of 34. Figure 2 presents a different picture. It takes only one extreme value for the range to give a misleading impression of the degree of variability in a data set.

Figure 2. The solitary value at the top of the distribution creates a range estimate that misdescribes the distribution.
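The data behind Figure 2 is not reproduced here, so the following Python sketch uses hypothetical scores to show the same effect: adding one solitary extreme value changes the range dramatically even though the rest of the distribution is untouched.

```python
def value_range(values):
    """Return the maximum value minus the minimum value."""
    return max(values) - min(values)


# Hypothetical scores clustered between 40 and 50.
scores = [40, 42, 43, 45, 45, 46, 47, 48, 50]

# The same scores plus one solitary value far above the rest.
scores_with_outlier = scores + [95]

range_without = value_range(scores)            # 50 - 40
range_with = value_range(scores_with_outlier)  # 95 - 40

print("Range without the extreme value:", range_without)
print("Range with the extreme value:   ", range_with)
```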

The size of the range depends entirely on the largest and the smallest values. The range does not change unless one or both of those values, the maximum and the minimum, change. All the other values in the frequency distribution could change and the range would remain the same. The other values could be distributed more homogeneously, or they could bunch up near one or two modes, and the range would still not change.
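As a minimal illustration of that point, the Python sketch below compares two hypothetical data sets that share the same minimum and maximum. The values are invented for the example. The range cannot tell the two sets apart, but the standard deviation can.

```python
import statistics


def value_range(values):
    """Return the maximum value minus the minimum value."""
    return max(values) - min(values)


# Two hypothetical data sets with the same minimum (10) and maximum (50).
spread_out = [10, 20, 30, 40, 50]   # values spaced evenly
bunched_up = [10, 30, 30, 30, 50]   # interior values bunched at one mode

# Both ranges are 50 - 10.
print("Ranges:", value_range(spread_out), value_range(bunched_up))

# The sample standard deviations differ because they use every value.
print("Standard deviations:",
      statistics.stdev(spread_out),
      statistics.stdev(bunched_up))
```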

Furthermore, the size of the range depends heavily on the number of values in the frequency distribution. See Figure 3 for examples that compare the range with the standard deviation for samples of various sizes, drawn from a population where the standard deviation is 15.

Figure 3. Samples of sizes from 2 to 20 are shown in columns B through F, and statistics appear in rows 22 through 24.

Notice that the mean and the standard deviation are relatively stable across the five sample sizes, but the range more than doubles, from 27 to 58, as the sample size grows from 2 to 20. That’s generally undesirable, particularly when you want to make inferences about a population on the basis of a sample. You would not want your estimate of the variability of values in a population to depend on the size of the sample that you take.

The effect that you see in Figure 3 is due to the fact that the likelihood of obtaining a relatively large or small value increases as the sample size increases. (This is true mainly of distributions such as the normal curve that contain many of their observations near the middle of the range.) Although the sample size has an effect on the calculated range, its effect on the standard deviation is much less pronounced because the standard deviation takes into account all the values in the sample, not just the extremes.
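You can check this behavior with a small simulation. The Python sketch below is not the worksheet in Figure 3: the population mean of 100, the particular sample sizes, the number of trials, and the random seed are all assumptions chosen for illustration. Only the population standard deviation of 15 comes from the text. The sketch draws many samples of each size from a normal distribution and averages the range and the standard deviation.

```python
import random
import statistics

random.seed(0)  # fixed seed so that repeated runs give the same numbers

POPULATION_MEAN = 100              # assumed; the text gives only the SD
POPULATION_SD = 15                 # the population SD mentioned in the text
SAMPLE_SIZES = [2, 5, 10, 15, 20]  # assumed sizes between 2 and 20
TRIALS = 2000                      # samples drawn for each sample size


def value_range(values):
    """Return the maximum value minus the minimum value."""
    return max(values) - min(values)


average_range = {}
average_sd = {}
for n in SAMPLE_SIZES:
    ranges = []
    sds = []
    for _ in range(TRIALS):
        sample = [random.gauss(POPULATION_MEAN, POPULATION_SD) for _ in range(n)]
        ranges.append(value_range(sample))
        sds.append(statistics.stdev(sample))
    average_range[n] = statistics.mean(ranges)
    average_sd[n] = statistics.mean(sds)

for n in SAMPLE_SIZES:
    print(f"n={n:2d}  average range={average_range[n]:5.1f}  "
          f"average SD={average_sd[n]:5.1f}")
```

Averaging over many samples smooths out the luck of any single draw, so the growth of the range with sample size is easier to see than it is in one worksheet's worth of data.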

Excel has no RANGE() function. To get the range, you must use something such as the following, substituting the appropriate range address for the one shown:

=MAX(A2:A21)-MIN(A2:A21)
