Microsoft Excel 2010 : Understanding Frequency Distributions (part 2) - Building a Frequency Distribution from a Sample

1/12/2013 11:38:02 AM

2. Building a Frequency Distribution from a Sample

Conceptually, it’s easy to build a frequency distribution. Take a sample of people or things and measure each member of the sample on the variable that interests you. Your next step depends on how much sophistication you want to bring to the project.

Tallying a Sample

One straightforward approach continues by dividing the relevant range of the variable into manageable groups. For example, suppose you obtained the weight in pounds of each of 100 people. You might decide that it’s reasonable and feasible to assign each person to a weight class that is ten pounds wide: 75 to 84, 85 to 94, 95 to 104, and so on. Then, on a sheet of graph paper, make a tally in the appropriate column for each person, as suggested in Figure 6.

Figure 6. This approach helps clarify the process, but there are quicker and easier ways.

The approach shown in Figure 1.15 uses a grouped frequency distribution, and tallying by hand into groups was the only practical option as recently as the 1980s, before personal computers came into truly widespread use. But using an Excel function named FREQUENCY(), you can get the benefits of grouping individual observations without the tedium of manually assigning individual records to groups.

Grouping with FREQUENCY()

If you assemble a frequency distribution as just described, you have to count up all the records that belong to each of the groups that you define. Excel has a function, FREQUENCY(), that will do the heavy lifting for you. All you have to do is decide on the boundaries for the groups and then point the FREQUENCY() function at those boundaries and at the raw data.

Figure 7 shows one way to lay out the data.

Figure 7. The groups are defined by the numbers in cells C2:C8.

In Figure 7, the weight of each person in your sample is recorded in column A. The numbers in cells C2:C8 define the upper boundaries of what this section has called groups, and what Excel calls bins. Up to 85 pounds defines one bin; from 86 to 95 defines another; from 96 to 105 defines another, and so on.

Note

There’s no special need to use the column headers shown in Figure 1.16, cells A1, C1, and D1. In fact, if you’re creating a standard Excel chart as described here, there’s no great need to supply column headers at all. If you don’t include the headers, Excel names the data Series1 and Series2. If you use the pivot chart instead of a standard chart, though, you will need to supply a column header for the data shown in Column A in Figure 1.16.

The count of records within each bin appears in D2:D8. You don’t count them yourself—you call on Excel to do that for you, and you do that by means of a special kind of Excel formula, called an array formula. Y

1.	Select the range of cells that the results will occupy. In this case, that’s the range of cells D2:D8.
2.	Type, but don’t yet enter, the formula =FREQUENCY(A2:A101,C2:C8) which tells Excel to count the number of records in A2:A101 that are in each bin defined by the numeric boundaries in C2:C8.
3.	After you have typed the formula, hold down the Ctrl and Shift keys simultaneously and press Enter. Then release all three keys. This keyboard sequence notifies Excel that you want it to interpret the formula as an array formula.

Note

When Excel interprets a formula as an array formula, it places curly brackets around the formula in the formula box.

The results appear very much like those in cells D2:D8 of Figure 1.16, of course depending on the actual values in A2:A101 and the bins defined in C2:C8. You now have the frequency distribution but you still should create the chart. Here are the steps, assuming the data is located as in Figure 1.16:

1.	Select the data you want to chart—that is, the range C1:D8.
2.	Click the Insert tab, and then click the Column button in the Charts group.
3.	Choose the Clustered Column chart type from the 2-D charts. A new chart appears, as shown in Figure 8. Because columns C and D on the worksheet both contain numeric values, Excel initially thinks that there are two data series to chart: one named Bins and one named Frequency. Figure 8. Values from both columns are charted as data series at first because they’re all numeric.
4.	Fix the chart by clicking Select Data in the Design tab that appears when a chart is active. The dialog box shown in Figure 9 appears. Figure 9. You can also use the Select Data dialog box to add another data series to the chart.
5.	Click the Edit button under Horizontal (Category) Axis Labels. A new Axis Labels dialog box appears; drag through cells C2:C8 to establish that range as the basis for the horizontal axis. Click OK.
6.	Click the Bins label in the left list box shown in Figure 9. Click the Remove button to delete it as a charted series. Click OK to return to the chart.
7.	Remove the chart title and series legend, if you want, by clicking each and pressing Delete.

At this point you will have a normal Excel chart that looks much like the one shown in Figure 7.

Tip

You can use the same range for the Data argument and the Bins argument in the FREQUENCY() function: for example, =FREQUENCY(A1:A101,A1:A101). Don’t forget to enter it as an array formula. This is a convenient way to get Excel to treat every recorded value as its own bin, and you get the count for every unique value in the range A1:A101.

Grouping with Pivot Tables

Another approach to constructing the frequency distribution is to use a pivot table. A related tool, the pivot chart, is based on the analysis that the pivot table does. I prefer this method to using an array formula that employs FREQUENCY() because once the initial groundwork is done, I can use the same pivot table to do analyses that go beyond the basic frequency distribution. But if all I want is a quick group count, FREQUENCY() is usually the faster way.

Building the pivot table (and the pivot chart) requires you to specify bins, just as the use of FREQUENCY() does, but that happens a little further on.

Note

A reminder: When you use the FREQUENCY() method described in the prior section, a header at the top of the column of raw data is helpful but not required. When you use the pivot table method, the header is required.

Begin with your sample data in A1:A101, just as before. Select any one of the cells in that range and then follow these steps:

1.	Click the Insert tab. Click the PivotTable drop-down in the Tables group and choose PivotChart from the drop-down list. (When you choose a pivot chart, you automatically get a pivot table along with it.) The dialog box in Figure 10 appears. Figure 10. If you begin by selecting a single cell in the range containing your input data, Excel automatically proposes the range of adjacent cells that contain data.
2.	Click the Existing Worksheet option button. Click in the Location range edit box and then click some blank cell in the worksheet that has other empty cells to its right and below it.
3.	Click OK. The worksheet now appears as shown in Figure 11. Figure 11. With one field only, you normally use it for both Axis Fields (Categories) and Summary Values.
4.	Click the Weight field in the PivotTable Field List and drag it into the Axis Fields (Categories) area.
5.	Click the Weight field again and drag it into the Σ Values area. Despite the uppercase Greek sigma, which is a summation symbol, the Σ Values in a pivot table can show averages, counts, standard deviations, and a variety of statistics other than the sum. However, Sum is the default statistic for a numeric field.
6.	The pivot table and pivot chart are both populated as shown in Figure 12. Right-click any cell that contains a row label, such as C2. Choose Group from the shortcut menu. Figure 12. The Weight field contains numeric values only, so the pivot table defaults to Sum as the summary statistic. The Grouping dialog box shown in Figure 13 appears. Figure 13. This step establishes the groups that the FREQUENCY() function refers to as bins.
7.	In the Grouping dialog box, set the Starting At value to 81 and enter 10 in the By box. Click OK.
8.	Right-click a cell in the pivot table under the header Sum of Weight. Choose Value Field Settings from the shortcut menu. Select Count in the Summarize Value Field By list box, and then click OK.
9.	The pivot table and chart reconfigure themselves to appear as in Figure 14. To remove the field buttons in the upper- and lower-left corners of the pivot chart, select the chart, click the Analyze tab, click the Field Buttons button, and select Hide All. Figure 14. This sample’s frequency distribution has a slight right skew but is reasonably close to a normal curve.

Others

- Microsoft Excel 2010 : Understanding Frequency Distributions (part 1) - Using Frequency Distributions

- Microsoft Excel 2010 : Charting Numeric Variables in Excel

- Microsoft Word 2010 : Working with Themes

- Microsoft Word 2010 : Copying Formatting, Working with Lists

- Microsoft Access 2010 : Using Forms to Enter and Edit Table Data (part 3) - Copying Records Within a Form

- Microsoft Access 2010 : Using Forms to Enter and Edit Table Data (part 2) - Using a Form to Delete Records from a Table

- Microsoft Access 2010 : Using Forms to Enter and Edit Table Data (part 1) - Moving from Record to Record in a Form, Undoing Changes Made Within a Form

- Microsoft OneNote 2010 : Using Page Templates

- Microsoft OneNote 2010 : Working with Subpages

- Microsoft Visio 2010 : Printing Basics (part 2) - Print Preview, Experimenting with Printing Without Wasting Trees

Microsoft Excel 2010 : Understanding Frequency Distributions (part 2) - Building a Frequency Distribution from a Sample

2. Building a Frequency Distribution from a Sample

Tallying a Sample

Figure 6. This approach helps clarify the process, but there are quicker and easier ways.

Grouping with FREQUENCY()

Figure 7. The groups are defined by the numbers in cells C2:C8.

Figure 8. Values from both columns are charted as data series at first because they’re all numeric.

Figure 9. You can also use the Select Data dialog box to add another data series to the chart.

Grouping with Pivot Tables

Figure 10. If you begin by selecting a single cell in the range containing your input data, Excel automatically proposes the range of adjacent cells that contain data.

Figure 11. With one field only, you normally use it for both Axis Fields (Categories) and Summary Values.

Figure 12. The Weight field contains numeric values only, so the pivot table defaults to Sum as the summary statistic.

Figure 13. This step establishes the groups that the FREQUENCY() function refers to as bins.

Figure 14. This sample’s frequency distribution has a slight right skew but is reasonably close to a normal curve.