Five marbles must use groups of equal size


The five marbles exercise used in MS 150 Statistics is intended to demonstrate that the mean of a set of sample means in a closed system is equal to the population mean. By a closed system I mean one in which the sample sizes add up to the population size: the whole population under consideration is divided into samples of equal size. The exercise is also used to show that the standard error of the mean is smaller than the standard deviation of the data. I start the exercise by giving each student five marbles. After the marbles are distributed, I ask what the average number of marbles per student is. The answer is five: every student has five marbles, so the average is self-evidently five marbles per person.


On the board I multiply the number of students by five to calculate the total number of marbles distributed in the room, and then divide that total by the number of students to get back five.
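The board arithmetic, written out for a class of N students who each start with five marbles (T is the total number of marbles; the letters are introduced here for illustration). With this morning's twenty students it gives 100 marbles and an average of five.

```latex
T = 5N, \qquad \bar{x} = \frac{T}{N} = \frac{5N}{N} = 5;
\qquad N = 20:\; T = 5 \times 20 = 100, \quad \bar{x} = \frac{100}{20} = 5 .
```

Because N cancels, the starting average is five for any class size.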

I then tell the students that they can keep all of their marbles, give away some, or give away all. They do not have to end up with five marbles when the trading is over. This morning there were twenty students with five marbles each, for one hundred marbles in circulation. After a few minutes of trading I pass out a slip of paper for calculating the average number of marbles per group.


The groups might have four or five students each; this term there were five students in each group, four groups in all. I have long thought that the groups should be of equal size, but I had not worked out what might happen with groups of unequal size. For my class this means using groups of four or groups of five, whichever divides the class size evenly. The exercise has been fortunate in that the class size has usually been a multiple of either four or five.


The above shows the data before and after trading for twenty students in the 9:00 section of MS 150 Statistics. Note that the average for each group after trading is NOT five: that is part of the point of the demonstration. A sample mean is not necessarily equal to the population mean.

The post-trading data also show a larger spread than the post-trading group means: data values vary from 0 to 15, while the means run from 3.4 to 7.6. The mean of the means, however, is still five.

After trading I ask the students what the average number of marbles per person is now. This puzzles them: their gut sense is that the average must have changed, since many of them are no longer holding five marbles. Then I ask the class how many marbles are out there (in the above case, 100) and how many students there are (in the above case, 20). One hundred divided by twenty is still five. On average each student still has five marbles, even though the individual group means are not five. I then show that the mean of the means is still five: the population mean.
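A minimal Python sketch of the after-trading arithmetic, assuming four groups of five students. The counts below are made up for illustration; they are not this morning's class data. The sketch totals the marbles and the students, then compares that ratio with the average of the four group means. Fractions are used so the comparison is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction

# Hypothetical post-trading counts (invented for illustration, not class data):
# 20 students in 4 equal groups of 5, 100 marbles in total.
groups = [
    [0, 2, 5, 3, 5],
    [10, 5, 10, 5, 5],
    [5, 8, 5, 6, 4],
    [5, 5, 7, 3, 2],
]

total_marbles = sum(sum(g) for g in groups)    # marbles still in the room
total_students = sum(len(g) for g in groups)   # students in the room
population_mean = Fraction(total_marbles, total_students)

# One sample mean per group, kept as exact fractions.
group_means = [Fraction(sum(g), len(g)) for g in groups]
mean_of_means = sum(group_means) / len(group_means)

data = [x for g in groups for x in g]
print(total_marbles, total_students, float(population_mean))   # 100 20 5.0
print([float(m) for m in group_means])                          # [3.0, 7.0, 5.6, 4.4]
print(float(mean_of_means))                                     # 5.0
print(min(data), max(data), float(min(group_means)), float(max(group_means)))  # 0 10 3.0 7.0
```

In this invented data the individual counts run from 0 to 10 while the group means only run from 3 to 7, the same narrowing of spread seen in the class data.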

I realized, however, that this exercise depends on the groups being of equal size. I had suspected as much, but had not really examined the situation. With groups of equal size, even if one group "gives all" or "takes all" marbles, the mean of the means is still the population mean.
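A short derivation of why equal group sizes matter. The notation is introduced here for illustration: N students are split into k groups, group j has m_j students holding S_j marbles after trading, its group mean is S_j / m_j, and T is the total number of marbles.

```latex
\mu = \frac{T}{N}
    = \frac{\sum_{j=1}^{k} S_j}{\sum_{j=1}^{k} m_j}
    = \frac{\sum_{j=1}^{k} m_j \bar{x}_j}{\sum_{j=1}^{k} m_j},
\qquad
m_j = m \text{ for all } j \;\Rightarrow\;
\mu = \frac{m \sum_{j=1}^{k} \bar{x}_j}{k m}
    = \frac{1}{k} \sum_{j=1}^{k} \bar{x}_j .
```

Trading moves marbles between students but changes neither T nor N, so the population mean stays at T/N = 5. In general the population mean is a weighted average of the group means. Only when every group has the same size does it reduce to the plain mean of the means, no matter how lopsided the trading was.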


If the groups are not of equal size, however, there is no guarantee that the mean of the sample means will work out to the original population mean.


If the smallest group takes all of the marbles, or gives away all of its marbles, the mean of the means is no longer five. Thus this example depends on groups of equal size, and on not losing any marbles. One student curiously set aside three marbles and did not include them in the after count, which left the average below five. By adding up the marbles after trading I can tell whether any were lost, something that must be remedied for the example to work.
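A minimal Python sketch of both failure modes. It uses a hypothetical split of twenty students into one group of two and three groups of six. The group sizes and counts are invented for illustration, and the helper names `mean_of_means`, `population_mean`, and `marbles_conserved` are likewise hypothetical.

```python
from fractions import Fraction

def mean_of_means(groups):
    """Unweighted average of the group (sample) means."""
    means = [Fraction(sum(g), len(g)) for g in groups]
    return sum(means) / len(means)

def population_mean(groups):
    """Total marbles divided by total students."""
    return Fraction(sum(sum(g) for g in groups), sum(len(g) for g in groups))

def marbles_conserved(groups, expected_total=100):
    """True if no marbles went missing during trading."""
    return sum(sum(g) for g in groups) == expected_total

# Hypothetical unequal split of 20 students: one group of 2, three groups of 6.
takes_all = [[50, 50], [0] * 6, [0] * 6, [0] * 6]   # small group holds all 100
gives_all = [[0, 0], [6, 6, 6, 6, 5, 5], [6, 6, 6, 5, 5, 5], [6, 6, 6, 5, 5, 5]]

for name, groups in [("takes all", takes_all), ("gives all", gives_all)]:
    print(name, marbles_conserved(groups),
          f"{float(mean_of_means(groups)):.2f}", f"{float(population_mean(groups)):.2f}")
# takes all True 12.50 5.00
# gives all True 4.17 5.00

# Equal groups of 5, but one student sets aside 3 marbles (97 counted, not 100).
short = [[2, 5, 5, 5, 5], [5] * 5, [5] * 5, [5] * 5]
print(marbles_conserved(short), float(population_mean(short)))  # False 4.85
```

In both unequal cases all 100 marbles are still held by 20 students, so the population mean is still five. Only the unweighted mean of the group means drifts away from it. In the last case the groups are equal, but the three missing marbles pull the average down to 4.85, which the total check catches.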

The one puzzle I have yet to understand is why the standard deviation of the sample means does not land closer to the calculated standard error of the mean. Whether the sample or the population standard deviation formula is used, the standard deviation of the means is always larger than the standard error of the mean calculated from the data. I have long presumed that the issue is the small number of sample means: the number of sample means would have to equal the population size so that the n values in the formula denominators are the same. At least the standard deviation of the four sample means is less than the standard deviation of the post-trading marble data. This provides a hand-waving argument that the standard error of the mean must be less than the "standard error of the data" (that is, the standard deviation of the data).
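A minimal Python sketch of the quantities being compared, using the standard library `statistics` module. The counts are invented for illustration (four groups of five, 100 marbles). The sketch assumes the standard error is computed as the sample standard deviation of all post-trading counts divided by the square root of the group size. That choice of n is an assumption, not necessarily the formula used in class.

```python
import math
from statistics import pstdev, stdev

# Hypothetical post-trading counts (invented for illustration, not class data):
# 20 students in 4 groups of 5, 100 marbles in total.
groups = [
    [0, 2, 5, 3, 5],
    [10, 5, 10, 5, 5],
    [5, 8, 5, 6, 4],
    [5, 5, 7, 3, 2],
]
data = [x for g in groups for x in g]
group_size = len(groups[0])

# Assumption: standard error = sample SD of all the data / sqrt(group size).
sd_data = stdev(data)
standard_error = sd_data / math.sqrt(group_size)

group_means = [sum(g) / len(g) for g in groups]
sd_means_sample = stdev(group_means)       # divides by (number of means - 1)
sd_means_population = pstdev(group_means)  # divides by the number of means

print(f"SD of data:               {sd_data:.3f}")
print(f"Standard error s/sqrt(n): {standard_error:.3f}")
print(f"SD of means (sample):     {sd_means_sample:.3f}")
print(f"SD of means (population): {sd_means_population:.3f}")
```

With these made-up numbers the ordering matches the one described above. The standard error is smallest, the two standard deviations of the four means come next, and the standard deviation of the data is largest.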
