### Marbles as demonstrators of distribution of the sample mean and confidence intervals

On Monday the tenth of October I introduced the normal distribution as a random distribution by scattering plastic beads and showing that the mess on the floor was distributed normally. On Wednesday I gave each student five marbles.

Marble trading in progress

I told them to keep all, trade away some, or trade away all - their choice. Afterwards some students had none, some still had five, and some had many marbles.

Before and after trading marbles per student and quintet averages

The data above, from the 9:00 section, show that while many students wound up with five marbles after trading or choosing not to trade, some students had more marbles and a few had none. I asked what the average number of marbles per student was after trading. Most did not realize that the population mean for the class remained five.

Statistics before and after marble trading and for the quintets

That the population mean number of marbles per student remains five can be seen from the first two rows of the table above: neither the number of marbles nor the number of students in the room changed.
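The arithmetic behind that can be sketched in a few lines of Python. This is only an illustration: the section size of 20, the number of trades, and the random one-marble-at-a-time trading rule are assumptions, not a record of how the students actually traded.

```python
import random

def trade_marbles(holdings, n_trades, rng):
    """Move one marble at a time between two randomly chosen students.

    Hypothetical trading rule: real students trade however they like,
    but any trade only moves marbles around, so the total is unchanged.
    """
    holdings = list(holdings)
    for _ in range(n_trades):
        giver, receiver = rng.sample(range(len(holdings)), 2)
        if holdings[giver] > 0:  # a student with no marbles cannot give one
            holdings[giver] -= 1
            holdings[receiver] += 1
    return holdings

rng = random.Random(1)
before = [5] * 20                      # assumed section of 20 students
after = trade_marbles(before, 200, rng)

mean_before = sum(before) / len(before)
mean_after = sum(after) / len(after)
print(sorted(after))                   # individual counts spread out
print(mean_before, mean_after)         # the mean per student does not move
```

However the individual counts end up, the total is still 100 marbles across 20 students, so the mean stays at five.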

Data recording sheet for the quintets

I then had the students form groups of five and calculate the average number of marbles per student in each group. This provided the basis for showing that the distribution of the means is narrower than the distribution of the data, and that the distribution of the means is centered on the known population mean. Note that the standard deviation of the means does not equal the standard error of the mean. I suspect, but cannot show, that this is related to the small number of means. A resampling spreadsheet of paper aircraft distances and FiboBelly data, with a hundred resamples of a data set, suggests that the standard error for the original data set is roughly equal to the standard deviation of the 100 resampled means. That is the basis for my belief that the number of means drives the difference between the standard deviation of the means and the standard error of the mean calculated from the original data.
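The resampling comparison can be sketched in Python using only the standard library. This is not the spreadsheet itself: the data set is made up, and the choice of resamples the same size as the data, drawn with replacement, is an assumption about how the resampling was done.

```python
import random
import statistics

rng = random.Random(42)

# Made-up data set standing in for one column of measurements
# (not the paper aircraft or FiboBelly data).
data = [rng.gauss(10.0, 3.0) for _ in range(30)]
n = len(data)

# Standard error of the mean from the original data: sx / sqrt(n).
standard_error = statistics.stdev(data) / n ** 0.5

# 100 resamples, each drawn with replacement and the same size as the data.
resampled_means = [statistics.mean(rng.choices(data, k=n)) for _ in range(100)]
sd_of_means = statistics.stdev(resampled_means)

# The same statistic computed from only the first five means, roughly the
# number of quintet means one section produces.
sd_of_five_means = statistics.stdev(resampled_means[:5])

ratio = sd_of_means / standard_error
print(round(standard_error, 3), round(sd_of_means, 3), round(ratio, 2))
print(round(sd_of_five_means, 3))
```

With 100 means the ratio should land near one. A standard deviation estimated from only five means varies much more from run to run, which is at least consistent with the suspicion that the number of means matters.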

Distribution of data and means

For any one section, which might have twenty students and five means, the resulting histograms as line charts are not the most convincing demonstration that the distribution of the means is narrower than the distribution of the data, which is the focus of the exercise and of chapter eight in the text.

Multi-term data

A multi-term data set of 105 students and 24 groups still demonstrates that the spread of the means is narrower than the spread of the data, although the standard deviation of the means is still not the standard error of the mean calculated from the data. Again, my sense is that the difference between the sample size of the data and the number of means remains problematic.

On Monday I mention that the area under the normal curve provides the probability of a value being within the corresponding domain for that area. In the past I spent far more time on this concept, devoting much of chapter seven to these calculations. The Guidelines for Assessment and Instruction in Statistics Education from the American Statistical Association pushed me to alter my curriculum and include more data exploration, more "hands-on" time with data and data analysis, and presentations by students. Something had to give, and chapter seven was the most expendable. Confidence intervals can be approached without calculating areas under the curve. Some hand-waving arguments have to be made, but the introduction of ordinary and extraordinary z-scores in chapter two makes this possible.

Thus on Friday I move into ±2SE confidence intervals for n ≥ 30 and use an exercise in throwing paper aircraft from the porch to show that a sample can capture a previously predicted population mean. In reality this experiment almost always works because of the extreme variation in the flight distances. The large standard deviation leads to a large confidence interval that helps compensate for the wind being a confounding factor term after term. Only during an El Niño drought are all bets off on obtaining the population mean flight distance (strong easterly winds are more common during an El Niño spring).
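As a rough illustration of the ±2SE calculation, here is a minimal Python sketch. The flight distances are randomly generated stand-ins and the predicted mean of 650 cm is invented for the example. Neither comes from the class data.

```python
import random
import statistics

def two_se_interval(sample):
    """Return (lower, upper) for mean ± 2 standard errors; meant for n >= 30."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5   # standard error of the mean
    return mean - 2 * se, mean + 2 * se

rng = random.Random(7)
# Invented flight distances in centimeters, deliberately widely spread.
distances = [rng.uniform(100, 1200) for _ in range(30)]

predicted_mean = 650   # hypothetical prediction made before the flights
lower, upper = two_se_interval(distances)
captured = lower <= predicted_mean <= upper
print(round(lower), round(upper), captured)
```

The wider the spread in the distances, the larger the standard error and the wider the interval, which is why a highly variable measurement makes the prediction easier to capture.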

On the following Monday I want to go after small sample sizes, less than 30, and tackle "replacing" the ±2 with ±t-critical using the TINV function. This term I used a modified return to five marbles. I gave pairs of students five marbles and a data sheet.

Marble mass data sheet

The students then massed each marble, calculated the mean mass per marble, and used their data to calculate a 95% confidence interval for the population mean mass of all of the marbles. I noted that in this case, unlike the Friday paper aircraft exercise, the actual population mean is not known.
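For readers working outside a spreadsheet, the small-sample interval can be sketched in Python with only the standard library. The `t_critical` helper below is a hypothetical stand-in for the two-tailed `TINV(probability, degrees of freedom)` lookup; it finds the value numerically rather than calling any statistics package. The five masses are made up.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """P(|T| > t), integrating the density over [0, t] with Simpson's rule."""
    if t == 0:
        return 1.0
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (h / 3) * total

def t_critical(alpha, df):
    """Two-tailed critical value by bisection (assumes the answer is below 100)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

masses = [5.2, 4.9, 5.4, 5.0, 5.1]   # made-up marble masses in grams
n = len(masses)
mean = statistics.mean(masses)
se = statistics.stdev(masses) / math.sqrt(n)
tc = t_critical(0.05, n - 1)          # replaces the 2 in ±2SE
lower, upper = mean - tc * se, mean + tc * se
print(round(tc, 3), round(lower, 2), round(upper, 2))
```

For five marbles there are four degrees of freedom, and the critical value comes out noticeably larger than 2, so the interval is wider than the ±2SE rule would give.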

The complication, of course, is exactly that the population mean is not known, and there are some three hundred marbles in the container. At this point a multi-section data sheet perhaps provides a pseudo-population mean mass for the marbles. After class, once the data were entered, I was able to construct 95% confidence intervals and found that 23 of 24 included this population mean for all of the marbles massed on Monday, which, somewhat coincidentally and somewhat not, is roughly 95%. The catch with doing this in real time is that massing the marbles and having students do their calculations fills the class time.
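A small simulation gives a feel for how often such intervals capture the pseudo-population mean. Everything here is assumed for illustration: a container of 300 marbles with invented masses, 24 pairs each drawing five different marbles, and the critical value 2.776, which is approximately what `TINV(0.05, 4)` returns.

```python
import random
import statistics

T_CRIT = 2.776   # approximate two-tailed 95% critical value for 4 degrees of freedom

rng = random.Random(2016)

# Invented container of 300 marble masses in grams.
container = [rng.gauss(5.0, 0.3) for _ in range(300)]

# 24 pairs of students each draw five different marbles.
rng.shuffle(container)
samples = [container[i * 5:(i + 1) * 5] for i in range(24)]

# Pseudo-population mean: the mean of every marble that was massed.
massed = [m for sample in samples for m in sample]
pseudo_population_mean = statistics.mean(massed)

hits = 0
for sample in samples:
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    if mean - T_CRIT * se <= pseudo_population_mean <= mean + T_CRIT * se:
        hits += 1

print(hits, "of", len(samples), "intervals include the pseudo-population mean")
```

Changing the seed changes the count, but in the long run roughly 95% of such intervals should include the mean, so a result like 23 of 24 is in line with what the method promises.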

Wednesday, after covering homework, I should return to the marble mass data to show the class that their 95% confidence intervals include the "population mean" roughly 95% of the time, or 23 out of 24 times.

Data from the class and confidence intervals