### Normal distribution beads, marbles, and planes

Although the weather in fall 2016 moved the paper aircraft exercise to the Monday after probability wrapped up, the weather in spring 2018, including passing rain showers and wind, argued against repeating this sequence.

This returns the beads to the schedule and pushes the paper aircraft back, to be used either to demonstrate that the distribution of sample means is narrower than the distribution of the data, or as a vehicle to demonstrate how a sample can be used to construct an interval that has a 95% probability of capturing the population mean. The data currently includes 28 samples from 28 sections of MS 150 Statistics dating back to spring 2012. As a result, the data can be used effectively either to show the narrower, "taller" distribution of the sample means or to show whether a sample captures the population mean. There are issues with the data, the most significant being that wind can have a large effect on the sample mean. The planes perform rather randomly, with the distribution clipped only in the left tail, where the building prevents planes from flying back to the south.

With randomly tossed beads returning to their position immediately after probability, the session needs to start with the question "Where will one bead land?" Is it predictable? Any one bead is unpredictable. The probability that a bead will land in a particular row cannot be calculated until after a sample of beads has been thrown. Only when many beads are thrown can one attach probabilities to where a bead will land.

8:00 after counting

I started with the question "Where will this heart land?" while holding up one heart. "Or this star?" I tossed a few, and they landed in random positions on the floor. I noted that one could not predict where they would land, let alone assign a probability. One student, noting the row numbers, suggested that there is a one in eleven probability of the shape landing in a particular row. I noted that this seemed unlikely: landing in row 1 or 11 seemed less likely than landing in row six or seven, where I was standing on the table.
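The student's "one in eleven" guess amounts to assuming every row is equally likely. A minimal Python sketch shows the alternative: probabilities for each row only become available as relative frequencies after beads have been thrown, and can then be compared against the uniform 1/11. The landing rows below are made up for illustration, not recorded class data, and the 11-row layout is taken from the row numbers mentioned above.

```python
from collections import Counter

def row_relative_frequencies(landing_rows, n_rows=11):
    """Return the observed share of beads landing in each row 1..n_rows."""
    counts = Counter(landing_rows)
    total = len(landing_rows)
    return {row: counts.get(row, 0) / total for row in range(1, n_rows + 1)}

# Hypothetical landing rows for ten tosses (illustration only, not class data).
tosses = [6, 7, 6, 5, 8, 6, 7, 4, 6, 7]
freqs = row_relative_frequencies(tosses)
uniform = 1 / 11  # the "one in eleven" guess: every row equally likely
for row, share in freqs.items():
    print(f"row {row:2d}: observed {share:.2f} vs uniform {uniform:.2f}")
```

Even with a handful of tosses, the middle rows collect far more than 1/11 of the beads while the end rows collect none, which is the pattern the class saw on the floor.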

9:00 after counting (grouped by fives)

At 9:00 I combined the eight and nine o'clock data to increase the sample size and then used the NORMDIST function in Google Sheets to model the bead distribution with a normal curve.
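The spreadsheet step can be sketched in Python: estimate the mean and standard deviation from the bead counts, then scale the normal density by the number of beads to get an expected count per row. With rows one unit wide, `n * pdf(row)` plays the same role as `n * NORMDIST(row, mean, sd, FALSE)` in Google Sheets. The counts below are hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption about how the spreadsheet was set up.

```python
from statistics import NormalDist, mean, stdev

def normal_model(counts_by_row):
    """Fit a normal curve to bead counts and return the expected count per row.

    counts_by_row maps a row number to the number of beads that landed there.
    Each expected count is n * pdf(row): with rows one unit wide, this plays the
    role of n * NORMDIST(row, mean, sd, FALSE) in a spreadsheet.
    """
    # Expand the counts back into one row number per bead.
    rows = [row for row, count in counts_by_row.items() for _ in range(count)]
    n = len(rows)
    # Sample standard deviation, assumed to match a spreadsheet STDEV column.
    model = NormalDist(mean(rows), stdev(rows))
    return {row: n * model.pdf(row) for row in sorted(counts_by_row)}

# Hypothetical counts for illustration only, not the combined class data.
observed = {4: 1, 5: 2, 6: 4, 7: 2, 8: 1}
expected = normal_model(observed)
for row in sorted(observed):
    print(f"row {row}: observed {observed[row]}, normal model {expected[row]:.2f}")
```

Comparing the observed and modeled columns row by row makes the "lumpy" departures from the normal curve easy to point out.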

A tad lumpy and skewed left, a known by-product of right-handed throws from the sixth row.

On Wednesday I gave every student five marbles. First I established that the average number of marbles per student was five. Then I told the students that they could either keep all five or give one, two, three, four, or all five away, to any number of other students in the class, as many times as they chose. At eight o'clock there were 15 students present for the marble distribution; at nine o'clock there were 18.

After the marble give and take, I asked what the average was now. Some students in both classes thought the average had changed. I then noted that we still had 75 marbles and 15 students in the eight o'clock class, and 90 marbles and 18 students in the nine o'clock class. The average was still five: the population mean was still five marbles.
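The reason the average cannot change is that trading moves marbles around without creating or destroying any, so the total, and therefore total divided by class size, stays fixed. A small Python simulation illustrates this. The trading rule (a random giver hands a random number of marbles to a random receiver) is a hypothetical stand-in for the free-form trading in class.

```python
import random

def trade_marbles(n_students, start=5, n_trades=500, seed=0):
    """Randomly pass marbles between students and return the final counts.

    The trading rule here (a random giver hands a random number of marbles to a
    random receiver) is a hypothetical stand-in for the free-form class trading.
    """
    rng = random.Random(seed)
    marbles = [start] * n_students
    for _ in range(n_trades):
        giver = rng.randrange(n_students)
        if marbles[giver] == 0:
            continue  # a student with no marbles has nothing to give
        receiver = rng.randrange(n_students)
        amount = rng.randint(1, marbles[giver])
        marbles[giver] -= amount
        marbles[receiver] += amount
    return marbles

for n_students in (15, 18):  # the 8:00 and 9:00 class sizes
    final = trade_marbles(n_students)
    print(n_students, "students:", sum(final), "marbles, mean", sum(final) / n_students)
```

However lopsided the final counts become, the totals stay at 75 and 90 and the mean stays at five.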

The plastic knife is undoubtedly due to a game of "assassin" running among a group of friends.

I then asked the 8:00 students to form groups of five. At nine, a student suggested groups of three. Groups of three were small, but three groups of six would generate only three means, so I went with six groups of three.

Each group recorded the number of marbles each student had and the group average. The group averages were not five, except in one group. Yet, again, the population mean was five. The sample means ought to distribute normally around that population mean, with the known caveat that groups of three to five students are really too small to ensure normality. The t-distribution will be tackled next week.
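The group step can be sketched in Python. When equal-sized groups split the whole class, the individual group means scatter, yet their average is exactly the population mean, because each group mean is its group's total divided by the same group size. The post-trade counts below are hypothetical, chosen only so that 18 students hold 90 marbles.

```python
from statistics import mean

def group_means(marbles, group_size):
    """Split the class list into consecutive equal groups; return each group's mean."""
    if len(marbles) % group_size:
        raise ValueError("this sketch assumes every group has the same size")
    return [mean(marbles[i:i + group_size])
            for i in range(0, len(marbles), group_size)]

# Hypothetical post-trade counts for an 18-student class (90 marbles in total).
marbles = [0, 12, 5, 3, 9, 0, 7, 5, 1, 10, 4, 6, 0, 8, 5, 2, 11, 2]
means = group_means(marbles, 3)       # six groups of three, as at 9:00
print([round(m, 2) for m in means])   # the individual sample means vary
print(mean(means), mean(marbles))     # both equal 5, up to floating-point rounding
```

Real groups are formed by who sits near whom, not by list position, but the argument holds for any split into equal-sized groups.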

The data was transferred to a Google Sheets spreadsheet, a portion of which, showing the 9:00 section data, is seen above. Combining the data from all sections across all terms into a single relative frequency histogram, showing both the distribution of the marbles after trading and the distribution of the sample means, demonstrates the narrower, taller distribution of the sample means.

Although the underlying data distribution was bimodal and skewed, for the purposes of an introductory statistics course the distribution of the sample means, plotted against the same classes (intervals, bins, buckets), is arguably narrower, taller, appears normal, and is centered on the population mean of five marbles. There are many issues with the data, including very small sample sizes (three to five students per sample) and a small number of samples.
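The key to the comparison is computing both relative frequency distributions over the same bins, so the bar heights are directly comparable. A minimal Python sketch, using the same kind of hypothetical 18-student post-trade counts and their six group-of-three means (not the class spreadsheet), shows the means piling into fewer, taller bins.

```python
from statistics import mean

def relative_frequencies(values, bins):
    """Share of values in each half-open bin [low, high)."""
    n = len(values)
    return [sum(low <= v < high for v in values) / n for low, high in bins]

# Hypothetical post-trade counts (18 students) and their six group-of-three means.
marbles = [0, 12, 5, 3, 9, 0, 7, 5, 1, 10, 4, 6, 0, 8, 5, 2, 11, 2]
means = [mean(marbles[i:i + 3]) for i in range(0, len(marbles), 3)]

# The same bins are used for both distributions so the heights are comparable.
bins = [(low, low + 2) for low in range(0, 14, 2)]
data_rf = relative_frequencies(marbles, bins)
means_rf = relative_frequencies(means, bins)
for (low, high), d, m in zip(bins, data_rf, means_rf):
    print(f"[{low:2d}, {high:2d}): data {d:.2f}  means {m:.2f}")
```

With only six means, most of them fall into a single bin, which is exactly the source of the "illusion" of normality discussed below.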

As a result, the standard error of the mean is a rather poor estimator of the standard deviation of the sample means. That said, the demonstration provides the opportunity to at least argue that the standard error of the mean is going to be smaller than the standard deviation of the data. A hand-waving argument that the sample sizes are too small (n less than 30) and that the number of samples is too small (perhaps this too ought to exceed 30) is about all I can muster to explain the difference between the standard error of the mean and the standard deviation of the means. Underlying this is the issue that the standard error of the mean describes the spread of means across many samples taken from a population and, in theory, uses the population standard deviation, not the sample standard deviation.
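The "too few samples" argument can be checked with a quick simulation: compute the standard error σ/√n from the population standard deviation, then compare it with the standard deviation of the means from only six samples versus ten thousand. This is a sketch under stated assumptions: the population is a hypothetical set of post-trade counts, and samples are drawn with replacement, the setting in which σ/√n is exact. The classroom groups split the class without replacement, so their spread is somewhat smaller by a finite-population factor.

```python
import random
from math import sqrt
from statistics import mean, pstdev, stdev

# Hypothetical post-trade marble counts for an 18-student class (mean 5).
population = [0, 12, 5, 3, 9, 0, 7, 5, 1, 10, 4, 6, 0, 8, 5, 2, 11, 2]
n = 3  # students per group

sigma = pstdev(population)        # population standard deviation
theoretical_se = sigma / sqrt(n)  # standard error of the mean, sigma / sqrt(n)

def sd_of_sample_means(num_samples, seed=0):
    """Standard deviation of means of num_samples random samples of size n."""
    rng = random.Random(seed)
    # rng.choices samples with replacement, unlike the classroom groups.
    means = [mean(rng.choices(population, k=n)) for _ in range(num_samples)]
    return stdev(means)

few = sd_of_sample_means(6)         # six groups, as in a single class
many = sd_of_sample_means(10_000)   # far more samples than any class produces
print(f"sigma = {sigma:.2f}, sigma/sqrt(n) = {theoretical_se:.2f}")
print(f"SD of 6 means = {few:.2f}, SD of 10,000 means = {many:.2f}")
```

With six samples the standard deviation of the means can land well away from σ/√n; with thousands it settles close to it, while σ/√n itself stays well below σ.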

The paper aircraft exercise has always depended on a fair-weather day, dry enough to fly, and spring 2018 has been devoid of those types of days. A relatively weak La Niña has persisted since late 2017. The fall rains gave way to the winter rains and now the spring rains. There was no less-rainy season this year.

Wednesday morning

Wednesday morning provided perhaps the best opportunity to fly the planes, although there was some wind. The complication is that the paper aircraft exercise makes a rather weak segue from randomly tossed beads to the idea that something purely random yields a normally distributed set of sample means. The paper aircraft exercise generates only one mean per class, and then a hand-waving argument is used to drag in all the sample means going back to 2012. The marble exercise at least produces the illusion of multiple samples producing means that distribute normally. Ultimately this is partly an illusion, produced by the means typically landing in a single interval of the frequency distribution and by the use of a smoothed line chart in Google Sheets. I am keenly aware of this sleight of hand, but getting this right would take far more samples, perhaps thirty or more, and a separate histogram for the sample means.

Friday dawns gray, rainy, wet.

The paper aircraft are a much better vehicle for demonstrating the capture of a known population mean, and thus serve as a good introduction to confidence intervals.

The ground was swampy at best. Although I have always worn gardening shoes for this exercise, today I knew that going barefoot with rolled-up pants would be the only option.

Students make their own paper aircraft design.

Students usually know how to make a paper airplane, but a few do not. To avoid biasing the data, or appearing to bias the data, I do not instruct students in how to make a paper airplane. Before the flights I tell the class that the average distance will be 570 cm; I do not explain the source of that number.

The aircraft are thrown from the porch; distances are measured perpendicular to the building.

Students get excited when their aircraft flies well, and they become invested in the outcome of the data.

At the top of the screen an aircraft heads west as a breeze picks up. There was more wind at 8:00 than at 9:00, with conditions improving throughout the morning.

This term, in expectation of rain, I built a tally sheet for distances. This proved faster and more efficient than writing down each distance. The tally sheet has since been revised to handle extreme values better in the field; the morning section saw a 2500 cm flight.

A towel was brought along but not needed. A 30-meter tape makes measuring relatively quick.

Wet, muddy conditions underfoot meant wiping the tape with a paper towel as it was rewound.

Planes scatter out on the lawn.

Prior to the 8:00 section the population mean stood at 570 cm. The 496 cm average flight distance was not 570 cm, so I explained to the class that I was wrong: the average was not 570 cm. This provides an opportunity to explain that sample averages will usually be wrong, hence the shift to a confidence interval. In this system the large standard deviations usually ensure that an interval built around a sample mean will capture the population mean.

At 9:00 the average distance was 503 cm for 29 aircraft against the new population mean of 569 cm.

I left the population mean at 569 cm until late in the period, not wishing to confuse matters by noting that the 9:00 aircraft were now also part of the population, or should be. With the addition of the 9:00 data, the number of aircraft thrown rose to 626 across 30 sections since 2012.
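Folding a new section into the population mean is a weighted average: each mean is weighted by the number of flights behind it. A minimal Python sketch follows. It assumes that the 569 cm figure covered the 626 − 29 = 597 flights recorded before the 9:00 section; because 569 is itself rounded, the combined value is only approximate and is not the recorded spreadsheet figure.

```python
def update_population_mean(old_mean, old_count, new_mean, new_count):
    """Fold one section's mean into a running mean, weighting each by its count."""
    total = old_count + new_count
    combined = (old_mean * old_count + new_mean * new_count) / total
    return combined, total

# Assumes the 569 cm mean covered 626 - 29 = 597 earlier flights; 569 is rounded.
combined, total = update_population_mean(569, 597, 503, 29)
print(total, round(combined))
```

Under those assumptions the 9:00 section pulls the population mean down to roughly 566 cm: a 29-flight sample barely moves a mean built on nearly 600 flights.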

The flight data remains skewed right with a long right tail, the left tail effectively truncated by the wall on the porch below. The means are at least arguably distributed normally about the population mean.

The actual distribution of the 30 sample means may itself be slightly skewed.
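Skewness can be put on a number rather than judged by eye. The sketch below computes the adjusted Fisher-Pearson sample skewness, n / ((n − 1)(n − 2)) · Σ((x − x̄)/s)³; to my knowledge this is the formula behind spreadsheet SKEW functions, but treat that correspondence as an assumption. The flight distances are hypothetical, with one long flight standing in for the long right tail.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness; positive means a longer right tail."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

# Hypothetical flight distances in cm with one long flight, not recorded class data.
distances = [300, 350, 400, 420, 450, 500, 2500]
print(round(sample_skewness(distances), 2))   # positive: skewed right
print(sample_skewness([1, 2, 3, 4, 5]))        # symmetric data: about zero
```

The same function can be applied to the list of 30 sample means to see whether their distribution is also slightly skewed, keeping in mind that 30 values give a noisy estimate.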

Today the goal is the introduction of the confidence interval. For this I return to what is perhaps the earliest back-of-the-envelope work, by Pearson, where in practice two standard errors were used rather than 1.96. This parallels the ordinary z-score interval in chapter 2.4 and allows me to refer back to it. I do note to the class that "plus or minus two" works only for a truly random sample, and then only for a sample size of thirty or more, which neither class achieved. Next week I will introduce the TINV function, which adjusts that "two" based on the degrees of freedom. At 8:00 the t-critical value will be 2.179, so two is a reasonable first crack at the 95% confidence interval.
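The interval itself is x̄ ± c · s/√n, where c is 2 for the rough "plus or minus two" version or a t-critical value once TINV is introduced. A minimal Python sketch follows, with a check of whether the interval captures a known population mean. The five distances are hypothetical. The 2.179 quoted above is the two-tailed 95% t value for 12 degrees of freedom; for the five values here the matching value is about 2.776, which is what `TINV(0.05, 4)` returns.

```python
from math import sqrt
from statistics import mean, stdev

def confidence_interval(values, critical=2.0):
    """Return (low, high) = mean -/+ critical * s / sqrt(n)."""
    n = len(values)
    xbar = mean(values)
    se = stdev(values) / sqrt(n)  # standard error estimated from the sample
    return xbar - critical * se, xbar + critical * se

def captures(interval, population_mean):
    """True if the population mean lies inside the interval."""
    low, high = interval
    return low <= population_mean <= high

# Hypothetical flight distances in cm, for illustration only.
distances = [400, 450, 500, 550, 600]
rough = confidence_interval(distances)            # "plus or minus two" standard errors
t_based = confidence_interval(distances, 2.776)   # t-critical for 4 degrees of freedom
print(rough, captures(rough, 569))
print(t_based, captures(t_based, 569))
```

The sample mean of 500 cm misses 569 cm, yet both intervals capture it; the t-based interval is wider, which is how TINV compensates for small samples.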

I encourage my students to take pictures of the board, even helpfully suggesting that they take a selfie with the board.