### 8.2 standard error to 9.2 t-distribution confidence intervals without 9.12

This is really a note to my future self about transitioning from section 8.2 standard error of the mean to 9.2 t-distribution confidence intervals without passing through 9.12 confidence intervals from minus two standard errors to plus two standard errors.

This confidence interval material is approached from 7.1 normal distribution shape using thrown foam beads. The current iteration of the course then skips the NORMDIST and NORMINV functions to provide more time for open data exploration, analysis, and student presentation exercises. One learns a skill by performing the skill, not by listening and watching. The course now emphasizes student pairs analyzing data and presenting results to the class, punctuated by lectures on material.

The confidence interval unit essentially begins with a demonstration-based argument that randomness distributes normally. Then an exercise using five marbles per student demonstrates that sample means for multiple samples from a population distribute more narrowly around the population mean. In this example the population mean is known. This demonstration provides support for a "standard deviation of the means" being smaller than the standard deviation of the underlying data. The demonstration also intends to show that the means distribute normally even when the underlying data is not normally distributed. This actually only holds for larger sample sizes and more samples than are used in the demonstration. A spreadsheet that includes multi-term results provides perhaps better support for the normality of the distribution of the means. I do not derive and thus prove the standard error formula, as this really does not provide increased understanding for my students. Abstract mathematical proofs provide no cognitive hooks for them and are not an explanation in the way they would be for a statistician or mathematician. I leave that to higher classes.
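The five marbles demonstration can be sketched in a few lines of Python. The population below is a hypothetical, deliberately non-normal stand-in for the classroom data, not the actual marble distances: repeated samples of five produce means whose spread shrinks toward sigma over the square root of n.

```python
import random
import statistics

random.seed(42)

# A hypothetical stand-in for the classroom data: a decidedly
# non-normal (exponential) population with mean near 100.
population = [random.expovariate(1 / 100) for _ in range(10_000)]
pop_sd = statistics.pstdev(population)

# Each "student" draws five marbles and reports the sample mean.
n = 5
sample_means = [statistics.mean(random.sample(population, n))
                for _ in range(2_000)]

sd_of_means = statistics.pstdev(sample_means)
predicted_se = pop_sd / n ** 0.5  # the standard error formula

print(f"population sd:          {pop_sd:.1f}")
print(f"sd of the sample means: {sd_of_means:.1f}")
print(f"predicted sd / sqrt(n): {predicted_se:.1f}")
```

The spread of the means lands near sigma over root n, though with only five marbles per sample the means still inherit some of the skew of the underlying data, which matches the caveat about small sample sizes above.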

The usual path from five marbles and the standard error builds on a concept from chapter 2.4: z-scores beyond an absolute value of two are unusual. The interval from two standard errors below the mean to two standard errors above should encompass about 95% of the area under the curve, and the area is the probability. The complication is that in skipping the bulk of chapter seven, the idea that areas are probabilities is all but absent from the course structure at present. I once saw a text that simply jumped into confidence intervals in chapter one, essentially without reference to any distributions. That said, the shape of the distribution and areas under the curve as probabilities still appeal to my visual side.
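The "two standard deviations covers 95%" rule of thumb can be checked numerically from the normal CDF, written here with the standard library's error function:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Area under the standard normal curve between z = -2 and z = +2:
# the area is the probability of landing in that interval.
area_within_2 = phi(2) - phi(-2)
print(f"{area_within_2:.4f}")  # about 0.9545, the source of the "two" rule
```

The exact multiplier for 95% is closer to 1.96; "two" is the predigital-age rounding discussed further below.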

Chapter 8.2 is usually then used to segue to a two standard error confidence interval for the mean using thrown paper aircraft. Using a prior term's population mean, I start the demonstration by telling the class that they will make paper airplanes, throw them from the balcony, and, on average, the planes will fly a distance of 560 cm as measured perpendicularly from the porch.

The average is never 560 cm, or whatever the current population mean might be. I use this to show the students that being a statistician means being wrong. Often. As in 22% of the time, even when using a 95% confidence interval and the t-distribution. With 34 trials, one might be surprised that I am not closer to 95%, but there are external factors that mean the trials do not occur under the same conditions. The primary factor is the wind. There are terms when a strong headwind drops the average distance below the lower bound of the windless 95% confidence intervals.

Once I have shown that I am wrong, on the more typical windless days I am usually very close. This term the averages were 592 and 521: differences of 32 and 39 centimeters from the predicted 560, distances small enough to be impressive given the range (-50 cm to 2440 cm, where negative is a plane that flew back under the upper porch). The average for both sections was 554, a mere six centimeters from 560 and only eight away from the 562 that was on the board at the start of the 9:00 class (once thrown, the new data becomes part of the population, and this shifts the mean).

In the past I would then note that while I am wrong, the 95% confidence interval for the mean includes the predicted population mean (this is where I have that 78% success rate: I still get to be wrong even when using a confidence interval). I go from being wrong when I use the sample mean to being right when I use the confidence interval. What do I do when the confidence interval does not include my predicted population mean? Fall back on the argument that I will be wrong 5% of the time no matter what I do. I usually do not try to explain that I am actually wrong 22% of the time. But the wind is a real issue for this exercise.
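Why the confidence interval should be "right" about 95% of the time, absent wind, can be simulated. This is a sketch under an assumed windless normal population; the distances and sample size below are made up for illustration, not the actual class data:

```python
import random
import statistics

random.seed(7)

# Assumed windless population of flight distances (illustrative values).
POP_MEAN, POP_SD = 560, 300
n = 10
t_crit = 2.262  # two-tailed 95% t-critical for df = 9, i.e. TINV(0.05, 9)

trials, hits = 5_000, 0
for _ in range(trials):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    # Does the t-based 95% confidence interval capture the true mean?
    if mean - t_crit * se <= POP_MEAN <= mean + t_crit * se:
        hits += 1

coverage = hits / trials
print(f"coverage: {coverage:.3f}")
```

The simulated coverage sits near the nominal 0.95. On windy days the effective population mean shifts, so the real-world hit rate falls below nominal, which is the 22% story above.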

I used to note that my interval was a little too narrow as no section ever throws 30 or more aircraft: the sample means are not going to distribute normally unless my sample size is at least 30 aircraft. The next class I then introduce the t-distribution. The complication is that from 9.12 onwards, the students steadfastly use two standard errors, even for sample sizes as small as 5. This is the most common error. The "two" sticks in their head, and I cannot get them to shift to t-critical. I have tried. Many ways. And once introduced, I cannot shake the students free from using a two standard error confidence interval every time for everything. I finally recommended dropping 9.12 and going straight into 9.2, never mentioning the factor of two. Historically two was useful: easy to calculate in the predigital age. As long as sample sizes were large, this was a safe estimate. But the class is usually working with small samples where two is inappropriate.

This term I skipped 9.12 and went straight into 9.2 using the paper aircraft exercise and presenting the formula mean ± TINV(1-c, n-1)*SE, where c is 0.95 and n is the sample size. I did not mention that TINV produces a value near two for sufficiently large sample sizes: I did not want "two" to get stuck again in the heads of the students.
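For anyone curious what TINV(1-c, n-1) is doing under the hood, the critical value can be approximated with nothing but the standard library by integrating the t density numerically. This is a sketch, not production statistics code (scipy.stats.t.ppf would be the usual tool):

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def area_zero_to(t, df, steps=1000):
    """Simpson's rule for the area under the t density from 0 to t."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3

def t_critical(df, conf=0.95):
    """Bisect for the t with P(0 <= T <= t) = conf / 2 (symmetry gives
    the two-tailed critical value, matching TINV(1 - conf, df))."""
    target = conf / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if area_zero_to(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for n in (5, 10, 30):
    print(f"n = {n:2d}: t-critical = {t_critical(n - 1):.3f}")
```

For n = 5 the multiplier is about 2.776 and for n = 10 about 2.262; only around n = 30 does it settle near the value the students keep wanting to memorize, which is exactly why the small-sample classes need TINV rather than a fixed factor.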

At 8:00 I used the new Desmos normaldist capabilities to attempt to show that 95% of the area under the curve is produced by the above formula, but this is still a hand-waving argument. Ultimately I ran out of time to develop this further in the 8:00 section, and the 9:00 section ran even tighter. More aircraft to measure.

Dropping 9.12 altered the syllabus. I realized almost immediately that I could "fix" an awkward Wednesday start to a Monday presentation of 12.1 by also dropping test three and shifting FiboBelly from after 9.2 to after 10.3. This also means that FiboBelly could be resolved using hypothesis testing, p-values, and effect size. Test three then drops in behind chapter eleven. All of this will impact the chapter ten work, which used the FiboBelly results in class and the paper aircraft exercise as homework.

Recommendations include adding more specific coverage of the area under the foam bead curve on Monday as the probability of a bead being in that interval. Desmos could be used to illustrate the bead curve, with the CDF capability showing the percentage of beads between two values. The need to cover homework and the shortening of periods to 50 minutes starting last fall put time pressure on exercises such as five marbles and paper aircraft.

The bead distribution introduction to the normal distribution is worked primarily in a Google Sheets spreadsheet although the new capabilities of Desmos might be profitably exploited to show that areas under the curve tell one the number of beads in an interval.
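The bead-counting idea could also be prototyped outside Desmos or Sheets. A minimal sketch, with made-up bead counts and curve parameters, of how a CDF difference converts an area under the curve into a number of beads in an interval:

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """Cumulative area under a normal curve up to x."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

# Hypothetical bead distribution: these values are illustrative,
# not the actual classroom bead data.
N_BEADS, MEAN, SD = 100, 0.0, 10.0

def beads_between(a, b):
    """Expected number of beads landing between positions a and b:
    total count times the area under the curve on [a, b]."""
    return N_BEADS * (normal_cdf(b, MEAN, SD) - normal_cdf(a, MEAN, SD))

print(round(beads_between(-10, 10)))  # about 68 of 100 within one sd
```

This is the same calculation the Desmos normaldist CDF performs graphically: the shaded area between two values, scaled by the bead count, is the number of beads expected in that interval.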

Katchugo ready to throw her aircraft on Friday, Shiro looking on, in the exercise that supports 9.2

Deina and Jacqueline present an analysis to the class

Five marbles exercise on Wednesday

Stania throws her paper aircraft

Paper aircraft have a high variance

Desmos' new normal distribution capabilities

Bead distribution on Monday

Five marble board on Wednesday

Paper aircraft Friday board: the jump to t-distribution confidence intervals
