### Assessing Learning in Introductory Statistics

MS 150 Statistics is an introductory statistics course with a focus on statistical operations and methods. The course is guided by the 2007 Guidelines for Assessment and Instruction in Statistics Education (GAISE), the spring 2016 draft GAISE update, and the ongoing effort at the college to incorporate authentic assessment in courses.

In the fall of 2012 the statistics curriculum was adjusted to include a couple of weeks of open data exploration exercises at the end of the term. These exercises were submitted as assignments and marked by the instructor. In spring 2015 the last open data exploration in a set of three was assigned as a presentation to the class. This changed the stakes from an assignment seen only by the instructor to a presentation seen by all of the students in the class. That shift ramped up the level of effort students put into their open data exploration exercises. The arrival of improved presentation technology in fall 2015, in the form of a brighter flat panel display, led to the decision to again have the last open data exploration presented to the class.

By then the benefit of having the students do presentations was clear. In spring 2016 the statistics syllabus was trimmed and compressed yet again, producing a twelve week "traditional" lecture-quiz-test structure followed by three open data exploration presentations in the last three weeks of the course.

I often noted to the class during the spring 2016 term that the curriculum structure was essentially learning to play the game of statistics for twelve weeks and then playing the game in the final three weeks of the course. The difficulty with this approach is that very little real learning and comprehension occurs in those first twelve weeks. When the students began open data exploration in week thirteen, they were often at a loss as to what to do. Only during those final three weeks did the students demonstrate active engagement with the material.

In the summer of 2016 I sought to integrate more problem-based learning into the course and to find a way to bring the presentations forward in the course. One does not learn a sport by first learning every single rule and strategy. One usually starts by goofing around with the equipment, playing with a few friends, becoming interested in the sport, and then learning the rules, strategies, and nuances. The desire was to have statistics mimic this approach: start by playing the game and cover rules and strategies as the players progress.

The challenge was how to start the term with the students engaged in an open data exploration leading to a presentation, given a day one, zero knowledge start. While out on a run I realized that what I wanted was to walk into class on the first day and have the first thing I say be, "Your statistical analysis presentations will be on Friday. Any questions?"

The breakthrough came with learning of the Mars and Murrie option. With a way to start the term with a zero day open data exploration presentation, I rearranged the curriculum yet again. The goal was to retain the current content of the outline - continue to meet the content goals of the outline and course - while expanding the number of presentations and having the presentations occur throughout the fall 2016 term. There would still be "traditional" lecture style presentation of information and content coverage, but these would be "punctuated" by presentations.

The result was a fall 2016 term with eight presentations. The data explorations were not entirely unguided; some were more tightly guided. Assessment of the fall 2016 term suggested that the presentations were both challenging and contributed most to the students' self-reported learning. This approach was repeated in the spring term of 2017. The structure of the spring 2017 holidays permitted only seven presentations during the term.

Three course level student learning outcomes currently guide MS 150 Introduction to Statistics:



- Perform basic statistical calculations for a single variable up to and including graphical analysis, confidence intervals, hypothesis testing against an expected value, and testing two samples for a difference of means.
- Perform basic statistical calculations for paired correlated variables.
- Engage in data exploration and analysis using appropriate statistical techniques including numeric calculations, graphical approaches, and tests.
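
The first outcome's two-sample test can be sketched outside the spreadsheet environment the course actually uses. The samples below are hypothetical, and the Welch t-statistic is used here as one common form of the difference-of-means test:

```python
import math
import statistics

# Hypothetical samples (not course data); the course works in
# spreadsheets, but the same calculation is shown here in Python.
a = [78, 82, 85, 80, 75]
b = [70, 74, 69, 73, 79]

xbar_a, xbar_b = statistics.mean(a), statistics.mean(b)
var_a, var_b = statistics.variance(a), statistics.variance(b)

# Welch t-statistic for a difference of two sample means
t = (xbar_a - xbar_b) / math.sqrt(var_a / len(a) + var_b / len(b))
print(round(t, 4))  # 2.8577
```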

Although some faculty opt to measure these outcomes during the term, my own work on the loss of mathematical knowledge among physical science students at the start of the term suggests that in-term measurement of learning can generate inflated success rates. By the end of the term in statistics, the students in the spring 2017 run of the course had gone five weeks, including two major spring holiday sequences, without new course material prior to the final examination. During those five weeks the students were analyzing data and preparing presentations on that data. There has been sufficient time for specific learning outcome knowledge to be lost. Thus an item analysis of the final examination may provide some insight into retained learning.

Thirty-eight students sat the final examination. The chart depicts the percent success rate on each item on the final examination.

The first twenty-two questions cover course material. Questions 23 and 24 cover material not taught in the course; questions 25 to 28 were conceptually covered but were based on the results of 23 and 24.

On core course material, the students performed exceptionally well. Only the calculation of the margin of error and the upper and lower bounds fell short of expectations.
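
The margin of error calculation that fell short can be sketched as follows. The data are hypothetical, and the 95% t-critical of 2.093 is the spreadsheet value TINV(0.05, 19) for a sample of n = 20 (df = n - 1):

```python
import math
import statistics

# Hypothetical sample of n = 20 values (not course data)
data = [70 + i for i in range(20)]

n = len(data)
xbar = statistics.mean(data)   # sample mean
sx = statistics.stdev(data)    # sample standard deviation
se = sx / math.sqrt(n)         # standard error of the mean

t_crit = 2.093                 # TINV(0.05, 19): 95% t-critical for df = 19

E = t_crit * se                # margin of error
lower, upper = xbar - E, xbar + E
print(round(E, 4), round(lower, 4), round(upper, 4))  # 2.7688 76.7312 82.2688
```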

Question 23 presented the students with a formula to calculate t-critical for paired data, TINV(alpha, n-2). The students performed poorly only because on question 18 they took the paired sample size to be 40 rather than the correct value of 20. The students double-counted the paired data, a common error.
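
The counting error is easy to make. A minimal sketch with hypothetical paired data shows the correct count:

```python
# Hypothetical paired data: pre- and post-scores for the same 20 students
pre = [10 + i for i in range(20)]
post = [12 + i for i in range(20)]

n = len(pre)                     # 20 pairs: the sample size for paired data
wrong_n = len(pre) + len(post)   # 40: the double-count the students made

df = n - 2                       # degrees of freedom for TINV(alpha, n - 2)
print(n, wrong_n, df)            # 20 40 18
```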

In question 24 the students were given a formula to calculate the t-statistic based on the correlation r and the sample size. This was new material. The students knew how to calculate the correlation r, although their success rate was a tad low at 66%. The most common error was using the value of r-squared; graphs still display r-squared rather than r.

Questions 25 to 28 were conceptually introduced during the course, with the results this time based on the new calculations in 23 and 24. The intent was to see whether the students could go beyond the edge of the material specifically covered in the course and reach a statistical conclusion. Note that the miscalculation of t-critical does not affect the decisions in questions 25 through 28; thus success rates lifted. The upshot is that on the order of a third of the students not only showed mastery of statistics but also showed an ability to take that knowledge into new statistical terrain and apply the principles of statistics. This appears to be evidence that for that one third the knowledge is not short term memorized information that will be lost at term's end, but a deeper understanding of statistics.

There was effectively no difference in performance by gender, neither in the overall course average nor final examination average.

The MS 150 Statistics course in spring 2017 consisted of two sections, a total of 41 students, 24 female and 17 male. The two sections are kept in sync during the term. Both sections covered the same material, worked the same assignments, and gave presentations on the same topics. The sections met at 8:00 and 9:00 on a Monday-Wednesday-Friday schedule.

The final examination in MS 150 Statistics has tracked performance data against the three course learning outcomes since the fall of 2012.

Final examination performance on the first course learning outcome, basic statistics, has been stable around 80% since fall 2012. Spring 2017 saw an uptick to an 84% success rate. This improvement is not necessarily significant, not at this point; performance tends to return to the long term mean of 80%. Although not significant, an uptick on material that was covered in January and February, two to three months before the examination, is encouraging.

On paired data calculations, performance has remained stable near 70%. This term's 71% average success rate is in line with that long term rate. There is far more inertia in these values than I suspect many in education appreciate. There is a penchant in education for "continuous improvement," yet these success rates are simply stable over long periods of time and reflect both the difficulty of the material and the many reasons students do not succeed on the material. For every student who did not do well, there is a complex back story.

The third subsection of the final examination cannot be compared on a term-on-term basis: the rubrics and scoring systems used to mark this section have changed over the terms. Prior to fall 2016, an open data exploration exercise on the final examination was answered as an essay question with the analysis marked by a rubric.

In the fall 2016 term the eight data exploration presentations provided a rich tapestry of data and insight into student comprehension. By the end of the term an observer could have sat in on the presentations and determined each presenter's level of mastery of statistics. The presentation rubrics included student learning outcomes from the outline, and by the end of the term I felt I had good data on the third course level student learning outcome. As a result, the final examination included only two questions in this area and did not demand an essay analysis.

The students were aware that the final examination would not have a significant impact on their grade. The students knew that the presentations carried the most weight in their grade, and that the final examination would weigh in at only slightly more than any one test or any one presentation. High stakes tests do not measure what the student is likely to retain but rather what the student could cram, memorize, regurgitate, and forget. Projects are what students remember, and the presentations are the projects in statistics.

In spring 2017 the average on the "above and beyond" material on the final examination (questions 23 to 28) was used as a proxy for the ability of the students to engage in open data exploration and analysis. An attempt to use the last presentation instead generated data that would appear inconsistent: over 90% of the students met or exceeded course level outcome three on that presentation, and charting that value against the historic averages would be misleading.

Performance by section on the final and in the course

The 8:00 section is impacted by the transportation difficulties some students encounter. Students in the 8:00 section tend to have more absences and more frequently arrive late to class. The 8:00 class starts the term with fewer students and has a higher attrition rate than the 9:00 class. I tend to attribute the historically weaker performance in the 8:00 section to the higher rate of absences and late arrivals. There is also potentially a pre-selection factor: more organized students may register earlier and preferentially fill the later class - the 8:00 section is the last to fill during registration. Performance on the final examination and in the course by section shows weaker performance for the 8:00 section. The differences are on the order of one grade, with the 9:00 section averaging one grade higher than the 8:00 section.

Final examination performance by course learning outcome

*Note: the y-axis does not start at zero; the vertical range is exaggerated.*

Over the long haul, regression to the mean is as inescapable in statistics as entropy is in physics. Means return to long term means and those long term means return to even longer term means. And in the world of final examination averages, moving those longer term means is very difficult. Term-on-term fluctuations are almost meaningless and should not be viewed as calls to action. Since 2012 the final examination percent has moved in a narrow range of plus or minus four percent around 74%. Spring 2017 was no exception with a 73.87% overall average on the final examination. While performance improved on basic statistics, the overall average across all questions remained near the long term mean.

Long term course average and standard deviation

The course average since 2007 has also remained stable, tending to stay within four percent of 78%. This term's 83.8% course average likely reflects that students attain higher averages on the presentations than on the quizzes. Providing more space in the curriculum for the presentations came at the expense of weekly quizzes. This curricular shift may lead to a rise in the long term course average.

The standard deviation of the students' individual course averages is also relatively stable around 15%. The amount of internal variation in student scores is fairly consistent term-on-term.

In general, given a list of numbers and spreadsheet software, students show a strong mastery of basic statistics, good capabilities with linear regressions, and more moderate abilities with confidence intervals and basic hypothesis testing. The presentations may be providing a boost in the ability of the students to use the statistics that they have learned.
