### Assessing learning in introductory statistics

MS 150 Introduction to Statistics has utilized an outline based in part on the 2007 Guidelines for Assessment and Instruction in Statistics Education (GAISE), the spring 2016 draft GAISE update, and the ongoing effort at the college to incorporate authentic assessment in courses. The three course level student learning outcomes currently guiding MS 150 Introduction to Statistics are:

- Perform basic statistical calculations for a single variable up to and including graphical analysis, confidence intervals, hypothesis testing against an expected value, and testing two samples for a difference of means.
- Perform basic statistical calculations for paired correlated variables.
- Engage in data exploration and analysis using appropriate statistical techniques including numeric calculations, graphical approaches, and tests.

The course wide average has a long term average of 77.7%; the current term average is 79.2%. The radius of each circle is proportional to the standard deviation of the student averages in all three sections of the course. The standard deviation is fairly constant over time at about 15%.

*Average success rate based on an analysis of the three sections of the final examination*

In the above chart the centers of the yellow topmost circles are located at the average success rate for the students on final examination questions under the first course learning outcome - basic single variable statistics. The chart reports results from 2012 to present. The radii are the standard deviations. The middle blue circles track performance under the second course level learning outcome, paired dependent data. The orange bottom-most circles track performance on the open data exploration and analysis.

Nineteen of twenty students enrolled in MS 150 Statistics summer 2016 sat the final examination.

The first course learning outcome focuses on basic statistics. Twenty-one questions on the final examination required the students to perform basic single variable statistical calculations on a small sample. Based on the item analysis, the average success rate on this material was 81.7%, not significantly different from the 78.0% success rate of the 39 students who completed the final examination in spring 2016. Note that the final examination for summer 2016 was identical to the final examination for spring 2016. Over the past four years the average success rate on this material has been 80.6% and can be expected to vary by as much as 5%.

*Success rates on individual final exam items for 19 students*

Performance on the second course learning outcome, linear regression statistics, was measured by six questions on the final examination. Student performance on this section was 65.8%. The four year average is 69.2%. This success rate varies by 6%, so the difference of -3.4 percentage points from the four year average is not significant. Student success on linear regressions has remained lower than success rates on basic statistical calculations. The stability of these values suggests that increased success rates would be challenging to achieve.

Performance on the third course learning outcome, open data exploration and analysis, is not comparable term-on-term because the scoring system for the open data exploration section of the final examination varies term-on-term. Performance is always weaker on this open data exploration and analysis section than on the first two learning outcomes. Students perform strongly when asked to calculate a specific statistic, but they struggle when they are given raw data and asked open-ended questions about the data. The students responded to this section with a single essay question set up using Schoology. This one question was then marked by the instructor.

The 48.7% student success rate seen on the third learning outcome this term represents the average score. Of the 19 students who took the final examination, only one made a fully correct analysis of the data, measuring the means and running a test for a significant difference in those means. The open data exploration this term involved two samples, where the optimal solution would have been to calculate the means and then test for a significant difference in the means.

In the data provided this term, the sample means were different but not significantly different at a five percent risk of a type I error. Anecdotally, the students have more difficulty failing to reject a null hypothesis of no difference in the means than rejecting it. The students see any difference as being real; the idea that the variation within the samples can make an observed difference indistinguishable from chance is difficult for them to grasp. Bear in mind that in this portion of the exam the students are presented with raw data and an open question; they are not told how to analyze the data to answer the question.

*Breakdown of solution quality for open data exploration on the final examination*

Only one student realized that the solution was an independent samples t-test for a difference of the means, ran the test, and then correctly failed to reject a null hypothesis of no significant difference in the means. Another five students reported the means, noted that the means differed, ran a t-test for a difference in the means, and then incorrectly rejected the null hypothesis. Seven students calculated the means and noted that the means were different, but did not run any statistical test for a difference of the means.

Five students did not attempt to utilize a measure of the middle to compare the samples. Some tried to cite differing maximum values as evidence of a difference.
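The analysis the one fully successful student carried out can be sketched in a few lines. This is only an illustration: the two samples below are made-up values, not the examination data, and Welch's unequal-variance form of the independent samples t-test is an assumption, since the text does not say which variant was expected. The helper names are hypothetical, and the p-value is obtained by numerically integrating the t density so that only the standard library is needed.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value using Simpson's rule on the t density over [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1 - 2 * area)


def welch_t_test(a, b):
    """Return (t, df, p) for Welch's independent samples t-test."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df, two_sided_p(t, df)


# Hypothetical samples: the means differ (12.5 versus 13.5), but the
# spread within each sample is large relative to that difference.
sample_a = [12, 15, 9, 14, 11, 16, 10, 13]
sample_b = [14, 10, 17, 12, 15, 11, 16, 13]

ALPHA = 0.05  # five percent risk of a type I error
t, df, p = welch_t_test(sample_a, sample_b)
reject_null = p < ALPHA

print(f"means: {mean(sample_a)} vs {mean(sample_b)}")
print(f"t = {t:.3f}, df = {df:.1f}, p = {p:.3f}")
print("reject null" if reject_null else "fail to reject null")
```

With these made-up samples the means differ by a full unit, yet the p-value is well above 0.05, so the correct conclusion is a failure to reject the null hypothesis. That is the situation the students found counterintuitive.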

Overall success rate on the final examinations has been exceptionally stable over the past three years, and generally stable for the past decade. The long term average success rate is 73.8%; the current term saw a 77.0% success rate on basic and linear regression statistics.

*Final examination average since 2005*

In an educational world where a common goal is "continuously improving" best practices, the inert stability of the success rate above might be seen as a failure to continuously improve. The effort to continuously improve mathematics education overall goes back not to the new math of the 1960s but much, much further. Ultimately there are long term average success rates, and statistics assures us that numbers tend to return to long term averages. A look at the running cumulative mean success rate on the final examination since 2005 suggests that the longer term mean to which terms return might be improving, but even this statistic is subject to a tendency to return to an even longer term mean.
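A running cumulative mean of this kind is straightforward to compute. The sketch below uses invented per-term success rates purely for illustration; they are not the course's figures, and the function name is hypothetical.

```python
from itertools import accumulate

# Hypothetical per-term success rates in percent; NOT the course's data.
term_rates = [70.0, 74.0, 72.0, 76.0, 78.0]


def cumulative_means(values):
    """Mean of all values up to and including each position."""
    running_totals = accumulate(values)
    return [total / count for count, total in enumerate(running_totals, start=1)]


running = cumulative_means(term_rates)
for term, (rate, avg) in enumerate(zip(term_rates, running), start=1):
    print(f"term {term}: rate {rate:.1f}, cumulative mean {avg:.1f}")
```

Each new term shifts the cumulative mean by only its distance from the current mean divided by the number of terms so far, so the statistic becomes steadily harder to move as terms accumulate.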

In general students who complete the course are able to successfully make basic statistical calculations on 74% of the questions posed.

The course average over time includes performance on homework, quizzes, and tests. Course level performance underlies course completion rates. Data on course level performance is available from 2007 forward.

This term the open data exploration exercises were each capped off with a presentation rather than a quiz. Performance on the open data explorations was marked using rubrics. Each rubric consisted of five to seven criteria and generated twenty to twenty-eight points. Three to five criteria were content oriented, one focused on the presentation software, and the final criterion rated the presenter.
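The point totals follow from marking each criterion on a four-level scale, as in the rubric that follows. A minimal sketch of that arithmetic, assuming each criterion is marked from 1 to 4 and the marks are simply summed (the function name and the sample marks are hypothetical):

```python
MAX_LEVEL = 4  # assumed top mark per criterion ("4 Excellent")


def rubric_score(levels):
    """Return (points earned, points possible, percent) for one presentation."""
    if not all(1 <= level <= MAX_LEVEL for level in levels):
        raise ValueError("each criterion is assumed to be marked from 1 to 4")
    earned = sum(levels)
    possible = MAX_LEVEL * len(levels)
    return earned, possible, 100 * earned / possible


# Hypothetical marks on a five-criterion rubric
earned, possible, percent = rubric_score([4, 3, 3, 4, 2])
print(f"{earned}/{possible} = {percent:.1f}%")
```

Five criteria give a maximum of twenty points and seven criteria give twenty-eight, which matches the range stated above.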

| Criteria | 4 Excellent | 3 Good | 2 Satisfactory | 1 Needs improvement |
|---|---|---|---|---|
| Basic statistics: Appropriate basic statistics calculated correctly and reported meaningfully. | All appropriate statistics reported in a meaningful manner | Appropriate basic statistics reported and cited in report | Some basic statistics reported | A few basic statistics cited. |
| Nitrogen storage: Do nitrogen fixing trees store significantly more carbon in the soil than non-nitrogen fixing trees? | Answer is correct and supported by a fully appropriate statistical analysis | Answer is correct supported by statistics which do not provide evidence that answer is correct | Answer is correct but unsupported by numeric values | Result is incorrect. |
| Strength of the difference: How strong is the effect size for this study? | Answer is correct and supported by fully appropriate statistical analysis | Answer is correct supported by statistics which do not provide evidence that answer is correct | Answer is correct but unsupported by numeric values | Result is incorrect. |
| Presentation software: Original work submitted as presentation software, presentation is appropriate to the material and subject matter, presentation generally follows guidelines for a good presentation. | Presentation that heeds general presentation guidelines, avoids distracting visual extras, and is appropriate to the subject matter | Presentation with only a few areas in which the presentation as a visual aid could be improved | Presentation with more than a few issues. Transitions distract from the content, timing is inappropriate, or other issues such that the visual aid becomes a distraction | Submission of a spreadsheet or other fundamental fault in the submission. |
| Presentation mechanics: Presenter delivered clearly, concisely, demonstrated familiarity with the contents. | Well delivered exhibiting preparation and knowledge of the presentation. Spoke clearly and always towards the audience | Presenter showed evidence of preparation and some familiarity with the content of presentation. Usually faced the audience | Presenter was able to read the slides, sometimes with their back to the audience | Little evidence of preparation, unfamiliar with the slide contents, spoken facing the display panel. |

*A typical presentation rubric*

The open data exploration assignments were structured as assignments in the Schoology learning management system. Students had to submit by midnight on the day prior to the presentation; the assignment locking system in Schoology made this possible.

*Schoology assignment editing screen, locking set at the bottom*

The average score on the four in-class presentations was 80.7%. During the previous term the average had been 80.6%.

*Students presenting in MS 150 Statistics*

The presentations were downloaded as a batch using the download all functionality of Schoology.

The students then presented using native Microsoft PowerPoint or LibreOffice Impress software.

An earlier article examined authentic assessment in the statistics course.

The item analysis of the twenty-seven final examination questions also provides insight on the success rate against the two general education program learning outcomes served by the course.

| Program Learning Outcomes | PLO | PLO sum | PLO n | PLO% |
|---|---|---:|---:|---:|
| 3.1 Demonstrate understanding and apply mathematical concepts in problem solving and in day to day activities | 3.1 | 14.526 | 17 | 85.4 |
| 3.2 Present and interpret numeric information in graphic forms | 3.2 | 6.579 | 10 | 65.8 |
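The percentages in the table can be reproduced from the other two columns. The sketch below assumes that "PLO sum" is the sum of the per-item success rates (each expressed as a fraction of students answering correctly) for the items mapped to that outcome, and that "PLO n" is the number of such items. That reading is an inference from the figures, not something the table states, and the function name is hypothetical.

```python
# (PLO, sum of item success rates, number of items), copied from the table
plo_rows = [
    ("3.1", 14.526, 17),
    ("3.2", 6.579, 10),
]


def plo_percent(rate_sum, n_items):
    """Average item success rate for one outcome, as a percentage."""
    return round(100 * rate_sum / n_items, 1)


results = {plo: plo_percent(rate_sum, n) for plo, rate_sum, n in plo_rows}
total_items = sum(n for _, _, n in plo_rows)

for plo, pct in results.items():
    print(f"PLO {plo}: {pct}%")
print(f"items covered: {total_items}")
```

Under that assumption the two item counts add up to the twenty-seven questions in the item analysis, so every final examination question is counted under exactly one of the two program learning outcomes.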

Overall performance remains stable in this mature but evolving course.