Assessing learning in introductory statistics

MS 150 Introduction to Statistics has utilized an outline based in part on the 2007 Guidelines for Assessment and Instruction in Statistics Education (GAISE), the spring 2016 draft GAISE update, and the ongoing effort at the college to incorporate authentic assessment in courses. The three course level student learning outcomes currently guiding MS 150 Introduction to Statistics are:
  1. Perform basic statistical calculations for a single variable up to and including graphical analysis, confidence intervals, hypothesis testing against an expected value, and testing two samples for a difference of means.
  2. Perform basic statistical calculations for paired correlated variables.
  3. Engage in data exploration and analysis using appropriate statistical techniques including numeric calculations, graphical approaches, and tests.
The first two outcomes involve the basic calculation capabilities of the students and are assessed via an item analysis of the final examination. Thirty-nine students in two sections took the final examination in spring 2016.

Average success rate based on an analysis of the three sections of the final examination

In the above chart the centers of the yellow topmost circles are located at the average success rate for the students on final examination questions under the first course learning outcome - basic single variable statistics. The chart reports results from 2012 to present. The radii are the standard deviations. The middle blue circles track performance under the second course level learning outcome, paired dependent data. The orange bottom-most circles track performance on the open data exploration and analysis.

Nineteen of twenty students enrolled in MS 150 Statistics summer 2016 sat the final examination.

The first course learning outcome focuses on basic statistics. Twenty-one questions on the final examination required the students to perform basic single variable statistical calculations on a small sample. Based on the item analysis the average success rate on this material was 81.7%, not significantly different from the 78.0% success rate of the 39 students who completed the final examination spring 2016. Of note is that the final examination for summer 2016 was identical to the final examination for spring 2016. Over the past four years average success rate on this material has been 80.6% and can be expected to vary by as much as 5%.
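The item analysis behind these success rates can be sketched in a few lines. This is a minimal illustration only: the response matrix is invented rather than taken from the exam, the function names are hypothetical, and it assumes each item is marked simply right or wrong.

```python
from statistics import mean

# Hypothetical marked responses: one row per student, one column per exam item.
# 1 = correct, 0 = incorrect. These values are invented for illustration.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
]


def item_success_rates(rows):
    """Return the fraction of students answering each item correctly."""
    return [mean(column) for column in zip(*rows)]


def average_success_rate(rows):
    """Return the mean of the per-item success rates."""
    return mean(item_success_rates(rows))


rates = item_success_rates(responses)
overall = average_success_rate(responses)
print([round(rate, 2) for rate in rates])
print(round(100 * overall, 1))
```

The per-item rates show which questions gave the students trouble, while their mean gives the single success rate reported for a learning outcome.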

Success rates on individual final exam items for 19 students

Performance on the second course learning outcome, linear regression statistics, was measured by six questions on the final examination. Student performance on this section was 65.8%. The four year average is 69.2%. This success rate varies by 6%; the difference of -2.1% is not significant. Student success on linear regressions has remained lower than success rates on basic statistical calculations. The stability of these values suggests that increased success rates would be challenging to achieve.

Performance on the third course learning outcome, open data exploration and analysis, is not comparable term-on-term because the scoring system for this section of the final examination varies term-on-term. Performance is always weaker on this open data exploration and analysis section than on the first two learning outcomes. Students perform strongly when asked to calculate a specific statistic, but they struggle when presented with raw data and open ended questions about the data. The students responded to this section with a single essay question set up using Schoology. This one question was then marked by the instructor.

The 48.7% student success rate seen on the third learning outcome this term represents the average score. Of the 19 students who took the final examination, only one made a fully correct analysis of the data, measuring the means and running a test for a significant difference in those means. The open data exploration this term explored two samples where the optimal solution would have been to calculate the means and then test for a significant difference in the means.

In the data provided this term, the sample means were different but not significantly different at a five percent risk of a type I error. Anecdotally, the students have more difficulty failing to reject a null hypothesis of no difference in the means than rejecting it. The students see any difference as being real; the idea that variation in the data can make an observed difference statistically insignificant is difficult for them to grasp. Bear in mind that in this portion of the exam the students are presented with raw data and an open question; they are not told how to analyze the data to answer the question.
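The situation described above, where sample means differ yet the difference is not significant, can be sketched with an independent samples t-test. The example is a minimal sketch under stated assumptions: the two samples are invented for illustration and are not the exam data, the function names are hypothetical, Welch's unequal-variance form of the test is assumed, and the p-value is approximated by numerically integrating the t density using only the Python standard library.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Approximate the two-sided p-value with Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # probability between 0 and |t|
    return max(0.0, 1 - 2 * central_area)


def welch_t_test(a, b):
    """Return (t statistic, degrees of freedom, two-sided p-value)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_sided_p(t, df)


# Hypothetical samples, invented for illustration only.
sample_a = [12, 15, 9, 14, 11, 16, 10, 13]
sample_b = [14, 17, 10, 16, 12, 18, 11, 15]

t_stat, dof, p_value = welch_t_test(sample_a, sample_b)
alpha = 0.05  # five percent risk of a type I error
decision = "reject" if p_value < alpha else "fail to reject"
print(mean(sample_a), mean(sample_b), round(t_stat, 2), round(p_value, 3), decision)
```

With these invented samples the means differ by more than a point and a half, yet the p-value exceeds 0.05, so the null hypothesis of no difference in the means is not rejected.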

Breakdown of solution quality for open data exploration on the final examination

Only one student realized that the solution was an independent samples t-test for a difference of the means, ran the test, and then correctly failed to reject a null hypothesis of no significant difference in the means. Another five students reported the means, noted that the means differed, ran a t-test for a difference in the means, and then incorrectly rejected the null hypothesis. Seven students calculated the means and noted that the means were different without running any statistical test for a difference of the means.

Five students did not attempt to utilize a measure of the middle to compare the samples. Some tried to cite differing maximum values as evidence of a difference.

Overall success rate on the final examinations has been exceptionally stable over the past three years, and generally stable for the past decade. The long term average success rate is 73.8%; the current term saw a 77.0% success rate on basic and linear regression statistics.

Final examination average since 2005

In an educational world where a common goal is "continuously improving" best practices, the inert stability of the success rate above might be seen as a failure to continuously improve. The effort to continuously improve mathematics education overall goes back not to the new math of the 1960's but much, much further. Ultimately there are long term average success rates, and statistics assures us that numbers tend to return to long term averages. A look at the running cumulative mean success rate on the final examination since 2005 suggests that the longer term mean to which terms return might be improving, but even this statistic is subject to a tendency to return to an even longer term mean.
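The running cumulative mean mentioned above is the mean of every term's success rate up to and including the current term. A minimal sketch follows, assuming invented term values (not the actual exam history) and a hypothetical function name.

```python
from itertools import accumulate

# Hypothetical term-by-term success rates in percent, invented for illustration.
term_rates = [70.0, 76.0, 73.0, 77.0]


def running_cumulative_mean(values):
    """Return the mean of all values up to and including each position."""
    running_totals = accumulate(values)
    return [total / count for count, total in enumerate(running_totals, start=1)]


cumulative = running_cumulative_mean(term_rates)
print(cumulative)
```

Because each new term is averaged with all earlier terms, the cumulative mean moves less and less as terms accumulate, which is why it changes slowly even when individual terms vary.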

Note that the y-axis does not start at zero: exaggerated vertical scale

In general students who complete the course are able to successfully make basic statistical calculations on 74% of the questions posed.

The course average over time includes performance on homework, quizzes, and tests. Course level performance underlies course completion rates. Data on course level performance is available from 2007 forward.

Course average over the past eight years

The course wide average has a long term average of 77.7%; the current term average is 79.2%. The radius of each circle is proportional to the standard deviation of the student averages in all three sections of the course. The standard deviation is fairly constant over time at about 15%.

This term the open data exploration exercises were each capped off with a presentation rather than a quiz. Performance on the open data explorations was marked using rubrics. Each rubric consisted of five to seven criteria and generated twenty to twenty-eight points. Three to five criteria were content oriented, one focused on the presentation software, and the final criterion rated the presenter.

Criteria, each rated 4 Excellent, 3 Good, 2 Satisfactory, or 1 Needs improvement:

Basic statistics: Appropriate basic statistics calculated correctly and reported meaningfully.
  4 Excellent: All appropriate statistics reported in a meaningful manner.
  3 Good: Appropriate basic statistics reported and cited in report.
  2 Satisfactory: Some basic statistics reported.
  1 Needs improvement: A few basic statistics cited.

Nitrogen storage: Do nitrogen fixing trees store significantly more carbon in the soil than non-nitrogen fixing trees?
  4 Excellent: Answer is correct and supported by a fully appropriate statistical analysis.
  3 Good: Answer is correct, supported by statistics which do not provide evidence that answer is correct.
  2 Satisfactory: Answer is correct but unsupported by numeric values.
  1 Needs improvement: Result is incorrect.

Strength of the difference: How strong is the effect size for this study?
  4 Excellent: Answer is correct and supported by fully appropriate statistical analysis.
  3 Good: Answer is correct, supported by statistics which do not provide evidence that answer is correct.
  2 Satisfactory: Answer is correct but unsupported by numeric values.
  1 Needs improvement: Result is incorrect.

Presentation software: Original work submitted as presentation software, presentation is appropriate to the material and subject matter, presentation generally follows guidelines for a good presentation.
  4 Excellent: Presentation that heeds general presentation guidelines, avoids distracting visual extras, and is appropriate to the subject matter.
  3 Good: Presentation with only a few areas in which the presentation as a visual aid could be improved.
  2 Satisfactory: Presentation with more than a few issues. Transitions distract from the content, timing is inappropriate, or other issues such that the visual aid becomes a distraction.
  1 Needs improvement: Submission of a spreadsheet or other fundamental fault in the submission.

Presentation mechanics: Presenter delivered clearly, concisely, demonstrated familiarity with the contents.
  4 Excellent: Well delivered exhibiting preparation and knowledge of the presentation. Spoke clearly and always towards the audience.
  3 Good: Presenter showed evidence of preparation and some familiarity with the content of presentation. Usually faced the audience.
  2 Satisfactory: Presenter was able to read the slides, sometimes with their back to the audience.
  1 Needs improvement: Little evidence of preparation, unfamiliar with the slide contents, spoken facing the display panel.
A typical presentation rubric
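The point totals for these rubrics follow from rating each criterion on the four-point scale, so five to seven criteria yield a maximum of twenty to twenty-eight points. The sketch below shows one way a rubric score could be turned into a percentage. The function name and the example ratings are hypothetical, and equal weighting of the criteria is an assumption.

```python
MAX_POINTS_PER_CRITERION = 4  # "4 Excellent" is the top rating on the rubric


def rubric_score(ratings):
    """Return (points earned, points possible, percent) for a list of 1-4 ratings."""
    if any(rating not in (1, 2, 3, 4) for rating in ratings):
        raise ValueError("each rating must be 1, 2, 3, or 4")
    earned = sum(ratings)
    possible = MAX_POINTS_PER_CRITERION * len(ratings)
    return earned, possible, 100 * earned / possible


# Hypothetical ratings for a five-criterion rubric, invented for illustration.
example_ratings = [4, 3, 3, 4, 2]
earned, possible, percent = rubric_score(example_ratings)
print(earned, possible, percent)
```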

The open data exploration assignments were structured as assignments in the Schoology learning management system. Students had to submit by midnight on the day prior to the presentation; the assignment locking system in Schoology made this possible.

Schoology assignment editing screen, locking set at the bottom

The average score on the four in-class presentations was 80.7%. During the previous term the average had been 80.6%. 

Students presenting in MS 150 Statistics

The presentations were downloaded as a batch using the download all functionality of Schoology.

The students then presented using native Microsoft PowerPoint or Impress software. 

An earlier article examined authentic assessment in the statistics course

The item analysis of the twenty-seven final examination questions also provides insight on the success rate against the two general education program learning outcomes served by the course.

Program Learning Outcome | PLO | PLO sum | PLO n | PLO%
3.1 Demonstrate understanding and apply mathematical concepts in problem solving and in day to day activities | 3.1 | 14.526 | 17 | 85.4
3.2 Present and interpret numeric information in graphic forms | 3.2 | 6.579 | 10 | 65.8
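The PLO% column is consistent with dividing the PLO sum by the number of items and expressing the result as a percentage. The sketch below checks that arithmetic using the values in the table. Reading the PLO sum as the sum of item success rates is an assumption, and the function name is hypothetical.

```python
# Values as read from the table: PLO sum and number of exam items (PLO n).
plo_data = {
    "3.1": {"plo_sum": 14.526, "n": 17},
    "3.2": {"plo_sum": 6.579, "n": 10},
}


def plo_percent(plo_sum, n):
    """Return the average item success rate as a percentage, rounded to one decimal."""
    return round(100 * plo_sum / n, 1)


percentages = {plo: plo_percent(row["plo_sum"], row["n"]) for plo, row in plo_data.items()}
total_items = sum(row["n"] for row in plo_data.values())
print(percentages, total_items)
```

The item counts for the two outcomes add up to the twenty-seven questions in the item analysis.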

Overall performance remains stable in this mature but evolving course. 

