Assessing Learning in Introductory Statistics

MS 150 Statistics is an introductory statistics course with a focus on statistical operations and methods. The course is guided by the 2007 Guidelines for Assessment and Instruction in Statistics Education (GAISE), the spring 2016 draft GAISE update, and the ongoing effort at the college to incorporate authentic assessment in courses. A history of the evolution of open data exploration exercises and associated presentations as authentic assessment in the course was covered in a May 2017 report.

Maileen and Aziela present basic statistics on anonymous blood glucose levels 

Three course level student learning outcomes currently guide MS 150 Introduction to Statistics:
  • Perform basic statistical calculations for a single variable up to and including graphical analysis, confidence intervals, hypothesis testing against an expected value, and testing two samples for a difference of means.
  • Perform basic statistical calculations for paired correlated variables.
  • Engage in data exploration and analysis using appropriate statistical techniques including numeric calculations, graphical approaches, and tests.
The course wrapped up coverage of content four weeks prior to the end of the term. This was a week earlier than prior terms, but one of the weeks in the final four was the Easter holidays. The content was compressed by one day over prior terms by the dropping of the material in section 9.12 in the textbook. The material in 9.12 repeatedly led to students making errors in subsequent calculations of confidence intervals for sample sizes less than 30. 

The calculation of the margin of error for the mean was also dropped from the curriculum. Confidence intervals for the mean were calculated from t-critical multiplied by the standard error. The margin of error was not adding anything to the course, and the linkage of two standard errors to the margin of error is more an artifact of the historic simplification of calculating a 95% confidence interval. The students would glom onto the multiplier of two and, even after learning about t-critical, would return to using two in subsequent calculations. This term the option of using two for sample sizes above thirty was never mentioned. This disconnects confidence intervals from ordinary and extraordinary z-scores, but that connection was tenuous at best.
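
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration, not the course's own material: the course works in Google Sheets, the sample values below are invented, and the function name is hypothetical. The t-critical value of 2.262 is the standard two-tailed 95% value for 9 degrees of freedom.

```python
import math
import statistics


def confidence_interval(sample, t_critical):
    """Return (lower, upper) as mean -/+ t_critical * standard error."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean uses the sample standard deviation.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    half_width = t_critical * standard_error
    return mean - half_width, mean + half_width


# Hypothetical sample of ten values, invented for illustration only.
sample = [92, 98, 104, 87, 110, 95, 101, 89, 97, 107]

# Two-tailed 95% t-critical for n - 1 = 9 degrees of freedom.
T_CRITICAL_DF9 = 2.262

lower_t, upper_t = confidence_interval(sample, T_CRITICAL_DF9)

# The "two standard errors" shortcut, shown only for comparison.
lower_2, upper_2 = confidence_interval(sample, 2.0)

print(f"t-critical interval: {lower_t:.2f} to {upper_t:.2f}")
print(f"shortcut interval:   {lower_2:.2f} to {upper_2:.2f}")
```

For a sample this small the shortcut interval is narrower than the t-critical interval, which is the kind of error that reverting to two produces for sample sizes under 30.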

In the final four weeks, the students engaged in a series of three data analysis, exploration, and presentation exercises. The course then ended with a final examination which was completed as an online test inside Schoology.


In fall 2018 the spring 2018 final examination was posted as a practice test. The practice test was positively received and saw broad engagement by the students. In spring 2019 the final examinations from fall 2017, spring 2018, and fall 2018 were all posted as practice tests. The ending of the term on a partial week meant that the students had two in-class days to work on the practice finals in addition to any practice they may have done outside of class time.

Statistics spring 2019 final examination

The use of prior final examinations as practice tests was made possible in part by the use of Google Sheets for data sharing starting in spring 2017 and the continued use of Google Sheets in subsequent terms. Spring 2017 was the first term to use Google Sheets as the supporting software for the course from day one.


Spring 2018 saw the adoption of Schoology Institutional and the further deepening of integration between the course and Google Sheets via the Google Drive Assignments application in Schoology.

The basic structure of the final examination has been fairly stable over time, and performance on an item-by-item basis is also fairly stable over time.  

Thirty-three students sat the final examination spring 2018. The table depicts the percent success rate on each item on the final examination across seven terms.

Databars for the data

The first twelve questions covered basic one variable statistics. Questions 13 to 16 involved constructing a 95% confidence interval. Questions 17 to 21 covered statistics for two paired variables. Note that the margin of error calculation was removed this term, as described above. The intent of dropping section 9.12 was to improve performance on the calculation of the lower and upper bounds of the 95% confidence interval. While there was term-on-term improvement, there was not year-on-year improvement.

Students evidenced strength in calculating basic statistics of both one and two variables. Performance on basic statistics was the strongest in the history of the course. Whether this was a result of this being the first term with three practice final exams available is not knowable. Strength in basic statistical calculations and the removal of the relatively lower performing margin of error calculation lifted the final examination overall average to the highest performance on the final since 2012. This is detailed further below.

The MS 150 Statistics course spring 2019 consisted of two sections, a total of 54 students enrolled as of term end, 33 females and 21 males. The two sections are kept in curricular synchronization during the term. Both sections covered the same material, worked the same assignments, and gave presentations on the same topics. The sections met at 8:00 and 9:00 on a Monday-Wednesday-Friday schedule.

Performance by section on the final and in the course

This term the 8:00 and 9:00 sections showed no significant difference in performance on the final examination. The difference in the overall course average between the two sections was significant this term; in fall 2018 this difference did not rise to significance. Underperformance by the 8:00 section is anecdotally attributed to transportation and attendance issues that are more pronounced in the 8:00 section than in the 9:00 section.

There were 33 females and 21 males in the two sections of statistics.
Performance differences by gender in the course and on the final exam

Gender differences were statistically significant for the course (p=0.045) but not for the final examination (p=0.083). The effect size for the gender differential in the course was medium.
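
The post does not say which effect size measure was used. As a sketch, assuming Cohen's d with a pooled standard deviation, the calculation could look like the following Python. The scores are invented for illustration and are not the course data, and the function names are hypothetical. The cut points of 0.2, 0.5, and 0.8 are Cohen's conventional labels for small, medium, and large effects.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Difference of means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def describe_effect(d):
    """Label an effect size using Cohen's conventional cut points."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Hypothetical course averages, invented for illustration only.
group_one = [88, 92, 79, 85, 90, 83]
group_two = [80, 75, 86, 78, 82, 72]

d = cohens_d(group_one, group_two)
print(f"d = {d:.2f} ({describe_effect(d)})")
```

The p-value says whether a difference is likely to be real, while the effect size says how large the difference is, which is why both are reported.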

Gender versus section via http://threegraphs.com/ Vertical axis does not start at zero.

The gender differentials interact with the section. The males at 8:00 have a strongly depressed performance against males at 9:00 or either of the female averages in the two sections. Data can drive one to some curious places, such as a recommendation that males avoid the 8:00 AM section of the course.

Underlying this data is a hidden bias. During early registration the first students to sign up for classes fill the 9:00 section. Students who are perhaps less organized and who do not accomplish early registration wind up in the 8:00 section. Perhaps these are students who are inherently less able to plan ahead, which may impact their academic performance.

The performances of the males in the 9:00 section and of the females in both sections average above 80%, a strong performance in a course as challenging as statistics. As taught, the course makes statistics accessible.

Course performance over time

The introduction of Schoology Institutional in January 2018 has made it possible to track performance based on student learning outcomes. Prior to January 2018 Schoology Basic permitted the entering of student learning outcomes, but the Basic version does not provide access to the Mastery screen. Once the college adopted the institutional version, however, Mastery data became available for as far back as the instructor had measured against student learning outcomes. Data across six terms is reported for the following three learning outcomes:

1.0 Perform basic statistical calculations for a single variable up to and including graphical analysis, confidence intervals, hypothesis testing against an expected value, and testing two samples for a difference of means.
2.0 Perform basic statistical calculations for paired correlated variables.
3.3 Draw conclusions based on statistical analyses and tests, obtain answers to questions about the data, supported by appropriate statistics.

Performance on student learning outcome over five terms. 

Final examination performance on the first course learning outcome, 1.0 basic statistics, has been known to be generally stable around 80% since fall 2012. A success rate of 83% for spring 2019 is statistically identical to fall 2018 and on par for this particular outcome. The item analysis average on the final for questions in this area was 95%, providing strong support for the result from the student learning outcome data reported by Schoology. Note that the Schoology result is an aggregation of performances on 24 assignments during the term.

Performance on the second student learning outcome, 2.0 paired data calculations, has remained stable near 78%. A success rate of 84% for fall 2018 is an improvement for performance on student learning outcome two. The average on the final for paired data calculations was 80%, in good agreement with the student learning outcome success rate. The performance on the final examination is slightly inflated by the removal of two predictive questions on the final examination this term, questions on which the students had lower success rates.

Student learning outcome 3.0, engage in data exploration and analysis using appropriate statistical techniques including numeric calculations, graphical approaches, and tests, includes specific learning outcomes that unavoidably overlap with the material in student learning outcome one. Students still must demonstrate basic statistical competencies when working on an open data exploration exercise. Hence only performance on 3.3 is reported, because drawing conclusions based on statistical analyses and tests and obtaining answers to questions about the data is the core intent of the third learning outcome.

Multi-term performance on outcome 3.3 averages 72%, and this term's 60% average success rate across seven assignments during the course is well below that longer term average. This learning outcome is not tested on the final examination.

As an educator, I am aware that there is a penchant in education for "continuous improvement." The reality is that there is far more inertia in a value than I suspect those in education comprehend. These success rates tend to be stable over long periods of time and reflect both the difficulty of the material as well as the many reasons students do not succeed on the material. For every student who did not do well, there is a complex back story. Data for success on the final examination demonstrates this longer term stability and the tendency to return to the long term average.


Long term lack of a trend in final examination averages Fall 2005 - Spring 2019
y-axis does not start at zero nor end at 100! vertical range is exaggerated! 

Perhaps for every rule there is an exception. Over the long haul, regression to the mean is as inescapable in statistics as entropy is in physics. Means return to long term means and those long term means return to even longer term means. And in the world of final examination averages, moving those longer term means is very difficult. That said, performance on the final examination saw an uptick in fall 2018 and an even stronger term-on-term improvement in spring 2019. The final examination average this term of 85% is the strongest performance since data was first tracked in 2006. Fall 2018 had a single practice final examination, while spring 2019 deployed three practice final examinations. The spring 2019 final examination was also missing three questions which had not performed well in the past (margin of error with a 62% success rate, predict a y value given an x with a 37% success rate, and infer an x value given a y value with a 32% success rate).


Long term course average and standard deviation

The course average since 2007 has also remained stable and has tended to remain within six percent of 78%. This term's 83% course average remains within the historic range. Continuous improvement is a laudable goal, but the reality is that values return to long term means.

The standard deviation of the course averages is also relatively stable, remaining within four percent either way of the long term average of 15%. Spring 2019's 13% continues this stability in variation.

Overall, given a list of numbers and spreadsheet software, students show a strong mastery of basic statistics, good capabilities with linear regressions, and more moderate abilities with confidence intervals and open data exploration.
