### Assessing learning in introductory statistics

MS 150 Introduction to Statistics uses an outline based in part on the 2007 Guidelines for Assessment and Instruction in Statistics Education (GAISE) and on the college's ongoing effort to incorporate authentic assessment into courses. The three course-level student learning outcomes currently guiding the course are:


- Perform basic statistical calculations for a single variable up to and including graphical analysis, confidence intervals, hypothesis testing against an expected value, and testing two samples for a difference of means.
- Perform basic statistical calculations for paired correlated variables.
- Engage in data exploration and analysis using appropriate statistical techniques including numeric calculations, graphical approaches, and tests.
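The core single-variable techniques in the first outcome (a confidence interval and a hypothesis test against an expected value) can be sketched in a few lines of Python. The sample scores below are hypothetical, and the t critical value is read from a t-table rather than computed.

```python
import math
import statistics

# Hypothetical small sample of the kind used on the final examination
data = [77, 81, 74, 69, 85, 78, 72, 80, 76, 83]
n = len(data)
mean = statistics.mean(data)
sx = statistics.stdev(data)      # sample standard deviation
se = sx / math.sqrt(n)           # standard error of the mean

# 95% confidence interval; t* = 2.262 is the two-tailed critical
# value for df = 9, taken from a t-table
t_star = 2.262
ci_low, ci_high = mean - t_star * se, mean + t_star * se

# Hypothesis test against an expected value mu0: if mu0 falls inside
# the confidence interval, the sample does not differ significantly
mu0 = 75
t_stat = (mean - mu0) / se
print(mean, (ci_low, ci_high), t_stat)
```

Here the confidence-interval and t-statistic approaches agree, as they must: mu0 lies inside the interval exactly when |t| is below the critical value.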

The first two outcomes involve the students' basic calculation capabilities and are assessed via an item analysis of the final examination. Sixty-seven students in three sections took the final examination.

The first course learning outcome focuses on basic statistics. Twenty-one questions on the final examination required the students to perform basic single-variable statistical calculations on a small sample. Based on the item analysis, the students answered 82.5% of the items correctly, up from 80.2% in fall 2014. In general, basic single-variable statistical calculations are an area of strength for the students, and performance tends to be stable term-on-term.
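An item analysis of this kind reduces to computing, for each question, the proportion of students who answered it correctly. A minimal sketch with a small hypothetical 0/1 response matrix:

```python
# Hypothetical response matrix: rows are students, columns are items,
# 1 marks a correct answer
responses = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
]

n_students = len(responses)
# Item analysis: proportion of students answering each item correctly
item_success = [
    sum(row[i] for row in responses) / n_students
    for i in range(len(responses[0]))
]
# Overall success rate across items, as reported for each outcome
overall = sum(item_success) / len(item_success)
print(item_success, overall)
```

A low per-item proportion flags a question, and hence a skill, the class as a whole struggled with.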

This term, spring 2015, the final examination was delivered as an online test in Schoology, the first term that electronic testing has been utilized. Schoology permitted fill-in-the-blank questions with multiple correct answers, which allowed the test design to accommodate differing results due to student rounding choices. Schoology also permitted an essay answer for the final section of the examination.

In fall 2014 the final examination was administered on paper. In both terms students were free to use Gnumeric, LibreOffice, or Excel to make calculations, and the final examination is open book. Students have a two-hour time limit; the open-book structure permits them to look up a forgotten formula, much as a practicing statistician is permitted to do. That basic statistical performance was stable term-on-term suggests that the use of online testing had a neutral impact on performance. An earlier affective domain assessment found that students had a positive reaction to taking tests online.

Performance on the second course learning outcome was measured by nine questions on the final examination. Student performance on this section was lower, at 69.6%; in fall 2014 the average was 68.2%. This section of the final examination has historically been weaker than the basic single-variable statistics section, and that weakness was seen again in spring 2015. Term-on-term performance, however, is stable, and the use of an online examination again shows no significant impact on performance.

Performance on the third course learning outcome, open data exploration and analysis, as measured by points awarded is not comparable term-on-term, because the scoring system for this section of the final examination varies from term to term. Performance is always weaker on the open data exploration and analysis section than on the first two learning outcomes. Students perform strongly when asked to calculate a specific statistic, but struggle when raw data and open-ended questions about the data are posed. The students responded to this section with a single essay question set up in Schoology, which was then marked by the instructor. This term performance improved term-on-term on a percentage basis, though this owes more to the vagaries of the scoring rubric than to a genuine gain.

In the above chart the centers of the topmost yellow circles are located at the average success rate for the students on questions under the first course learning outcome, basic single-variable statistics. The chart reports results from 2012 to the present, and the radii are the standard deviations. The middle blue circles track performance under the second course-level learning outcome, paired dependent data. The bottommost orange circles track performance on open data exploration and analysis, a section introduced in 2012.

The third student learning outcome, open data analysis, was separately assessed using a simple rubric that looked at whether a student made an appropriate statistical analysis with a correct conclusion. Optimally, the students would note that the sample means differ and then run a test for a difference of means, using either confidence intervals or a t-test for a difference of independent sample means.

- Optimal statistical analysis, correct conclusion: 0.22
- Optimal statistical analysis, incorrect conclusion: 0.06
- Minimal statistical analysis, correct conclusion: 0.12
- Minimal statistical analysis, incorrect conclusion: 0.26
- Inappropriate statistical analysis: 0.09
- No statistical analysis: 0.14
- Blank: 0.11

Twenty-eight percent of the students performed an optimal analysis, yet even after having performed an appropriate and complete analysis, only twenty-two percent reached the correct conclusion: that the difference was not statistically significant. In general, the students tend to see any difference in means as indicative of a change. Students tend to obtain higher success rates on questions for which two means are significantly different; they still have difficulty looking at two unequal means and understanding that, despite the difference, the two means are not significantly different.
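The distinction the students miss can be illustrated with a t-test for a difference of independent sample means. The two samples below are hypothetical: their means are clearly unequal, yet the t-statistic falls well short of the critical value, so the difference is not statistically significant.

```python
import math
import statistics

# Two hypothetical independent samples whose means differ, yet not significantly
a = [12, 15, 11, 14, 13, 16, 12, 14]
b = [14, 13, 16, 12, 15, 17, 13, 14]

mean_a, mean_b = statistics.mean(a), statistics.mean(b)
var_a, var_b = statistics.variance(a), statistics.variance(b)

# Welch's t-statistic for a difference of independent sample means
se = math.sqrt(var_a / len(a) + var_b / len(b))
t_stat = (mean_b - mean_a) / se

# Roughly, |t| below ~2.1 (the two-tailed 5% critical value near
# df = 14, read from a t-table) means the difference in means is
# not statistically significant despite the unequal means
print(mean_a, mean_b, t_stat)
```

The difference in means here is smaller than the standard error can distinguish from chance, which is exactly the situation the open data question posed.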

A minimally supported answer was typically one in which the difference in the means was noted. Some students buttressed their conclusion by noting that the difference was small relative to the standard deviations, a reasonable observation. Twelve percent judged the difference in means too small to be significant but ran no statistical test to confirm this conclusion. Twenty-six percent looked only at the difference in the means and pronounced it a significant difference.

A few students argued from irrelevant statistics, such as a calculation of the slope (the samples were independent, not paired). Other students drew a conclusion but cited no statistics, no numeric support for their statements. I had cautioned the students a number of times that numeric statistical support was required: statements without numeric values or statistical reasoning would be marked as a zero. This tends to artificially deflate the average on the open data exploration section; there were twenty scores of zero due to either a numerically unsupported analysis or a blank answer. Eleven percent of the students left the open data exploration essay question on the final blank.

In summary, performance against the three student learning outcomes might be characterized as:

- 82.5% of the students are able to perform basic statistical calculations for a single variable up to and including graphical analysis, confidence intervals, hypothesis testing against an expected value, and testing two samples for a difference of means.
- 69.6% of the students are able to perform basic statistical calculations for paired correlated variables.
- 28% of the students are able to engage in data exploration and analysis using appropriate statistical techniques including numeric calculations, graphical approaches, and tests; 22% reach a correct conclusion based on that analysis.

Overall success rate on the final examination has been generally stable over the past ten years. The long-term average success rate is 72.3%; the current term saw a 75.9% success rate on basic and linear regression statistics. Prior to 2012 open data exploration was not included in the final examination, so that section is excluded from this longer-term analysis.

While the term-on-term final examination success rate is down slightly, the effect of regression to the long-term mean cannot be discounted. The phrase "continuously improving" is often heard in the hallways of education, a phrase that seems almost blithely ignorant of the tendency of systems to return toward a long-term mean. A look at the running cumulative mean success rate on the final examination since 2005 suggests that the longer-term mean to which terms return might itself be improving, but even this statistic is subject to a tendency to return to an even longer-term mean.
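The running cumulative mean itself is a simple statistic: the mean of all terms up to and including each term. A sketch with hypothetical term success rates:

```python
from itertools import accumulate

# Hypothetical term-by-term final examination success rates (percent)
rates = [70.1, 73.4, 71.8, 74.9, 72.0, 75.9]

# Running cumulative mean: divide each running total by the number
# of terms contributing to it
running_means = [total / (i + 1) for i, total in enumerate(accumulate(rates))]
print(running_means)
```

A slowly rising sequence of running means is what an improving long-term mean would look like, even while individual terms bounce above and below it.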

In general students who complete the course are able to successfully make basic statistical calculations on 72% to 74% of the questions posed.

The course average over time includes performance on homework, quizzes, and tests. Course-level performance underlies course completion rates. Data on course-level performance is available from 2007 forward. The course-wide average has varied from 72% to 83% with a long-term average of 77.8%. In spring 2015 the course average was 74.5%, down an insignificant 1.6% term-on-term. The radius of each circle is proportional to the standard deviation of the student averages in all three sections of the course; the standard deviation is fairly constant over time at about 15%.

That performance as measured across homework, quizzes, tests, the midterm, and the final was stable term-on-term was a surprise. In prior terms the total possible points on quizzes and tests was set post hoc, based on the highest score obtained. The highest score was often, but not always, the full intended total; the result was that even when there were no perfect papers, someone would score 100%.

This spring term the use of Schoology meant that the total possible points were preset prior to deploying the quiz or test. The result was a more challenging course from a student performance point of view: only perfectly correct quizzes and tests earned 100% of the possible points. Thus rigor was increased without a decrease in performance.

The quiz and test settings allowed students to see which questions they got right and wrong immediately after submitting their quiz or test. Judging from their reactions, students liked this feature and found it potentially motivating. Due to the layout of the computer laboratory, there was a risk that neighboring students would use the revealed wrong answers to correct their own; observation in the classroom did not indicate that students were doing this. Showing which answers were right and wrong (without showing the correct answers to those that were wrong) appears to have been a motivating factor and may have contributed to the term-on-term stability of the overall course average. While some instructors would likely be reluctant to run quizzes and tests this way, the motivating impact of immediate feedback is worth the risk.

Each term the curriculum is adjusted and modified. Sometimes the adjustments are planned in advance, such as the introduction of open data exploration in 2012; at other times the modification is made on the fly. This spring term the open data exploration section of the course was wrapped up by having the students present an analysis of a data set to the class.

This assignment mimicked the sort of assignment one might receive in an institutional or corporate setting: one is given data to analyze and has 48 hours to prepare a presentation on that data for a group of colleagues. For students who are still struggling to understand the basic concepts, this was a very challenging yet authentic assessment.

The students had a hard and fast deadline for submission of their data presentations, made possible by a preset locking time in Schoology. The assignments were left unmarked until after the presentations. In the image above, two students are presenting their analysis posted in Schoology. No comments on the correctness of an analysis were shared in class; each presentation was permitted to stand on its own without critique. Questions from the other students were encouraged but generally not asked.