### Performance on MS 150 Statistics final examination

Assessment is sometimes about repeating a set of measurements term after term to look at long-term trends. In SC 130 Physical Science I have run an assessment of a program learning outcome focused on numeric information presented graphically for the past ten course terms. This provides longer-term trend analysis and permits insight into natural term-on-term variability. These longer-term studies are crucial to assessment.

Assessment is sometimes seeing something tantalizing and swinging around to chase down a gut sense that something just happened inside a data set.

MS 150 Statistics consists of three sections with up to thirty students per section. Final examinations stretch out over three days. By happenstance of the final exam calendar, each section sits its final on a different consecutive day. The 8:00 section (m08) has its final at 8:00 on the first day of final exams. The 10:00 section (m10) is on day two at 10:05. The 9:00 section (m09) is on the third day at 8:00 in the morning.

A new final examination is crafted every term, with this term's final examination centered on turf grass. The first 32 questions tested very basic statistical skills. The last 20 points were tied up in an open data exploration in which the students were given data and a set of questions to answer. Marking was based on an appropriate analysis with correctly done numeric calculations in support of their answers.

 Student performance on the first 32 basic statistics questions

Old finals are posted online and left online. The first thirty-some questions are highly predictable and easy to study for. The students generally do well on this section. The thirty-two questions include calculating the mode, median, mean, standard deviation, standard error of the mean, and the quartiles, making a box plot, and generating a histogram. Each question is marked right or wrong. There were 32 possible points on this section. The lowest score was 15 correct, in the 10:00 class; the 9:00 class had the highest score, a perfect 32. The medians were 26, 26, and 25 respectively. The means on this section were 25.8, 24.8, and 25.1. An analysis of variance found no significant difference among these means. The overall average of 25.23 represented a 79% success rate, which is a strong performance in a course such as statistics.

The open data exploration is less predictable. In general, there are five different analysis paths that might be taken with the data. Chapter twelve in the textbook includes a flow chart of the five options. Note that the course does not cover analysis of variance for situations with more than two variables.

Having advance knowledge of the open data exploration question could give sections that take the final on the second and third day a study advantage. Knowing only the layout could still reduce the number of possible options.

Yet when marking the final examinations I had the sense that the 10:00 and 9:00 sections were each successively weaker in their performance on the open data exploration. Among other things, this suggested that students in the sections that took the test on the second and third day did not seek out useful information from the students who took the test on the first day. I do not do anything to try to firewall the information, and I do not counsel the students to not share information. That may have the opposite effect. Rather I expect that students in the later sections will try to suss out the nature of the open data exploration and the rest of the final. To not do so would be almost a form of poor study habits on the part of a student.

On the 20-point open data exploration the medians fell from 10 on the first day to 8 on the second to 5 on the third day. Note that the medians represent performance at or below 50%. Due to differences in the distributions, the means followed a slightly different pattern of 10.2, 6.9, and 7.1. The differences in the means were not significant at an alpha of 0.05; the p-value was 0.07. That said, if the three sections were performing identically, this much separation would arise by chance only about seven percent of the time. Given that an alpha of 0.10 is not unreasonable, there is the suggestion that the three sections did not perform identically. A test of a difference in medians between the first day and the third day also yields a p-value of 0.07. There is certainly no suggestion that there was a benefit to taking the final on the second or third day, and no evidence that the material leaked in a form that was useful to the students in the 10:00 and 9:00 sections.
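The seven percent figure can be read as the outcome of a simulation. As a minimal sketch, and not the analysis of variance actually used for this report, a permutation test pools all the scores, reshuffles them into sections of the original sizes many times, and counts how often the reshuffled section means are at least as spread out as the observed ones. The function names and the scores below are hypothetical placeholders, not the examination data.

```python
import random


def spread_of_means(groups):
    """Return the largest group mean minus the smallest group mean."""
    means = [sum(g) / len(g) for g in groups]
    return max(means) - min(means)


def permutation_p_value(groups, trials=5000, seed=0):
    """Estimate how often random regrouping matches the observed spread.

    Scores are pooled and reshuffled into groups of the original sizes.
    The returned fraction plays the role of a p-value.
    """
    rng = random.Random(seed)
    observed = spread_of_means(groups)
    pooled = [score for g in groups for score in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        if spread_of_means(shuffled) >= observed:
            hits += 1
    return hits / trials


# Hypothetical scores out of 20, made up purely for illustration.
day_one = [14, 10, 12, 8, 11, 9]
day_two = [8, 6, 9, 5, 7, 10]
day_three = [5, 7, 4, 9, 6, 8]

p_value = permutation_p_value([day_one, day_two, day_three])
print(f"estimated p-value: {p_value:.3f}")
```

A small returned fraction says that random regrouping rarely produces a gap as large as the one observed, which is the sense in which a p-value of 0.07 hints at a real difference without meeting an alpha of 0.05.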

One obvious question to ask might be whether the 8:00 section was an academically stronger section. Bearing in mind that the course average now includes the final examination, the means by section in the same order as the results above were 0.77, 0.74, and 0.77. The 10:00 section was slightly lower, but the differences were not significant. In theory the three sections were equally capable entering the final examination.

With differences prior to the final examination not being present, I am left wondering to what extent the position of the 9:00 final on the third day impacts performance. Bear in mind that the final examination is open book. No one walks into my office, takes away my books and computer and says, "Analyze this!" I want to know what a student can do when equipped with the resources that they are likely to have in the workplace. The final is not some sterile white room exercise. Thus studying for the final is probably less of a factor than it would be if the exam were closed book.

That leaves the possibility of some form of "end of finals" impact, some type of "examination exhaustion" where the students are simply anxious to get what is likely their last final done so they can start their summer break. At the end of two hours only three of twenty-seven students were in the 9:00 section exam room. Students had started to leave after 70 minutes of the 120 minute examination. I had the impression that they were tossing in the towel and hitting the showers as they left.

Stepping back from the details of the sectional differences, the overall performance difference between the "calculate this" section (79%) and the open data exploration (41%) is stark. A similar difference was seen last term.

Term-on-term performance on the basic statistics section of the final examination saw essentially no change, from 78.8% to 79.8% correct.

Performance term-on-term on the open data exploration section of the final examination dropped from 57.2% to 40.5%. This drop was an artifact of strongly different marking rubrics. The marking rubric is not revealed in advance to the students as the rubric would give away information on the correct approach to the analysis.

The decline on the open data exploration section pulled the overall average on the final examination down from 70.2% in fall 2013 to 64.1% in spring 2014. This 64.1% represents the lowest average in the recent history of the course. In theory the open data exploration should have been an easy exercise, as "unequal columns" (two samples of different sizes) usually lead only to a test for a difference of means between two independent samples. In the text this is the only option mapped out for "unequal columns."
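The 64.1% overall figure for this term can be reproduced from the two section results as a point-weighted average. This is only a consistency check under the assumption, taken from the description of the examination above, that the final consisted of 32 basic points plus 20 open data exploration points. The function name is illustrative, and the point split of last term's final is not given here, so only this term is computed.

```python
BASIC_POINTS = 32  # basic statistics questions, one point each
OPEN_POINTS = 20   # open data exploration


def overall_percent(basic_fraction, open_fraction):
    """Combine the two section success rates, weighted by points."""
    earned = basic_fraction * BASIC_POINTS + open_fraction * OPEN_POINTS
    return 100 * earned / (BASIC_POINTS + OPEN_POINTS)


# This term: mean of 25.23 out of 32 on the basics, 40.5% on the exploration.
this_term = overall_percent(25.23 / 32, 0.405)
print(f"overall final average: {this_term:.1f}%")
```

Because the basic questions carry 32 of the 52 points, a large drop on the 20-point exploration moves the overall average by only about 38% of its size.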

 A graphical look at performance on the final, blue overall mean, yellow basic stats mean, orange open data exploration mean

The open data exploration has been a part of the final examination for only four terms, and only for last term and this term have statistics for the two portions of the final been tracked separately. This term marks the fourth consecutive term of a downward trend in the average, although performance on basic statistical calculations remains high.

I am able to teach a student to use a spreadsheet to calculate a mean when directed to do so, but I am still struggling to move the students towards knowing what to do when the questions are more open ended and less structured. But then I can teach and have taught elementary and high school students to use a spreadsheet to make basic calculations when directed to do so. Synthesizing their knowledge and using it higher on Bloom's taxonomy is still a challenge for the students. And that is an assessment I can use to work on improving the course.

Although the performance on the final was weaker this term than in the past, overall performance in the course, while down from last term, was on par with the historical average since the course went open book in 2007. Understanding the tendency of values to return to the mean requires knowing that long term mean, which in statistics stands at 77.9% for the course. This term the course average was 76.1%. With an average long term standard deviation of 3.3% for the means, that 1.8% difference is not significant. The course is performing on par with historic average performance levels. Only the final examination is low, and that only due to the open data exploration.
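The comparison with the long-term mean can be made concrete by expressing the gap in standard deviations. This is a rough sketch that assumes the 3.3% is the standard deviation of the term means and that a term within about two standard deviations of the long-term mean counts as ordinary variation. The cutoff of two is a common rule of thumb and an assumption here, not a test named in this report.

```python
LONG_TERM_MEAN = 77.9  # course average since 2007, in percent
TERM_MEAN = 76.1       # this term's course average, in percent
SD_OF_MEANS = 3.3      # long-term standard deviation of the term means


def z_score(value, mean, sd):
    """Return how many standard deviations value lies from mean."""
    return (value - mean) / sd


z = z_score(TERM_MEAN, LONG_TERM_MEAN, SD_OF_MEANS)
unusual = abs(z) > 2  # assumed rule-of-thumb cutoff
print(f"z = {z:.2f}, unusual term: {unusual}")
```

The term sits a little more than half a standard deviation below the long-term mean, well inside the range of ordinary term-on-term variation.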

 MS 150 Statistics course average, current term on the far right

Assessment means being a researcher in your own classroom and exploring data, especially when learning does not occur at the desired level. Meta-data and institutional level forms are important to program and institutional level assessment. At the bottom of the pyramid, however, assessment has to rely on each instructor being their own learning assessment researcher.

Post-script. For reference, the final examination open data exploration questions were:

• Does Bermuda grass or Zoysiagrass provide a higher average green coverage?
• Is the difference significant?
• Looking only at green coverage, does it matter which the college uses and, if so, why?

The data was provided on the examination. Students were instructed to provide numeric support for their answers. The class had spent just over two weeks on open data exploration and the students had completed two quizzes with feedback.

Pts
4    Calculating the Cynodon green coverage average correctly
4    Calculating the Zoysia green coverage average correctly
2    Identifying Zoysia as having the higher green coverage [not yes/no]
4    Running a t-test for a difference of two independent means
2    Answering the second question correctly [yes/no]
4    Answering the third question correctly [yes/no]

Note that the final answer was worth four points even without numeric support. This was because students were generating data on their computers and may have had numbers in front of them that they did not record on their paper. Hence credit was given for obtaining the correct statistical answer even in the absence of supporting numbers.
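For readers who want to tally the rubric, it can be written down as data. The sketch below assumes each item is marked all or nothing, which the rubric does not actually state, and the names `RUBRIC` and `score` are illustrative only.

```python
# (description, points) pairs taken from the rubric above.
RUBRIC = [
    ("Cynodon green coverage average calculated correctly", 4),
    ("Zoysia green coverage average calculated correctly", 4),
    ("Zoysia identified as having the higher green coverage", 2),
    ("t-test run for a difference of two independent means", 4),
    ("Second question answered correctly", 2),
    ("Third question answered correctly", 4),
]

TOTAL_POINTS = sum(points for _, points in RUBRIC)


def score(earned):
    """Sum the points for each rubric item flagged True in earned.

    Assumes all-or-nothing marking per item (an assumption made
    for illustration).
    """
    if len(earned) != len(RUBRIC):
        raise ValueError("need one True/False flag per rubric item")
    return sum(points for (_, points), ok in zip(RUBRIC, earned) if ok)


# A student who recorded no numbers but answered the third question
# correctly still earns that item's four points.
answer_only = score([False, False, False, False, False, True])
print(answer_only, "of", TOTAL_POINTS)
```

The six items sum to the 20 points allotted to the open data exploration.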

Note that while this report focuses on averages rather than specific learning outcomes, other work is being done on specific concepts learned. The ability to handle "data in the wild," which the open data exploration represents, is part of my effort to bring authentic assessment into the statistics course. These exercises are more "authentic" than the traditional "Calculate the mean of the following values" type of question. Thus this report is also an assessment of performance on a student learning outcome.