### Student learning in MS 150 statistics

What students will be able to demonstrate and do in statistics usually centers on phrases such as "Calculate basic statistical measures of the middle, spread, and relative standing" or "Given a data set, calculate the 95% confidence interval for the mean." A student either can perform these calculations or cannot. At the introductory level the course consists of sets of basic skills. For MS 150 Statistics at the College of Micronesia-FSM these skills are described on the course outline.
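The calculations named in those phrases can be sketched in Python. The scores below are illustrative, not actual course data, and the t critical value is taken from a t-table for the assumed sample size.

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical sample of exam scores (illustrative only).
scores = [78, 85, 92, 70, 88, 95, 81, 77, 84, 90]

n = len(scores)
x_bar = mean(scores)          # measure of the middle
s = stdev(scores)             # measure of spread (sample standard deviation)
z = (scores[0] - x_bar) / s   # relative standing of the first score (z-score)

# 95% confidence interval for the mean using the t-distribution;
# 2.262 is the critical value for n - 1 = 9 degrees of freedom.
t_crit = 2.262
margin = t_crit * s / sqrt(n)
ci = (x_bar - margin, x_bar + margin)
print(x_bar, s, ci)
```

Each line maps onto one of the skills on the outline: middle, spread, relative standing, and the confidence interval for the mean.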

At the end of the course a student should be able to demonstrate the skill or not. The final examination, in conjunction with test three near the end of the term, is aligned with the outline. An item analysis of performance on these instruments provides information on performance against the specific student learning outcomes. The data derived from this analysis is of most direct use to the instructor.

The specific student learning outcomes are aggregated into course level student learning outcomes. The course level carries less specific information for the individual instructor, but provides a broader and easier-to-comprehend picture. If the specific student learning outcomes are trees, then the course learning outcomes are a patch of forest.

Course learning outcome performance data from test three and the final examination has been tracked since 2005, providing a long term view of performance against the course learning outcomes. MS 150 Statistics has three course learning outcomes:

CLO 1. Perform basic statistical calculations.

CLO 2. Obtain results using normal and t-distributions.

CLO 3. Perform linear regressions.

The following chart tracks the average performance on the course learning outcomes as measured by item analysis of test three and the final examination from fall 2005 to spring 2012. Click on the chart to enlarge it.
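The aggregation behind such a chart, from item-level success rates up to CLO averages, can be sketched as follows. The item-to-CLO mapping and the success fractions are hypothetical, not actual course data.

```python
from collections import defaultdict

# Hypothetical item analysis: fraction of students answering each item
# correctly, paired with the course learning outcome the item serves.
items = {
    "i01": ("CLO1", 0.93), "i02": ("CLO1", 0.89), "i03": ("CLO1", 0.95),
    "i04": ("CLO2", 0.71), "i05": ("CLO2", 0.66), "i06": ("CLO2", 0.74),
    "i07": ("CLO3", 0.82), "i08": ("CLO3", 0.78),
}

# Group the per-item fractions by CLO.
by_clo = defaultdict(list)
for clo, fraction in items.values():
    by_clo[clo].append(fraction)

# Average within each CLO: one number per outcome per term,
# the value a term-over-term chart would plot.
clo_average = {clo: sum(fs) / len(fs) for clo, fs in by_clo.items()}
print(clo_average)
```

One such average per CLO per term is what the longitudinal chart plots.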

Individual term performance varies rather randomly over time. For example, the improvement in performance on CLO 2 from fall 2011 to spring 2012 might look like an important gain, but it is only a return to the performance levels seen in fall 2008 and fall 2006. There were no curricular differences to which to attribute the change from fall 2011 to spring 2012; thus these fluctuations in performance are simply the movement of the average within a range.

The item analysis data was also used to evaluate average performance on the general education program student learning outcomes that are listed on the course outline. The program student learning outcomes served are:

GE PSLO 3.1 Demonstrate understanding and apply mathematical concepts in problem solving and in day-to-day activities.

GE PSLO 3.2 Present and interpret numeric information in graphic forms.

The sample size for the spring 2012 final examination was 75 students. Test three, which measured some of the specific student learning outcomes under CLO 2, had a sample size of 70. The course manifest listed 78 students enrolled at term end; three students had stopped attending class prior to test three.

Average performance on the comprehensive final examination across the terms is shown in the following bubble chart. The x-coordinate of the center of the circle is the date of the final examination. The y-coordinate of the center of the circle is the average performance on the final examination. The radius is the standard deviation on the final examination (not the standard error of the mean).

Green circles are closed book examinations. Blue circles are proctored open book examinations. The orange circle is a take-home examination necessitated by a loss of power in the computer laboratory.

Although performance is up term-on-term from fall 2011 to spring 2012, the average remains within the performance band in which the average randomly varies.

The following chart shows average performance on the final examination broken out by major.

Bear in mind that the underlying sample sizes are small and that these averages will vary randomly. Also bear in mind that smaller sample sizes tend to produce more extreme values: the mean is more variable in smaller samples.

Ultimately learning is not measured by averages but by the skills and abilities of each and every student. While an item analysis provides insight into what the students are mastering in aggregate, only a grid of individual students against items would provide information on what each individual student has mastered.

For the 78 students in MS 150 and the 54 questions on test three and the final, this would be a 4212 cell grid. The result would be very detailed, fine grained information on what each student can and cannot do. But like describing individual grains of sand on a beach, that level of detail would prevent one from understanding what the beach looks like.
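Such a grid, and the two views it supports, can be sketched with a small hypothetical example. A real grid for MS 150 would be 78 students by 54 items (4212 cells); the three-student, four-item grid below is illustrative only.

```python
# Hypothetical student-by-item grid: rows are students, columns are items,
# True means the student answered the item correctly.
grid = [
    [True,  True,  False, True ],   # student A
    [True,  False, False, True ],   # student B
    [False, True,  True,  True ],   # student C
]

# Per-student view: each student's proportion of items answered correctly,
# the fine grained information the grid would provide.
per_student = [sum(row) / len(row) for row in grid]

# Per-item view: fraction correct on each item across students,
# which is exactly the item analysis reported in aggregate.
per_item = [sum(col) / len(col) for col in zip(*grid)]

print(per_student)
print(per_item)
```

Reading the grid by rows gives the individual-mastery view; reading it by columns collapses back to the item analysis.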

The item analysis provides actionable information for the instructor.

Whether using tests for assessment purposes constitutes authentic assessment is a more complex issue. The data in test three and the final examination is authentic data. For example, the hypothesis test on the final derives from an actual experiment. That said, if someone dragged into the class a set of data substantively different in nature from the data the students have seen in the course and said, "Analyze this," the students would likely be stumped.

When given a specific task such as "calculate the mean," the students can do this. Asked to calculate the mean three different times on the final examination, 70, 71, and 74 students of 75 respectively answered the question correctly. No student missed all three questions, thus one could argue that 75 of 75 have the ability to calculate a mean.

When given simply raw data and asked to determine the most appropriate analysis, the students have more difficulty. The basis of this judgement is the mini-projects that the students complete during the term, in which they gather and analyze their own data. The projects indicate difficulty with understanding what the appropriate statistical procedure is for their data.

Providing more opportunities to work with data with less instructional scaffolding should be of benefit to the students. This would require, however, trimming the curriculum to provide space for data exploration and open ended analysis. Spending less time on the normal distribution material in chapter seven and on the normdist and norminv functions would provide time for unstructured data exploration and analysis while retaining the key statistical concepts.
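For reference, what the spreadsheet functions normdist and norminv compute can be approximated in standard-library Python. The function names here are my own; this is a sketch of the underlying mathematics, not the course's actual materials.

```python
from math import erf, sqrt

def norm_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative probability under the normal curve: the role
    normdist (in cumulative form) plays in the spreadsheet."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def norm_inv(p, mu=0.0, sigma=1.0):
    """Inverse of the cumulative probability: the role norminv plays.
    There is no closed form, so a bisection search suffices here."""
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    for _ in range(100):
        mid = (lo + hi) / 2
        if norm_cdf(mid, mu, sigma) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(norm_cdf(1.0))    # probability below one standard deviation
print(norm_inv(0.975))  # the familiar 95% two-tailed critical value
```

The pair illustrates why the material is conceptually compact: one function and its inverse carry most of chapter seven.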

Adjustments to the fall 2012 curriculum are being planned as a one term experiment with more data exploration activities.
