Open data exploration

When given data and asked to make a specific statistical calculation, the students in MS 150 Statistics answer correctly an average of 80% of the time, based on the final examination. Three-quarters of the students exceed a 70% correct answer rate. For simpler statistics such as calculating a median, mode, or mean, the success rate climbs to 94%. Calculating a 95% confidence interval for the mean has a lower success rate of 56%. In general, the students can make requested statistical calculations.
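As an illustration of the kind of specific calculation referred to above, the following is a minimal sketch in Python. The sample values are invented for illustration and are not course data, and the t critical value is an assumption read from a standard t-table for nine degrees of freedom.

```python
import math
import statistics

# Hypothetical sample, invented for illustration only (not course data).
sample = [2, 4, 4, 5, 7, 8, 9, 10, 11, 12]

n = len(sample)
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)
sx = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Two-tailed t critical value for 95% confidence with n - 1 = 9 degrees of
# freedom, taken from a standard t-table (assumed here, not computed).
T_CRITICAL_DF9 = 2.262

standard_error = sx / math.sqrt(n)
margin_of_error = T_CRITICAL_DF9 * standard_error
ci_low = mean - margin_of_error
ci_high = mean + margin_of_error

print(f"mean={mean}, median={median}, mode={mode}")
print(f"95% CI for the mean: {ci_low:.2f} to {ci_high:.2f}")
```

The hard-coded critical value only applies to a sample of ten; a different sample size would need a different value from the table.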

When presented with data and asked questions about the data, the students are far less successful. When not specifically told what to calculate, the students flounder and flail. Only 9 of the 78 students (12%) generate and cite the appropriate statistical analysis to support their answers. Another 13% generate statistics that provide some relevant support for their answer. The remaining 75% generally cite irrelevant statistics or use an analysis wholly inappropriate to the data.

This schism between the high performance on specific operations and the low performance on less structured, more open-ended questions is seen in the following chart. The blue bubbles track the final average since 2005, including the current fall 2013 term. The bubble chart also includes a yellow bubble for the performance on specific operations and an orange bubble for the open-ended data analysis questions for fall 2013.

Note that on the open-ended data analysis questions a rubric was used to mark the solutions provided by the students. Making any sort of stab at calculating basic statistics yielded a score of 50%, even if the statistics did not help formulate an answer to the questions asked.

0   Blank.
10  A few random statistics or an incorrect analysis, no basis for decision, none made.
    OR an answer of "yes" with no statistical support for the position, no numeric analysis.
12  A few basic statistics but no basis for a decision and none made.
13  A few basic statistics but no basis for a decision and wrong answer given.
    OR an answer of "yes" given based on random, essentially irrelevant statistics.
    OR some basis for a decision but incorrect answer arrived at.
14  Some basis for a decision and an answer of "yes", but not fully complete analysis.

A box plot of percentage scores shows the same schism.

Note that the bottom of the lower whisker and the first quartile for the open data exploration are both 0.50, fifty percent, hence the absence of a whisker. There are low outliers at zero: students who chose not to even attempt to answer.
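The missing whisker can be sketched in Python. The scores below are invented for illustration and are not the class data. The sketch also assumes two conventions that the plotting software may or may not share: inclusive quartiles, and the common rule that a point more than 1.5 interquartile ranges below the first quartile is an outlier.

```python
import statistics

# Hypothetical percentage scores, invented for illustration (not class data).
scores = [0.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
          0.6, 0.6, 0.6, 0.65, 0.65, 0.65, 0.7, 0.7]

# Quartiles using the inclusive method (an assumption; software differs).
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# Assumed outlier rule: anything below Q1 - 1.5 * IQR is a low outlier.
lower_fence = q1 - 1.5 * iqr
low_outliers = [x for x in scores if x < lower_fence]

# The lower whisker ends at the smallest score that is not an outlier.
lower_whisker_end = min(x for x in scores if x >= lower_fence)
whisker_length = q1 - lower_whisker_end

print(f"Q1={q1}, lower whisker ends at {lower_whisker_end}")
print(f"low outliers: {low_outliers}, whisker length: {whisker_length}")
```

When the smallest non-outlier score equals the first quartile, the whisker has zero length, so only the box and the outlier markers are drawn.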

This gap between the ability to perform a specific calculation and the ability to apply that knowledge in a more authentic situation first became apparent to me about four years ago. At that time I was integrating term-long data projects into the course as a way of getting the students to write, collect data, and analyze it. The students tended to gather trivial data sets. The nature of the course, the enrollment, and the students in the course did not lend themselves to the more intense guidance that might have generated more meaningful data sets. Ultimately the projects approach was discarded.

A year and a half ago the course curriculum was altered to provide more time for open data exploration and analysis. In the final two to three weeks the students are given data and questions about the data. The students, often working in pairs or trios, then attempt to answer the questions with support from statistical analysis.

Although the intent is to provide a minimum of structure in this part of the course, textual material is gradually being developed in support of open data exploration.

The time constraints of a final examination are not optimal for testing the students' ability to explore data, but the open-ended data analysis is a far more authentic form of analysis than fill-in-the-blank questions.
