Data exploration direct to slides via ChatGPT
Data explorations form the core of assessment in MS 150 Statistics. Students are given data and questions to be answered about the data. The way students choose to answer the questions, including the statistical approach they take, provides insight into what they have learned. One such data exploration is determining which variable an exercise app uses to calculate the calories burned during a run. The following is the setup for this data exploration.
Along with MS 150 Statistics, I am also the instructor for ESS 101w Walking for Fitness online. In that course the students and I use the Strava app to track distance, time, pace, and calories.
The distance is measured using GPS satellite signals, the time is measured by the clock, and the pace is derived from the satellite data and the clock. Calories burned cannot be directly measured by a cell phone app, so the calories are a calculated quantity. Strava uses only one of the measurements to calculate the calories: distance, time, or pace.
Your task is to find which measurement is being used to calculate the calories and then report your findings in a presentation. Once again you will be submitting a presentation. The presentation should include the relevant statistics for your findings, along with charts as appropriate to illustrate them.
The data is in three tables in the spreadsheet at:
Each table has the measurement and the calories. The data is from 16 Strava activities, both walks and runs. The data is paired data. The measurement table with the strongest relationship is the measurement being used to calculate the calories.
This assignment models studies which seek to identify which factors contribute most to another factor or variable.
Guiding question: Which measurement (distance, time, or pace) is being used to calculate the calories?
The three tables are distance versus calories, time versus calories, and pace versus calories. MS 150 Statistics is an introduction to basic statistics. At this point, the only relevant statistical tools the students have learned are making a scattergraph and using the slope, intercept, and correlation functions.
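For readers who want to see what those three functions compute, here is a minimal Python sketch that mirrors the SLOPE, INTERCEPT, and CORREL functions found in common spreadsheets, using the ordinary least-squares and Pearson formulas for paired data. The function name and the four example pairs are hypothetical, made up for illustration; they are not the Strava data from the assignment.

```python
from math import sqrt

def slope_intercept_correlation(xs, ys):
    """Least-squares slope and intercept, plus Pearson r, for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # like SLOPE(ys, xs)
    intercept = mean_y - slope * mean_x  # like INTERCEPT(ys, xs)
    r = sxy / sqrt(sxx * syy)            # like CORREL(xs, ys)
    return slope, intercept, r

# Hypothetical paired data: distance (km) and calories, where calories = 60 * distance + 10
distance = [2, 3, 5, 4]
calories = [130, 190, 310, 250]
print(slope_intercept_correlation(distance, calories))  # prints (60.0, 10.0, 1.0)
```

The correlation r is the number that answers the guiding question: the closer it is to 1 or -1, the stronger the linear relationship between that measurement and the calories.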
The lines "The data is paired data. The measurement table with the strongest relationship is the measurement being used to calculate the calories." were added because students were stumped as to where even to begin. Students tended to calculate the mean or median for each column, hoping that this might lead to some sort of insight. That is the level at which many students are operating in the course.
Even with those two lines, many students struggle with this exercise. If asked to calculate the slope, intercept, and correlation for paired data, the students can successfully do this. When the prompt does not include the name of a statistical function, the students do not know what to do.
In a previous data exploration exercise I noticed that one student had turned in an analysis that included the results of concepts not covered in the course. I asked the student how they had accomplished their data exploration and they said that they had used a large language model.
In my course at present, the use of large language model systems is not encouraged, but it is also not strictly forbidden. As a student decades ago, at the dawn of the portable electronic calculator age, I had teachers who banned the use of calculators. Back then, Texas Instruments calculators came with a black case that hung from one's belt. As one teacher once said to me, "You will not always have a calculator on your hip." I wish they were still alive and could see the Pixel 7 Pro that is all but welded to my hip.
I was puzzled as to how they were successfully using a large language model, because prior attempts by other students had failed to produce statistically meaningful results. The student explained that they had first copied the assignment directions into the large language model, then the data. The student then offered to show me what they had done, this time using the current assignment.
The student copied and pasted the directions into the large language model input. Then they went to copy and paste the whole spreadsheet into the large language model.
"That won't work," I said. "The model will not know that those are three different tables to be separately considered. You will have to copy and paste those one at a time." Then I paused. "Never mind, go ahead and do what you were going to do." The student copied all three tables in a single copy and paste.
The large language model then returned three separate scattergraph diagrams.
Photograph of the student's screen
The large language model had "correctly discerned" that the blank columns separated three different sets of data. The model had then gone straight to the only statistic the students had learned that had a bearing on answering the guiding question: the correlation. The highest correlation would be for the variable most likely to be driving the calorie calculation. And the correlations that the large language model produced were correct.
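One simple way software could make that split is sketched below in Python, assuming the pasted spreadsheet arrives as tab-separated text and that a column which is empty in every row marks the boundary between tables. This is only an illustration of the idea; how the large language model actually interpreted the paste is not known. The function name and the sample text are hypothetical.

```python
def split_on_blank_columns(rows):
    """Split a grid of cells into side-by-side tables.

    A column that is empty in every row is treated as a separator.
    Returns a list of tables, each a list of rows (lists of strings).
    """
    if not rows:
        return []
    width = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells
    grid = [row + [""] * (width - len(row)) for row in rows]
    is_blank = [all(row[c].strip() == "" for row in grid) for c in range(width)]

    # Group runs of consecutive non-blank columns; each run is one table
    groups, current = [], []
    for c in range(width):
        if is_blank[c]:
            if current:
                groups.append(current)
                current = []
        else:
            current.append(c)
    if current:
        groups.append(current)

    tables = []
    for cols in groups:
        table = [[row[c] for c in cols] for row in grid]
        # Drop rows that are empty within this table (e.g. when one table is shorter)
        tables.append([r for r in table if any(cell.strip() for cell in r)])
    return tables


# Hypothetical paste: two small tables separated by one empty column
pasted = "Distance\tCalories\t\tTime\tCalories\n2\t130\t\t30\t130\n3\t190\t\t33\t190"
rows = [line.split("\t") for line in pasted.splitlines()]
for table in split_on_blank_columns(rows):
    print(table)
```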
The key findings matched the best solution the students could have obtained with what they knew. The distance had "the strongest positive correlation with calories burned suggesting it is the most likely factor used by Strava for calorie calculation." The model also ruled out the other two variables.
The "Next Steps" item is what caught me most off guard. The large language model had "understood" that the assignment required submission as a presentation. The student read through the output and, upon reaching the Next Steps, turned to me and asked, "What should I do?"
"Say yes," I responded, feeling bewildered and incredulous at the same time.
The large language model then produced a Microsoft PowerPoint presentation with an introduction, a slide with the three graphs, a slide with the statistical analysis, and a concluding slide.
The student looked at the slides and then, turning back to me, asked, "Can I submit this?" I was still in shock, because I realized I had never imagined that a whole presentation could be produced by a large language model. Stunned, I could only say out loud what I was thinking: "Yes, that's the solution, and it is correctly done."
While these large language models still produce "hallucinations" and make errors in mathematics, those problems will be resolved by coupling reasoning engine systems to large language models. The mathematical mistakes and logic errors are but teething issues for an infant technology. During the lifetime of my students, these issues will be resolved.
The baby is watching YouTube while using her right thumb to scroll down through other videos to see what she will watch next. Her brain is setting itself up to pay attention to two separate visual inputs. Her brain will be different from those of prior generations.
Banning the use of these technologies will be just as effective as earlier attempts to ban prior technologies. Eventually a new generation is born into a world with the technology, and that generation embraces and uses it. Every new technology is cited as being a danger to society. Television was, in my own youth. And maybe television was a threat, at least to the society in which that technology arrived. One could argue that the world of the 1950s was doomed by the arrival of television. Television certainly played a role in the differing perceptions of, and reactions to, the Viet Nam and Korean conflicts.
Every generation sees change as leading to social collapse or apocalypse. And each new generation takes the change to be the new normal and carries on. As an instructor, my task is to prepare students for the world ahead of them, not the world of my own youth. And that world will use large language models, reasoning engines, autonomous robots, and thinking machines. I am also aware that this world might not include teachers. The Alpha School in Austin, Texas, is termed a "radical approach," which is always a clear sign of an impending future reality, a future norm that only seems radical to the current generation. But to think that I am among the last generation of human teachers would be apocalyptic collapsist thinking, the real weakness of an aging generation.