Passion-Driven Statistics: Another Good Week
Last week's project biographies are likely to be a surprise success. I still see some confusion when students distinguish categorical from quantitative variables. Python is rolling along smoothly.
Students are now seeing how their questions unfold, understand the value of crosstabs for relating two features of the world, and are excited about their research questions. That is a big step forward as we move to our last week of descriptive work, covering joint probabilities and numerical measures for quantitative variables (measures of location and spread). Next Wednesday, we have our second exam before spring break. After the break, we move to the second part of the class, where we look at the practical value of sampling distributions and students choose the best hypothesis test for their research question.
Most, but not all, students have selected two categorical variables.
There still seems to be some confusion between categorical and quantitative variables. I am still working out what that signals for how I teach the course next time.
A focus on basic joint probabilities, conditional probabilities, and independence has provided the groundwork for comparing two variables that will come after spring break. It also flowed naturally from the frequency and relative frequency tables we practiced earlier in the semester.
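To make the link between a crosstab and these probabilities concrete, here is a minimal sketch using only the Python standard library. The survey data, variable names, and helper functions are hypothetical illustrations, not the course's actual dataset or code.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical responses: (exercise habit, sleep quality) for eight students.
responses = [
    ("regular", "good"), ("regular", "good"), ("regular", "poor"),
    ("rare", "good"), ("rare", "poor"), ("rare", "poor"),
    ("regular", "good"), ("rare", "poor"),
]

n = len(responses)
joint_counts = Counter(responses)                   # crosstab cell counts
row_counts = Counter(row for row, _ in responses)   # totals for the first variable
col_counts = Counter(col for _, col in responses)   # totals for the second variable

def joint(row, col):
    """P(row and col): cell count divided by the grand total."""
    return Fraction(joint_counts[(row, col)], n)

def conditional(col, given_row):
    """P(col | row): cell count divided by the row total."""
    return Fraction(joint_counts[(given_row, col)], row_counts[given_row])

def marginal_col(col):
    """P(col): column total divided by the grand total."""
    return Fraction(col_counts[col], n)

def looks_independent(row, col):
    """Independence would require P(col | row) to equal P(col)."""
    return conditional(col, row) == marginal_col(col)

print(joint("regular", "good"))         # joint probability
print(conditional("good", "regular"))   # conditional probability
print(looks_independent("regular", "good"))
```

In real sample data the two sides of the independence check are rarely exactly equal, so the exact comparison here is only a way to show the definition, not a test of independence.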
Python is going quite well for most of the students now. Students feel comfortable with Python frequency tables and bar charts for one variable. They could compare their Python-calculated values with those listed in the codebook and in their project biographies. Basic crosstabs in Python were a success.
From a 30,000-foot view, students are now seeing the pros and cons of MS Excel and Python and the criteria they would use to choose one or the other. They also see how Excel's built-in pivot tables save time when creating crosstabulations, especially when grouping a quantitative variable into categories with the pivot table's grouping feature. That’s a pretty neat feature that I wasn’t as familiar with before this class.
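For comparison, grouping a quantitative variable into categories can also be sketched in Python. The example below uses only the standard library; the ages, cut points, and labels are hypothetical, and students working in pandas would more likely reach for `pandas.cut` instead.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical ages and cut points; each group includes its lower edge.
ages = [18, 22, 25, 31, 38, 40, 47, 52, 59, 63]
edges = [30, 45, 60]
labels = ["under 30", "30-44", "45-59", "60 and over"]

def to_group(value):
    """Return the label of the group that contains value."""
    return labels[bisect_right(edges, value)]

age_groups = [to_group(age) for age in ages]
group_counts = Counter(age_groups)   # frequency table for the grouped variable

for label in labels:
    print(label, group_counts[label])
```

Once the grouped labels exist, they can be treated like any other categorical variable in a frequency table or crosstab.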
For quantitative variables, I am having students summarize the data with the “Five-Number Summary” (Camm et al., 2024). I chose not to present the box plot and to use just the numbers, which was a good decision. The five numbers are:
Smallest Value
First Quartile (Q1)
Median (Q2)
Third Quartile (Q3)
Largest Value
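As a minimal sketch of how these five numbers could be computed in Python, the example below uses the standard library's `statistics.quantiles`. The scores are hypothetical, and the choice of the "exclusive" quartile method is an assumption; other textbooks and tools use slightly different quartile conventions and may report slightly different Q1 and Q3 values.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return the smallest value, Q1, median, Q3, and largest value."""
    data = sorted(values)
    # Assumption: "exclusive" quartile method; other conventions exist.
    q1, median, q3 = quantiles(data, n=4, method="exclusive")
    return {
        "smallest": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "largest": data[-1],
    }

# Hypothetical exam scores for nine students.
scores = [55, 62, 68, 70, 74, 79, 81, 88, 93]
summary = five_number_summary(scores)

for name, value in summary.items():
    print(name, value)
```

The distance between Q1 and Q3 gives the interquartile range, one of the measures of spread, without needing a box plot.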
Next week: We will wrap up the initial Python data work and give instructions for the first two written elements of the PDS poster: The Introduction and The Research Question.
References
Camm, J. D., Cochran, J. J., Ohlmann, J. W., Fry, M. J., Anderson, D. R., Sweeney, D. J., & Williams, T. A. (2024). Essentials of modern business statistics with Microsoft Excel (9th ed.). Cengage.

