Passion Driven Statistics: Data and Code Books
Last Friday, we completed the codebook activity. It starts with a simple idea. Here is what I learned.
This past week, I was fascinated by the beauty of the keep-it-simple principle. It's easy to forget. The PDS program starts with students reviewing a PDF codebook before beginning the project. Here are the steps:
Review the AddHealth codebook topic descriptions to help students pick two topics they are curious about.
Move from the topic descriptions to actual survey questions asked about the topic.
Choose only two variables, one from each topic, and brainstorm questions about how the variables might be related.
That's it. It sounds easy, but it isn't for introductory statistics students. Here are the things I learned.
Several students stopped at the topic descriptions instead of moving on to the survey questions to develop their research questions. Many found that their interpretations of the descriptions differed significantly from what the survey questions actually asked.
We, as teachers, use the term variable all the time, but many students don't understand what the word means. It seems best to describe variables simply as facts and figures illustrating their world. I had never heard variables described this way, but my colleague, Dr. Cora Wigger, recently pointed me to a book that goes so far as to replace the term variables with "features of the world" (Bueno de Mesquita & Fowler, 2021). In sum, the description seems to stick.
When students think of a research question, they want to collect many features of the world immediately. Focusing on just two for the entire semester is a promising way to keep students from going down rabbit holes and to keep them on their specific question.
I am cautiously optimistic that the Python implementation will work. We started very small: downloading the Anaconda Distribution, showing students how to use the Spyder IDE, and writing code to load the pandas library and import the data set. It only requires two lines of code, but it is very satisfying for many students, especially those who don't believe they can code.
import pandas as pd
df = pd.read_csv('file_path.csv')
Note that I didn't have them pull their two variables just yet, as students were having trouble understanding the difference between the codebook's description of a variable and its short abbreviated name. We are working on that with a basic print command and a frequency table this Friday.
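For readers curious what that next step might look like, here is a minimal sketch of pulling two variables and printing a frequency table for each. It is not the activity we will use in class. The column names (VAR_A, VAR_B), the descriptions, and the tiny data set are all made up for illustration, and the data is read from a string so the example runs without the course file.

```python
import io
import pandas as pd

# Made-up stand-in for the course data file (hypothetical names and values).
csv_text = """ID,VAR_A,VAR_B
1,1,0
2,3,1
3,1,1
4,2,0
5,1,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Hypothetical codebook lookup: short variable name -> longer description.
codebook = {
    "VAR_A": "Illustrative description of the first survey question",
    "VAR_B": "Illustrative description of the second survey question",
}

# Keep only the two variables of interest.
my_vars = ["VAR_A", "VAR_B"]
subset = df[my_vars]

# A basic print shows the first few rows of the two columns.
print(subset.head())

# Frequency table: how many times each response value appears.
for name in my_vars:
    print(name, "-", codebook[name])
    print(subset[name].value_counts(dropna=False).sort_index())
```

Printing the description next to the short name is one way to help students connect the codebook's wording to the abbreviation they see in the data.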
Next week, I'll share and reflect on what I learned.
References:
Bueno de Mesquita, E., & Fowler, A. (2021). Thinking clearly with data: A guide to quantitative reasoning and analysis. Princeton University Press.