Never Too Early To Talk About Missing Data

This story starts on Twitter, but it gets better.

Yesterday, a statistician tweeted the screenshot below to other statisticians. In the statistics community, replacing missing values with zero (or any other constant) is universally seen as a bad idea. There are controversies in statistics. But this isn’t one of them. Hundreds of statisticians chuckled in recognition and scrolled on.

But of course, outside the statistics community, in the wilds of the sciences, few students and researchers have had any direct training in how to handle missing data. They pick up strategies from colleagues, published papers, and their own fertile imaginations. It is hard to blame them, since very few introductory statistics courses mention missing data. And, to be honest, few researchers ever get more than introductory statistics.

So what we have is a situation in which professional statisticians have a strong consensus opinion about a very common problem in data analysis (don’t replace missing data with zeros) but are failing to teach it to their primary audience. And then researchers do bad things with missing data, often in prominent publications. It’s hard to place all the blame on them. This is not an ideal situation.
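To make the zero-filling problem concrete, here is a minimal simulation sketch using only the Python standard library. The setup is an assumption for illustration, not taken from any real analysis: 1,000 values drawn from a normal distribution with mean 10 and standard deviation 2, with 30% of them deleted completely at random. The helper names `simulate`, `zero_fill`, and `complete_cases` are hypothetical.

```python
import random
import statistics


def simulate(n=1000, missing_rate=0.3, seed=1):
    """Hypothetical setup: draw true values, then delete some completely at random."""
    rng = random.Random(seed)
    true_values = [rng.gauss(10.0, 2.0) for _ in range(n)]
    # Each value is independently replaced by None with probability missing_rate.
    observed = [None if rng.random() < missing_rate else x for x in true_values]
    return true_values, observed


def zero_fill(values):
    """The bad idea: replace every missing value with the constant 0."""
    return [0.0 if v is None else v for v in values]


def complete_cases(values):
    """Drop missing values entirely (only safe under completely random missingness)."""
    return [v for v in values if v is not None]


true_values, observed = simulate()
print(f"true mean:          {statistics.mean(true_values):.2f}")
print(f"complete-case mean: {statistics.mean(complete_cases(observed)):.2f}")
print(f"zero-filled mean:   {statistics.mean(zero_fill(observed)):.2f}")
print(f"true sd:            {statistics.stdev(true_values):.2f}")
print(f"zero-filled sd:     {statistics.stdev(zero_fill(observed)):.2f}")
```

Under these assumptions the zero-filled mean should land near 10 × 0.7 = 7 instead of 10, and the zero-filled standard deviation should be more than twice the true one, because every imputed zero sits far from the real values. Dropping the incomplete cases happens to work in this sketch only because the values go missing completely at random; when missingness depends on the values themselves, that shortcut is biased too, which is why the proper treatment needs more than a constant.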

My own statistics book has a large section on missing data. But it comes very late in the book, as if it were some advanced topic that the reader can neglect. I now think that was a mistake. When I do my own research or collaborate, missing values are nearly always present. Why haven’t I adjusted my teaching to make this topic more prominent?

Thank you for listening to my confession.

One last thing to note. On Twitter, context breaks very easily. It is possible for a statistician to share something (like the AskStatistics screenshot above) with other statisticians without malice. There is shared background in the community, a background that silently tells us that while we sometimes make little jokes at the expense of students and researchers, we remain committed to helping. It's a bit like nurses complaining about patients.

But context bleeds on social media. People outside the community can and likely will see it. They won't share the silent background. It will look much meaner and more discouraging than intended.

I myself have a “no dunking on students” rule for social media. It's not that students are never deserving of being dunked on! Students can be frustrating, because they are people. People are frustrating and sometimes deserve to be dunked on. I often deserve a good dunk as well.

But on social media, context bleeds. I worry about that a lot, so I often self-censor and give critical feedback only in private. That is also not ideal, because it's inefficient: many other people would benefit from seeing public criticism. But for now, this is the compromise I work with.