2 January 2024 by Richard

The Third Edition, the Dotted Half Note of Editions

old monastic music notation

Chant three times if you need help

Franco of Cologne was a 13th-century music theorist. He was also a head of the Catholic crusader order the Knights Hospitaller, the sister order of the better-known Knights Templar beloved of conspiracy theories. Franco himself harbored a conspiracy about threes.

Franco developed many of the conventions of modern music notation, including the terrible idea of adding a dot to a note to indicate that its duration is increased by 50%. If a half note is two beats, then a dotted half note is three beats. Franco called this “perfecting” the note, because holy things should come in threes. The English formerly called this perfectly holy dot the “prick of perfection,” because they are a silly people.

Editions and Renditions

I was thinking about all this because I am working on another edition of my subversive inferential methods book, Statistical Rethinking. This will be the third edition, the dotted half note of textbooks. So expectations are running high.

The first edition of the book focused on the core idea that statistical and machine learning tools are blind and reckless golems. Without some mature philosophy to guide us in building and applying them to well-defined questions, we should expect trouble. The second edition built on this by emphasizing formal and transparent ways to connect qualitative theories to quantitative procedures and evidence.

The prick of perfection for the third edition is a stronger emphasis on workflow. There has always been workflow in the book, gradually developed examples with explicit decisions and justifications. But there is so much more to workflow in reality. And workflow is possibly the least theorized and supported part of quantitative methods. It is essential, often invisible, and at the heart of contemporary controversies about the reliability and repeatability and transparency of scientific claims.

How to Draw an Owl

How to draw an owl meme

Some steps omitted

The goal is to take scientific data analysis at least as seriously as illustration. In the popular “how to draw an owl” meme (right), the joke is that how-to guides often leave out the many intermediate steps needed in real projects. These steps are not obvious and require their own technical study. Drawing a detailed owl needs scaffold lines, layering, blending, and erasing. Professional illustrators study and refine these techniques as part of their craft. It is not possible to reverse-engineer the process from the result.

Scientific data analysis is similar. It too contains many non-obvious steps, including intermediate analyses that are necessary scaffolds but are discarded. A successful analysis does not automatically document the steps required in its construction. Compared to illustration, there has not been much study and technical development of scientific workflow. Researchers receive little explicit instruction. We must aspire to do better.

And so we need texts that address workflow more explicitly. And this is a growing trend. To take just one example, Rohan Alexander’s recent and excellent Telling Stories with Data pays a lot of explicit attention to workflow. In my text, I want to address workflow in a way that explicitly connects theory, including qualitative reasoning, to research design, statistical analysis, and the interpretation of results. A lot of these ideas have arisen through conversations with, and reading the work of, my colleagues Aki Vehtari and Andrew Gelman. But don’t blame them.

Work Must Flow

A reliable workflow also requires quality assurance flows that overlay the central scientific flow, such as testing and diagnostics. And each component of the workflow may need its own internal workflow so it can be reliably constructed. For example, a complex multilevel model should not be built all at once. Rather, it should be logically broken down into layers which can be programmed and tested in turn. This both helps ensure the model functions as intended and often teaches us how the model works, because it allows us to compare the different sub-models.
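As a toy sketch of what building in layers can look like (my own illustration, not code from the book): start with complete pooling, add a no-pooling layer, then a crude partial-pooling layer, and compare the sub-models on the same synthetic data. The shrinkage formula here is a hypothetical empirical-Bayes stand-in for a full multilevel model.

```python
import random
import statistics

random.seed(1)

# Synthetic grouped data: 8 groups with true means drawn from N(0, 1),
# observations drawn around each group mean with noise sd = 2.
true_means = [random.gauss(0, 1) for _ in range(8)]
data = [[random.gauss(m, 2) for _ in range(5)] for m in true_means]

# Layer 1: complete pooling -- one grand mean for all groups.
grand_mean = statistics.mean(y for group in data for y in group)

# Layer 2: no pooling -- an independent mean per group.
group_means = [statistics.mean(group) for group in data]

# Layer 3: partial pooling -- shrink each group mean toward the grand
# mean, weighting by within-group vs between-group variance.
within_var = 2**2 / 5                 # sampling variance of a group mean
between_var = statistics.variance(group_means)
shrink = between_var / (between_var + within_var)
partial = [grand_mean + shrink * (m - grand_mean) for m in group_means]

# Comparing the layers shows what the model does: each partially
# pooled estimate sits between the two extremes.
for m, p in zip(group_means, partial):
    assert min(grand_mean, m) <= p <= max(grand_mean, m)
```

Because each layer is fit and checked on its own, a bug in the pooling logic shows up before the full model exists, and the comparison itself teaches what partial pooling buys you.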

This is a lot. But I don’t see a way for the components of research to have meaning without the holism. For example, here is a partial workflow diagram from one of my third edition draft chapters:

analysis workflow diagram

What could be easier? Don’t worry. In the text, I build this up in small pieces. In early chapters, the core relations among estimands (research questions), estimators (statistical models), and estimates are emphasized. Then quality assurance using synthetic data simulation and validation is added, at first only in simple examples. Then post-fit diagnostics like posterior predictive checks. Then prior predictive checks to help design estimators. Then more detailed design of estimands and their relationships to estimators. Then the use of estimates to construct question-appropriate summaries, such as post-stratifications and marginal effects. Then model comparison. And so on.
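The simulate-and-validate step can be sketched in a few lines (a minimal toy of my own, not from the book): pick an estimand, simulate data where its true value is known, run the estimator, and check that the estimates center on the truth across many replications.

```python
import random
import statistics

random.seed(2)

# Estimand: an average treatment effect, fixed at a known truth.
TRUE_EFFECT = 0.5

def simulate(n=50):
    """Synthetic data: control and treated outcomes with unit noise."""
    control = [random.gauss(0.0, 1.0) for _ in range(n)]
    treated = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(n)]
    return control, treated

def estimator(control, treated):
    """Estimator: difference in sample means."""
    return statistics.mean(treated) - statistics.mean(control)

# Validation: over many synthetic datasets, the estimator should
# recover the estimand on average.
estimates = [estimator(*simulate()) for _ in range(500)]
bias = statistics.mean(estimates) - TRUE_EFFECT
print(f"mean estimate = {statistics.mean(estimates):.3f}, bias = {bias:.3f}")
```

The same skeleton scales up: swap in a richer generative simulation and a real fitted model, and the check becomes a genuine test that the estimator can answer the question before it ever touches real data.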

In any single example, most of the diagram above can be hidden, so one can focus on the point of the example. That’s the goal anyway.

And on top of all of this, I want to write some support tools to track the workflow and prompt researchers to add and document things like validation and diagnostics, as well as explicit justifications for how the estimates relate to the estimands (research questions).

This means some software development. Mathematicians have some amazing software support for proof workflow. See this blog post by Terry Tao. These tools help to decompose, track, and document the requirements for a proof and the justifications for each step. Proofs, like data analysis projects, can be very big. Tracking this complexity benefits from software-supported workflow.

Lean4 proof tree

A Lean4 + Blueprint proof tree diagram

We could learn a lot from them, in general terms. Data analyses are never “proven”. But there are still targets and chains of intermediate steps with logical requirements for compatibility. There are still professional and ethical obligations to be transparent about justifications that link, for example, estimands to estimators. Probably half the published papers I read have no clear estimand and no workflow for justifying an estimator other than convention or rhetoric. This is unacceptable, in any context. But supporting researchers and prompting them for these justifications could help.

I want to build on existing tools while remaining interoperable with any statistical packages a researcher wants to use. I mean yeah, you should all be using bespoke Bayesian models. But I am a Bayesian of the people. I meet the sinner where he dwells.

Timeline

Every time I mention the 3rd edition, people throw money at me and ask when it will be published. The original idea was to have a draft ready for professional peer review in 2024. That would mean the book might go to press in 2025. Maybe that plan will still hold.

But I will not rush it. There is a lot to do, and I have a lot going on in my life besides. Administration. Commissions. Reading and doing research. Supporting my colleagues with their research. Teaching. Protecting my colleagues from structural and personal bullshit inside the academy.

And last but absolutely the first priority, raising a teenage boy, mostly by myself. I don’t often talk about my personal life online. But I am constantly scheduling everything around my son and for my son. That includes book projects. So I can’t really make any hard promises on timelines for big projects like this. And this is also why I keep turning down your talk/conference invitations. Sorry, not sorry.

Anyway please be patient. I serve the research revolution. I will put a dot on this half note.

Notes

See this nice, whingeing video about music notation for more about Franco and other music notation reformers.

Curious about workflow? Here is a BIG paper by Gelman et al. And here is an excellent talk by Aki Vehtari.