14 July 2018 by Richard
Statistical Rethinking, Edition 2: ETA March 2020
[updated 18 Dec 2019 — see second edition table of contents at bottom]
It came as a complete surprise to me that I wrote a statistics book. Really, I am an anthropologist. I study human evolution. Statistics is for me only a necessary activity, required for making inferences from data. In my list of professional identities, statistician falls somewhere below asker-of-unfair-questions and somewhere above hobbyist-pizza-chef.
It is even more surprising how popular the book has become. But I had set out to write the statistics book that I wish I could have had in graduate school. No one should have to learn this stuff the way I did. I am glad there is an audience to benefit from the book.
It took five years to write the book. There was an initial set of course notes, melted down and hammered into a first 200-page manuscript. I discarded that first manuscript, but it taught me the outline of the book I really wanted to write. Then several years of teaching with the manuscript refined it further.
Really I could have continued refining it every year. Going to press carries the penalty of freezing a dynamic process: learning how to teach the material while keeping up with changes in the material itself. As time goes on, I see more elements of the book that I wish I had done differently. I have also received a lot of feedback on the book, and that feedback has given me ideas for improving it.
Ch-ch-ch-ch-changes
So now I have almost finished a second edition. The goal with a second edition is only to refine the strategy that made the first edition a success. I revised the text and code and taught with it in Winter 2019. Now I’ve taken student and colleague feedback, revised more, and the book is in production for a target March 2020 publication.
The soul of the book is the same. But there is a lot of new material as well. Here is an outline of the changes.
Look out you rock ‘n rollers
The R package has some new tools. The map tool from the first edition is still here, but now it is named quap. The renaming is just to avoid misunderstanding: we only ever used it to get a quadratic approximation to the posterior, so now it is named as such. A bigger change is that map2stan has been replaced by ulam. The new ulam is very similar to map2stan, and in many cases can be used identically. But it is also much more flexible, mainly because it makes no assumptions about GLM structure and allows explicit variable types within the formula list. All the map2stan code is still in the package and will continue to work. But ulam allows for much more, especially in later chapters. Both of these tools allow sampling from the prior distribution, using extract.prior, as well as from the posterior. This helps with the next change.
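Before that, here is a minimal sketch of how these tools fit together. The model, priors, and data are hypothetical illustrations, not examples from the book:

    library(rethinking)

    # simulated data, just to make the sketch runnable
    d <- data.frame(x = rnorm(50))
    d$y <- rnorm(50, 1 + 2 * d$x, 1)

    # one explicit formula list, usable by both tools
    f <- alist(
        y ~ dnorm(mu, sigma),
        mu <- a + b * x,
        a ~ dnorm(0, 10),
        b ~ dnorm(0, 1),
        sigma ~ dexp(1)
    )

    m_quap <- quap(f, data = d)              # quadratic approximation (the old map)
    m_ulam <- ulam(f, data = d, chains = 4)  # Hamiltonian Monte Carlo via Stan

    prior <- extract.prior(m_quap, n = 1e4)  # samples from the prior
    post  <- extract.samples(m_quap)         # samples from the posterior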
Much more prior predictive simulation. A prior predictive simulation means simulating predictions from a model using only the prior distribution, instead of the posterior distribution. This is very useful for understanding the implications of a prior. There was only a vestigial amount of this in the first edition. Now most modeling examples include some prior predictive simulation. I think this is the most useful addition to the second edition, since it helps so much with understanding not only the priors but also the model itself.
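The idea in miniature (the priors here are hypothetical, chosen only to illustrate): draw parameter values from the prior alone and plot the regression lines they imply. Wild lines are a sign the prior is scientifically absurd.

    # 50 regression lines implied by the priors alone
    N <- 50
    a <- rnorm(N, 0, 10)   # prior draws for the intercept
    b <- rnorm(N, 0, 1)    # prior draws for the slope
    plot(NULL, xlim = c(-2, 2), ylim = c(-30, 30), xlab = "x", ylab = "predicted y")
    for (i in 1:N) abline(a[i], b[i], col = adjustcolor("black", 0.3))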
More emphasis on the distinction between prediction and inference. Chapter 5, the chapter on multiple regression, has been split into two chapters. The first focuses on the ways regression can help; the second on the ways it can mislead. This also allows a more direct discussion of causal inference, which means that DAGs (directed acyclic graphs) make an appearance. The chapter on overfitting, now Chapter 7, is also more direct in cautioning about the predictive nature of information criteria and cross-validation. Cross-validation, and importance sampling approximations of it, are now discussed explicitly.
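To give a taste of the DAG material, here is a minimal sketch using the dagitty R package (my choice for this sketch, not named in this post; the example itself is hypothetical):

    library(dagitty)

    # a classic confound: Z influences both the exposure X and the outcome Y
    dag <- dagitty("dag{ X -> Y ; Z -> X ; Z -> Y }")

    # which variables must be conditioned on to estimate the effect of X on Y?
    adjustmentSets(dag, exposure = "X", outcome = "Y")
    # { Z }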
New model types. Chapter 4 now presents simple splines. Chapter 7 introduces one kind of robust regression. Chapter 12 explains how to use ordered categorical predictor variables. Chapter 13 presents a very simple type of social network model, the social relations model. Chapter 14 has an example of a phylogenetic regression, with a somewhat critical and heterodox presentation. And there is an entirely new chapter, Chapter 16, that focuses on models that are not easily conceived of as GLMMs, including ordinary differential equation models.
Some new data examples. These include the Japanese cherry blossoms time series on the cover and a larger primate evolution data set, with 300 species and a matching phylogeny.
More presentation of raw Stan models. There are many more places now where raw Stan model code is explained. I hope this makes the transition to working directly in Stan easier. But most of the time, working directly in Stan is still optional.
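Continuing the sketch from above, one easy on-ramp is that an ulam fit can print the raw Stan code it generated:

    stancode(m_ulam)   # print the Stan model code behind the ulam fit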
Much more material on the details of Hamiltonian Monte Carlo. There is detailed material on divergent transitions now, and complete raw R code for implementing a simple HMC simulation.
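In the same spirit, here is a minimal sketch of HMC with a leapfrog integrator, for a standard normal target. It is only an illustration of the idea, not the book's code:

    # negative log density of the target (standard normal) and its gradient
    U      <- function(q) 0.5 * q^2
    grad_U <- function(q) q

    hmc_step <- function(q, eps = 0.1, L = 20) {
        p <- rnorm(1)                        # resample momentum
        q_new <- q
        p_new <- p - eps/2 * grad_U(q_new)   # first half step for momentum
        for (i in 1:L) {                     # leapfrog integration
            q_new <- q_new + eps * p_new
            if (i < L) p_new <- p_new - eps * grad_U(q_new)
        }
        p_new <- p_new - eps/2 * grad_U(q_new)   # final half step
        # accept or reject based on the change in total energy
        H_old <- U(q) + 0.5 * p^2
        H_new <- U(q_new) + 0.5 * p_new^2
        if (runif(1) < exp(H_old - H_new)) q_new else q
    }

    samples <- numeric(2000)
    q <- 0
    for (s in 1:2000) { q <- hmc_step(q); samples[s] <- q }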
Pretty soon now you’re gonna get older
Not everything has changed. Mostly it is the same book, with the same kind style.
As in the first edition, I have tried to make the material as kind as possible. None of this stuff is easy, and the journey into understanding is long and haunted. It is important that readers expect confusion, because confusion is normal. This is also why I have not changed the basic modeling strategy of the book.
First, I force the reader to explicitly specify every assumption of the model. Some readers of the first edition lobbied me to use simplified formula tools like brms or rstanarm. Those are fantastic packages, and graduating to them after this book is recommended. But I don’t see how a person can come to understand the model when using those tools. The hidden priors aren’t even the most limiting part. Rather, since linear model formulas like y ~ (1|x) + z don’t show the parameters, nor even all of the terms, it is not easy to see how the mathematical model relates to the code. It is ultimately kinder to be a bit cruel and require more work. So the formula lists remain. In this book, you are programming the log-posterior, down to the exact relationship between each variable and coefficient. You’ll thank me later.
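To make the contrast concrete, here is a hypothetical sketch (the data list dat and all names are invented) of the varying-intercept model that y ~ (1|x) + z hides, written out as an ulam formula list:

    m <- ulam(
        alist(
            y ~ dnorm(mu, sigma),
            mu <- a[x] + b * z,        # varying intercept for each group x
            a[x] ~ dnorm(abar, tau),   # adaptive prior on the intercepts
            abar ~ dnorm(0, 1),
            b ~ dnorm(0, 1),
            tau ~ dexp(1),
            sigma ~ dexp(1)
        ),
        data = dat, chains = 4
    )

Every parameter, and the prior it gets, is right there in the code.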
Second, half the book goes by before MCMC appears. Some readers of the first edition wanted me to start instead with MCMC. I do not do this, because Bayesian inference is not about MCMC. We seek the posterior distribution, but there are many legitimate approximations of it. MCMC is just one set of strategies. Using quadratic approximation in the first half also allows a clearer tie to non-Bayesian algorithms. And since finding the quadratic approximation is fast, readers don’t have to struggle with too many things at once.
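For instance (a hypothetical sketch with simulated data, not from the book), the quadratic approximation typically lands right on top of the least-squares estimates, which makes the bridge to non-Bayesian training obvious:

    library(rethinking)

    d <- data.frame(x = rnorm(100))
    d$y <- rnorm(100, 1 + 2 * d$x, 1)

    m <- quap(
        alist(
            y ~ dnorm(mu, sigma),
            mu <- a + b * x,
            a ~ dnorm(0, 10),
            b ~ dnorm(0, 10),
            sigma ~ dexp(1)
        ),
        data = d
    )

    precis(m)                  # posterior means and quadratic intervals
    coef(lm(y ~ x, data = d))  # nearly identical point estimates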
Turn and face the links
The publisher has a page up for the second edition. The R package will remain on GitHub. The current Experimental branch will become the master branch when the book appears in print.
Table of Contents for Second Edition
Preface to the Second Edition
Preface
Audience
Teaching strategy
How to use this book
Installing the rethinking R package
Acknowledgments
Chapter 1. The Golem of Prague
Statistical golems
Statistical rethinking
Tools for golem engineering
Summary
Chapter 2. Small Worlds and Large Worlds
The garden of forking data
Building a model
Components of the model
Making the model go
Summary
Practice
Chapter 3. Sampling the Imaginary
Sampling from a grid-approximate posterior
Sampling to summarize
Sampling to simulate prediction
Summary
Practice
Chapter 4. Geocentric Models
Why normal distributions are normal
A language for describing models
Gaussian model of height
Linear prediction
Curves from lines
Summary
Practice
Chapter 5. The Many Variables & The Spurious Waffles
Spurious association
Masked relationship
Categorical variables
Summary
Practice
Chapter 6. The Haunted DAG & The Causal Terror
Multicollinearity
Post-treatment bias
Collider bias
Confronting confounding
Summary
Practice
Chapter 7. Ulysses’ Compass
The problem with parameters
Entropy and accuracy
Golem taming: regularization
Predicting predictive accuracy
Model comparison
Summary
Practice
Chapter 8. Conditional Manatees
Building an interaction
Symmetry of interactions
Continuous interactions
Summary
Practice
Chapter 9. Markov Chain Monte Carlo
Good King Markov and his island kingdom
Metropolis algorithms
Hamiltonian Monte Carlo
Easy HMC: ulam
Care and feeding of your Markov chain
Summary
Practice
Chapter 10. Big Entropy and the Generalized Linear Model
Maximum entropy
Generalized linear models
Maximum entropy priors
Summary
Chapter 11. God Spiked the Integers
Binomial regression
Poisson regression
Multinomial and categorical models
Summary
Practice
Chapter 12. Monsters and Mixtures
Over-dispersed counts
Zero-inflated outcomes
Ordered categorical outcomes
Ordered categorical predictors
Summary
Practice
Chapter 13. Models With Memory
Example: Multilevel tadpoles
Varying effects and the underfitting/overfitting trade-off
More than one type of cluster
Divergent transitions and non-centered priors
Multilevel posterior predictions
Summary
Practice
Chapter 14. Adventures in Covariance
Varying slopes by construction
Advanced varying slopes
Instruments and causal designs
Social relations as correlated varying effects
Continuous categories and the Gaussian process
Summary
Practice
Chapter 15. Missing Data and Other Opportunities
Measurement error
Missing data
Categorical errors and discrete absences
Summary
Practice
Chapter 16. Generalized Linear Madness
Geometric people
Hidden minds and observed behavior
Ordinary differential nut cracking
Population dynamics
Summary
Practice
Chapter 17. Horoscopes
Endnotes