Journal Clubbing: Accurate Age Estimation

Abstract: The road to accurate estimation of age is paved with a biological model.

Older than he looks

My institute studies human biology and evolution, and my department studies behavior and cognition in human societies. The goal is to understand processes of human adaptation. It’s all presented here [PDF] with extensive grandeur.

The sort of data we deal with is typically gathered in communities without much by way of written records. So determining a person’s age is not a simple matter of asking them their age or looking up their birth certificate. If you haven’t worked or lived in such communities yourself, it may be hard to imagine that a person would not know their age. But really calendar age isn’t that useful in such places—it conveys no unique rights or responsibilities—so people don’t keep close track of it.

A similar issue arises of course when studying other animals. Chimpanzees, meerkats, and dolphins do not keep records, as far as we know, and will not respond usefully to inquiries about their ages.

So how do we deal with this problem? The usual solution is to take an informed guess, combining a person’s self-report with qualitative knowledge, and then plug that guess into a statistical model, as if it were known with certainty.

Rarer and slightly better is to assign some standard error to the estimate or instead use a range of ages—say, 15–25. Then the estimate with its error can be used in a Bayesian measurement error model. That’s exactly what I’m doing in a current project. This approach at least retains the uncertainty. But it ignores a lot of information that could refine the estimates further.
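To make the idea concrete, here is a minimal Python sketch (all ages invented) of propagating an age range through a downstream summary by Monte Carlo, instead of plugging in a midpoint and pretending it is known:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical interview data: each person's age is given only as a range.
# Treat the range as a uniform prior rather than picking the midpoint.
age_ranges = [(15, 25), (30, 40), (22, 28), (45, 60)]

# Propagate uncertainty: sample each person's age from their interval
# and summarize any downstream quantity over the joint draws.
n = 100_000
samples = np.column_stack([rng.uniform(lo, hi, n) for lo, hi in age_ranges])
mean_age = samples.mean(axis=1)  # downstream quantity, one value per draw

print(f"mean age: {mean_age.mean():.1f} +/- {mean_age.std():.2f}")
# The midpoint-only calculation gives a point value with false certainty:
print(f"midpoint estimate: {np.mean([(lo + hi) / 2 for lo, hi in age_ranges]):.1f}")
```

The same propagation happens automatically inside a full Bayesian measurement error model; the sketch just shows why the spread matters.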

People as Pot Sherds

A recent paper by Yoan Diekmann and colleagues pushes this one step further by treating people like pot sherds. In archaeology, artifacts like sherds and lithic tools may receive radiocarbon dates. But such dates usually come with spectacular error distributions. Artifacts also come from specific layers—strata—at a site, and those layers tell us about the order of the dates for different artifacts. Using the layer information can therefore refine the date estimates. Software like OxCal makes this approach accessible.

Fig 5B from Diekmann et al 2017 [original]

Diekmann and colleagues apply the same approach to human age estimation, using rank order of births as an additional model input. These ranks define a set of hard constraints on differences among the true ages of individuals. The result is that age estimates are refined. It’s a clever idea. The figure at right shows the result—priors (gray) get squished to concentrate mass in regions that are more plausible. Rank information can also truncate the posterior, which the example at right doesn’t show so well. But I believe the green and purple individuals show posterior truncation, if you look closely.
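A rejection-sampling cartoon of the idea, with invented priors for three siblings of known birth order, shows how conditioning on rank alone shrinks each marginal’s uncertainty (the paper’s actual model is more sophisticated; this is just the intuition):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical example: three siblings with vague, overlapping age priors.
# (mean, sd) of each interviewer estimate, oldest first by known birth order.
priors = [(30.0, 5.0), (27.0, 5.0), (24.0, 5.0)]

# Draw joint samples from the independent priors.
draws = np.column_stack([rng.normal(m, s, 200_000) for m, s in priors])

# Keep only draws consistent with the birth order: age1 > age2 > age3.
ok = (draws[:, 0] > draws[:, 1]) & (draws[:, 1] > draws[:, 2])
post = draws[ok]

# The rank constraint shrinks the spread of every sibling's estimate.
for j in range(3):
    print(f"child {j + 1}: prior sd {priors[j][1]:.1f} -> "
          f"posterior sd {post[:, j].std():.2f}")
```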

Better than Sherds

To be honest, I hadn’t given this problem much thought before seeing the Diekmann et al paper. I was using error ranges on age estimates and doing full Bayesian error models, but I wasn’t thinking about any next steps in refining age estimates. This paper got me—and other members of my department—thinking.

Our first thought is that a little biological information could go a long way. Consider a birth order of siblings. We know child 1 was born before child 2 and before child 3. So their ages must be ordered the same, with child 1 being oldest. But we also know that child 2 was not born the day after child 1. Human gestation takes some time. So does weaning, in most places. Information about the distribution of interbirth intervals can help us refine the age estimates further. At least that’s the guess.
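Here is a self-contained rejection-sampling sketch of that guess, with invented sibling ages and an invented 1.5-year minimum interbirth interval standing in for gestation plus spacing:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical siblings, oldest first, with vague overlapping priors.
priors = [(30.0, 5.0), (27.0, 5.0), (24.0, 5.0)]
draws = np.column_stack([rng.normal(m, s, 400_000) for m, s in priors])

# Constraint 1: birth order only (age1 > age2 > age3).
order_only = draws[(draws[:, 0] > draws[:, 1]) & (draws[:, 1] > draws[:, 2])]

# Constraint 2: additionally require a minimum interbirth interval.
# 1.5 years is a made-up illustrative bound, not an empirical estimate.
min_gap = 1.5
with_gap = draws[(draws[:, 0] - draws[:, 1] >= min_gap)
                 & (draws[:, 1] - draws[:, 2] >= min_gap)]

# Compare the middle child's marginal spread under each constraint.
print(f"order only: sd {order_only[:, 1].std():.2f}")
print(f"with interval: sd {with_gap[:, 1].std():.2f}")
```

A real treatment would put a full distribution on intervals rather than a hard floor, but even the cartoon shows how interval information bites.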

Once we’ve added birth intervals, why not start adding other biological information? Age-specific fertility could also improve estimates, because knowledge of it changes our guess about the age of a woman when she had each child. This line of thought reminds us that, for us at least, the target of inference is not really age but rather biological rates like birth interval and age-specific fertility—all the components of a population projection matrix, for example. We want to use age to infer other things. We don’t actually care about individual ages themselves.
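Here is a toy importance-sampling version of that logic, with an invented fertility schedule: assuming age at birth follows a normal(25, 5) distribution and that a woman has a 10-year-old child, a vague uniform prior on her current age sharpens considerably:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical: a woman's current age has a vague uniform(20, 60) prior.
# She has a child of known age 10, so age_at_birth = current_age - 10.
prior = rng.uniform(20, 60, 500_000)
age_at_birth = prior - 10.0

# Importance weights from an assumed normal(25, 5) age-at-birth schedule.
# The schedule is invented for illustration, not estimated from data.
w = np.exp(-0.5 * ((age_at_birth - 25.0) / 5.0) ** 2)
w /= w.sum()

post_mean = np.sum(w * prior)
post_sd = np.sqrt(np.sum(w * (prior - post_mean) ** 2))
print(f"posterior: {post_mean:.1f} +/- {post_sd:.1f} "
      f"(prior sd ~{prior.std():.1f})")
```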

So the vision is to have a generative model of the demography of the animal population and then use that same model for statistical inference. Ages would be parameters in the model, and improved estimation of them would be useful because it leads to refined estimates of the biological rate functions. Hormone data, to the extent typical levels correlate with age, could also be used. In humans, the change in male vocal pitch at puberty could be useful. The general approach could serve both age-structured (like humans) and stage-structured (many plants and animals) populations.
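As a reminder of what the real targets of inference look like, here is a toy population projection (Leslie) matrix with invented fertility and survival rates; these rate functions are exactly the quantities that refined age estimates would feed:

```python
import numpy as np

# A toy Leslie (population projection) matrix for three age classes.
# Top row = age-specific fertility; subdiagonal = survival probabilities.
# All numbers are invented for illustration.
L = np.array([
    [0.0, 1.2, 0.8],   # fertility of age classes 0, 1, 2
    [0.6, 0.0, 0.0],   # survival from class 0 to class 1
    [0.0, 0.7, 0.0],   # survival from class 1 to class 2
])

# The long-run population growth rate is the dominant eigenvalue,
# which is real and positive for a nonnegative projection matrix.
lam = max(np.linalg.eigvals(L).real)
print(f"asymptotic growth rate: {lam:.3f}")
```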

Stock image of a custom Gibbs sampler

Unfortunately, Diekmann and colleagues built their proof-of-concept as a custom Gibbs sampler. So building directly on their code isn’t an option. To say this isn’t to fault them. But the era of custom Gibbs samplers is over. For the age estimates to be useful, we need to run the models that use them inside the same chain that estimates them—it makes no sense to discard the uncertainty in the posterior. Gibbs does not scale well enough and compares very poorly to alternatives like Hamiltonian Monte Carlo. One of the scientists in my department, Dr Cody Ross [staff page], built a rough version of the birth rank model in Stan. Stan uses Hamiltonian Monte Carlo, so it can handle the high-dimensional models we would wish to use age estimates in. But there’s a lot left to do, like partial ranks and all the biology.

We’ll be thinking more about this problem, since we have a lot of field data arriving that really needs a more mature treatment of age uncertainty. The zoologists and primatologists we know might also profit from such a project. We may even recruit a PhD student or post-doc to work specifically on this, building a re-deployable solution atop Stan’s amazing engine.