## One problem with models

http://www.economist.com/science/displaystory.cfm?story_id=9645336

The article highlights a typical problem with statistical models that use many “runs” to arrive at the most likely results: how to vary the parameters for each “run”. Models of this kind are used in many places, for example in determining whether someone’s retirement fund invested in the stock markets will be sufficient given market volatility.

In the example cited in the article, the parameters were varied in evenly spaced steps. From my perspective, that’s just sloppy: each parameter should have its own statistical distribution (what is its most common value? how far up or down the scale does it vary?) and the values that go into each “run” should be drawn from that distribution. Two parameters that are the same except in the way they are expressed would then have matching distributions, each being a transformation of the other, and there would be no problem.
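To make that concrete, here is a minimal Python sketch of what I have in mind. The lognormal distribution, its parameters and the names `sample_fall_rates`, `rates` and `times` are my own assumptions for illustration, not anything taken from the article or from a real climate model.

```python
import random
import statistics

def sample_fall_rates(n, seed=0):
    """Draw n hypothetical fall-rate values from a lognormal distribution."""
    rng = random.Random(seed)
    # mu=0.0 and sigma=0.5 are illustrative values, not real model settings.
    return [rng.lognormvariate(0.0, 0.5) for _ in range(n)]

# One set of draws describes the physical quantity.
rates = sample_fall_rates(10_001)

# The same draws expressed the other way round (residence time = 1 / rate).
times = [1.0 / r for r in rates]

# Both expressions describe the same runs, so their medians agree
# once converted back to the same units.
median_rate = statistics.median(rates)
median_time = statistics.median(times)
print(median_rate, 1.0 / median_time)
```

Because the values are drawn once and then converted, it no longer matters which way the parameter is expressed: every run sees the same physical input.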

The problem, which the article does not sufficiently stress, is that many of the parameters may be related in ways that are much more complex and, more importantly, in ways we may not know. Where the relationship is known, making the model more elaborate (e.g. using a small model to determine the parameters that go into the main model) would lessen the problem. But relationships we do not know about can severely affect the outcome.
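A toy illustration of why a relationship between parameters matters, again as a sketch under my own assumptions: `toy_model` is a made-up stand-in for a real model, and the correlation value 0.8 is arbitrary.

```python
import math
import random
import statistics

def toy_model(a, b):
    """Hypothetical stand-in for a real model: the outcome is simply a + b."""
    return a + b

def run_ensemble(rho, n=20_000, seed=1):
    """Run the toy model n times with parameters a and b correlated by rho."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        a = z1
        # Standard construction of a variable with correlation rho to a.
        b = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        outcomes.append(toy_model(a, b))
    return outcomes

# Spread of outcomes when the parameters are assumed independent ...
spread_independent = statistics.pstdev(run_ensemble(rho=0.0))
# ... and when they actually move together.
spread_correlated = statistics.pstdev(run_ensemble(rho=0.8))
print(spread_independent, spread_correlated)
```

If the two parameters really move together but the runs treat them as independent, the ensemble understates the spread of outcomes, and nothing in the results themselves flags the mistake.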

————————————

Statistics and climatology

# Gambling on tomorrow

Aug 16th 2007

From *The Economist* print edition

## Modelling the Earth’s climate mathematically is hard already. Now a new difficulty is emerging

Illustration by Dettmer Otto

“SCIENCE” is a recently coined word. When the Royal Society, the world’s oldest academy of the discipline, was founded in London in 1660, the subject was referred to as natural philosophy. In the 19th century, though, nature and philosophy went their separate ways as the natural philosophers grew in number, power and influence.

Nevertheless, the link between the fields lingers on in the name of one of the Royal Society’s journals, *Philosophical Transactions*. And appropriately, the latest edition of that publication, which is devoted to the science of climate modelling, is in part a discussion of the understanding and misunderstanding of the ideas of one particular 18th-century English philosopher, Thomas Bayes.

Bayes was one of two main influences on the early development of probability theory and statistics. The other was Blaise Pascal, a Frenchman. But, whereas Pascal’s ideas are simple and widely understood, Bayes’s have always been harder to grasp.

Pascal’s way of looking at the world was that of the gambler: each throw of the dice is independent of the previous one. Bayes’s allows for the accumulation of experience, and its incorporation into a statistical model in the form of prior assumptions that can vary with circumstances. A good prior assumption about tomorrow’s weather, for example, is that it will be similar to today’s. Assumptions about the weather the day after tomorrow, though, will be modified by what actually happens tomorrow.

Psychologically, people tend to be Bayesian—to the extent of often making false connections. And that risk of false connection is why scientists like Pascal’s version of the world. It appears to be objective. But when models are built, it is almost impossible to avoid including Bayesian-style prior assumptions in them. By failing to acknowledge that, model builders risk making serious mistakes.

## Assume nothing

In one sense it is obvious that assumptions will affect outcomes—another reason Bayes is not properly acknowledged. That obviousness, though, buries deeper subtleties. In one of the papers in *Philosophical Transactions* David Stainforth of Oxford University points out a pertinent example.

Climate models have lots of parameters that are represented by numbers—for example, how quickly snow crystals fall from clouds, or for how long they reside within those clouds. Actually, these are two different ways of measuring the same thing, so whether a model uses one or the other should make no difference to its predictions. And, on a single run, it does not. But models are not given single runs. Since the future is uncertain, they are run thousands of times, with different values for the parameters, to produce a range of possible outcomes. The outcomes are assumed to cluster around the most probable version of the future.

The particular range of values chosen for a parameter is an example of a Bayesian prior assumption, since it is derived from actual experience of how the climate behaves—and may thus be modified in the light of experience. But the way you pick the individual values to plug into the model can cause trouble.

They might, for example, be assumed to be evenly spaced, say 1,2,3,4. But in the example of snow retention, evenly spacing both rate-of-fall and rate-of-residence-in-the-clouds values will give different distributions of result. That is because the second parameter is actually the reciprocal of the first. To make the two match, value for value, you would need, in the second case, to count 1, ½, ⅓, ¼—which is not evenly spaced. If you use evenly spaced values instead, the two models’ outcomes will cluster differently.
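The arithmetic in that paragraph can be checked with a short Python sketch. The range of 1 to 4, the seven values and the helper `evenly_spaced` are illustrative assumptions, not figures from the paper.

```python
import statistics

def evenly_spaced(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

# Ensemble A: evenly spaced fall rates between 1 and 4 (arbitrary units).
rates_a = evenly_spaced(1.0, 4.0, 7)

# Ensemble B: the same physical range expressed as residence time
# (the reciprocal, 0.25 to 1.0), evenly spaced in that form instead.
times_b = evenly_spaced(0.25, 1.0, 7)
rates_b = [1.0 / t for t in times_b]

# Both ensembles cover rates from 1 to 4, but they cluster differently.
median_a = statistics.median(rates_a)
median_b = statistics.median(rates_b)
print(median_a, median_b)
```

Both ensembles span the same physical range, yet the middle run of the first uses a rate of about 2.5 and the middle run of the second a rate of about 1.6, so the outcomes cluster around different values.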

Climate models have hundreds of parameters that might somehow be related in this sort of way. To be sure you are seeing valid results rather than artefacts of the models, you need to take account of all the ways that can happen.

That logistical nightmare is only now being addressed, and its practical consequences have yet to be worked out. But because of their philosophical training in the rigours of Pascal’s method, the Bayesian bolt-on does not come easily to scientists. As the old saw has it, garbage in, garbage out. The difficulty comes when you do not know what garbage looks like.
