Modelling Failures

Nothing really new here, but pulling a few things together.

Start with Joseph K's observation:

Between the replication crisis and the Great Poll Failure of 2016, quantitative social science has basically committed suicide
— Joseph K. (@fxxfy) November 9, 2016

This is a good point, and I added that the failure of financial risk models in 2008 was essentially the same thing.

The base problem is overconfidence. "People do not have enough epistemic humility", as Ben Dixon put it.

The idea in all these fields is that you want to make some estimate about the future of some system. You make a mathematical model of the system, relating the visible outputs to internal variables. You also include a random variable in the model.

You then compare the outputs of your model to the visible outputs of the system being modelled, and modify the parameters until they match as closely as possible. They don't match exactly, but you make the effects of your random variable just big enough that your model could plausibly produce the outputs you have seen.

If that means your random variable basically dominates, then your model is no good and you need a better one. But if the random element is fairly small, you're good to go.
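To make the fitting step concrete, here is a minimal sketch, assuming the simplest possible case: a straight-line model fitted by least squares to made-up data. The function names (`fit_line`, `noise_share`) and all the numbers are hypothetical, chosen only for illustration. The random term is sized from whatever the line fails to explain, and `noise_share` is a rough check on whether that term dominates.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # The "random variable" is made just big enough to cover what the line
    # misses. n - 2 degrees of freedom because two parameters were fitted.
    residual_sd = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5
    return a, b, residual_sd

def noise_share(ys, residual_sd):
    """Fraction of the observed variance that is left to the random term."""
    return residual_sd ** 2 / statistics.variance(ys)

# Made-up "system": a line plus Gaussian noise (illustrative values only).
rng = random.Random(0)
xs = [float(i) for i in range(50)]
ys = [2.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]

a, b, sd = fit_line(xs, ys)
share = noise_share(ys, sd)
print(f"a={a:.2f} b={b:.2f} residual_sd={sd:.2f} noise_share={share:.2f}")
```

A small `noise_share` is the "good to go" case. Note that nothing in this calculation asks whether a straight line was the right shape in the first place.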

In polling, your visible effects are how people answer polling questions and how they vote. In social science, it's how subjects behave in experiments, or how they answer questions, or how they do things that come out in published statistics. In finance, it's the prices at which people trade various instruments.

The next step is where it all goes wrong. You assume that your model (including its random variable to account for the unmeasured or unpredictable) is exactly correct, and make predictions about what the future outputs of the system will be. Because of the random variable, your predictions aren't certain; they have a range and a probability. You say, "Hillary Clinton has an 87% chance of winning the election". You say, "Reading these passages changes a person's attitude to something-or-other in this direction 62% of the time, with a probability of 4.6% that the effect could have been caused randomly". You say, "The total value of the assets held by the firm will not decrease by more than 27.6 million dollars in a day, with a probability of 99%".

The use of probabilities suggests to an outsider that you have epistemic humility: you are aware of your own fallibility and are taking account of the possibility of having gone wrong. But that is not the case. The probabilities you quote are calculated on the basis that you have done everything perfectly, that your model is completely right, and that nothing has changed in between the production of the data you used to build the model and the events that you are attempting to predict. The unpredictability that you account for is that which is caused by the incompleteness of your model, which is necessarily a simplification of the real system, not the possibility that what your model is doing is actually wrong.
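One way to see the gap is to write out what an honest probability would have to include. The sketch below is only an illustration, assuming you could put a number on how much you trust the model (which is exactly the thing nobody can measure). The function `discounted_probability`, the `trust` weight and the fallback value are all hypothetical; this is the law of total probability, not anybody's published method.

```python
def discounted_probability(p_model, trust, p_if_wrong=0.5):
    """Blend the model's output with a fallback guess.

    p_model    -- probability the model quotes, valid only if the model is right
    trust      -- assumed probability that the model is right (a judgment call)
    p_if_wrong -- what you would guess if the model told you nothing
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must be between 0 and 1")
    # P(event) = P(model right) * P(event | right) + P(model wrong) * P(event | wrong)
    return trust * p_model + (1.0 - trust) * p_if_wrong

# Hypothetical numbers: the model says 87%, and you trust it 60%.
print(round(discounted_probability(0.87, trust=0.6), 3))
```

The quoted figure is the special case `trust=1.0`. Any doubt about the model pulls the answer back towards the fallback.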

In the case of the polling, what that means is that the margin of error quoted with the poll is based on the assumptions that the people polled answered honestly, that they belong to the demographic groups that the pollsters thought they belonged to, and that the proportions of demographic groups in the electorate are what the pollsters thought they were. The margin of error is based on the random variables in the model: the fact that the random selection of people polled might be atypical of the list they were taken from and, if the model is sophisticated enough, possibly that the turnout of different demographics might vary from what is predicted (but where does the data come from to model that?).
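As a rough illustration, the textbook margin of error for a simple random sample depends only on the sample size and the observed proportion. The sketch below assumes that formula (real polls use weighting and more elaborate designs), and the "shy voter" bias of 4 points is a made-up number. To keep things simple it looks at the expected poll result and ignores sampling noise.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Sampling-only margin of error for a proportion (95% by default)."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

def interval(p_hat, n, z=1.96):
    """Confidence interval implied by the margin of error."""
    moe = margin_of_error(p_hat, n, z)
    return p_hat - moe, p_hat + moe

# Hypothetical scenario: true support is 50%, but 4 points' worth of
# supporters will not admit it to a pollster.
true_support = 0.50
shy_bias = 0.04
n = 1000

observed = true_support - shy_bias  # what the poll measures on average
low, high = interval(observed, n)
print(f"poll: {observed:.2f} +/- {margin_of_error(observed, n):.3f}")
print("true value inside interval:", low <= true_support <= high)
```

The quoted margin comes out at about 3 points whether or not the bias is there. Polling more people shrinks the interval around the wrong number.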

In the social sciences, the assumptions are that the subjects are responding to the stimuli you are describing, and not to something else. Also that people will behave the same outside the laboratory as they do inside. The stated probabilities and uncertainties again are not reflecting any doubt as to those assumptions: only to the modelled randomness of sampling and measurement.

On the risk modelling used by banks, I can be more detailed, because I actually did it. It is assumed that the future price changes of an instrument follow the same probability distributions as in the past. Very often, because the instruments do not have a sufficient historical record, a proxy is used; one which is assumed to be similar. Sometimes instead of a historical record or a proxy there is just a model, a normal distribution plus a correlation with the overall market, or a sector of it. Again, lots of uncertainty in the predictions, but none of it due to the possibility of having the wrong proxy, or of there being something new about the future which didn't apply to the past.
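A minimal sketch of the historical approach, assuming a plain historical-simulation VaR on simulated daily P&L. The function names, the Gaussian P&L and the tripling of volatility are all made up for illustration; a real bank's implementation is far more involved. The point is only what happens when the future stops following the past's distribution.

```python
import random

def historical_var(pnl_history, confidence=0.99):
    """Loss level that past daily P&L exceeded on about (1 - confidence) of days."""
    losses = sorted(-p for p in pnl_history)  # positive numbers are losses
    index = min(int(confidence * len(losses)), len(losses) - 1)
    return losses[index]

def breach_rate(pnl, var):
    """Fraction of days on which the loss was bigger than the VaR figure."""
    return sum(1 for p in pnl if -p > var) / len(pnl)

rng = random.Random(1)

# "The past": 2000 calm days with unit volatility (illustrative only).
calm = [rng.gauss(0.0, 1.0) for _ in range(2000)]
var_99 = historical_var(calm)

# "The future": same instrument, but volatility has tripled.
stressed = [rng.gauss(0.0, 3.0) for _ in range(2000)]

print(f"99% VaR from history: {var_99:.2f}")
print(f"breach rate on the data it was built from: {breach_rate(calm, var_99):.3f}")
print(f"breach rate after the regime change: {breach_rate(stressed, var_99):.3f}")
```

On the data it was built from, the VaR is breached on about 1% of days by construction. Under the changed regime the "1 day in 100" loss turns up far more often, and nothing in the VaR number warned of that.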

Science didn't always work this way. The way you do science is that you propose the theory, then it is tested against observations over a period of time. That's absolutely necessary: the model, even with the uncertainty embedded within it, is a simplification of reality, and the only justification for assuming that the net effects of the omitted complexities are within error bounds is that this is what is actually seen to happen.

If the theory is about the emission spectra of stars, or the rate of a chemical reaction, then once the theory is done it can be continually tested for a long period. In social sciences or banking, nobody is paying attention for long enough, and the relevant environment is changing too much over a timescale of years for evidence that a theory is sound to build up. It's fair enough: the social scientists, pollsters and risk managers are doing the best they can. The problem is not what they are doing, it is the excessive confidence given to their results. I was going to write "their excessive confidence", but that probably isn't right: they know all this. Many of them (there are exceptions) know perfectly well that a polling error margin, or a p-value, or a VaR is not truly what the definition says, but only the closest that they can get. It is everyone who takes the numbers at face value who is making the mistake. However, none of these analysts, of whichever flavour, are in a position to emphasise the discrepancy. They always have a target to aim for.

A scientist has to get a result with a p-value to publish a paper. That is their job: if they do it, they have succeeded, otherwise, they have not. A risk manager, similarly, has a straightforward day-to-day job of persuading the regulator that the bank is not taking too much risk. I don't know the ins and outs of polling, but there is always pressure. In fact Nate Silver seems to have done exactly what I suggest: his pre-election announcement seems to have been along the lines "Model says Clinton 85%, but the model isn't reliable, I'm going to call it 65%". And he got a lot of shit for it.

Things go really bad when there is a feedback loop from the result of the modelling to the system itself. If you give a trader a VaR budget, he'll look to take risks that don't show in the VaR. If you campaign so as to maximise your polling position, you'll win the support of the people who don't bother to vote, or you'll put people off saying they'll vote for the other guy without actually stopping them voting for the other guy. Nasty.

Going into the election, I'm not going to say I predicted the result. But I didn't fall for the polls. Either there was going to be a big differential turnout between Trump supporters and Clinton supporters, or there wasn't. Either there were a lot of shy Trump supporters, or there weren't. I thought there was a pretty good chance of both, but no amount of Data was going to tell me. Sometimes you just don't know.

That's actually an argument for not "correcting" the polls. At least if there is a model (polling model, VaR model, whatever) you can take the output and then think about it. If the thinking has already been done, and corrections already applied, that takes the option away from you. I didn't know to what extent the polls had already been corrected for the unquantifiables that could make them wrong. The question wasn't so much "are there shy Trump voters?" as "are there more shy Trump voters than some polling organisation guessed there are?"

Of course, every word of all this applies just the same to that old obsession of this blog, climate. The models have not been proved; they've mostly been produced honestly, but there's a target, and there are way bigger uncertainties than those which are included in the models. But the reason I don't blog about climate any more is that it's over. The Global Warming Scare was fundamentally a social phenomenon, and it has gone. Nobody other than a few activists and scientists takes it seriously any more, and mass concern was an essential part of the cycle. There isn't going to be a backlash or a correction; there won't be papers demolishing the old theories and getting vast publicity. Rather, the whole subject will just continue to fade away. If Trump cuts the funding, as seems likely, it will fade away a bit quicker. Lip service will occasionally be paid, and summits will continue to be held, but less action will result from them. The actual exposure of the failure of science won't happen until the people who would have been most embarrassed by it are dead. That's how these things go.
