It isn't hard to find people who are probably guilty. You could round up drivers after long journeys and accuse them of having exceeded the speed limit at some point on their travels. If this doesn't satisfy a threshold of “guilt beyond reasonable doubt”, then you can select them a little more stringently, using evidence of age, gender, occupation, or psychological profile until you exceed any probability criterion you like, based on statistical data. However, we do not consider this a basis for conviction: we require evidence in addition (for example, speed camera data, or times of departure and arrival) whose existence may not reasonably be held consistent with the hypothesis of innocence—however unlikely that hypothesis may be on statistical grounds. We do not (or at least we did not before September 11th, 2001) punish people with fines or imprisonment just because we think they are probably guilty. Is there a rational basis for a different approach by lawyers and statisticians to this issue? Is the legal system failing to do the statistically sensible thing, or is there a different statistical issue that is the basis for a fair trial?
The Royal Statistical Society made a representation¹ to the Lord Chancellor about statistical evidence in court, in which one of several cogent points concerned the determination of probability of guilt. In relation to a case of alleged infanticide it was pointed out that to arrive at this probability a jury must take into account not only how unlikely was an innocent explanation, but also how unlikely was infanticide. This is certainly correct from a statistical point of view, as argued in more detail by Dawid²; but the point will doubtless have received a frosty reception among lawyers, because it would require that evidence of the incidence of an alleged crime be admissible as relevant in court, which is not generally the case. It is not part of our legal tradition that we should consider how common murder is, when we decide whether to convict someone of murder. If a child suffers innocent cot death, with no evidence to distinguish the circumstances from murder, then an accused parent would rightly be outraged if frequency of infanticide in his/her cultural group (or even in the general population) were cited in support of conviction. A jury might believe, on the basis of such evidence (or indeed their own personal notions or prejudices), that the defendant is probably guilty. But as long as they believe that the facts and witness statements might have arisen without guilt, then they should acquit. The incidence of murder has no logical relevance to this question, or to the degree of belief assigned to it. A jury may, legitimately in my view, emerge convinced that they are almost certainly and regrettably acquitting a guilty person, but that this is the correct thing to do. Ultimately, this may be a moral issue: one should never convict in circumstances that one can envisage could apply to an innocent person, however likely it is that one acquits the guilty. The aim here is to explore the implications of such a principle.
One might argue (as seems to be implied in the references above) that such views and the legal constraints on evidence are wrong, and that court decisions should indeed be based on probability of guilt, given the evidence. A rational basis for this is the principle of maximisation of expected utility³. According to this, one should weigh the benefits and costs to society of convicting or acquitting criminals or innocent persons, together with the likely numbers. Where utilities are quantifiable, it is a simple application of decision theory to see that the best criterion for conviction is whether probability of guilt exceeds a threshold dependent on the relative utilities. The threshold is high, because we consider it overwhelmingly undesirable (very “costly”) to convict innocent people. On this logic the phrase “guilty beyond reasonable doubt” is tantamount to “with very high probability of guilt”, the simplest Bayesian interpretation. But it also follows from this utilitarian argument that a decision might differ, despite identical evidence about a specific allegation, according to factors normally excluded from a jury's remit, such as the defendant's background or prior convictions, the relative diligence of police investigations in different directions and crime statistics, each capable of influencing a judgment of probability of guilt. Furthermore, the threshold probability for conviction would be lowered on this logic for some particularly serious allegations (for example, conspiracy to commit a monstrous terrorist act), where the negative utility of false acquittal might hugely exceed that of false conviction. Conviction might become justifiable on grounds merely of moderate suspicion. Utilitarian arguments are gaining ground in today's society, but are certainly contrary to the instincts of some lawyers⁴. Is a statistician necessarily a supporter of such arguments?
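The threshold rule described above can be made concrete with a minimal sketch. It assumes, purely for illustration, that the utilities reduce to two positive numbers: the net cost of convicting an innocent person and the net cost of acquitting a guilty one. The function name and the example cost ratios are hypothetical and are not taken from any legal or statistical source.

```python
def conviction_threshold(cost_false_conviction: float,
                         cost_false_acquittal: float) -> float:
    """Probability of guilt above which conviction maximises expected utility.

    With probability of guilt p, convicting has expected cost
    (1 - p) * cost_false_conviction and acquitting has expected cost
    p * cost_false_acquittal. Conviction is preferred when the first is
    smaller, that is, when p exceeds the value returned here.
    """
    if cost_false_conviction <= 0 or cost_false_acquittal <= 0:
        raise ValueError("both costs must be positive")
    return cost_false_conviction / (cost_false_conviction + cost_false_acquittal)


# Hypothetical ratios, chosen only to show the direction of the effect.
ordinary = conviction_threshold(99.0, 1.0)   # false conviction judged 99 times worse
terrorism = conviction_threshold(1.0, 4.0)   # false acquittal judged 4 times worse
```

On these assumed figures the first threshold is 0.99 and the second is 0.2, which is the sense in which a utilitarian criterion could allow conviction on moderate suspicion when the alleged crime is sufficiently grave.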
The balance of expected utilities is the basis of rational decisions in the face of uncertain evidence in many areas—medicine and engineering, for example—and is usually considered the best strategy to aspire to in such situations, even when the costs, benefits and risks are scarcely commensurate or quantifiable. But in law it seems different: a utilitarian strategy looks disturbingly like a totalitarian strategy. Conviction would be more likely if the alleged crime was common or serious, and could occur despite a defence case that was totally coherent and consistent with the known facts. Ethnicity or prior convictions could justifiably swing a verdict despite identical evidence in different cases, and conviction could be justified on grounds of suspicion alone. A state might operate effectively in this way to control behaviour and crime, but faith in the processes of law would clearly evaporate.
Contrast these issues with their equivalents in medicine. Nobody seriously challenges that a decision to risk hazardous treatment, given uncertain diagnosis, may correctly depend on factors such as how common or serious is the condition, and its statistical association with genetic or social characteristics of the patient. Treatment may be justified sometimes simply on suspicion, for example of a possible infection. Probability of the disease state is what counts, inferred on the basis of all relevant statistical factors. This is compared with the balance of prognosis and utilities: the likely good and bad outcomes of treatment and non-treatment with and without disease. There is no suggestion that decisions should be based solely on evidence related directly to the patient. No statistical factors are taboo. The critical probability for embarking on treatment may be high or low, depending on the risks, but there is no “presumption of health” analogous to a “presumption of innocence” nor any requirement that diagnosis be established “beyond reasonable doubt”. Of course, each individual patient, when competent, must make the final decision as the arbiter of many of the utilities involved. But the statistical issues seem much more straightforward than in law.
Let us examine three key propositions at issue in a criminal case. Each can be assigned its own epistemic probability, or degree of belief:
A. The known facts and testimony could have arisen if the defendant is guilty;
B. The known facts and testimony could have arisen if the defendant is innocent;
C. The defendant is guilty.
One principle is surely agreed by everyone: a jury must believe A, or assign it high probability, as a necessary (but not sufficient) condition for conviction. If we require, as argued above, that the jury must also believe not-B as a condition for conviction, then this entails belief in C: the defendant cannot be innocent if the known facts are inconsistent with innocence. But the reverse is not true: guilt (C) does not imply that the facts are inconsistent with innocence. The criterion not-B is therefore a more stringent requirement for conviction: it could reverse convictions based on belief in C, but could never reverse acquittals. It is arguable that categorical and rational belief in C should require belief in not-B, because this is the only basis on which one could ever be sure that C is true. I think this is correct. But beliefs are seldom categorical, nor necessarily even rational, and we have seen in the examples related to speeding and infanticide that strong (though less than certain) beliefs in both B and C may rationally coexist.
My contention is that a jury must convict only if it believes A to be plausible and B implausible (that is, beyond reasonable belief). These are essentially judgments of the likelihood or plausibility of stories set out by prosecution and defence, similar to those proposed elsewhere for civil cases⁵, though involving separate criteria rather than a comparison. Much background evidence that may influence belief in C is simply not relevant to A or B, and may therefore justifiably be omitted in court. It seems to me that this principle may provide a much more rational basis on which a judge may decide whether to admit such evidence than the often rather ad hoc criteria based on notional balancing of probative v. prejudicial influence, uncertain relevance to specific cases, or simple morality. Provided one is addressing the probability of B rather than C, it is often simply irrelevant to know, for example, whether the defendant has a criminal record or certain cultural beliefs.
Although this principle would resolve many of the problems raised at the outset, it creates two challenges. One is to define it more precisely in statistical terms. The second is to reconcile it with the principle of maximisation of utility that normally governs decisions. Neither challenge is easy, and I shall address the utility issue first. How can it make sense to set standards for conviction that do not depend on the probability that the defendant is guilty? The object of a trial is to gain the benefits of convicting guilty people, so surely the criterion should take account of the frequency with which this benefit is expected. If you were to think the benefit negligible compared with the cost of false convictions, then you should simply set a high threshold, and acquit almost everybody. Should a legal system really abandon the rationale of utilitarianism in such an elementary way? Or is there another form of utility in the equation, that can make these liberal principles rational? I think there is such a factor, quite consistent with the notions of Bentham and Mill, but embarrassingly almost outside the scope of a statistical approach.
What may, and perhaps should, count most in the mind of a juror in a democratic society is the approval of his or her peers. Benefits and costs, whatever decision is made, arise from society pronouncing: “You made a just (or unjust) decision.” This peer judgment can be influenced by any number of rational and irrational ideas of what makes for a fair and tolerable society. It is affected by the ease of identifying with potentially innocent defendants, and it is made without knowledge of whether the defendant was actually guilty, so cannot take account of the direct cost or benefit resulting from a verdict. With this perspective, decisions that maximise utility may be rational in no other sense than that they strive to meet with the approval of society. That is not to say that liberal principles based on plausibility of innocence are irrational. A society that acquits people because they could be innocent, however much more likely it may be that they are guilty, may be preferred rationally on the basis of higher level utilities: the notion that this is the most comfortable and stable form of society.
How can we analyse more clearly the plausibilities of A and B as statistical criteria? Review of a case typically leads to many judgments about whether various elements of testimony (true or false) would be plausible if different hypotheses were true about hidden facts and causes. The principal hidden fact is whether or not the defendant is guilty, and the jury's task is to combine these uncertainties to judge how plausible a story linking them all together could be, incorporating either of the two premises. These plausibilities or degrees of belief (PA, PB) can then be compared with what seem reasonable threshold criteria qA, qB as in Fig. 1a. It is important to note that these propositions employ the word could rather than would. To illustrate the distinction, I believe that if I were to buy a lottery ticket then I could win the lottery, but not that I would win. Beliefs in A and B are not concerned with the probability that the testimony in detail would be exactly as found, which would typically be very small. The issue is how believable it is that testimony could arise that was comparable in some sense to that observed. For the critical proposition B, the circumstances under consideration need to be expanded in two ways:
Where the existence of a crime is in doubt (as in the cases of cot death), we have an interesting difference between the basis on which belief in A and B must be assessed and compared with suitable thresholds qA, qB. For proposition A the “guilty” premise is clearcut: the defendant acted as charged, and we can assess probabilities based on this. For B the premise is that evidence of a non-existent crime has arisen through accident or misfortune. This premise leads to a rate (frequency in time, or Poisson rate) at which such circumstances would be likely to arise, not to an absolute probability. The plausibility of B (unlike A) must then be judged on the basis of how many times per year or per century, in a population of some size, such incriminating circumstances and testimony might be expected without the alleged offence. This is no more nor less ponderous a decision than one based on probability, but it is different.
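To illustrate what judging B by a rate might look like, here is a minimal sketch that assumes innocent occurrences of the incriminating circumstances follow a Poisson process. The per-family rate and the population size are invented for illustration only; they are not estimates for cot death or for any real case.

```python
import math


def expected_innocent_occurrences(rate_per_family_year: float,
                                  families: int,
                                  years: float) -> float:
    """Expected number of innocent occurrences in the population over the period."""
    return rate_per_family_year * families * years


def prob_at_least_one(expected: float) -> float:
    """Poisson probability that at least one innocent occurrence arises."""
    return -math.expm1(-expected)


# Hypothetical figures: one innocent occurrence per 10 million family-years,
# in a population of one million families.
per_year = expected_innocent_occurrences(1e-7, 1_000_000, 1)
per_century = expected_innocent_occurrences(1e-7, 1_000_000, 100)
chance_within_century = prob_at_least_one(per_century)
```

On these assumed figures the circumstances would be expected to arise innocently about once a decade, so something very improbable for any one family is not at all rare in the population. That expected frequency, rather than a single probability, is the quantity to be set against qB.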
There remains a problem of the compounding of uncertainties. However one judges the plausibilities or probabilities PA and PB, there will be uncertainty represented in Fig. 1b by subjective probability distributions. For example, one might think: “There is a chance that exhibit A was already present before the crime”, or: “If I trust the expert witness, the defence is implausible, but can I trust him?”. Without going into statistical detail, there is a dilemma here. Should one base a decision on an average plausibility, weighted in some way according to the uncertainties, as shown by vertical lines in Fig. 1b, which would give grounds for conviction? Or should one assess the probability (shown by the substantial shaded area) that resolution of the uncertainties reflected in the distribution would lead to acquittal? Neither procedure seems clearly to be more fair. For example, if one judges there is a 10% chance that certain critical testimony is unreliable, and in that case there would be an independent 10% chance that other apparently incriminating evidence could have arisen without guilt, then should one convict on the basis that the overall probability of both issues resolving in favour of the defendant is only 1%, or should one acquit on the basis that with 10% probability one would resolve just the one issue in favour of the defendant and then acquit (assuming a threshold criterion qB between 1% and 10%)? Personally, I would acquit.
These are dilemmas by no means unique to any particular mode of decision in a jury case, but they highlight ways in which motivation to maintain faith in the judicial process and pressure on investigators and expert witnesses to reduce uncertainties may come into conflict with straight Bayesian estimation of the probability of a correct conclusion.
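The two ways of handling the 10% example can be set side by side in a minimal sketch. It assumes, as a simplification not stated in the text, that PB is effectively zero when the critical testimony is reliable, and it takes qB = 0.05 as a hypothetical threshold between 1% and 10%. The function names are illustrative only.

```python
def mean_plausibility(scenarios):
    """Average PB over scenarios given as (probability, PB) pairs."""
    return sum(weight * p_b for weight, p_b in scenarios)


def prob_resolution_acquits(scenarios, q_b):
    """Probability that resolving the uncertainty leaves PB at or above qB."""
    return sum(weight for weight, p_b in scenarios if p_b >= q_b)


# Assumption: if the critical testimony is reliable (90%), innocence is taken
# to be implausible (PB = 0); if it is unreliable (10%), PB = 0.1.
scenarios = [(0.9, 0.0), (0.1, 0.1)]
q_b = 0.05  # hypothetical threshold between 1% and 10%

average = mean_plausibility(scenarios)
convict_on_average = average < q_b
chance_of_acquittal = prob_resolution_acquits(scenarios, q_b)
```

The averaged plausibility is about 1%, below the threshold, so the first procedure convicts. The second procedure finds a 10% chance that resolving the doubt about the testimony would itself require acquittal. The sketch does not settle which procedure is fairer; it only makes the two calculations explicit.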
How does all this relate to the probability of guilt, PC? In simple cases with a clear crime and evidence quantifiable in statistical terms, the probabilities PA and PB can be straightforward likelihoods, the logarithm of whose ratio (the log-likelihood-ratio) corresponds to the weight of evidence⁶ in favour of guilt. The necessary and sufficient conditions for conviction (PA > qA and PB < qB) would then imply a large weight of evidence (> log(qA/qB)) favouring guilt. A large weight of evidence might itself be considered a reasonable criterion for conviction, though it fails to be applicable in cases without a definite crime, when PB is based on a rate rather than a probability. But even in cases with a clearcut weight of evidence, this information is insufficient to determine probability of guilt without assuming a value for the prior probability of guilt, before taking account of the evidence². This has to be based on the background evidence that we saw earlier is often relevant to C but not to A or B. Here is the fundamental problem with decisions based on probability of guilt: they require a prior probability, which is essentially prejudice: a bias or baseline judgment arrived at before seeing the evidence. If decisions were based on PA and PB instead, then much of the complex legal argument aimed at keeping the process fair and free from unsatisfactory prejudice would become straightforward, and juries might have a better chance of understanding more clearly their highly onerous and responsible task.
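The dependence of PC on the prior can be seen in a minimal sketch of Bayes's theorem applied to the two likelihoods. All numerical values below (likelihoods, thresholds and priors) are hypothetical, and the natural logarithm is an arbitrary choice of base for the weight of evidence.

```python
import math


def weight_of_evidence(p_a: float, p_b: float) -> float:
    """Natural-log likelihood ratio in favour of guilt."""
    return math.log(p_a / p_b)


def posterior_guilt(prior: float, p_a: float, p_b: float) -> float:
    """Bayes's theorem: probability of guilt given the evidence and a prior."""
    return prior * p_a / (prior * p_a + (1.0 - prior) * p_b)


# Hypothetical likelihoods and thresholds, for illustration only.
p_a, p_b = 0.8, 0.01
q_a, q_b = 0.5, 0.05

meets_criteria = p_a > q_a and p_b < q_b
weight = weight_of_evidence(p_a, p_b)
minimum_weight = math.log(q_a / q_b)  # bound implied by the two criteria

# Identical evidence, different priors, different probabilities of guilt.
posterior_even_prior = posterior_guilt(0.5, p_a, p_b)
posterior_low_prior = posterior_guilt(0.001, p_a, p_b)
```

With these assumed values the evidence satisfies both criteria and its weight exceeds log(qA/qB), yet the probability of guilt is close to 1 with an even prior and below 10% with a prior of one in a thousand. The criteria on PA and PB are unaffected by that choice of prior.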
1. Royal Statistical Society (2002) Letter from the President to the Lord Chancellor regarding the use of statistical evidence in court cases. http://www.rss.org.uk/main.asp?page=1225
2. Dawid, A. P. (2002) Bayes's Theorem and weighing evidence by juries. Proc. Brit. Acad. 113, 71-90
3. Mill, J. S., Bentham, J., Austin, J., Warnock, M. (1962) Utilitarianism. Glasgow: Fontana
4. Kennedy, H. (2004) Just Law. London: Chatto & Windus, 333 pp.
5. Allen, R. J. (1986) A Reconceptualization of Civil Trials. B.U. L. Rev. 66, 401-437
6. Good, I. J. (1950) Probability and the Weighing of Evidence. London: Griffin
Tony Gardner-Medwin is a neuro-physiologist at University College London. He has a particular interest in the use of confidence-based marking to motivate students to justify their beliefs, to improve both learning and educational assessment.
Fig. 1 Plausibility of evidence related to propositions A and B in the text, with acceptable limits for conviction with and without quantifiable uncertainties.