Some statisticians (see e.g. Refs. 1, 2 below) have indicated that the task in such situations is to estimate the probability that the defendant is guilty, given the evidence presented (denoted P(G|E)). Only if this probability is greater than some high (albeit debatable) figure, perhaps 99%, should a jury convict. The legal presumption of innocence (and the corresponding requirement that guilt must be proved 'beyond reasonable doubt') would then be reflected in this high threshold, which ensures that defendants will be acquitted unless the jury considers that the probability that they are innocent is extremely low. I am not concerned with a debate about the level at which this threshold should be set, but with whether the probabilities of guilt or innocence are indeed the ones that should be addressed. Neither of the cited authors presents a case for asserting this, and indeed in correspondence Dawid suggests that it is simply 'natural and obvious'.
I want to argue that this is wrong. I certainly grant that it does seem at first sight natural, but I do not think it is either what is best expressed by the legal phrase 'beyond reasonable doubt' or what is socially and politically desirable, for reasons I shall discuss. I believe it is the probability that such evidence could arise without guilt that is critical. This, like the probability of guilt, is a Bayesian probability: it is the degree of belief that the facts could have arisen consistent with innocence (i.e. broadly, that the defence account is plausible). If this proposition is capable of reasonable belief, above some low threshold probability, then the jury should acquit - even if it may appear much more likely that the evidence arose through guilt. The difference may at first sight seem abstruse, but it has profound consequences for the issues that should be taken into account in a trial, for the perceived fairness of the system, and in some cases for the outcome.
To see the difference, consider a situation that jury members must commonly encounter. They may believe that the defendant is probably guilty, yet find the defence case both plausible and completely consistent with the evidence, so they must acquit. A simple instance would be a caricature related to the Sally Clark case, in which a death was either due to SIDS (Sudden Infant Death Syndrome) or murder. Suppose there is a city in which 1 in 100 female children are suffocated by a parent preferring a male child, while 1 in 10,000 die from SIDS. On the evidence of a mysterious female cot death alone, a jury would conclude there was a 99% probability that this case is murder, yet SIDS is entirely plausible (some SIDS cases are certainly expected to occur in a large population), and is wholly consistent with the evidence. If you ask the question 'Is doubt (i.e. the hypothesis that SIDS has occurred) reasonable?' the answer is surely yes. Such cases are bound to occur and, when they do, the evidence (an unexplained death) is what results. Acquittal is surely indicated. The high incidence of infanticide in the population (which is what leads to the conclusion that murder is much more likely than SIDS) should have no bearing on the judgement about conviction or acquittal and should not (and probably would not) be admissible evidence in court.
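The arithmetic behind this caricature can be sketched in a few lines of Python. This is only an illustration under stated assumptions: murder and SIDS are taken to be the only two possible causes, an unexplained death is taken to be certain under either hypothesis, and the population of one million female births is a hypothetical figure chosen for the example, not one from the text.

```python
# Rates taken from the caricature in the text.
rate_murder = 1 / 100
rate_sids = 1 / 10_000

# Assumption: an unexplained cot death is certain under either hypothesis.
p_evidence_given_murder = 1.0
p_evidence_given_sids = 1.0

# Probability of guilt given the evidence, P(G|E), by Bayes' theorem.
joint_murder = rate_murder * p_evidence_given_murder
joint_sids = rate_sids * p_evidence_given_sids
p_guilt = joint_murder / (joint_murder + joint_sids)

# Probability that the same evidence arises given innocence (here, SIDS).
p_evidence_given_innocence = p_evidence_given_sids

# Hypothetical population size, to show that innocent cases are bound to occur.
female_births = 1_000_000
expected_sids_cases = female_births * rate_sids

print(f"P(G|E)        = {p_guilt:.4f}")
print(f"P(E|innocent) = {p_evidence_given_innocence:.1f}")
print(f"Expected SIDS deaths in {female_births:,} births: {expected_sids_cases:.0f}")
```

The probability of guilt comes out at about 0.99, yet the probability that this evidence would arise given innocence is 1. The two criteria therefore point in opposite directions on exactly the same facts.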
One can argue in such a situation (as Dawid implicitly does) that the decision should not be made in the way described, but according to the principle of maximising expected utility. On this basis one would take into account the benefits to society of convicting murderers and the costs of acquitting them, as well as the benefits and costs of acquitting and convicting innocent people. To make an optimal decision with such an objective, one does need to arrive at the Bayesian posterior probability P(G|E). The relative costs and benefits could then justify conviction in a case like that described, because the benefits of locking up murderers might be considered to outweigh the costs of convicting innocent people. Maximising expected utility is the basis of many rational decisions (including the majority of medical decisions based on uncertain evidence). But it violates what I understand to be the principle of presumption of innocence. It is not what I understand by proof of guilt 'beyond reasonable doubt'; and it requires that evidence be introduced in court (for example, about the incidence of a crime) that would not, I believe, be considered admissible in the UK. It would be a totalitarian solution. Conviction, given equivalent weight of evidence against you, would be more likely if your accused crime was common and/or serious. Conviction could occur despite a defence case that is totally coherent and consistent with the known facts. White persons charged with crime in areas where crime is mostly carried out by blacks would be more likely to be acquitted than black persons ***, despite identical evidence against each. Faith in the processes of law would clearly evaporate.
All these problems disappear (except for the cost to society of letting some guilty defendants go free for lack of evidence) if, instead of trying to gauge the probability that a defendant is guilty, we gauge the probability that the evidence could have occurred, given the assumption that the defendant is innocent. Fortunately this is, I think, what courts in the UK try instinctively to do. Of course they sometimes conspicuously fail (as in the notorious Sally Clark case** and many others) through inability to understand statistical issues fully, or the lack of the kind of expert statistical testimony advocated by the Royal Statistical Society (Ref. 2) in the wake of the Sally Clark case. This case was exceptional in that there were many failings in court: timely correction of any one of these could probably have led to an early acquittal. However, its reliance in the main on very simple evidence (2 unexplained cot deaths) highlights the inappropriateness of resolving uncertainty by attempting to estimate the probability that a defendant is guilty. To do this would require (as pointed out in Refs. 1, 2) consideration of evidence about the incidence of infanticide in the population, so as to compare the likelihoods that the known facts were an example of murder or misfortune. Conviction on the basis that 'lots of people do indeed commit such a crime' is not what we want from a legal system, and the idea of taking such evidence into account is acceptable neither ethically nor (under English law) legally. Interestingly, this case makes possible a simple comparison with a hypothetical medical scenario in which, say, a near-fatal cot crisis might be due to condition A, which is treatable through a risky operation, or condition B, which is untreatable. It would be entirely appropriate to base a decision whether to operate on evidence that included relative incidence, racial and genetic factors, etc., as well as direct evidence about the patient.
There is no equivalent of a 'presumption of innocence' that applies to good health, to particular diseases, or to the right to be treated or untreated. These decisions are sensibly taken on the basis of relative probabilities and the relative risks, costs and benefits (the latter, preferably as perceived by the patient) of the various outcomes.
I have included a more formal discussion of the probabilistic strategies in a brief appendix*****. Though it is not particularly technical, those unfamiliar with probability formalisms may prefer to skip it. My thesis is that juries should consider the probability that, given all the facts and uncertainties, the defendant's case could be true. This is different from the probability that the defendant is innocent, which is satisfactorily analysed by Dawid. Both criteria for a decision can be equally clear, though each may be hard to quantify. My concern is not with the mathematics but with the type of question asked, and the potentially severe consequences if this is wrong. Many court cases are of course complex, so that a jury must rely on intuition to handle issues that are interdependent and essentially unquantifiable. But this makes it all the more important, I suspect, to clarify what question is being asked, and the appropriate logic.
A.R. Gardner-Medwin, Physiology, UCL, London WC1E 6BT
24/3/03, and subsequently revised
1. A.P. Dawid (2001) Bayes theorem and weighing evidence by juries. http://22.214.171.124/evidence/content/dawid-paper.pdf
2. Royal Statistical Society (2002) Letter from the President to the Lord Chancellor regarding the use of statistical evidence in court cases. (see also http://www.rss.org.uk/statsandlaw )
* 'Weight of evidence' corresponds here to what a jury person might describe as the 'strength of the case against the defendant', and in technical terms, in at least simple cases, to the sum of the log likelihood ratios given guilt and innocence, for independently relevant facts - including the absence of any observations that might have been expected on one or other hypothesis.
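As a minimal sketch of this definition, the weight of evidence for independently relevant facts can be computed as a sum of log likelihood ratios. The function name and the numerical (Pg, Pi) pairs below are hypothetical, chosen only for illustration. The third pair stands for the absence of an observation that would have been more expected on one hypothesis than on the other.

```python
import math


def weight_of_evidence(likelihood_pairs):
    """Sum of log(Pg/Pi) over facts assumed to be independent.

    Each pair is (probability of the fact given guilt,
                  probability of the fact given innocence).
    """
    return sum(math.log(pg / pi) for pg, pi in likelihood_pairs)


# Hypothetical figures, not taken from any real case.
facts = [
    (0.8, 0.1),  # a fact more likely given guilt: counts against the defendant
    (0.5, 0.5),  # a fact equally likely either way: contributes nothing
    (0.2, 0.4),  # an absent observation, more likely given innocence
]

w = weight_of_evidence(facts)
print(f"W = {w:.3f}")
```

A fact that is equally likely on both hypotheses adds log(1) = 0, which is the formal counterpart of the claim that such a fact does not weigh in the case.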
** Sally Clark Case. In that case (see http://www.sallyclark.org.uk/ ) the notoriously small claimed incidence of double SIDS in the population - 1 in 73 million - was probably a huge underestimate, principally because it treated SIDS incidents as necessarily independent (as pointed out by the Royal Statistical Society, in a Press Release (23/10/01) and letter (23/1/02) to the Lord Chancellor, and now seemingly agreed by everyone). But in my view even a correct figure would have been quite irrelevant because the defendant came to court for the (more or less) sole reason that she had lost 2 children through either murder or SIDS. Only two hypotheses were tenable about the defendant: she was a murderer or had experienced double SIDS. On each of these two hypotheses the likelihood of the principal evidence arising (2 unexplained deaths) was unity, not some small number referred to the general population. The defendant was not selected at random from the population and then found to have lost 2 babies (which is the situation in which the disputed probability for the incidence of double SIDS would apply); she was selected because she had lost 2 babies in an unexplained way. Given this, the critical statistic ( the probability that the evidence in court would be at least as incriminating as presented, if the defendant was in fact innocent) was essentially unity. The case should have been thrown out because the circumstances that brought her to court would have brought to court any unfortunate victim of double SIDS. Only if double SIDS was considered wholly untenable as a hypothesis (i.e. bound never to occur in the population) would there have been any case for conviction on the basis of evidence that consisted almost entirely of the 2 deaths alone. The letter from the Royal Statistical Society referred to above understates this problem rather severely when it says: 'The fact that two deaths by SIDS is quite unlikely is, taken alone, of little value.' 
The justification it goes on to offer for this statement is the same as Dawid's, based on an assumption that the task is to establish the probability of guilt: 'Two deaths by murder may well be even more unlikely. What matters is the relative likelihood of the deaths under each explanation, not just how unlikely they are under one explanation'. The Lord Chancellor could be excused if he dismissed this argument, since it seeks to introduce evidence (about the incidence of a crime) that would not be admissible in court. Unfortunately he probably failed to see that a different argument leads to a much stronger conclusion wholly consistent with the law of evidence.
*** Selective use of evidence. It has been suggested to me that rejecting evidence about relative rates in different ethnic groups, while accepting evidence linked more directly to the crime, is really a moral choice I am introducing - not justified on purely statistical grounds. I don't think so. Suppose A is that the defendant belongs to a crime-prone ethnic group, and B is that when apprehended he had blood on his shirt. Both are more likely to be true if the defendant is guilty (G) than if he is not (~G) and both contribute to the probability P(G|E) that, in the light of all the known facts, the defendant is guilty. However, I argue that only B and not A contributes to proper assessment of the case, given the presumption of innocence. Evidence A may affect the prior probability that the defendant will turn out to be guilty, judged even before presentation of evidence in the case****, and as a result it may affect the posterior probability P(G|E) after the case; but I argue that it should not (on purely statistical grounds) contribute to the issue, which is the weight of evidence favouring G or ~G in the case and relating to the specific defendant. One can see this by [...] to introduce the ethnic evidence into the case. One can apply the 'How come?' test to the evidence: the likelihoods that a jury is [...] about a piece of evidence are essentially, in an adversarial legal system, the probabilities that explanations offered by the defence (answers to the question 'How come this evidence arose, given the hypothesis ~G?') and by the prosecution (the same, given the hypothesis G) could be true. In the case of evidence A these answers are identical: 'The defendant belongs to this ethnic group because he was born into it.' In the case of B the answers will be different, and their likelihoods may be different, thus contributing to the weight of evidence in the case.
The handling of such issues can be trickier in cases that are in a sense intermediate. Suppose evidence C is that the defendant belongs to a group sworn to kill the person who was the victim in a case. My argument says that logically this evidence is irrelevant, since the explanation of 'How come you are a member of this group?' is unlikely to be very different from defence and prosecution, with therefore the same likelihood, and it therefore does not weigh in favour of G or ~G in relation to the specific case. Yet it is surely hard to argue that it should be inadmissible evidence or that it should not sway a jury. The defence case is presumably along the lines 'Yes, the defendant would have taken an opportunity to kill the victim, but in fact he was in this case simply a bystander'. My position is that the jury should indeed consider simply the likelihood, on the evidence, that such a sworn killer was in this instance a bystander. Part of this assessment might be the answer to the question 'How come, given you were sworn to kill this person, you did not - according to your case - take part?'. The defendant must be acquitted if the answers to such questions, in the light of the facts, do not seem too improbable. The way in which I would feel prepared, as a jury person, to be swayed more directly by the evidence C per se is actually not in relation to any weight of evidence favouring guilt, but in relation to the criterion I would set for 'reasonable doubt'. Given C (though not ethnic evidence such as A) I might be more inclined to convict someone on the basis of a case that was otherwise weak, on the grounds that the undesirability of a false conviction (negative utility) was somewhat reduced by evidence C. 28/3/03
**** Relevance of the process of selection for prosecution. Strictly speaking evidence A might not even do this in the expected manner, depending on how the defendant was selected for trial. If the police apprehended him because he was the only person in the vicinity of a minority crime-prone ethnic group, his ethnicity might actually render him a priori less likely to be guilty than someone apprehended from the majority culture, for whom there would presumably be some more solid basis for selection. 28/3/03
***** A more formal approach. How is one to resolve uncertainty and establish conviction beyond reasonable doubt if it is not a question of judging the probability of guilt? The failure of this, the simplest mathematical interpretation of the task in hand, combined with the obvious difficulty of quantifying uncertainties in anything but the rarest of court cases, could conspire to suggest that probability theory has little or no place in the courtroom. But the mathematics allows one to distinguish alternative approaches that are crisply different when the facts are simple, and that still form the basis for distinct intuitive approaches in more difficult cases. A first step is to dissect the various stages of a Bayesian argument to arrive at a probability of guilt in the light of evidence:
1. Establish the prior probability p that the defendant is guilty, before the evidence in the case is considered.
2. Select the evidence to be presented to the jury.
3. Assess the weight of this evidence.
4. Calculate the posterior probability of guilt, P(G|E), from stages 1 and 3.
The process of assessing the weight of evidence (stage 3) in the conventional Bayesian analysis is one of arriving at an overall likelihood ratio - the ratio Pg/Pi of the probabilities that the evidence, taken as a whole, would be observed if the defendant is guilty (Pg) or innocent (Pi). The posterior probability can then be calculated (stage 4) by multiplying the prior odds ratio from stage 1 (r = p/(1-p), where p is the prior probability of guilt) by this likelihood ratio, to obtain the posterior odds ratio (r' = r Pg/Pi). This ratio can then be straightforwardly converted to the posterior probability p' = r'/(1+r'). This set of operations implements Bayes' theorem. However, as discussed above, a preferred strategy would cut out stages 1 and 4 altogether and treat Pg and Pi differently. Defining the weight of evidence* as W = log(Pg/Pi) - essentially a measure of the strength of the case against the defendant - we can ask the question 'What is the probability that a person who is innocent, possessing the established characteristics of the defendant, and selected for trial in the manner of the defendant, would encounter a case against him/her that is as strong as or stronger than W?'. This is a question about the distribution of W, as a random variable. Its assessment, like that of any other probability in court, can be difficult - requiring the weighing of uncertainties and explanations about potentially unreliable testimony. It is essentially equivalent to the question 'What probability (or degree of belief) can one assign to the narrative provided by the defence (and challenged by the prosecution), as an explanation of the evidence without implying guilt?' This question successfully distinguishes between legally admissible and inadmissible forms of evidence that could each influence the wrong type of probability (the probability of guilt). If a defendant has a general characteristic G (ethnic, genetic, income, appearance, mode of speech, prior conviction, etc.) that is more common amongst guilty than innocent persons, or if s/he is the subject of a piece of seemingly directly relevant evidence D (seen holding a smoking gun, running away from the scene, with matching DNA, etc.), the question is the same: how do you account, if you are innocent, for the fact that you are (or were) foreign, feeble-minded, poor, scruffy, ill-spoken, previously in prison, reported to be seen with a smoking gun, running away from the scene, with matching DNA, etc.? Expressed this way, the narratives in response to characteristics G obviously do not bear on the probability that the case against this particular defendant would be as strong as it actually is: they are fixed things that would be the same in any case, whether the defendant was innocent or guilty. On the other hand, the narratives in response to testimony of type D are the normal stuff of criminal trials - the jury must assess whether it is plausible, as perhaps claimed, that the witness is lying, that the gun was a plant, that the defendant really was trying to catch a bus, or that the police really did bring to court with no further evidence the first person from a database of 10 million found to match (with a false identification rate of, say, 1 in 100,000) the DNA at the crime scene, etc., etc.
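The two strategies contrasted above can be put side by side in a short Python sketch. It rests on simplifying assumptions that are not in the text: the evidence consists of a few independent items, each simply present or absent, with known probabilities strictly between 0 and 1 under guilt and under innocence. The function names, the prior of 0.5 and the item probabilities are all hypothetical. The first function follows the odds formulae quoted above. The second is one possible reading of the question about the distribution of W, answered by enumerating every pattern of evidence an innocent person might face.

```python
import math
from itertools import product


def posterior_probability(p, pg, pi):
    """Stages 1 and 4: prior odds r = p/(1-p), posterior odds r' = r*Pg/Pi."""
    r = p / (1 - p)
    r_post = r * pg / pi
    return r_post / (1 + r_post)


def weight(items, outcome):
    """W = log(Pg/Pi) for one pattern of present/absent evidence items."""
    w = 0.0
    for (pg, pi), present in zip(items, outcome):
        w += math.log(pg / pi) if present else math.log((1 - pg) / (1 - pi))
    return w


def prob_innocent_case_as_strong(items, observed):
    """P(W >= observed W | innocent), by enumerating all evidence patterns."""
    w_observed = weight(items, observed)
    total = 0.0
    for outcome in product([True, False], repeat=len(items)):
        if weight(items, outcome) >= w_observed - 1e-12:
            p_outcome = 1.0
            for (_, pi), present in zip(items, outcome):
                p_outcome *= pi if present else 1 - pi
            total += p_outcome
    return total


# Hypothetical items: (P(present | guilty), P(present | innocent)).
items = [(0.9, 0.1), (0.8, 0.2)]
observed = (True, True)

pg = 0.9 * 0.8  # probability of the observed pattern given guilt
pi = 0.1 * 0.2  # probability of the observed pattern given innocence

posterior = posterior_probability(0.5, pg, pi)  # needs a prior (stage 1)
tail = prob_innocent_case_as_strong(items, observed)  # needs no prior

print(f"Posterior probability of guilt: {posterior:.4f}")
print(f"P(case at least this strong | innocent): {tail:.4f}")
```

Only the first quantity depends on the prior probability of guilt. The second depends solely on how the evidence would behave if the defendant were innocent, which is the point of cutting out stages 1 and 4.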
This approach helps in principle to do away with the need for careful selection of evidence presented to a jury (2, in the initial scheme above). In principle, a jury making what I think is the correct type of decision should always be able to make a better decision by being better informed with more facts of any type. For this reason, one could argue that in principle they should even be allowed to know the criminal history of the defendant (contrary to UK practice) because it can be relevant to interpretation of the way the defendant has reacted to circumstances. More important, they should know the full narrative of how the defendant came to be selected for trial, since the manner in which something is selected can be highly relevant to the probabilistic inferences that can be drawn from its characteristics. (For example, consider the DNA match referred to in the last paragraph: it would be dramatically more significant if the match were found as a result of testing the victim's next door neighbour. This example is extreme, but I guess there have been many miscarriages of justice because juries have been kept in ignorance of police procedure, even procedure carried out in good faith and consistent with good protocols.) Despite these arguments in principle that favour maximum availability of information, one must recognise the difficulty for a jury of keeping clear the nature of a decision to be made. Rules of evidence that suppress information (perhaps particularly criminal history) may be justified on the grounds that the jury might be inappropriately swayed to make a decision on the wrong basis - on the basis of the probability of guilt rather than the plausibility of innocence.
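The DNA example can be made concrete with the figures quoted earlier (a database of 10 million and a false identification rate of 1 in 100,000). The sketch below assumes, purely for illustration, that the false identification rate is the chance that any one innocent person's profile matches, and that matches are independent between people. Neither assumption is stated in the text.

```python
database_size = 10_000_000       # figure quoted in the text
false_match_rate = 1 / 100_000   # figure quoted in the text

# Trawl of the whole database: how many innocent people are expected to match?
expected_false_matches = database_size * false_match_rate

# Chance that the trawl turns up at least one innocent match
# (assumes independence between database entries).
p_some_innocent_match = 1 - (1 - false_match_rate) ** database_size

# Single targeted test (e.g. the victim's next door neighbour):
# chance that one particular innocent person matches.
p_neighbour_false_match = false_match_rate

print(f"Expected innocent matches in trawl: {expected_false_matches:.0f}")
print(f"P(at least one innocent match in trawl): {p_some_innocent_match:.6f}")
print(f"P(innocent neighbour matches): {p_neighbour_false_match:.6f}")
```

On these assumptions a trawl is all but certain to produce an innocent match, whereas a match from a single targeted test would be very unlikely for an innocent person. The same DNA evidence therefore carries very different weight depending on how the defendant came to be selected.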