Tag Archives: hypotheses

Truthseeking: Correlation Is Not Causation

One of the most oft-repeated mantras in the entire field of research methods is that correlation is not causation. The oft-used Latin phrase for the associated logical fallacy is post hoc, ergo propter hoc, which means “after this, therefore because of this.” It’s an old one: according to Britannica (n.d.), Aristotle included its earlier Greek version in his list of material fallacies.

The post hoc fallacy consists of observing that two events occur in sequence and assuming that the first must have caused the second. There was the sound of a gunshot; Mike went running down the street; therefore Mike’s running was due to the gunshot – when in fact Mike is deaf; he started running just to catch up with his friend.

A related fallacy differs only in its first word: cum hoc, ergo propter hoc, apparently best pronounced with a mix of short and long O sounds (i.e., “koom hock ergoh prah-pter hock” or, to some, “hoke” instead of “hock”). The first word, cum, means “with” rather than “after.” In this less commonly noticed logical error, the two things occur at the same time, and yet one is believed to cause the other. People start wearing shorts at around the same time as when ice cream sales rise; therefore wearing shorts causes people to want ice cream. For more examples, see Tyler Vigen’s book – providing, for instance, a graph of a very close correlation between U.S. spending on science and technology with suicides by hanging, strangulation, and suffocation.

In another post, I provided a real-world example of the post hoc fallacy, in a discussion of potentially erroneous interpretations of research into the relationship between insufficient sleep and Alzheimer’s disease. In that post, I said,

[W]riters about relevant research often make the post hoc logical error. For instance, among the top results in a search for relevant studies, I saw a Harvard Medical School article (Budson, 2021). … [That article’s title said,] “Sleep well – and reduce your risk of dementia and death.” [Budson] based that claim on two studies. But both clearly spoke in terms of associations; they did not claim to find a cause-effect relationship. As one of them … stated, “[S]hort sleep duration in midlife is [merely] associated with an increased risk of late-onset dementia” (emphasis added).

MathTutorDVD (2017) explains: “Correlation between two events or variables simply indicates that a relationship exists, whereas causation is more specific and says that one event actually causes the other.” As Amplitude (Madhavan, 2019) puts it in a different context, “[Y]ou might think you know which specific key activation event results in long-term user retention, but without rigorous testing you run the risk of basing important product decisions on the wrong user behavior” (emphasis added).

In the Harvard Medical School example, Budson (2021) said that sleeping well would reduce your risk of dementia and death. But that causal relationship has not been established; indeed, a causal relationship is often very hard to establish. According to Chambliss and Schutt (2013, p. 104),

Five criteria should be considered in trying to establish a causal relationship. The first three criteria are generally considered as requirements for identifying a causal effect: (1) empirical association, (2) temporal priority of the independent variable, and (3) nonspuriousness. You must establish these three to claim a causal relationship. Evidence that meets the other two criteria—(4) identifying a causal mechanism, and (5) specifying the context in which the effect occurs—can considerably strengthen causal explanations.

As Chambliss and Schutt explain, “empirical association” or “correlation” means the two fluctuate together. The shorts-and-ice-cream example might pass this test, whereas a claim that chewing gum is associated with changes in the wind’s direction would be nonsense: there would be (as far as we know) no correlation whatsoever.

Second, Chambliss and Schutt say, “temporal priority” means that the alleged cause comes before the alleged effect. The ice cream example would have a hard time with this test; ice cream and shorts seem to happen at roughly the same time. It would be pretty difficult to prove that a candle causes a match to catch fire; in human experience, it actually happens the other way around.

Third, “nonspuriousness” means you can’t have a spurious (i.e., false) association. This is where the ice cream example dies. Wearing shorts and eating ice cream are both caused by (or at least associated with) the arrival of warm weather. Take away shorts, keep the warm weather, and you still have people eating ice cream.

In the Harvard Medical School article, the problem was that the underlying researchers were not claiming to know that going short on sleep increases one’s risk of dementia. They did not make that claim because the connection between sleep and dementia may be spurious. The one may not influence the other at all; both may instead be due to a third (often called a confounding) factor, or a combination of multiple factors. For instance, it could be that, in people with dementia, both insufficient sleep and dementia are caused by a combination of genetics and a particular type of impact to the head at some point in life – or maybe by exposure to a certain molecular compound in infancy, found perhaps in dust, or in some kinds of pollen.

The other post’s exploration of links between insufficient sleep and Alzheimer’s noted another unfortunate tendency in the research literature. As SBU (2020) says, “risk factors” are often treated as if they were causal, when they may not be (see Wikipedia). For an example of problematic usage, Li et al. (2022) assert that “Nighttime sleep disturbances are known risk factors for developing cognitive impairment or dementia.” In that remark, Li et al. seem to be claiming a degree of certainty (i.e., these are “known” risk factors) that exceeds what the underlying research supports. All three of their cited studies speak clearly of mere “associations.”

It is not sufficient to find that an association is “strong”; that could merely mean that the underlying cause (e.g., mild brain injury in childhood) almost always yields both dementia and (to cite one such “risk factor”) sleep fragmentation. Indeed, the causal direction could be the opposite of what such writers believe: the underlying cause could trigger early brain changes that begin the progression toward dementia; that progression could then cause sleep fragmentation. Such a hypothesis would be consistent with the impression that Alzheimer’s seems to aggravate sleep problems.

Gianicolo et al. (2020) find roots of causation – and problems for medical researchers – in Hume:

According to the eighteenth-century philosopher David Hume, causality is present when two conditions are satisfied: 1) B always follows A—in which case, A is called a “sufficient cause” of B; 2) if A does not occur, then B does not occur—in which case, A is called a “necessary cause” of B. …

In many scientific disciplines, causality must be demonstrated by an experiment. In clinical medical research, this purpose is achieved with a randomized controlled trial (RCT). An RCT, however, often cannot be conducted for either ethical or practical reasons. If a risk factor such as exposure to diesel emissions is to be studied, persons cannot be randomly allocated to exposure or non-exposure. Nor is any randomization possible if the research question is whether or not an accident associated with an exposure, such as the Chernobyl nuclear reactor disaster, increased the frequency of illness or death.

Gianicolo et al. (2020) seem to think that medical practitioners are practicing a form of science that must assume causation when nothing more than association has been established. An assumption of causation may be relatively easy when the association has been well-studied, as in their example involving cigarette smoking and lung cancer. But most medical associations have much less scientific support. As I have learned from my own exposure to sometimes ill-advised tests and treatments, medicine today is often a mélange of the semi-known, the unknown, and the misunderstood – not to mention the unaffordable.

Strictly speaking, even an RCT that may seem to establish causation remains vulnerable in principle to further questioning, as indicated by the University of California’s Museum of Paleontology (n.d.; see also InfluentialPoints, n.d.):

The concept of proof — real, absolute proof — is not particularly scientific. Science is based on the principle that any idea, no matter how widely accepted today, could be overturned tomorrow if the evidence warranted it. Science accepts or rejects ideas based on the evidence; it does not prove or disprove them.

That view may be consistent with Karl Popper’s philosophy of science. McLaughlin (2006) summarizes Popper as contending that “a knowledge claim is scientific … not when it is true or proven, but when it [survives] systematic attempts to falsify it.” This is widely understood as an obligation to “rule out” as many competing explanations as possible. It’s not that we know the truth; it’s that at least we are not spouting demonstrable falsehood.

In law – to cite another example of a practical field that tries to apply rules to ascertain truth – Popper’s “systematic attempts to falsify” are generally presented by the opposing side. The Paulson & Nace law firm (n.d.) illustrates this in the context of medical malpractice litigation, where they say causation is proved when the plaintiff shows that a doctor-patient relationship existed, the standard of care was breached, the negligent care directly caused the harm, and demonstrable damages occurred. That’s legal causation, not proof of the truth. There could be persuasive evidence out there somewhere, not discovered or presented effectively by the defendant, that the harm was actually caused by something other than what the plaintiff alleges.

In fields like epidemiology, where RCTs are typically infeasible, researchers often look to the classic Bradford Hill (1965) criteria for identifying causality. Others (e.g., Fedak et al., 2015; Lucas & McMichael, 2005) have sought to refine those criteria in light of contemporary research practice. The Bradford Hill criteria, listed by Wikipedia, may be characterized as follows:

  1. Strength (effect size): the larger the association, the more likely that it is causal.
  2. Consistency (reproducibility): causation is more likely when findings are consistent across multiple studies, in different places with different samples.
  3. Specificity: causation is more likely if there is a very specific population at a specific site and outcome with no other likely explanation.
  4. Temporality: effect follows cause, and also follows any expected delay between the cause and the expected effect.
  5. Biological gradient (dose-response relationship): greater exposure to the alleged cause will usually yield greater incidence of the effect.
  6. Plausibility: to the extent feasible within existing knowledge, it helps if the alleged cause-effect relationship makes sense.
  7. Coherence: causation is more likely if epidemiological and laboratory findings agree.
  8. Experiment: supportive experimental evidence is a plus.
  9. Analogy: similarities between the observed association and any other associations may help to support causation.

As such criteria suggest, Popperian attempts to falsify a truth claim can generate many reasons to doubt causation. Returning to the example of the claim by Li et al. (2022) (i.e., that “Nighttime sleep disturbances are known risk factors for developing cognitive impairment or dementia”), there is a problem with the Specificity criterion, insofar as it appears that none of the underlying studies sought to rule out caffeine as a “likely explanation” – as, that is, a potential cause of fragmented sleep and, over the longer term, of dementia. That oversight is remarkable in, for instance, the study by Lim et al. (2013), which explicitly controlled for “the use of common medications which can affect sleep.” In the Lim study, there is also a problem with the Strength criterion: some study participants with little to no sleep fragmentation still wound up with dementia. There may be other problems as well, in that and/or in the other studies that Li et al. (2022) relied upon.

Readers of scientific literature may encounter terms like “association,” “correlation,” and “risk factor” on a daily basis. Over time, it can become difficult to remember that research finding links between two phenomena may not establish at all that one caused the other. The scientific knowledge underlying most science-oriented fields of endeavor is a hodgepodge of findings and ideas that vary greatly in the extent to which they have survived thorough testing.

State-of-the-art research can be interesting and even exciting – and it can also be completely mistaken, though it may take years to demonstrate that. The urge to interpret correlations as causation can seem natural – and it can kill people. The pursuit of truth calls, not only for the thrill of new insights, but also for patience and caution developed through experience with false hopes of the past. So, no, correlation is definitely not causation.