All Medications Are Insignificant In The Eyes Of God And Traditional Effect Size Criteria
SSRI antidepressants like Prozac were first developed in the 1980s and 1990s. The first round of studies, sponsored by pharmaceutical companies, showed they worked great! But later, skeptics found substantial bias in these early trials; several later analyses that corrected for this all found effect sizes (compared to placebo) of only 0.30. Is an effect size of 0.30 good or bad? The usual answer is “bad”. The UK’s National Institute for Clinical Excellence used to say that treatments were only “clinically relevant” if they had an effect size of 0.50 or more. The US FDA apparently has a rule of thumb that any effect size below 0.50 is “small”. Others are even stricter. Leucht et al investigate when doctors subjectively feel like their patients have gotten better, and find that even effect size 0.50 correlates to doctors saying they see little or no change. Based on this research, Irving Kirsch, author of some of the earliest credible antidepressant effect size estimates, argues that “[the] thresholds suggested by NICE were not empirically based and are presumably too small”, and says that “minimal improvement” should be defined as an effect size of 0.875 or more. No antidepressant consistently attains this. He wrote:
…sparking a decade of news articles like Antidepressants Don’t Work - Official Study and Why Antidepressants Are No Better Than Placebos. Since then everyone has gotten into a lot of fights about this, with inconclusive results. Recently a Danish team affiliated with the pharma company Lundbeck discovered an entirely new way to get into fights about this. I found their paper, Determining maximal achievable effect sizes of antidepressant therapies in placebo-controlled trials, more enlightening than most other writing on this issue. They ask: what if the skeptics’ preferred effect size number is impossible to reach? Consider the typical antidepressant study. You’re probably measuring how depressed people are using a test called HAM-D - on one version, the scale ranges from 0 to 54, anything above 7 is depressed, anything above 24 is severely depressed. Most of your patients probably start out in the high teens to low twenties. You give half of them antidepressant and half placebo for six weeks. By the end of the six weeks, maybe a third of your subjects have dropped out due to side effects or general distractedness. On average, the people left in the placebo group will have a HAM-D score of around 15, and the people left in the experimental group will have some other score depending on how good your medication is. The Danes simulate several different hypothetical medications. The one I find most interesting is a medication that completely cures some fraction of the people who take it. They simulate “completely cures” by giving the patients a distribution of HAM-D scores similar to those of healthy non-depressed people. Here’s what they find: The pictures from A to F are antidepressants that cure 0%, 20%, 40%, 60%, 80%, and 100% of patients respectively. And we’re looking at the ES - effect size - for each. Only D, E, and F pass NICE’s 0.50 threshold. And only F passes Kirsch’s higher 0.875 threshold. So a drug that completely cured 40% of people who took it would be “clinically insignificant” for NICE. And even a drug that completely cured 80% of the people who took it would be clinically insignificant for Kirsch! Clearly this use of “clinically insignificant” doesn’t match our intuitive standards of “meh, doesn’t matter”. We can make this even worse. Suppose that instead of completely curing patients, the drug “only” makes their depression improve a bit - specifically, half again as much as the placebo effect. Here’s the same simulation: Here we find that only E and F meet NICE’s criteria, and nobody meets Kirsch’s criteria! A drug that significantly improves 60% of patients is clinically insignificant for NICE, and even a drug that significantly improves 100% of patients improve is clinically insignificant for Kirsch! What’s gone wrong here? The authors point to three problems. First, most people in depression trials respond very well to the placebo effect. The minimum score on a depression test is zero, and even healthy non-depressed people rarely get zero points exactly. So if most of the placebo group is doing pretty well, there’s not a lot of room for the drug to make people do better than placebo. Second, this improvement in the placebo group is inconsistent; a lot do recover completely, but others don’t recover at all. That means there’s a large standard deviation in the placebo group. Effect size is measured as a percent of standard deviation. If standard deviation is very high, this artificially lowers effect size. Third, many patients (often about 30%) leave the study partway through for side effects or other reasons. These people stop taking the medication. But intention-to-treat analysis leaves them in the final statistics. Since a third of the experimental group isn’t even taking the medication, this artificially lowers the medication’s apparent effect size. I’m not sure about this, but I think NICE and Kirsch were basing their criteria off observations of single patients. That is, in one person, it takes a difference of 0.50 or 0.875 to notice much of a change. But studies face different barriers than single-person observations and aren’t directly comparable. NICE has since walked back on their claim that only effect sizes higher than 0.50 are clinically relevant (although this is part of a broader trend for health institutes not to say things like this, so I don’t want to make too big a deal of it). As far as I know, Kirsch hasn’t. Still, I think that a broader look at medication effect sizes suggests that the Danish team’s effect size laxness is overall right, and the earlier effect size strictness was wrong. Here’s a chart by a team including Leucht, who did some of the original HAM-D research: Some of our favorite medications, including statins, anticholinergics, and bisphosphonates, don’t reach the 0.50 level. And many more, including triptans, benzodiazepines (!), and Ritalin (!!) don’t reach 0.875. This doesn’t even include some of my favorites. Zolpidem (“Ambien”) has effect size around 0.39 for getting you to sleep faster. Ibuprofen (“Advil”, “Motrin”) has effect sizes between from about 0.20 (for surgical pain) to 0.42 (for arthritis). All of these are around the 0.30 effect size of antidepressants. There’s no anti-ibuprofen lobby trying to rile people up about NSAIDs, so nobody’s pointed out that this is “clinically insignificant”. But by traditional standards, it is! Statisticians have tried to put effect sizes in context by saying some effect sizes are “small” or “big” or “relevant” or “miniscule”. I think this is a valiant effort. But it makes things worse as often as it makes them better. Some effect sizes are smaller than we think; others are larger. Consider a claim that the difference between treatment and control groups was “only as big, in terms of effect size, as the average height difference between men and women - just a couple of inches” (I think I saw someone say this once, but I’ve lost the reference thoroughly enough that I’m presenting it as a hypothetical). That drug would be more than four times stronger than Ambien! The difference between study effect sizes, population effect sizes, and individual effect sizes only confuses things further. I would downweight all claims about “this drug has a meaningless effect size” compared to your other sources of evidence, like your clinical experience. You're currently a free subscriber to Astral Codex Ten. For the full experience, upgrade your subscription. |
Key phrases
Older messages
Are Woo Non-Responders Defective?
Tuesday, May 30, 2023
...
Open Thread 278
Monday, May 29, 2023
...
Your Book Review: Lying for Money
Friday, May 26, 2023
Finalist #2 in the ACX Book Review contest
Hypergamy: Much More Than You Wanted To Know
Wednesday, May 24, 2023
...
Mantic Monday 5/22/23
Tuesday, May 23, 2023
Whales v. Minnows // US v. Itself // EPJ v. The Veil Of Time // Balaji v. Medlock
You Might Also Like
Welcome to The Flyover
Sunday, May 5, 2024
Thanks for joining The Flyover! ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
AIPAC is secretly funneling money into a congressional race, sources say. Here are the details.
Saturday, May 4, 2024
AIPAC is not done trying to take down the Squad. AIPAC is secretly funneling money into a congressional race, sources say. Here are the details. If somebody forwarded you this newsletter, you can sign
$33 Billionaire C.Z. Gets 4 Months | ‘Bitcoin Jesus’ Arrested On Tax Charges
Saturday, May 4, 2024
The asset manager's new short-term credit fund is hosted on the Ethereum blockchain. ADVERTISEMENT Forbes START INVESTING • Newsletters • MyForbes Mitchell Martin Senior Editor, Forbes Money &
Sen. Maria Cantwell calls for an ‘AI Bill for education’
Saturday, May 4, 2024
Anthropic's intriguing Seattle billboard | Zebra deepfake overload | Upcoming events ADVERTISEMENT GeekWire SPONSOR MESSAGE: Washington state's second-largest city is the hub of an ambitious
Welcome to The Flyover
Saturday, May 4, 2024
Thanks for joining The Flyover! ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
Everything Worth Buying From Wayfair’s Way Day Sale
Saturday, May 4, 2024
From patio gear to kitchen appliances. The Strategist Every product is independently selected by editors. If you buy something through our links, New York may earn an affiliate commission. Everything
Beethoven's Ninth at 200
Saturday, May 4, 2024
+ 'Pat the Bunny' is a powerful learning tool
Welcome to The Flyover
Saturday, May 4, 2024
Thanks for joining The Flyover! ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
The Veil Is Lifted
Saturday, May 4, 2024
Columns and commentary on news, politics, business, and technology from the Intelligencer team. Intelligencer Weekend Reader Required Reading for Political Compulsives 1. David Pecker and Keith
YOU LOVE TO SEE IT: Happy Math For Public Schools
Saturday, May 4, 2024
Plus, free tax filing pays off, electric vehicles work double time, and pharmaceutical giants face scrutiny for outrageous pricing. YOU LOVE TO SEE IT: Happy Math For Public Schools By Katherine Li • 4