Fooled by statistical significance

Don’t let poets lie to you

Behold the world’s shortest lecture on #statistics and everything that is wrong with how people approach it:

42.

Or, rather: p = 0.042

Screenshot from thesaurus.com. My other thesaurus is terrible, terrible, and also terrible.

Contrary to popular belief, the term “statistically significant” does not mean that something important, momentous, or convincing took place. If you think that we’re using the word significant here in a way that would make your thesaurus proud, you’re falling victim to a cunning bit of sleight of hand. Don’t let poets lie to you.

“You shouldn’t let poets lie to you.” — Björk

For those who prefer to keep their exposure to statistical nitty gritty to a minimum, here’s all you need to know about the term statistically significant:

  • It doesn’t mean that anything significant has happened.
  • It doesn’t mean the results are “big” or noteworthy.
  • It doesn’t mean you will find the data interesting.
  • It means that someone is claiming to be surprised by something.
  • It doesn’t tell you anything useful if you don’t know much about the someone and the something in question.

To everyone other than the decision-maker in question, statistically significant results are rarely significant in the sense of “important” — they’re occasionally great for raising interesting questions, but often they’re irrelevant.


Be on extra high alert when non-experts use this term, especially when it is accompanied by breathless exuberance. Sometimes especially cheeky charlatans go one step further and drop the “statistically” bit, tapping into the full power of the poetry. “Hey look,” they tell you, “what we’re talking about is SIGNIFICANT in the eyes of the universe.”

No, it isn’t.

The worst offenders are those who pronounce “statistically significant” as if it were a synonym for “definite” or “certain” or “flawless knowledge” — the irony couldn’t be thicker. The term comes from a field that deals with uncertainty and thus (by definition!) only belongs in settings where our knowledge is not flawless.

For those who prefer to fight jargon with jargon, I’ll help myself to more formal language in the next section. Feel free to nope out of that bit, but if you’re simultaneously curious and new around here, take a little detour to zoom through all the biggest ideas in statistics in just 8 minutes:

Most of the links in my articles take you to blog posts where I’ve given you a deeper overview of highlighted topics, so you can also use this article as a launchpad for a Choose Your Own Adventure minicourse on data science.

“Statistical significance” merely means that a p-value* was low enough to change a decision-maker’s mind. In other words, it’s a term we use to indicate that a null hypothesis was rejected.** What was the null hypothesis, though? And how strict was the test? ¯\_(ツ)_/¯

Welcome to statistics, where The Answer is p = 0.042 but you don’t know what the question was.
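For readers who like to see the machinery, here’s a tiny sketch of where a p-value comes from. The scenario (a coin-flip test with 61 heads in 100 flips) is my own toy example, not one from this article:

```python
# Toy sketch (my own example, not from the article): a one-sided test of
# the null hypothesis "this coin is fair", using only the standard library.
import math

def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Suppose we saw 61 heads in 100 flips. How surprising is that if the coin is fair?
p_value = binom_tail(61, 100)

# "Statistically significant" only means p fell below a threshold (alpha)
# that the decision-maker picked *before* looking at the data.
alpha = 0.05
print(f"p = {p_value:.4f}; reject the null at alpha = {alpha}? {p_value < alpha}")
```

Notice how much setup the number depends on: which null hypothesis, which direction of surprise, and which alpha. Strip that context away and the p-value alone tells you almost nothing.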

Technically, the decision-maker who set up the conditions of the hypothesis test is the only person for whom that test’s results can be statistically significant.

Statistics gives you a set of tools for decision-making, but how you use them is up to you — it’ll be as individual as any other decision.


The process involves phrasing your decision question very carefully, picking the assumptions you’re willing to live with, making some risk tradeoffs about the different ways your answer might be wrong*** (because randomness is a jerk), and then using mathematics to get a risk-controlled answer to your particular question.

There’s something perverse and comical in its popularity as a prop for rhetorical bullying.

That’s why real experts would never use statistics like a hammer for beating Truth into one’s enemies. Two decision-makers can use the same tools on the same data and come to two different — and completely valid — conclusions… which makes the term’s popularity as a prop for rhetorical bullying both perverse and comical.

Statistical significance is personal. Just because I am surprised enough by the data to change my mind doesn’t mean you should be.

As soon as I understood how statistics works, I couldn’t help but marvel at how remarkably arrogant — almost rude — it is to declare something to be statistically significant in the presence of people who aren’t fluent in the limitations of statistical decision-making. The term sounds much too universal for anyone’s good; it plays like a “shut up and trust me because my methods are fancy” rhetorical device. I hope you’ll join me in giving that brand of rhetoric the “pffft” it deserves.

Hang on, is there nothing at all we can learn from someone else’s statistically significant result?

Here’s where it gets somewhat philosophical, so I’ll need a separate article for my take on that question:

In a nutshell, my advice is that it’s fine to delegate some of your decision-making to other people as long as you trust them to be competent and have your best interests at heart. When they’re convinced, you’ll borrow their opinion so you don’t have to redo all their work yourself.

By using someone else’s statistical conclusions, you’re not basing your decision on data but rather on your trust in an individual human being.

Just be aware that by using someone else’s results, you’re not basing your decision on data but rather on your trust in an individual human being. There’s no problem with choosing to trust others so you don’t need to build your whole worldview empirically from scratch — knowledge sharing is part of what makes the human species so successful — but it’s worth being aware that you might be a few rounds of broken telephone downstream of whatever “knowledge” you think you’re tuning into.

If you let someone step up to make decisions on your behalf — that’s what it means to consume someone else’s p-value and conclusions for decision-making — then be sure it’s someone you consider sufficiently competent and trustworthy.

What if the person shoveling statistical jargon at you is someone you don’t trust? Run for the hills!

Whenever there’s a whiff of persuasion clinging to declarations of statistical significance, be extra cautious of whatever wares the utterer is peddling. If you trust the person you’re talking to, you don’t need their appeals to statistical significance. All you need to know is that they’re convinced. If you don’t trust them, you can’t trust their stats jargon any more than you’d trust their jazz hands.

What good is an answer if you haven’t bothered to understand what the question was?

If there’s one thing I’d like you to take away from this blog post, it’s this: If you don’t know much about the decision-maker and how they set about figuring out whether they should change their minds (and precisely about what), then their claims related to statistical significance are utterly meaningless to you. What good is an answer if you haven’t bothered to understand what the question was?

If you had fun here and you’re looking for an applied AI course designed to be fun for beginners and experts alike, here’s one I made for your amusement:

Enjoy the course playlist broken up into 120 separate bite-sized lesson videos here: bit.ly/machinefriend

Let’s be friends! You can find me on Twitter, YouTube, Substack, and LinkedIn. Interested in having me speak at your event? Use this form to get in touch.

Here are some of my favorite 10-minute walkthroughs:

*If you’re keen to learn what a p-value is, here’s a video I made to help you out:

This is the first video on my YouTube playlist, which you can find at http://bit.ly/quaesita_p1

**For an explanation of hypothesis testing, head over to my blog post on the topic or check out this pair of videos:

Republished from Towards Data Science: https://towardsdatascience.com/fooled-by-statistical-significance-7fed1bc2caf9

