Fisher’s Exact Test | Lady Tasting Tea
In this video, I want to talk about Ronald Fisher and Fisher's exact test, and specifically about the lady tasting tea example. This is a picture of a young Ronald Fisher. He was a famous British statistician in the '20s and '30s. In 1935, he published a book called The Design of Experiments, and in the second chapter he describes the design of an experiment that would later be known as the lady tasting tea example. The test he proposed is known as Fisher's exact test because the p-value is calculated exactly from the probabilities of every possible outcome, rather than from an approximation. The example in the book went like this: there was a lady in London who said she could taste the difference between tea where the tea was poured first and then the milk, and tea where the milk was poured first and then the tea. I guess this was a very British argument at the time, and a controversial one. In the second chapter of his book, Fisher discusses how you could design an experiment to test whether her claim was true. Very famously, that chapter also discusses the idea of a null hypothesis. It may have been discussed a little in earlier papers, but this chapter is often cited as the first introduction of the concept, and it was probably the first time it was discussed in a book. The design Fisher came up with was to ask the lady to drink 8 cups of tea. She would know that 4 cups had milk poured in first and 4 cups had tea poured in first. She would drink the 8 cups and then tell the researchers which cups she thought had milk poured in first; I think she was asked to pick the four with milk. One of the first things to note is that this is a small sample, because they didn't want to have the lady drinking 30 or more cups of tea. As you'll see, that makes it not a very powerful test at all.
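To make the design concrete, here is a small brute-force sketch in Python. It is my own illustration, not something from Fisher's book: it assumes, hypothetically, that cups 0 to 3 are the milk-first cups, lists every way of picking 4 cups out of 8, and tallies how many truly milk-first cups each pick contains.

```python
from itertools import combinations

CUPS = range(8)                            # cups labelled 0..7
MILK_FIRST = frozenset({0, 1, 2, 3})       # hypothetical: which cups truly had milk first


def count_by_correct(cups=CUPS, milk_first=MILK_FIRST, picks=4):
    """Enumerate every way to pick `picks` cups and tally how many of the
    picked cups really had milk poured first."""
    counts = {}
    for choice in combinations(cups, picks):
        correct = len(milk_first.intersection(choice))
        counts[correct] = counts.get(correct, 0) + 1
    return counts


counts = count_by_correct()
print(sum(counts.values()))          # 70 possible selections
print(dict(sorted(counts.items())))  # {0: 1, 1: 16, 2: 36, 3: 16, 4: 1}
```

Which cups are labelled milk-first doesn't matter; any labelling gives the same 1, 16, 36, 16, 1 split of the 70 possible selections.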
The results of this test can be written out in a two-by-two contingency table, where you're testing whether there's any relationship between what was actually poured first, milk or tea, and what she guessed was poured first, milk or tea. If she guessed everything correctly, the table would look like this: she guessed milk first for all 4 of the cups where milk really was poured first, and tea first for all 4 of the cups where tea really was poured first, so the two off-diagonal cells are 0. The total of all the trials, 8, goes in this corner box. Fisher's null hypothesis was that there was no association between her guesses and what was really poured first. This can also be stated as the odds ratio being equal to one. Before running the experiment, he said he would reject this hypothesis if the p-value was less than 0.05. In other words, under the null he assumes her guesses are random, and he uses that model of random guessing to calculate the probability of getting a result at least as good as the one observed. If that probability is very small, 5% or less, then he rejects the assumption that her guesses and the true pouring order are independent. Because this is an exact test, he could write down a formula that gives this probability exactly. He only needed to ask her to pick which 4 cups had milk poured in first, because that implies the other 4 had tea poured in first, and she knew in advance that of the 8 cups, half had milk and half had tea poured in first. The total number of possible selections is 8 choose 4, which equals 70. From there, each probability is just the number of combinations that give a particular number of correct picks, divided by 70. He calculated the probability of getting zero, one, two, three, and four out of four correct.
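The same counts come out of a closed-form formula. Under the null hypothesis, the number of correct picks follows a hypergeometric distribution: with 4 milk-first cups, 4 tea-first cups, and 4 picks, P(k correct) = C(4, k) C(4, 4 - k) / C(8, 4). A minimal Python sketch follows; it uses exact fractions so the results read like "1/70", and `prob_correct` is just a name chosen for this illustration.

```python
from fractions import Fraction
from math import comb


def prob_correct(k, milk=4, tea=4, picks=4):
    """Hypergeometric probability that exactly k of the picked cups are
    milk-first, assuming the picks are made at random (the null hypothesis)."""
    return Fraction(comb(milk, k) * comb(tea, picks - k), comb(milk + tea, picks))


print(comb(8, 4))  # 70
for k in range(5):
    p = prob_correct(k)
    print(f"{k} correct: {p} (about {float(p):.3f})")
```

The five probabilities are 1/70, 16/70, 36/70, 16/70, and 1/70 (printed in lowest terms), and they add up to exactly 1.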
These are all calculated without replacement, because a cup can't be picked twice. Borrowing this table from Wikipedia, you can see there's only one way of getting all four milk cups correct, which also means getting all the tea cups correct. The probability of that is one out of 70. In the chapter, Fisher never actually discusses performing this experiment on the woman, although it was based on a true story. I've read that someone did later test the woman and she got four out of four correct. With that result, the p-value would be 1/70, which is approximately 0.014, less than 0.05. According to this p-value and the rule set before running the test, you would reject the null hypothesis of no association, and you could reject the idea that the woman was just guessing milk or tea at random. That's a small p-value, well below the threshold. But the test is conservative: with only 70 equally likely selections, the achievable p-values jump from 1/70 straight to 17/70, so the actual chance of a false rejection is 1/70 rather than the nominal 5%. It's also not a very powerful test, because the sample is so small. If the woman had guessed three out of four correctly, that sounds pretty good, but there are 16 different ways of getting exactly three out of four correct, so the probability of doing at least that well by guessing is (16 + 1)/70 = 17/70, about 0.24, which is way over 0.05. If the woman got anything less than a perfect score, we wouldn't have been able to reject the null hypothesis. Toward the end of the chapter, Fisher discusses how this test is not perfect, how the design would be better if she drank more cups of tea, and other ways you could improve it, but it was still a very influential design that is still used today.
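Here is a short sketch of the p-value calculation. It assumes, as in this example, a one-sided test where only doing better than chance counts as evidence, so the p-value is the probability of getting at least the observed number of milk cups right by pure guessing. It prints the p-value for every possible score, which shows why only a perfect score clears 0.05.

```python
from fractions import Fraction
from math import comb

TOTAL = comb(8, 4)  # 70 equally likely selections under the null


def p_value(correct):
    """One-sided p-value: probability of getting at least `correct` of the
    4 milk-first cups right by random guessing."""
    ways = sum(comb(4, k) * comb(4, 4 - k) for k in range(correct, 5))
    return Fraction(ways, TOTAL)


for correct in range(4, -1, -1):
    p = p_value(correct)
    print(f"{correct}/4 correct: p = {p} (about {float(p):.3f}), reject at 0.05? {p < 0.05}")
```

A perfect score gives p = 1/70 (about 0.014), while three out of four already gives 17/70 (about 0.243). If SciPy is installed, `scipy.stats.fisher_exact([[3, 1], [1, 3]], alternative="greater")` should report the same p-value for the three-out-of-four table, since it computes the same hypergeometric tail.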
There's another test, Barnard's exact test, which is also used for two-by-two contingency tables and is considered more powerful than this one. Fisher's test is still widely used, though, and it can be run against different alternative hypotheses, one-sided or two-sided, not just the one-sided version in this example. But I always liked this odd example of the lady tasting tea, so I thought I would make a video of it. You can also extend the idea to a contingency table of any size, but I think the only size where hand calculations are easy is a two-by-two table. That's what it's usually known for: a test of independence for two-by-two contingency tables. If you have a statistical package, though, you can easily compute it for larger contingency tables.
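For a general two-by-two table, the hand calculation can be sketched as a single function. This is my own minimal version, not a reference implementation: `fisher_exact_2x2` is a hypothetical name, it assumes fixed row and column totals as in Fisher's design, and for the two-sided case it uses the common convention of adding up the probability of every table that is no more likely than the observed one.

```python
from fractions import Fraction
from math import comb


def fisher_exact_2x2(table, alternative="two-sided"):
    """Exact p-value for a 2x2 table [[a, b], [c, d]] with fixed margins.

    Under the null hypothesis, the top-left cell follows a hypergeometric
    distribution. 'greater' tests for a positive association (large a),
    'less' for a negative one, and 'two-sided' sums every table that is
    no more likely than the observed one."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)  # possible values of a

    def prob(x):
        # P(top-left cell == x) given the row and column totals
        return Fraction(comb(col1, x) * comb(n - col1, row1 - x), comb(n, row1))

    if alternative == "greater":
        return sum(prob(x) for x in range(a, hi + 1))
    if alternative == "less":
        return sum(prob(x) for x in range(lo, a + 1))
    p_obs = prob(a)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs)
```

For the perfect tea score, `fisher_exact_2x2([[4, 0], [0, 4]], "greater")` gives 1/70, and the two-sided version gives 2/70 = 1/35. For real analyses a statistical package is the safer choice; SciPy, for instance, provides `scipy.stats.fisher_exact` and `scipy.stats.barnard_exact`.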