Lady Tasting Tea

Jan 8, 2021

Fisher’s exact test: Lady Tasting Tea

The story begins at a tea party attended by Sir Ronald Aylmer Fisher, where a woman named Muriel Bristol claimed she could tell whether a cup of tea had been prepared with the milk poured first or with the milk added after the tea.

Fisher designed an experiment in which the lady was presented with 8 cups of tea in random order: 4 prepared with milk first and 4 with tea first.
She then tasted each cup and reported which four she thought had milk added first.

Now the question Fisher asked is:

“How do we test whether she is really good at this or just guessing?”

Fisher introduced the idea of a null hypothesis. In the lady tasting tea experiment, the null hypothesis was that the lady could not really tell the difference between the teas and was just guessing.

Now, the idea of hypothesis testing is to attempt to reject the null hypothesis or, more accurately, to see how much evidence the data collected in the experiment provide against the null hypothesis.

The test statistic is a simple count of successes: how many of the 4 cups the lady selects really are milk-first cups. The distribution of the possible numbers of successes, assuming the null hypothesis is true, can be computed by counting combinations.
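To make the counting argument concrete, here is a minimal Python sketch of that null distribution. It assumes the design described above (8 cups, 4 truly milk-first, 4 selected by the lady); the function name `null_distribution` and its parameters are illustrative and not part of the original code.

```python
from math import comb

N_CUPS = 8        # total number of cups
N_MILK_FIRST = 4  # cups that truly had milk first; the lady also selects 4

def null_distribution(n_cups=N_CUPS, n_true=N_MILK_FIRST):
    """Return {k: P(exactly k correct selections)} under pure guessing."""
    n_picked = n_true                    # she is told to pick as many as there are
    total = comb(n_cups, n_picked)       # all equally likely selections (70 here)
    return {
        # choose k of the true milk-first cups and the rest from the other cups
        k: comb(n_true, k) * comb(n_cups - n_true, n_picked - k) / total
        for k in range(n_picked + 1)
    }

dist = null_distribution()
for k, p in dist.items():
    print(f"{k} correct: {p:.4f}")
```

The counts behind these probabilities are 1, 16, 36, 16 and 1 out of 70, so a perfect selection by guessing alone has probability 1/70.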

The number of ways to select exactly 4 out of 8 cups is:

$$\binom{8}{4} = \frac{8!}{4!\,(8-4)!} = 70$$

R Implementation

Define cups:

cups = c(0, 1, 2, 3, 4, 5, 6, 7)

Number of options to select exactly 4 out of 8 cups:

# Selecting exactly 4 out of 8: (4 of 8) = 8! / (4! * (8-4)!)
pnum = (8*7*6*5)/(4*3*2*1) # 70.0

The probability of correctly selecting all 4 cups by chance:

# The probability to select exactly the correct 4 out of 8 cups
prob = (1/70)*100
cat("The probability to select exactly the correct 4 out of 8 cups = ", prob, "%\n")

All combinations:

# Each row of poss is one of the 70 possible selections of 4 cups
poss = t(combn(8,4))
cat("Length = ", nrow(poss), "\n")
df = data.frame(poss[1:70, 1:4])
poss2 = paste(df$X1, df$X2, df$X3, df$X4, sep=" , ")
df$new = poss2

Random Selection:

library(stringr)  # provides str_detect

listy = list()
rc_list = c()
success_list = c()
for (i in 1:length(df$new)) {

    # Random choice from all combinations
    rc <- sample(df$new, 1)
    rc_list <- c(rc_list, rc)

    # Current combination in the iteration
    ii = df$new[i]

    rc2 = strsplit(rc, ' , ')[[1]]
    ii2 = strsplit(ii, ' , ')[[1]]

    cat("Random Choice = ", rc, " | Length = ", length(rc2), "\n")
    cat("Sample = ", ii, " | Length = ", length(ii2), "\n")

    # TRUE for every cup of the sample that also appears in the random choice
    hits <- str_detect(rc, ii2)
    listy <- c(listy, list(hits))
    cat("Success = ", sum(hits), "\n")
    success_list <- c(success_list, sum(hits))
    cat("\n===========================================================\n\n")
}

df$random_choice <- rc_list
df$counter <- listy
df$success_list <- success_list

Distribution:

library(ggplot2)
count_table = as.data.frame(table(df$success_list))
ggplot(count_table) + geom_bar(aes(x=Var1, y=Freq), stat="identity") +
  ggtitle("Distribution")
count_table

Python Implementation

Define cups:

cups = [0, 1, 2, 3, 4, 5, 6, 7]

Number of options to select exactly 4 out of 8 cups:

pnum = (8*7*6*5)/(4*3*2*1)
print(pnum) # 70.0

The probability of correctly selecting all 4 cups by chance:

prob = (1/70) # 0.014
print("The probability to select correctly 4 out of 8 cups = ", prob)

All combinations:

import itertools

poss = list(itertools.combinations(cups,4))
print("Length = ", len(poss))
print(poss)

Random Selection:

import random

dist_vals = []
for i in poss:
    
    # Random Choice from combinations
    rc = set(random.choice(poss))
    
    # Iterate over sample from combinations
    ii = set(i)
    
    print("Random Choice = ", rc, " | Length = ", len(rc))
    print("\nSample = ", ii, " | Length = ", len(ii))
    print("\nSuccess = ", rc&ii, " | Length = ", len(rc&ii))
    dist_vals.append(len(rc&ii))
    print("\n\n==========================================\n")

Distribution:

from collections import Counter

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

plt.figure(figsize=(20,10))
sns.countplot(x=dist_vals)
plt.xlabel("Value", size = 20)
plt.ylabel("Frequency", size = 20)
plt.title("Distribution", size = 28)
plt.show()

# Tabulate how often each number of successes occurred
counts = Counter(dist_vals)
df_vals = pd.DataFrame({"Key": list(counts.keys()), "Value": list(counts.values())})
df_vals["Percentage"] = df_vals["Value"] / df_vals["Value"].sum()
df_vals.sort_values("Key")
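The loop above draws only 70 random guesses, so the resulting frequencies are fairly noisy. As a rough sketch, a larger simulation can be compared against the exact probabilities obtained by counting combinations. The function name `simulate`, the seed and the number of trials are illustrative choices, not part of the original code, and the true milk-first cups are fixed to `{0, 1, 2, 3}` without loss of generality.

```python
import random
from collections import Counter
from math import comb

def simulate(n_trials=100_000, seed=0):
    """Estimate the distribution of correct picks when the lady guesses at random."""
    rng = random.Random(seed)   # seeded so the run is reproducible
    truth = {0, 1, 2, 3}        # assumed milk-first cups
    counts = Counter()
    for _ in range(n_trials):
        guess = set(rng.sample(range(8), 4))   # 4 distinct cups picked at random
        counts[len(truth & guess)] += 1        # number of correct picks
    return {k: counts[k] / n_trials for k in range(5)}

# Exact probabilities from counting combinations
exact = {k: comb(4, k) * comb(4, 4 - k) / comb(8, 4) for k in range(5)}
simulated = simulate()

for k in range(5):
    print(f"{k} correct: simulated = {simulated[k]:.4f}, exact = {exact[k]:.4f}")
```

With this many trials the simulated proportions should land close to the exact values, which is a quick sanity check on the counting argument.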

Summary

Fisher's exact test is useful for categorical data that result from classifying objects in two different ways. It is used to examine the significance of the association (contingency) between the two kinds of classification.

So in Fisher’s original example, one criterion of classification could be whether milk or tea was put in the cup first; the other could be whether the lady thinks that milk or tea was put in first.
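As an illustration of how the two classifications form a 2x2 contingency table, the sketch below computes a one-sided exact p-value using only the standard library. The table layout (rows for the true preparation, columns for the lady's answer) and the function names `table_probability` and `one_sided_p_value` are assumptions made for this example, and the two outcomes evaluated are hypothetical.

```python
from math import comb

def table_probability(a, b, c, d):
    """Hypergeometric probability of the table [[a, b], [c, d]] with fixed margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def one_sided_p_value(a, b, c, d):
    """Probability of a top-left cell at least as large as a, keeping margins fixed."""
    p = 0.0
    while b >= 0 and c >= 0:
        p += table_probability(a, b, c, d)
        # Move to the next more extreme table with the same row and column totals
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return p

# Rows: truly milk first / truly tea first
# Columns: lady says milk first / lady says tea first
p_all_correct = one_sided_p_value(4, 0, 0, 4)    # every cup classified correctly
p_three_correct = one_sided_p_value(3, 1, 1, 3)  # 3 of the 4 milk-first cups found

print(f"All 4 correct:      p = {p_all_correct:.4f}")
print(f"At least 3 correct: p = {p_three_correct:.4f}")
```

A perfect classification gives p = 1/70 (about 0.014), while getting at least 3 right gives p = 17/70 (about 0.243), so only a perfect score would be convincing evidence against guessing in an experiment this small.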