Finding Relationship Between Variables – ANOVA (Part 2)

Post-Hoc Analysis of ANOVA Test in Python

This article is a continuation of my earlier post, Finding Relationship Between Variables – ANOVA Test. In that post, I explained the basics of the ANOVA test and when and how to implement it in Python. We took the Auto MPG dataset and tried to find out whether there is any relationship between the number of cylinders of a car and its mileage. After performing the ANOVA test, we rejected the null hypothesis and concluded that there is a difference in the group means of the categorical variable, number of cylinders. But we did not identify which group means are statistically different from each other. We can find this out by performing a post-hoc analysis of the ANOVA test, which is the agenda of this article. It is advisable to go through the earlier post first to better understand this one.

The term post-hoc literally means "after this": a post-hoc test performs paired comparisons of the group means after the overall ANOVA. But did you ever wonder why we don't simply conduct an ANOVA test on two groups at a time by subsetting the dataset? This is a legitimate thought, and it would also tell us whether two groups are statistically different. So why don't we do that? Because of inflating type I error. Whenever we conduct a hypothesis test we risk committing either a type I or a type II error. A type I error is incorrectly rejecting the null hypothesis when it is actually true, which happens 5% of the time at a 5% significance level (95% confidence level). A type II error is failing to reject the null hypothesis when it is actually false.

Now, if we have 5 different groups in the predictor categorical variable, we will need to conduct 5C2 = 10 pairwise tests, taking two groups at a time. The overall type I error rate that we accumulate after conducting these 10 tests is known as the family-wise error rate (FWER). The FWER is given by the following formula:

\alpha_{FWER} = 1 - (1 - \alpha)^{n}


n = number of paired comparisons

α = per-comparison type I error rate (0.05 at the 5% significance level)

The following table shows the family-wise error rate (the inflated type I error rate) for different numbers of paired comparisons, computed from the formula above:

Number of paired comparisons    Family-wise error rate
1                               0.050
3                               0.143
5                               0.226
10                              0.401
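For concreteness, here is a minimal sketch that evaluates this formula in Python (the function name is my own, introduced just for illustration):

```python
# Family-wise error rate: probability of at least one false positive
# across n independent comparisons, each run at significance level alpha.
def family_wise_error_rate(n, alpha=0.05):
    return 1 - (1 - alpha) ** n

for n in (1, 3, 5, 10):
    print(n, round(family_wise_error_rate(n), 3))
```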

From the above table we can see that in our case (10 comparisons), the family-wise error rate will be approximately 0.40 (40%). This means we would incorrectly reject the null hypothesis about 40% of the time, which is simply not acceptable. To solve this problem, we perform post-hoc paired comparisons of the group means while protecting against the family-wise error rate (the inflation of type I errors). There are numerous post-hoc tests at our disposal. The following is a list of a few:

  • Tukey’s Honestly Significant Difference
  • Fisher’s Least Significant Difference
  • Bonferroni Procedure
  • Sidak Test
  • Holm T-Test
  • Scheffe Test
  • Newman-Keuls Method
  • Dunnett’s Multiple Comparison
  • Duncan Multiple Range
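To give a feel for how such procedures control the FWER, the Bonferroni procedure listed above simply tests each pairwise p-value against α divided by the number of comparisons. A minimal pure-Python sketch, using hypothetical p-values:

```python
# Bonferroni procedure: test each of the n pairwise p-values against
# alpha / n instead of alpha, which caps the family-wise error rate at alpha.
def bonferroni_reject(p_values, alpha=0.05):
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

# hypothetical p-values from 10 pairwise comparisons
p_values = [0.001, 0.004, 0.012, 0.03, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9]
print(bonferroni_reject(p_values))  # only p-values below 0.05/10 = 0.005 survive
```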

Now that you know a number of post-hoc tests, you might be wondering how these tests control the family-wise error rate and which one to choose. Each of the tests above has its own assumptions, advantages, and disadvantages. Research shows that Tukey's HSD test is more versatile than the other tests under most circumstances, and it is also readily available in most statistical packages. So, in this article, we will use Tukey's HSD test for the post-hoc analysis of the ANOVA test we performed. Let's now perform the post-hoc analysis in Python:

import statsmodels.stats.multicomp as multi

# data is the Auto MPG DataFrame from the previous post
test = multi.MultiComparison(data['mpg'], data['cylinders'])
res = test.tukeyhsd()
print(res.summary())

[Table: Tukey HSD model summary]

As you can see from the above result, the test holds the FWER at 0.05 instead of the inflated 0.40 (shown at the top of the table) and performs post-hoc pairwise comparisons between the groups of the cylinders variable, two groups at a time. Now compare the results with the boxplots. We can see that cars with 4 cylinders have significantly different mileage from cars with 3, 6, and 8 cylinders, and cars with 8 cylinders have significantly different mileage from cars with 4, 5, and 6 cylinders. You can also infer this visually by plotting side-by-side boxplots.
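If you want to try this end-to-end without the Auto MPG file, here is a self-contained sketch using synthetic mileage data (the group means, spreads, and sample sizes are made up purely for illustration):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(42)

# synthetic mileage for three cylinder groups (hypothetical means)
mpg = np.concatenate([
    rng.normal(30, 2, 50),  # 4 cylinders: higher mileage
    rng.normal(20, 2, 50),  # 6 cylinders
    rng.normal(15, 2, 50),  # 8 cylinders: lower mileage
])
cylinders = np.repeat(["4", "6", "8"], 50)

# Tukey's HSD performs all pairwise comparisons while
# keeping the family-wise error rate at alpha
res = pairwise_tukeyhsd(endog=mpg, groups=cylinders, alpha=0.05)
print(res.summary())
print(res.reject)  # which pairwise null hypotheses are rejected
```

With group means this far apart, all three pairwise comparisons come out significant; on real data, only some pairs typically do.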


I hope you find this article interesting. Please let us know your thoughts in the comments section below.
