The Blog

The math behind web analytics: box plots

Here goes the fifth in this series. We started with some basic stuff, then we reviewed proper use of averages, trends and standard deviation. We also looked at control limits, histograms and in my last post we explored moving averages.

This week I’m revisiting a topic which was immensely popular when I first published it in January 2007: a solution to draw box plots & whisker plots in Excel 2007.

Get the FREE MetricsAnalysis_110930.xlsx file which recaps everything from the previous articles, including basic stats, an histogram tool and a nifty box plot with automatic outliers detection!

What are box plots/whisker plots?

A box plots are a quick visualization method used in statistics. They show, at a glimpse, several very important elements describing a data sample:

  • the smallest observation
  • the lower quartile (25%)
  • the median (50%)
  • the higher quartile (75%)
  • the largest observation

In addition, a box plot can help visualize two other elements:

  • abnormal data (outliers)
  • average

This can be very handy in web analytics since box plots can easily reveal the limits of acceptable data and any outliers, they can be used to easily convey standard information but also highlight trends and abnormalities.

An example

As with previous articles, you can download the Excel 2007 example of a fully working box plot and whisker plot graph.

Boxplot data

Box plot calculations

Starting with our sample time series, we first calculate a number of useful elements to draw the boxplot: min & max, 1st quarter (q1), median (M) and 3rd quarter (q3), lower & higher control limits (lcl, hcl) and finally, the Inter Quartile Range (iqr).

Visualization

Drawing the box plot itself needs a little magic which would be too long to explain here, but the result is quite interesting.

Box plot, whisker plot in Excel 2007

Control limits revisited

Control limits

Using the box plot approach, we can review the way control limits are set so outliers are NOT influencing them. Basically, instead of just doing +/- of standard deviation, we use the Inter Quartile Range, which is basically the difference between 3rd quarter and 1st quarter of data. That is

iqr = q3 – q1
lcl = MAX(M – iqr * 1.5, min)
hcl = MIN(M + iqr * 1.5, max)

Outliers

The next interesting element is an automated identification of how many outliers there might be – either low or high, and the list of values. Those are also added to the box plot graph (however, this might need some manual tweaking of the graph).

Outliers

Automated outliers detection

Conclusion

As we’ve seen since the begining of this series, there’s a lot more to analysis than simply using the numbers our web analytics tools of choice provides us. Developing our analytical skills, what the data tells us and how we can slice & dice it is a cornerstone of the analyst role – the two others being a thorough understanding of the business and an awareness of the possibilities and constraints of the medium we’re using (the web, social media, etc.).

Coming up!

In June of 2008 I published the first version of a sample dashboard for Excel 2007. It became immensely popular as a learning example and I received lots of feedback from practitioners, independant consultants and agencies telling me they were using it in real contexts – although it was really built as an example. So now that we have some background, we’ll be able to move to the next stage: developing powerful analytics dashboards in Excel!

This entry was posted in Technology, Web Analytics and tagged , , . Bookmark the permalink.
  • Leo

    Thank you so much! This is wonderfully helpful!

  • Meji

    MetricsAnalysis_110930.xlsx  is no longer available….

    • http://twitter.com/SHamelCP Stephane Hamel

      It’s fixed and I have even updated with a newer version.

      • sacred_kau

        Hi Stephane. It appears that the link to the .xlsx file is broken.

        • http://twitter.com/SHamelCP Stephane Hamel

          Oups! Sorry about that – it’s back! :)

          • sacred_kau

            Thank you, Stephane!

  • Marcus

    I understand the file was popular. Unfourtnatly it’s no longer available! :(
    I’m looking for the original first version for excel 2007

Copyright © 2014, All Rights Reserved. Privacy and Copyright Policies.