What are Exploratory Data Analysis tools and what do I do with them?
Exploratory Data Analysis (EDA) tools are a diverse mix of tools used mainly to explore data: to find trends, exceptions, rules, correlations, and other statistical patterns. They range from the fairly technical (R, SPSS) to the fairly visual (Visual Intelligence, Tableau Software).
As a designer, I am sometimes faced with finding the best way to represent a particular metric.
EDA tools, which touch on Data Science (an up-and-coming field that is only now being recognized in its own right), are great accelerators for finding those relationships and testing visual representations. I especially like the more visual tools, since they provide great visual feedback and are easy to publish to business users for comments and annotations.
How to start?
There are many methods to choose from, but I strongly recommend the work of Stephen Few, specifically his whitepaper on getting acquainted with data sets.
In his book Now You See It, he describes some basic ways that a user interacts with information:
- Time Series Analysis
- Part To Whole / Ranking Analysis
- Deviation Analysis
- Distribution Analysis
- Correlation Analysis
- Multivariate Analysis
Some graphs are better suited to a given method than others. If users can explain how they plan to use the information, that explanation can guide the choice of the proper representation.
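Once the user has described the intended use, matching it to a chart can be as simple as a lookup. The sketch below is a hypothetical helper, not something from Stephen Few's book: the analysis types come from the list above, but the chart suggestions are common charting conventions chosen for illustration, and the names `SUGGESTED_CHARTS` and `suggest_charts` are made up.

```python
# Hypothetical lookup table. The keys mirror the analysis types listed above;
# the chart suggestions are common conventions, not a quote from Stephen Few.
SUGGESTED_CHARTS = {
    "time series": ["line chart", "bar chart of intervals"],
    "part to whole / ranking": ["sorted bar chart", "stacked bar chart"],
    "deviation": ["bar chart of differences from a reference",
                  "line chart with a reference line"],
    "distribution": ["histogram", "box plot"],
    "correlation": ["scatterplot"],
    "multivariate": ["scatterplot matrix", "small multiples", "bubble chart"],
}


def suggest_charts(analysis_type: str) -> list[str]:
    """Return candidate charts for an analysis type (case-insensitive).

    A trailing " analysis" is ignored, so "Distribution Analysis" and
    "distribution" map to the same entry.
    """
    key = analysis_type.strip().lower().removesuffix(" analysis")
    if key not in SUGGESTED_CHARTS:
        raise ValueError(f"Unknown analysis type: {analysis_type!r}")
    return SUGGESTED_CHARTS[key]


if __name__ == "__main__":
    # e.g. the user says: "I want to see how sales are spread across stores."
    print(suggest_charts("Distribution Analysis"))
```

The table is deliberately a plain dictionary so a designer can adjust the suggestions to the house style of a given dashboard.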
For example, Multivariate Analysis works well with a two-axis scatterplot, which can be further enhanced with a time animation, as Hans Rosling has done with Gapminder.
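Behind a time-animated scatterplot, the data usually has to be reshaped so that each time step becomes one frame of x/y/size points. The following is a minimal standard-library sketch of that reshaping, assuming records of the form (entity, year, x, y, size); the records and the `build_frames` name are hypothetical and the numbers are made up, not Gapminder data.

```python
from collections import defaultdict

# Hypothetical records: (entity, year, x_value, y_value, size_value).
# In a Gapminder-style chart, x might be income, y life expectancy,
# and size population. These numbers are invented for illustration.
records = [
    ("Country A", 2000, 1000, 60.0, 5_000_000),
    ("Country B", 2000, 8000, 72.5, 20_000_000),
    ("Country A", 2010, 1800, 64.0, 5_600_000),
    ("Country B", 2010, 11000, 75.0, 21_000_000),
]


def build_frames(rows):
    """Group rows by year so that each year becomes one animation frame."""
    frames = defaultdict(list)
    for entity, year, x, y, size in rows:
        frames[year].append({"entity": entity, "x": x, "y": y, "size": size})
    # Sort by year so playback runs forward in time.
    return dict(sorted(frames.items()))


if __name__ == "__main__":
    for year, points in build_frames(records).items():
        print(year, [p["entity"] for p in points])
```

Each frame can then be handed to whatever charting tool draws the scatterplot, with the animation simply stepping through the frames in order.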
Let’s face it: 90% of the time, the level of analysis needed to represent a metric well in a dashboard does not require a PhD in statistics. EDA tools can and should be used by dashboard designers to mold and massage the information into a pleasing and intuitive form.