Data visualization is an art as well as a science. I always keep exploring how to make my visualizations more interesting and informative. One of the jobs of a data scientist is to tell a story with the data at their disposal, and you really want to make the data jump out at the reader, to make your visualizations as understandable as possible.
Another of our main tasks is data manipulation, and today the main tool I use for that is Pandas (Python). What if I told you that you can build beautiful, interactive charts for the web right from your Pandas dataframes? Well, you can, with Plotly. This is a great time for Python plotting, and after exploring the options, my clear favorite in terms of ease of use, documentation, and functionality is the plotly Python library. In this article, we'll dive right into plotly and learn how to make better plots in less time, often with one line of code.
If you are unfamiliar with plotly itself, I drew up a brief beginner's guide a while back, available here. Some of the plots I generated in that post are re-created here, so it'll be interesting for you to see how cufflinks, a wrapper around plotly, simplifies plotly's (already surprisingly simple) syntax when working with pandas.
The plotly python package is an open-source library built on plotly.js which in turn is built on d3.js. We’ll be using a wrapper on plotly called cufflinks, which is designed to work with Pandas dataframes. All the work in this article was done in a Jupyter Notebook with plotly + cufflinks running in offline mode. Actually, the article you're reading right now is a rendered notebook. First things first, let's import the libraries we'll be needing for this post:
I have imported download_plotlyjs, init_notebook_mode, plot and iplot from plotly.offline, and called cufflinks' .go_offline() method, so that we can generate interactive visualizations offline in Kyso's JupyterLab environment.
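For readers following along outside the rendered notebook, here is a minimal sketch of that offline setup. The helper name `enable_offline_plotting` is my own, and the imports sit inside the function only so that it can be defined on a machine where plotly and cufflinks are not installed yet.

```python
def enable_offline_plotting():
    """Put plotly and cufflinks into offline mode for a notebook session."""
    # Imported inside the function so that defining it does not require
    # the plotting stack to be installed.
    import cufflinks as cf
    from plotly.offline import init_notebook_mode

    init_notebook_mode(connected=True)  # load plotly.js into the notebook
    cf.go_offline()                     # DataFrame.iplot() now renders locally
    return cf
```

Calling `cf = enable_offline_plotting()` once at the top of a notebook should be enough before any `.iplot()` call.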
We have a simple list of the countries with the world's best restaurants. The magazine Restaurant released its World's Best Restaurants 2018 list at the end of the year. The list is obviously super subjective, but an interesting exercise all the same.
So let's read in the dataset and plot it using plotly's cufflinks, a library for easy interactive Pandas charting with Plotly. Cufflinks binds the power of plotly with the flexibility of pandas for easy plotting.
Pretty easy, right? We can click on the elements in the legend to hide and display them, which is pretty neat. Move the cursor to the top right of the plot to see the available tools; for example, we can zoom in on specific areas of the plot.
We simply use the .iplot() method and specify the kind of chart we want to generate with the dataset.
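As a rough sketch of what such a call looks like, the snippet below builds a tiny stand-in dataframe and wraps the one-line `.iplot()` call in a helper. The column names, the helper name `plot_restaurants` and the counts are all placeholders of mine, not the magazine's figures.

```python
import pandas as pd

# Placeholder rows for illustration only; these are NOT the real rankings.
restaurants = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C"],
    "restaurants": [3, 2, 1],
})

def plot_restaurants(df):
    """Draw an interactive bar chart of restaurants per country."""
    import cufflinks as cf  # imported lazily; needs plotly + cufflinks
    cf.go_offline()
    df.iplot(kind="bar", x="country", y="restaurants",
             title="Best restaurants by country",
             xTitle="Country", yTitle="Number of restaurants")
```

In a notebook, `plot_restaurants(restaurants)` renders the chart; changing `kind` is all it takes to switch chart type.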
Ok, so we've generated a simple bar chart. Now let's read in some other datasets and test some other cufflinks-generated plots. For this tutorial, we have two different datasets:
First up is Pokemon data from Kaggle's Pokemon with Stats, read in as df1.
Second is FIFA 18 data from Kaggle's FIFA 18 Updated Dataset, read in as df2.
Let's run a few more bar-chart examples to visualize the strength differences between different Pokemon types.
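One way to get there is to average the stats per type and hand the result to cufflinks as a grouped bar chart. This is a sketch on made-up rows; the column names `Type 1`, `Attack` and `Defense` are my assumption about how the Kaggle file is laid out, and `plot_type_means` is a hypothetical helper.

```python
import pandas as pd

# Made-up rows standing in for df1; the column names are assumptions.
df1 = pd.DataFrame({
    "Type 1": ["Fire", "Fire", "Water", "Water"],
    "Attack": [50, 70, 40, 60],
    "Defense": [40, 60, 65, 75],
})

# Average each stat per primary type: one row per type, one column per stat.
type_means = df1.groupby("Type 1")[["Attack", "Defense"]].mean()

def plot_type_means(means):
    """Grouped bar chart: one group per type, one bar per stat."""
    import cufflinks as cf  # imported lazily; needs plotly + cufflinks
    cf.go_offline()
    means.iplot(kind="bar", barmode="group",
                title="Average stats by Pokemon type",
                xTitle="Type", yTitle="Average value")
```

Because the grouped frame is indexed by type, cufflinks uses the index for the x-axis without any further arguments.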
The benefit of interactivity is that we can explore and subset the data as we like. There's a lot of information in a boxplot, and without the ability to see the numbers, we'll miss most of it! Let's generate a box plot to show the shape of the distribution of each stat:
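A box plot of several stats at once could be sketched as follows, again on made-up values with assumed column names. The quartiles computed here are the numbers the interactive plot reveals on hover.

```python
import pandas as pd

# Made-up stat values standing in for df1; the column names are assumptions.
df1 = pd.DataFrame({
    "HP": [10, 20, 30, 40, 50],
    "Attack": [5, 15, 25, 35, 45],
})

# The quartiles that each box summarises.
quartiles = df1.quantile([0.25, 0.5, 0.75])

def plot_stat_boxes(df):
    """One box per numeric column."""
    import cufflinks as cf  # imported lazily; needs plotly + cufflinks
    cf.go_offline()
    df.iplot(kind="box", title="Distribution of each stat", yTitle="Value")
```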
Ok, time for our FIFA data! The histogram is a go-to plot for graphing a distribution.
What's the distribution of the players' ages in the game?
And by their overall player rating:
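Both histograms come from the same kind of call, sketched here with a hypothetical helper `plot_distribution`. The columns `Age` and `Overall` are assumed names and the rows are invented.

```python
import pandas as pd

# Invented rows standing in for df2; the column names are assumptions.
df2 = pd.DataFrame({
    "Age": [19, 23, 27, 27, 33],
    "Overall": [62, 70, 78, 81, 74],
})

dist = df2[["Age", "Overall"]]

def plot_distribution(df, column, bins=20):
    """Histogram of a single numeric column."""
    import cufflinks as cf  # imported lazily; needs plotly + cufflinks
    cf.go_offline()
    df[[column]].iplot(kind="histogram", bins=bins,
                       title="Distribution of " + column,
                       xTitle=column, yTitle="Number of players")
```

`plot_distribution(df2, "Age")` and `plot_distribution(df2, "Overall")` would then give the two charts.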
Ok, let's break the data down a little to compare player stats across countries:
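Cufflinks draws one box per column, so one possible approach (not necessarily the one behind the chart here) is to pivot the ratings so that each country becomes its own column. The column names `Nationality` and `Overall` are assumptions, and the rows are invented.

```python
import pandas as pd

# Invented rows standing in for df2; the column names are assumptions.
df2 = pd.DataFrame({
    "Nationality": ["Spain", "Brazil", "Spain", "Brazil"],
    "Overall": [85, 88, 79, 82],
})

# One column per country; rows that belong to other countries become NaN.
by_country = df2.pivot(columns="Nationality", values="Overall")

def plot_country_boxes(wide):
    """One box per country column."""
    import cufflinks as cf  # imported lazily; needs plotly + cufflinks
    cf.go_offline()
    wide.iplot(kind="box", title="Overall rating by country",
               yTitle="Overall rating")
```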
Spanish and Brazilian players clearly dominate the upper quartile, but to be fair, FIFA 18 includes more lower-league players in England than in other countries. Disclaimer: I happen to be Irish & so took particular delight in this graph, while also acknowledging Ireland's dismal performance at international level, if and when we even qualify!😀
Let's step it up a notch & segment the dataset by top finishers, so that we can look at the attributes of the game's forwards and strikers. First, let's generate a box plot for descriptive stats on the game's finishers by country.
Argentina just about steals the show when it comes to prowess in front of goal.
The scatterplot is found at the heart of most analyses - it allows us to see the evolution of a variable over time or the relationship between two (or more) variables.
Let's look at the top 100 finishers in the game. We'll create a bubble chart in which the y-values represent the players' finishing score, the x-values their composure, and the marker size their wages.
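A sketch of that bubble chart is below, using invented players and assumed column names (`Name`, `Finishing`, `Composure`, `Wage`). I take the top 3 rather than the top 100 so that the toy data stays small.

```python
import pandas as pd

# Invented players standing in for df2; the column names are assumptions.
df2 = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Finishing": [80, 95, 70, 90],
    "Composure": [75, 92, 68, 88],
    "Wage": [100, 500, 50, 400],
})

# Keep only the best finishers (use 100 instead of 3 on the real data).
top_finishers = df2.nlargest(3, "Finishing")

def plot_finisher_bubbles(df):
    """Bubble chart: x = composure, y = finishing, marker size = wage."""
    import cufflinks as cf  # imported lazily; needs plotly + cufflinks
    cf.go_offline()
    df.iplot(kind="bubble", x="Composure", y="Finishing", size="Wage",
             text="Name", xTitle="Composure", yTitle="Finishing")
```

Passing `text="Name"` means that hovering over a bubble shows which player it is.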
Unsurprisingly, those two big red bubbles in the upper right-hand corner represent Ronaldo and Messi.
Now, do a player's market value and wage justify his ranking in FIFA 18's ratings? Let's find out!
Ok, now let's get a distribution of the players' actual and potential overall rating as a function of their market value.
Now we’ll get into a few plots that you probably won’t use all that often, but which can be quite impressive.
To visualize the correlations between numerical values, in this case the various stats of FIFA 18 players, we calculate the correlations and then make a heatmap:
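The two steps, correlate then plot, can be sketched like this. The stat columns are assumed names with invented values, chosen so that the expected correlations are obvious.

```python
import pandas as pd

# Invented stats standing in for df2; the column names are assumptions.
df2 = pd.DataFrame({
    "Finishing": [1, 2, 3, 4],
    "Positioning": [2, 4, 6, 8],   # rises with Finishing
    "Age": [4, 3, 2, 1],           # falls as Finishing rises
})

# Pairwise Pearson correlations between the numeric columns.
corr = df2.corr()

def plot_corr_heatmap(matrix):
    """Heatmap of a square correlation matrix."""
    import cufflinks as cf  # imported lazily; needs plotly + cufflinks
    cf.go_offline()
    matrix.iplot(kind="heatmap", colorscale="spectral",
                 title="Correlation between player stats")
```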
Run cf.colors.scales() to see the available colorscales for cufflinks.
Let's find out what percentage of the game's top 100 players are playing for which clubs:
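A pie chart needs one row per slice, so the first step is to count players per club. Here is a sketch with invented rows; `Club` and `Overall` are assumed column names, and I keep the top 4 rather than the top 100.

```python
import pandas as pd

# Invented rows standing in for df2; the column names are assumptions.
df2 = pd.DataFrame({
    "Club": ["Club A", "Club B", "Club A", "Club A", "Club C"],
    "Overall": [94, 93, 92, 91, 60],
})

top_players = df2.nlargest(4, "Overall")  # use 100 on the real data

# One row per club with the number of top players it employs.
club_counts = (top_players["Club"].value_counts()
               .rename_axis("club")
               .reset_index(name="players"))

def plot_club_share(counts):
    """Pie chart of each club's share of the top players."""
    import cufflinks as cf  # imported lazily; needs plotly + cufflinks
    cf.go_offline()
    counts.iplot(kind="pie", labels="club", values="players",
                 title="Clubs of the top players")
```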
All European clubs - continental poachers!
With plotly you can also plot geographical data. While not as effective as geo-spatial plotting with folium, a leaflet.js wrapper for Python that generates interactive HTML maps, pandas & cufflinks work just fine for our purposes. Let's get a sense of the distribution of pro players by their nationality:
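As a sketch of such a map, I count players per country with pandas and build the figure with plotly's own `go.Choropleth`, rather than going through cufflinks. The `Nationality` column is an assumed name, the rows are invented, and the helper `make_player_map` is hypothetical.

```python
import pandas as pd

# Invented rows standing in for df2; the column name is an assumption.
df2 = pd.DataFrame({
    "Nationality": ["Spain", "Brazil", "Spain", "Ireland"],
})

# Number of players per country, one row per country.
country_counts = (df2["Nationality"].value_counts()
                  .rename_axis("country")
                  .reset_index(name="players"))

def make_player_map(counts):
    """World map shaded by the number of players from each country."""
    import plotly.graph_objects as go  # imported lazily; needs plotly
    fig = go.Figure(go.Choropleth(
        locations=counts["country"],
        locationmode="country names",  # match on full country names
        z=counts["players"],
        colorscale="Viridis",
    ))
    fig.update_layout(title_text="Players by nationality")
    return fig
```

The returned figure can be shown with `fig.show()` or passed to `iplot` in offline mode. Matching on country names only works where the dataset's spelling agrees with the names plotly expects.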
And finally, a cool 3D-plot of the intersection between player composure, positioning and finishing ability:
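A 3D scatter needs three numeric columns with no missing values, so this sketch drops incomplete rows first. The column names are assumptions and the players are invented.

```python
import pandas as pd

# Invented players standing in for df2; the column names are assumptions.
df2 = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Composure": [75, 92, None, 88],
    "Positioning": [80, 94, 70, 85],
    "Finishing": [80, 95, 70, 90],
})

# Rows with a missing coordinate cannot be placed in 3D space.
complete = df2.dropna(subset=["Composure", "Positioning", "Finishing"])

def plot_finishers_3d(df):
    """3D scatter of composure, positioning and finishing."""
    import cufflinks as cf  # imported lazily; needs plotly + cufflinks
    cf.go_offline()
    df.iplot(kind="scatter3d", x="Composure", y="Positioning",
             z="Finishing", mode="markers", text="Name")
```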
I hope you liked this short intro to cufflinks. It is a pretty awesome tool for quick-fire EDA on any dataframe. I reckon it is the best plotting library if you're working with Python, not only for its ease of use, but also in terms of bringing data to life for the reader.
For more information, check out plotly's documentation.