Enhancing Data Visualizations Using plotnine and ggplot
Here we will learn about the remaining optional components. These components are:
- Facets
- Statistical transformations
- Coordinates
- Themes
Facets
Facets are used to plot subsets of data. They allow an individual plot for each group of data within the same image.
For example, let's consider the tips dataset, which records restaurant visits: the total bill, the tip that was left, the customer's sex, the day and time of the meal, and so on.
Now suppose we want to plot the total bill for each day, separately for each gender. Facets are very useful in such cases; let's see how.
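Conceptually, faceting splits the rows by the facet variable and draws the same plot once per subset. The sketch below mimics that split in plain Python, using only the standard library and a few made-up rows (the values are illustrative and are not read from tips.csv). The helper name facet_totals is hypothetical. It sums total_bill per day inside each group, which corresponds to the bar heights that stacked columns reach in each panel.

```python
from collections import defaultdict

# Hypothetical rows standing in for tips.csv: (sex, day, total_bill).
rows = [
    ("Female", "Sun", 16.99),
    ("Male", "Sun", 10.34),
    ("Male", "Sun", 21.01),
    ("Female", "Sat", 20.29),
    ("Male", "Sat", 15.77),
]


def facet_totals(rows):
    """Group rows by sex (one facet panel each) and sum total_bill per day."""
    panels = defaultdict(lambda: defaultdict(float))
    for sex, day, total_bill in rows:
        panels[sex][day] += total_bill
    # Convert to plain dicts for easy printing and comparison.
    return {sex: dict(days) for sex, days in panels.items()}


panels = facet_totals(rows)
for sex, days in panels.items():
    print(sex, days)
```

Each key of panels plays the role of one facet panel, and each inner dictionary holds the values drawn inside that panel.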
Example: Facets with plotnine and ggplot in Python
Python3
import pandas as pd
from plotnine import ggplot, aes, facet_grid, labs, geom_col

# reading dataset
df = pd.read_csv("tips.csv")

(
    ggplot(df)
    + facet_grid(facets="~sex")
    + aes(x="day", y="total_bill")
    + labs(x="day", y="total_bill")
    + geom_col()
)
Output:
Statistical transformations
A statistical transformation computes new values from the data before plotting them. A histogram is a typical case. Consider the sepal length column of the Iris dataset: to draw its histogram, the measurements must be distributed into 15 bins and the values in each bin counted. The geom_histogram() function of plotnine computes and plots this automatically.
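To make the idea of "computing before plotting" concrete, here is a rough plain-Python sketch of equal-width binning. It is not plotnine's implementation, and plotnine may place its bin edges differently. The function name histogram_counts and the sample values are hypothetical, and 3 bins are used instead of 15 to keep the result easy to read.

```python
def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals.

    Assumes the values are not all identical (otherwise the width is zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin, not a bin past the end.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


# Hypothetical sepal lengths in cm (illustrative values, not read from Iris.csv).
sepal_lengths = [4.3, 4.9, 5.0, 5.1, 5.1, 5.8, 6.3, 6.4, 7.0, 7.9]
counts = histogram_counts(sepal_lengths, bins=3)
print(counts)
```

The resulting counts are what a histogram draws as bar heights; the raw measurements themselves never appear on the plot.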
Example: Statistical transformations using plotnine and ggplot in Python
Python3
import pandas as pd
from plotnine import ggplot, aes, geom_histogram

# reading dataset
df = pd.read_csv("Iris.csv")

(
    ggplot(df)
    + aes(x="SepalLengthCm")
    + geom_histogram(bins=15)
)
Output:
Coordinates
The coordinate system defines the mapping of data points to 2D graphical locations on the plot. Taking the histogram example above, suppose we want to plot the histogram horizontally. We can do this simply by using the coord_flip() function.
Example: Coordinate system in plotnine and ggplot in Python
Python3
import pandas as pd
from plotnine import ggplot, aes, geom_histogram, coord_flip

# reading dataset
df = pd.read_csv("Iris.csv")

(
    ggplot(df)
    + aes(x="SepalLengthCm")
    + geom_histogram(bins=15)
    + coord_flip()
)
Output:
Themes
Themes are used to improve the look of a data visualization. Plotnine includes many themes, which are listed in plotnine's themes API. Let's take the facets example above and try to make the visualization more visually appealing.
Example: Themes in plotnine and ggplot in Python
Python3
import pandas as pd
from plotnine import ggplot, aes, facet_grid, labs, geom_col, theme_xkcd

# reading dataset
df = pd.read_csv("tips.csv")

(
    ggplot(df)
    + facet_grid(facets="~sex")
    + aes(x="day", y="total_bill")
    + labs(x="day", y="total_bill")
    + geom_col()
    + theme_xkcd()
)
Output:
We can also use fill color to add more information to this graph. For example, we can color the bars by the time variable using the fill parameter of the aes function.
Data Visualization using Plotnine and ggplot2 in Python
Data visualization is the technique of presenting data in the form of graphs, charts, or plots. Visualizing data makes it easier for data analysts to spot trends or patterns that may be present, as it summarizes a large amount of data in a simple and easy-to-understand format.
In this article, we will discuss how to visualize data using plotnine in Python, which is a strict implementation of the grammar of graphics. Before starting, let's briefly look at what the grammar of graphics is.