Steps to Create a Sankey Plot in R
In the R Programming Language, there are several ways to make Sankey plots; this article uses the networkD3 package because it is easy to use and reasonably flexible.
- Install and Load Required Packages
- Prepare the Data
- Create the Sankey Plot
- Customize the Plot
- Save or Export the Plot
1. Install and Load Required Packages
First, install the networkD3 package if you haven’t already, then load it into your R session:
install.packages("networkD3")
library(networkD3)
2. Prepare the Data
You need two data frames: one for the nodes and one for the links.
- The nodes data frame lists all the entities involved.
- The links data frame specifies the connections between these nodes and their flow values.
# Create nodes data frame (the row order defines each node's index)
nodes <- data.frame(name = c("Source A", "Source B", "Source C", "Destination 1", "Destination 2"))
# Create links data frame; source and target are zero-based row indices into nodes
links <- data.frame(source = c(0, 1, 2, 0, 1, 2),
                    target = c(3, 3, 3, 4, 4, 4),
                    value = c(10, 20, 30, 5, 15, 25))
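Nodes are labeled “Source A”, “Source B”, “Source C”, “Destination 1”, and “Destination 2”. Each row of links describes one flow from a source to a destination, with its size given in value. Because networkD3 passes the data to JavaScript, source and target are zero-based: 0 refers to the first row of nodes (“Source A”) and 3 to the fourth (“Destination 1”).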
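Typing zero-based indices by hand is easy to get wrong. One option, sketched below in base R under the assumption that you would rather describe each flow by node name, is to keep a hypothetical `links_named` table of names and let `match()` translate the names into the zero-based indices that sankeyNetwork() expects. With the names used here, it rebuilds the same nodes and links data frames as the code above.

```r
# Hypothetical table describing each flow by node name instead of index
links_named <- data.frame(
  from  = c("Source A", "Source B", "Source C", "Source A", "Source B", "Source C"),
  to    = c("Destination 1", "Destination 1", "Destination 1",
            "Destination 2", "Destination 2", "Destination 2"),
  value = c(10, 20, 30, 5, 15, 25),
  stringsAsFactors = FALSE
)

# Build the node list from every name that appears in the flows
nodes <- data.frame(name = unique(c(links_named$from, links_named$to)),
                    stringsAsFactors = FALSE)

# match() returns 1-based positions; subtract 1 for networkD3's 0-based indices
links <- data.frame(
  source = match(links_named$from, nodes$name) - 1,
  target = match(links_named$to, nodes$name) - 1,
  value  = links_named$value
)
links
```

Deriving the nodes from the flows also guarantees that every name used in a link has a row in nodes.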
3. Create the Sankey Plot
Use the sankeyNetwork() function to create the plot, then print the object to display it:
sankeyPlot <- sankeyNetwork(Links = links, Nodes = nodes,
                            Source = "source", Target = "target",
                            Value = "value", NodeID = "name",
                            units = "TWh", fontSize = 12, nodeWidth = 30)
# Printing the widget displays it in the RStudio Viewer or a web browser
sankeyPlot
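networkD3 may warn when the indices do not start at 0, but it does not necessarily catch every mismatch, such as an index that points past the last node; in that case the plot may render empty or incomplete instead of raising an R error. The sketch below adds a hypothetical helper, `check_sankey_links()`, that you could call before sankeyNetwork(). It is not part of networkD3 and uses base R only.

```r
# Hypothetical helper (not part of networkD3): stop early if links don't fit nodes
check_sankey_links <- function(links, nodes) {
  n <- nrow(nodes)
  idx <- c(links$source, links$target)
  if (!is.numeric(idx) || anyNA(idx) || any(idx != round(idx))) {
    stop("source/target must be whole-number indices")
  }
  if (any(idx < 0 | idx > n - 1L)) {
    stop(sprintf("indices must lie in 0..%d (zero-based)", n - 1L))
  }
  if (!is.numeric(links$value) || any(links$value < 0, na.rm = TRUE)) {
    stop("value must be numeric and non-negative")
  }
  invisible(TRUE)
}

nodes <- data.frame(name = c("Source A", "Source B", "Source C",
                             "Destination 1", "Destination 2"))
links <- data.frame(source = c(0, 1, 2, 0, 1, 2),
                    target = c(3, 3, 3, 4, 4, 4),
                    value  = c(10, 20, 30, 5, 15, 25))
check_sankey_links(links, nodes)   # passes silently

# A 1-based slip: target 5 does not exist in a 5-row nodes table
bad <- transform(links, target = target + 1)
res <- tryCatch(check_sankey_links(bad, nodes), error = conditionMessage)
res
```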
4. Customize the Plot
Customizing a Sankey plot can make its data easier to read and compare. The sankeyNetwork() function in networkD3 provides arguments for changing the size and spacing of the nodes, the size of the labels, the units shown with each value, and the colors of nodes and links. The basic customization options are:
Adjusting Node Width and Font Size
- nodeWidth: Sets the width of the nodes.
- fontSize: Sets the font size for node labels.
- units: Adds a unit of measurement to the values displayed in the tooltips.
sankeyPlot <- sankeyNetwork(Links = links, Nodes = nodes,
                            Source = "source", Target = "target",
                            Value = "value", NodeID = "name",
                            units = "TWh",    # unit label shown after each value
                            fontSize = 12,    # node label size in pixels
                            nodeWidth = 30)   # width of each node rectangle
networkD3 also adds tooltips automatically: hovering over a node or link shows its name and value, followed by the units string. No extra code is needed for this interactivity.
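Colors are controlled through further sankeyNetwork() arguments: NodeGroup and LinkGroup name columns whose values are passed through colourScale, a JavaScript d3 scale. The sketch below assumes a recent networkD3 release that uses d3 v4-style scales (hence `d3.scaleOrdinal()`); the `group` columns, their values and the hex colors are illustrative choices, not part of the original example.

```r
library(networkD3)

nodes <- data.frame(name = c("Source A", "Source B", "Source C",
                             "Destination 1", "Destination 2"),
                    # Illustrative grouping used to color the nodes
                    group = c("source", "source", "source", "dest", "dest"))
links <- data.frame(source = c(0, 1, 2, 0, 1, 2),
                    target = c(3, 3, 3, 4, 4, 4),
                    value  = c(10, 20, 30, 5, 15, 25),
                    # Illustrative grouping: color each link by its source
                    group  = c("A", "B", "C", "A", "B", "C"))

# d3 ordinal scale mapping each group name to a color
colours <- htmlwidgets::JS(
  'd3.scaleOrdinal()
     .domain(["source", "dest", "A", "B", "C"])
     .range(["#4C72B0", "#55A868", "#8172B2", "#CCB974", "#64B5CD"])'
)

sankeyColoured <- sankeyNetwork(Links = links, Nodes = nodes,
                                Source = "source", Target = "target",
                                Value = "value", NodeID = "name",
                                NodeGroup = "group", LinkGroup = "group",
                                colourScale = colours,
                                units = "TWh", fontSize = 12,
                                nodeWidth = 30, nodePadding = 20)
sankeyColoured
```

nodePadding sets the vertical gap between nodes; coloring links by their source makes it easier to follow each input across the diagram.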
5. Save or Export the Plot
To save the plot, you can use the htmlwidgets package to save it as an HTML file:
library(htmlwidgets)
saveWidget(sankeyPlot, file = "sankey_plot.html")
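By default saveWidget() tries to write a single self-contained HTML file, and some htmlwidgets versions need pandoc installed for that step. A hedged alternative, sketched below, is `selfcontained = FALSE`, which writes the HTML file plus a folder of JavaScript dependencies (named by `libdir`). The temporary output location is only for illustration; replace it with your own path.

```r
library(networkD3)
library(htmlwidgets)

nodes <- data.frame(name = c("Source A", "Source B", "Source C",
                             "Destination 1", "Destination 2"))
links <- data.frame(source = c(0, 1, 2, 0, 1, 2),
                    target = c(3, 3, 3, 4, 4, 4),
                    value  = c(10, 20, 30, 5, 15, 25))
sankeyPlot <- sankeyNetwork(Links = links, Nodes = nodes,
                            Source = "source", Target = "target",
                            Value = "value", NodeID = "name",
                            units = "TWh", fontSize = 12, nodeWidth = 30)

# Illustrative location: a temporary folder
out_dir  <- tempdir()
out_file <- file.path(out_dir, "sankey_plot.html")

# Keep the JavaScript dependencies in a "lib" folder next to the HTML file
saveWidget(sankeyPlot, file = out_file, selfcontained = FALSE, libdir = "lib")
file.exists(out_file)
```

When sharing a non-self-contained plot, copy the "lib" folder together with the HTML file.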
Here are a few more examples of Sankey plots with different datasets and customizations:
Energy Flow Sankey Plot
The Sankey plot for the energy flow example shows how the energy inputs (Coal, Oil and Gas) are converted into electricity and how that electricity is then distributed to the Industrial, Residential and Commercial sectors of the economy. The width of each link varies with the amount of energy transferred, so the major inputs and outputs are easy to spot.
# Load necessary libraries
library(networkD3)
library(htmlwidgets)
# Create the nodes data frame
nodes <- data.frame(name = c("Coal", "Oil", "Gas", "Electricity", "Industry",
"Residential", "Commercial"))
# Create the links data frame
links <- data.frame(source = c(0, 1, 2, 3, 3, 3),
target = c(3, 3, 3, 4, 5, 6),
value = c(50, 30, 20, 60, 20, 20))
# Create the Sankey plot
sankeyPlotEnergy <- sankeyNetwork(Links = links, Nodes = nodes,
Source = "source", Target = "target",
Value = "value", NodeID = "name",
units = "TWh", fontSize = 12, nodeWidth = 30)
# Save the plot as an HTML file
saveWidget(sankeyPlotEnergy, file = "sankey_plot_energy.html")
Output:
When you execute the code for the Energy Flow example, the Sankey plot will display the following:
- Nodes: The categories in the flow: the energy sources “Coal”, “Oil” and “Gas”, the intermediate carrier “Electricity”, and the consuming sectors “Industry”, “Residential” and “Commercial”.
- Links: Show how energy from coal, oil and gas flows into electricity generation, and how that electricity is then distributed among the industrial, residential and commercial sectors.
- Flow Widths: The width of each link is proportional to the energy transferred, in terawatt-hours (TWh), as shown in Figure 4. For example, the link from “Coal” to “Electricity” (50 TWh) is wider than the link from “Gas” to “Electricity” (20 TWh).
This plot shows the share each energy source contributes and how electricity is divided among the sectors.
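The diagram draws whatever numbers it is given, so it can be worth checking them before plotting. The following is a minimal base-R sketch, not part of the original example: it sums the flow entering and leaving each node and computes each fuel's share of the electricity input. For an intermediate node such as “Electricity”, inflow and outflow should match.

```r
nodes <- data.frame(name = c("Coal", "Oil", "Gas", "Electricity", "Industry",
                             "Residential", "Commercial"),
                    stringsAsFactors = FALSE)
links <- data.frame(source = c(0, 1, 2, 3, 3, 3),
                    target = c(3, 3, 3, 4, 5, 6),
                    value  = c(50, 30, 20, 60, 20, 20))

# Zero-based node indices 0..6, matching the links table
idx <- seq_len(nrow(nodes)) - 1
inflow  <- vapply(idx, function(i) sum(links$value[links$target == i]), numeric(1))
outflow <- vapply(idx, function(i) sum(links$value[links$source == i]), numeric(1))
balance <- data.frame(node = nodes$name, inflow = inflow, outflow = outflow)
balance

# Share of each fuel in the total electricity input
elec <- which(nodes$name == "Electricity") - 1
fuel_share <- with(links[links$target == elec, ],
                   setNames(value / sum(value), nodes$name[source + 1]))
fuel_share
```

With this data, Electricity receives and passes on 100 TWh, and Coal supplies half of its input.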
Website User Flow Sankey Plot
The Website User Flow Sankey diagram maps the paths visitors take through a website. Starting from the Home page, it shows how many users move on to the About and Services pages, then to the Contact page, and finally to the Purchase page. This makes the most common routes, and the points where users drop off, easy to identify.
# Load necessary libraries
library(networkD3)
library(htmlwidgets)
# Create the nodes data frame
nodes <- data.frame(name = c("Home", "About", "Services", "Contact", "Purchase"))
# Create the links data frame
links <- data.frame(source = c(0, 0, 1, 2, 3),
target = c(1, 2, 3, 3, 4),
value = c(1000, 500, 200, 100, 50))
# Create the Sankey plot
sankeyPlotWeb <- sankeyNetwork(Links = links, Nodes = nodes,
Source = "source", Target = "target",
Value = "value", NodeID = "name",
units = "Users", fontSize = 12, nodeWidth = 30)
# Save the plot as an HTML file
saveWidget(sankeyPlotWeb, file = "sankey_plot_web.html")
Output:
When you run the code for the Website User Flow example, the Sankey plot will illustrate:
- Nodes: The individual pages of the website: “Home”, “About”, “Services”, “Contact” and “Purchase”.
- Links: Show how users move from one page to the next. For instance, visitors go from “Home” to “About” or “Services”, from either of those pages to “Contact”, and from “Contact” to “Purchase”.
- Flow Widths: The width of each link is proportional to the number of users transitioning between pages. For instance, a wide link from “Home” to “About” indicates that many users visit the About page from the Home page.
This plot helps visualize user navigation paths on a website, showing the most common routes and highlighting any significant drop-offs or conversions.
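The drop-offs and conversions can also be computed directly from the links table. The base-R sketch below assumes that each link value counts users, that every visitor enters at Home, and that the table covers all recorded paths; `ended_here` is a hypothetical column name for users whose recorded path stops at a page (for Purchase, these are the conversions).

```r
nodes <- data.frame(name = c("Home", "About", "Services", "Contact", "Purchase"),
                    stringsAsFactors = FALSE)
links <- data.frame(source = c(0, 0, 1, 2, 3),
                    target = c(1, 2, 3, 3, 4),
                    value  = c(1000, 500, 200, 100, 50))

idx <- seq_len(nrow(nodes)) - 1   # zero-based node indices
arrived   <- vapply(idx, function(i) sum(links$value[links$target == i]), numeric(1))
continued <- vapply(idx, function(i) sum(links$value[links$source == i]), numeric(1))

# Home has no incoming links, so treat its outgoing total as the number of visitors
home <- nodes$name == "Home"
arrived[home] <- continued[home]

funnel <- data.frame(page = nodes$name, arrived = arrived, continued = continued,
                     ended_here = arrived - continued)
funnel

# Overall conversion: users reaching Purchase / users entering at Home
conversion <- arrived[nodes$name == "Purchase"] / arrived[home]
conversion
```

Under these assumptions, About loses the most users (800 of 1000), and about 3.3% of visitors reach the Purchase page.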
Sankey Plot In R
Sankey plots are a type of flow diagram in which the width of each arrow is proportional to the flow rate. They are especially useful for showing transfers of data or energy, or the movement of materials, between stages or categories. They make it easy to see which parts of a system contribute most to a flow and how the parts are connected.