Summarizing group-wise data of multiple variables
Let’s create another sample dataframe, df2:
R
# sample dataframe
df2 <- data.frame(
  Quarter = paste0("Q", rep(1:4, each = 4)),
  Week = rep(c("Weekday", "Weekend"), each = 2, times = 4),
  Direction = rep(c("Inbound", "Outbound"), times = 8),
  Delay = c(10.8, 9.7, 15.5, 10.4, 11.8, 8.9, 5.5, 3.3,
            10.6, 8.8, 6.6, 5.2, 9.1, 7.3, 5.3, 4.4)
)
df2
Output:
Summarizing data group-wise:
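In this case, our dataframe has four variables: Quarter, Week, Direction and Delay. In the code below, we group and summarize by Quarter and Week. Each Quarter-Week combination contains two rows (one Inbound and one Outbound), so in the process the variable Direction is collapsed: the result has one row per combination and no Direction column.
Syntax: group_by(.data, variable_name1, variable_name2)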
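To see exactly which rows get collapsed, a quick base R check (a sketch using table(), not part of the original example) counts the rows in each Quarter-Week combination. Every group holds two rows, one Inbound and one Outbound, and summarize() reduces each pair to a single row.

```r
# rebuild the sample dataframe from above
df2 <- data.frame(
  Quarter = paste0("Q", rep(1:4, each = 4)),
  Week = rep(c("Weekday", "Weekend"), each = 2, times = 4),
  Direction = rep(c("Inbound", "Outbound"), times = 8),
  Delay = c(10.8, 9.7, 15.5, 10.4, 11.8, 8.9, 5.5, 3.3,
            10.6, 8.8, 6.6, 5.2, 9.1, 7.3, 5.3, 4.4)
)

# number of rows in each Quarter-Week group (every Freq is 2)
group_sizes <- as.data.frame(table(Quarter = df2$Quarter, Week = df2$Week))
group_sizes

# Direction values inside each group: one Inbound and one Outbound row apiece
direction_counts <- table(paste(df2$Quarter, df2$Week), df2$Direction)
direction_counts
```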
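One behavior worth knowing when grouping by several variables: by default, summarize() drops only the last grouping variable, so a Quarter-and-Week summary is still grouped by Quarter afterwards (dplyr prints a message saying so). The sketch below assumes dplyr 1.0 or later, where summarize() has a .groups argument; passing .groups = "drop" returns an ungrouped result instead.

```r
library(dplyr)

df2 <- data.frame(
  Quarter = paste0("Q", rep(1:4, each = 4)),
  Week = rep(c("Weekday", "Weekend"), each = 2, times = 4),
  Direction = rep(c("Inbound", "Outbound"), times = 8),
  Delay = c(10.8, 9.7, 15.5, 10.4, 11.8, 8.9, 5.5, 3.3,
            10.6, 8.8, 6.6, 5.2, 9.1, 7.3, 5.3, 4.4)
)

# default: Week (the last grouping variable) is dropped, Quarter remains
by_default <- df2 %>%
  group_by(Quarter, Week) %>%
  summarize(min_delay = min(Delay), max_delay = max(Delay))
group_vars(by_default)   # "Quarter"

# explicit: remove all grouping once the summary is computed
ungrouped <- df2 %>%
  group_by(Quarter, Week) %>%
  summarize(min_delay = min(Delay), max_delay = max(Delay), .groups = "drop")
group_vars(ungrouped)    # character(0)
```

Dropping the grouping explicitly avoids surprises when the summary is piped into further verbs such as mutate(), which would otherwise still operate within each Quarter.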
R
library(dplyr)

# sample dataframe
df2 <- data.frame(
  Quarter = paste0("Q", rep(1:4, each = 4)),
  Week = rep(c("Weekday", "Weekend"), each = 2, times = 4),
  Direction = rep(c("Inbound", "Outbound"), times = 8),
  Delay = c(10.8, 9.7, 15.5, 10.4, 11.8, 8.9, 5.5, 3.3,
            10.6, 8.8, 6.6, 5.2, 9.1, 7.3, 5.3, 4.4)
)

# summarizing by group: one row per Quarter-Week combination
df2 %>%
  group_by(Quarter, Week) %>%
  summarize(min_delay = min(Delay), max_delay = max(Delay))
Output:
How to find group-wise summary statistics for an R dataframe?
Group-wise summary statistics are a useful way to understand a dataframe. A summary typically includes the minimum, first quartile, median, mean, third quartile and maximum, and it can be computed for a single column (variable) or for the entire dataframe. In this article, we are going to see how to find group-wise summary statistics for a dataframe in the R programming language.
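As a quick illustration, here is a base R sketch (split(), sapply() and aggregate() are one option among several; dplyr is another). summary() gives the six statistics for one column, applying it to each split of that column gives them per group, and aggregate() computes a single chosen statistic per group.

```r
df <- iris

# summary of a single column over the whole dataframe
summary(df$Sepal.Length)

# the same six statistics computed separately for each Species
# (rows: Min., 1st Qu., Median, Mean, 3rd Qu., Max.; columns: species)
by_species <- sapply(split(df$Sepal.Length, df$Species), summary)
by_species

# a single statistic (the mean) per group, returned as a dataframe
species_means <- aggregate(Sepal.Length ~ Species, data = df, FUN = mean)
species_means
```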
Importing data in R
In the code below, we use the built-in iris flower dataset. We can inspect a dataset with the head() or tail() function, which print the first and the last rows of the dataframe respectively (six rows by default). Here, we display the first 10 rows of our sample dataframe.
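For reference, a small base R sketch of these defaults: both functions return six rows unless n is given, and tail() keeps the original row numbers, so the last five rows of iris are labelled 146 to 150.

```r
df <- iris

# default n: six rows from the top and six from the bottom
first_rows <- head(df)
last_rows <- tail(df)

# explicit n: the last 5 rows, which keep their original row names
last5 <- tail(df, 5)
last5

# size of the full dataset: 150 rows, 5 columns
dim(df)
```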
R
# import data: iris is a built-in dataset
df <- iris

# inspecting the dataset: print the first 10 rows
head(df, 10)
Output: