How to use the data.table package in R
The data.table package in R is used to store and manipulate data in a well-organized, efficient manner. The package can be downloaded and installed into your R library using the following command:
install.packages("data.table")
The rows of the data frame can be arranged in descending order of a column's values using the order() function. order() does not return the sorted data itself; it returns the row indexes in sorted order, and these indexes are then used to index the data frame and obtain the re-ordered rows.
Syntax: order(vec, decreasing = TRUE)
Arguments:
vec – The data frame column whose values are to be arranged in descending order
decreasing – The flag that, when TRUE, sorts the data in descending order
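As a minimal sketch of this step, the example below sorts a small made-up data frame by one column. The names df, grp and val are illustrative assumptions, not part of the original example.

```r
# Hypothetical toy data: two groups with four values
df <- data.frame(grp = c("a", "b", "a", "b"),
                 val = c(3, 9, 7, 1))

# order() returns row indexes, not the sorted values themselves
idx <- order(df$val, decreasing = TRUE)

# Indexing the data frame with those indexes re-orders its rows
sorted_df <- df[idx, ]
print(sorted_df)
```

Here idx holds the positions of the rows from the largest val to the smallest, so df[idx, ] lists the rows in descending order of val.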
The data frame can then be converted into a data.table using the data.table() function, passing the grouping column name through the key argument. This has the same effect as calling setkey() on the table afterwards. The key sorts the table by that column, so the rows belonging to each group are stored together.
data.table(df, key = )
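A small sketch of the conversion, assuming the data.table package is installed. The data frame and its column names (grp, val) are made up for illustration.

```r
library(data.table)

# Hypothetical toy data frame with an unsorted grouping column
df <- data.frame(grp = c("b", "a", "b", "a"),
                 val = c(1, 2, 3, 4))

# Convert to a data.table and set "grp" as the key in one call
dt <- data.table(df, key = "grp")

# key() reports the key column(s); the rows are now sorted by grp
print(key(dt))
print(dt)
```

After the call, the rows for group "a" come before those for group "b", because setting the key sorts the table by the key column.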
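Now, the head() function can be combined with the special symbol .SD to access the top n rows of each group. Inside a data.table, .SD stands for the subset of the data belonging to the current group, and the by argument names the grouping column. head() takes .SD and an integer n as arguments and returns the first n rows of each group.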
df[ , head(.SD, 3), by =]
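The grouped head() call can be sketched on a small table as follows, assuming the data.table package is installed. The table dt and its columns grp and val are hypothetical, and the rows are already in descending order of val within each group.

```r
library(data.table)

# Hypothetical toy table, already sorted in descending order within groups
dt <- data.table(grp = c("a", "a", "a", "b", "b"),
                 val = c(5, 3, 1, 9, 8))

# For each value of grp, .SD is that group's rows; head() keeps the first 2
top2 <- dt[, head(.SD, 2), by = grp]
print(top2)
```

Because head() simply keeps the first rows of each group, the result contains the highest values only if the data was sorted in descending order beforehand.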
Code:
library("data.table")

# creating dataframe
data_frame <- data.frame(col1 = rep(letters[1:4], each = 5),
                         col2 = 1:20,
                         col3 = 20:39)
print("Original DataFrame")
print(data_frame)

# sorting the data in descending order
# Top N highest values by group
data_mod <- data_frame[order(data_frame$col2, decreasing = TRUE), ]

# organising the data by group
data_mod <- data.table(data_mod, key = "col1")

# getting top2 values
data_mod <- data_mod[, head(.SD, 2), by = col1]

# printing modified dataframe
print("Modified DataFrame")
print(data_mod)
Output:
Select Top N Highest Values by Group in R
In this article, we are going to see how to select the top N highest values by group in the R language.