How to aggregate data using custom functions
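The aggregate() function in R splits the rows of a data frame into groups, applies a summary function to each group, and returns the results as a new data frame. Its FUN argument accepts any function that takes a vector of values and returns a summary. That includes a function you write yourself, so the same call pattern works for sums, means, medians, standard deviations or any other statistic you define. Some of the methods to aggregate data using custom functions are: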
Aggregating data by sum using a custom function
This method aggregates data by computing the sum of each group with a custom function. In the example below, we create a data frame of daily sales and use a custom function to sum the sold column for each month.
# creating data frame
df <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-15", "2024-02-10", "2024-02-20",
                   "2024-03-20", "2024-03-15")),
  sold = c(100, 150, 200, 250, 300, 350)
)
print("The original dataframe is")
print(df)
# custom function that returns the total of a group
cus_sum <- function(x) {
  return(sum(x))
}
print("Aggregating data per month is")
# format(date, "%Y-%m") turns each date into a "year-month" key to group by
sales_permonth <- aggregate(sold ~ format(date, "%Y-%m"),
                            data = df, FUN = cus_sum)
print(sales_permonth)
Output:
[1] "The original dataframe is"
date sold
1 2024-01-01 100
2 2024-01-15 150
3 2024-02-10 200
4 2024-02-20 250
5 2024-03-20 300
6 2024-03-15 350
[1] "Aggregating data per month is"
format(date, "%Y-%m") sold
1 2024-01 250
2 2024-02 450
3 2024-03 650
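The formula method names the grouping column after the expression format(date, "%Y-%m"), which is awkward to reference later. Below is a minimal sketch of the same monthly sum using the data-frame interface of aggregate(). Here by takes a named list, so the grouping column is called month. The name month and the helper cus_sum are our own choices for illustration.

```r
# daily sales data from the example above
df <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-15", "2024-02-10", "2024-02-20",
                   "2024-03-20", "2024-03-15")),
  sold = c(100, 150, 200, 250, 300, 350)
)

# custom function that returns the total of a group
cus_sum <- function(x) sum(x)

# x: data frame of columns to summarise; by: named list of grouping vectors
sales_permonth <- aggregate(x = df["sold"],
                            by = list(month = format(df$date, "%Y-%m")),
                            FUN = cus_sum)
print(sales_permonth)
```

This should produce the same totals (250, 450 and 650), next to a column named month instead of format(date, "%Y-%m").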
In the example below, we create a data frame of goods and their prices and use a custom function to compute the total price of each good.
goods <- c("a", "b", "c", "d", "b", "c", "a")
prices <- c(100, 200, 300, 400, 500, 600, 700)
# creating data frame
df <- data.frame(goods, prices)
print(df)
# custom function that returns the total of a group
cus_sum <- function(x) {
  return(sum(x))
}
print("Aggregating data by sum is")
res <- aggregate(prices ~ goods, data = df, FUN = cus_sum)
print(res)
Output:
goods prices
1 a 100
2 b 200
3 c 300
4 d 400
5 b 500
6 c 600
7 a 700
[1] "Aggregating data by sum is"
goods prices
1 a 800
2 b 700
3 c 900
4 d 400
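A custom function does not have to return a single number. If it returns a named vector of the same length for every group, aggregate() stores the results as a matrix column. The minimal sketch below reports both the total price and the number of rows for each good. The function name cus_summary and the labels total and count are illustrative assumptions.

```r
goods <- c("a", "b", "c", "d", "b", "c", "a")
prices <- c(100, 200, 300, 400, 500, 600, 700)
df <- data.frame(goods, prices)

# hypothetical custom function: total and number of observations per group
cus_summary <- function(x) {
  c(total = sum(x), count = length(x))
}

res <- aggregate(prices ~ goods, data = df, FUN = cus_summary)
print(res)

# the statistics live in a matrix column named after the value variable
print(res$prices[, "total"])
print(res$prices[, "count"])
```

When printed, the matrix column should appear as prices.total and prices.count. For example, good a should show a total of 800 and a count of 2.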
Aggregating data by mean using a custom function
This method aggregates data by computing the mean of each group with a custom function. In the example below, we create a data frame of scores and use a custom function to compute the mean score for each name.
names=c("a","a","b","c","c","b")
scores=c(100,95,90,80,85,70)
# creating data frame
df = data.frame(names,scores)
print("The original dataframe is")
print(df)
# calculating mean
cal_mean = function(x) {
return(mean(x))
}
print("After calculating the mean is")
result = aggregate(scores ~names, data = df,
FUN = cal_mean)
print(result)
Output:
[1] "The original dataframe is"
names scores
1 a 100
2 a 95
3 b 90
4 c 80
5 c 85
6 b 70
[1] "After calculating the mean is"
names scores
1 a 97.5
2 b 80.0
3 c 82.5
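When the data can contain missing values, it matters where they are handled. By default, the formula method of aggregate() drops incomplete rows (na.action = na.omit), so the custom function never sees the NA. Passing na.action = na.pass forwards the NA to the function, which then has to deal with it. The minimal sketch below assumes a hypothetical missing score in the second row.

```r
names <- c("a", "a", "b", "c", "c", "b")
scores <- c(100, NA, 90, 80, 85, 70)   # hypothetical: second score is missing
df <- data.frame(names, scores)

# custom mean that removes missing values itself
cal_mean <- function(x) {
  mean(x, na.rm = TRUE)
}

# na.pass keeps the row with NA, so cal_mean decides how to treat it
result <- aggregate(scores ~ names, data = df, FUN = cal_mean,
                    na.action = na.pass)
print(result)

# with na.pass and a plain mean(), group "a" comes out as NA instead
naive <- aggregate(scores ~ names, data = df, FUN = mean,
                   na.action = na.pass)
print(naive)
```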
In the example below, we create a data frame of team run rates and use a custom function to compute the mean run rate of each team.
team = c("csk", "rcb", "rcb", "srh", "srh", "csk", "csk")
run_rate = c(80, 85, 70, 85, 85, 86, 95)
# creating data frame
df = data.frame(team, run_rate)
print("The original dataframe is")
print(df)
cal_mean = function(x) {
return(mean(x))
}
print("After calculating the mean is")
# Aggregating data by group
result <- aggregate(run_rate ~ team, data = df,
FUN = cal_mean)
print(result)
Output:
[1] "The original dataframe is"
team run_rate
1 csk 80
2 rcb 85
3 rcb 70
4 srh 85
5 srh 85
6 csk 86
7 csk 95
[1] "After calculating the mean is"
team run_rate
1 csk 87.0
2 rcb 77.5
3 srh 85.0
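To apply the same custom function to several value columns at once, the formula method accepts cbind() on the left-hand side. The minimal sketch below assumes a hypothetical wickets column that is not part of the original data.

```r
team <- c("csk", "rcb", "rcb", "srh", "srh", "csk", "csk")
run_rate <- c(80, 85, 70, 85, 85, 86, 95)
wickets <- c(3, 5, 2, 4, 6, 1, 2)   # hypothetical values for illustration
df <- data.frame(team, run_rate, wickets)

cal_mean <- function(x) {
  return(mean(x))
}

# cbind() on the left-hand side summarises both columns per team
result <- aggregate(cbind(run_rate, wickets) ~ team, data = df,
                    FUN = cal_mean)
print(result)
```

A formula of the form . ~ team would likewise summarise every non-grouping column of df.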
Aggregating data by median using a custom function
This method aggregates data by computing the median of each group with a custom function. In the example below, we create a data frame of values in three categories and use a custom function to compute the median value of each category.
# Sample data
prices <- data.frame(
category = c("A", "A","A", "B", "B","B", "C", "C","C"),
values = c(10, 15, 20, 23, 30, 25, 40, 55, 60)
)
print("The original dataframe is")
print(prices)
# calculating median
cal_median = function(x) {
return(median(x))
}
result = aggregate(values ~ category,
data = prices, FUN = cal_median)
print("After calculating the median is")
print(result)
Output:
[1] "The original dataframe is"
category values
1 A 10
2 A 15
3 A 20
4 B 23
5 B 30
6 B 25
7 C 40
8 C 55
9 C 60
[1] "After calculating the median is"
category values
1 A 15
2 B 25
3 C 55
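FUN does not have to wrap a built-in summary. Any function that maps a vector to a value works, so you can encode your own statistic. The minimal sketch below uses a hypothetical cal_spread() that measures the distance between the largest and smallest value in each category.

```r
prices <- data.frame(
  category = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  values = c(10, 15, 20, 23, 30, 25, 40, 55, 60)
)

# hypothetical custom statistic: max minus min within a group
cal_spread <- function(x) {
  max(x) - min(x)
}

result <- aggregate(values ~ category, data = prices, FUN = cal_spread)
print(result)
```

This should give 10 for A, 7 for B and 20 for C.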
In the example below, we create a data frame of names and r_no values and use a custom function to compute the median r_no for each name.
name=c("a","b","c","b","a","b")
r_no=c(350,355,355,360,365,370)
# creating data frame
product_prices = data.frame(name, r_no )
print("The original dataframe is")
print(product_prices)
# To calculate median
calculate_median = function(x) {
return(median(x))
}
res<- aggregate(r_no~ name, data = product_prices,
FUN = calculate_median)
print(res)
Output:
[1] "The original dataframe is"
name r_no
1 a 350
2 b 355
3 c 355
4 b 360
5 a 365
6 b 370
name r_no
1 a 357.5
2 b 360.0
3 c 355.0
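You can also define groups by more than one variable by adding terms to the right-hand side of the formula with +. The minimal sketch below assumes a hypothetical section column.

```r
name <- c("a", "b", "c", "b", "a", "b")
r_no <- c(350, 355, 355, 360, 365, 370)
section <- c("x", "x", "y", "y", "x", "y")   # hypothetical grouping column
product_prices <- data.frame(name, r_no, section)

calculate_median <- function(x) {
  return(median(x))
}

# one row per observed combination of name and section
res <- aggregate(r_no ~ name + section, data = product_prices,
                 FUN = calculate_median)
print(res)
```

The result contains only combinations that occur in the data. Here that is four rows: (a, x), (b, x), (b, y) and (c, y).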
Aggregating data by standard deviation using a custom function
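This method aggregates data by computing the standard deviation of each group with a custom function. Note that sd() returns the sample standard deviation (denominator n - 1), so a group with a single observation yields NA. In the example below, we create a data frame of numbers in two batches and use a custom function to compute the standard deviation of each batch.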
batch = c("x", "y", "x", "y", "x","x")
number = c(20, 35, 20, 34, 25,40)
df <- data.frame(batch, number)
print(df)
cus_sd <- function(x) {
return(sd(x, na.rm = TRUE))
}
res = aggregate(number ~ batch, data = df, FUN = cus_sd)
print(res)
Output:
batch number
1 x 20
2 y 35
3 x 20
4 y 34
5 x 25
6 x 40
batch number
1 x 9.4648472
2 y 0.7071068
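Because FUN accepts any function, you can combine several statistics inside it. The minimal sketch below uses a hypothetical cus_cv() that returns the coefficient of variation (standard deviation divided by mean) for each batch.

```r
batch <- c("x", "y", "x", "y", "x", "x")
number <- c(20, 35, 20, 34, 25, 40)
df <- data.frame(batch, number)

# hypothetical custom statistic: coefficient of variation
cus_cv <- function(x) {
  sd(x) / mean(x)
}

res <- aggregate(number ~ batch, data = df, FUN = cus_cv)
print(res)
```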
In the example below, we create a data frame of students' CGPA values and use a custom function to compute the standard deviation of each student's CGPA.
names = c("raju", "ravi", "rakesh", "raju", "rakesh","ravi")
cgpa = c(7.5, 8.5, 7.0, 9.5, 8.8, 8.0)
df <- data.frame(names, cgpa)
print(df)
cus_sd <- function(x) {
return(sd(x, na.rm = TRUE))
}
print("After calculating the standard deviation is")
res = aggregate( cgpa ~ names, data = df, FUN = cus_sd)
print(res)
Output:
names cgpa
1 raju 7.5
2 ravi 8.5
3 rakesh 7.0
4 raju 9.5
5 rakesh 8.8
6 ravi 8.0
[1] "After calculating the standard deviation is"
names cgpa
1 raju 1.4142136
2 rakesh 1.2727922
3 ravi 0.3535534
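sd() computes the sample standard deviation, dividing by n - 1. If you need the population standard deviation (dividing by n), a custom function is a natural place to implement it. The name pop_sd in the minimal sketch below is our own.

```r
names <- c("raju", "ravi", "rakesh", "raju", "rakesh", "ravi")
cgpa <- c(7.5, 8.5, 7.0, 9.5, 8.8, 8.0)
df <- data.frame(names, cgpa)

# hypothetical custom function: population standard deviation (divides by n)
pop_sd <- function(x) {
  sqrt(mean((x - mean(x))^2))
}

res <- aggregate(cgpa ~ names, data = df, FUN = pop_sd)
print(res)
```

Each student here has two values, so these results equal the sample values above divided by sqrt(2): 1 for raju, 0.9 for rakesh and 0.25 for ravi.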
Conclusion
In this article, we explored how to aggregate data in R by passing custom functions to the FUN argument of aggregate() to compute the sum, mean, median and standard deviation of each group.