What is the describe() function?
The describe() function is available in several R packages, with Hmisc and psych being two of the most widely used. This article shows how to use describe() from both packages:
- Hmisc Package
- psych Package
Installing the Required Packages
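Before using the describe() function, ensure that the necessary packages are installed and loaded. Both packages export a function named describe(), so when both are loaded the one attached last masks the other; writing Hmisc::describe() or psych::describe() removes the ambiguity. You can install and load the packages using the following commands: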
install.packages("Hmisc")
library(Hmisc)
install.packages("psych")
library(psych)
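Reinstalling packages on every run is unnecessary if they are already present. The following is a minimal sketch, assuming a hypothetical helper named missing_packages() (it is not part of Hmisc or psych), that uses base R's requireNamespace() to find which packages still need installing.

```r
# Hypothetical helper: return the names in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("Hmisc", "psych"))

# Install only what is missing, and only from an interactive session,
# so that scripts do not trigger downloads unexpectedly.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

The interactive() guard is a design choice for this sketch; drop it if a script should install packages unattended.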
Using describe() from the Hmisc Package
The describe() function from the Hmisc package provides a detailed summary of each variable in a data frame, including the number of missing values, unique values, mean, and quantiles.
library(Hmisc)
# Example data frame
data <- data.frame(
age = c(25, 30, 35, 40, 45, NA),
income = c(50000, 60000, 65000, 70000, 75000, 80000),
gender = factor(c("male", "female", "female", "male", "male", "female"))
)
# Using describe() from Hmisc
# (the Hmisc:: prefix guarantees the Hmisc version runs even if psych is also loaded)
Hmisc::describe(data)
Output:
data
3 Variables 6 Observations
----------------------------------------------------------------------------------------
age
n missing distinct Info Mean Gmd
5 1 5 1 35 10
Value 25 30 35 40 45
Frequency 1 1 1 1 1
Proportion 0.2 0.2 0.2 0.2 0.2
For the frequency table, variable is rounded to the nearest 0
----------------------------------------------------------------------------------------
income
n missing distinct Info Mean Gmd
6 0 6 1 66667 13333
Value 50000 60000 65000 70000 75000 80000
Frequency 1 1 1 1 1 1
Proportion 0.167 0.167 0.167 0.167 0.167 0.167
For the frequency table, variable is rounded to the nearest 0
----------------------------------------------------------------------------------------
gender
n missing distinct
6 0 2
Value female male
Frequency 3 3
Proportion 0.5 0.5
----------------------------------------------------------------------------------------
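For numeric variables, the output includes the number of non-missing observations (n), the number of missing values (missing), the number of distinct values (distinct), Info (a measure of how much information the variable carries relative to one with no ties), the mean, and the Gini mean difference (Gmd), followed by a frequency table with the count and proportion of each value. Hmisc treats a numeric variable with 10 or fewer distinct values as discrete and does not print quantiles for it, which is why none appear here. For factor variables, it shows n, missing, distinct, and the frequency and proportion of each category.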
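To make the numeric columns concrete, here is a base R sketch that recomputes n, missing, distinct, Mean, and Gmd for the age column. It assumes Gmd is the Gini mean difference, that is, the mean absolute difference over all pairs of distinct observations; the variable names are illustrative and this is not how Hmisc implements it internally.

```r
# Same age column as in the example data frame
age <- c(25, 30, 35, 40, 45, NA)
x <- age[!is.na(age)]            # statistics are computed on non-missing values

n        <- length(x)            # number of non-missing observations
missing  <- sum(is.na(age))      # number of missing values
distinct <- length(unique(x))    # number of distinct values
avg      <- mean(x)

# Gini mean difference: average |x_i - x_j| over all ordered pairs with i != j
gmd <- sum(abs(outer(x, x, "-"))) / (n * (n - 1))

print(c(n = n, missing = missing, distinct = distinct, Mean = avg, Gmd = gmd))
```

The result agrees with the age block of the Hmisc output: n = 5, missing = 1, distinct = 5, Mean = 35, Gmd = 10.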
Using describe() from the psych Package
The describe() function from the psych package also provides descriptive statistics, but in a compact table with one row per variable. The package is aimed at psychometric research, and its summary adds measures such as the trimmed mean, median absolute deviation, skewness, and kurtosis.
library(psych)
# Example data frame
data <- data.frame(
age = c(25, 30, 35, 40, 45, NA),
income = c(50000, 60000, 65000, 70000, 75000, 80000),
gender = factor(c("male", "female", "female", "male", "male", "female"))
)
# Using describe() from psych
# (the psych:: prefix guarantees the psych version runs even if Hmisc is also loaded)
psych::describe(data)
Output:
vars n mean sd median trimmed mad min max range skew
age 1 5 35.00 7.91 35.0 35.00 7.41 25 45 20 0.00
income 2 6 66666.67 10801.23 67500.0 66666.67 11119.50 50000 80000 30000 -0.26
gender* 3 6 1.50 0.55 1.5 1.50 0.74 1 2 1 0.00
kurtosis se
age -1.91 3.54
income -1.58 4409.59
gender* -2.31 0.22
In this output:
- vars indicates the variable index.
- n is the number of non-missing values.
- mean is the average.
- sd is the standard deviation.
- median is the middle value.
- trimmed is the mean after trimming 10% of the observations from each tail.
- mad is the median absolute deviation.
- min and max are the minimum and maximum values.
- range is the difference between the maximum and minimum.
- skew is the skewness of the distribution.
- kurtosis is the measure of the “tailedness” of the distribution.
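- se is the standard error of the mean (sd divided by the square root of n).
The asterisk after gender marks a non-numeric variable that psych converted to numeric codes (here female = 1, male = 2), so its statistics should be interpreted with caution.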
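The columns above can be reproduced for the age variable with base R alone. This is a sketch under stated assumptions: it uses what I understand to be psych's defaults (trim = 0.1 and the "type 3" skewness and kurtosis formulas), and the variable names are illustrative rather than taken from the package.

```r
age <- c(25, 30, 35, 40, 45, NA)
x <- age[!is.na(age)]            # drop missing values, as describe() does
n <- length(x)

# Central moments used by the skewness and kurtosis formulas
m2 <- mean((x - mean(x))^2)
m3 <- mean((x - mean(x))^3)
m4 <- mean((x - mean(x))^4)

stats_age <- c(
  n        = n,
  mean     = mean(x),
  sd       = sd(x),
  median   = median(x),
  trimmed  = mean(x, trim = 0.1),   # trims 10% from each tail
  mad      = mad(x),                # scaled by 1.4826 by default
  min      = min(x),
  max      = max(x),
  range    = max(x) - min(x),
  skew     = (m3 / m2^1.5) * ((n - 1) / n)^1.5,   # assumed type 3 formula
  kurtosis = (m4 / m2^2) * ((n - 1) / n)^2 - 3,   # assumed type 3 formula
  se       = sd(x) / sqrt(n)
)

print(round(stats_age, 2))
```

Rounded to two decimals, these values match the age row of the psych output, including sd = 7.91, mad = 7.41, kurtosis = -1.91, and se = 3.54.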
describe() Function in R
The describe() function, provided by add-on packages such as Hmisc and psych rather than by base R, is a useful tool for generating descriptive statistics. It summarizes the variables in a data frame, including measures of central tendency, variability, and distribution shape. This makes it particularly valuable for preliminary data analysis, where the goal is to understand the basic characteristics of a dataset.