What is the describe() function?

The describe() function is not part of base R. It is provided by several add-on packages, of which Hmisc and psych are the most popular. This article walks through the describe() function from both packages:

  1. Hmisc Package
  2. psych Package

Installing the Required Packages

Before using the describe() function, ensure that the necessary packages are installed and loaded. You can install the packages using the following commands:

install.packages("Hmisc")

library(Hmisc)

install.packages("psych")

library(psych)
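Both packages export a function named describe(), so once both are attached, the one attached last masks the other and a bare describe() call may not reach the package you intended. A minimal sketch of one way around this, assuming the packages may or may not be installed on your machine: check for each package with requireNamespace() and call each function through the :: operator. The data frame df is a made-up example, not part of either package.

```r
# Hypothetical example data; any data frame would do.
df <- data.frame(
  age    = c(25, 30, 35, 40, 45, NA),
  income = c(50000, 60000, 65000, 70000, 75000, 80000)
)

# requireNamespace() returns TRUE only if the package is installed,
# and it does not attach the package to the search path.
has_hmisc <- requireNamespace("Hmisc", quietly = TRUE)
has_psych <- requireNamespace("psych", quietly = TRUE)

# The :: operator names the package explicitly, so the result does not
# depend on the order of earlier library() calls.
if (has_hmisc) print(Hmisc::describe(df))
if (has_psych) print(psych::describe(df))
```

Writing the package name in front of the call is slightly more verbose, but it keeps scripts that load both packages unambiguous.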

Using describe() from the Hmisc Package

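The describe() function from the Hmisc package provides a detailed summary of each variable in a data frame, including the number of non-missing and missing values, the number of distinct values, and the mean. For variables with only a few distinct values, as in the example below, it prints a frequency table in place of quantiles.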

R
library(Hmisc)
# Example data frame
data <- data.frame(
  age = c(25, 30, 35, 40, 45, NA),
  income = c(50000, 60000, 65000, 70000, 75000, 80000),
  gender = factor(c("male", "female", "female", "male", "male", "female"))
)
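# Using describe() from Hmisc. The Hmisc:: prefix matters when psych is
# also attached, because psych::describe() would otherwise mask this one.
Hmisc::describe(data)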

Output:

data

 3 Variables   6 Observations
----------------------------------------------------------------------------------------
age
       n  missing distinct     Info     Mean      Gmd
       5        1        5        1       35       10

Value       25  30  35  40  45
Frequency    1   1   1   1   1
Proportion 0.2 0.2 0.2 0.2 0.2

For the frequency table, variable is rounded to the nearest 0
----------------------------------------------------------------------------------------
income
       n  missing distinct     Info     Mean      Gmd
       6        0        6        1    66667    13333

Value      50000 60000 65000 70000 75000 80000
Frequency      1     1     1     1     1     1
Proportion 0.167 0.167 0.167 0.167 0.167 0.167

For the frequency table, variable is rounded to the nearest 0
----------------------------------------------------------------------------------------
gender
       n  missing distinct
       6        0        2

Value      female   male
Frequency       3      3
Proportion    0.5    0.5
----------------------------------------------------------------------------------------

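For numeric variables, the output includes the number of non-missing observations (n), the number of missing values (missing), the number of distinct values (distinct), a relative information measure (Info), the mean (Mean), and Gini's mean difference (Gmd), which is a measure of dispersion. The standard deviation is not reported. Because each variable here has only a few distinct values, the summary shows a frequency table with counts and proportions rather than percentiles. For factor variables, it shows n, missing, and distinct, followed by the frequency and proportion of each category.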
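The n, missing, distinct, Mean, and Gmd columns for age can be cross-checked in base R. The sketch below is not Hmisc's own implementation. The helper gini_mean_difference() is a hypothetical name introduced here, and it assumes the usual definition of Gini's mean difference as the mean absolute difference between all pairs of different observations.

```r
age <- c(25, 30, 35, 40, 45, NA)

# Counts shown in the first three columns.
n        <- sum(!is.na(age))               # non-missing observations
missing  <- sum(is.na(age))                # missing observations
distinct <- length(unique(na.omit(age)))   # distinct non-missing values

# Hypothetical helper: mean absolute difference over all pairs i != j.
gini_mean_difference <- function(x) {
  x <- x[!is.na(x)]
  k <- length(x)
  # outer() builds every pairwise difference; each pair appears twice
  # and the diagonal is zero, so divide by k * (k - 1).
  sum(abs(outer(x, x, "-"))) / (k * (k - 1))
}

c(n = n, missing = missing, distinct = distinct,
  Mean = mean(age, na.rm = TRUE), Gmd = gini_mean_difference(age))
```

For age this gives n = 5, missing = 1, distinct = 5, Mean = 35, and Gmd = 10, in line with the age block of the output above.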

Using describe() from the psych Package

The describe() function from the psych package also provides a summary of descriptive statistics, but with a focus on psychological data. It includes measures such as skewness and kurtosis.

R
library(psych)
# Example data frame
data <- data.frame(
  age = c(25, 30, 35, 40, 45, NA),
  income = c(50000, 60000, 65000, 70000, 75000, 80000),
  gender = factor(c("male", "female", "female", "male", "male", "female"))
)
# Using describe() from psych. The psych:: prefix makes the call
# unambiguous when Hmisc is attached as well.
psych::describe(data)

Output:

        vars n     mean       sd  median  trimmed      mad   min   max range  skew
age        1 5    35.00     7.91    35.0    35.00     7.41    25    45    20  0.00
income     2 6 66666.67 10801.23 67500.0 66666.67 11119.50 50000 80000 30000 -0.26
gender*    3 6     1.50     0.55     1.5     1.50     0.74     1     2     1  0.00
        kurtosis      se
age        -1.91    3.54
income     -1.58 4409.59
gender*    -2.31    0.22

In this output:

  • vars indicates the variable index.
  • n is the number of non-missing values.
  • mean is the average.
  • sd is the standard deviation.
  • median is the middle value.
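  • trimmed is the trimmed mean, computed by default after dropping 10% of the observations from each tail. With only five or six observations, 10% rounds down to zero values per tail, so trimmed equals mean here.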
  • mad is the median absolute deviation.
  • min and max are the minimum and maximum values.
  • range is the difference between the maximum and minimum.
  • skew is the skewness of the distribution.
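  • kurtosis is a measure of the “tailedness” of the distribution, reported as excess kurtosis (a normal distribution scores about 0).
  • se is the standard error of the mean (sd divided by the square root of n).
  • An asterisk after a variable name, as in gender*, marks a categorical variable that was converted to numeric codes before being summarized, so its mean, sd, and similar statistics should be read with care.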
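Several of these columns can be reproduced for income in base R, which helps make the definitions concrete. This is a sketch, not psych's source code. The helper central_moment() is a hypothetical name, and the skew and kurtosis formulas are an assumption about the default variant psych uses (moment ratios rescaled by (n - 1) / n), chosen because they reproduce the values printed above.

```r
income <- c(50000, 60000, 65000, 70000, 75000, 80000)
n <- length(income)

# Hypothetical helper: k-th central moment, dividing by n (not n - 1).
central_moment <- function(x, k) mean((x - mean(x))^k)
m2 <- central_moment(income, 2)
m3 <- central_moment(income, 3)
m4 <- central_moment(income, 4)

stats <- c(
  trimmed  = mean(income, trim = 0.1),   # 10% dropped from each end
  mad      = mad(income),                # scaled by 1.4826 by default
  se       = sd(income) / sqrt(n),       # standard error of the mean
  # Assumed default formulas for skew and excess kurtosis:
  skew     = m3 / m2^1.5 * ((n - 1) / n)^1.5,
  kurtosis = m4 / m2^2 * ((n - 1) / n)^2 - 3
)
round(stats, 2)

# A factor is summarized through its integer codes (female = 1, male = 2),
# which is where the mean of 1.5 in the gender* row comes from.
gender <- factor(c("male", "female", "female", "male", "male", "female"))
mean(as.numeric(gender))
```

Rounded to two decimals, stats matches the income row above: trimmed 66666.67, mad 11119.5, se 4409.59, skew -0.26, and kurtosis -1.58.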

describe() Function in R

The describe() function, provided in the R programming language by packages such as Hmisc and psych, is a useful tool for generating descriptive statistics. It summarizes the variables in a data frame, covering central tendency, variability, and distribution measures. This makes it particularly valuable for preliminary data analysis, where the goal is to understand the basic characteristics of a dataset.

Conclusion

The describe() function in R is a powerful tool for generating descriptive statistics. Whether using the Hmisc or psych package, describe() provides a comprehensive summary of your data, making it easier to understand the basic characteristics of your dataset. Here’s a quick comparison:...