What is the describe() function?

The describe() function generates descriptive statistics for the variables in a data frame. It is not part of base R; it is provided by several add-on packages, of which Hmisc and psych are the most popular. This article shows how to use describe() from both packages:

Hmisc Package
psych Package

Installing the Required Packages

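Before using describe(), install and load the packages that provide it. Both packages export a function named describe(), and whichever package is loaded last masks the other, so it is safest to call Hmisc::describe() or psych::describe() explicitly. Install and load the packages with the following commands: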

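# Install the packages (needed only once per R installation)
install.packages("Hmisc")
install.packages("psych")

# Load both packages. psych is attached last, so a bare describe()
# call now refers to psych::describe(), not Hmisc::describe()
library(Hmisc)
library(psych)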
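Because both packages export describe(), a script that loads both can silently call the wrong one. The following base R sketch checks which of the two packages are installed with requireNamespace(), without attaching them, and then calls each package's describe() through the :: operator. The small data frame named example is only an illustration.

```r
# Report which of the two packages are installed, without attaching either
pkgs <- c("Hmisc", "psych")
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(available)

example <- data.frame(x = c(1, 2, 3, 4, 5))

# The :: operator calls a function from a specific package namespace, so the
# result does not depend on which package was attached last
if (available[["Hmisc"]]) print(Hmisc::describe(example))
if (available[["psych"]]) print(psych::describe(example))
```

If a package is missing, the corresponding call is simply skipped.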

Using describe() from the Hmisc Package

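The describe() function from the Hmisc package prints a detailed summary of each variable in a data frame: the number of non-missing and missing values, the number of distinct values and, for numeric variables, the mean and Gini's mean difference. Depending on how many distinct values a variable has, it adds either a frequency table or a set of quantiles with the lowest and highest values.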

library(Hmisc)
# Example data frame
data <- data.frame(
  age = c(25, 30, 35, 40, 45, NA),
  income = c(50000, 60000, 65000, 70000, 75000, 80000),
  gender = factor(c("male", "female", "female", "male", "male", "female"))
)
# Call Hmisc's version explicitly, since psych::describe() masks it once psych is loaded
Hmisc::describe(data)

Output:

data 

 3  Variables      6  Observations
----------------------------------------------------------------------------------------
age 
       n  missing distinct     Info     Mean      Gmd 
       5        1        5        1       35       10 
                              
Value       25  30  35  40  45
Frequency    1   1   1   1   1
Proportion 0.2 0.2 0.2 0.2 0.2

For the frequency table, variable is rounded to the nearest 0
----------------------------------------------------------------------------------------
income 
       n  missing distinct     Info     Mean      Gmd 
       6        0        6        1    66667    13333 
                                              
Value      50000 60000 65000 70000 75000 80000
Frequency      1     1     1     1     1     1
Proportion 0.167 0.167 0.167 0.167 0.167 0.167

For the frequency table, variable is rounded to the nearest 0
----------------------------------------------------------------------------------------
gender 
       n  missing distinct 
       6        0        2 
                        
Value      female   male
Frequency       3      3
Proportion    0.5    0.5
----------------------------------------------------------------------------------------

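For numeric variables, the output reports the number of non-missing values (n), missing values (missing), distinct values (distinct), an information measure (Info, which equals 1 when there are no tied values), the mean (Mean) and Gini's mean difference (Gmd). Because each numeric variable in this example has only a handful of distinct values, describe() prints a frequency table rather than quantiles. For factor variables, it shows the number of categories along with the frequency and proportion of each one.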
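To see where the numeric fields come from, here is a base R sketch that recomputes n, missing, distinct, Mean and Gmd for age, plus the frequency table for gender. The gini_md() helper is hand-written for this illustration and is not an Hmisc function; it assumes Gmd is the mean absolute difference over all pairs of observations, which reproduces the values printed in the output.

```r
# Example data frame from the Hmisc section
data <- data.frame(
  age = c(25, 30, 35, 40, 45, NA),
  income = c(50000, 60000, 65000, 70000, 75000, 80000),
  gender = factor(c("male", "female", "female", "male", "male", "female"))
)

# Hypothetical helper: mean absolute difference over all pairs of
# observations (i != j), ignoring missing values
gini_md <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  # outer() builds every pairwise difference; each pair appears twice
  # and the zero diagonal adds nothing, so divide by n * (n - 1)
  sum(abs(outer(x, x, "-"))) / (n * (n - 1))
}

age <- data$age
age_stats <- c(
  n        = sum(!is.na(age)),
  missing  = sum(is.na(age)),
  distinct = length(unique(age[!is.na(age)])),
  Mean     = mean(age, na.rm = TRUE),
  Gmd      = gini_md(age)
)
print(age_stats)

# Frequency and Proportion rows for the gender factor
freq <- table(data$gender)
print(rbind(Frequency = freq, Proportion = round(prop.table(freq), 3)))
```

Applied to income, the same gini_md() helper gives about 13333, matching the Gmd column.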

Using describe() from the psych Package

The describe() function from the psych package, designed with psychometric research in mind, returns a compact table with one row per variable. Alongside the usual measures of location and spread, it includes skewness and kurtosis.
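Skewness and kurtosis can also be computed by hand. The sketch below assumes psych's default estimator (type = 3), which divides the third and fourth central moments by powers of the sample standard deviation. The names skew_type3() and kurtosis_type3() are hypothetical helpers, not part of psych.

```r
# Example columns from the psych section
age <- c(25, 30, 35, 40, 45, NA)
income <- c(50000, 60000, 65000, 70000, 75000, 80000)

# Skewness: third central moment divided by the sample SD cubed
skew_type3 <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^3) / sd(x)^3
}

# Excess kurtosis: fourth central moment divided by the sample SD to the
# fourth power, minus 3 (roughly, negative values mean lighter tails
# than a normal distribution)
kurtosis_type3 <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^4) / sd(x)^4 - 3
}

print(round(c(skew = skew_type3(income), kurtosis = kurtosis_type3(income)), 2))
print(round(c(skew = skew_type3(age), kurtosis = kurtosis_type3(age)), 2))
```

For income this rounds to a skew of -0.26 and a kurtosis of -1.58, and for age to 0 and -1.91, the same values psych reports for this data.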

library(psych)
# Example data frame
data <- data.frame(
  age = c(25, 30, 35, 40, 45, NA),
  income = c(50000, 60000, 65000, 70000, 75000, 80000),
  gender = factor(c("male", "female", "female", "male", "male", "female"))
)
# Call psych's version explicitly so the result does not depend on package load order
psych::describe(data)

Output:

        vars n     mean       sd  median  trimmed      mad   min   max range  skew
age        1 5    35.00     7.91    35.0    35.00     7.41    25    45    20  0.00
income     2 6 66666.67 10801.23 67500.0 66666.67 11119.50 50000 80000 30000 -0.26
gender*    3 6     1.50     0.55     1.5     1.50     0.74     1     2     1  0.00
        kurtosis      se
age        -1.91    3.54
income     -1.58 4409.59
gender*    -2.31    0.22

In this output:

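vars indicates the variable index.
n is the number of non-missing values.
mean is the average.
sd is the sample standard deviation.
median is the middle value.
trimmed is the mean after trimming 10% of the observations from each tail (the default trim = 0.1).
mad is the median absolute deviation, scaled by 1.4826 as in R's mad() function.
min and max are the minimum and maximum values.
range is the difference between the maximum and minimum.
skew is the skewness of the distribution.
kurtosis is the excess kurtosis, a measure of the “tailedness” of the distribution relative to a normal distribution.
se is the standard error of the mean (sd divided by the square root of n).
The asterisk in gender* marks a categorical variable: psych converts the factor to its integer codes (here female = 1, male = 2) before computing the statistics, so those values should be interpreted with care.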
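Most of these columns map directly onto base R functions. The sketch below recomputes them for the income column; it assumes psych's default 10% trim and R's default mad() scaling, and it also shows how the gender factor turns into the numeric codes behind the gender* row.

```r
# income column from the psych example
x <- c(50000, 60000, 65000, 70000, 75000, 80000)
n <- length(x)

stats <- c(
  n       = n,
  mean    = mean(x),
  sd      = sd(x),               # sample SD, denominator n - 1
  median  = median(x),
  trimmed = mean(x, trim = 0.1), # drops 10% from each tail (none when n = 6)
  mad     = mad(x),              # median absolute deviation times 1.4826
  min     = min(x),
  max     = max(x),
  range   = max(x) - min(x),
  se      = sd(x) / sqrt(n)      # standard error of the mean
)
print(round(stats, 2))

# A factor is summarised through its integer codes; levels sort
# alphabetically, so female = 1 and male = 2
gender <- factor(c("male", "female", "female", "male", "male", "female"))
codes <- as.integer(gender)
print(c(mean = mean(codes), sd = sd(codes)))
```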

describe() Function in R

The describe() function in R is a useful tool for generating descriptive statistics. It provides a comprehensive summary of the variables in a data frame, including measures of central tendency, variability, and distribution shape, which makes it particularly valuable for preliminary data analysis and for understanding the basic characteristics of a dataset.
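For a quick first look without either package, a rough base R stand-in can cover the central tendency and variability parts. describe_base() is a hypothetical helper written for this illustration; it handles only numeric columns and omits the distribution measures that Hmisc and psych add.

```r
data <- data.frame(
  age = c(25, 30, 35, 40, 45, NA),
  income = c(50000, 60000, 65000, 70000, 75000, 80000),
  gender = factor(c("male", "female", "female", "male", "male", "female"))
)

# Hypothetical helper: one row of basic statistics per numeric column,
# computed after dropping missing values
describe_base <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]   # skip factors and text
  stats <- vapply(num, function(x) {
    x <- x[!is.na(x)]
    c(n = length(x), mean = mean(x), sd = sd(x),
      median = median(x), min = min(x), max = max(x))
  }, numeric(6))
  # vapply puts statistics in rows; transpose to one row per variable
  as.data.frame(t(stats))
}

print(describe_base(data))
```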
