Factor Analysis

Here’s a step-by-step explanation of factor analysis, followed by an example in R:

Step 1: Data Collection

Collect data on multiple observed variables (also called indicators or manifest variables). These variables are usually measured on a scale and are hypothesized to be influenced by underlying latent factors.

Step 2: Assumptions of Factor Analysis

Factor analysis makes several assumptions, including:

Linearity: The relationships between observed variables and latent factors are linear.
No Perfect Multicollinearity: There are no perfect linear relationships among the observed variables.
Common Variance: Observed variables share common variance due to latent factors.
Unique Variance: Each observed variable also has unique variance unrelated to the latent factors (variance specific to that variable plus measurement error).
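Some of these assumptions can be checked informally before fitting a model. The following is a minimal sketch in base R, assuming the built-in iris measurements as example data; the variable names are illustrative. It inspects the correlation matrix (observed variables should be correlated if they share common variance) and confirms that the matrix is not singular (no perfect multicollinearity).

```r
# Example data (assumption: the four numeric iris measurements)
X <- iris[, 1:4]

# Correlation matrix of the observed variables
R <- cor(X)
print(round(R, 2))

# Off-diagonal correlations: values near zero everywhere would suggest
# there is little common variance for factors to explain
off_diag <- R[upper.tri(R)]
print(round(range(abs(off_diag)), 2))

# A determinant of exactly zero (or a zero eigenvalue) would indicate
# perfect multicollinearity among the observed variables
det_R <- det(R)
min_eig <- min(eigen(R, symmetric = TRUE, only.values = TRUE)$values)
print(c(determinant = det_R, smallest_eigenvalue = min_eig))
```

A determinant that is positive but very close to zero is worth a closer look, since it suggests that some variables are nearly redundant.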

Step 3: Factor Extraction

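Factor extraction is the process of identifying the underlying latent factors. Common methods for factor extraction include Principal Component Analysis (PCA) and Maximum Likelihood Estimation (MLE). Strictly speaking, PCA summarizes the total variance of the observed variables rather than modeling only their shared variance, but it is widely used as an extraction step. In both cases the goal is a small number of factors that account for as much of the structure in the observed variables as possible.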
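As a rough illustration of extraction, the sketch below computes the eigenvalues of the iris correlation matrix and then fits a maximum likelihood factor model with `factanal`. The rule of keeping factors whose eigenvalue exceeds 1 is a common heuristic and an assumption here, not something prescribed by this article.

```r
# Correlation matrix of the four iris measurements
R <- cor(iris[, 1:4])

# Eigenvalues show how much variance each principal component captures
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
prop_var <- ev / sum(ev)
print(round(rbind(eigenvalue = ev, proportion = prop_var), 3))

# Heuristic (assumption): retain components with eigenvalue > 1
n_retain <- sum(ev > 1)

# Maximum likelihood extraction with that number of factors
ml_fit <- factanal(iris[, 1:4], factors = n_retain)
print(ml_fit$loadings)
```

The eigenvalues of a correlation matrix always sum to the number of variables, so each proportion can be read directly as a share of total variance.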

Step 4: Factor Rotation

After extraction, factors are often rotated to improve interpretability. Rotation methods (e.g., Varimax, Promax) help in achieving a simpler and more interpretable factor structure.
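To see what rotation does, the sketch below applies the base R functions `varimax` and `promax` to a small loadings matrix. As an assumption for illustration, the unrotated loadings are taken from the first two principal components of the iris correlation matrix rather than from a fitted factor model.

```r
# Illustrative unrotated loadings (assumption: PCA-style loadings stand in
# for a fitted factor model): eigenvectors scaled by sqrt(eigenvalue)
R <- cor(iris[, 1:4])
e <- eigen(R, symmetric = TRUE)
L <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))
dimnames(L) <- list(colnames(R), c("F1", "F2"))

# Orthogonal rotation (factors stay uncorrelated)
vm <- varimax(L)

# Oblique rotation (factors are allowed to correlate)
pm <- promax(L)

print(round(unclass(vm$loadings), 3))
print(round(unclass(pm$loadings), 3))

# An orthogonal rotation redistributes variance between factors but leaves
# each variable's communality (row sum of squared loadings) unchanged
h2_before <- rowSums(L^2)
h2_after <- rowSums(unclass(vm$loadings)^2)
print(round(cbind(h2_before, h2_after), 3))
```

Varimax keeps the factors orthogonal, whereas Promax relaxes that constraint, so its loadings are usually interpreted together with the correlations between factors.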

Step 5: Interpretation

Interpret the rotated factor loadings. Factor loadings represent the strength and direction of the relationship between each observed variable and each factor. Loadings with a large absolute value indicate a strong relationship, and the sign indicates its direction.
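One simple way to read a loadings matrix is to find, for each variable, the factor with the largest absolute loading, and to flag variables that load noticeably on more than one factor. The sketch below uses the rounded MR1 and MR2 loadings from the simulated example later in this article; the 0.4 threshold is a hypothetical cut-off chosen only for illustration.

```r
# Rounded loadings copied from the simulated example's output
L <- matrix(c(0.169, 0.419,
              0.574, 0.544,
              0.582, 0.233),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("observed", 1:3), c("MR1", "MR2")))

# Factor with the largest absolute loading for each variable
dominant <- colnames(L)[apply(abs(L), 1, which.max)]
names(dominant) <- rownames(L)
print(dominant)

# Hypothetical threshold for calling a loading "salient"
threshold <- 0.4

# Variables with salient loadings on more than one factor (cross-loadings)
cross_loading <- rowSums(abs(L) >= threshold) > 1
print(cross_loading)
```

Variables flagged as cross-loading are harder to assign to a single construct and often deserve a second look when naming factors.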

Step 6: Naming and Using Factors

Based on the interpretation of factor loadings, you can give meaningful names to the factors. These names help in understanding the underlying constructs. Researchers often use these factors in subsequent analyses.
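Using factors in later analyses usually means computing factor scores, one value per observation and factor. A minimal sketch, assuming the one-factor iris model used later in this article and the regression scoring method that `factanal` offers:

```r
# Fit a one-factor model and request regression-based factor scores
fit <- factanal(iris[, 1:4], factors = 1, scores = "regression")

# One score per flower on the single extracted factor
scores <- fit$scores
print(head(scores))

# Attach the scores to the data and summarize them by species
iris_scored <- cbind(iris, Factor1 = scores[, 1])
species_means <- aggregate(Factor1 ~ Species, data = iris_scored, FUN = mean)
print(species_means)
```

The score column can then be used like any other variable, for example as a predictor in a regression or as a single composite measure in place of the four original measurements.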

Now, let’s look at an example in R:

R

# Load necessary libraries 
library(psych) 
  
# Generate sample data with three latent factors 
set.seed(123) 
n <- 100 
factor1 <- rnorm(n) 
factor2 <- 0.7 * factor1 + rnorm(n) 
factor3 <- 0.5 * factor1 + 0.5 * factor2 + rnorm(n) 
observed1 <- 0.6 * factor1 + 0.2 * factor2 + rnorm(n) 
observed2 <- 0.4 * factor1 + 0.8 * factor2 + rnorm(n) 
observed3 <- 0.3 * factor1 + 0.5 * factor3 + rnorm(n) 
  
# Create a data frame 
data <- data.frame(observed1, observed2, observed3) 
  
# Perform factor analysis 
factor_analysis <- fa(data, nfactors = 3, rotate = "varimax") 
  
# Print factor loadings 
print(factor_analysis$loadings)

Output:

Loadings:
          MR1   MR2   MR3  
observed1 0.169 0.419      
observed2 0.574 0.544      
observed3 0.582 0.233      
                 MR1   MR2   MR3
SS loadings    0.697 0.526 0.000
Proportion Var 0.232 0.175 0.000
Cumulative Var 0.232 0.408 0.408

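In this R example, we first generate sample data from three correlated latent variables and three observed variables. We then use the `fa` function from the `psych` package to perform factor analysis. Its default extraction method is minimum residual, which is why the factors are labeled MR1, MR2, and MR3. The output includes factor loadings, which indicate the strength and direction of the relationships between the observed variables and the latent factors. Note that three observed variables cannot support three distinct factors: the loadings on MR3 are too small to be printed and it explains none of the variance, so only MR1 and MR2 carry information here.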

Here’s a breakdown of the output:

Loadings: This section provides the factor loadings for each observed variable on the three extracted factors (MR1, MR2, and MR3). Factor loadings represent the strength and direction of the relationship between observed variables and latent factors. Blank entries are small loadings that the print method suppresses.
SS Loadings: These are the sums of squared loadings for each factor, indicating the amount of variance in the (standardized) observed variables explained by each factor.
Proportion Var: This shows the proportion of total variance explained by each factor, that is, the SS loadings divided by the number of observed variables.
Cumulative Var: This shows the cumulative proportion of total variance explained as more factors are added.
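The summary rows can be reproduced by hand from the printed loadings. The sketch below re-enters the rounded MR1 and MR2 loadings from the output above (MR3 is omitted because it contributes nothing), so the results agree with the table only up to rounding.

```r
# Rounded loadings copied from the output above
L <- matrix(c(0.169, 0.419,
              0.574, 0.544,
              0.582, 0.233),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("observed", 1:3), c("MR1", "MR2")))

# SS loadings: column sums of squared loadings
ss_loadings <- colSums(L^2)

# Proportion Var: SS loadings divided by the number of observed variables
prop_var <- ss_loadings / nrow(L)

# Cumulative Var: running total of the proportions
cum_var <- cumsum(prop_var)

print(round(rbind(ss_loadings, prop_var, cum_var), 3))
```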

Factor Analysis on Iris Dataset

R

# Load the built-in iris dataset 
data(iris) 
  
# Perform factor analysis on the iris dataset 
factanal_result <- factanal(iris[, 1:4], factors = 1, rotation = "varimax") 
  
# Print the factor analysis results 
print(factanal_result)

Output:

Call:
factanal(x = iris[, 1:4], factors = 1, rotation = "varimax")
Uniquenesses:
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       0.240        0.822        0.005        0.069 
Loadings:
             Factor1
Sepal.Length  0.872 
Sepal.Width  -0.422 
Petal.Length  0.998 
Petal.Width   0.965 
               Factor1
SS loadings      2.864
Proportion Var   0.716
Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 85.51 on 2 degrees of freedom.
The p-value is 2.7e-19

In this example, we use the built-in iris dataset, which contains measurements of sepal length, sepal width, petal length, and petal width for three species of iris flowers. We perform a maximum likelihood factor analysis on the first four columns of the dataset (the measurements) using the `factanal` function. Because only one factor is extracted, the varimax rotation has no effect on the solution.

The output includes:

Uniquenesses: These values represent the proportion of variance in each observed variable that is not explained by the factors.
Loadings: These values represent the factor loadings for each observed variable on the extracted factor. Loadings with a large absolute value indicate a strong relationship; a negative sign (as for Sepal.Width) means the variable decreases as the factor increases.
SS loadings and Proportion Var: These statistics provide information about the variance explained by the extracted factor. With a single factor, no Cumulative Var row is printed.
Test of the hypothesis: This section provides a chi-square test of whether the selected number of factors is sufficient to reproduce the observed correlations. Here the p-value (2.7e-19) is very small, so the hypothesis that one factor is sufficient is rejected.
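For a one-factor model, the uniqueness of each variable is simply one minus its squared loading, which can be checked against the output above. The sketch below re-enters the rounded loadings, so small differences from the printed values are expected; in addition, `factanal` by default does not let a uniqueness fall below 0.005, which appears to explain the Petal.Length value.

```r
# Rounded loadings copied from the factanal output above
loadings_f1 <- c(Sepal.Length = 0.872, Sepal.Width = -0.422,
                 Petal.Length = 0.998, Petal.Width = 0.965)

# Communality: share of each variable's variance explained by the factor
communality <- loadings_f1^2

# Uniqueness: the remaining, unexplained share
uniqueness <- 1 - communality

# SS loadings and the proportion of total variance explained
ss_loadings <- sum(communality)
prop_var <- ss_loadings / length(loadings_f1)

print(round(uniqueness, 3))
print(round(c(ss_loadings = ss_loadings, prop_var = prop_var), 3))
```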

Factor analysis helps in understanding the underlying structure of the iris dataset and can be useful for dimensionality reduction or for creating composite variables for further analysis. Here, the single factor is dominated by petal length, petal width, and sepal length, while sepal width loads negatively and remains largely unexplained (uniqueness 0.822).

Principal Components and Factor Analysis Using R

Factor analysis is a statistical technique used for dimensionality reduction and identifying the underlying structure (latent factors) in a dataset. It’s often applied in fields such as psychology, economics, and social sciences to understand the relationships between observed variables. Factor analysis assumes that observed variables can be explained by a smaller number of latent factors.

Table of Content

Factor Analysis
Unveiling Hidden Insights: Principal Components and Factor Analysis Using R
Understanding the Foundation: Principal Components Analysis (PCA)
Let’s Walk Through with few Examples
