Factor Analysis

Here’s a step-by-step explanation of factor analysis, followed by an example in R:

Step 1: Data Collection

Collect data on multiple observed variables (also called indicators or manifest variables). These variables are usually measured on a scale and are hypothesized to be influenced by underlying latent factors.

Step 2: Assumptions of Factor Analysis

Factor analysis makes several assumptions, including:

  • Linearity: The relationships between observed variables and latent factors are linear.
  • No Perfect Multicollinearity: There are no perfect linear relationships among the observed variables.
  • Common Variance: Observed variables share common variance due to latent factors.
  • Unique Variance: Each observed variable also has unique variance unrelated to the latent factors (variance specific to that variable plus measurement error).
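The assumptions above can be written compactly as the common factor model. The notation below is an assumption introduced here for illustration (it does not come from the original text): $x_i$ is the $i$-th observed variable, $f_j$ the $j$-th latent factor, $\lambda_{ij}$ a loading, and $\varepsilon_i$ the unique part with variance $\psi_i$. The variance decomposition on the second line additionally assumes standardized, uncorrelated factors.

```latex
% Linearity: each observed variable is a linear combination of the factors
x_i = \mu_i + \sum_{j=1}^{m} \lambda_{ij} f_j + \varepsilon_i ,
\qquad i = 1, \dots, p

% Common variance (communality) plus unique variance
\operatorname{Var}(x_i) = \underbrace{\sum_{j=1}^{m} \lambda_{ij}^{2}}_{\text{common}}
                        + \underbrace{\psi_i}_{\text{unique}}

% In matrix form, the implied covariance matrix of the observed variables
\Sigma = \Lambda \Lambda^{\top} + \Psi , \qquad \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p)
```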

Step 3: Factor Extraction

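Factor extraction is the process of estimating the underlying latent factors from the correlations among the observed variables. Common methods include principal axis factoring and maximum likelihood (ML) estimation. Principal Component Analysis (PCA) is also widely used for this step, although strictly speaking it summarizes total variance rather than modeling only the variance the variables share. In each case, the extracted factors are chosen to account for as much of the covariation among the observed variables as possible.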
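As a rough illustration of extraction, the sketch below simulates six indicators driven by two independent factors and then extracts two factors by maximum likelihood with base R's `factanal`, with a PCA via `prcomp` shown for comparison. The data, the variable names (`sim`, `x1` to `x6`) and the loading values are assumptions made up for this example.

```r
# Simulated data: two independent latent factors, six observed indicators
# (all names and coefficients are illustrative assumptions)
set.seed(42)
n <- 500
f <- matrix(rnorm(n * 2), ncol = 2)
lambda <- rbind(c(0.8, 0), c(0.7, 0), c(0.6, 0),
                c(0, 0.8), c(0, 0.7), c(0, 0.6))
sim <- as.data.frame(f %*% t(lambda) + matrix(rnorm(n * 6, sd = 0.6), ncol = 6))
names(sim) <- paste0("x", 1:6)

# Maximum likelihood extraction of two factors, no rotation yet
ml_fit <- factanal(sim, factors = 2, rotation = "none")
print(ml_fit$loadings, cutoff = 0)

# PCA on the correlation matrix, for comparison: components summarise
# total variance, whereas factors model only the shared variance
pca_fit <- prcomp(sim, scale. = TRUE)
print(summary(pca_fit)$importance[, 1:2])
```

Unrotated loadings like these are usually hard to read, which is what the rotation step addresses.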

Step 4: Factor Rotation

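After extraction, factors are often rotated to improve interpretability. Orthogonal rotations such as Varimax keep the factors uncorrelated, while oblique rotations such as Promax allow them to correlate. Both aim for a simpler structure in which each observed variable loads strongly on as few factors as possible.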
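The effect of rotation can be sketched with base R's `varimax` and `promax` functions applied to unrotated loadings. The simulated data set `sim` and its coefficients are assumptions invented for this example. The final check illustrates one property of orthogonal rotation: it redistributes variance between factors but leaves each variable's communality unchanged.

```r
# Simulated data: two independent latent factors, six indicators (illustrative)
set.seed(42)
n <- 500
f <- matrix(rnorm(n * 2), ncol = 2)
lambda <- rbind(c(0.8, 0), c(0.7, 0), c(0.6, 0),
                c(0, 0.8), c(0, 0.7), c(0, 0.6))
sim <- as.data.frame(f %*% t(lambda) + matrix(rnorm(n * 6, sd = 0.6), ncol = 6))
names(sim) <- paste0("x", 1:6)

# Extract two factors without rotation, then rotate the loading matrix
unrotated <- factanal(sim, factors = 2, rotation = "none")
L  <- unclass(loadings(unrotated))  # plain 6 x 2 matrix
vm <- varimax(L)                    # orthogonal rotation
pm <- promax(L)                     # oblique rotation

print(round(L, 2))
print(round(unclass(vm$loadings), 2))
print(round(unclass(pm$loadings), 2))

# Orthogonal rotation leaves the communalities (row sums of squares) unchanged
h2_before <- rowSums(L^2)
h2_after  <- rowSums(unclass(vm$loadings)^2)
print(all.equal(h2_before, h2_after))
```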

Step 5: Interpretation

Interpret the rotated factor loadings. Factor loadings represent the strength and direction of the relationship between each observed variable and each factor. A loading with a large absolute value indicates a strong relationship, and its sign gives the direction.
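One way to read a loading matrix is to hide the small loadings and note which factor each variable loads on most strongly. The sketch below does this on simulated data. The data set `sim`, its coefficients, and the 0.4 display cutoff are assumptions chosen for illustration, not fixed rules.

```r
# Simulated data: two independent latent factors, six indicators (illustrative)
set.seed(42)
n <- 500
f <- matrix(rnorm(n * 2), ncol = 2)
lambda <- rbind(c(0.8, 0), c(0.7, 0), c(0.6, 0),
                c(0, 0.8), c(0, 0.7), c(0, 0.6))
sim <- as.data.frame(f %*% t(lambda) + matrix(rnorm(n * 6, sd = 0.6), ncol = 6))
names(sim) <- paste0("x", 1:6)

fit <- factanal(sim, factors = 2, rotation = "varimax")

# Hide loadings below 0.4 in absolute value to make the pattern easier to see
print(loadings(fit), cutoff = 0.4)

# For each variable, find the factor with the largest absolute loading
L <- unclass(loadings(fit))
strongest <- apply(abs(L), 1, which.max)
dominant  <- colnames(L)[strongest]
print(data.frame(
  variable        = rownames(L),
  dominant_factor = dominant,
  loading         = round(L[cbind(seq_len(nrow(L)), strongest)], 2)
))
```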

Step 6: Naming and Using Factors

Based on the interpretation of factor loadings, you can give meaningful names to the factors. These names help in understanding the underlying constructs. Researchers often use these factors in subsequent analyses.
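To use factors in a later analysis, each observation needs an estimated value on each factor (a factor score). The sketch below requests regression-method scores from `factanal` on simulated data. The data set `sim` and its coefficients are assumptions invented for this example. Because the data are simulated, the estimated scores can be compared with the latent variables that generated them, which is not possible with real data.

```r
# Simulated data: two independent latent factors, six indicators (illustrative)
set.seed(42)
n <- 500
f <- matrix(rnorm(n * 2), ncol = 2)
lambda <- rbind(c(0.8, 0), c(0.7, 0), c(0.6, 0),
                c(0, 0.8), c(0, 0.7), c(0, 0.6))
sim <- as.data.frame(f %*% t(lambda) + matrix(rnorm(n * 6, sd = 0.6), ncol = 6))
names(sim) <- paste0("x", 1:6)

# Ask factanal for factor scores (one row per observation, one column per factor)
fit <- factanal(sim, factors = 2, rotation = "varimax", scores = "regression")
scores <- fit$scores
print(head(round(scores, 2)))

# Correlation between estimated scores (rows) and the simulated factors (columns)
print(round(cor(scores, f), 2))
```

The `scores` matrix can then be attached to the original data frame and used like any other numeric variable, for example as a predictor in a regression.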

Now, let’s look at an example in R:

# Load necessary libraries
library(psych)
  
# Generate sample data from three correlated latent factors
set.seed(123)
n <- 100
factor1 <- rnorm(n)
factor2 <- 0.7 * factor1 + rnorm(n)
factor3 <- 0.5 * factor1 + 0.5 * factor2 + rnorm(n)
observed1 <- 0.6 * factor1 + 0.2 * factor2 + rnorm(n)
observed2 <- 0.4 * factor1 + 0.8 * factor2 + rnorm(n)
observed3 <- 0.3 * factor1 + 0.5 * factor3 + rnorm(n)
  
# Create a data frame
data <- data.frame(observed1, observed2, observed3)
  
# Perform factor analysis
factor_analysis <- fa(data, nfactors = 3, rotate = "varimax")
  
# Print factor loadings
print(factor_analysis$loadings)


Output:

Loadings:
          MR1   MR2   MR3
observed1 0.169 0.419
observed2 0.574 0.544
observed3 0.582 0.233

                 MR1   MR2   MR3
SS loadings    0.697 0.526 0.000
Proportion Var 0.232 0.175 0.000
Cumulative Var 0.232 0.408 0.408


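In this R example, we first simulate three latent factors and three observed variables that depend on them. We then use the `fa` function from the `psych` package to perform factor analysis with a varimax rotation. The output lists the factor loadings, which indicate the strength and direction of the relationships between the observed variables and the extracted factors. Note that three observed variables do not carry enough information to identify three factors, which is why the third factor (MR3) comes out empty: none of its loadings are printed and its SS loading is 0.000. A common rule of thumb is to have at least three observed variables per factor.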

Here’s a breakdown of the output:

  • Loadings: This section gives the loading of each observed variable on the three extracted factors (MR1, MR2, and MR3). Loadings whose absolute value falls below the print cutoff (0.1 by default) are left blank, which is why no values appear for MR3.
  • SS Loadings: The sum of squared loadings for each factor, that is, the amount of variance in the standardized observed variables that the factor accounts for.
  • Proportion Var: SS loadings divided by the number of observed variables, which gives the proportion of total variance explained by each factor.
  • Cumulative Var: The running total of Proportion Var as more factors are added.
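The summary rows can be reproduced by hand from the printed loadings. The sketch below copies the rounded loadings from the output above into a matrix and treats the values that are not printed for MR3 as 0 (an assumption: they are hidden because they are small, not necessarily exactly zero).

```r
# Rounded loadings copied from the printed output; MR3 entries assumed to be 0
loadings_printed <- matrix(
  c(0.169, 0.419, 0,
    0.574, 0.544, 0,
    0.582, 0.233, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("observed", 1:3), c("MR1", "MR2", "MR3"))
)

ss_loadings <- colSums(loadings_printed^2)           # sum of squared loadings
prop_var    <- ss_loadings / nrow(loadings_printed)  # divide by number of variables
cum_var     <- cumsum(prop_var)                      # running total

print(round(rbind(ss_loadings, prop_var, cum_var), 3))
```

Up to rounding, the three rows match the SS loadings, Proportion Var, and Cumulative Var lines of the output.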

Factor Analysis on Iris Dataset

# Load the built-in iris dataset
data(iris)
  
# Perform factor analysis on the iris dataset
factanal_result <- factanal(iris[, 1:4], factors = 1, rotation = "varimax")
  
# Print the factor analysis results
print(factanal_result)


Output:

Call:
factanal(x = iris[, 1:4], factors = 1, rotation = "varimax")

Uniquenesses:
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       0.240        0.822        0.005        0.069

Loadings:
             Factor1
Sepal.Length  0.872
Sepal.Width  -0.422
Petal.Length  0.998
Petal.Width   0.965

               Factor1
SS loadings      2.864
Proportion Var   0.716

Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 85.51 on 2 degrees of freedom.
The p-value is 2.7e-19

In this example, we use the built-in iris dataset, which contains measurements of sepal length, sepal width, petal length, and petal width for three species of iris flowers. We perform factor analysis on the first four columns of the dataset (the measurements) using the `factanal` function, which fits the model by maximum likelihood. Because only one factor is requested, the `rotation` argument has no effect here.

The output includes:

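  • Uniquenesses: The proportion of variance in each observed variable that is not explained by the factor. Sepal.Width (0.822) is mostly unexplained, while Petal.Length (0.005) is almost entirely explained.
  • Loadings: The loading of each observed variable on the extracted factor. A large absolute value indicates a strong relationship, and the sign gives the direction: Sepal.Width loads negatively (-0.422), while the other three measurements load positively.
  • SS loadings and Proportion Var: The variance explained by the factor, here 2.864 out of 4 standardized variables, or 71.6%. With more than one factor, a Cumulative Var row is printed as well.
  • Test of the hypothesis: A chi-square test of whether the chosen number of factors is sufficient. The very small p-value (2.7e-19) rejects the hypothesis that one factor is enough, so the one-factor model does not fully reproduce the correlations in the data.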
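The quantities in this output are linked, and the links can be checked directly on the fitted object. The sketch below refits the same one-factor model and compares communalities with uniquenesses, then recomputes the degrees of freedom of the test from the standard formula for a model with p variables and m factors. The variable names (`fit`, `communality`, `dof`) are chosen here for illustration.

```r
# Refit the one-factor model on the four iris measurements
fit <- factanal(iris[, 1:4], factors = 1)

# Communality = sum of squared loadings per variable; together with the
# uniqueness it should come to roughly 1 for standardized variables
L <- unclass(loadings(fit))
communality <- rowSums(L^2)
print(round(cbind(communality,
                  uniqueness = fit$uniquenesses,
                  total      = communality + fit$uniquenesses), 3))

# Degrees of freedom of the sufficiency test: ((p - m)^2 - (p + m)) / 2
p <- 4  # observed variables
m <- 1  # factors
dof <- ((p - m)^2 - (p + m)) / 2
print(c(formula = dof, factanal = fit$dof))

# p-value of the test that one factor is sufficient
print(fit$PVAL)
```

The same formula gives a negative value for m = 2 with p = 4, so one factor is the most that this four-variable data set can support in `factanal`.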

By interpreting these factor loadings, researchers can gain insight into the underlying structure of the iris dataset. The fitted factors can also be used for dimensionality reduction or to create composite variables for further analysis.

Principal Components and Factor Analysis Using R

Factor analysis is a statistical technique used for dimensionality reduction and identifying the underlying structure (latent factors) in a dataset. It’s often applied in fields such as psychology, economics, and social sciences to understand the relationships between observed variables. Factor analysis assumes that observed variables can be explained by a smaller number of latent factors.
