Factor Analysis
Here’s a step-by-step explanation of factor analysis, followed by an example in R:
Step 1: Data Collection
Collect data on multiple observed variables (also called indicators or manifest variables). These variables are usually measured on a scale and are hypothesized to be influenced by underlying latent factors.
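As a rough illustration of what "collected data" should look like before factoring, the sketch below runs a few basic checks. It assumes the built-in `iris` measurements stand in for your own indicators; the variable names are illustrative only.

```r
# Illustrative only: the built-in iris measurements stand in for collected data.
X <- iris[, 1:4]                                   # keep the numeric indicators

all_numeric <- all(vapply(X, is.numeric, logical(1)))  # every indicator is on a numeric scale
n_missing   <- sum(!complete.cases(X))                 # rows with at least one missing value
obs_per_var <- nrow(X) / ncol(X)                       # observations per indicator

print(all_numeric)
print(n_missing)
print(obs_per_var)
print(round(cor(X), 2))   # the correlations that the latent factors are meant to explain
```

The correlation matrix is the real input to factor analysis, so it is worth inspecting before fitting any model.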
Step 2: Assumptions of Factor Analysis
Factor analysis makes several assumptions, including:
- Linearity: The relationships between observed variables and latent factors are linear.
- No Perfect Multicollinearity: There are no perfect linear relationships among the observed variables.
- Common Variance: Observed variables share common variance due to latent factors.
- Unique Variance: Each observed variable also has unique variance unrelated to the latent factors (variable-specific variance plus measurement error).
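The multicollinearity and common-variance assumptions above can be checked informally from the correlation matrix. This is a minimal sketch, assuming the `iris` measurements as example data; it is a sanity check, not a formal test.

```r
# Informal assumption checks on the correlation matrix (iris used as example data).
X <- iris[, 1:4]
R <- cor(X)

eig   <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
det_R <- det(R)

# Perfect multicollinearity would show up as a zero eigenvalue / zero determinant.
print(round(eig, 3))
print(det_R)

# Off-diagonal correlations show whether the variables share variance at all.
max_offdiag <- max(abs(R[upper.tri(R)]))
print(round(max_offdiag, 3))
```

A determinant that is tiny but positive means the variables are strongly related without being exact linear combinations of one another.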
Step 3: Factor Extraction
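Factor extraction is the process of estimating the underlying latent factors. Common extraction methods include principal components, principal axis factoring, minimum residual, and Maximum Likelihood Estimation (MLE). Strictly speaking, Principal Component Analysis (PCA) is a related but different technique: components summarize the total variance of the observed variables, whereas factor methods model only the variance the variables share.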
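To make the difference between component and factor extraction concrete, here is a small sketch on simulated data. The data-generating setup (two independent factors, six indicators, loadings of 0.8) is an assumption chosen for illustration and is not part of the original example.

```r
# Simulated data: two independent factors, three indicators each (illustrative setup).
set.seed(42)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
noise <- function() 0.6 * rnorm(n)   # unique part of each indicator
sim <- data.frame(
  x1 = 0.8 * f1 + noise(), x2 = 0.8 * f1 + noise(), x3 = 0.8 * f1 + noise(),
  x4 = 0.8 * f2 + noise(), x5 = 0.8 * f2 + noise(), x6 = 0.8 * f2 + noise()
)

# Principal components of the correlation matrix, scaled to loading units.
pca          <- prcomp(sim, scale. = TRUE)
pca_loadings <- pca$rotation[, 1:2] %*% diag(pca$sdev[1:2])

# Maximum likelihood factor analysis with two factors, no rotation.
ml          <- factanal(sim, factors = 2, rotation = "none")
ml_loadings <- unclass(loadings(ml))

# Variance of each variable accounted for by the two dimensions.
pca_comm <- rowSums(pca_loadings^2)
ml_comm  <- rowSums(ml_loadings^2)
print(round(cbind(PCA = pca_comm, ML = ml_comm), 2))
```

In this setup the component solution attributes more variance to the two dimensions than the ML solution does, because components absorb part of the unique variance.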
Step 4: Factor Rotation
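After extraction, factors are often rotated to improve interpretability. Orthogonal rotations such as Varimax keep the factors uncorrelated, while oblique rotations such as Promax allow them to correlate. In both cases the goal is a simpler structure in which each variable loads strongly on as few factors as possible.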
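The effect of rotation can be sketched with base R's `varimax()` and `promax()`. The example below assumes the built-in `ability.cov` covariance data (six ability tests), which is not used elsewhere in this article, purely because it supports a two-factor solution.

```r
# Unrotated two-factor ML solution on the built-in ability.cov data (illustrative choice).
fit <- factanal(factors = 2, covmat = ability.cov, rotation = "none")
L0  <- unclass(loadings(fit))

vm <- varimax(L0)   # orthogonal rotation
pm <- promax(L0)    # oblique rotation

L_vm <- unclass(vm$loadings)
L_pm <- unclass(pm$loadings)

print(round(L0, 2))
print(round(L_vm, 2))
print(round(L_pm, 2))

# An orthogonal rotation redistributes variance across factors
# but leaves each variable's communality unchanged.
comm_before <- rowSums(L0^2)
comm_after  <- rowSums(L_vm^2)
print(round(cbind(comm_before, comm_after), 3))
```

Rotation changes how the loadings are distributed across factors, not how well the model fits.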
Step 5: Interpretation
Interpret the rotated factor loadings. Factor loadings represent the strength and direction of the relationship between each observed variable and each factor. A loading that is large in absolute value indicates a strong relationship, and its sign indicates the direction.
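A minimal sketch of reading loadings, assuming a one-factor `factanal` fit to the `iris` measurements. The 0.4 cutoff for a "salient" loading is a common rule of thumb and an assumption here, not a fixed standard.

```r
# One-factor ML fit; loadings, communalities, and uniquenesses side by side.
fit <- factanal(iris[, 1:4], factors = 1)
L   <- unclass(loadings(fit))[, 1]

summary_tbl <- data.frame(
  loading     = round(L, 3),
  communality = round(L^2, 3),              # variance explained by the factor
  uniqueness  = round(fit$uniquenesses, 3), # variance left unexplained
  salient     = abs(L) >= 0.4               # rule-of-thumb cutoff (assumption)
)
print(summary_tbl)
```

Communality and uniqueness add up to roughly 1 for each variable because the model is fitted to the correlation matrix.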
Step 6: Naming and Using Factors
Based on the interpretation of factor loadings, you can give meaningful names to the factors. These names help in understanding the underlying constructs. Researchers often use these factors in subsequent analyses.
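One way to use factors in later analyses is to compute factor scores. The sketch below assumes `factanal` with `scores = "regression"` on the `iris` measurements; comparing mean scores by species is only an illustrative follow-up analysis.

```r
# Estimate one factor score per observation, then use it like any other variable.
fit    <- factanal(iris[, 1:4], factors = 1, scores = "regression")
scores <- fit$scores[, 1]

# Illustrative follow-up: average factor score within each species.
by_species <- tapply(scores, iris$Species, mean)
print(round(by_species, 2))
```

Factor scores are estimates, not observed values, so results built on them inherit the uncertainty of the factor model.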
Now let's look at an example in R:
R
# Load necessary libraries
library(psych)

# Generate sample data with three (correlated) latent factors
set.seed(123)
n <- 100
factor1 <- rnorm(n)
factor2 <- 0.7 * factor1 + rnorm(n)
factor3 <- 0.5 * factor1 + 0.5 * factor2 + rnorm(n)
observed1 <- 0.6 * factor1 + 0.2 * factor2 + rnorm(n)
observed2 <- 0.4 * factor1 + 0.8 * factor2 + rnorm(n)
observed3 <- 0.3 * factor1 + 0.5 * factor3 + rnorm(n)

# Create a data frame
data <- data.frame(observed1, observed2, observed3)

# Perform factor analysis
factor_analysis <- fa(data, nfactors = 3, rotate = "varimax")

# Print factor loadings
print(factor_analysis$loadings)
Output:
Loadings:
          MR1   MR2   MR3
observed1 0.169 0.419
observed2 0.574 0.544
observed3 0.582 0.233

                 MR1   MR2   MR3
SS loadings    0.697 0.526 0.000
Proportion Var 0.232 0.175 0.000
Cumulative Var 0.232 0.408 0.408
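In this R example, we first generate sample data with three latent factors and three observed variables. We then use the `fa` function from the `psych` package to perform factor analysis. The output includes factor loadings, which indicate the strength and direction of the relationships between the observed variables and the latent factors. The `MR` prefix reflects minimum residual extraction, the default method in `fa`. Note that three observed variables cannot identify three factors: the model has more free parameters than the correlation matrix has distinct entries, which is why MR3 comes out empty (SS loadings of 0.000). A common guideline is to have at least three indicators per factor.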
Here’s a breakdown of the output:
- Loadings: This section provides the factor loadings for each observed variable on the three extracted factors (MR1, MR2, and MR3). Factor loadings represent the strength and direction of the relationship between observed variables and latent factors. Blank cells are loadings below the default print cutoff of 0.1 in absolute value, which is why the MR3 column is empty.
- SS Loadings: These are the sums of squared loadings for each factor, indicating the amount of variance in the observed variables explained by each factor.
- Proportion Var: This shows the proportion of total variance explained by each factor.
- Cumulative Var: This shows the cumulative proportion of total variance explained as more factors are added.
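The summary rows can be reproduced by hand from the loadings. The sketch below retypes the printed loadings for MR1 and MR2 (MR3 is omitted because its loadings are suppressed in the printout) and recomputes the three statistics.

```r
# Loadings as printed in the output above (MR3 omitted: all values below the print cutoff).
loadings_printed <- matrix(
  c(0.169, 0.419,
    0.574, 0.544,
    0.582, 0.233),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("observed1", "observed2", "observed3"), c("MR1", "MR2"))
)

ss_loadings <- colSums(loadings_printed^2)            # sum of squared loadings per factor
prop_var    <- ss_loadings / nrow(loadings_printed)   # divide by the number of variables
cum_var     <- cumsum(prop_var)                       # running total across factors

print(round(rbind(ss_loadings, prop_var, cum_var), 3))
```

The results match the printed SS loadings (0.697, 0.526), Proportion Var (0.232, 0.175), and Cumulative Var (0.232, 0.408).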
Factor Analysis on Iris Dataset
R
# Load the built-in iris dataset
data(iris)

# Perform factor analysis on the iris dataset
factanal_result <- factanal(iris[, 1:4], factors = 1, rotation = "varimax")

# Print the factor analysis results
print(factanal_result)
Output:
Call:
factanal(x = iris[, 1:4], factors = 1, rotation = "varimax")
Uniquenesses:
Sepal.Length Sepal.Width Petal.Length Petal.Width
0.240 0.822 0.005 0.069
Loadings:
Factor1
Sepal.Length 0.872
Sepal.Width -0.422
Petal.Length 0.998
Petal.Width 0.965
Factor1
SS loadings 2.864
Proportion Var 0.716
Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 85.51 on 2 degrees of freedom.
The p-value is 2.7e-19
In this example, we use the built-in iris dataset, which contains measurements of sepal length, sepal width, petal length, and petal width for three species of iris flowers. We perform factor analysis on the first four columns of the dataset (the measurements) using the `factanal` function, which fits the model by maximum likelihood. With a single factor there is nothing to rotate, so `rotation = "varimax"` has no effect here.
The output includes:
- Uniquenesses: These values represent the unique variance in each observed variable that is not explained by the factors.
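- Loadings: These values represent the factor loadings for each observed variable on the extracted factor. A large absolute value indicates a strong relationship, and the sign gives its direction: Sepal.Width (-0.422) moves opposite to the other three measurements.
- SS loadings and Proportion Var: These statistics summarize the variance explained by the extracted factor (about 72% here). Cumulative Var is printed only when more than one factor is extracted.
- Test of the hypothesis: This section provides a chi-square test of whether the selected number of factors is sufficient. The very small p-value (2.7e-19) means the one-factor model is rejected, so one factor does not fully reproduce the correlations. With only four variables, however, a two-factor model has negative degrees of freedom and cannot be fitted.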
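The degrees of freedom in the test follow from the number of variables p and factors k as ((p - k)^2 - (p + k)) / 2. The sketch below checks this against the fitted model; `fa_df` is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: degrees of freedom of a factor model with p variables, k factors.
fa_df <- function(p, k) ((p - k)^2 - (p + k)) / 2

print(fa_df(4, 1))   # one factor, four variables
print(fa_df(4, 2))   # two factors: negative, so the model cannot be fitted

# The fitted object stores the test statistic, its degrees of freedom, and the p-value.
fit <- factanal(iris[, 1:4], factors = 1)
print(c(statistic = unname(fit$STATISTIC),
        dof       = unname(fit$dof),
        p_value   = unname(fit$PVAL)))

# Requesting two factors for four variables is refused with an error.
two_factor <- tryCatch(factanal(iris[, 1:4], factors = 2), error = function(e) e)
print(inherits(two_factor, "error"))
```

When the test rejects the model but no larger model is estimable, adding more indicators is usually the only way to test a richer factor structure.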
Interpreting these loadings gives insight into the underlying structure of the iris dataset: a single factor dominated by the petal measurements accounts for most of the shared variance. Such a factor can be useful for dimensionality reduction or for creating composite variables for further analysis.
Principal Components and Factor Analysis Using R
Factor analysis is a statistical technique used for dimensionality reduction and identifying the underlying structure (latent factors) in a dataset. It’s often applied in fields such as psychology, economics, and social sciences to understand the relationships between observed variables. Factor analysis assumes that observed variables can be explained by a smaller number of latent factors.
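The idea that observed variables are explained by a smaller number of latent factors is usually written as the common factor model. The notation below is the standard textbook form for uncorrelated factors; the symbols are assumptions introduced here and do not appear in the original text.

```latex
% Common factor model for p observed variables and k < p latent factors
% x: observed variables (p x 1), \mu: means, \Lambda: loadings (p x k),
% f: latent factors (k x 1), \varepsilon: unique parts (p x 1)
\[
  x = \mu + \Lambda f + \varepsilon,
  \qquad
  \operatorname{Cov}(f) = I_k,
  \qquad
  \operatorname{Cov}(\varepsilon) = \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p)
\]
% Implied covariance structure: common variance plus unique variance
\[
  \Sigma = \Lambda \Lambda^{\top} + \Psi
\]
```

The diagonal of \(\Lambda \Lambda^{\top}\) holds the communalities and \(\Psi\) holds the uniquenesses reported in factor analysis output.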
Table of Contents
- Factor Analysis
- Unveiling Hidden Insights: Principal Components and Factor Analysis Using R
- Understanding the Foundation: Principal Components Analysis (PCA)
- Let’s Walk Through with few Examples