Factor analysis requires positive definite correlation matrices. Unfortunately, with pairwise deletion of missing data or when using tetrachoric or polychoric correlations, not all correlation matrices are positive definite. cor.smooth does an eigenvector (principal components) smoothing. Negative eigenvalues are replaced with 100 * eig.tol, the matrix is reproduced, and the result is forced to a correlation matrix using cov2cor.

cor.smooth(x,eig.tol=10^-12)
cor.smoother(x,cut=.01)

Arguments

x

A correlation matrix or a raw data matrix.

eig.tol

the minimum acceptable eigenvalue

cut

Report all abs(residuals) > cut

Details

The smoothing is done by eigenvalue decomposition. Eigenvalues < eig.tol are changed to 100 * eig.tol. The positive eigenvalues are rescaled to sum to the number of items. The matrix is recomputed (eigen.vectors %*% diag(eigen.values) %*% t(eigen.vectors)) and forced to a correlation matrix using cov2cor. (See Bock, Gibbons and Muraki, 1988 and Wothke, 1993).
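The steps above can be sketched in base R. This is an illustrative reading of the description, not the psych source: the function name smooth_sketch is hypothetical, and it assumes that after the small eigenvalues are replaced, all eigenvalues are positive and are rescaled together. The matrix pr is the Knol and ten Berge example used in the Examples section.

```r
# Hypothetical helper illustrating the eigenvalue smoothing described above.
smooth_sketch <- function(R, eig.tol = 1e-12) {
  e <- eigen(R, symmetric = TRUE)
  # Nothing to do if the smallest eigenvalue is already positive.
  if (min(e$values) >= .Machine$double.eps) return(R)
  vals <- e$values
  vals[vals < eig.tol] <- 100 * eig.tol      # replace negative/tiny eigenvalues
  vals <- vals * ncol(R) / sum(vals)         # rescale to sum to the number of items
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)  # rebuild the matrix
  S <- cov2cor(S)                            # force a unit diagonal
  dimnames(S) <- dimnames(R)
  S
}

pr <- matrix(c(1,     0.477, 0.644, 0.478, 0.651, 0.826,
               0.477, 1,     0.516, 0.233, 0.682, 0.75,
               0.644, 0.516, 1,     0.599, 0.581, 0.742,
               0.478, 0.233, 0.599, 1,     0.741, 0.8,
               0.651, 0.682, 0.581, 0.741, 1,     0.798,
               0.826, 0.75,  0.742, 0.8,   0.798, 1),
             nrow = 6, ncol = 6)

sm_sketch <- smooth_sketch(pr)
min(eigen(pr, symmetric = TRUE, only.values = TRUE)$values)         # smallest eigenvalue before
min(eigen(sm_sketch, symmetric = TRUE, only.values = TRUE)$values)  # and after smoothing
```

Because the replaced eigenvalues are set to a very small positive number rather than zero, the result is positive definite but close to singular.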

This does not implement the Knol and ten Berge (1989) solution, nor do nearcor and posdefify in sfsmisc, nor does nearPD in Matrix. As Martin Maechler puts it in the posdefify function, "there are more sophisticated algorithms to solve this and related problems."
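For comparison, one of the alternatives named above can be tried directly. The following is a minimal sketch that assumes the Matrix package is installed; it applies nearPD with corr = TRUE to the Knol and ten Berge example matrix from the Examples section. The object names np and near are illustrative only.

```r
library(Matrix)  # assumed to be installed; provides nearPD

pr <- matrix(c(1,     0.477, 0.644, 0.478, 0.651, 0.826,
               0.477, 1,     0.516, 0.233, 0.682, 0.75,
               0.644, 0.516, 1,     0.599, 0.581, 0.742,
               0.478, 0.233, 0.599, 1,     0.741, 0.8,
               0.651, 0.682, 0.581, 0.741, 1,     0.798,
               0.826, 0.75,  0.742, 0.8,   0.798, 1),
             nrow = 6, ncol = 6)

np <- nearPD(pr, corr = TRUE)   # corr = TRUE asks for a correlation matrix
near <- as.matrix(np$mat)       # convert the result to a base R matrix

min(eigen(near, symmetric = TRUE, only.values = TRUE)$values)  # smallest eigenvalue
sum((pr - near)^2) / 2          # discrepancy from the original matrix
```

The last line uses the same discrepancy measure as the Examples section, so the two approaches can be compared on the same matrix.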

cor.smoother examines all nvar minors of rank nvar-1 by systematically dropping one variable at a time and finding the eigenvalue decomposition. It reports those variables which, when dropped, produce a positive definite matrix. It also reports the number of negative eigenvalues when each variable is dropped. Finally, it compares the original correlation matrix to the smoothed correlation matrix and reports those items with absolute deviations greater than cut. These are all hints as to what might be wrong with a correlation matrix.
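The drop-one-variable idea can be sketched in base R. This is not the cor.smoother source: the helper drop_one_check and the four-variable toy matrix are hypothetical, constructed so that the inconsistency lies among V1, V2 and V3.

```r
# Hypothetical helper: count negative eigenvalues after dropping each variable.
drop_one_check <- function(R) {
  n_neg <- vapply(seq_len(ncol(R)), function(i) {
    ev <- eigen(R[-i, -i, drop = FALSE], symmetric = TRUE,
                only.values = TRUE)$values
    sum(ev < 0)                      # number of negative eigenvalues without variable i
  }, integer(1))
  names(n_neg) <- colnames(R)
  n_neg
}

# Toy matrix: V1 correlates 0.9 with both V2 and V3, yet V2 and V3 correlate -0.9,
# which is impossible for real data. V4 is unrelated to the others.
R <- diag(4)
dimnames(R) <- list(paste0("V", 1:4), paste0("V", 1:4))
R["V1", "V2"] <- R["V2", "V1"] <- 0.9
R["V1", "V3"] <- R["V3", "V1"] <- 0.9
R["V2", "V3"] <- R["V3", "V2"] <- -0.9

n_neg <- drop_one_check(R)
n_neg                      # negative eigenvalue count for each dropped variable
names(n_neg)[n_neg == 0]   # variables whose removal gives a positive definite matrix
```

In this toy case, dropping any one of V1, V2 or V3 leaves a positive definite matrix, while dropping V4 does not, which points to the first three variables as the source of the problem.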

Value

The smoothed matrix with a warning reporting that smoothing was necessary (if smoothing was in fact necessary).

References

R. Darrell Bock, Robert Gibbons and Eiji Muraki (1988) Full-Information Item Factor Analysis. Applied Psychological Measurement, 12 (3), 261-280.

Werner Wothke (1993) Nonpositive definite matrices in structural modeling. In Kenneth A. Bollen and J. Scott Long (Editors), Testing structural equation models, Sage Publications, Newbury Park.

D.L. Knol and JMF ten Berge (1989) Least squares approximation of an improper correlation matrix by a proper one. Psychometrika, 54, 53-61.

See also

tetrachoric, polychoric, fa and irt.fa, and the burt data set.

See also nearcor and posdefify in the sfsmisc package and nearPD in the Matrix package.

Examples

bs <- cor.smooth(burt) #burt data set is not positive definite
#> Warning: Matrix was not positive definite, smoothing was done
plot(burt[lower.tri(burt)],bs[lower.tri(bs)],ylab="smoothed values",xlab="original values")
abline(0,1,lty="dashed")
round(burt - bs,3)
#>            Sociality Sorrow Tenderness   Joy Wonder Elation Disgust  Anger
#> Sociality      0.000  0.000      0.010 0.002  0.003   0.001   0.003  0.003
#> Sorrow         0.000  0.000      0.016 0.004  0.005   0.002   0.006  0.006
#> Tenderness     0.010  0.016      0.000 0.001 -0.001   0.002  -0.002 -0.003
#> Joy            0.002  0.004      0.001 0.000  0.000   0.000   0.000  0.000
#> Wonder         0.003  0.005     -0.001 0.000  0.000   0.000  -0.001  0.000
#> Elation        0.001  0.002      0.002 0.000  0.000   0.000   0.000  0.001
#> Disgust        0.003  0.006     -0.002 0.000 -0.001   0.000   0.000 -0.001
#> Anger          0.003  0.006     -0.003 0.000  0.000   0.001  -0.001  0.000
#> Sex            0.000 -0.001      0.004 0.001  0.001   0.000   0.001  0.002
#> Fear           0.000  0.002      0.001 0.000  0.000   0.000   0.000  0.000
#> Subjection     0.000  0.001      0.002 0.000  0.000   0.000   0.000  0.000
#>               Sex  Fear Subjection
#> Sociality   0.000 0.000      0.000
#> Sorrow     -0.001 0.002      0.001
#> Tenderness  0.004 0.001      0.002
#> Joy         0.001 0.000      0.000
#> Wonder      0.001 0.000      0.000
#> Elation     0.000 0.000      0.000
#> Disgust     0.001 0.000      0.000
#> Anger       0.002 0.000      0.000
#> Sex         0.000 0.000      0.000
#> Fear        0.000 0.000      0.000
#> Subjection  0.000 0.000      0.000
fa(burt,2) #this throws a warning that the matrix yields an improper solution
#> Warning: Matrix was not positive definite, smoothing was done
#> Warning: Matrix was not positive definite, smoothing was done
#> Warning: Matrix was not positive definite, smoothing was done
#> Warning: Matrix was not positive definite, smoothing was done
#> The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method.
#> Warning: An ultra-Heywood case was detected. Examine the results carefully
#> Factor Analysis using method = minres
#> Call: fa(r = burt, nfactors = 2)
#> Standardized loadings (pattern matrix) based upon correlation matrix
#>              MR1   MR2   h2     u2 com
#> Sociality   0.67  0.51 1.02 -0.021 1.9
#> Sorrow      0.26  0.79 0.88  0.119 1.2
#> Tenderness  0.03  0.88 0.79  0.205 1.0
#> Joy         0.45  0.41 0.54  0.463 2.0
#> Wonder      0.67  0.13 0.55  0.450 1.1
#> Elation     0.61  0.16 0.48  0.523 1.1
#> Disgust     0.39  0.26 0.31  0.690 1.7
#> Anger       0.79 -0.15 0.54  0.462 1.1
#> Sex         0.66 -0.08 0.40  0.605 1.0
#> Fear       -0.19  0.59 0.29  0.715 1.2
#> Subjection -0.43  0.60 0.31  0.689 1.8
#>
#>                        MR1  MR2
#> SS loadings           3.21 2.90
#> Proportion Var        0.29 0.26
#> Cumulative Var        0.29 0.55
#> Proportion Explained  0.53 0.47
#> Cumulative Proportion 0.53 1.00
#>
#> With factor correlations of
#>      MR1  MR2
#> MR1 1.00 0.46
#> MR2 0.46 1.00
#>
#> Mean item complexity = 1.4
#> Test of the hypothesis that 2 factors are sufficient.
#>
#> The degrees of freedom for the null model are 55 and the objective function was 29.97
#> The degrees of freedom for the model are 34 and the objective function was 23.27
#>
#> The root mean square of the residuals (RMSR) is 0.07
#> The df corrected root mean square of the residuals is 0.1
#>
#> Fit based upon off diagonal values = 0.97
#Smoothing first throws a warning that the matrix was improper,
#but produces a better solution
fa(cor.smooth(burt),2)
#> Warning: Matrix was not positive definite, smoothing was done
#> The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method.
#> Warning: An ultra-Heywood case was detected. Examine the results carefully
#> Factor Analysis using method = minres
#> Call: fa(r = cor.smooth(burt), nfactors = 2)
#> Standardized loadings (pattern matrix) based upon correlation matrix
#>              MR1   MR2   h2     u2 com
#> Sociality   0.68  0.50 1.02 -0.019 1.8
#> Sorrow      0.28  0.77 0.87  0.130 1.3
#> Tenderness  0.05  0.86 0.78  0.223 1.0
#> Joy         0.46  0.40 0.54  0.462 2.0
#> Wonder      0.68  0.12 0.55  0.451 1.1
#> Elation     0.61  0.15 0.48  0.523 1.1
#> Disgust     0.40  0.25 0.31  0.690 1.7
#> Anger       0.79 -0.16 0.53  0.465 1.1
#> Sex         0.67 -0.09 0.40  0.604 1.0
#> Fear       -0.19  0.60 0.29  0.711 1.2
#> Subjection -0.42  0.61 0.31  0.685 1.8
#>
#>                        MR1  MR2
#> SS loadings           3.25 2.82
#> Proportion Var        0.30 0.26
#> Cumulative Var        0.30 0.55
#> Proportion Explained  0.54 0.46
#> Cumulative Proportion 0.54 1.00
#>
#> With factor correlations of
#>      MR1  MR2
#> MR1 1.00 0.45
#> MR2 0.45 1.00
#>
#> Mean item complexity = 1.4
#> Test of the hypothesis that 2 factors are sufficient.
#>
#> The degrees of freedom for the null model are 55 and the objective function was 29.97
#> The degrees of freedom for the model are 34 and the objective function was 23.22
#>
#> The root mean square of the residuals (RMSR) is 0.07
#> The df corrected root mean square of the residuals is 0.09
#>
#> Fit based upon off diagonal values = 0.97
#this next example is a correlation matrix from DeLeuw used as an example
#in Knol and ten Berge.
#the example is also used in the nearcor documentation
cat("pr is the example matrix used in Knol DL, ten Berge (1989)\n")
#> pr is the example matrix used in Knol DL, ten Berge (1989)
pr <- matrix(c(1,     0.477, 0.644, 0.478, 0.651, 0.826,
               0.477, 1,     0.516, 0.233, 0.682, 0.75,
               0.644, 0.516, 1,     0.599, 0.581, 0.742,
               0.478, 0.233, 0.599, 1,     0.741, 0.8,
               0.651, 0.682, 0.581, 0.741, 1,     0.798,
               0.826, 0.75,  0.742, 0.8,   0.798, 1),
             nrow = 6, ncol = 6)
sm <- cor.smooth(pr)
#> Warning: Matrix was not positive definite, smoothing was done
resid <- pr - sm
# several goodness of fit tests
# from Knol and ten Berge
tr(resid %*% t(resid)) /2
#> [1] 0.003520413
# from nearPD
sum(resid^2)/2
#> [1] 0.003520413