Smooth a non-positive definite correlation matrix to make it positive definite

Factor analysis requires positive definite correlation matrices. Unfortunately, with pairwise deletion of missing data, or when using tetrachoric or polychoric correlations, not all correlation matrices are positive definite. cor.smooth does an eigenvector (principal components) smoothing: negative eigenvalues are replaced with 100 * eig.tol, the matrix is reproduced, and the result is forced to a correlation matrix using cov2cor.

cor.smooth(x,eig.tol=10^-12)
cor.smoother(x,cut=.01)

Arguments

x	A correlation matrix or a raw data matrix.
eig.tol	The minimum acceptable eigenvalue.
cut	Report all residuals whose absolute value exceeds cut.

Details

The smoothing is done by eigenvalue decomposition. Eigenvalues < eig.tol are changed to 100 * eig.tol. The positive eigenvalues are rescaled to sum to the number of items. The matrix is recomputed (eigen.vectors %*% diag(eigen.values) %*% t(eigen.vectors)) and forced to a correlation matrix using cov2cor. (See Bock, Gibbons and Muraki, 1988 and Wothke, 1993).
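The steps described above can be sketched in base R. This is a minimal illustration, not the psych source: the helper name smooth_sketch and the 3 x 3 test matrix are hypothetical, and the sketch assumes that all eigenvalues are rescaled together after the small ones have been replaced.

```r
# Hypothetical helper (not part of psych): eigenvalue smoothing as described above.
smooth_sketch <- function(r, eig.tol = 10^-12) {
  e <- eigen(r, symmetric = TRUE)
  vals <- e$values
  if (min(vals) < eig.tol) {
    vals[vals < eig.tol] <- 100 * eig.tol      # replace offending eigenvalues
    vals <- vals * nrow(r) / sum(vals)         # rescale to sum to the number of items
    smoothed <- e$vectors %*% diag(vals, nrow = length(vals)) %*% t(e$vectors)
    smoothed <- cov2cor(smoothed)              # force a unit diagonal
    dimnames(smoothed) <- dimnames(r)
    r <- smoothed
  }
  r
}

# An illustrative improper "correlation" matrix: three correlations of
# 0.9, -0.9 and 0.9 cannot all hold at once.
r <- matrix(c( 1.0, 0.9, -0.9,
               0.9, 1.0,  0.9,
              -0.9, 0.9,  1.0), nrow = 3)
min(eigen(r, symmetric = TRUE)$values)          # negative (about -0.8)
smoothed <- smooth_sketch(r)
min(eigen(smoothed, symmetric = TRUE)$values)   # small but positive
```

The smallest eigenvalue of the result is tiny rather than comfortably positive, so the smoothed matrix is only barely positive definite.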

This does not implement the Knol and ten Berge (1989) solution, nor do nearcor and posdefify in sfsmisc, nor does nearPD in Matrix. As Martin Maechler puts it in the posdefify function, "there are more sophisticated algorithms to solve this and related problems."
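For comparison, nearPD from the Matrix package can be applied to an improper matrix in the same way. The sketch below assumes Matrix is installed (it is skipped otherwise) and uses a hypothetical 3 x 3 matrix chosen only for illustration.

```r
# Assumes the Matrix package is available; the block does nothing if it is not.
if (requireNamespace("Matrix", quietly = TRUE)) {
  # Illustrative improper matrix (not taken from any data set).
  r <- matrix(c( 1.0, 0.9, -0.9,
                 0.9, 1.0,  0.9,
                -0.9, 0.9,  1.0), nrow = 3)
  np <- Matrix::nearPD(r, corr = TRUE)   # corr = TRUE keeps a unit diagonal
  near <- as.matrix(np$mat)              # convert the result to a base matrix
  print(round(near, 3))
  print(np$normF)                        # Frobenius norm of the difference from r
}
```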

cor.smoother examines all nvar minors of rank nvar-1 by systematically dropping one variable at a time and finding the eigenvalue decomposition. It reports those variables that, when dropped, produce a positive definite matrix. It also reports the number of negative eigenvalues when each variable is dropped. Finally, it compares the original correlation matrix to the smoothed correlation matrix and reports those items with absolute deviations greater than cut. These are all hints as to what might be wrong with a correlation matrix.
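The drop-one-variable idea can be illustrated with a short base R sketch. It is not the cor.smoother implementation: the helper name drop_one_sketch and the four-variable matrix are hypothetical, and the sketch only counts negative eigenvalues after each deletion.

```r
# Hypothetical helper (not part of psych): for each variable, drop it and
# count the negative eigenvalues of the remaining matrix.
drop_one_sketch <- function(r) {
  n_negative <- vapply(seq_len(ncol(r)), function(i) {
    ev <- eigen(r[-i, -i, drop = FALSE], symmetric = TRUE,
                only.values = TRUE)$values
    sum(ev < 0)
  }, integer(1))
  names(n_negative) <- colnames(r)
  n_negative
}

# Illustrative matrix: A, B and C form an impossible triad, D is unrelated.
vars <- c("A", "B", "C", "D")
r <- diag(4)
dimnames(r) <- list(vars, vars)
r["A", "B"] <- r["B", "A"] <- 0.9
r["A", "C"] <- r["C", "A"] <- -0.9
r["B", "C"] <- r["C", "B"] <- 0.9

res <- drop_one_sketch(r)
res   # a count of 0 means dropping that variable leaves a positive definite matrix
```

Here dropping any one of A, B or C removes the problem, while dropping D does not, which points to the A, B, C triad as the source of the trouble.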

Value

The smoothed matrix with a warning reporting that smoothing was necessary (if smoothing was in fact necessary).

References

R. Darrell Bock, Robert Gibbons and Eiji Muraki (1988) Full-Information Item Factor Analysis. Applied Psychological Measurement, 12 (3), 261-280.

Werner Wothke (1993) Nonpositive definite matrices in structural modeling. In Kenneth A. Bollen and J. Scott Long (Editors), Testing structural equation models, Sage Publications, Newbury Park.

D.L. Knol and JMF ten Berge (1989) Least squares approximation of an improper correlation matrix by a proper one. Psychometrika, 54, 53-61.

Examples

bs <- cor.smooth(burt)  #burt data set is not positive definite
#> Warning: Matrix was not positive definite, smoothing was done
plot(burt[lower.tri(burt)],bs[lower.tri(bs)],ylab="smoothed values",xlab="original values")
abline(0,1,lty="dashed")

round(burt - bs,3)
#>            Sociality Sorrow Tenderness   Joy Wonder Elation Disgust  Anger
#> Sociality      0.000  0.000      0.010 0.002  0.003   0.001   0.003  0.003
#> Sorrow         0.000  0.000      0.016 0.004  0.005   0.002   0.006  0.006
#> Tenderness     0.010  0.016      0.000 0.001 -0.001   0.002  -0.002 -0.003
#> Joy            0.002  0.004      0.001 0.000  0.000   0.000   0.000  0.000
#> Wonder         0.003  0.005     -0.001 0.000  0.000   0.000  -0.001  0.000
#> Elation        0.001  0.002      0.002 0.000  0.000   0.000   0.000  0.001
#> Disgust        0.003  0.006     -0.002 0.000 -0.001   0.000   0.000 -0.001
#> Anger          0.003  0.006     -0.003 0.000  0.000   0.001  -0.001  0.000
#> Sex            0.000 -0.001      0.004 0.001  0.001   0.000   0.001  0.002
#> Fear           0.000  0.002      0.001 0.000  0.000   0.000   0.000  0.000
#> Subjection     0.000  0.001      0.002 0.000  0.000   0.000   0.000  0.000
#>               Sex  Fear Subjection
#> Sociality   0.000 0.000      0.000
#> Sorrow     -0.001 0.002      0.001
#> Tenderness  0.004 0.001      0.002
#> Joy         0.001 0.000      0.000
#> Wonder      0.001 0.000      0.000
#> Elation     0.000 0.000      0.000
#> Disgust     0.001 0.000      0.000
#> Anger       0.002 0.000      0.000
#> Sex         0.000 0.000      0.000
#> Fear        0.000 0.000      0.000
#> Subjection  0.000 0.000      0.000
fa(burt,2) #this throws a warning that the matrix yields an improper solution
#> Warning: Matrix was not positive definite, smoothing was done
#> Warning: Matrix was not positive definite, smoothing was done
#> Warning: Matrix was not positive definite, smoothing was done
#> Warning: Matrix was not positive definite, smoothing was done
#> The estimated weights for the factor scores are probably incorrect.  Try a different factor extraction method.
#> Warning: An ultra-Heywood case was detected.  Examine the results carefully
#> Factor Analysis using method =  minres
#> Call: fa(r = burt, nfactors = 2)
#> Standardized loadings (pattern matrix) based upon correlation matrix
#>              MR1   MR2   h2     u2 com
#> Sociality   0.67  0.51 1.02 -0.021 1.9
#> Sorrow      0.26  0.79 0.88  0.119 1.2
#> Tenderness  0.03  0.88 0.79  0.205 1.0
#> Joy         0.45  0.41 0.54  0.463 2.0
#> Wonder      0.67  0.13 0.55  0.450 1.1
#> Elation     0.61  0.16 0.48  0.523 1.1
#> Disgust     0.39  0.26 0.31  0.690 1.7
#> Anger       0.79 -0.15 0.54  0.462 1.1
#> Sex         0.66 -0.08 0.40  0.605 1.0
#> Fear       -0.19  0.59 0.29  0.715 1.2
#> Subjection -0.43  0.60 0.31  0.689 1.8
#> 
#>                        MR1  MR2
#> SS loadings           3.21 2.90
#> Proportion Var        0.29 0.26
#> Cumulative Var        0.29 0.55
#> Proportion Explained  0.53 0.47
#> Cumulative Proportion 0.53 1.00
#> 
#>  With factor correlations of 
#>      MR1  MR2
#> MR1 1.00 0.46
#> MR2 0.46 1.00
#> 
#> Mean item complexity =  1.4
#> Test of the hypothesis that 2 factors are sufficient.
#> 
#> The degrees of freedom for the null model are  55  and the objective function was  29.97
#> The degrees of freedom for the model are 34  and the objective function was  23.27 
#> 
#> The root mean square of the residuals (RMSR) is  0.07 
#> The df corrected root mean square of the residuals is  0.1 
#> 
#> Fit based upon off diagonal values = 0.97
#Smoothing first throws a warning that the matrix was improper, 
#but produces a better solution 
fa(cor.smooth(burt),2)
#> Warning: Matrix was not positive definite, smoothing was done
#> The estimated weights for the factor scores are probably incorrect.  Try a different factor extraction method.
#> Warning: An ultra-Heywood case was detected.  Examine the results carefully
#> Factor Analysis using method =  minres
#> Call: fa(r = cor.smooth(burt), nfactors = 2)
#> Standardized loadings (pattern matrix) based upon correlation matrix
#>              MR1   MR2   h2     u2 com
#> Sociality   0.68  0.50 1.02 -0.019 1.8
#> Sorrow      0.28  0.77 0.87  0.130 1.3
#> Tenderness  0.05  0.86 0.78  0.223 1.0
#> Joy         0.46  0.40 0.54  0.462 2.0
#> Wonder      0.68  0.12 0.55  0.451 1.1
#> Elation     0.61  0.15 0.48  0.523 1.1
#> Disgust     0.40  0.25 0.31  0.690 1.7
#> Anger       0.79 -0.16 0.53  0.465 1.1
#> Sex         0.67 -0.09 0.40  0.604 1.0
#> Fear       -0.19  0.60 0.29  0.711 1.2
#> Subjection -0.42  0.61 0.31  0.685 1.8
#> 
#>                        MR1  MR2
#> SS loadings           3.25 2.82
#> Proportion Var        0.30 0.26
#> Cumulative Var        0.30 0.55
#> Proportion Explained  0.54 0.46
#> Cumulative Proportion 0.54 1.00
#> 
#>  With factor correlations of 
#>      MR1  MR2
#> MR1 1.00 0.45
#> MR2 0.45 1.00
#> 
#> Mean item complexity =  1.4
#> Test of the hypothesis that 2 factors are sufficient.
#> 
#> The degrees of freedom for the null model are  55  and the objective function was  29.97
#> The degrees of freedom for the model are 34  and the objective function was  23.22 
#> 
#> The root mean square of the residuals (RMSR) is  0.07 
#> The df corrected root mean square of the residuals is  0.09 
#> 
#> Fit based upon off diagonal values = 0.97

#this next example is a correlation matrix from DeLeuw used as an example 
#in Knol and ten Berge.  
#the example is also used in the nearcor documentation
 cat("pr is the example matrix used in Knol DL, ten Berge (1989)\n")
#> pr is the example matrix used in Knol DL, ten Berge (1989)
 pr <- matrix(c(1,     0.477, 0.644, 0.478, 0.651, 0.826,
                0.477, 1,     0.516, 0.233, 0.682, 0.75,
                0.644, 0.516, 1,     0.599, 0.581, 0.742,
                0.478, 0.233, 0.599, 1,     0.741, 0.8,
                0.651, 0.682, 0.581, 0.741, 1,     0.798,
                0.826, 0.75,  0.742, 0.8,   0.798, 1),
              nrow = 6, ncol = 6)

sm <- cor.smooth(pr)
#> Warning: Matrix was not positive definite, smoothing was done
resid <- pr - sm
# several goodness of fit tests
# from Knol and ten Berge
tr(resid %*% t(resid)) /2
#> [1] 0.003520413

# from nearPD
sum(resid^2)/2
#> [1] 0.003520413
