The stats implementation of rWishart is in C and is very fast. It is often the case, however, that we do not want a sample from the Wishart distribution itself, but rather from its inverse, from the Cholesky decomposition of a Wishart sample, or even from the inverse of the Cholesky decomposition of a Wishart sample. Funnily enough (if you have a weird sense of humor), when you inspect the source code for the rWishart function (R Core Team (2017)), it generates the Cholesky decomposition and then multiplies it out. Meanwhile, drawing from rWishart and then inverting or taking a Cholesky decomposition in R is comparatively slow. This suggests an obvious efficiency: if we would rather have the Cholesky decomposition of the Wishart random matrix, we could tell the function to stop right there.
library('CholWishart')
set.seed(20180220)
A <- stats::rWishart(1,10,5*diag(4))[,,1]
set.seed(20180220)
B <- rInvWishart(1,10,.2*diag(4))[,,1]
set.seed(20180220)
C <- rCholWishart(1,10,5*diag(4))[,,1]
set.seed(20180220)
D <- rInvCholWishart(1,10,.2*diag(4))[,,1]
Suppose \(X_i \sim MVN(0, \Sigma)\), \(i = 1, 2, \ldots, n\), are independent \(p\)-variate normal random vectors with \(n > p-1\). Then \(S = \sum_{i=1}^n X_i X_i^T\), called the “scatter matrix”, is almost surely positive definite if \(\Sigma\) is positive definite. The random variable \(S\) is said to be distributed as a Wishart random variable: \(S \sim W_p(n, \Sigma)\); see Gupta and Nagar (1999). This extends to non-integer degrees of freedom as well.
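A sketch of this construction in base R (with \(\Sigma = \mathbf{I}_4\), so the rows of X are standard normal vectors):

```r
set.seed(1)
n <- 10; p <- 4
X <- matrix(rnorm(n * p), nrow = n)  # n independent N(0, I_p) rows
S <- crossprod(X)                    # scatter matrix: sum of outer products
all(eigen(S, only.values = TRUE)$values > 0)  # positive definite since n > p - 1
```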
How does rWishart(n, df, Sigma) work (supposing Sigma is a \(p \times p\) matrix)? First, it generates the Cholesky factor of a sample from a Wishart distribution with \(\Sigma = \mathbf{I}_p\): the \(i^{th}\) element of the main diagonal is drawn as \(\sqrt{\chi_{\nu-i+1}^2}\), where \(\nu\) is the df argument, and each entry of the upper triangle is an independent draw from \(N(0,1)\). This factor can then be multiplied by the Cholesky decomposition of the provided Sigma to obtain the Cholesky factor of the desired sample from the Wishart distribution (this construction is due to Bartlett and is also known as the Bartlett decomposition; see Anderson (1984)). The rWishart function multiplies this out. Therefore, if the Cholesky decomposition is desired, one only needs to stop there.
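The Bartlett construction is easy to sketch in base R; the following is an illustrative reimplementation, not the C code in stats (recall that R's chol returns the upper-triangular factor):

```r
set.seed(20180220)
nu <- 10; p <- 4; Sigma <- 5 * diag(p)

# Upper-triangular Bartlett factor A with crossprod(A) ~ W_p(nu, I_p)
A <- matrix(0, p, p)
diag(A) <- sqrt(rchisq(p, df = nu - seq_len(p) + 1))  # chi draws on the diagonal
A[upper.tri(A)] <- rnorm(p * (p - 1) / 2)             # N(0,1) above the diagonal

# Rotate by chol(Sigma): crossprod(A %*% chol(Sigma)) ~ W_p(nu, Sigma)
L <- A %*% chol(Sigma)
W <- crossprod(L)
isSymmetric(W) && all(eigen(W, only.values = TRUE)$values > 0)
```

Stopping at L gives the Cholesky factor directly; forming W is the "multiplying it out" step that rWishart performs.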
If \(X \sim \textrm{W}_p(\nu,\Sigma)\), then \(X^{-1} = Y \sim \textrm{IW}_p(\nu , \Sigma^{-1})\) is said to have an Inverse Wishart distribution. There are other parameterizations of the distribution, mostly coming down to different ways of writing the \(\nu\) parameter; be aware of this when using any package that draws from the Inverse Wishart distribution (see Dawid (1981) for an alternative; this presentation follows Gupta and Nagar (1999)). The Inverse Wishart comes up directly in Bayesian statistics.
We are also interested in the Cholesky decomposition of the Inverse Wishart, as it is required in the generation of the matrix variate \(t\)-distribution. In this package, the computation proceeds as follows: invert the covariance matrix, compute the Cholesky decomposition of that inverse, use that factor to draw the Cholesky factor of a Wishart matrix, and then invert the draw (finding \(\Psi^{-1}\) given the Cholesky factorization of \(\Psi\) is relatively fast). The result can then be converted into the Cholesky factor of the Inverse Wishart if that is what is desired. This would be slow to do in R, but in C it is not so bad.
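The key efficiency, inverting a matrix from its Cholesky factor, is available in base R as chol2inv; a quick check that it agrees with a direct solve:

```r
set.seed(1)
S <- crossprod(matrix(rnorm(40), 10, 4))  # a positive-definite 4 x 4 matrix

R <- chol(S)          # upper-triangular factor with S = t(R) %*% R
S_inv <- chol2inv(R)  # inverse recovered from the factor, cheaper than solve(S)
all.equal(S_inv, solve(S))
```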
Here is what happens with the results of the above:
A %*% B
## [,1] [,2] [,3] [,4]
## [1,] 1.000000e+00 -2.775558e-17 -1.387779e-16 -5.551115e-17
## [2,] -4.718448e-16 1.000000e+00 1.387779e-17 0.000000e+00
## [3,] -1.249001e-16 1.387779e-17 1.000000e+00 -5.551115e-17
## [4,] 1.110223e-16 -1.110223e-16 0.000000e+00 1.000000e+00
crossprod(C) %*% crossprod(D) # note: we do not expect C = D^-1, we expect this!
## [,1] [,2] [,3] [,4]
## [1,] 1.000000e+00 -2.775558e-17 -2.081668e-16 -1.110223e-16
## [2,] -4.718448e-16 1.000000e+00 1.387779e-16 0.000000e+00
## [3,] -1.249001e-16 1.110223e-16 1.000000e+00 -1.110223e-16
## [4,] 1.110223e-16 -8.326673e-17 -8.326673e-17 1.000000e+00
crossprod(D) %*% A
## [,1] [,2] [,3] [,4]
## [1,] 1.000000e+00 -4.718448e-16 -1.249001e-16 1.110223e-16
## [2,] -2.775558e-17 1.000000e+00 1.110223e-16 -8.326673e-17
## [3,] -2.081668e-16 1.387779e-16 1.000000e+00 -8.326673e-17
## [4,] -1.110223e-16 0.000000e+00 -1.110223e-16 1.000000e+00
crossprod(C) %*% B
## [,1] [,2] [,3] [,4]
## [1,] 1.000000e+00 -2.775558e-17 -1.387779e-16 -5.551115e-17
## [2,] -4.718448e-16 1.000000e+00 1.387779e-17 0.000000e+00
## [3,] -1.249001e-16 1.387779e-17 1.000000e+00 -5.551115e-17
## [4,] 1.110223e-16 -1.110223e-16 0.000000e+00 1.000000e+00
There is some roundoff error.
Suppose, instead of the above definition of the Wishart, we have \(n \leq p-1\). Then the scatter matrix defined above is singular rather than positive definite; its rank is \(n\) almost surely. This is called the pseudo Wishart distribution. If we then take the Moore-Penrose pseudo-inverse of such a matrix, we have the generalized inverse Wishart distribution.
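The rank deficiency follows directly from the scatter-matrix construction; a base-R sketch with \(n = 3\) vectors in \(p = 5\) dimensions:

```r
set.seed(2)
n <- 3; p <- 5
X <- matrix(rnorm(n * p), nrow = n)  # only 3 observations in 5 dimensions
S <- crossprod(X)                    # 5 x 5 scatter matrix
qr(S)$rank                           # rank is n = 3, not p = 5
```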
A <- rPseudoWishart(n = 1, df = 3, Sigma = diag(5))[, , 1]
A
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.9553698 0.3134085 -0.8487259 -0.5174803 0.7487007
## [2,] 0.3134085 2.3856728 -0.6106572 1.4471373 1.5976931
## [3,] -0.8487259 -0.6106572 0.5838199 -0.7509533 -0.7863404
## [4,] -0.5174803 1.4471373 -0.7509533 4.8520003 1.6696925
## [5,] 0.7487007 1.5976931 -0.7863404 1.6696925 1.4396817
qr(A)$rank
## [1] 3
B <- rGenInvWishart(n = 1, df = 3, Sigma = diag(5))[, , 1]
B
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.33422500 0.189387794 0.07086651 -0.052283855 -0.04241885
## [2,] 0.18938779 0.155647000 -0.02450637 0.001357582 -0.07549583
## [3,] 0.07086651 -0.024506366 0.10489215 -0.052325490 0.03515182
## [4,] -0.05228386 0.001357582 -0.05232549 0.028056042 -0.02793470
## [5,] -0.04241885 -0.075495830 0.03515182 -0.027934699 0.24217199
qr(B)$rank
## [1] 3
Note that the rank of both of these matrices is less than the dimension.
This package also has functions for density computations with the Wishart distribution. Densities are only defined for positive-definite input matrices and \(\nu\) parameters larger than the dimension \(p\). By default the return value is on the log scale; this can be changed with the log argument.
dWishart(diag(3), df = 5, 5*diag(3))
## [1] -19.45038
dInvWishart(diag(3), df = 5, .2*diag(3))
## [1] -19.45038
Note that, in general, these two densities will not agree even when their covariance matrix parameters are inverses of each other. One of the reasons they match here is that the determinant of \(\mathbf{X}\) is \(1\), so the Jacobian of the inversion contributes nothing.
The density functions can take 3-D array input indexed on the third dimension and will output a vector of densities.
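The log-density can be checked against the values above in base R. The helpers lmvgamma and dwish_log below are illustrative reimplementations of the standard formulas (the package's versions are in C), and the final apply call mimics the vectorized 3-D array behavior:

```r
# Log of the multivariate gamma function (standard closed form)
lmvgamma <- function(a, p) {
  p * (p - 1) / 4 * log(pi) + sum(lgamma(a + (1 - seq_len(p)) / 2))
}

# Wishart log-density from the textbook formula (illustrative only)
dwish_log <- function(X, df, Sigma) {
  p <- nrow(X)
  as.numeric(
    (df - p - 1) / 2 * determinant(X)$modulus -
      sum(diag(solve(Sigma, X))) / 2 -
      df * p / 2 * log(2) -
      df / 2 * determinant(Sigma)$modulus -
      lmvgamma(df / 2, p)
  )
}

dwish_log(diag(3), df = 5, Sigma = 5 * diag(3))  # -19.45038, as above

# Evaluating a whole 3-D array of draws, slice by slice:
draws <- stats::rWishart(4, df = 5, Sigma = 5 * diag(3))
apply(draws, 3, dwish_log, df = 5, Sigma = 5 * diag(3))
```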
The multivariate gamma (\(\Gamma_p\)) and digamma (\(\psi_p\)) functions are extensions of the univariate gamma (\(\Gamma\)) and digamma (\(\psi\)) functions (Mardia, Bibby, and Kent (1982)). They are used in calculating the densities above and come up in other distributions as well. The multivariate digamma is the first derivative of the logarithm of the multivariate gamma function. When the dimension \(p = 1\), these functions coincide with the usual gamma and digamma functions.
The multivariate gamma also comes in a logarithmic form (lmvgamma).
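The multivariate digamma likewise has a simple closed form as a sum of univariate digamma values. The helper below is named mvdigamma to mirror the package's function, but it is an illustrative reimplementation; it also shows the \(p = 1\) case reducing to the usual digamma:

```r
# Multivariate digamma: derivative of the log multivariate gamma,
# psi_p(a) = sum_{j=1}^p psi(a + (1 - j)/2)
mvdigamma <- function(a, p) {
  sum(digamma(a + (1 - seq_len(p)) / 2))
}

mvdigamma(2.5, 1) == digamma(2.5)  # p = 1 reduces to the usual digamma
mvdigamma(2.5, 3)
```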