What do you do if glmmTMB
hasn’t implemented the
response distributions you want/need? You could try asking the
developers to do it, but if you have the technical skills (reading and
modifying R and C++ code) you may be able to do it yourself. You will
need to make appropriate modifications to the R and C++ code and
recompile/reinstall the package.
This example will show how to add a “zero-one truncated Poisson”, i.e. a Poisson distribution with only values >1. This is a fairly easy case (we will discuss below what characteristics make a distribution easier or harder to implement).
The most general advice is “identify the most similar distribution
that has already been implemented in glmmTMB
and
copy/modify the relevant bits of code”.
Download the tarball (glmmTMB.tar.gz
) from CRAN and
unpack it.
These files are in the src/
directory.
enum valid_family
is the list of distributions that
glmmTMB
knows about. Give your distribution an unused index
(in a range with other similar distributions) and add it to the
list....
truncated_genpois_family =404,
truncated_compois_family =405,
// new
zo_truncated_poisson_family = 410,
...
switch
statement that handles
the conditional likelihood (search for
// Observation likelihood
at the beginning): in this case
we can most easily do this by modifying the code for the zero-truncated
Poisson family (case truncated_poisson_family
):
mu(i)
is the value of the location parameter for the
current observationphi(i)
is the value of the dispersion parameter for the
current observation. This value always uses a log link, so it will be a
value on [0,Inf)
. You should decide whether this value
needs to be transformed or combined with mu(i)
to form the
traditional scale or dispersion parameter for your distribution. There
are lots of examples in glmmTMB.cpp
logspace_add
and logspace_sub
functions in
TMB
: see here
for more information). This isn’t necessary but will make your
computations more stable.SIMULATE{}
condition that samples a random deviate from
your distributioncase zo_truncated_poisson_family:
log_nzprob = logspace_sub(Type(0), -mu(i)); // log(1-exp(-mu(i)));
// now subtract the prob(X==1)
log_nzprob = logspace_sub(log_nzprob, log(mu(i)) - mu(i));
// log-Poisson likelihood minus the 'missing mass'
tmp_loglik = dpois(yobs(i), mu(i), true) - log_nzprob;
// this is a utility for use in ther zero-inflated case
tmp_loglik = zt_lik_nearzero(yobs(i), tmp_loglik);
SIMULATE{
// conveniently, this built-in function already allows truncation
// at different points
yobs(i) = glmmtmb::rtruncated_poisson(1, asDouble(mu(i)));
}
break;
We might be able to get away with specifying family=
as
a list, but it’s better to implement it as a new function.
#' @rdname nbinom2
#' @export
zo_truncated_poisson <- function(link="log") {
r <- list(family="zo_truncated_poisson",
variance=function(lambda) {
stop("haven't implemented variance function")
## should figure this out ...
## (lambda+lambda^2)/(1-exp(-lambda)) - lambda^2/((1-exp(-lambda))^2)
})
return(make_family(r,link))
}
As you can see, I haven’t yet worked out the variance of a zero-one-truncated Poisson. This will only cause problems if/when a user wants to estimate Pearson residuals.
Ideally a $dev.resids()
component should also be added,
to return the deviance residuals (i.e., 2(log L(yi) − log Lsat(yi)),
where Lsat is the
log-likelihood of yi under the
saturated model; see the $dev.resids
components of
families built into base R for examples.
For some families, the variance and deviance-residuals function
require extra information such as a dispersion parameter. For the
nbinom1
and nbinom2
families,
glmmTMB
does some additional stuff to store the value of
the dispersion parameter in the environment of the variance/deviance
residual functions (which share an environment), and to retrieve the
dispersion parameter from the environment (search for “.Theta” in the R
code for the package).
You should also document your new family, probably in the
?glmmTMB::family_glmmTMB
page. This material is located in
R/family.R
, above the nbinom2
family
function.
There may not be any other R code that needs to be updated, depending
on the details of the family you are adding. Again, it’s best to try to
work by analogy with the closest family to the one you’re adding. In
this case, the only occurrence of truncated_poisson
in
glmmTMB.R
is in the definition of which families have no
dispersion parameter:
You need to make sure that your new family function is exported
(listed in the NAMESPACE
file). The easiest way to do this
is by running devtools::document()
.
The R file that keeps the C++ and R code in sync with respect to
which families are available and which numeric code corresponds to which
family is enum.R
. Do not edit this file by
hand: instead, run make enum-update
Re-install the package from source (R CMD INSTALL
or
install.package(..., repos = NULL)
)
library(glmmTMB)
set.seed(101)
dd <- data.frame(y = rpois(500, exp(1)))
table(dd$y)
## 0 1 2 3 4 5 6 7 8 9
## 34 91 117 116 68 45 17 7 3 2
dd <- dd[dd$y>1,,drop=FALSE]
table(dd$y)
## 2 3 4 5 6 7 8 9
## 117 116 68 45 17 7 3 2
glmmTMB(y ~ 1, family = "zo_truncated_poisson", data = dd)
This appears to give the right answer (i.e. the estimated value of the intercept (log-link), 1.015, is close to the true value of 1).
Formula: y ~ 1
Data: dd
AIC BIC logLik df.resid
1184.2750 1188.2020 -591.1375 374
Number of obs: 375
Fixed Effects:
Conditional model:
(Intercept)
1.015
If you are adding the material for long-term use you should also add
some tests to tests/testthat/test-families.R
Some families (Tweedie, Student-t) have shape parameters or other
parameters beyond the usual parameters determining the location (mean)
and scale (dispersion). These parameters are passed in the
thetaf
vector: the best thing to do here is to search the R
and C++ code for “[Tt]weedie” and see what will need to be adjusted.
General advice, but written while adding a “diagonal with homogeneous
variance” (homdiag
) covariance structure.
valid_covStruct
enum
in
glmmTMB.cpp
termwise_nll
. In the case of
homdiag
we can re-use the existing
diag_covstruct
code (since everything is vectorized so
should work equally well with a length-1 or length-p vector of (log) standard
deviations)parFun
to specify the number of parametersglmmTMB()
make enum-update
glmmTMB
objectSince I don’t think this is explicitly documented anywhere …
obj
: this is a TMB-object (no explicit class, just a
list) as created by TMB::MakeADFun()
. It has useful
components
$fn
: the negative log-likelihood function (takes a
vector of non-random parameters (beta
,
betazi
, bzi
, theta
,
thetazi
, psi
depending on the model;
b
and bzi
are excluded)$gr
: gradient of the NLL function$env
: environment, holding useful stuff like
$random
(positions of random-effect parameters),
$last.par.best
, etc.$report
(return derived values)$simulate
(simulate new responses)fit
: results of optimizationsdr
: results of calling sdreport()
call
: original model callframe
: model framemodelInfo
: lots of useful information
nobs
: number of observations (should be the same as
nrow(x$frame)
)respCol
: response columngrpVar
: (?)family
: GLM familycontrasts
reTrms
terms
reStruc
allForm
REML
map
sparseX
parallel