[R-group] updated question: mixed model for percentage data in R

Michael Renton michael.renton at uwa.edu.au
Thu May 21 12:22:47 AWST 2015


Here is a short bit of code to try to explain why I wouldn’t use Gaussian errors with logistic link function to model data constrained between 0 and 1… but would rather logit-transform the data, then fit a linear model, then back-transform to get predictions…

In both cases the expected values predicted lie between 0 and 1, as you say Mark, but in the first case if you actually simulate predictions from the model including the modelled error/variation, then you can go out of 0-1, whereas in the second case you can’t.

n=10000
xs = runif(n)*10
expected.values = plogis(2*xs+0.5)
plot(xs,expected.values,cex=0.1)

## gaussian error added to logistic expected values
simulated.values.1 = plogis(2*xs+0.5)+rnorm(n)*0.1
plot(xs,simulated.values.1,cex=0.1)

## gaussian error added to linear expected values, then logit transformed
simulated.values.2 = plogis(2*xs+0.5+rnorm(n)*0.2)
plot(xs,simulated.values.2,cex=0.1)


From: r-group-bounces at maillists.uwa.edu.au [mailto:r-group-bounces at maillists.uwa.edu.au] On Behalf Of Mark Murphy
Sent: Wednesday, 20 May 2015 9:39 PM
To: Angela Eads
Cc: r-group at maillists.uwa.edu.au
Subject: Re: [R-group] updated question: mixed model for percentage data in R

Hi,
In regards to model predictions that fall outside the 0-1 range because of the gaussian specification, when you generate predicted values from the model you can specify whether to predict values in the units of the (untransformed) response variable, e.g. in lme4: predict(model, type = 'response', ...). So predicted values would then be bounded by 0 and 1. The default option is 'link' which predicts values on the scale of the linear predictor - which wouldn't be meaningful for interpretation/presentation.

I'm probably missing some other fundamental problem with this kind of model spec though.... Would love to know as I'm still getting to used to this stuff :p

M

On 20 May 2015 at 16:48, Angela Eads <angela.eads at graduate.uwa.edu.au<mailto:angela.eads at graduate.uwa.edu.au>> wrote:
Hey all,
We've had a few questions on GLMMs lately, and I keep directing people to this website:
http://glmm.wikidot.com/faq
It's written by Ben Bolker from an ecologist's point of view and I've found it very useful.
Just thought I'd share it with the group!
Cheers,
Angie

---------------------------------------------------
Angela Eads
BA/BSc (Hons), PhD candidate

Zoology G.21
School of Animal Biology (M092)
The University of Western Australia
Crawley, WA  6009

Phone: (08) 6488 1483<tel:%2808%29%206488%201483> or 0415 662 737<tel:0415%20662%20737>
Email: angela.eads at graduate.uwa.edu.au<mailto:angela.eads at graduate.uwa.edu.au>

On 20 May 2015 at 16:40, Clelia Gasparini <clelia.gasparini at uwa.edu.au<mailto:clelia.gasparini at uwa.edu.au>> wrote:
Hi Keren

That’s my two cent contribution on percentage data.
Usually, I use binomial only if it is a true yes or no data, and not sure what your proportions are referring to. So, if this is not the case I would indeed not use the binomial but rather the Gaussian as other suggested. Even if it is probably a bit traditional I go for transforming data with arcsin [ asin(sqrt(data$var) ] , even if it is true that nowadays transforming data is going out of fashion ;).

Good luck!
Clelia

*********************************************

Dr Clelia Gasparini
clelia.gasparini at uwa.edu.au<mailto:clelia.gasparini at uwa.edu.au>
Centre for Evolutionary Biology
School of Animal Biology (M092)
The University of Western Australia
Crawley, WA 6009, Australia
ph: +61 8 6488 2239<tel:%2B61%208%206488%202239>/1477

From: r-group-bounces at maillists.uwa.edu.au<mailto:r-group-bounces at maillists.uwa.edu.au> [mailto:r-group-bounces at maillists.uwa.edu.au<mailto:r-group-bounces at maillists.uwa.edu.au>] On Behalf Of Michael Renton
Sent: Wednesday, 20 May 2015 4:24 PM
To: Mark Murphy; Keren Raiter

Cc: r-group at maillists.uwa.edu.au<mailto:r-group at maillists.uwa.edu.au>
Subject: Re: [R-group] updated question: mixed model for percentage data in R

Ha Keren, you’ve entered the murky world of glimmers: generalised linear mixed-effects… give up all hope of a simple black and white answer to anything ☺

If you use glmer for your model and specify a gaussian distribution with a logit link, then you have the problem that predicated values will be outside the 0-1 range. That’s just what the reply in the Mark’s forum link below says, and it must be true if you think about it – you have normal errors on a predicted mean that tend to 0 and 1 – there is nothing to squish them tighter as you get closer to 0 and 1.

For that reason I would NOT use family = gaussian (link = 'logit') but rather prefer logit transforming the response first and just fitting the standard linear mixed effects model, even if it seems old-fashioned. Quasibinomial error would be nice I think, but like you say it doesn’t seem to work anymore in lme4.

When I’ve done glmms, I’ve never used automated model selection but rather stepped through carefully myself, removing one by one. I don’t trust automated model selection at the best of times, and with the complexity of mixed effects, I’d want to make decisions myself…




From: r-group-bounces at maillists.uwa.edu.au<mailto:r-group-bounces at maillists.uwa.edu.au> [mailto:r-group-bounces at maillists.uwa.edu.au] On Behalf Of Mark Murphy
Sent: Wednesday, 20 May 2015 12:08 PM
To: Keren Raiter
Cc: r-group at maillists.uwa.edu.au<mailto:r-group at maillists.uwa.edu.au>
Subject: Re: [R-group] updated question: mixed model for percentage data in R

Hi Keren,
I don't know if this will help much, but I am running similar models on similar types of data so can share my experience with it.
I would use glmer for your model and specify a gaussian distribution with a logit link. I would try not to transform your response before running the model, as the accepted wisdom these days seems to be to fit the model to the data using links etc. (but other people might disagree?). Also transforming in advance could give you different results (variance structure of the models) than using a link, see http://stats.stackexchange.com/questions/48485/what-is-the-difference-between-logit-transformed-linear-regression-logistic-reg.

So this model specification looks fine to me: model1 <- glmer (response ~ predictor + predictor + (1| random), data = data, family = gaussian (link = 'logit')). Note the quotation marks around logit (' or " work the same), this might be why it is not working for you in lme4? Otherwise I'm not sure why it wouldn't work.... Then if it runs, check to see if its a good fit by looking at normality and heteroscedasticity of model residuals, but I think glmms are fairly robust to deviations from at least normality?
In regards to convergence issues and rescaling predictors, have you tried centring and scaling them by 2 standard devations (see attached paper)? You could use code something like: variable.cs <- scale(data$variable, center = TRUE, scale = 2*sd(na.omit(data$variable))). I found this solved a lot of my convergence issues, and it also makes model coefficients much easier to compare across predictors and models. There would also be other ways to standardise predictors, but the idea is to try to get them all to the same scale of values.
Regarding automating model selection, I have not used the step() function so cannot comment on that, anyone else? I currently use the packages MuMIn and AICmodelavg for model averaging and selection of sets of models based on model weighting (AIC etc.). I have some code for that if you are interested.
Cheers,
Mark


On 19 May 2015 at 22:03, Keren Raiter <keren.raiter at research.uwa.edu.au<mailto:keren.raiter at research.uwa.edu.au>> wrote:
Hi again!

Peter Yeeles has very kindly suggested that the binomial error distribution should only be used for true proportion (ie numerator and denominator) data, and that the best way to do this might look something like this:

model1 <- lmer (response ~ predictor + predictor + (1| random), data = data, family = gaussian (link=logit))

lme4 told me that "calling lmer with 'family' is deprecated; please use glmer() instead". So I'm now using glmer instead.

My problems are now that:

a) lme4 doesn't offer the option of using a logit link with the gaussian family. It tells me that the available links are identity, log, and inverse. From my exploration of the data, I don't think that a log transformtion is enough, it really needs something more along the lines of a logit, which will bound the predicted responses at 0 and 1, and deal well with the skewness of the data. What do you recommend? will an inverse transformation do the job?

b) When I do run it with the inverse transformation, or with the log transformation, I get the following error messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.00179055 (tol = 0.001, component 8)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

I've looked on the web for info about this  'failed to converge' issue - a few discussions indicated that it may be an error that I can ignore but nothing said so definitively, nor gave clear reasons why: can anyone advise what I should do about this non-convergence? Also regarding the large eigenvalue ratio; I've painstakingly rescaled every one of my numerical variables such that they all range between 0-3 and 0-10 - no orders of magnitude differences in scales. Why am I still getting this issue and what can I do about it?

...can't wait to get on to model validation! ;)

Keren

__________________________________________________________________________________________

On 19 May 2015 at 20:23, Keren Raiter <keren.raiter at research.uwa.edu.au<mailto:keren.raiter at research.uwa.edu.au>> wrote:
Hi team!

I'm a bit stuck on something and wonder if someone could point me in the right direction.

I'm trying to fit a model with a whole lot of predictors (mainly categorical and a few numerical). Afterwards I will want to reduce the model to a few significant explanatory variables.

The response variable is percentage data eg 0.03, 0.15, 0.48...

I have a random factor to include as well.

I'm trying to fit a GLMM and am unsure how to specify it. My data exploration revealed that the response variable is mainly squished on one side (ie lots of very small values) and that a logit transformation makes it nicely spread out. And since it's percent data and is bounded by 0 and 1, I figured I should do a logistic regression. I tried to use a quasibinomial link but glmer doesn't take it; I tried using family=binomial but it returns the "non-integer # successes in a binomial glm!" error.

I have even tried manually transforming the response variable with the logit transformation and fitting a linear mixed-effects model (R allows me to do this; I haven't validated this model though), but I can't imagine that that's really the best way of going about it...

Any ideas on how to fit a glmm to my data with percentage response variable? any advice on which package to use and an example of the notation would be really appreciated too.

Lastly, I'm also wondering how to automate model selection once I have the full model (and validated it); the step() function doesn't seem to work with these, can anyone recommend another? One internet source says to use step() in the lmerTest package, is that what others would recommend?

Thank you in advance,
Keren

--


Keren Raiter | PhD candidate | University of Western Australia
Ecosystem Restoration and Intervention Ecology<http://www.erie-research.org/>, ARC Centre of Excellence for Environmental Decisions<http://www.ceed.edu.au/>, & NERP Environmental Decisions Hub<http://www.nerpdecisions.edu.au/>
0401681752<tel:0401681752> | keren.raiter at research.uwa.edu.au<mailto:keren.raiter at research.uwa.edu.au> | sustainingecology.com/research<http://sustainingecology.com/research>




--


Keren Raiter | PhD candidate | University of Western Australia
Ecosystem Restoration and Intervention Ecology<http://www.erie-research.org/>, ARC Centre of Excellence for Environmental Decisions<http://www.ceed.edu.au/>, & NERP Environmental Decisions Hub<http://www.nerpdecisions.edu.au/>
0401681752<tel:0401681752> | keren.raiter at research.uwa.edu.au<mailto:keren.raiter at research.uwa.edu.au> | sustainingecology.com/research<http://sustainingecology.com/research>


_______________________________________________
Ready to present an R meeting? Book a date on
http://goo.gl/tws96 and send the group a timely
announcement to the R-group mailing list
(R-group at maillists.uwa.edu.au<mailto:R-group at maillists.uwa.edu.au>).

You can subscribe and unsubscribe via our web page:
https://maillists.uwa.edu.au/mailman/listinfo/r-group



--
Mark Murphy MSc
PhD Candidate - Insect Ecology and Management Group
School of Animal Biology (M092)
The University of Western Australia
35 Stirling Highway, WA 6009
animals.uwa.edu.au/research/postgrads?profile/1/id/3808<http://www.animals.uwa.edu.au/research/postgrads?profile/1/id/3808>
animals.uwa.edu.au/research/ecology#ins<http://www.animals.uwa.edu.au/research/ecology#ins>

_______________________________________________
Ready to present an R meeting? Book a date on
http://goo.gl/tws96 and send the group a timely
announcement to the R-group mailing list
(R-group at maillists.uwa.edu.au<mailto:R-group at maillists.uwa.edu.au>).

You can subscribe and unsubscribe via our web page:
https://maillists.uwa.edu.au/mailman/listinfo/r-group


_______________________________________________
Ready to present an R meeting? Book a date on
http://goo.gl/tws96 and send the group a timely
announcement to the R-group mailing list
(R-group at maillists.uwa.edu.au<mailto:R-group at maillists.uwa.edu.au>).

You can subscribe and unsubscribe via our web page:
https://maillists.uwa.edu.au/mailman/listinfo/r-group



--
Mark Murphy MSc
PhD Candidate - Insect Ecology and Management Group
School of Animal Biology (M092)
The University of Western Australia
35 Stirling Highway, WA 6009
animals.uwa.edu.au/research/postgrads?profile/1/id/3808<http://www.animals.uwa.edu.au/research/postgrads?profile/1/id/3808>
animals.uwa.edu.au/research/ecology#ins<http://www.animals.uwa.edu.au/research/ecology#ins>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://maillists.uwa.edu.au/pipermail/r-group/attachments/20150521/35bb788d/attachment-0001.html>


More information about the R-group mailing list