Help with model selection - truncated negative-binomial or truncated linear?

#1
Hi Everyone,

I have a question about the best model to use for this analysis.

The question I am trying to analyze is - What is the association between Indicator 1 and age, sex, and group?
  1. Data are measured at the facility/site level, and site is a random effect. The data are aggregated within site by group, age, and sex.
  2. Indicator 1 is defined as Indicator A at a certain time point divided by Indicator B at a time point 6 months prior. Both Indicators A and B are counts. Therefore Indicator 1 is a proportion.
Problem: Zeros occur in real life for Indicators A and B, but they do not appear in the dataset. The data system that centralizes collection made a policy decision to omit 0 counts as a way to reduce the size of the datasets. As a result, about 10% of the values for Indicator A or B are missing. A missing value could be a 0 or it could be truly missing. If I had to make an educated guess, I would say they're mostly 0s.
However, as it stands, once the data are processed all values for A and B are >0. If a value is missing in A but not in B, or vice versa, the row is excluded because Indicator 1 isn't estimable. It's messy data, for sure.

So, my initial thought was to model this as a truncated negative binomial. In terms of R code:

library(glmmTMB)
f1 <- glmmTMB(IndA ~ agecat*sex*group + offset(log(IndB)) + (1 | site), ziformula = ~0, dispformula = ~agecat + sex + group, family = truncated_nbinom2, data = dta)

But, then I wondered if it would be reasonable to assume all the missing are 0, impute them as 0, and run the same model as zero-inflated? I don't love this because it requires too many naïve assumptions. But it was a thought.
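
For anyone wanting to try the "treat missing as 0" idea as a sensitivity check, here is a minimal sketch. The data frame `raw` and its values are simulated stand-ins (not the real data), the covariates are cut down to `group` for brevity, and the key assumption (every missing A is a true zero) is exactly the naive one described above. Note that rows where B is missing cannot be rescued this way, because `offset(log(IndB))` is undefined when B is 0.

```r
# Simulated stand-in for the real data; every value here is made up.
set.seed(2)
raw <- data.frame(
  site  = factor(rep(1:8, each = 6)),
  group = factor(rep(c("g1", "g2"), times = 24)),
  IndA  = rnbinom(48, mu = 5,  size = 2) + 1L,   # strictly positive, like the processed data
  IndB  = rnbinom(48, mu = 15, size = 2) + 1L
)
raw$IndA[sample(48, 5)] <- NA   # pretend the data system dropped these values

imputed <- raw
# ASSUMPTION: every missing A is a true zero (the naive assumption).
imputed$IndA[is.na(imputed$IndA)] <- 0L
# Rows with missing or zero B have no usable offset, so they stay excluded.
imputed <- imputed[!is.na(imputed$IndB) & imputed$IndB > 0, ]

if (requireNamespace("glmmTMB", quietly = TRUE)) {
  library(glmmTMB)
  # Untruncated NB with a constant zero-inflation probability.
  f_zi <- glmmTMB(IndA ~ group + offset(log(IndB)) + (1 | site),
                  ziformula = ~1, family = nbinom2, data = imputed)
  print(summary(f_zi)$coefficients$cond)
}
```

If the fixed-effect estimates from this fit and from the truncated fit tell the same story, the choice between the two assumptions may matter less than feared.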

And then I began wondering if I should be modeling Indicator 1 (the ratio of A and B), rather than as A with B as the offset. A and B are extremely right skewed with a lot of low counts. The ratio is a funky but more centralized shape:

Both A and B look like this: [image not included]

A/B looks like this: [image not included]

So, I've got myself turned inside out. Really, I think the decision is to leave the data as truncated and then either model A with B as the offset (truncated NB) or model A/B with a truncated linear regression.

I thought I'd ask people's opinions. I can provide more info, if necessary.

I appreciate any responses I get!
 

fed2

Active Member
#4
Quoting #1: "And then I began wondering if I should be modeling Indicator 1 (the ratio of A and B), rather than as A with B as the offset. A and B are extremely right skewed with a lot of low counts. The ratio is a funky but more centralized shape:"
I think that is not exactly what you want, because you would be treating B as a covariate, i.e., ignoring its sampling variation. Another possibility you may not have considered is to put the data in long form: create Y = A for half of the rows and Y = B for the rest, with an indicator for which count each row holds.

With that done, you could fit the model and estimate the ratio of A to B in the standard fashion, as the rate ratio associated with the indicator created above.

I guess it would still be truncated.
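
A rough R sketch of the long-form idea, for readers who want to try it. The data frame `dta` here is simulated, and the names `count_type`, `long`, and `fit_long` are made up for illustration. The model formula is only one plausible choice, not a recommendation.

```r
# Simulated stand-in for `dta`; all values are made up.
set.seed(1)
dta <- expand.grid(site   = factor(1:6),
                   agecat = factor(c("young", "old")),
                   sex    = factor(c("F", "M")),
                   group  = factor(c("g1", "g2")))
dta$IndB <- rnbinom(nrow(dta), mu = 20, size = 2) + 1L  # strictly positive
dta$IndA <- rnbinom(nrow(dta), mu = 8,  size = 2) + 1L

# Long form: one row per count, with count_type marking whether it is A or B.
keys <- dta[c("site", "agecat", "sex", "group")]
long <- rbind(
  data.frame(keys, count_type = "B", y = dta$IndB),
  data.frame(keys, count_type = "A", y = dta$IndA)
)
long$count_type <- factor(long$count_type, levels = c("B", "A"))  # B is the reference

if (requireNamespace("glmmTMB", quietly = TRUE)) {
  library(glmmTMB)
  fit_long <- glmmTMB(y ~ count_type * (agecat + sex + group) + (1 | site),
                      family = truncated_nbinom2, data = long)
  # exp(count_typeA) is the A/B rate ratio at the reference levels;
  # exp(count_typeA:covariate) terms are ratios of that ratio.
  print(exp(fixef(fit_long)$cond))
}
```

In this layout the covariate effects on A/B show up as the interactions with `count_type`, and both counts contribute their own sampling variation. The sketch does not model the pairing of A and B within a site-by-stratum cell; an extra random effect for the cell would be one way to add that.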
 
#5
Thanks! So, leaning towards a truncated linear regression or truncated Gamma? Thanks for your input!
 
#6
I actually think a truncated beta regression would be the best fit. The only option in R is GLMMadaptive, and I can't get it to work (probably user error). Since it's a proportion it's bounded above by 1, but 0 is truncated out.
 

fed2

Active Member
#7
Not sure, I would think truncated negative binomial. I think the important part is the log link; it is likely to give very similar point estimates regardless of the distribution. Beta sounds good too. Another option might be an over-dispersed Poisson. I don't know if that comes in a truncated flavor, but I tend to like it because it gives fewer headaches in model fitting. Unfortunately for you, I never use R for this sort of thing, so I couldn't tell you the right functions.
 
#8
Thanks for your input. I know how to fit those in R, even as truncated. I'll compare them. I don't think anything is going to fit this data perfectly but I can't spend the next whole year fitting models...
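
For reference, a minimal sketch of what that comparison could look like in glmmTMB. The `toy` data are simulated and the formula is trimmed to one covariate. `truncated_nbinom1` has variance linear in the mean, which makes it the closest glmmTMB analogue to an over-dispersed (quasi-) Poisson in truncated form.

```r
# Simulated stand-in data; all values are made up.
set.seed(3)
toy <- data.frame(
  site  = factor(rep(1:8, each = 6)),
  group = factor(rep(c("g1", "g2"), times = 24)),
  IndB  = rnbinom(48, mu = 15, size = 2) + 1L
)
toy$IndA <- rnbinom(48, mu = 5, size = 2) + 1L

# Zero-truncated count families shipped with glmmTMB.
families <- c("truncated_poisson", "truncated_nbinom1", "truncated_nbinom2")

if (requireNamespace("glmmTMB", quietly = TRUE)) {
  library(glmmTMB)
  fits <- lapply(families, function(fam) {
    glmmTMB(IndA ~ group + offset(log(IndB)) + (1 | site),
            family = get(fam)(), data = toy)
  })
  names(fits) <- families
  print(sort(sapply(fits, AIC)))   # lower AIC = better fit/complexity trade-off
}
```

AIC is only comparable here because all three fits use the same response (A with B as the offset). A beta or linear model on A/B has a different response, so it would need to be judged another way, for example with residual checks or held-out prediction.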