Hello,
My response variables are derived from count data. I have biennial winter population counts of a hibernating mammalian species (n = 226 sites). My predictor variables come from two time steps (1992 and 2001).
I created my response variables from the following sets of years, chosen to correspond with the predictor variables' time steps and lag:
Code:
- For the predictor variables from 1992 I used the counts for years 1988, 1990, 1992, 1994 and 1996
- For the predictor variables from 2001 I used the counts for years 1998, 2000, 2002, 2004 and 2006
Using those sets of years I found the mean and trend (SLOPE in Excel 2007) of the count data for each site. If a site had more than 2 missing counts for a given response variable, it was removed from the analysis. (Years were coded as 1, 3, 5, ... 19 for the trend calculations.)
Code:
             range   skew  kurtosis       se    n
slope92    7533.05  -2.71     12.71   124.25   64
mean92       76525   4.67     25.03  1093.86   93
slope01     9046.3   1.64     12.52   105.09   86
mean01     58727.2   4.26      19.4   801.71  118
I then natural-log transformed the count data, adding 1 to all values to account for any counts of zero, and found the mean and trend the same way as above.
Code:
            range   skew  kurtosis    se    n
lnslope92       1  -1.02      1.76  0.02   64
lnmean92     7.91   0.46     -0.69  0.24   71
lnslope01    0.78  -0.22      0.99  0.01   86
lnmean01    10.37   0.12     -0.48  0.22  116
I log-transformed the data because that is commonly done with count data, and I thought it would bring my response variables closer to normality.
I plan on performing linear regression (lm() in R) between the response and predictor variables to look for any relationships. Does that sound reasonable? Is it the right kind of regression for the task?
I've read this article: "Do not log-transform count data" by O'Hara and Kotze (2010) (summary below). I was already unsure whether my approach was sound, and this article raises some valid points.
How might you proceed with my dataset?
Thank you kindly,
Mike
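For anyone who wants to reproduce the response variables outside Excel, here is a minimal sketch of the steps described above, written in Python rather than R or Excel. The function names, the use of None for a missing survey, and the example counts are all hypothetical; only the rules (least-squares slope, drop a site with more than 2 missing counts, natural log of count + 1) come from the post.

```python
import math

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (the quantity Excel's SLOPE computes)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def site_summary(years, counts, max_missing=2, log_transform=False):
    """Return (mean, slope) for one site, or None if too many counts are missing.

    counts uses None for a missing survey (an assumption of this sketch).
    """
    pairs = [(yr, c) for yr, c in zip(years, counts) if c is not None]
    if len(counts) - len(pairs) > max_missing:
        return None  # site dropped from the analysis
    xs = [yr for yr, _ in pairs]
    # log_transform=True applies ln(count + 1) so that zero counts stay defined
    ys = [math.log(c + 1) if log_transform else c for _, c in pairs]
    return sum(ys) / len(ys), ols_slope(xs, ys)

# Hypothetical site for the 1992 time step: 1988..1996 coded as 1, 3, 5, 7, 9
years_92 = [1, 3, 5, 7, 9]
counts = [120, 135, None, 160, 0]
print(site_summary(years_92, counts))                      # raw mean and trend
print(site_summary(years_92, counts, log_transform=True))  # ln(count + 1) mean and trend
```

Each site contributes one mean and one slope per time step, and those per-site values are what would go into lm() as responses.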
Summary
1. Ecological count data (e.g. number of individuals or species) are often log-transformed to satisfy parametric test assumptions.
2. Apart from the fact that generalized linear models are better suited in dealing with count data, a log-transformation of counts has the additional quandary in how to deal with zero observations. With just one zero observation (if this observation represents a sampling unit), the whole data set needs to be fudged by adding a value (usually 1) before transformation.
3. Simulating data from a negative binomial distribution, we compared the outcome of fitting models that were transformed in various ways (log, square root) with results from fitting models using quasi-Poisson and negative binomial models to untransformed count data.
4. We found that the transformations performed poorly, except when the dispersion was small and the mean counts were large. The quasi-Poisson and negative binomial models consistently performed well, with little bias.
5. We recommend that count data should not be analysed by log-transforming it, but instead models based on Poisson and negative binomial distributions should be used.
http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2010.00021.x/full
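To make the abstract's recommendation concrete, here is a rough sketch of what fitting a Poisson model to untransformed counts involves, again in standard-library Python. It assumes the raw counts for a single site are modelled directly against the coded year, which is not the same as regressing a derived mean or slope on a predictor. The function name and the example counts are hypothetical. The fit uses iteratively reweighted least squares for the model log(mu) = a + b * year.

```python
import math

def fit_poisson_trend(xs, ys, iterations=25):
    """Fit log(mu) = a + b*x to counts ys by iteratively reweighted least squares."""
    mu = [y + 0.1 for y in ys]  # start near the data while avoiding log(0)
    a = b = 0.0
    for _ in range(iterations):
        eta = [math.log(m) for m in mu]
        # Working response and weights for a Poisson model with log link
        z = [e + (y - m) / m for e, y, m in zip(eta, ys, mu)]
        w = mu
        sw = sum(w)
        x_bar = sum(wi * x for wi, x in zip(w, xs)) / sw
        z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sw
        sxz = sum(wi * (x - x_bar) * (zi - z_bar) for wi, x, zi in zip(w, xs, z))
        sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, xs))
        b = sxz / sxx            # weighted least-squares slope
        a = z_bar - b * x_bar    # weighted least-squares intercept
        mu = [math.exp(a + b * x) for x in xs]
    return a, b

# Hypothetical counts for one site, including a zero, with years coded 1, 3, 5, 7, 9
years = [1, 3, 5, 7, 9]
counts = [12, 9, 7, 0, 3]
a, b = fit_poisson_trend(years, counts)
print("intercept:", a, "trend on the log scale:", b)
```

The zero count needs no adjustment here because the log is applied to the fitted mean and not to the data. A quasi-Poisson fit gives the same coefficients and differs only in how the standard errors are scaled; a negative binomial fit is not covered by this sketch. In R the analogous starting point would be glm() with family = poisson.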