# Non-linear comparisons

#### SergeantMike

##### New Member
I have insect emergence data from 6 different varieties of plants, and the curves fit a Gompertz formula very well. I want to statistically compare the 6 varieties to see if there is a significant difference in the timing of emergence between the cultivars. When I run an ANOVA on arcsine-transformed data I get significance, but that uses a linear model only, and I am wondering if there is a test that would let me compare the emergences using the Gompertz model. Note that the R² for the Gompertz model ranges from 0.97 to 0.99, which is intriguing, since the transformed data doesn't have R² values that high.

#### katxt

##### Well-Known Member
What are you using to fit the Gompertz model? Can you post a typical full output from one fit?

#### SergeantMike

##### New Member
katxt,
Thanks for the response
I used R with the non-linear analysis package easynls. Here is the output:

```
Jerseymodel = nlsfit(Jersey_Round_1, model = 10, start = c(a = 4000, b = 1, c = 0.34))
> Jerseymodel
$Model
[1] "y~a*exp(-b*exp(-c*x))"

$Parameters
                            Count     per_total
coefficient a        4.945101e+03    1.00448931
coefficient b        1.590095e+01   15.90094853
coefficient c        9.917900e-01    0.99179001
p-value t.test for a 0.000000e+00    0.00000000
p-value t.test for b 5.276940e-03    0.00527694
p-value t.test for c 3.390000e-06    0.00000339
r-squared            9.942000e-01    0.99420000
AIC                  1.468117e+02  -40.22515597
BIC                  1.484032e+02  -38.63357487
```

and the plot:

The actual data plot

I hope this is what you are looking for.

#### katxt

##### Well-Known Member
Thanks Mike. R is being as terse as ever. I was hoping for some standard errors and degrees of freedom. They must have been calculated to get the p values. Perhaps they just need to be asked for. Looking at the plots, I imagine that you had 6 data points with three parameters giving 3 df for the errors. See if you can find the SE's for the parameters.
How do you get the emergence time from a, b and c? kat


#### SergeantMike

##### New Member
Actually I had 12 datapoints. See the attached Excel file.

Geeze, you're forcing me back into my notes pretty far.

The formula is:

t=ln(ln(y/a)/-b)/-c
where y/a is the proportion of the emergence you are looking for, so if you are looking for the time when 1/2 of the emergence has occurred then y/a=0.5.
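As a quick sanity check, that inversion can be written as a small Python sketch (the a, b, c values below are the Jersey Count-column fit from the earlier output; rounding in the comment is approximate):

```python
import math

def gompertz_t(frac, a, b, c):
    """Time at which the Gompertz curve y = a*exp(-b*exp(-c*t))
    reaches the fraction `frac` of its asymptote a (i.e. y/a = frac)."""
    return math.log(math.log(frac) / -b) / -c

# Jersey fit from the nlsfit output (Count column)
a, b, c = 4945.101, 15.90095, 0.99179

t50 = gompertz_t(0.5, a, b, c)
print(round(t50, 3))  # roughly 3.159 days to half emergence
```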

I do not see any degrees of freedom or standard errors that I can dig out of the response.


#### katxt

##### Well-Known Member
> t=ln(ln(y/a)/-b)/-c
> where y/a is the proportion of the emergence you are looking for, so if you are looking for the time when 1/2 of the emergence has occurred then y/a=0.5.
You can't really compare the t's unless you know how accurate your estimates of a, b and c are in each case. That is what the SE tells us. I'll think about things and get back later. kat

#### SergeantMike

##### New Member
There seem to be several options for the Gompertz curve. I am going to dig into those and see what I can find out about the SE.

#### katxt

##### Well-Known Member
I have been looking at your data and reverse-engineering how R is doing it. It is ordinary least squares with some sort of search routine to minimize the sum of squared errors. The fit is a bit weak on day 1 but quite good from there on.
```
Day  Count  CountHat
  1    241        14
  2    522       555
  3   2287      2197
  4   3412      3660
  5   4680      4423
  6   4795      4745
  7   4845      4870
  8   4887      4917
  9   4902      4935
 11   4920      4944
 12   4923      4945
```
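For anyone following along, the CountHat column can be reproduced by plugging the fitted Jersey parameters back into the Gompertz equation. A minimal Python sketch (parameter values taken from the nlsfit output above):

```python
import math

def gompertz(t, a, b, c):
    """Gompertz curve y = a*exp(-b*exp(-c*t))."""
    return a * math.exp(-b * math.exp(-c * t))

# fitted Jersey parameters (Count column of the nlsfit output)
a, b, c = 4945.101, 15.90095, 0.99179

days   = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
counts = [241, 522, 2287, 3412, 4680, 4795, 4845, 4887, 4902, 4920, 4923]

for day, count in zip(days, counts):
    yhat = gompertz(day, a, b, c)
    print(day, count, round(yhat))  # e.g. day 1 gives ~14, matching CountHat

# the quantity the search routine minimizes: sum of squared errors
sse = sum((y - gompertz(t, a, b, c)) ** 2 for t, y in zip(days, counts))
```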
t(50%) turns out to be 3.159.
One way to get the standard error for t directly would be to do a jackknife analysis. Either write some R code (too hard for me) or just do the nonlinear regression 12 times, leaving out one line each time. Or there is probably an R function.
(I realize now that simply getting the SEs for a, b and c would not work well, because they will be correlated.)
Once you have the SE for each set, then we can continue. kat

#### SergeantMike

##### New Member
Yes that is the number of days that I calculated also.

I know what you mean about the R code. It is something I am trying to learn, but it is often counterintuitive; I don't know if that is the stats jargon or just my poor programming skills.

Just one quick question, what is CountHat column?

#### katxt

##### Well-Known Member
CountHat is the count predicted by plugging x (days) into the equation. In stats the symbol for a predicted value is often written with a little hat (a circumflex, is it?) over the symbol.

#### SergeantMike

##### New Member
Is Bias the same as SE with Jackknife results?

```
Jackknife statistics
      Estimates        Bias
a   4937.931475  7.16936606
b     12.842378  3.05857135
c      0.959727  0.03206308

Jackknife confidence intervals
            Low           Up
a  4868.2965580  5007.566392
b    -5.8231224    31.507878
c     0.5593873     1.360067
```

#### katxt

##### Well-Known Member
These are the sort of things I meant, but they are much wider than I expected. Also, b can never be negative in your situation, or else the counts would drop down to 5000 rather than rise up to 5000, so something funny is going on there.
Can you calculate the t(50%) for each group and we'll see how close they are? kat

#### SergeantMike

##### New Member
Ok, I think I have it for all six varieties. The numbers are different because I used a more complete dataset, with the same number of data points for each variety, even if a value was zero. The results are:

```
Row │ variety     a         b          c        t50
    │ String      Float16   Float16    Float16  Float16
────┼───────────────────────────────────────────────────
  1 │ Aurora        668.0      14.56   0.4888     6.23
  2 │ BlueJay      3014      7243      0.7944    11.65
  3 │ Duke         2240     28910      0.8096    13.14
  4 │ Jersey      11440         3.742  0.2094     8.055
  5 │ Liberty      1057         9.13   0.3445     7.49
  6 │ Northland    1033        10.836  0.328      8.38
```

This was generated by a little Julia script I wrote, hence the wonkiness of the table.

So here is what I was thinking, and tell me if I am thinking wrongly: if I calculated the CountHat for each variety at each time point, subtracted it from the actual point, and squared it, that would basically give me the MSE, (y-ŷ)². From there I can calculate the SE by simply taking the square root.

#### katxt

##### Well-Known Member
> if I calculated the CountHat for each variety at each time point and subtracted that from the actual point and squared it that would give me the MSE basically, (y-ŷ)². From there I can calculate the SE by simply taking the square root.
I'm afraid not. What the R nonlinear routine is doing is taking some initial values for a, b and c, working out the yhat or CountHat for each point (incidentally, how did you get that hat in (y-ŷ)²?), then subtracting and squaring as you have suggested, and adding to get the SSE (sum of squared errors). Then it mucks about with the values of a, b and c until the SSE is the smallest possible value. Then it prints a, b and c and calculates t50.
You want to compare the various t50s, so you first need to know how accurate each estimate is; that is, you need the SE of each t50. In some special simple situations there are formulas for the SE. Not here, alas.
You do have estimates of the SEs for a, b and c done by the jackknife method, but I have to say I don't trust them much; and even so, they will be correlated and can't be used to give you the SE of t50.
You can do your own jackknife estimation for the SE of t50.
Take Aurora with, say, 12 data points. You have already found t50 for the full data set.
Remove point 1. Do the same regression on the reduced 11-point set. Find a, b and c and so t50#1. Record t50#1.
Replace point 1 and remove point 2. Do the regression on the reduced 11-point set. Find a, b and c and so t50#2. Record t50#2.
Replace point 2 and remove point 3. Get t50#3. And so on to t50#12, missing a different point out each time.
Find the SD of the 12 t50s and multiply by sqrt(11²/12). This is your SE of t50 for Aurora.
Whew. https://www.statisticshowto.com/jackknife-estimator/ The formula there is equivalent to the last instruction.
When these are done, you can start to compare the t50s.
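The leave-one-out arithmetic above can be sketched in Python. The refitting step is represented by a hypothetical list of 12 leave-one-out t50 estimates (not real data, since the actual values depend on rerunning the nonlinear regression 12 times):

```python
import math

def jackknife_se(estimates):
    """Jackknife SE from n leave-one-out estimates:
    SE = sqrt((n-1)/n * sum((t_i - mean)^2)),
    which is the same as SD(estimates) * sqrt((n-1)^2 / n)."""
    n = len(estimates)
    mean = sum(estimates) / n
    return math.sqrt((n - 1) / n * sum((t - mean) ** 2 for t in estimates))

# hypothetical leave-one-out t50 values for one variety (for illustration only)
t50_loo = [3.14, 3.16, 3.15, 3.17, 3.15, 3.16,
           3.14, 3.16, 3.15, 3.17, 3.16, 3.15]

se = jackknife_se(t50_loo)

# same answer via the "SD of the 12 t50s times sqrt(11^2/12)" recipe above
n = len(t50_loo)
mean = sum(t50_loo) / n
sd = math.sqrt(sum((t - mean) ** 2 for t in t50_loo) / (n - 1))
assert abs(se - sd * math.sqrt((n - 1) ** 2 / n)) < 1e-12
```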

#### SergeantMike

##### New Member
While I digest that amount of information, I will tell you that if you have a Windows 10 machine and go to Windows Accessories, there's an app called Keycaps. If you open it you will see all manner of "hidden" characters. All you do is highlight one, click Select, then click Copy. Then go to your document and paste the character in.

If you have a Mac I'm afraid I can't help you, but there is most assuredly an equivalent app.

Thank you for the information. It's going to take a bit to digest it all. I will likely develop more questions.

#### katxt

##### Well-Known Member
Great. Thanks for that. For ever learning. kat

#### SergeantMike

##### New Member
Ok, Kat. It took a while, but here are the results of the SE calculations. Duke is clearly out of whack and I need to go back and look at the original data, but the others seem pretty consistent. I am assuming the units are days here.

```
Variety Aurora     Standard Error: 0.03783832826315897
Variety BlueJay    Standard Error: 0.027158613730570626
Variety Duke       Standard Error: 105.92287905900504
Variety Jersey     Standard Error: 0.0468244596798439
Variety Liberty    Standard Error: 0.022382778209230184
Variety Northland  Standard Error: 0.05161874603472565
```

#### katxt

##### Well-Known Member
Now, assuming that all your calculations for each t50 and its SE are correct, we can test whether there is a significant t50 difference between any two groups.
Aurora vs BlueJay:
Difference = 11.65 - 6.23 = 5.42
SEDifference = sqrt(0.0378² + 0.0271²) = 0.0465
t = 5.42/0.0465 = 117 (this t is the t statistic of the t test, not the emergence time)
df (assuming 12 points for each) = 2*(12-1) = 22
p = TDIST(117,22,2) in Excel = 3×10^-32. Very small p, so a difference is virtually certain.
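That comparison can be written out as a short Python sketch (the t50s and jackknife SEs are the Aurora and BlueJay values from the posts above, rounded as kat rounded them; the df rule is kat's approximation):

```python
import math

def t50_compare(t50_1, se_1, t50_2, se_2, n1=12, n2=12):
    """Two-sample comparison of emergence times via their jackknife SEs.
    Returns the t statistic and degrees of freedom, using the
    approximation df = (n1 - 1) + (n2 - 1)."""
    diff = abs(t50_2 - t50_1)
    se_diff = math.sqrt(se_1 ** 2 + se_2 ** 2)
    df = (n1 - 1) + (n2 - 1)
    return diff / se_diff, df

# Aurora vs BlueJay, using the t50s and SEs posted above
t_stat, df = t50_compare(6.23, 0.0378, 11.65, 0.0271)
print(round(t_stat), df)  # t is about 117 with df = 22, so p is tiny
```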
One problem is the cutoff p for declaring a significant difference. Commonly p = 0.05 is used, but you have many tests to do. By the time you have tested each variety against the others you have 15 tests, so your chance of a false positive is much more than 0.05. I would use p = 0.01 as the cutoff: p < 0.01, different; p > 0.01, not proved different.
Just eyeballing your results, I would say that all are statistically different, even though the last three may not be practically different.
Cheers, kat

#### SergeantMike

##### New Member
This may be a job for R but I will look into that.

The data I have been analyzing has N = 22.

The analysis you did above makes sense when looking at the attached graph. There seems to be a break into two groups, with Duke being the one furthest away from the main body.

Now here is the $64,000 question. Is the analysis you have shown me more accurate or robust than doing an arcsine transformation and then running an ANOVA using a linear model? I think it better describes what is seen, but there is a lot of transforming going on.

#### katxt

##### Well-Known Member
At least the Gompertz analysis has a theoretical model behind it, though looking at the graphs, the fit to that model is nowhere near as close as your first post implied. Any thoughts on the nasty kinks? No model is going to fit those.
What are you actually looking to prove that isn't plain from the graphs?