# Logistic Regression problems

#### stame

##### New Member
Hello,

I'm stuck with a (I think very basic) problem, but I haven't made any progress for weeks now, which is why I finally decided to ask. I have some data that represent damage done by game in fields.

The damage was mapped. If there were many small damages, they were mapped as one big one and the overall intensity was estimated (5 classes). I took a grid of 1x1 m, and for every cell I calculated the distance class (1 = 1-10m, 2=10-10m) from different structural parameters (forest, wheat, maize, roads etc.).

Question 1 (not the most important one): According to the damage intensity I copied the rows to finally get a 1/5m² resolution.
Code:
Intensity 1 (0-20% damage)    → *5
Intensity 2 (21-40% damage)  → *4
Intensity 3 (41-60% damage)  → *3
Intensity 4 (61-80% damage)  → *2
Intensity 5 (81-100% damage) → *1
An example:
Code:
| Intensity | dist_forest | dist_maiz | dist_roads |
|-----------|-------------|-----------|------------|
| 1         | 50          | 20        | 70         |
| 2         | 40          | 10        | 90         |
| 5         | 20          | 20        | 40         |

That's the outcome:

Code:
| Intensity | dist_forest | dist_maiz | dist_roads |
|-----------|-------------|-----------|------------|
| 1         | 50          | 20        | 70         |
| 1         | 50          | 20        | 70         |
| 1         | 50          | 20        | 70         |
| 1         | 50          | 20        | 70         |
| 1         | 50          | 20        | 70         |
| 2         | 40          | 10        | 90         |
| 2         | 40          | 10        | 90         |
| 2         | 40          | 10        | 90         |
| 2         | 40          | 10        | 90         |
| 5         | 20          | 20        | 40         |

All no-damage cells were copied 5 times.
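
For readers who want to reproduce the row copying described above, here is a minimal sketch in R. It assumes the three example rows from the table and the copy counts listed earlier (intensity 1 gets 5 copies, intensity 5 gets 1); the object names `cells`, `copies` and `expanded` are made up for illustration and are not from the original analysis.

```r
# Hypothetical example data: the three rows from the example table
cells <- data.frame(
  intensity   = c(1, 2, 5),
  dist_forest = c(50, 40, 20),
  dist_maiz   = c(20, 10, 20),
  dist_roads  = c(70, 90, 40)
)

# Number of copies per intensity class, as listed above (1 -> 5, ..., 5 -> 1)
copies <- 6 - cells$intensity

# Repeat each row index `copies` times and index the data frame with the result
expanded <- cells[rep(seq_len(nrow(cells)), times = copies), ]
rownames(expanded) <- NULL

print(expanded)
```

The result has 10 rows, matching the outcome table: five copies of the intensity-1 row, four of the intensity-2 row and one of the intensity-5 row.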

Is that a valid method, or do I somehow wreck the statistical output?

Question 2:
My procedure above now leads to data like this:
Code:
| damage | dist_forest | dist_maiz | dist_roads |
|--------|-------------|-----------|------------|
| 0      | 30          | 20        | 70         |
| 0      | 20          | 10        | 60         |
| 0      | 60          | 10        | 80         |
| 0      | 40          | 70        | 10         |
| 0      | 20          | 60        | 50         |
| 1      | 10          | 10        | 50         |
| 1      | 05          | 20        | 30         |
| 1      | 20          | 30        | 20         |
| 1      | 30          | 20        | 90         |
| 1      | 40          | 10        | 10         |

The table is 250,000 rows long.

Now I would like to know whether any of the parameters has a significant influence on whether damage occurs or not. Therefore I would use a binary logistic regression, in R like this:
Code:
glm(damage ~ dist_forest + dist_maiz + dist_roads, family = binomial(link = "logit"), data = data)
And now every parameter is significant, and most of them even highly significant (***). What am I doing wrong, and how can I do it better?

Btw.: my sample size (number of recorded damages) is in reality not very big; only the 1x1 m resolution makes the dataset big.
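
One thing that can be checked directly is what copying rows does to the standard errors. The sketch below uses simulated data (all numbers are invented, not the real survey; `d`, `d_rep`, `fit` and `fit_rep` are hypothetical names) and fits the same logistic regression once on the original rows and once with every row copied 5 times. Duplicating every row k times leaves the coefficients unchanged but shrinks the standard errors by a factor of sqrt(k), so p-values get smaller without any new information. This is only an illustration of that one mechanism, not a full diagnosis of the dataset above.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Simulated (made-up) data: 200 cells, damage weakly related to forest distance
n <- 200
dist_forest <- runif(n, 0, 100)
damage <- rbinom(n, 1, plogis(0.5 - 0.01 * dist_forest))
d <- data.frame(damage, dist_forest)

# The same data with every row copied k = 5 times
k <- 5
d_rep <- d[rep(seq_len(n), each = k), ]

fit     <- glm(damage ~ dist_forest, family = binomial(link = "logit"), data = d)
fit_rep <- glm(damage ~ dist_forest, family = binomial(link = "logit"), data = d_rep)

# Coefficients agree between the two fits; standard errors do not
est <- cbind(original = coef(fit), replicated = coef(fit_rep))
se  <- cbind(original   = sqrt(diag(vcov(fit))),
             replicated = sqrt(diag(vcov(fit_rep))))

print(est)
print(se)
print(se[, "original"] / se[, "replicated"])  # ratio is sqrt(k), about 2.236
```

The same logic suggests that a fine grid over a few mapped damage patches may behave like many near-copies of a small number of observations, which would be consistent with the small real sample size mentioned above.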

Stame

#### noetsi

##### No cake for spunky
Why are you copying the no-damage cells five times? I don't understand that. I also don't understand why you are converting an ordinal logistic regression into a binary logistic regression. Why not simply run an ordered logistic regression with five levels to the DV?
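
As a rough illustration of that suggestion, here is a minimal sketch of an ordered (proportional-odds) logistic regression in R. It assumes `polr()` from the MASS package and uses simulated data, not the real survey; the column names mirror the tables in the question, while the effect sizes and the quintile cut into 5 classes are invented purely so the example runs.

```r
library(MASS)  # provides polr(); MASS ships with standard R installations

set.seed(42)  # arbitrary seed, only for reproducibility

# Simulated (made-up) predictors, NOT the real survey
n <- 500
d <- data.frame(
  dist_forest = runif(n, 0, 100),
  dist_maiz   = runif(n, 0, 100),
  dist_roads  = runif(n, 0, 100)
)

# Invented latent damage score, cut at its quintiles into 5 ordered classes
latent <- -0.03 * d$dist_forest - 0.01 * d$dist_maiz + rlogis(n)
d$intensity <- cut(latent,
                   breaks = quantile(latent, probs = seq(0, 1, by = 0.2)),
                   labels = as.character(1:5),
                   include.lowest = TRUE, ordered_result = TRUE)

# Ordered logit with a five-level DV; Hess = TRUE stores the Hessian for summary()
fit <- polr(intensity ~ dist_forest + dist_maiz + dist_roads,
            data = d, method = "logistic", Hess = TRUE)

print(summary(fit))
```

The response has to be an ordered factor; `summary()` then reports one coefficient per predictor plus four intercepts (cut points) between the five classes.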

#### stame

##### New Member
> Why are you copying the no-damage cells five times? I don't understand that. I also don't understand why you are converting an ordinal logistic regression into a binary logistic regression. Why not simply run an ordered logistic regression with five levels to the DV?

Well, I wasn't sure about copying the no-damage cells anyway, but the idea was to get the same "resolution".

And the idea behind transforming the ordinal logistic regression into a binary logistic regression is that the intensity is actually not a parameter influenced by the game but by the way I mapped the data.