Nonparametric data analysis

Mykola

New Member
I'm asking for advice. The topic of my research is brain iron accumulation. My task is the following: I have one dependent variable that is categorical and binomial (the patient either has the pathology or does not), and six independent variables that are continuous and nonparametric.
The question is: how can I predict the dependent variable using my data?
Thanks.

GretaGarbo

Human
Search for logistic regression or logit regression (the same thing). Probit regression is very similar.

(That is the most natural choice. But it is not non-parametric analysis.)

Karabiner

TS Contributor
And six independent variables that are continuous and nonparametric.
There is no such thing as "nonparametric variables". Instead, there are non-parametric statistical analyses (analyses which do not make certain assumptions).

With kind regards

Karabiner

Mykola

New Member
Is there any way to do the analysis without converting the distribution of the data to normal? And if there is no way to do so, how can I convert them (transform to quintiles)?

Karabiner

TS Contributor
What do you mean by "data"? Do you mean the independent variables (predictors)? No "parametric" test assumes them to be normally distributed.

If you mean the dependent variable (DV): there is no need for a dependent variable to be normally distributed. For parametric models, instead it is the distribution of the model's prediction errors (residuals) which matters, not the distribution of the DV itself.

And if your sample size is large enough (say, n > 30 or 40 or so), then even normally distributed residuals are not necessary for "parametric" analyses.

With kind regards

Karabiner

Mykola

New Member
The independent variables are not normally distributed. But I can transform them to quintiles.

GretaGarbo

Human
The independent variables are not normally distributed. But I can transform them to quintiles.
There are no (distributional) assumptions about the independent variables in regression. The independent variables are assumed to be fixed values (and thus have no distribution).

In logistic regression the dependent variable is assumed to be binomial distributed. There is no assumption about the normal distribution and no need to try to transform to normal distribution.
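
Written out in standard textbook notation (the symbols p_i, \beta_j and x_{ij} are not from this thread and are used here only for illustration), the model described above is roughly:

```latex
Y_i \sim \mathrm{Bernoulli}(p_i), \qquad
\log\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_{i1} + \dots + \beta_6 x_{i6}, \qquad
\mathrm{OR}_j = e^{\beta_j}
```

Here Y_i is 1 if patient i has the pathology and 0 otherwise, and e^{\beta_j} is the odds ratio for a one-unit increase in the j-th IV with the others held fixed. Nothing in it refers to the distribution of the x's.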

In general:
Some people seem to believe (after having read an elementary course) that there are only two possibilities: either normal-distribution methods or non-parametrics. That is wrong. There are many parametric distributions (skewed and so on) that do not look like the normal distribution (e.g. binomial distribution, Poisson distribution, exponential distribution).

Mykola

New Member
You are absolutely right about my statistics skills,
BUT when I compared the two groups (one had the pathology and the other did not) using the Mann-Whitney test, I got 3 independent variables that differed between the two groups, and the difference was statistically significant (in the beginning I had six independent variables).
So now I want to analyze the strength of the influence of each of those statistically significant independent variables on the dependent variable (or find the correlation coefficient, or express it as odds ratios or some other mystic ****). I think it will be rather small, because there are at least 10 more independent variables that can also influence the dependent variable I study. For example, since my study concerns brain iron deposition, I have some patients (far fewer of them) who had a lot of iron in their brain but showed no signs of pathology, because of the other independent variables that I don't have (and I think that's the reason for the skewness).
So, if you can give me some advice or just a link that will help me dig some gems out of all the mud I'm digging in, I'll be very grateful.

GretaGarbo

Human
The idea is that the dependent variable (DV) is explained by the independent variables (IV1, IV2, ..., IV6). So that the "arrow" goes from the IVs to the DV.

DV <--- IV1, IV2, ...,IV6

So pathology or non-pathology is explained by e.g. age and exercise etc. But if you do a Mann-Whitney test then you investigate how the two groups, pathology or non-pathology, influence age. That does not make sense. Mann-Whitney is just irrelevant here. (It is, by the way, sensitive to "spread", so it certainly has its assumptions, which are often violated.)

The correlation is, by the way, a parameter. If you want that, you are doing parametric estimation.

Go ahead and do a multiple logistic regression. Then you will also get an odds ratio.
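
As a rough illustration of what such a fit does, here is a minimal pure-Python sketch: a maximum-likelihood logistic regression by Newton-Raphson on two predictors, with the odds ratios obtained as exp(coefficient). The data are made up, and the function names (`fit_logistic`, `score`, `solve`) are hypothetical helpers written for this example, not any library's API. In real work one would use an established routine (for example R's `glm` with `family = binomial`), which also reports standard errors and confidence intervals.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(X, y, beta):
    ll = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        # yi * eta - log(1 + exp(eta)), with the log term computed stably
        ll += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return ll


def score(X, y, beta):
    """Gradient of the log-likelihood: sum_i (y_i - p_i) * x_i."""
    g = [0.0] * len(beta)
    for row, yi in zip(X, y):
        p = sigmoid(sum(b * x for b, x in zip(beta, row)))
        for j, xj in enumerate(row):
            g[j] += (yi - p) * xj
    return g


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_logistic(rows, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood fit by Newton-Raphson with step halving."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        g = score(X, y, beta)
        if max(abs(v) for v in g) < tol:
            break
        # Information matrix: sum_i p_i (1 - p_i) x_i x_i^T
        H = [[0.0] * k for _ in range(k)]
        for row in X:
            p = sigmoid(sum(b * x for b, x in zip(beta, row)))
            w = p * (1.0 - p)
            for a in range(k):
                for c in range(k):
                    H[a][c] += w * row[a] * row[c]
        step = solve(H, g)
        t, old = 1.0, log_likelihood(X, y, beta)
        while t > 1e-8:
            new_beta = [b + t * s for b, s in zip(beta, step)]
            if log_likelihood(X, y, new_beta) >= old - 1e-12:
                break
            t /= 2.0  # halve the step if the likelihood got worse
        beta = new_beta
    return beta


# Made-up data for illustration: two hypothetical measurements per patient.
rows = [(1, 2), (2, 1), (3, 2), (4, 3), (5, 1),
        (6, 3), (7, 2), (8, 1), (9, 3), (10, 2)]
y = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]  # 1 = pathology, 0 = no pathology

beta = fit_logistic(rows, y)
odds_ratios = [math.exp(b) for b in beta[1:]]  # skip the intercept
print("coefficients:", beta)
print("odds ratios per unit increase:", odds_ratios)
```

Each odds ratio is adjusted for the other predictor in the model, which is what separate two-group tests on one variable at a time cannot give.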

Mykola

New Member
I performed the M-W test just to detect whether there are any differences between the two groups, because at first it was only a hypothesis that the parameters I used could differ between the groups. What's wrong with that? (For example, I could compare the length of fingers among people with and without heart disease; that would be nonsense.)
In all the medical scientific articles I've read, the pathological state is presented as the dependent variable.


GretaGarbo

Human
In all the medical scientific articles I've read, the pathological state is presented as the dependent variable.
Yes, I agree. But with Mann-Whitney the pathological state is the independent variable. So when someone gets sick, you try to explain how that changes his age.

Karabiner

TS Contributor
But if you do a Mann-Whitney test then you investigate how the two groups, pathology or non-pathology, influence age.
But if you do a Mann-Whitney test here, you just investigate whether the two groups differ with respect to age. I.e. whether there's an association. The test itself does not say anything about influences. That is a matter of design and of interpretation.

E.g. one can conduct a randomized experiment with, say, 7 groups receiving different dosages of a toxic agent, and measure whether subjects (plants) are killed or not during the experiment. Then an M-W test can be used to investigate whether those plants which were killed had received higher dosages than the survivors. If yes, then the interpretation is straightforward (IMO): higher dosages here led to more deaths.

With kind regards

Karabiner

ondansetron

TS Contributor
In all the medical scientific articles I've read, the pathological state is presented as the dependent variable.
Here is a general comment, not particular to your post. Medical literature doesn't exactly use the right methods at the right time or in the right way (nor do they recognize statistics as something that does not follow a cookbook approach). So, the argument that medical publications use one method or do something one way is a poor argument. There is a lot of "oh, this group published with this analysis, that must be the right way to do it."

hlsmith

Omega Contributor
To piggyback on ondan's comment; many times reviewers of submitted papers will encourage authors to change their analyses to something the reviewer is more familiar with. A heads-up, you do not have to change your analyses if you can justify their appropriateness. But if two approaches are comparable in generated output, at times it may not be a bad idea to mirror existing literature to make your methods a little closer to theirs in order to be able to compare them better. But as ondan wrote, there are many ways to do things and not all are correct.

GretaGarbo

Human
But if you do a Mann-Whitney test here, you just investigate whether the two groups differ with respect to age. I.e. whether there's an association. The test itself does not say anything about influences. That is a matter of design and of interpretation.
I agree that there is something in what Karabiner says. I guess that in many cases the Mann-Whitney test would give the same result (significant/not significant) as a logit model.

But the null hypothesis in Mann Whitney is P(x1 > x0) = 0.5, where x0 is the age of those who are not sick and x1 is the age of those who are sick. But the age (or the relevant IV in this case) is not normal according to OP and possibly skewed and heteroscedastic, and Mann Whitney is sensitive to that (Search for Fagerland-Sandvik).
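
The quantity in that null hypothesis can be estimated directly from two samples by counting pairs, which is what the Mann-Whitney U statistic does (U divided by the number of pairs). A minimal sketch, with made-up numbers and a hypothetical helper name `prob_superiority`:

```python
def prob_superiority(sick, healthy):
    """Estimate P(x1 > x0) + 0.5 * P(x1 == x0) from two samples.

    This equals U / (n1 * n0), where U is the Mann-Whitney statistic
    computed for the `sick` group.
    """
    wins = 0.0
    for x1 in sick:
        for x0 in healthy:
            if x1 > x0:
                wins += 1.0
            elif x1 == x0:
                wins += 0.5  # ties count as half a win
    return wins / (len(sick) * len(healthy))


# Made-up measurements (arbitrary units), for illustration only.
healthy = [1.0, 2.0, 3.0, 4.0]
sick = [2.5, 3.5, 5.0, 6.0]

print(prob_superiority(sick, healthy))  # 13 of 16 pairs -> 0.8125
```

A value near 0.5 is what the null hypothesis describes; the test asks whether the observed value is further from 0.5 than chance would explain.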

E.g. one can conduct a randomized experiment with, say, 7 groups receiving different dosages of a toxic agent, and measure whether subjects (plants) are killed or not during the experiment. Then a M-W test can be used to investigate whether those plants which were killed had received higher dosages than the survivors. If yes, then the interpretation is straightforward (IMO): higher dosages here led to more deaths.
Suppose that there is just one designed variable and it is only on high/low levels. That would be a very unnatural Mann Whitney test.

Note that the OP said:
I have one dependent variable that is categorical and binomial (the patient either has the pathology or does not). And six independent variables
In contrast, one can evaluate all six IVs with the dependent variable sick/not sick in a logit model.

But this is about optimal inference. It is known that the dependent variable is binomial. Logit is estimated with maximum likelihood (ML). ML gives consistent and efficient estimates. How could anything be better than maximum likelihood? And by the Neyman-Pearson lemma it would give the most powerful test.

Mykola

New Member
But the null hypothesis in Mann Whitney is P(x1 > x0) = 0.5, where x0 is the age of those who are not sick and x1 is the age of those who are sick. But the age (or the relevant IV in this case) is not normal according to OP and possibly skewed and heteroscedastic, and Mann Whitney is sensitive to that (Search for Fagerland-Sandvik).
In my research I have two different groups: one has the pathological condition, the other does not. Each of those groups can be described by 6 (or more) parameters, and these parameters in my case are amounts of iron deposition in different brain regions. The brain iron deposition data are skewed, but the MW test showed that brain iron deposition (in some brain regions) differs statistically between those two groups, so I can use those regions to differentiate my two groups.

THE QUESTION

1. What's wrong with my analysis?
2. Can I do something else besides this?
Thanks a lot.


Mykola

New Member
Here is a general comment, not particular to your post. Medical literature doesn't exactly use the right methods at the right time or in the right way (nor do they recognize statistics as something that does not follow a cookbook approach). So, the argument that medical publications use one method or do something one way is a poor argument. There is a lot of "oh, this group published with this analysis, that must be the right way to do it."
Totally agree, and I've seen such things many times (like the patients in the first group having 20 ± 25 teeth), but as examples I use only those articles where the main author's h-index is above 30, so I think they're trying to use statistics in a correct way.


ondansetron

TS Contributor
If you are talking about an impact factor or something similar when you say "index", I can tell you that it doesn't matter as much. I've seen top journals with bad stats in articles by prominent universities and prominent researchers. Be very careful of equating publications with "good" statistical practice and interpretation.

Mykola

New Member
I can show you the data. Will anybody try to help?