Nonparametric test for a control/test dataset derived from a systematic review

#1
I have been working with a dataset that contains a pair of control/test variables representing the effect of anthropization on water quality. The dataset was obtained through a systematic literature review, in which I compiled published data from natural and anthropized (impacted) environments. My main issue in analysing it is that the data are not normally distributed and my samples are not independent: some records are before/after-impact measurements, and in other cases the same "control" environment was used as the reference for two different impacts, so one control is linked to two results in the other column. Neither my rows nor my columns are therefore completely independent, and at this point I do not have enough information to separate the dataset.
I have been looking for nonparametric methods to compare the means of the two groups, but from what I have found, the Wilcoxon and Kruskal-Wallis tests assume that the data are independent.
Is there a test that suits my goal?
 
#4
Data do not have to be normally distributed for statistical analyses.

Regarding the rest of your contribution, I'm afraid that I do not understand your description.

With kind regards

Karabiner

Basically, I need to compare two groups with a nonparametric test that makes no independence assumption, and I don't know which test suits my goal.
 
#6
So you have n (how many?) controls, and there are different kinds of impact (how many different kinds are there? And/or does the kind of impact matter here at all?). And there are cases (how many?) with just 1 impact, and there are cases (how many?) with two impacts?

With kind regards

Karabiner
 
#7

I have a dataset with paired numerical variables (control/impacted) and a categorical variable "impact" with 17 different levels. Each impact category has at least 8 observations, and some have 30. I want to know whether each impact modified the natural environment. The impacts all differ from each other, so I can't pool the impacts that have few cases to enlarge the dataset; I need the effect of each impact separately.

Example:
| Control | Impacted | Impact |
|---|---|---|
| 2 | 10 | mining |
| 2 | 5 | mining |
| 3 | 50 | sewage |
| 3 | 40 | urbanization |
| 5 | 2 | impoundment |

In the example, sewage and urbanization come from the same study, and the natural system is the control for both impacts. Another source of dependence is "impoundment", in which the control is measured before impoundment and the impacted value is measured at the same place after impoundment.
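A minimal sketch, in Python, of one way to make this dependence explicit before choosing a test: give each row an identifier for the control it was compared against. The `control_id` field and its values (A to D) are hypothetical labels added for illustration; following the description above, sewage and urbanization share one control, while the two mining rows are assumed (not stated) to come from separate studies.

```python
from collections import defaultdict

# Rows from the example table. control_id is a HYPOTHETICAL field marking which
# rows share a control: sewage and urbanization share one (as described above);
# the two mining rows are ASSUMED here to come from different studies.
rows = [
    {"control_id": "A", "control": 2, "impacted": 10, "impact": "mining"},
    {"control_id": "B", "control": 2, "impacted": 5, "impact": "mining"},
    {"control_id": "C", "control": 3, "impacted": 50, "impact": "sewage"},
    {"control_id": "C", "control": 3, "impacted": 40, "impact": "urbanization"},
    {"control_id": "D", "control": 5, "impacted": 2, "impact": "impoundment"},
]


def rows_by_control(rows):
    """Group rows by the control measurement they were compared against."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["control_id"]].append(row)
    return dict(groups)


groups = rows_by_control(rows)
# Controls reused by more than one pair are the source of between-row dependence
shared = {cid: [r["impact"] for r in g] for cid, g in groups.items() if len(g) > 1}
print(shared)  # {'C': ['sewage', 'urbanization']}
```

The impoundment row is a before/after measurement at one site, which is the ordinary within-pair dependence that paired tests (paired t test, Wilcoxon signed-rank) are designed for. What those tests do not allow is the same control appearing in more than one pair, which is what the grouping flags.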
 
#8
According to your table, you do not include "pre-impact" values, but just determine the magnitude of the effect of the impacts? So I am not sure what you mean when you say that you want to compare the "means of two groups". There is a list of values, and some of these observations belong to the same environment/study.

And it is not clear how the two impacts in the studies concerned are related: did they take place at the same time? In sequence? Overlapping? Does the effect of sewage in case #3 affect how much effect urbanization could have, or vice versa?

With kind regards

Karabiner
 
#9

I have two kinds of situations in the same dataset: in some observations, the "control" corresponds to the pre-impact value, and in others, it corresponds to a comparable natural site.

I say I want to compare the means because some tests use the means of all observations in each variable to compare them, but this is really just a way of trying to explain what I need. A test that compares the groups and tells me whether they are equal or different would suit me.

The impacts are not necessarily related; one does not interfere with the other, and putting them in the same column is just for categorization.
 

katxt

#12
It appears that you hope to include all the different impact types in the same data set. If so, I can't really see the difficulty you are having with the combining suggestion in post #3.
Also, it is the differences that need to be well behaved, rather than the before/after data itself. What do these differences look like?
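As a sketch of what looking at the differences could mean, the Python below computes impacted minus control for the five example rows from post #7 and compares their mean with their median. With only five rows this is an illustration rather than a diagnostic; on the real data, a histogram or dot plot of the differences for each impact would be more informative.

```python
import statistics

# Example rows from the table in post #7: (control, impacted, impact)
rows = [
    (2, 10, "mining"),
    (2, 5, "mining"),
    (3, 50, "sewage"),
    (3, 40, "urbanization"),
    (5, 2, "impoundment"),
]


def paired_differences(rows):
    """Impacted minus control for each matched pair."""
    return [impacted - control for control, impacted, _ in rows]


diffs = paired_differences(rows)
print(sorted(diffs))             # [-3, 3, 8, 37, 47]
print(statistics.mean(diffs))    # 18.4
print(statistics.median(diffs))  # 8
```

Here two large differences (sewage and urbanization) pull the mean well above the median, the kind of skew that makes a paired t test on raw differences doubtful.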
 
#13
I don't necessarily need all the impacts in the same dataset. I could split it into smaller datasets, one for each impact.

The reason I can't combine effects is that they are observations of different impacts. There is no case where the same control is used for two observations of the same impact.
 

katxt

#14
I think there's something about the situation that I haven't grasped.
So, imagine that each pair had a different control and the data were normal. What would you do?
 

katxt

#16
That sounds like you are happy to treat all matched pairs as part of the same population, regardless of the type of impact. If that is the case, you can average the results from a single control and use either a paired t test if the differences are normal(ish) or the Wilcoxon signed-rank test if they are not.
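A minimal, standard-library Python sketch of that suggestion, assuming each row carries a hypothetical `control_id` marking rows that share a control: rows with the same control are collapsed to one pair (the control versus the mean of its impacted values), and the signed-rank statistic W+ is computed with an exact two-sided p-value obtained by enumerating every sign assignment of the ranks, which stays exact when ranks are tied. It is written out by hand only for transparency; in practice `scipy.stats.wilcoxon` does the test step.

```python
from collections import Counter, defaultdict


def average_per_control(rows):
    """Collapse rows that share a control into one (control, mean impacted) pair.

    rows: list of (control_id, control, impacted); control_id is a hypothetical label.
    """
    impacted_by_id = defaultdict(list)
    control_by_id = {}
    for control_id, control, impacted in rows:
        impacted_by_id[control_id].append(impacted)
        control_by_id[control_id] = control
    return [(control_by_id[c], sum(v) / len(v)) for c, v in impacted_by_id.items()]


def average_ranks(values):
    """1-based ranks of values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(diffs):
    """Return (W+, exact two-sided p) for paired differences.

    Zero differences are dropped; the null distribution lets every rank be
    positive or negative with probability 1/2 (exact even with tied ranks).
    """
    d = [x for x in diffs if x != 0]
    # Average ranks are multiples of 0.5, so work with doubled integer ranks
    doubled = [round(2 * r) for r in average_ranks([abs(x) for x in d])]
    w_plus2 = sum(r for r, x in zip(doubled, d) if x > 0)
    dist = Counter({0: 1})  # null distribution of doubled W+
    for r in doubled:
        step = Counter()
        for s, n in dist.items():
            step[s] += n      # this rank gets a negative sign
            step[s + r] += n  # this rank gets a positive sign
        dist = step
    total = sum(doubled)
    dev = abs(2 * w_plus2 - total)  # doubled distance from the null centre total/2
    extreme = sum(n for s, n in dist.items() if abs(2 * s - total) >= dev)
    return w_plus2 / 2, extreme / 2 ** len(d)


# Example table rows with hypothetical control ids (sewage and urbanization share "C")
rows = [("A", 2, 10), ("B", 2, 5), ("C", 3, 50), ("C", 3, 40), ("D", 5, 2)]
pairs = average_per_control(rows)
diffs = [impacted - control for control, impacted in pairs]
w_plus, p = wilcoxon_signed_rank(diffs)
print(diffs, w_plus, p)  # [8.0, 3.0, 42.0, -3.0] 8.5 0.375
```

With four pairs the smallest attainable two-sided p-value is 2/16 = 0.125, so the example only shows the mechanics. If each impact is analysed separately, as in post #13, no control is reused within an impact, so the averaging step changes nothing there and the plain signed-rank test on each impact's pairs applies.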
Or another option, design a randomization test that preserves the connection between the common control group.
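One way such a randomization test could be designed, sketched in Python under stated assumptions: pairs that share a control (hypothetical `control_id` again) form a cluster, and under the null hypothesis of no impact effect the signs of all differences in a cluster are assumed to be flippable together, i.e. each cluster's set of differences is assumed symmetric about zero. The statistic is the mean difference over all rows, and the p-value is exact because all 2^k sign patterns of the k clusters are enumerated. This is one possible design, not necessarily the one intended in the reply.

```python
from itertools import product


def cluster_sign_flip_test(rows):
    """Exact two-sided sign-flip randomization test with clusters flipped together.

    rows: list of (control_id, control, impacted). Rows sharing a control_id
    (a hypothetical label) form one cluster whose differences keep or flip sign
    together. Statistic: mean of (impacted - control) over all rows.
    Returns (observed mean difference, p-value).
    """
    clusters = {}
    for control_id, control, impacted in rows:
        clusters.setdefault(control_id, []).append(impacted - control)
    totals = [sum(g) for g in clusters.values()]
    n_rows = sum(len(g) for g in clusters.values())

    # The row count is fixed, so the sum orders sign patterns exactly like the mean
    observed = sum(totals)
    # All 2^k sign patterns over the k clusters (only feasible for small k)
    null = [sum(s * t for s, t in zip(signs, totals))
            for signs in product((1, -1), repeat=len(totals))]
    p = sum(abs(t) >= abs(observed) for t in null) / len(null)
    return observed / n_rows, p


rows = [("A", 2, 10), ("B", 2, 5), ("C", 3, 50), ("C", 3, 40), ("D", 5, 2)]
mean_diff, p = cluster_sign_flip_test(rows)
print(mean_diff, p)  # 18.4 0.375
```

Enumeration doubles with every cluster, so beyond roughly 20 clusters one would draw random sign patterns instead. Comparing integer sums avoids floating-point ties; with non-integer measurements a small tolerance in the comparison would be needed.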
 
#17
Thank you!