Analysing relationships between numerical data and non-numerical data

Hi all, new user here hoping to become part of the community,

I have recently purchased a book recommended to me, Discovering Statistics Using SPSS (Field, 2013), with a view to knuckling down on the statistical analysis of the data I have collected for a Master's project.

I have various sets of data on the basic chemistry of water (e.g. pH, temperature, conductivity) and on the levels of some contaminants in the water (given in mg/l). I want to look at the impact that the vegetation cover of a stream's catchment has on the quality of the water in the stream.

I have used GIS to obtain data on the soil type, vegetation type and underlying geology, and I would like to see whether these have any relationship with the levels of contaminants recorded. However, one of the major issues I see is that the data obtained on vegetation, soil, geology etc. is not numerical, e.g. the geology of stream a1 is granite, the vegetation of stream ce2 is primarily blanket bog, and so on.

How would I turn this data into numerical data? What types of data do I have (e.g. ratio, continuous, etc.), and which tests are best for testing relationships between these two data sets?

Any help is greatly appreciated



Less is more. Stay pure. Stay poor.
It sounds like you are going to compare continuous variables across categorical variables (groups). Does this seem right? For example, pH level versus type of vegetation.

It is best to decide first which variable is the dependent variable, i.e. the one being predicted. For example, you might say vegetation (categorical) predicts pH (continuous). You can then explore options for analysing a continuous outcome across categories: a t-test if there are two categories, or ANOVA if there are more than two. If the residuals of the continuous variable are not normally distributed, the usual alternatives are the Wilcoxon rank-sum test (two categories) or the Kruskal-Wallis test (more than two).
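To give a feel for what a one-way ANOVA actually compares, here is a rough Python sketch (SPSS will do all of this for you). The pH values and vegetation labels are made up purely for illustration, and the function name is just something I picked; it only computes the F statistic and its degrees of freedom, not the p-value.

```python
from statistics import mean

# Hypothetical pH readings grouped by vegetation type.
# These numbers are invented for illustration, not data from this thread.
ph_by_vegetation = {
    "blanket_bog": [4.0, 5.0, 6.0],
    "grassland": [6.0, 7.0, 8.0],
    "forestry": [5.0, 6.0, 7.0],
}


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    samples = list(groups.values())
    all_values = [x for g in samples for x in g]
    grand_mean = mean(all_values)

    # Variation of the group means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples)
    # Variation of individual readings around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in samples for x in g)

    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)

    # F is the ratio of the two mean squares.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova_f(ph_by_vegetation)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # prints "F(2, 6) = 3.00"
```

The larger F is, the more the group means differ relative to the scatter within each group. A statistics package then converts F and the two degrees of freedom into a p-value, which is the figure usually used to judge whether the groups differ.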

Let us know if you have questions or if I misinterpreted what you are planning on doing.
Thanks for your reply.

If I treat vegetation type as the categorical variable, then I could test pH, conductivity, temperature and the concentrations of the different contaminants to see whether each is affected by vegetation type. If I understand what you wrote correctly, you are saying it would be best to run ANOVA tests?

If this is correct, then I will run ANOVA tests across the groups, or t-tests if I am comparing just two groups.

This throws up another couple of questions:
I collect data on a monthly basis and I am looking to see whether, overall, the vegetation dictates the quality of the water. Would I compile the data from every month into one large test? I am not factoring month into the equation; I simply want to know whether vegetation has an impact on, for example, pH.

What figures am I looking for in ANOVA tests? For example, what does the 'F' figure denote, and how do I work out from the test results whether or not there is an impact?

Many thanks