BIOCHEMISTRY DATA WITH ORDINAL LIKERT GROUPING

#1
I am writing with regard to a data set of wine characteristics for 1600 different wines, each with a pH value, an acetic acid concentration (mg/L) and a quality score from an ordinal Likert 1-10 ranking scale from a taste test. The data for acetic acid is heavily negatively skewed and the pH is less skewed. Both are therefore non parametric. The quality ranking data is also discernibly non parametric.

I am looking for a statistical test to determine the significance of the relationship between quality group and acetic acid content, and likewise between quality group and pH.

I know from box plots that there is a clear relationship of decreasing acetic acid content mean and median, smaller IQR and fewer outliers, as the rank of quality increases. So I am expecting a significant relationship: as quality rank increases across the 1-10 scale (into which the 1600 wines are categorised), acetic acid concentration decreases. As does pH.

Any advice appreciated! Searching for this test has made me and my partner question every facet of our statistics understanding!
 

Karabiner

TS Contributor
#2
The data for acetic acid is heavily negatively skewed and the pH is less skewed. Both are therefore non parametric.
Please keep in mind that there is no such thing as non-parametric data. There are
non-parametric tests, though, which make fewer assumptions than parametric tests (which,
for example, might assume a normal, Poisson, gamma etc. distribution of errors). Whether
you should use them does not depend simply on skewness.
I know from box plots that there is a clear relationship of decreasing acetic acid content mean and median, smaller IQR and fewer outliers, as the rank of quality increases. So I am expecting a significant relationship: as quality rank increases across the 1-10 scale (into which the 1600 wines are categorised), acetic acid concentration decreases. As does pH.
If I understand you correctly, you want to correlate acid and pH with
quality. If we assume that the quality ranking is an ordinal-scaled variable,
then the Spearman rank correlation coefficient rho would be an option.
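
As a rough illustration, a minimal sketch in Python, assuming SciPy is available. The data here are simulated stand-ins, not the actual wine data, and the variable names (quality, acid, ph) are only placeholders for your own columns.

```python
import numpy as np
from scipy.stats import spearmanr

# Simulated stand-in data (NOT the real wine data): 1600 wines,
# quality on a 1-10 ordinal scale, right-skewed acid values, roughly normal pH.
rng = np.random.default_rng(0)
n = 1600
quality = rng.integers(1, 11, size=n)                           # ranks 1..10
acid = np.exp(rng.normal(loc=6.5 - 0.1 * quality, scale=0.4))   # hypothetical mg/L
ph = rng.normal(loc=3.6 - 0.02 * quality, scale=0.15)

# Spearman's rho uses only the ranks, so skewness of acid or pH is not an issue.
rho_acid, p_acid = spearmanr(acid, quality)
rho_ph, p_ph = spearmanr(ph, quality)

print(f"acid vs quality: rho = {rho_acid:.3f}, p = {p_acid:.3g}")
print(f"pH   vs quality: rho = {rho_ph:.3f}, p = {p_ph:.3g}")
```

With only 10 quality levels there will be many ties; spearmanr assigns average ranks to tied values.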

Alternatively, you could consider taking the logarithm of acid, and
maybe of pH, and computing Pearson correlations with quality.
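
A minimal sketch of the log-transform approach, again in Python with SciPy and simulated stand-in data (the numbers and variable names are assumptions for illustration only). Note that the logarithm requires strictly positive values.

```python
import numpy as np
from scipy.stats import pearsonr

# Simulated stand-in data (NOT the real wine data).
rng = np.random.default_rng(1)
n = 1600
quality = rng.integers(1, 11, size=n)                           # ranks 1..10
acid = np.exp(rng.normal(loc=6.5 - 0.1 * quality, scale=0.4))   # skewed, hypothetical mg/L

# Log-transform the skewed variable, then correlate with quality.
log_acid = np.log(acid)
r_raw, p_raw = pearsonr(acid, quality)
r_log, p_log = pearsonr(log_acid, quality)

print(f"raw acid vs quality: r = {r_raw:.3f}, p = {p_raw:.3g}")
print(f"log acid vs quality: r = {r_log:.3f}, p = {p_log:.3g}")
```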

The interesting part could be to determine how acid and pH jointly predict
quality. This could be done, e.g., by using multiple linear regression with log(acid),
pH, and the product log(acid) * pH (the interaction term) as the three predictors.
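
A minimal sketch of such a model, fitted by ordinary least squares with NumPy only. The data are simulated stand-ins and the coefficients used to generate them are arbitrary assumptions. This gives the coefficient estimates and R squared only; for standard errors and p-values you would use a statistics package.

```python
import numpy as np

# Simulated stand-in data (NOT the real wine data); generating coefficients are arbitrary.
rng = np.random.default_rng(2)
n = 1600
acid = np.exp(rng.normal(6.0, 0.5, size=n))      # skewed, hypothetical mg/L
ph = rng.normal(3.3, 0.15, size=n)
log_acid = np.log(acid)
noise = rng.normal(0.0, 1.0, size=n)
quality = np.clip(np.rint(20 - 1.0 * log_acid - 2.5 * ph + noise), 1, 10)

# Design matrix: intercept, log(acid), pH, and the product log(acid) * pH.
X = np.column_stack([np.ones(n), log_acid, ph, log_acid * ph])
coef, _, _, _ = np.linalg.lstsq(X, quality, rcond=None)

# R squared: share of the variance in quality explained by the three predictors.
fitted = X @ coef
ss_res = np.sum((quality - fitted) ** 2)
ss_tot = np.sum((quality - quality.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

for name, b in zip(["intercept", "log(acid)", "pH", "log(acid)*pH"], coef):
    print(f"{name:>13}: {b: .4f}")
print(f"R^2 = {r2:.3f}")
```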

With kind regards

Karabiner
 
#3
Thank you so much for your reply! This is exactly what I am looking for. And I appreciate the quick response immensely!

I have much to learn in the world of stats it appears!