I was wondering how to interpret the outcome of a multinomial logistic regression with a categorical outcome variable, when there's no 'natural' meaningful category one could set as reference.

Example (hypothetical):

People are presented with brief excerpts from different songs, say Song A, B, C, and D.

They hear an excerpt and then have to tick one of four boxes labelled with the songs' names: Song A, Song B, Song C, Song D.

There are various excerpts from each song, so recognition may be good for some (e.g. if they contain a well-known chorus) and poor for others. There are many iterations (trials) per participant.

Questions:

1. Are people able to identify the songs (Or: Is each song recognised above the chance level of 25%)?

2. Do the songs differ in their recognition accuracy?

An easy and straightforward approach would be to code each response as either correct (1) or incorrect (0), depending on whether "presented song" equals "answered song", and calculate the percentage correct (hit rate) for each song per participant. One would then compare each song's hit rate against 25% (one-sample t-tests: is it recognised above chance?) and compare the songs' hit rates with each other (Song A vs B, A vs C, etc.; paired-samples t-tests).
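To make the coding step concrete, here is a minimal Python sketch of the hit-rate calculation. The trial data are made up, and the exact binomial tail is only a stand-in for the chance-level comparison: it pools all trials and ignores that trials are nested within participants, so it is an illustration rather than the analysis I would actually report.

```python
from collections import Counter
from math import comb

SONGS = ["A", "B", "C", "D"]

# Hypothetical trials as (presented song, answered song). Not real data.
trials = [
    ("A", "A"), ("A", "A"), ("A", "B"), ("A", "A"),
    ("B", "B"), ("B", "C"), ("B", "B"), ("B", "A"),
    ("C", "C"), ("C", "C"), ("C", "C"), ("C", "D"),
    ("D", "A"), ("D", "D"), ("D", "B"), ("D", "C"),
]

def confusion_counts(trials):
    """Count answers per presented song (outer key: presented, inner key: answered)."""
    counts = Counter(trials)
    return {p: {a: counts[(p, a)] for a in SONGS} for p in SONGS}

def hit_rates(table):
    """Proportion of correct answers for each presented song."""
    return {p: table[p][p] / sum(table[p].values()) for p in SONGS}

def binom_upper_tail(k, n, p=0.25):
    """P(X >= k) for X ~ Binomial(n, p); one-sided exact test against chance."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

table = confusion_counts(trials)
rates = hit_rates(table)
for song in SONGS:
    n = sum(table[song].values())   # number of trials on which this song was presented
    k = table[song][song]           # number of correct answers
    print(song, rates[song], round(binom_upper_tail(k, n), 4))
```

The off-diagonal cells of `table` are the confusions (e.g. `table["A"]["B"]` counts how often Song A was presented but Song B was answered), which is the part the simple hit-rate approach throws away.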

However, these are percentages / frequencies / proportions, which are bounded between 0 and 1 (or 0% and 100%), and this violates some assumptions of t-tests. The arcsine transformation is a little out of fashion, and multinomial logistic regression is suggested instead.

However, I find it difficult to interpret the outcome. I presume the regression model is set up in the following way:

- Predictor/Independent variable: Presented song (= a categorical variable with 4 levels)

- Outcome/Dependent variable: Answered song (= categorical variable with 4 levels)

In SPSS, I would go to Analyse => Regression => Multinomial logistic... and specify "Answered song" as Dependent and "Presented Song" as Factor (not as Covariate, because it's not a continuous variable). Before doing this, I re-coded the four songs as 0 - 3 for both IV and DV.

For the Dependent variable, one has to specify a Reference Category (first, last, custom). Let's say we use Song A (coded 0) for this. Textbooks virtually always use binary outcomes, simply defined as 0 = no outcome, 1 = desired outcome or the like, for which the interpretation is much more straightforward. But here, I struggle with the interpretation.

So I presume the overall Model Fitting Information tells us whether the regression model fits significantly better than an intercept-only model (likelihood-ratio test), which would mean that knowing which song has been presented is informative for predicting the participant's answer. This should be roughly equivalent to a significant overall recognition rate (averaged across all 4 songs).

But then I'm a little lost. In the Parameter Estimates table we get 3 "blocks" of output, one for each DV level other than the reference, so we have a block for Answered Song 1, 2, and 3, but not for 0, the reference.

In each block, we have five rows: one for the Intercept and one for each level of the predictor, i.e. "Presented song = 0", "Presented song = 1", etc. In all "DV blocks", the last category (Presented song = 3) is not estimated ("This parameter is set to zero because it is redundant").
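Written out, and assuming this is the usual baseline-category logit parameterisation (the symbols $Y$ for the answered song, $X$ for the presented song, $\alpha_j$ for the intercept of block $j$ and $\beta_{jk}$ for the "Presented song = $k$" row of block $j$ are my own notation, not SPSS's), each block $j = 1, 2, 3$ would correspond to

$$
\log \frac{P(Y = j \mid X = k)}{P(Y = 0 \mid X = k)} = \alpha_j + \beta_{jk},
\qquad k = 0, 1, 2, 3, \qquad \beta_{j3} = 0 .
$$

Under that reading, the response probabilities themselves would be

$$
P(Y = j \mid X = k) = \frac{\exp(\alpha_j + \beta_{jk})}{1 + \sum_{m=1}^{3} \exp(\alpha_m + \beta_{mk})},
\qquad
P(Y = 0 \mid X = k) = \frac{1}{1 + \sum_{m=1}^{3} \exp(\alpha_m + \beta_{mk})} .
$$

So $\alpha_j$ would be the log-odds of answering $j$ rather than 0 when Song 3 is presented, and $\beta_{jk}$ the change in those log-odds when Song $k$ is presented instead of Song 3.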

Intuitively, I would read the Parameter Estimates table a little like a cross table: In the "Answered Song (DV) = 1" block, I check whether the "Presented Song = 1" parameter is significant. If so, then knowing that Song 1 has been presented helps significantly to predict that the answer was Song 1 (= correct answer). Thus, the song could be recognised significantly above chance level. Then the same for "Answered = 2", checking the parameter estimate of "Presented = 2". For "Answered = 3" this doesn't work, because "Presented = 3" is set to zero because it is redundant (see above). The remaining combinations represent the confusion matrix, so for example if "Answered = 2" and "Presented = 1" has a significant parameter estimate, then Song 1 has significantly often been confused with Song 2.

However, I think that this is not correct, because it ignores the fact that all estimates are expressed relative to the reference category (Answered Song 0). My understanding is that if the Intercept in the "Response/DV = 1" block is significant, then the recognition rate for Response 1 is significantly higher than for Response 0?? This doesn't make sense to me. And how do the further rows ("Presented = 0", etc.) then enter the interpretation? How can I make an inference about "Answered Song 0" only? How can I make an inference about "Presented Song 3" (which was set to 0 because it is redundant)?
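To make these questions concrete, here is a minimal Python sketch of how I think the coefficients relate to the response probabilities. It assumes the standard baseline-category logit form, with Answered Song 0 as the reference response and Presented Song 3 as the redundant predictor level. The probability table is made up, and `to_coefficients` and `predicted_probs` are hypothetical helper names, not SPSS output or any library's API.

```python
from math import exp, log

# Hypothetical P(answered = j | presented = k); rows are k, columns are j.
# Made-up numbers for illustration only.
P = [
    [0.70, 0.10, 0.10, 0.10],  # presented 0
    [0.20, 0.50, 0.20, 0.10],  # presented 1
    [0.10, 0.20, 0.60, 0.10],  # presented 2
    [0.25, 0.25, 0.25, 0.25],  # presented 3 (pure guessing)
]

def to_coefficients(P, ref_answer=0, ref_presented=3):
    """Intercepts and slopes that reproduce P under a baseline-category logit."""
    answers = [j for j in range(len(P[0])) if j != ref_answer]

    def log_odds(k, j):
        # Log-odds of answering j rather than the reference answer, given presented k.
        return log(P[k][j] / P[k][ref_answer])

    alpha = {j: log_odds(ref_presented, j) for j in answers}
    beta = {j: {k: log_odds(k, j) - alpha[j] for k in range(len(P))} for j in answers}
    return alpha, beta

def predicted_probs(alpha, beta, k, ref_answer=0):
    """Convert the coefficients back into all four response probabilities."""
    scores = {j: exp(alpha[j] + beta[j][k]) for j in alpha}
    scores[ref_answer] = 1.0  # the reference answer has linear predictor 0
    total = sum(scores.values())
    return [scores[j] / total for j in sorted(scores)]

alpha, beta = to_coefficients(P)
for k in range(4):
    print(k, [round(p, 3) for p in predicted_probs(alpha, beta, k)])
```

If this reading is right, nothing is actually lost for "Answered Song 0" or "Presented Song 3": their probabilities can be recovered from the other coefficients, and the intercepts describe the response pattern when Song 3 is presented rather than any recognition rate. What I still don't see is how to get significance tests for those recovered probabilities from the SPSS table.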

I've searched quite a bit and looked through textbooks, etc., but I haven't come across such a case. They all use much simpler examples and just say "more complex examples work equivalently" or the like, but I can't see the equivalence.

Any help would be greatly appreciated!

(And apologies for the long post)

Kind Regards,

Andre