Which statistical test is suitable for this situation?

#1
Hi,

I just need some reassurance about whether my thinking is right. For some reason, I don't think it is.

I am planning to assess mean differences between groups. In particular, my aim is to determine which features of users' posts (total word count, words related to fruit, words related to colour, words related to cars) differentiate strongly between users' personality traits (openness, neuroticism, and extraversion). I am using software that automatically detects these words and outputs how often they appear in each post.

From my understanding (please correct me if I am wrong):
Features is a continuous numeric variable
Personality Traits is a categorical variable

My dependent variable is Features, while my independent variable is Personality Traits.

Therefore, is ANOVA a suitable test to determine, for example, the difference in the mean number of fruit-related words between personality traits? Or should I use multinomial logistic regression?
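For reference, a one-way ANOVA on a single feature compares how far the trait means lie from the grand mean against how much individual posts vary within each trait. The sketch below computes the F statistic with the standard library only; the function name `one_way_anova_f` is my own and the per-post counts are hypothetical, made up purely for illustration. For a p-value, `scipy.stats.f_oneway(*fruit_counts.values())` returns the F statistic together with its p-value. Keep in mind that ANOVA assumes roughly normal residuals with similar variances across groups, which sparse word counts with many zeros may not satisfy.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: dict mapping a group label (e.g. a personality trait)
    to a list of per-post values of ONE feature.
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups (traits)
    n = len(all_values)   # total number of posts
    # Between-group sum of squares: distance of each trait mean from the grand mean.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within-group sum of squares: spread of posts around their own trait mean.
    ss_within = sum(sum((x - mean(v)) ** 2 for x in v) for v in groups.values())
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-post counts of fruit-related words (illustrative only).
fruit_counts = {
    "openness": [3, 5, 4, 6],
    "neuroticism": [1, 2, 2, 3],
    "extraversion": [2, 3, 4, 3],
}

f_stat, df1, df2 = one_way_anova_f(fruit_counts)
print(f"F({df1}, {df2}) = {f_stat:.3f}")  # prints: F(2, 9) = 6.333
```

The same function would be run once per feature (word count, fruit, colour, cars), each time with that feature's per-post values.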

Thank you in advance.
 

noetsi

#2
It depends on how the features are actually counted. If it's a frequency, then yes, it would be interval. But I think this might be count data, in which case you might have to use Poisson regression rather than regular regression (although I am not an expert in that, so this could be wrong).
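To make the Poisson suggestion concrete: with a single nominal predictor (trait), a Poisson regression with a log link fits each trait's mean count exactly at its sample mean, so the likelihood-ratio test against an intercept-only model can be computed by hand. The sketch below works under those assumptions; the helper names are hypothetical, the counts are made up, and the chi-square tail formula used here only covers even degrees of freedom (three traits give 2). It also assumes no overdispersion and ignores differences in post length; in practice a GLM library (for example statsmodels' `glm` with a Poisson family) would handle the general case.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x).

    Closed form exp(-x/2) * sum_{j<df/2} (x/2)^j / j!, valid for even df only.
    """
    if df % 2 != 0:
        raise ValueError("closed form only covers even degrees of freedom")
    half = x / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def poisson_group_lrt(groups):
    """Likelihood-ratio test: Poisson model with one mean per group versus
    one common mean. The fitted group means are the sample means, and the
    sum-of-means terms cancel, leaving 2 * sum(y * log(group_mean / overall_mean)).
    """
    all_values = [y for ys in groups.values() for y in ys]
    overall_mean = sum(all_values) / len(all_values)  # assumes not all zero
    lr = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        if group_mean > 0:  # an all-zero group contributes 0 (0 * log 0 := 0)
            lr += 2 * sum(ys) * math.log(group_mean / overall_mean)
    df = len(groups) - 1
    return lr, df, chi2_sf_even_df(lr, df)


# Hypothetical per-post counts of fruit-related words (illustrative only).
fruit_counts = {
    "openness": [3, 5, 4, 6],
    "neuroticism": [1, 2, 2, 3],
    "extraversion": [2, 3, 4, 3],
}

lr, df, p = poisson_group_lrt(fruit_counts)
print(f"LR = {lr:.3f} on {df} df, p = {p:.3f}")
```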

Personality trait is a categorical variable. It is probably nominal, since the categories cannot be ordered (one trait is not "greater" than another).

I don't think count data would call for logistic regression, unless you are somehow recoding it as 1 or 0.

I think you are assuming that all uses of a word are essentially the same, so context does not matter, only counts. Is that true?
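The "only counts" assumption is essentially what a dictionary-based word counter implements. Below is a minimal sketch, assuming made-up category word lists (the dictionaries used by the actual software are not known) and a hypothetical `feature_counts` helper: every occurrence counts once regardless of context, so an ambiguous word such as "orange" is counted as both a fruit and a colour.

```python
import re
from collections import Counter

# Hypothetical category word lists, for illustration only.
LEXICON = {
    "fruit": {"apple", "banana", "orange", "mango"},
    "colour": {"red", "blue", "green", "orange"},
    "cars": {"car", "engine", "tyre", "sedan"},
}


def feature_counts(post):
    """Count category words in one post.

    Every occurrence counts the same, regardless of the context it appears in;
    a word listed in several categories is counted in each of them.
    """
    words = re.findall(r"[a-z']+", post.lower())
    counts = Counter()
    for word in words:
        for category, vocab in LEXICON.items():
            if word in vocab:
                counts[category] += 1
    counts["word_count"] = len(words)  # total word count of the post
    return counts


print(feature_counts("I ate an orange apple in my red car."))
```

For that sentence the counts are fruit 2, colour 2, cars 1 and word_count 9, which shows how context-free counting treats "orange".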
 

Karabiner

TS Contributor
#3
"Personality Traits is a categorical variable"
It depends on how you measure it. Are they really measured or stored as yes/no variables (such as "person is open / person is not open")?
Usually, the Big Five personality traits are measured on continuous scales, so you would have interval-scaled data.

"My dependent variable is Features,"
Do you conceptualize it as one variable measured by four aspects? Or don't you rather want to analyse four distinct features?

With kind regards

Karabiner
 
#4
Hi @noetsi, thank you for your advice. That's correct: context does not matter. I am only assessing, for example, how many times users classified under openness, neuroticism, and extraversion use words related to fruit, and then comparing the mean number of fruit-related words to identify whether there are significant differences between personality traits.
 
#5
Hi @Karabiner, thank you so much for this. Good point; I forgot to add that. Basically, users' posts were first classified by personality trait (assessed through a multiple-question survey in which participants had to select one personality trait according to the post's content). After that, I am using software that counts how many times each feature appears in each category. In this way, I will be able to assess whether the mean number of words per feature differs significantly between personality traits (e.g. whether the mean number of fruit-related words differs significantly between personality traits).
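Whatever test is used, it needs the counts kept at the post level (one value per post, labelled with the trait that post was assigned), not summed per category: with a single total per trait there is no within-group variation to test against. Here is a sketch of that layout, with hypothetical records and a hypothetical helper `split_by_trait`, treating the four features as four separate outcomes.

```python
from collections import defaultdict
from statistics import mean

FEATURES = ("word_count", "fruit", "colour", "cars")

# Hypothetical per-post records (illustrative values only): one row per post,
# the trait it was assigned in the survey, and its four feature counts.
posts = [
    {"trait": "openness", "word_count": 120, "fruit": 3, "colour": 2, "cars": 0},
    {"trait": "openness", "word_count": 95, "fruit": 5, "colour": 1, "cars": 1},
    {"trait": "neuroticism", "word_count": 80, "fruit": 1, "colour": 0, "cars": 2},
    {"trait": "neuroticism", "word_count": 60, "fruit": 2, "colour": 1, "cars": 1},
    {"trait": "extraversion", "word_count": 150, "fruit": 2, "colour": 3, "cars": 4},
    {"trait": "extraversion", "word_count": 130, "fruit": 3, "colour": 2, "cars": 2},
]


def split_by_trait(posts, feature):
    """Return {trait: [per-post values of one feature]}."""
    groups = defaultdict(list)
    for post in posts:
        groups[post["trait"]].append(post[feature])
    return dict(groups)


# One separate comparison per feature; here just the per-trait means.
for feature in FEATURES:
    groups = split_by_trait(posts, feature)
    means = {trait: mean(values) for trait, values in groups.items()}
    print(feature, means)
```

Each `groups` dict could then be passed to a between-group test for that feature. Since that means four separate tests, a multiple-comparison adjustment (for example Bonferroni, testing each feature at α/4) is commonly applied.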

I am not using the Big Five personality traits; I only used them here as an example for my question. I hope this clarifies it.

Kind Regards
 

Karabiner

#6
Sorry, but I must admit that I am not able to understand your design and your measurements.
It is probably too far from my own area of study.

With kind regards

Karabiner