Hello,
I am trying to run a factor analysis on 32 categorical variables. They are ordinal and come from a questionnaire with fixed answer options. Different questions had different options, coded as 1, 2, 3, 4, 5, etc., so a given number means something different from one question to the next. In other words, this is not a Likert-type scale where, for example, 1 means good, 2 means moderate, and 3 means bad for every question. For example, for one question:
"How often do senior management visit the wards to talk to staff?"
rarely or never ..................... 1
around once a year................... 2
around once a month.................. 3
around once a week................... 4
For another question:
"What is the average amount of training (per person) received by management staff?"
Less than a day ..................... 1
Less than a week .................... 2
One to two weeks .................... 3
Etc.
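To make the coding concrete, here is a minimal sketch of how two such items could be stored in R as ordered factors, each with its own labels. The column names `visits` and `training` and the values are made up for illustration (they are not my real data), and only the three training categories listed above are shown.

```r
# Hypothetical toy data: two items whose numeric codes mean different things
d <- data.frame(
  visits   = c(1, 4, 2, 3, NA),
  training = c(2, 1, 3, 3, 1)
)

# Each item gets its own ordered labels, so code 1 is not comparable across items
d$visits <- factor(d$visits, levels = 1:4,
                   labels = c("rarely or never", "around once a year",
                              "around once a month", "around once a week"),
                   ordered = TRUE)
d$training <- factor(d$training, levels = 1:3,
                     labels = c("less than a day", "less than a week",
                                "one to two weeks"),
                     ordered = TRUE)

str(d)  # both columns are now ordered factors; the NA is preserved
```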
Moreover, I have missing values of two kinds. Some are simply due to non-response: the respondent did not fill in any answer option for that question. Others are due to questions of the following type:
5) Do you create formal work teams in your institution?
1="NO" 2="YES"
(Please skip questions 6 and 7 if your answer to this question is 1="NO")
6) How many members form the work team? (for example)
7) What is the criterion of selecting team members? (for example)
Respondents who answered "NO" to question 5 do not answer questions 6 and 7; they resume at question 8. This is another source of missing values, or gaps, in the data set. Mainly because of this second type of missingness, deleting cases listwise throws away a lot of information.
My actual number of observations is 212, but it reduces to only 42 when I use na.omit(data).
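As a rough illustration of how the skip pattern drives this loss, here is a small sketch on made-up data. The data frame `toy` and its columns `q5` to `q8` are hypothetical stand-ins for my questionnaire items, not the real data.

```r
# Hypothetical toy data mimicking the skip pattern described above
toy <- data.frame(
  q5 = c(1, 2, 1, 2, 1, 2),      # 1 = "NO", 2 = "YES"
  q6 = c(NA, 3, NA, 2, NA, NA),  # skipped when q5 == 1; last NA is non-response
  q7 = c(NA, 1, NA, 2, NA, 1),   # skipped when q5 == 1
  q8 = c(2, 1, 3, NA, 2, 2)      # ordinary item with one non-response
)

nrow(toy)            # all respondents
nrow(na.omit(toy))   # complete cases only, after listwise deletion
colSums(is.na(toy))  # number of missing values per question

# Separate structural skips from genuine non-response on q6
skipped     <- toy$q5 == 1 & is.na(toy$q6)
nonresponse <- toy$q5 == 2 & is.na(toy$q6)
table(skipped, nonresponse)
```

Counting the missing values per column this way at least shows which questions account for most of the dropped rows.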
So I want to ask two things:
1) What kind of correlation should I use as input for the factor analysis? Some have suggested polychoric correlation to me, but can I really assume underlying normality for these categorical variables?
2) How should I handle the missing values in categorical variables?
Waiting for your reply.
Best,
Blain Waan