# Collinearity using clogit

#### JMH11

##### New Member
Hi,

My statistical support has gone on long-term sick leave, and I am stuck mid-analysis, so I was hoping someone here could help me. I am a health professional, and my basic grasp of stats is not helping me with this problem! I did search for my question, both via Google and on the search bar here, but apologies if I missed a thread that already answers my query.

I have conducted a discrete choice experiment. The experiment asks people to compare two health screening tests. Focus group work identified four key attributes - accuracy, level of information, invasiveness and follow-up. The focus group work identified that some people preferred accuracy presented as sensitivity, while others preferred positive predictive value (PPV). Therefore participants were randomly allocated to receive choice sets of either sensitivity or PPV.

A factorial design informed by Burgess and Street was used, and each participant was given 8 choice sets. A binary choice was used as it was felt that a ‘no screen’ option was not applicable in this case. Validity of responses was tested in two ways - internal consistency and confidence ratings.

Alongside the main population comparisons (recruited from currently healthy, previously exposed and health professionals), I intend to conduct subgroup analysis on certain factors for the non-health professionals. This includes anxiety, perception of risk, and prior exposure status, all of which were assessed during the questionnaire.

I have attached a jpeg of how I have input the data into Stata (version 12). ChoiceNum reflects the choice set of that response, while C_ID is the cumulative choice set for all participants. Accuracy has one column, reflecting the accuracy of the test, while the categorical data of the other attributes have a column each. The 'sure' column indicates the confidence rating of that participant.

The 300-odd participants have been entered in the same way, using separate databases for each of the accuracy conditions. I have attempted to analyse the data set with clogit, using the following command:

clogit choice accuracy info_group info_numerical test_medhx test_blood follow_asreq follow_setschedule, group(c_id)

The results readout reports that info_numerical, test_blood and follow_setschedule have been omitted because of collinearity. I assume this is bad and that I have done something wrong?

I am hoping someone can spot something glaringly obvious and help, because I have been trying to find the answer for a couple of weeks now!

Many thanks in advance, and season's greetings!
James

#### RedOwl

##### New Member
Try it again, adding -cluster(id)- to the options to indicate that you are clustering on individuals (id). (That assumes the participant identifier is named "id" in your data set.)

Code:
clogit choice accuracy info_group info_numerical test_medhx test_blood follow_asreq follow_setschedule, group(c_id) cluster(id)
Please let us know if that works.

#### JMH11

##### New Member
Hi there, thank you for taking the time to reply - unfortunately, no joy, same collinearity error appears.

#### RedOwl

##### New Member
OK, I had hoped that would fix the within-subjects collinearity if that were the sole problem.
I still think that was part of the problem and still recommend you maintain the -cluster(id)-
option.
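
For reference, the same clustering request can be written in the vce() form. This is only a sketch: it assumes the participant identifier is a variable named id (as above), and it is meant to be run from a do-file because of the line continuation. Clustering changes how the standard errors are computed, not the point estimates, so on its own it would not be expected to change which variables are dropped for collinearity.

```stata
* Sketch only: assumes the participant identifier is a variable named id.
* Variable names are copied from the command earlier in this thread.
clogit choice accuracy info_group info_numerical test_medhx test_blood ///
    follow_asreq follow_setschedule, group(c_id) vce(cluster id)
```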

That said, your problem may be more complicated and is possibly related to the issue
described in the Stata FAQ at
http://www.stata.com/support/faqs/statistics/within-group-collinearity-and-clogit.
See especially sections 2.5 and 3 in the FAQ.
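
A few quick checks may help show whether the data have the kind of within-group pattern that makes clogit drop variables. This is a rough sketch, not a diagnosis: it uses the variable names from the command above, assumes the data are stored with one row per alternative within each choice set (c_id), and creates helper variables (n_rows, n_chosen, info_sum, info_sum_sd) that are made up here for illustration.

```stata
* Sketch only: helper variable names below are hypothetical.

* 1. How many rows does each choice set contribute?
*    With two alternatives per choice set, one would expect 2.
bysort c_id: generate n_rows = _N
tabulate n_rows

* 2. How many alternatives are marked as chosen in each choice set?
*    One would expect exactly 1.
bysort c_id: egen n_chosen = total(choice)
tabulate n_chosen

* 3. Do the dummies for one attribute add up to the same value
*    on every row of a choice set?
generate info_sum = info_group + info_numerical
bysort c_id: egen info_sum_sd = sd(info_sum)
summarize info_sum_sd
```

If info_sum_sd is zero in every choice set, the two info dummies carry the same information once the choice-set effect is conditioned out, and clogit can only estimate one of them. The same check can be repeated for the test_ and follow_ variables.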

Hope that helps.