# Adjusted Chi-squared test for clustered binary / categorical data

#### ugm6hr

I'm looking for some assistance in statistical analysis with R (ideally), but also some general stats advice. This follows from a review which identified the need for me to adjust for clustering of relatives within family groups in my data set.

I am investigating cardiac phenotypes (I'm a cardiologist) in blood relatives (individuals) of sequential cases of premature sudden death. The cases of sudden death are categorised into 2 groups: 1. explained sudden death; 2. unexplained sudden death.

The groups are unmatched (sequential cases). Within each group, individuals are clustered in family subgroups / strata (of between 1 and 10 individuals).

All individuals / relatives are investigated for evidence of cardiac disease and categorised as "affected" or "unaffected."

I want to report the difference (or not) in proportion of blood relatives who are "affected" between the 2 groups.

For example:

• Group 1 consists of 157 individuals in 41 family clusters
• Group 2 consists of 463 individuals in 163 family clusters
• Proportion "affected" in Group 1 = 22.9%
• Proportion "affected" in Group 2 = 24.6%

I had initially used a simple Fisher / chi-squared test of proportions (group vs affected status in a 2x2 contingency table). However, it is clear I need to adjust for the clustering of relatives within family groups.
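
For reference, the unadjusted analysis can be sketched in base R as below. The counts of affected relatives (36 and 114) are an assumption: they are back-calculated from the reported percentages and group sizes, not taken from the actual data set. Both tests treat all 620 relatives as independent, which is exactly what the family clustering calls into question.

```r
# Counts back-calculated from the reported percentages (an assumption):
# 36/157 rounds to 22.9% in Group 1, 114/463 rounds to 24.6% in Group 2.
affected <- c(group1 = 36, group2 = 114)
total    <- c(group1 = 157, group2 = 463)

# 2x2 contingency table: affected status (rows) by group (columns)
tab <- rbind(affected = affected, unaffected = total - affected)

# Naive tests that ignore clustering of relatives within families
naive_chisq  <- chisq.test(tab, correct = FALSE)  # Pearson, no continuity correction
naive_fisher <- fisher.test(tab)                  # Fisher's exact test

round(100 * affected / total, 1)  # check against the reported percentages
naive_chisq$p.value
naive_fisher$p.value
```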

What test is most appropriate in this circumstance, and which package in R provides the easiest way to account for this?

Having looked around (Google etc), I have found:

• Ratio estimate chi-square test
• Generalized estimating equation

However, I have no experience in this at all.

I believe that the Donner (1989) or Rao & Scott (1992) modifications of the chi-squared test may be appropriate. I have found the aod package, which includes the functions donner() and raoscott().

I would certainly appreciate a second opinion on which (if either) to use, and which options are appropriate. I'm currently leaning towards Donner, given its prior use in clinical medical / vet / dental research.

My current plan:

Code:
library(aod)
donner(cbind(y, n - y) ~ group, data = matrix)
raoscott(cbind(y, n - y) ~ group, data = matrix)
The data "matrix" will be a data frame with 1 row per family / case and 4 columns: ID, group, n (number of relatives in family), y (number affected in family).
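
To make the data layout and the idea behind the ratio-estimate adjustment concrete, here is a minimal base R sketch. The data frame `fam` and the helper `rs_group()` are hypothetical names, and the eight families are invented for illustration; they are not the study data. The calculation follows the general Rao & Scott (1992) recipe (estimate a design effect per group from the between-family variability, deflate the counts by it, then apply the usual Pearson chi-squared). It has not been checked against the output of `raoscott()`, so treat it as a way of seeing what the adjustment does rather than a replacement for the package.

```r
# Hypothetical toy data in the layout described above: one row per family.
# These numbers are invented for illustration and are NOT the study data.
fam <- data.frame(
  ID    = 1:8,
  group = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),
  n     = c(3, 5, 2, 4, 1, 6, 3, 4),  # relatives investigated per family
  y     = c(0, 3, 0, 1, 0, 4, 0, 1)   # relatives affected per family
)

# Ratio-estimate (Rao-Scott style) summary for one group
rs_group <- function(d) {
  m <- nrow(d)          # number of family clusters
  N <- sum(d$n)         # number of individuals
  Y <- sum(d$y)         # number affected
  p <- Y / N            # ratio estimate of the proportion affected
  r <- d$y - d$n * p    # family-level residuals
  v <- m / (m - 1) * sum(r^2) / N^2  # cluster-based variance of p
  deff <- v / (p * (1 - p) / N)      # design effect relative to binomial variance
  c(m = m, N = N, Y = Y, p = p, deff = deff,
    N_adj = N / deff, Y_adj = Y / deff)
}

# One row of summaries per group
res <- t(sapply(split(fam, fam$group), rs_group))

# Pearson chi-squared computed on the design-effect-adjusted counts
p_pool  <- sum(res[, "Y_adj"]) / sum(res[, "N_adj"])
X2      <- sum(res[, "N_adj"] * (res[, "p"] - p_pool)^2) / (p_pool * (1 - p_pool))
p_value <- pchisq(X2, df = nrow(res) - 1, lower.tail = FALSE)

res
c(X2 = X2, p_value = p_value)
```

A design effect above 1 means the families are more alike internally than independent sampling would produce, so the effective sample size (`N_adj`) is smaller than the number of relatives and the adjusted test is more conservative than the naive one.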

I am very grateful for any advice.

Many thanks.