# evaluating performance

#### jamesmartinn

##### Member
Hi all:

We've run up against a bit of a difficult scenario and we're looking for input on how to proceed. I will give some background on our data and goals:

N=15000 doctors (Unit of analysis)

For each doctor, we have the size of the patient practice (the number of patients they take care of).

For each doctor, we also have the number of patients who have an attribute we're interested in measuring (e.g. number of patients they take care of who are receiving a drug).

A sample data matrix looks like this.

Code:
ID       #Patients   #Patients on drug   Proportion
1        20          5                   25%
2        100         25                  25%
3        40          20                  50%
.
.
.
15000    5           1                   20%

Our goals are to:

(1) Develop a classification analysis to classify doctors as having low, average/medium, high and very high rates of their patients on the drug.

(2) Visualize the data.

Issues:
-We have lots of data, N=15,000 points
-There is considerable spread in the number of patients cared for by each physician, and so in the proportion of those on the drug as well: some physicians take care of about 50 patients, some take care of hundreds, and many take care of thousands.

What we've done:
-Our attempt was to use a funnel plot for binomial data (#drug/#patients) at the physician level. We wanted to draw 95% and 99.8% confidence limits and see who falls where, under the null hypothesis that every doctor shares the overall (pooled) proportion of patients on the drug. (That is, sum the numerators across doctors, sum the denominators across doctors, and divide one by the other to get an "average" proportion.)
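To make the pooled proportion concrete, here is a minimal sketch using only the four rows shown in the sample matrix above. The function name `pooled_proportion` is made up for illustration; it is not from any library. The unweighted mean of the per-doctor rates is printed alongside it only to show that the two are different quantities.

```python
# Rows from the sample data matrix: (patients, patients on drug).
# Only the four rows shown in the post; the real data has 15,000 doctors.
sample = [(20, 5), (100, 25), (40, 20), (5, 1)]


def pooled_proportion(rows):
    """Sum the numerators, sum the denominators, then divide."""
    total_patients = sum(n for n, _ in rows)
    total_on_drug = sum(k for _, k in rows)
    return total_on_drug / total_patients


p0 = pooled_proportion(sample)

# For comparison: the plain average of each doctor's own rate.
# This weights a 5-patient practice the same as a 100-patient one.
mean_of_rates = sum(k / n for n, k in sample) / len(sample)

print(f"pooled proportion:        {p0:.4f}")
print(f"unweighted mean of rates: {mean_of_rates:.4f}")
```

The pooled figure is dominated by the largest practices, so it can sit some distance from what a "typical" doctor does.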

This naturally lends itself to a classification method -> e.g. doctor X is above the upper 99.8% control limit -> let's call him extreme.

What we've noticed is that this methodology doesn't work, even when visualizing the data. Too many physicians are being classified as above the 99.8% limit and it's ultimately not useful. Even the final funnel plot graph is indecipherable, so one of the other issues is that we can't even get a clear picture of what the data is showing.

I'm just wondering if anyone has any approaches to visualizing or analyzing the data. The data are simple, but the volume and heterogeneity are causing problems.

Thanks!

#### hlsmith

##### Omega Contributor
Can you upload an example funnel plot you are working with? So are you talking about a plot similar to what is used in meta-analyses?

So the confidence limits are not working because the CLT isn't coming into play given the variability. Are you wanting to control for practice size, rather than looking only at the crude percent prescribing? Can you do something with beta regression, controlling for number of patients?

#### jamesmartinn

##### Member
Thank you for the reply Hlsmith!

Here is an example of the funnel plot. I can't post the actual data I have, for privacy reasons at my organization, but you can imagine the plot below so densely packed with data points all over the place that it's essentially not interpretable for our purposes.

For our setup, we took the overall proportion of the data (sum of all patients getting drug/sum of all patients) as the null hypothesis. This corresponds to the "mean" in the graph above.

Using this estimate, we use the large-sample normal approximation to the binomial to calculate confidence limits at the 95% and 99.8% levels. On the y axis we then have each doctor's sample rate, plotted against their patient panel size on the x axis; each point in the scatter part of the graph corresponds to one physician.
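As a rough sketch of the limits described here (assuming the usual form p0 ± z·sqrt(p0(1 − p0)/n) for the normal approximation), the calculation could look like the following. The function name `funnel_limits` and the value `p0 = 0.30` are placeholders for illustration, not our real figures.

```python
from math import sqrt
from statistics import NormalDist


def funnel_limits(p0, n, level):
    """Two-sided normal-approximation limits around p0 for panel size n."""
    # level=0.95 gives z of about 1.96; level=0.998 gives z of about 3.09
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p0 * (1 - p0) / n)
    # Clip to [0, 1], since a proportion cannot leave that range
    return max(0.0, p0 - half_width), min(1.0, p0 + half_width)


p0 = 0.30  # hypothetical pooled proportion, for illustration only

for n in (50, 500, 5000):
    lo95, hi95 = funnel_limits(p0, n, 0.95)
    lo998, hi998 = funnel_limits(p0, n, 0.998)
    print(f"n={n:5d}  95%: [{lo95:.3f}, {hi95:.3f}]  "
          f"99.8%: [{lo998:.3f}, {hi998:.3f}]")
```

The half-width shrinks like 1/sqrt(n), so the funnel is very narrow for panels in the thousands. That may be part of why even modest departures from the pooled proportion end up outside the 99.8% limits for large practices.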

What we were hoping to do is use each doctor's position relative to the confidence limits as a natural classifier. For example, if you're above the upper 99.8% confidence limit, we deem you extreme. If you're in between the upper and lower 95% confidence limits, you might be "average" or "in control".
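A minimal sketch of that classification rule, using the same normal approximation, is below. The labels for the regions between the 95% and 99.8% limits ("high") and below the lower 95% limit ("low") are my assumption to match the four categories in our goals; the function name `classify` and `p0 = 0.30` are likewise placeholders.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)   # about 1.96
Z998 = NormalDist().inv_cdf(0.999)  # about 3.09


def classify(on_drug, patients, p0):
    """Label a doctor by where their rate falls relative to the funnel limits."""
    # Standardized distance of the doctor's rate from the pooled proportion
    z = (on_drug / patients - p0) / sqrt(p0 * (1 - p0) / patients)
    if z > Z998:
        return "very high"  # above the upper 99.8% limit ("extreme")
    if z > Z95:
        return "high"       # assumed label: between upper 95% and 99.8%
    if z < -Z95:
        return "low"        # assumed label: below the lower 95% limit
    return "average"        # inside the 95% limits ("in control")


p0 = 0.30  # hypothetical pooled proportion, for illustration only

for on_drug, patients in [(5, 20), (20, 40), (450, 1000), (200, 1000)]:
    print(patients, on_drug, classify(on_drug, patients, p0))
```

Note that the normal approximation is rough for very small panels (such as the 5-patient practice in the sample matrix), so labels for those doctors should be read with caution.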

I think this is an appropriate way to do this theoretically. However, given the volume and the spread of the data points, it's not giving anything useful, as a lot of physicians are being classified as extreme.

I've thought about changing the null hypothesis (under which the limits are calculated, and which therefore drives the classification) from the overall sample proportion (which is probably misleading), though I'm not sure what to change it to, as this is more a clinical question than a statistical one.

Just wondering if there are any other approaches you can think of for the problem at hand (both in terms of classifying and even visualizing the data). You mentioned beta regression? I haven't worked with this type of generalized linear model before; what could I potentially get from it?

Thanks!