Need help choosing proper analysis test

#1
I am currently developing an assay that tests biological samples and would like some help in figuring out how to statistically determine what data are outliers and to better define the parameters of a “successful” experiment.

The exact experiment is not important, but I have established both a negative and a positive control for the assay, which are used during testing of all other variables. However, with source material coming from individual biological donors, there is variation, and at times the experiment simply does not work due to unforeseen biological circumstances (i.e., the positive control does not induce the proper response, or the negative control and all other test conditions perform similarly to the positive control). These data therefore cannot be used and are discarded (which is an acceptable outcome), but sometimes the response falls into an intermediate range. My question, then, is how I can decide whether these intermediate data should be analyzed further.

I have over 300 paired data points for the positive and negative controls (which include the experiments that did not work, as defined above). What test would be best to determine the outliers? What test would be best to determine cut-off values (data ranges) for the positive and negative controls? For what it's worth, a successful test would easily show a factor-of-5 or (usually) greater difference between positive and negative results. I have analyzed the data using a box/whisker plot, which helps with the outliers and reports back the min/max values. Is there any other analysis test that is better suited?
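(For reference, the outlier rule behind the box/whisker plot I'm using is Tukey's 1.5×IQR fences; a minimal R sketch with made-up data, not my actual assay results:)

```r
# Tukey's 1.5*IQR rule -- the same rule a box-and-whisker plot uses
# to flag points beyond its whiskers (illustrative data only)
set.seed(1)
x <- c(rnorm(50, mean = 10), 25)            # 50 typical values plus one extreme
q <- quantile(x, c(0.25, 0.75))
iqr <- q[2] - q[1]
fences <- c(q[1] - 1.5 * iqr, q[2] + 1.5 * iqr)
x[x < fences[1] | x > fences[2]]            # points flagged as outliers
```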

Thank you in advance for taking the time to read my post and for your help.
 

TheEcologist

Global Moderator
#2
One way to identify possible outliers is to assess their influence on your parameter of interest. You can assess this influence through removal: an iterative approach where you remove a single value from the dataset, recalculate your parameter of interest, and compare the result to the value computed with the data point included. You do this for every value in your dataset (see an example R script below; see also the jackknife approach).

Data points that heavily influence your parameters of interest (e.g. mean) are candidates for complete removal.

You can also try transformations that reduce the influence of outliers.
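For example, a log transformation (assuming strictly positive values) pulls in large values, so an extreme point exerts much less leverage on the mean. A generic illustration, not specific to your assay:

```r
# Illustration: a log transform reduces the pull of one extreme value
# (assumes all values are positive; illustrative data only)
set.seed(1)
x <- c(rlnorm(100), 1000)   # log-normal data plus one extreme value

mean(x)                     # heavily influenced by the 1000
mean(log(x))                # on the log scale the extreme point has far less pull
```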

some links that can help:

http://ai.ijs.si/Branax/idamap-2000_AcceptedPapers/Laurikkala.pdf
http://www2.tltc.ttu.edu/westfall/images/5349/outliers_what_to_do.htm

Hope this helps.

R-script:

#a = dataset with one extreme outlier
a=c(rnorm(100),100)

outlier.id = function(a) {
  r = rep(0, length(a))
  for (i in 1:length(a)) {
    b = a[-i]                  # data with the i-th point removed
    # ratio of the full-sample mean to the leave-one-out mean
    r[i] = mean(a) / mean(b)
  }
  return(r)
}

outlier.id(a)
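To turn those ratios into candidate flags, one might mark any point whose removal shifts the mean by more than some tolerance; the 5% threshold below is purely illustrative, not a recommended cut-off:

```r
# Self-contained sketch: jackknife influence ratios, then flag points
# whose removal shifts the mean by more than an arbitrary 5%
set.seed(42)
a = c(rnorm(100), 100)   # 100 typical values plus one extreme outlier

influence.ratio = function(a) {
  sapply(seq_along(a), function(i) mean(a) / mean(a[-i]))
}

r = influence.ratio(a)
candidates = which(abs(r - 1) > 0.05)   # indices of high-influence points
candidates
```

Points that land in `candidates` are the ones worth a closer look before deciding whether to discard them.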