Determining p-values, significance cut-offs and false discovery rates in non-normal data

#1
Hi,

I have a set of data points from a genetic study, and I wish to ascertain which of these data points are statistically significant. The data are not normally distributed: values range from zero to the low thousands, with a large number of zeros.

At present, I am using a bootstrap method to determine which values are significant. A new data set is formed by sampling the original data set 100,000 times with replacement, and the distribution of this resampled data set is used to determine significance cut-off values for the original data (e.g. the 1% cut-off is the 1,000th highest value among the 100,000 resampled values). An individual p-value is determined in the same way for each point in the original data set, according to where it falls in the distribution of the resampled values.
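To make the procedure concrete, here is a minimal sketch in Python (numpy); the data array is just a placeholder with an excess of zeros standing in for my real values, and the numbers are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: many zeros plus a long right tail, standing in for the real values
values = np.concatenate([np.zeros(500), rng.gamma(shape=0.5, scale=200.0, size=500)])

n_resamples = 100_000

# Resample the observed data 100,000 times with replacement, then sort
boot = np.sort(rng.choice(values, size=n_resamples, replace=True))

# 1% cut-off: the 1,000th highest value among the resampled values
cutoff_1pct = boot[int(0.99 * n_resamples)]

# Empirical p-value for each original point: the proportion of resampled
# values at least as large (the +1 keeps p-values strictly above zero)
n_ge = n_resamples - np.searchsorted(boot, values, side="left")
p_values = (n_ge + 1) / (n_resamples + 1)

print(f"1% cut-off: {cutoff_1pct:.1f}")
```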

I am concerned that the excess of zeros and the non-normal distribution of the original data will make this method of determining significance biased or inaccurate.

Is this something I should be concerned about, and if so does anyone know a way around it?

Also, can anyone recommend a good method for adjusting for false discovery rates, either for the 1% cut-off value or for the individual p-values? I tried using the BY (Benjamini-Yekutieli) method on the individual p-values, but it converted every value to 1.
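For context, the kind of adjustment I mean looks roughly like this in Python; fdr_bh here is just an illustrative helper implementing the standard Benjamini-Hochberg step-up adjustment (BY multiplies these values by an additional harmonic-sum factor, which is part of why it is so much more conservative), not something from my actual pipeline:

```python
import numpy as np

def fdr_bh(p):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Step-up adjustment: p_(i) * m / i on the sorted p-values
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1
    scaled = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty_like(scaled)
    adjusted[order] = scaled
    return adjusted

# Toy example with made-up p-values; in practice this would be the empirical
# p-values from the resampling step above
p_values = np.array([0.0005, 0.0012, 0.03, 0.2, 0.8, 1.0])
q_values = fdr_bh(p_values)
significant = q_values < 0.05
```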
 

CB

Super Moderator
#2
"I wish to ascertain which of these data points are statistically significant."
What do you mean by this exactly? As in, forgetting about the term "significance", how would you describe what you are trying to find out here in layman's terms?
 
#3
I want to find out which data points from my study are large enough to indicate some kind of interesting result that warrants further investigation.