Comparing differences in variable number of replicates from the same location but different times

#1
Hello,

I'm not in any way a statistician (so I'm hoping for a relatively simple analysis/ses if possible), but my boss asked me to investigate the differences/implications of having varying numbers of replicates (between 1 and 4). They suggested using a scatterplot to compare data from the same locations that were collected at 2 different times. Most of those locations had (up to) 4 replicates during one collection, and most had only 1-2 replicates during the other (note: the collections were a month apart in the same year; the shortfall was due to environmental circumstances, as we normally go for 4 where possible).

Any ideas for how best to explore and explain the differences and accuracy between these 2 datasets? How can I explore the confidence we can have in the dataset that has fewer replicates based on the dataset that has up to 4?

Thanks!
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
Can you provide a little more context, or an analogy? For example, could we say these are franchises reporting revenues, or something similar? Understanding what we are talking about will influence our suggestions.
 
#3
Yes! I'm attaching a file with a subset of real data for this exercise, so we can start with that. The notes I include in the spreadsheet are: In July 2018, we were able to collect animals (very small!) from 5 sites (there are many more, this is just an example) and base the average weight on 4 replicates. When visiting these same sites in August 2018, we were only able to collect 1 sample from each site. How can we address this to be able to compare these weights - are the values from August 2018 similar enough to be reliable estimates with only one sample? Can we estimate the error or confidence or...???

1664911888773.png
 


katxt

Well-Known Member
#4
Do you have the data from each of the 4 replicates at a site, so that we can find the SD of the reps as well as the average?
Do you expect much month to month variation?
In general, the mean of 4 reps is 2 times (i.e. sqrt(4)) more precise than a single rep, because the standard error of a mean shrinks as 1/sqrt(n). How many reps you need depends on how accurate your estimates need to be, so you have to decide how much uncertainty you can tolerate.
 
#5
Hello - yes, sorry I haven't submitted questions to this site before so didn't think through all I should provide at once! The file with all 4 replicates is attached.

Thanks for the summary about the 4 reps. That is a helpful rule of thumb. As for the expected variation - generally, this data is collected July/Aug. so I believe they are expected to be similar. However, I think this exercise will be helpful in comparing data collected in the same month each year...where there are times that not all 4 replicates can be collected...? I think it will be helpful for us to understand (or at least report) how much error or uncertainty we have to accept in some cases where we just can't get all 4. But, just to throw in, we sometimes can only get 2...or 3...
1664990369669.png
 


katxt

Well-Known Member
#6
OK. Thanks for that. Your data is very variable, but that is often the case with wildlife data. In cases like this it is often best to work with relative standard deviations (RSD), also called percentage standard deviations (%SD). They are the same thing: the SD divided by the mean. Your site data averages out at about 40% RSD.
The accuracy of a sample mean is measured by its standard error (SE). Here we use the relative or percentage standard error (RSE or %SE, whichever term you prefer). RSE is connected to the RSD and the number of reps, n: RSE = RSD/sqrt(n)
So for 4 reps, your RSE is about 40%/sqrt(4) = 20%. For 1 rep, the RSE = 40%/sqrt(1) = 40%. You can work out the RSE for any rep number.
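If it helps, the RSE for each rep number can be tabulated with a few lines of Python. This is only a sketch of the RSE = RSD/sqrt(n) rule above; the function name is made up for illustration, and the 40% figure is the rough average RSD quoted above, not something recomputed here.

```python
import math

def relative_standard_error(rsd_percent, n_reps):
    """RSE (%) of the mean of n_reps replicates, given the replicate RSD (%)."""
    if n_reps < 1:
        raise ValueError("n_reps must be at least 1")
    return rsd_percent / math.sqrt(n_reps)

rsd = 40.0  # assumed: the approximate average RSD quoted in this post
for n in (1, 2, 3, 4):
    print(f"{n} rep(s): RSE is about {relative_standard_error(rsd, n):.1f}%")
```

For 2 and 3 reps this gives roughly 28% and 23%, which sit between the 40% and 20% worked out above.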
Now for the confidence intervals CI using RSE's.
What to do now depends on the number of reps, and the level of confidence you want.
Example. 4 reps, mean 5000, RSE = 40%/sqrt(4) = 20%. Find the 1 RSE or 68% CI.
The "probable" "2/3 sure" "68%" CI is 5000/exp(1x20%) to 5000*exp(1x20%) rounded suitably to 4100 to 6100. The true mean is probably between those two limits. (1/3 of the time it lies outside.)
Example. 1 rep, mean 5000, RSE = 40%/sqrt(1) = 40%. Find the 2 RSE or 95% CI.
The "very likely" "95%" CI is 5000/exp(2x40%) to 5000*exp(2x40%), rounded suitably to 2200 to 11100. The true mean is very likely between those two limits.
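The two examples can be reproduced with a short Python sketch. The function name and its arguments are hypothetical, chosen only for illustration; it applies the same mean/exp(k x RSE) to mean*exp(k x RSE) limits used above, with k = 1 for the rough 68% interval and k = 2 for the rough 95% interval.

```python
import math

def multiplicative_ci(mean, rsd_percent, n_reps, n_se):
    """Limits mean/exp(k*RSE) and mean*exp(k*RSE), with RSE = RSD/sqrt(n)."""
    rse = (rsd_percent / 100.0) / math.sqrt(n_reps)  # RSE as a fraction
    factor = math.exp(n_se * rse)
    return mean / factor, mean * factor

# 4 reps, 1 RSE ("68%") interval, rounded to the nearest 100
low, high = multiplicative_ci(5000, 40.0, 4, 1)
print(round(low, -2), round(high, -2))  # 4100.0 6100.0

# 1 rep, 2 RSE ("95%") interval, rounded to the nearest 100
low, high = multiplicative_ci(5000, 40.0, 1, 2)
print(round(low, -2), round(high, -2))  # 2200.0 11100.0
```

Changing n_reps to 2 or 3 shows how much the interval narrows for the in-between cases.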
That's the stats. Choosing the number of reps and CI level is now a management question. You and your boss need to balance the search effort against the accuracy you need.
 