Correct approach for working with probably biased sample data



I want to estimate the CO2eq. potential of a process. In one category I only have data on the total kg of input, which subsumes about 1000 different inputs, so I need an indicator in kg CO2eq. per kg of input for that category. I found kg CO2eq./kg values (continuous data) for 33 of them. My ultimate goal is to build an indicator for the whole category (all 1000 inputs) from those 33 values, ideally with a mean plus upper and lower bounds, so that I can provide a "best guess", "best case" and "worst case" scenario.
It is easy to calculate the mean and a confidence interval from the 33 values. Is this a legitimate procedure, though? In particular, when plotted, the distribution of my values is rather U-shaped; can I assume a normal distribution at all? If not, how should I adjust my approach (e.g. should I first try to approximate the "true", presumably somehow quadratic, function from my data)? Is there a more appropriate method for building an indicator from the available data?
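
To make the calculation concrete, here is a minimal Python sketch of the kind of mean-plus-interval computation meant above. The 33 numbers are made up for illustration (roughly U-shaped) and are not my real data, and the function names are hypothetical. The first function uses a normal approximation for the mean; the second uses a percentile bootstrap, which is included only as one commonly used alternative that does not assume normally distributed values. Neither one corrects for bias in which 33 inputs happened to have published values.

```python
import random
import statistics

# Hypothetical kg CO2eq./kg values for 33 inputs (illustrative only, U-shaped).
values = [
    0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.4,
    1.6, 2.0, 2.8, 3.9, 5.0, 6.1, 7.2, 8.0, 8.4, 8.6, 8.8,
    8.9, 9.0, 9.1, 9.2, 9.3, 9.4, 9.5, 9.5, 9.6, 9.7, 9.8,
]


def mean_with_normal_ci(data, level=0.95):
    """Sample mean with a normal-approximation interval for the mean."""
    n = len(data)
    mean = statistics.fmean(data)
    std_err = statistics.stdev(data) / n ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean, mean - z * std_err, mean + z * std_err


def mean_with_bootstrap_ci(data, level=0.95, n_resamples=5000, seed=0):
    """Sample mean with a percentile-bootstrap interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    resampled_means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    alpha = 1 - level
    lower = resampled_means[int(alpha / 2 * n_resamples)]
    upper = resampled_means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(data), lower, upper


if __name__ == "__main__":
    print("normal approx:", mean_with_normal_ci(values))
    print("bootstrap:    ", mean_with_bootstrap_ci(values))
```

Both intervals describe uncertainty about the mean of the category, not the spread of individual inputs, so they are probably too narrow to serve directly as "best case" and "worst case" values.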

I'd appreciate any help, thanks in advance!

PS: I'm a first-time poster, so please let me know if I should clarify or correct something about this post.