Is this problem appropriate to statistical methodology?


New Member
Hi. First, thanks to Jin for enabling my account and allowing me to post!

My question is kind of complex, so I will attempt to reduce it to the most salient points. I am not even sure if there is an appropriate statistical methodology to handle my problem, so I will let the learned contributors help me with that determination.

Assume that there exists a batch process that produces a certain product, and that three (3) variable characteristics of the product are of interest: fat content (%), moisture content (%), and pH. These variables can be measured at a certain batch temperature (for instance, 160 deg. F), and then measured again at a certain storage temperature (35-45 deg. F). Assume that the variable characteristics change somewhat between the batch-temperature measurement and the storage-temperature measurement.

Finally, assume that the variable characteristics must fall within a certain range at the storage temperature, and that it takes a significant amount of time for the batch product to reach the storage temperature (during which time many batches of substandard quality might be produced). It is therefore desired to sample the product at the batch temperature and somehow infer the range into which the variable characteristics might fall when the product subsequently cools to storage temperature.

In essence, the question becomes: if the product is analyzed at the batch temperature, is there some statistical methodology that will allow the batch operator to predict the values the variable characteristics might take when the product cools to storage temperature?

The design of the study calls for samples to be taken at the batch temperature and divided into two (2) sub-samples, one of which is immediately analyzed for the variable characteristics at the batch temperature and the other of which is analyzed at a later time at the storage temperature. Sample sizes are at least n = 30 (i.e., economic constraints on sample size are relatively insignificant, and as many samples as necessary to establish some predictive value or algorithm can be analyzed).

I have been playing with "made-up" data and the Data Analysis tools available in MS Excel, but I am not really sure what the results are telling me or even what statistical methodology might be appropriate to apply to this scenario. Any direction or guidance from more learned individuals would definitely be appreciated, even if it is simply that there is no appropriate statistical methodology! Thanks.


TS Contributor

Yes, this can definitely be tackled using statistical methodology - more specifically, through correlation or regression (prediction).

If possible, you should sample items for the attributes of interest (fat content (%), moisture content (%), and pH) at the batch temperature, then measure those same items at the storage temperature - that way you can generate a more accurate prediction of how the attributes change, and what measurements at batch temperature may lead to items of poor quality in storage.

The risk you take with your proposed method is that if the two subsamples are somehow different, the measurement taken at batch temperature may tell you little or nothing about what happens at storage temperature.

If the measurements/analysis are destructive (once you measure an item at batch temperature, it's gone...) then make sure the subsamples are divided at random. In this case, all you'll be able to do is compare average attributes at batch vs average attributes at storage, not individual items - not ideal, but better than doing nothing or guessing...
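
A minimal sketch of the random division described above, assuming Python. The item IDs, the function names `split_at_random` and `mean_shift`, and the seed are all made up for illustration and are not part of the original study design.

```python
import random
from statistics import mean


def split_at_random(item_ids, seed=None):
    """Shuffle the items and cut the list in half.

    Returns (batch_group, storage_group): the first group would be
    analyzed at batch temperature, the second at storage temperature.
    """
    rng = random.Random(seed)
    shuffled = list(item_ids)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_shift(batch_values, storage_values):
    """Difference in group averages (storage minus batch).

    With destructive testing this is the only comparison available:
    group averages, not item-by-item pairs.
    """
    return mean(storage_values) - mean(batch_values)


# Hypothetical example: 60 item IDs divided into two groups of 30.
items = list(range(1, 61))
batch_group, storage_group = split_at_random(items, seed=42)
print(len(batch_group), len(storage_group))
```

Randomizing which items go to which group is what keeps a systematic difference between the groups from being mistaken for a temperature effect.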


Welcome to the forum.

Regression is commonly used for predicting future outcomes. Since you have multiple explanatory variables, you would use multiple regression. John's comments on measuring the same items and on random subsamples are quite important: these steps will help reduce the error. Please post here if you have further questions.


TS Contributor

The variables fat content (%), moisture content (%), and pH aren't really explanatory variables - they're response variables at two different points in his process, so I would recommend doing three separate simple regressions:

(1) fat content at batch temperature vs fat content at storage temperature
(2) moisture content at batch temperature vs moisture content at storage temperature
(3) pH at batch temperature vs pH at storage temperature

My reasoning is that for each of these three, the relationship (y = bx + a, with x the measurement at batch temperature and y the measurement at storage temperature) may be different, so he'll need to understand each one in order to make accurate predictions.

Oops, they are response variables indeed. Simple regression would suffice.

Note to self: never reply/comment after midnight.