# General Applications of Various Statistical Techniques for Continuous Processes

#### madifeo

My hope is that people can chip away at the questions raised in the article. All responses are welcome!

I have been spending a bit of time trying to teach myself various univariate and multivariate statistical techniques. I would really appreciate a "one stop shop" with a general overview of various statistical techniques and their applications.

I am hoping to use statistics in continuous manufacturing processes, where on-line gauges constantly provide temperature, pressure, yield, pH, and other readings. In general, these readings can be input or output variables (to a reactor or other unit operation), and they could be measured every second, 10 seconds, minute, hour, day, month, and so on (I may mistakenly refer to this as time-series data). Input (independent) variables will typically be controlled with a set-point, but other uncontrolled variables are a possibility. Hence, with once-per-second measurements at a particular set-point, there will be repeat measures.

I am currently reading a book on multivariate statistics for quality management, but so far I feel it is light on applications to repeat-measures problems. Ultimately, it would be nice to get input on a starting point (method) so that I can go learn how and why a technique works, and then apply it without rummaging through endless possibilities before finding the right approach.

So, here is what I understand so far, and please correct any erroneous information. All questions I have begin with capital letters, but the other information could also be faulty:

__One-way ANOVA__
Assumptions:
-normal distribution
-standard deviations among levels are "equal"

Qualities:
-tests more than 2 population means to determine if they are significantly different
-one response variable (Y)
-tests using one "factor" or "treatment" (X) with varying levels
-can use repeat measures
-can use replicates
-followed by pairwise comparisons to see which populations differ
-Can replicates and repeat measures both be used in a single experiment?
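
To check that I have the mechanics right, here is a minimal sketch of the one-way ANOVA F statistic in plain Python. The data are made up, and the function and variable names are my own (hypothetical), not from any library:

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per level."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Variation of the level means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual readings around their own level mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up yield readings at three temperature set-points (illustrative only)
yields_by_setpoint = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_stat, df_between, df_within = one_way_anova_f(yields_by_setpoint)
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) degrees of freedom")
```

As I understand it, the F value would then be compared against the F distribution with those degrees of freedom to get a p-value. The sketch treats every reading as independent, which is exactly the part I am unsure about for once-per-second data.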

__N-way ANOVA__
Assumptions:
-normal distribution
-standard deviations among levels are "equal"

Qualities:
-tests more than 2 population means to determine if they are significantly different
-one response variable (Y)
-tests using N "factors" or "treatments" (X(i=1:N)) with varying levels
-followed by pairwise comparisons to see which populations differ
-Do replicates assume you DO NOT have time-series data? In the process industry (assuming you are not collecting time series data), would collecting multiple data points at the same set point for each factor (X) be considered a replicate, even though you are collecting these data on the same process?
-Can this handle repeat measures (time-series data)?
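
On the time-series question, the one thing I am fairly sure of is that ANOVA assumes independent errors, so readings taken a second apart that track each other closely would not behave like independent replicates. A rough check I could run first is the lag-1 autocorrelation of the readings at a single set-point. This is only a sketch with made-up data and a function name of my own choosing:

```python
def lag1_autocorrelation(readings):
    """Correlation between each reading and the one that follows it."""
    n = len(readings)
    m = sum(readings) / n
    denominator = sum((x - m) ** 2 for x in readings)
    numerator = sum(
        (readings[i] - m) * (readings[i + 1] - m) for i in range(n - 1)
    )
    return numerator / denominator


# Made-up gauge readings at one set-point (illustrative only)
drifting = [1.0, 2.0, 3.0, 4.0, 5.0]              # slow drift upward
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # flips sign every sample

r_drifting = lag1_autocorrelation(drifting)
r_alternating = lag1_autocorrelation(alternating)
print(f"drifting: {r_drifting:.2f}, alternating: {r_alternating:.2f}")
```

A value near zero would suggest consecutive readings carry roughly independent information. A value far from zero would suggest the readings are serially correlated, in which case I assume I would need to average or subsample them, or use a method built for correlated data.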

__ANCOVA__
My Understanding:
-same as N-way ANOVA, but applied when the factors are assumed to be correlated with each other (for example, temperature may affect pressure or vice-versa)
-Can this handle repeat measures or replicates?

__MANOVA__
Assumptions:
-multivariate normal distribution
-standard deviations of each dependent variable are equal

Qualities:
-tests more than 2 population means to determine if they are significantly different
-p response variables (Y(i=1:p))
-tests using one "factor" or "treatment" (X) with varying levels
-automatically accounts for covariance or correlation
-Can this be followed by pairwise comparisons? Would it make sense to do so?
-Can this handle repeat measures or replicates?

__N-way MANOVA__
My Understanding:
-same as N-way ANOVA, but has multiple factors (X's) AND multiple dependent variables (Y's)
-Can this handle repeat measures or replicates?

Here is where I am a bit more shaky:

__PCA__
Assumptions:
-multivariate normal distribution
-Feel free to list the others

Qualities:
-variables from data set MUST BE "p" response (dependent) variables
-Assuming one can use independent variables in this analysis, would it make sense to do so if these variables are controlled with a set-point? How could one relate the input variables back to the response variables?
-determines a few principal components that account for the majority of variation in a data set
-these principal components provide insight into which response variables account for most of the variation. This is done through "modes" of variation, so one can determine which of the p variables vary in each mode
-with this information and knowledge of the process, one can fix the root cause of variation by knowing what directly affects the response variables
-Can this handle repeat measures or replicates?
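
To make the "modes" idea concrete for myself, here is a minimal sketch that finds only the first principal component of a small made-up data set. It uses power iteration on the sample covariance matrix as a stand-in for a full eigendecomposition, assumes the variables are left unstandardized, and uses function names of my own (hypothetical):

```python
def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of row-wise observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_principal_component(cov, iterations=200):
    """Power iteration: returns (loadings, variance along that direction)."""
    p = len(cov)
    v = [1.0] * p  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(p))
    return v, eigenvalue


# Made-up readings of two response variables (illustrative only)
readings = [[-2.0, -1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, 1.0], [2.0, 1.0]]

cov = covariance_matrix(readings)
loadings, eigenvalue = first_principal_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
fraction_explained = eigenvalue / total_variance
print("loadings:", [round(x, 3) for x in loadings])
print("fraction of variance explained:", round(fraction_explained, 3))
```

The loadings show how much each variable contributes to the dominant mode, and the fraction shows how much of the total variation that mode accounts for. In practice I assume I would use a library routine that returns all components, and would standardize the variables first if their units differ.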

__CFA (Common Factor Analysis)__
My Understanding:
-very similar to PCA in its theoretical background
-variables from data set MUST BE "p" response (dependent) variables
-used to determine which variables are most highly correlated rather than "modes" of variation for the data set
-Can this handle repeat measures or replicates?

________________________

I understand the dangers of simply "trusting" measurement systems and large logbooks of data. I understand the need to verify measurement systems and that not all answers can be discovered through any of these analyses. I understand that brainstorming root causes and solutions with a team is highly effective. However, it would be great to add more techniques to my statistical toolbox with confidence.

Thank you!

-Mike
