# Best way to analyse the data

#### will1234

##### New Member
Hi,
I am new here, so I am not sure whether this is the best place to ask this question. Here I go anyway.
As shown below, I have a dataset in which parts are made in Tool1, and the parts from each run (runs_Tool1, nominal values) then go through Tool2 in runs (also nominal values). After going through Tool2, the parts are measured.
Each run from Tool1 has a major impact on the measured data. What is the best way to standardize the data so that I am only looking at the variation from Tool2?
Regards
Will

#### obh

##### Active Member
Hi Will,

I'm not sure exactly what you want to check.

If, for example, you run a regression, the dependent variable (y) will take both Tool1 and Tool2 into consideration.

#### will1234

##### New Member
Hi Obh
The runs that I mentioned are discrete, so I can't really run a regression. That is the reason for my question.
Regards
Will

#### obh

##### Active Member
How many values does the discrete variable take? (dummy variables)

#### will1234

##### New Member
Hi Obh,
Sorry, I don't understand your question. Each run in Tool1 has a certain number of parts going through it. The same parts may then be subdivided or all put through Tool
Regards
Will

#### will1234

##### New Member
Hi
For each run from Tool1, I calculated the median for each Tool2. The problem is that if the parts from Tool1 do not get split between the 3 tools in Tool2, how do I compare the 3 tools in Too
regards
will

#### obh

##### Active Member
What do you mean by 3 tools in Tool2?

#### will1234

##### New Member
Hi Obh,
I hope this is clear.

For each run from Tool1, I calculated the median for each Tool2. The problem is that if the parts from a Tool1 run do not get split between the 3 tools in Tool2, how do I compare the contributions from Tool2?
regards
will
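
The per-run medians described above could be computed along these lines. This is only a sketch: the run labels (`run1`, `run2`), the Tool2 labels (`T2a`, `T2b`, `T2c`), and all measurement values are made up for illustration and are not from the actual dataset. It also flags which Tool1 runs were split across more than one Tool2 tool, since only those runs allow a within-run comparison.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (Tool1 run, Tool2 tool, measurement). All values are invented.
rows = [
    ("run1", "T2a", 10.1), ("run1", "T2a", 10.3),
    ("run1", "T2b", 10.9), ("run1", "T2b", 11.1),
    ("run2", "T2c", 14.8), ("run2", "T2c", 15.2),
]

# Group measurements by (Tool1 run, Tool2 tool) and take the median of each group.
groups = defaultdict(list)
for run, tool2, meas in rows:
    groups[(run, tool2)].append(meas)
medians = {key: median(values) for key, values in groups.items()}

# A run supports a within-run Tool2 comparison only if it was split
# across at least two Tool2 tools.
tools_per_run = defaultdict(set)
for run, tool2 in medians:
    tools_per_run[run].add(tool2)
comparable_runs = sorted(run for run, tools in tools_per_run.items() if len(tools) > 1)

print(medians)
print(comparable_runs)
```

In this made-up data only `run1` is split across two Tool2 tools, so `run2` cannot contribute to a within-run comparison, which is exactly the difficulty raised in the post.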

#### hlsmith

##### Less is more. Stay pure. Stay poor.
@will1234 No idea what you are writing about and presenting here. Please take a big step back and tell us what field you work in, what is your overall question, and define these data. Do we need all of these columns, is this time series, is it quality improvement engineering stuff, etc.

#### will1234

##### New Member
Hi Hlsmith/obh
I am an engineer in a manufacturing plant that makes sensors for cars. I recently started there. I know that the Tool1 in column 2 and each run in column 3 have a huge impact on the measured data. That is assuming that the tools in column 5 are matched.
As I have stated before, if the parts from the same tool in Tool1 go through 3 different tools in Tool2 (column 5), then there is a way of comparing the differences or variation between the Tool2 tools. But if they don't, the question is how I compare the tools in Tool2. So we need column 2, column 3, column 5, and column 6.
Regards
Will

#### obh

##### Active Member
Not clear yet ...

So if I understand you ...
Each part goes through 2 stations: station1 and station2 (you called them Tool1 and Tool2).
In each station, you can use one of 3 tools (in station1 the tools a, b, c; in station2 the tools e, f, g).

If so, and if you want to run a regression, you should use dummy variables. For example, in station1 you should use 2 variables: a, b
if tool a was used: (a=1, b=0)
if tool b was used: (a=0, b=1)
if tool c was used: (a=0, b=0)

And similarly for station2.
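
A minimal sketch of that dummy coding, assuming the tool labels above. The helper name `dummy_code` and its parameters are hypothetical, chosen only for illustration; the last tool (c here) is treated as the reference level and gets all zeros.

```python
def dummy_code(tool, levels, reference):
    """Return 0/1 dummy variables for `tool`, leaving out the reference level."""
    if tool not in levels:
        raise ValueError(f"unknown tool: {tool!r}")
    return {level: int(tool == level) for level in levels if level != reference}

# Station1 tools a, b, c with c as the reference level.
for tool in ["a", "b", "c"]:
    print(tool, dummy_code(tool, levels=["a", "b", "c"], reference="c"))
```

The same helper would be applied to station2 with its own tool labels (e, f, g) and its own reference level.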

Are a, b, c and e, f, g the same tools? (Say a is e, b is f, c is g.)
Do you produce the same product, and do you want to choose the best combination: (a,e), (a,f), ... (c,g)?
Do you expect the measurements to be the same?
What does meas mean?
What do you expect the meas value to be?
Is your goal a minimum standard deviation for meas?

#### will1234

##### New Member
Hi Obh,
Thanks for your reply. I am not sure how I can apply this to my situation.
Meas means measurements.
I have 2 stations.
Station 1 has tools A, B and C. Each tool will have runs, and each run in each tool is unique. Let's say a run via tool A is A1, a second run is A2, etc. The same principle applies to tools B and C.
Station 2 has tools E, F and G. Each of these tools will in turn process whatever comes out of the station 1 tools. Let's say run A1 from tool A in station 1 is split into E as run E1 and into F as run F1. And let's say that run A2 just goes through G as G1.
How do I compare only the variation between the station 2 tools without taking into account the input from the station 1 tools and runs?

The station 1 tools and the station 2 tools are different.

The product is made in the station 1 tools, but further processing is carried out in the station 2 tools before any measurements are taken.

I hope I have explained it well enough for you. If not, please let me know and I will try again.

Regards
Will

#### obh

##### Active Member
So every part runs through only one tool in station1 (A or B or C) and then one tool in station2 (E or F or G)?

#### will1234

##### New Member
Hi Obh,
Every part runs in only one tool in station 1 (A, B or C): yes.
Then the parts can run in one tool in station 2 or be split between the tools in station 2 (E, F and G).
Regards
Will

#### obh

##### Active Member
Hi Will,

You can run a regression with dummy variables if you want to understand how each tool influences the measurement.
Is this what you want? Please see the example below.

For example, in station1 you should use 2 variables: A, B
if tool A was used: (A=1, B=0)
if tool B was used: (A=0, B=1)
if tool C was used: (A=0, B=0)

Example:

The following example in your format:

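| Station1 | Station2 | meas |
|----------|----------|------|
| A | A | 5 |
| A | B | 6 |
| A | C | 11 |
| B | A | 13 |
| C | C | 21 |
| A | A | 7 |
| A | B | 8 |
| A | C | 12 |
| B | A | 13 |
| C | C | 22 |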

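This will be translated into the following format (A1 and B1 are the dummy variables for station1, A2 and B2 the dummy variables for station2; tool C is the reference in both stations):

| A1 | B1 | A2 | B2 | Meas |
|----|----|----|----|------|
| 1 | 0 | 1 | 0 | 5 |
| 1 | 0 | 0 | 1 | 6 |
| 1 | 0 | 0 | 0 | 11 |
| 0 | 1 | 1 | 0 | 13 |
| 0 | 0 | 0 | 0 | 21 |
| 1 | 0 | 1 | 0 | 7 |
| 1 | 0 | 0 | 1 | 8 |
| 1 | 0 | 0 | 0 | 12 |
| 0 | 1 | 1 | 0 | 13 |
| 0 | 0 | 0 | 0 | 22 |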

Now run a multiple regression:

Y = 21.5000 - 10.0000 A1 - 3.0000 B1 - 5.5000 A2 - 4.5000 B2

If you use tool A in station 2, the measurement will be smaller by 5.5 relative to using tool C.
If you use tool B in station 2, the measurement will be smaller by 4.5 relative to using tool C.
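
For anyone who wants to check the coefficients, here is a dependency-free sketch that fits the same model to the example data by ordinary least squares. It solves the normal equations with exact fractions; the function name `solve_normal_equations` is just an illustrative choice, and the code assumes the design matrix has full column rank (true for this example). In practice a statistics package would normally be used instead.

```python
from fractions import Fraction

# Example data from the post: (station1 tool, station2 tool, meas).
data = [
    ("A", "A", 5), ("A", "B", 6), ("A", "C", 11), ("B", "A", 13), ("C", "C", 21),
    ("A", "A", 7), ("A", "B", 8), ("A", "C", 12), ("B", "A", 13), ("C", "C", 22),
]

# Design matrix columns: intercept, A1, B1, A2, B2 (tool C is the reference level).
X = [[1, int(s1 == "A"), int(s1 == "B"), int(s2 == "A"), int(s2 == "B")]
     for s1, s2, _ in data]
y = [meas for _, _, meas in data]

def solve_normal_equations(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y], kept as exact fractions.
    aug = [[Fraction(sum(row[i] * row[j] for row in X)) for j in range(k)]
           + [Fraction(sum(row[i] * yi for row, yi in zip(X, y)))]
           for i in range(k)]
    # Gauss-Jordan elimination; assumes X'X is invertible.
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k] for row in aug]

coefficients = solve_normal_equations(X, y)
for name, value in zip(["Intercept", "A1", "B1", "A2", "B2"], coefficients):
    print(f"{name}: {float(value):.4f}")
```

This reproduces the equation above: an intercept of 21.5 with coefficients -10, -3, -5.5 and -4.5 for A1, B1, A2 and B2.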

http://www.real-statistics.com/mult...ssion-analysis/categorical-coding-regression/
http://www.statskingdom.com/410multi_linear_regression.html