Comparing Two Sets of Data, Slightly Different Measurement Times

#1
Hi, I have two sensors measuring the temperature of two items that have received slightly different treatments (to see if the treatment affects their thermal properties). The temperature is cyclic in that it goes up and down every day, driven by natural elements such as solar radiation. The temperature of both items is measured every 30 minutes but not necessarily at the same times, e.g. Item A is measured at 2:34pm, Item B at 2:40pm, and so on. I want to check two things:
  1. Is the treatment causing a change in the temperature?
  2. If 1) is true, what is the mean difference caused by the treatment?
I suppose I could interpolate the temperature to allow comparison of the two items at the same time, e.g. if the temperature is 20C at 12:00 and 22C at 12:30 and I want the temperature at 12:15: 20C + (22C - 20C)/30 mins x 15 mins = 21C. Maybe this is overcomplicating the problem, or is there a smarter way of handling this?
There is an added complication: the effect, if present, should cause a larger difference when the temperature is warmer during the day. How would I isolate my data for warmer temperatures? The problem as I see it is that I can't select the data directly from both sensors based purely on time, because the measurements aren't recorded at exactly the same time. I also can't select the data from both sensors where the temperature is above e.g. 20C, because if the treated item does in fact have a lower temperature, I would be excluding some of its data from the analysis: if the effect caused the temperature to be 2C lower, I would need to include the treated item's data from > 18C.
I've attached a very small sample of the data. I have collected at least 8,000 data points for each item.
 

Miner

TS Contributor
#4
I have one good idea, plus one not-so-good one. :D

  1. Start with plotting your data (good idea). The small sample you provided shows that there is a definite impact (at least during this time window).
  2. A question regarding the data validity. How large is your measurement variation? Is the temperature dip caused by low resolution of the temperature measurement, or is it caused by a passing cloud that blocks the solar radiation?
  3. For short segments of time, you could fit a polynomial regression (quadratic/cubic) and use it to predict the temperatures for the same moment in time. However, this will probably be impractical for the full data set.
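The polynomial idea in point 3 could look roughly like the sketch below. It assumes NumPy is available, that times are expressed as minutes since midnight, and that a quadratic is adequate over a couple of hours. The readings and the helper name `fit_segment` are invented for illustration and are not from the attached data.

```python
import numpy as np

# Hypothetical short segment: minutes since midnight and temperature in C.
# Sensor A logs at :04/:34, sensor B at :10/:40 (invented values).
t_a = np.array([724.0, 754.0, 784.0, 814.0, 844.0])
temp_a = np.array([20.1, 22.0, 23.4, 24.1, 24.3])
t_b = np.array([730.0, 760.0, 790.0, 820.0, 850.0])
temp_b = np.array([19.6, 21.2, 22.3, 22.9, 23.0])


def fit_segment(t, temp, degree=2):
    """Least-squares polynomial fit over one short segment.

    Time is centred on its mean before fitting to keep the fit well
    conditioned. Returns a function that predicts temperature at any time.
    """
    t0 = t.mean()
    coeffs = np.polyfit(t - t0, temp, degree)
    return lambda query: np.polyval(coeffs, np.asarray(query, dtype=float) - t0)


predict_a = fit_segment(t_a, temp_a)
predict_b = fit_segment(t_b, temp_b)

# Compare both items at the same moments, inside the range both sensors cover.
common_times = np.array([760.0, 790.0, 820.0])
diff = predict_a(common_times) - predict_b(common_times)
print(diff)
```

Each segment needs its own fit, which is why this approach gets unwieldy for the full data set.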

1667858171033.png 1667858656144.png 1667858664643.png
 
#5
Hi @Miner, thanks for the reply. Here is a larger plot of some of the data. I think it's pretty clear that the white (painted) temperature is lower than the blue (unpainted) temperature as the temperature gets higher. My issue is how to describe this correctly in mathematical terms!
Data_Subset.jpg

I was thinking that I could use a linear interpolation to estimate the temperature at the same time for painted and unpainted. If I did this, I could just estimate the difference in temperature for every point. Would that make sense? Is there a way I could estimate the error introduced by doing this? My bad sketch of interpolation below!! If I want to estimate the temperature at time t, I assume linearity between the two nearest points and calculate the temp.
Screenshot 2022-11-08 100806.jpg
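A minimal sketch of the linear interpolation described above, assuming NumPy is available and that timestamps have been converted to numbers (here, minutes since midnight). The function name `align_by_interpolation` and the painted/unpainted readings are made up for illustration.

```python
import numpy as np


def align_by_interpolation(t_ref, t_other, temp_other):
    """Estimate the other sensor's temperature at the reference timestamps.

    Assumes t_other is sorted in increasing order. Reference times outside
    the other sensor's range come back as NaN instead of being extrapolated.
    """
    t_ref = np.asarray(t_ref, dtype=float)
    est = np.interp(t_ref, t_other, temp_other)
    outside = (t_ref < t_other[0]) | (t_ref > t_other[-1])
    est[outside] = np.nan
    return est


# Worked example from the first post: 20 C at 12:00 and 22 C at 12:30.
print(align_by_interpolation([735.0], [720.0, 750.0], [20.0, 22.0]))  # 12:15 -> 21 C

# Invented readings: painted logged at :34/:04, unpainted at :10/:40.
t_painted = [754.0, 784.0, 814.0]
temp_painted = [21.0, 22.1, 22.8]
t_unpainted = [730.0, 760.0, 790.0, 820.0]
temp_unpainted = [21.5, 23.0, 24.2, 24.8]

unpainted_at_painted = align_by_interpolation(t_painted, t_unpainted, temp_unpainted)
diff = unpainted_at_painted - np.asarray(temp_painted)
print(diff)  # one temperature difference per painted reading
```

Interpolating only one series onto the other's timestamps means one side of every pair is a real measurement.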
 

Miner

TS Contributor
#6
I think your idea for interpolation is reasonable. You can use a paired t-test to quantify the difference. You mentioned it in your original post, and I see in the graph that this delta appears larger on the peaks than it appears in the valleys. I would refrain from interpolating in the zones with a steep slope because your error will be greater. For the paired t-test, I would keep the peaks separate from the valleys. You could also plot the deltas by temperature and perform a regression.
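For reference, the paired t-test works on the per-pair differences: t is the mean difference divided by its standard error. The sketch below uses only the Python standard library; the function name `paired_t` and the sample values are invented, and in practice the peak and valley subsets would each be passed through separately.

```python
import math
import statistics


def paired_t(a, b):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    d = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(d)
    # Standard error of the mean difference (sample standard deviation / sqrt(n)).
    se = statistics.stdev(d) / math.sqrt(len(d))
    return mean_d, mean_d / se, len(d) - 1


# Invented paired readings (C), already aligned to the same timestamps.
unpainted = [31.2, 33.5, 35.1, 36.0, 34.4, 32.8]
painted = [29.9, 31.4, 32.6, 33.2, 32.0, 30.9]

mean_d, t_stat, dof = paired_t(unpainted, painted)
print(f"mean difference = {mean_d:.2f} C, t = {t_stat:.1f}, df = {dof}")
```

The t statistic is then compared with the critical value for the chosen significance level and degrees of freedom.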

Regarding how to quantify the error, the variation in temperature will be much greater than the interpolation error, so I think you can ignore it. If you really need to quantify it, you can select 10 - 20 points and repeat the interpolation for each of them 3 - 5 times. You would need to do this in random order, with enough elapsed time between repeats that you do not recall your earlier results and bias the later interpolations. So you cannot look at earlier calculations or record the results together. Compile the results only when you are finished.
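If the interpolation is done in software rather than by hand, one alternative to the repeat study above (a different technique, named here plainly as a leave-one-out check) is to drop each measured point in turn, interpolate it from its two neighbours, and compare with what was actually measured. The sketch below is standard-library Python with invented readings. Note that the neighbours are 60 minutes apart rather than 30, so it will tend to overstate the error of the real interpolation.

```python
import math


def leave_one_out_errors(times, temps):
    """Interpolate each interior point from its two neighbours.

    Returns a list of (interpolated - measured) values, one per interior point.
    """
    errors = []
    for i in range(1, len(times) - 1):
        t0, t1, t2 = times[i - 1], times[i], times[i + 1]
        frac = (t1 - t0) / (t2 - t0)
        estimate = temps[i - 1] + frac * (temps[i + 1] - temps[i - 1])
        errors.append(estimate - temps[i])
    return errors


# Invented readings from one sensor: minutes since start, temperature in C.
times = [0, 30, 60, 90, 120, 150]
temps = [20.0, 22.0, 23.5, 24.2, 24.0, 22.9]

errs = leave_one_out_errors(times, temps)
rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
print(f"RMS leave-one-out error = {rmse:.2f} C")
```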

NOTE: These are all practical recommendations based on what I would do in industrial statistics. We do a lot of things that the theoretical statisticians would frown upon, but we do them anyway because they work out for us.
 
#7
Hi, thanks again for the very useful information. The variation in temperature is definitely more pronounced at higher temperatures and this makes sense: when the sun is out, radiation is part of the "heating up" process which is affected by, among other things, colour; when the sun goes in, the ambient air temperature via convection and ground conditions via conduction are the main players which are largely unaffected by colour.
If I was to truncate my data at e.g. >25C and only analyse that, interpolate between values to make pairs to use in my t-test, would that be valid? (not looking to keep the theoretical boys happy, just not do something nonsensical!). I've not done a paired t-test before so I'll follow the instructions here: https://www.jmp.com/en_gb/statistics-knowledge-portal/t-test/paired-t-test.html
 

Miner

TS Contributor
#8
That would work okay for the paired t-test. If you want to see how the difference varies by temperature using regression, you would need the full data set.
 
#9
OK, so analysis is done using the following:
1. Match up the data as closely as possible on date/time for painted and unpainted
2. Write quick function in Excel to estimate the temperature of the unpainted at the painted date/ time using a linear interpolation
3. Run a Paired t-test using Excel's Data Analysis pack
4. Remove all data where unpainted temperature < 30C and run the Paired t-test again
This is what I got:
All Data:
AllData.jpg
Temperature Over 30C:
Data_30.jpg

In terms of interpreting the results, it looks like there is a significant difference. My t Stat is 85.7 for all data and 77.13 for the data over 30C, but I'm not 100% sure what the rest means!!
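Step 4 can be sketched as below: the threshold is applied to the unpainted reading only, and the matching painted reading is kept whatever its value, so a cooler painted item is not filtered out of the comparison. This is plain Python with invented values; `filter_pairs` is a hypothetical helper, not the Excel function used above.

```python
def filter_pairs(unpainted, painted, threshold):
    """Keep the pairs whose unpainted reading is at or above threshold (C)."""
    kept = [(u, p) for u, p in zip(unpainted, painted) if u >= threshold]
    return [u for u, _ in kept], [p for _, p in kept]


# Invented paired readings (C), already aligned to the same timestamps.
unpainted = [24.0, 29.5, 31.0, 35.2, 33.1]
painted = [23.8, 28.6, 29.4, 32.5, 30.9]

hot_unpainted, hot_painted = filter_pairs(unpainted, painted, threshold=30.0)
print(hot_unpainted)
print(hot_painted)  # includes 29.4, which is below 30 C but belongs to a kept pair
```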
 
#10
And this is what it looks like when the temperature difference is plotted. The correlation value isn't great at 0.56 but it definitely shows the trend I was expecting i.e. the temperature difference growing as the temperature increases. All comments welcome! 1668035055532.png
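A rough sketch of fitting a straight line to the temperature difference against temperature, using only the standard library. It reports the slope, Pearson's r and R^2 (which equals r squared for a single-predictor linear fit). The data points and the name `linear_fit` are invented for illustration.

```python
import math


def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, Pearson r).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)


# Invented values: unpainted temperature (C) and unpainted-minus-painted difference (C).
temp = [18.0, 22.0, 26.0, 30.0, 34.0, 38.0]
delta = [0.3, 0.2, 0.9, 1.4, 2.6, 2.4]

slope, intercept, r = linear_fit(temp, delta)
print(f"slope = {slope:.3f} C per C, r = {r:.2f}, R^2 = {r * r:.2f}")

# If the reported 0.56 is a correlation coefficient, the matching R^2 is:
print(f"{0.56 ** 2:.2f}")  # 0.31
```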
 

Miner

TS Contributor
#11
Are there specific line items in which you are interested?

P-values
1-tailed vs. 2-tailed tests.
 

Miner

TS Contributor
#12
The correlation is impacted by two things:
  1. Pearson correlation is for LINEAR correlations. This is not linear, so you should try a Spearman correlation. Or did you intend R^2 instead of correlation? They are not the same.
  2. There is a lot of scatter about the line. This could be caused by measurement error, or by whether the temperature load is caused by radiation (sunny) or convection (cloudy). If you could add an additional data tag to your data set (radiation=1; convection=0), you could add that as a predictor in a multiple regression, and could probably improve your R^2.
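To illustrate point 1, the sketch below compares Pearson and Spearman correlation on an invented relationship that rises steadily but not in a straight line. Spearman is computed as Pearson on the ranks, with tied values given their average rank. Everything here is standard-library Python and the data are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented data: the difference grows faster and faster with temperature.
temp = [20.0, 24.0, 28.0, 32.0, 36.0, 40.0]
delta = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]

print(f"Pearson  = {pearson(temp, delta):.2f}")
print(f"Spearman = {spearman(temp, delta):.2f}")
```

Because the invented differences always increase with temperature, the rank correlation is perfect while the linear one is not.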
 
#13
Hi, if I look at the >30C result.
The means are 35.19 and 32.70.
The t-Stat is 77.13, which is greater than the t-Critical two-tail of 1.96; therefore the means are significantly different and the treatment has an effect. Is that correct? Can I also infer that the difference caused by the treatment is 35.19 - 32.70 = 2.49C?
 

Miner

TS Contributor
#14
Yes to both questions.
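For paired data, the difference between the two means is the same number as the mean of the per-pair differences, so an approximate 95% interval around it can be sketched with the same 1.96 critical value. This is standard-library Python; `mean_difference_ci` is a hypothetical helper and the six pairs are invented, so with a sample this small the 1.96 multiplier is only illustrative.

```python
import math
import statistics


def mean_difference_ci(a, b, z=1.96):
    """Mean paired difference with an approximate interval of +/- z standard errors."""
    d = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(d)
    half_width = z * statistics.stdev(d) / math.sqrt(len(d))
    return mean_d, mean_d - half_width, mean_d + half_width


# Difference of the two reported means for the >30C subset.
print(round(35.19 - 32.70, 2))  # 2.49

# Invented paired readings (C) to show the call.
unpainted = [31.2, 33.5, 35.1, 36.0, 34.4, 32.8]
painted = [29.9, 31.4, 32.6, 33.2, 32.0, 30.9]
mean_d, low, high = mean_difference_ci(unpainted, painted)
print(f"mean difference = {mean_d:.2f} C, approx. 95% interval = ({low:.2f}, {high:.2f})")
```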