Aggregate precision of a measurement instrument based on a sample of measurements whose true errors are known

I want to estimate the accuracy and precision of aggregate data from an instrument that measures a variable that fluctuates over time by comparing a sample of its measurements to concurrent measurements made by a reference instrument.

For this purpose, assume that the reference instrument (hereafter referred to as Ref) has perfect accuracy and precision. Ref can only take manually initiated point-in-time measurements, whereas the subject instrument (Sub) continually takes automatic measurements (at regular intervals that are very short relative to the rate of change, so effectively continuous) and provides aggregate statistics (including mean, SD, CV), but its accuracy and precision are in question.
To gauge the reliability of Sub, I take measurements with Ref at regular intervals and subtract them from Sub's concurrent measurements. Thus, I have a sample of measurements from Sub whose true errors are known.

What I'm primarily interested in is the reliability of Sub's aggregate mean - the mean of all the values it measures over some designated period of time.
Determining the bias from this data, I believe, is very straightforward: The average of the errors in my sample of paired measurements gives me an estimate of Sub's bias. Unless I'm overlooking something, there shouldn't be a difference between the average point-in-time bias and the aggregate bias, so this bias estimate applies to the aggregate mean for any time span.

What about the mean's precision?

If I understand correctly, I can calculate the standard error of the mean (SEM) by dividing the sample SD of the errors (differences between Sub and Ref's measurements, for each measurement taken by Ref) by the square root of the number of data points in the sample. It would then follow that the 95% CI is bounded by subtracting and adding the SEM from/to the bias estimate. So, for example, if the mean of all the error values is -5.72, their sample SD is 19.4, and the number of measurement pairs is 200, SEM = 19.4/√200 = 1.37, and the 95% CI is from -7.09 to -4.35 (meaning that there's a 95% chance that Sub's mean is between 7.09 and 4.35 below the actual mean).
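
To make the arithmetic concrete, here is a minimal Python sketch of the calculation as I described it. The function and variable names are my own. The 1.96 multiplier is the usual normal-approximation factor for a 95% interval; I've included it for comparison because I'm not certain that plus or minus one SEM is the right band.

```python
import math

def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

def interval(center, half_width):
    """Symmetric interval around center."""
    return (center - half_width, center + half_width)

mean_error = -5.72  # mean of the (Sub - Ref) differences in my example
sd_error = 19.4     # sample SD of those differences
n_pairs = 200       # sample size used in the sqrt(200) step of my example

se = sem(sd_error, n_pairs)
band_1sem = interval(mean_error, se)        # the band as I computed it
band_196 = interval(mean_error, 1.96 * se)  # normal-approximation 95% band

print(f"SEM: {se:.2f}")
print(f"bias +/- 1 SEM:    {band_1sem[0]:.2f} to {band_1sem[1]:.2f}")
print(f"bias +/- 1.96 SEM: {band_196[0]:.2f} to {band_196[1]:.2f}")
```

The first band reproduces my -7.09 to -4.35 figures; the second is noticeably wider.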

First, are those calculations correct?

Second, what vexes me about this is that, given that Sub is subject to random error as well as bias, its aggregate mean should be more precise the longer the time interval - its 10-day mean should be more precise than its 5-day mean, and its 20-day mean should be still more precise - but the calculations above are influenced by the sample size of the reference measurements, not the number of measurements made by Sub in the given time interval. So is what I'm really calculating the SEM and 95% CI of the bias estimate, rather than of Sub's mean of all the data? If that's the case, should I be using the population SD rather than the sample SD, since I'm calculating the SEM and CI for the entire population of sample measurements, rather than for all the data based on a sample?
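
To see how much the sample-versus-population SD choice matters numerically, here is a small Python sketch. It relies only on the standard relationship between the two formulas (dividing by n-1 versus n); the function name is my own, and the sample size of 200 is the one from my SEM example.

```python
import math

def population_sd_from_sample_sd(sample_sd, n):
    """Convert an SD computed with an n-1 divisor to one computed with an n divisor."""
    return sample_sd * math.sqrt((n - 1) / n)

sample_sd = 19.4  # sample SD of the (Sub - Ref) differences in my example
n_pairs = 200     # sample size used in my SEM example

pop_sd = population_sd_from_sample_sd(sample_sd, n_pairs)
sem_sample = sample_sd / math.sqrt(n_pairs)
sem_pop = pop_sd / math.sqrt(n_pairs)

print(f"sample SD {sample_sd:.3f} -> SEM {sem_sample:.4f}")
print(f"population SD {pop_sd:.3f} -> SEM {sem_pop:.4f}")
```

With a sample this size the two versions differ only in the third significant figure, so the choice barely affects the numbers, whatever the right answer is conceptually.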

And the key question:
How do I determine the precision of Sub's mean for a given time interval?

Would I divide the sample SD of the errors by the square root of the number of discrete measurements made by Sub during the interval? For example, if Sub takes one measurement per minute, then would the SEM for the mean of 12 hours of Sub data be 19.4/√720 = 0.72, and the 95% CI therefore -6.44 to -5.00? The reasoning is that the sample SD estimates the variability of a single Sub measurement around Sub's bias, so the number of per-minute measurements made by Sub during those 12 hours is the sample size (assuming that the random error for each measurement is independent). Or is that unsound for some reason?
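
Here is a minimal Python sketch of that proposed calculation, showing how the SEM would shrink as the interval gets longer. It assumes one Sub measurement per minute and, crucially, that each measurement's random error is independent of the others; if consecutive errors are positively correlated, I'd expect these figures to understate the real uncertainty. The function name and the list of intervals are my own.

```python
import math

SD_ERROR = 19.4  # sample SD of the (Sub - Ref) differences in my example

def sem_for_interval(sd, minutes, per_minute=1):
    """SEM of Sub's mean over an interval, ASSUMING independent errors."""
    n = minutes * per_minute  # number of Sub measurements in the interval
    return sd / math.sqrt(n)

intervals = [
    ("12 hours", 12 * 60),
    ("5 days", 5 * 24 * 60),
    ("10 days", 10 * 24 * 60),
    ("20 days", 20 * 24 * 60),
]

for label, minutes in intervals:
    print(f"{label:>8}: n = {minutes:6d}, SEM = {sem_for_interval(SD_ERROR, minutes):.3f}")
```

Under these assumptions the SEM halves each time the interval is quadrupled, which matches my intuition that longer aggregates should be more precise.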

A follow-up question (although the key question above is the main thing I'm hoping to have answered):

Instead of stipulating that Ref is to be regarded as providing perfect measurements, what if I want to take Ref's imprecision into account? Suppose, for example, Ref's measurements have a documented SD of 2.5 and bias of -0.08. I presume that to get the combined bias I would simply add Ref's bias to my estimate of Sub's bias relative to Ref, for a total bias of -5.80. How would I incorporate Ref's SD of 2.5 into my precision metrics for Sub's aggregate means?
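
One approach I can think of, sketched in Python below, rests on the rule that the variances of independent errors add. If Ref's random error is independent of Sub's, the variance of the (Sub - Ref) differences would be Sub's error variance plus Ref's, so Sub's own SD could be estimated by subtracting Ref's variance. This is only a sketch under that independence assumption, and it also assumes Ref's documented bias means Ref minus the true value; the function names are my own.

```python
import math

def combined_bias(sub_vs_ref_bias, ref_bias):
    """Sub's bias relative to truth, ASSUMING ref_bias means (Ref - truth)."""
    return sub_vs_ref_bias + ref_bias

def sub_sd_without_ref_noise(sd_diff, ref_sd):
    """Estimate Sub's own error SD, ASSUMING Ref's error is independent of Sub's."""
    variance = sd_diff ** 2 - ref_sd ** 2
    if variance < 0:
        raise ValueError("Ref's SD exceeds the SD of the differences")
    return math.sqrt(variance)

bias_total = combined_bias(-5.72, -0.08)
sub_sd = sub_sd_without_ref_noise(19.4, 2.5)

print(f"combined bias: {bias_total:.2f}")
print(f"Sub SD after removing Ref's noise: {sub_sd:.2f}")
```

With these numbers Ref's SD of 2.5 changes the estimate of Sub's SD only slightly, because 2.5 squared is small next to 19.4 squared.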

If you have gotten all the way down here, thank you very much for taking the time to read this. I am neither a student nor a professional researcher, but I have a very important practical need to establish the reliability of aggregate data produced by this instrument. My knowledge of statistics is largely self-taught, so it's possible that I may have misunderstood some concepts and I hope you can help me make sure I'm approaching this problem correctly. This is my first time posting here, so please forgive me (and correct me) if I've committed any kind of faux pas (I did read the guidelines and FAQ before posting).