Which stats test to use?


Hi there,

I want to analyse the impact of turning a production line on and off (variable X, binary) on the quality of the wastewater being discharged, specifically its suspended solids content (variable Y, continuous). I have daily data for a whole month recording whether the production line was running and the suspended solids content measured on each of those days. In other words, I have a data set similar to the one below:

15th January X=0 (line was not running) Y=110 mg/l
16th January X=1 (line was running) Y=210 mg/l
17th January X=1 Y=245 mg/l
18th January X=0 Y=170 mg/l

And so on.

I’m looking for a statistical test of the relationship between X and Y; in other words, to test whether running the production line has a statistically significant effect on the suspended solids content.

I was wondering:
a) Would a biserial correlation test be the most appropriate test to use?
b) If this is not the most appropriate test, are you aware of something I could use instead?
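On (a): when X is a genuine 0/1 variable (line running or not), the usual variant is the point-biserial correlation, which is simply Pearson's r between Y and the 0/1 coding. (The plain biserial coefficient assumes the dichotomy comes from an underlying continuous, normally distributed variable.) Testing the point-biserial r against zero is equivalent to a two-sample t-test with pooled variance, via t = r·sqrt(n − 2)/sqrt(1 − r²) on n − 2 degrees of freedom. The sketch below checks that equivalence numerically using only the Python standard library; the data are just the four example days quoted above (illustrative only, not a result), and the helper names `pearson_r` and `pooled_t` are hypothetical.

```python
import math
import statistics

# Illustrative values: only the four days quoted in the question
# (x = 1 if the line was running, y = suspended solids in mg/l).
# Replace these with the full month of daily records.
x = [0, 1, 1, 0]
y = [110.0, 210.0, 245.0, 170.0]


def pearson_r(a, b):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance (mean of b minus mean of a)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(b) - statistics.mean(a)) / math.sqrt(sp2 * (1 / na + 1 / nb))


r = pearson_r(x, y)
n = len(y)
# Test statistic for H0: r = 0, referred to a t distribution with n - 2 df.
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

off = [yi for xi, yi in zip(x, y) if xi == 0]  # line not running
on = [yi for xi, yi in zip(x, y) if xi == 1]   # line running
t_pooled = pooled_t(off, on)

print(f"point-biserial r = {r:.3f}")
print(f"t from r = {t_from_r:.3f}, pooled t = {t_pooled:.3f}, df = {n - 2}")
```

Because the two statistics coincide, choosing between "correlation" and "t-test" here is mostly a matter of how you want to report the result, not a different analysis.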

Any help would be really appreciated, thanks guys!

How about plotting histograms and boxplots of Y for the "line was not running" and "line was running" days, and then running a t-test comparing the two groups? Does the data look normally distributed?
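A minimal standard-library sketch of that workflow, under the assumption that the data are a list of (X, Y) pairs per day. It prints the five-number summary each boxplot would draw, computes Welch's t statistic (a t-test variant that does not assume equal variances in the two groups) with its Welch–Satterthwaite degrees of freedom, and, as a swapped-in check that does not rely on normality, a Monte Carlo permutation test on the difference in means. The four days quoted above are used purely for illustration; the helper names and the resample count are hypothetical choices. If SciPy is available, `scipy.stats.ttest_ind(off, on, equal_var=False)` returns Welch's t together with a p-value, and with matplotlib `plt.boxplot([off, on])` draws the boxplots.

```python
import math
import random
import statistics

# Illustrative values: only the four days quoted in the question.
# Each tuple is (x, y): x = 1 if the line was running, y = suspended solids (mg/l).
days = [(0, 110.0), (1, 210.0), (1, 245.0), (0, 170.0)]

off = [y for x, y in days if x == 0]  # line not running
on = [y for x, y in days if x == 1]   # line running


def five_number_summary(values):
    """Min, Q1, median, Q3, max: the numbers a boxplot draws."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


def welch_t(a, b):
    """Welch's t statistic (mean of b minus mean of a) and Welch-Satterthwaite df."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(b) - statistics.mean(a)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference in means.

    Shuffles the pooled values, reassigns them to groups of the original
    sizes, and counts how often the absolute mean difference is at least
    as large as the observed one. Uses the (count + 1) / (n + 1) convention
    so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    pooled = list(a) + list(b)
    observed = abs(statistics.mean(b) - statistics.mean(a))
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        new_b, new_a = pooled[:len(b)], pooled[len(b):]
        diff = abs(statistics.mean(new_b) - statistics.mean(new_a))
        if diff >= observed - 1e-9:  # small tolerance for floating-point ties
            count += 1
    return (count + 1) / (n_resamples + 1)


print("not running:", five_number_summary(off))
print("running:    ", five_number_summary(on))
t, df = welch_t(off, on)
p = permutation_p_value(off, on)
print(f"Welch t = {t:.3f}, df = {df:.2f}, permutation p = {p:.3f}")
```

With a full month of days, the histograms and boxplots should show whether the t-test's normality assumption is reasonable; if they look clearly skewed, the permutation p-value is the safer one to lean on.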