Hi all,
This is a beginner's question; I apologize for that.
I have taken Descriptive Statistics and I am on my way to Probability and later Inferential Statistics. But I have a burning question. No, this is not homework or any kind of assignment, but a confusion I cannot shake off.
I want to determine what would be the beginning and end of a rainy season. Imagine I have 10 years of daily data with a simple boolean like Rained: true or false.
Visually one could see that, for instance, from late December to the beginning of March there are more rainy days than dry days. At some point, I assume, the probability of rain is above 50%.
But I just don't understand how I would compute the day when the probability rises above 50% and the day when it drops below it again. Here is why:
I thought about simply getting an average for each individual calendar day over the 10 years, but in that analysis no day "knows" about its neighbors. What if there are a couple of days within the rainy season that come out below 50% in my sample? That doesn't mean the season stopped and then resumed two days later. Therefore, I assume there needs to be some kind of inter-day knowledge, or at least smoothing. Would I take averages by weeks? But am I then exposed to how I arbitrarily decide when a week starts or ends? See how muddle-headed this is for me?
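To make the week-boundary worry concrete, here is roughly what I have in mind, as a rough sketch in Python/pandas. I am assuming a CSV with one row per day and columns "date" and "rained"; the file name, column names, and the 7-day window are just made up for illustration, and I don't know whether this is statistically sound, which is really what I am asking about.

import pandas as pd

# Hypothetical data: one row per day over 10 years,
# with columns "date" (a date) and "rained" (True/False).
df = pd.read_csv("rain.csv", parse_dates=["date"])

# Step 1: for each calendar day (day-of-year 1..366), the fraction of
# the ten years on which it rained. mean() of booleans gives a proportion.
daily_prop = (
    df.assign(doy=df["date"].dt.dayofyear)
      .groupby("doy")["rained"]
      .mean()
      .sort_index()
)

# Step 2: a centered 7-day rolling mean. Because the window is centered
# on each day and slides one day at a time, there is no arbitrary
# "week starts on Monday" choice -- but the window length itself is
# still a choice I don't know how to justify.
smoothed = daily_prop.rolling(window=7, center=True, min_periods=1).mean()

# Step 3: days where the smoothed probability of rain exceeds 0.5.
wet_days = smoothed[smoothed > 0.5].index
print(wet_days.min(), wet_days.max())  # naive "start" and "end"; a season
                                       # running from December into March
                                       # wraps the year boundary, so even
                                       # this step needs extra care.

Is something along these lines a reasonable way to think about it, or is there a proper statistical method for finding where the season starts and ends?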
Your explanation is much much appreciated! It sucks to be so confused...