Say I have a data frame that looks like the one below (I wasn't sure how to share a reproducible example of this data, given its size).
The data frame has 800k+ rows of data measured continuously in 15-minute increments over the past couple of years at "n" sites.
```
> head(df)
  yr mo day time    sal site
2021  8   1 0000 26.614   14
2021  8   1 0015 25.724   14
2021  8   1 0030 25.739   14
2021  8   1 0045 25.831   14
2021  8   1 0100 25.798   14
2021  8   1 0115 25.667   14
```
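Because everything hinges on readings being consecutive in time, a likely first step is to combine `yr`, `mo`, `day`, and `time` into a single timestamp and look at the gap between readings. This is a minimal base-R sketch, not a definitive approach. It assumes `time` is an HHMM value (stored either as character like `"0015"` or as an integer like `15`) and that all readings share one time zone (UTC here is an assumption). The toy `df` is rebuilt from the `head(df)` output above, so the column types are guesses.

```r
# Toy rows rebuilt from head(df); the real df has 800k+ rows
df <- data.frame(
  yr   = 2021L,
  mo   = 8L,
  day  = 1L,
  time = c("0000", "0015", "0030", "0045", "0100", "0115"),
  sal  = c(26.614, 25.724, 25.739, 25.831, 25.798, 25.667),
  site = 14L
)

# Pad time to four digits so "15" and "0015" both become "0015"
hhmm <- sprintf("%04d", as.integer(df$time))

# One POSIXct timestamp per reading (tz = "UTC" is an assumption)
df$timestamp <- as.POSIXct(
  sprintf("%04d-%02d-%02d %s", df$yr, df$mo, df$day, hhmm),
  format = "%Y-%m-%d %H%M",
  tz = "UTC"
)

# Minutes since the previous reading; 15 means the chain is unbroken
df$gap_min <- c(NA, diff(as.numeric(df$timestamp)) / 60)
```

With several sites, the gap should be computed within each site after sorting by site and timestamp; otherwise the last reading of one site is compared against the first reading of the next.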
I'm attempting to re-create in R a matrix/table like the attached image (which was originally made in Excel). Color is not important; I only hope to re-create the table itself. The values in the table are the maximum number of days (converted from consecutive minutes) for which the salinity values (sal) stayed consistently within a given range. For example, for each site, season, and year, I want the longest run of continuous time during which sal stayed between 40 and 50.
To belabor the point: values only count as consistent if they occur at times, days, months, and years that are in order, i.e., an unbroken sequence of 15-minute readings with no gaps.
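One way to express that rule is run labelling: start a new run whenever the in-range flag flips or the gap to the previous reading is not exactly 15 minutes, then keep the longest in-range run. The following is a minimal base-R sketch under stated assumptions: the function name `longest_run_days()` is hypothetical, the range is treated as inclusive (like `dplyr::between()`), and a run's duration is taken as (number of readings × 15 min), which is one possible convention among several.

```r
# Hypothetical helper: longest unbroken run (in days) of readings with lo <= sal <= hi.
# Assumes one reading every `step_min` minutes and counts each reading as step_min long.
longest_run_days <- function(timestamp, sal, lo, hi, step_min = 15) {
  if (length(sal) == 0) return(0)

  ord <- order(timestamp)                  # put readings in chronological order
  timestamp <- timestamp[ord]
  sal <- sal[ord]

  in_range <- !is.na(sal) & sal >= lo & sal <= hi
  gap_min  <- c(step_min, diff(as.numeric(timestamp)) / 60)

  # A new run starts when the in-range flag flips or the 15-min chain is broken
  flag_change <- c(TRUE, in_range[-1] != in_range[-length(in_range)])
  run_id <- cumsum(flag_change | gap_min != step_min)

  # Readings per run, keeping only runs that are in range
  run_len <- tapply(in_range, run_id, sum)
  run_ok  <- tapply(in_range, run_id, all)
  if (!any(run_ok)) return(0)

  max(run_len[run_ok]) * step_min / (60 * 24)   # readings -> minutes -> days
}

# Example: a 30-minute hole between 01:15 and 01:45 splits the last in-range stretch
ts  <- as.POSIXct("2021-08-01 00:00", tz = "UTC") + 60 * 15 * c(0:5, 7:10)
sal <- c(45, 46, 47, 30, 41, 42, 43, 44, 45, 46)
longest_run_days(ts, sal, lo = 40, hi = 50)   # 4 readings = 1 hour = 1/24 day
```

Without the gap check, the readings from 01:00 to 02:30 would be counted as a single run of 6 readings, which is exactly the "unbroken chain" problem described above.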
# My attempt:
I found this code somewhat useful, but it doesn't account for the fact that consecutive values must form an unbroken chain in time.
Code:
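```r
library(dplyr)

df %>% group_by(site, mo, yr) %>%
  mutate(high_salinity = between(sal, 40, 50),
         # cumsum() never resets, so this counts all in-range readings so far in the group
         high_salinity_duration = cumsum(high_salinity) * high_salinity) %>%
  summarise(longest_high_salinity = max(high_salinity_duration))
```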
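A sketch of how the attempt above might be adapted in dplyr: the run ID resets whenever the in-range flag changes or the 15-minute chain breaks, and the longest in-range run is then taken per site, season, and year. Several pieces are assumptions rather than anything stated in the question: the toy data, the HHMM `time` format, the UTC time zone, the month-to-season mapping (meteorological Northern-Hemisphere seasons), the inclusive 40-50 band, and the wide layout from `tidyr::pivot_wider()`, since the target table itself isn't shown here.

```r
library(dplyr)
library(tidyr)

# Toy data (made up for illustration); note the 30-min hole between 01:15 and 01:45
df <- data.frame(
  yr = 2021L, mo = 8L, day = 1L,
  time = sprintf("%04d", c(0, 15, 30, 45, 100, 115, 145, 200)),
  sal  = c(41, 42, 30, 44, 45, 46, 47, 48),
  site = 14L
)

result <- df %>%
  mutate(
    timestamp = as.POSIXct(sprintf("%04d-%02d-%02d %04d", yr, mo, day, as.integer(time)),
                           format = "%Y-%m-%d %H%M", tz = "UTC"),
    # Assumed meteorological seasons; December is grouped with its own calendar year
    season = case_when(mo %in% c(12, 1, 2) ~ "Winter",
                       mo %in% 3:5 ~ "Spring",
                       mo %in% 6:8 ~ "Summer",
                       TRUE ~ "Fall"),
    in_range = coalesce(between(sal, 40, 50), FALSE)   # missing sal counts as out of range
  ) %>%
  arrange(site, timestamp) %>%
  group_by(site, season, yr) %>%
  mutate(
    gap_min = as.numeric(difftime(timestamp, lag(timestamp), units = "mins")),
    # New run at the first reading, after any gap other than 15 min, or when the flag flips
    new_run = is.na(gap_min) | gap_min != 15 | in_range != lag(in_range),
    run_id  = cumsum(new_run)
  ) %>%
  group_by(site, season, yr, run_id) %>%
  summarise(in_range = first(in_range), n_readings = n(), .groups = "drop") %>%
  filter(in_range) %>%
  group_by(site, season, yr) %>%
  summarise(longest_days = max(n_readings) * 15 / (60 * 24), .groups = "drop")

# One possible wide layout: one row per site and year, one column per season
table_wide <- result %>%
  pivot_wider(names_from = season, values_from = longest_days)
```

In the toy data the longest unbroken in-range run is 00:45 to 01:15 (3 readings, 45 minutes); the 01:45 and 02:00 readings start a new run because of the gap. Site/season/year combinations with no in-range reading at all drop out of `result` and would show up as `NA` after pivoting.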