--- title: "Introduction" output: markdown::html_format: vignette: > %\VignetteIndexEntry{Introduction} %\VignetteEngine{knitr::knitr} %\VignetteEncoding{UTF-8} %\VignetteDepends{outbreaks, ggplot2} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.align = "center", fig.width = 7, fig.height = 5 ) ``` # Overview The goal of grates is to make it easy to group dates across a range of different time intervals. It defines a collection of classes and associated methods that, together, formalise the concept of grouped dates and are intuitive to use. To assist in formatting plots of grates objects we also provides x-axis scales that can be used in conjunction with [ggplot2](https://cran.r-project.org/package=ggplot2) output. Currently implemented classes are: - [grates_yearweek](#yearweek); - [grates_epiweek](#yearweek); - [grates_isoweek](#yearweek); - [grates_period](#period); - [grates_yearmonth](#yearxxx); - [grates_yearquarter](#yearxxx); - [grates_year](#yearxxx); and - [grates_month](#month) The underlying implementation for these objects build upon ideas of Davis Vaughan and the unreleased [datea](https://github.com/DavisVaughan/datea/) package as well as Zhian Kamvar and the [aweek](https://cran.r-project.org/package=aweek) package. Note that for brevity in the rest of the vignette, we will drop the `grates_` prefix when discussing the underlying class. # grates objects ## yearweek, epiweek and isoweek {#yearweek} yearweek objects are stored as the number of weeks (starting at 0L) from the date of the `firstday` nearest the Unix Epoch (1970-01-01). Put more simply, the number of seven day periods from: - 1969-12-29 for `firstday` equal to 1 (Monday) - 1969-12-30 for `firstday` equal to 2 (Tuesday) - 1969-12-31 for `firstday` equal to 3 (Wednesday) - 1970-01-01 for `firstday` equal to 4 (Thursday) - 1970-01-02 for `firstday` equal to 5 (Friday) - 1970-01-03 for `firstday` equal to 6 (Saturday) - 1970-01-04 for `firstday` equal to 7 (Sunday) They can be constructed directly from integers via the `new_yearweek()` function but it is generally easier to use the either the `as_yearweek()` coercion function or the `yearweek()` constructor. `as_yearweek()` takes two arguments; `x`, the vector (normally a Date or POSIXt) you wish to group, and `firstday`, the day of the week you wish your weeks to start on. `yearweek()` takes three arguments; `year` and `week` integer vectors and, again, a `firstday` value. The epiweek class is similar to the yearweek class but, by definition, will always begin on a Sunday. They are stored as the integer number of weeks (again starting at 0L) since 1970-01-04 so internally are akin to `grates_yearweek_sunday` objects but with the benefit of slightly more efficient implementations for many of the associated methods. Likewise, the isoweek class is similar to epiweek class but uses the [ISO 8601](https://en.wikipedia.org/wiki/ISO_week_date) definition of a week that will always start on a Monday. Internally they are stored as the integer number of weeks since 1969-12-29. ```{r} library(grates) # Choose some consecutive dates that begin on a Friday first <- as.Date("2021-01-01") weekdays(first) dates <- first + 0:9 # Below we use a Friday-week grouping weeks <- as_yearweek(dates, firstday = 5L) (dat <- data.frame(dates, weeks)) # we can also use the constructor function if we already have weeks and years yearweek(year = c(2020L, 2021L), week = c(1L, 10L), firstday = 5L) # epiweeks always start on a Sunday (epiwk <- as_epiweek(Sys.Date())) weekdays(as.Date(epiwk)) # isoweeks always start on a Sunday (isowk <- as_isoweek(Sys.Date())) weekdays(as.Date(isowk)) ``` By default plots (using ggplot2) will centre yearweek (epiweek / isoweek) labels: ```{r} #| fig.alt: > #| Bar chart of epiweekly incidence (by week of infection) covering 2014-W12 #| to 2015-W17 inclusive. The graph peaks at 2014-W38. The "descent" from the #| peak tapers off slower than the initial "ascent". Six labels of the form #| 'year-week' are evenly spread along the x-axis and centred on the #| corresponding bars. library(ggplot2) # use simulated linelist data from the outbreaks package dat <- outbreaks::ebola_sim_clean dat <- dat$linelist$date_of_infection # calculate the total number for across each week week_dat <- aggregate( list(cases = dat), by = list(week = as_epiweek(dat)), FUN = length ) head(week_dat) # plot the output (week_plot <- ggplot(week_dat, aes(week, cases)) + geom_col(width = 1, colour = "white") + theme_bw()) ``` We can have non-centred date labels on the x_axis by utilising the associated scale_x_grates functions and explicitly specifying a format for the date labels: ```{r} #| fig.alt: > #| Bar chart of epiweekly incidence (by week of infection) covering the time #| from March 2014 to April 2015 inclusive. The graph peaks around September #| 2014. The "descent" from the peak tapers off slower than the initial #| "ascent". Six labels of the form 'year-month-day' are evenly spread along #| the x-axis and aligned at the start of the corresponding bars. week_plot + scale_x_grates_epiweek(format = "%Y-%m-%d") ``` ## Period {#period} period objects are stored as the integer number, starting at 0L, of periods since the Unix Epoch (1970-01-01) and a specified offset. Here periods are taken to mean groupings of `n` consecutive days. Like yearweek objects, a period object can be constructed directly via a call to `new_period()` but more easily via the `as_period()` coercion function. `as_period()` takes 3 arguments; `x`, the vector (normally a Date or POSIXt) you wish to group, `n`, the integer number of days you wish to group, and `offset`, the value you wish to start counting groups from relative to the Unix Epoch. For convenience, `offset` can be given as a date you want periods to be relative to (internally this date is converted to integer). Note that storage and calculation purposes, `offset` is scaled relative to `n`. I.e. `offset <- offset %% n` and values of `x` stored relative to this scaled offset. ```{r} #| fig.alt: > #| Bar chart of incidence (by period of infection) covering the time #| from March 2014 to April 2015 inclusive. The graph peaks around September #| 2014. The "descent" from the peak tapers off slower than the initial #| "ascent". Six labels of the form 'year-month-day' are evenly spread along #| the x-axis and aligned at the start of the corresponding bars. # calculate the total number for across 14 day periods with no offset. # note - 0L is the default value for the offset but we specify it explicitly # here for added clarity period_dat <- aggregate( list(cases = dat), by = list(period = as_period(dat, n = 14L, offset = 0L)), FUN = length ) head(period_dat) ggplot(period_dat, aes(period, cases)) + geom_col(width = 1, colour = "white") + theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("") ``` We can also use a date as an offset ```{r} dates <- as.Date("2020-01-03") + 0:9 offset <- as.Date("2020-01-01") data.frame(dates, period = as_period(dates, n = 7L, offset = offset)) ``` ## yearmonth, yearquarter and year {#yearxxx} yearmonth, yearquarter and year objects are stored as the integer number of months/quarters/years (starting at 0L) since the Unix Epoch (1970-01-01). Similar to other grates objects we provide both coercion and construction functions. ```{r} #| fig.alt: > #| Bar chart of monthly incidence (by date of infection) covering the time #| from March 2014 to April 2015 inclusive. The graph peaks around September #| 2014. The "descent" from the peak tapers off slower than the initial #| "ascent". Labels of the form 'year-month' are evenly spread along #| the x-axis and aligned at the centred of the corresponding bars. (month_dat <- aggregate( list(cases = dat), by = list(month = as_yearmonth(dat)), FUN = length )) (month_plot <- ggplot(month_dat, aes(month, cases)) + geom_col(width = 1, colour = "white") + theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("")) ``` Again we can have non-centred date labels by applying the associated scale ```{r} #| fig.alt: > #| Bar chart of monthly incidence (by date of infection) covering the time #| from March 2014 to April 2015 inclusive. The graph peaks around September #| 2014. The "descent" from the peak tapers off slower than the initial #| "ascent". Labels of the form 'year-month-day' are evenly spread along #| the x-axis aligned to the start of the corresponding bars. month_plot + scale_x_grates_yearmonth(format = "%Y-%m-%d") ``` yearquarter works similarly ```{r} #| fig.alt: > #| Bar chart of quarterly incidence (by date of infection) covering the time #| from 2014-Q1 to 2015-Q2 inclusive. The graph peaks over quarters 3 and 4 #| in 2014. Labels on the x-axis and of the form 'year-quarter' are centred on #| the corresponding bars. (quarter_dat <- aggregate( list(cases = dat), by = list(quarter = as_yearquarter(dat)), FUN = length )) ggplot(quarter_dat, aes(quarter, cases)) + geom_col(width = 1, colour = "white") + theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("") ``` As does year ```{r} #| fig.alt: > #| Bar chart of yearly incidence (by date of infection) for 2014 and 2015. #| There were lots more cases in 2014 compared to 2015 (Roughly speaking #| 3000 v 700). (year_dat <- aggregate( list(cases = dat), by = list(year = as_year(dat)), length )) ggplot(year_dat, aes(year, cases)) + geom_col(width = 1, colour = "white") + theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("") ``` ```{r} # Construction functions can also be used yearmonth(2022L, 11L) yearquarter(2022L, 4L) year(2022L) ``` ## month {#month} month objects are stored as the integer number of n-month groups (starting at 0L) since the Unix Epoch (1970-01-01). Here n-months is taken to mean a 'grouping of n consecutive months'. month objects can be constructed directly from integers via the `new_month()` function and through coercion via the `as_month()` function. `as_period()` takes two arguments; `x`, the vector (normally a Date or POSIXt) you wish to group and `n`, the integer number of months you wish to group. ```{r} #| fig.alt: > #| Bar chart of bimonthly incidence (by date of infection) covering the time #| from March 2014 to April 2015 inclusive. The graph peaks around #| September/October 2014. Labels of the form 'year-month-day' are evenly #| spread along the x-axis aligned to the start of the corresponding bars. # calculate the bimonthly number of cases (bimonth_dat <- aggregate( list(cases = dat), by = list(group = as_month(dat, n = 2L)), FUN = length )) # by default lower date bounds are used for the x axis (bimonth_plot <- ggplot(bimonth_dat, aes(group, cases)) + geom_col(width = 1, colour = "white") + theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("")) ``` Note that the default plotting behaviour of non-centred date labels is different to that of the yearweek, yearmonth, yearquarter and year scales where labels are centred by default. To obtain centred labels you must explicitly set the format to NULL in the scale: ```{r} #| fig.alt: > #| Bar chart of bimonthly incidence (by date of infection) covering the time #| from March 2014 to April 2015 inclusive. The graph peaks around #| September/October 2014. Labels of the form 'year-month to year-month' are #| evenly spread along the x-axis centred on the corresponding bars. bimonth_plot + scale_x_grates_month(format = NULL, n = 2L) ``` # Methods and other functionality For all grates objects we have added many methods and operations to ensure logical and consistent behaviour. The following sections utilise the unique epiweeks from the earlier example: ```{r} weeks <- week_dat$week ``` ## Accessing boundary values and checking contents Some times it is useful to access both the starting dates covered by grates objects as well as the end dates. To this end we provide functions `date_start()` and `date_end()`. To find out whether a `grate` object spans a particular date we provide a `%during%` function. ```{r} dat <- weeks[1:5] data.frame( week = dat, start = date_start(dat), end = date_end(dat), contains.2014.04.14 = as.Date("2014-04-14") %during% dat ) ``` Conversion of grate objects back to dates is analogous to `date_start()`. ```{r} identical(as.Date(weeks), date_start(weeks)) ``` ## min, max, range and sequences ```{r} # min, max and range (minw <- min(weeks)) (maxw <- max(weeks)) (rangew <- range(weeks)) # seq method works if both `from` and `to` are epiweeks seq(from = minw, to = maxw, by = 6L) # but will error informatively if `to` is a different class try(seq(from = minw, to = 999, by = 6L)) ``` ## Addition and subtraction Addition (subtraction) of whole numbers will add (subtract) the corresponding number of weeks to (from) the object ```{r} dat <- head(week_dat) (dat <- transform(dat, plus4 = week + 4L, minus4 = week - 4L)) ``` Addition of two yearweek objects will error as the intention is unclear. ```{r} try(transform(dat, willerror = week + week)) ``` Subtraction of two yearweek objects gives the difference in weeks between them ```{r} transform(dat, difference = plus4 - minus4) ``` epiweek objects can be combined with themselves but not other classes (assuming an epiweek object is the first entry). ```{r} c(minw, maxw) identical(c(minw, maxw), rangew) try(c(minw, 1L)) ```