The primary goal of the analysis is to produce a map of summaries of the observed SST change in the Australasian region. The
SST data set is spatial and temporal and can be thought of as a large set of time series, one for each spatial grid cell. Each
time series spans a period of approximately 20 years. The SST spatial resolution is high, so there is no need to do spatial
interpolation; analysis of individual spatial locations is sufficient. There are almost 2 million non-empty grid cells with
more than 750 observation days over the entire period, and these are analysed separately. A model is fitted to each of these
2 million grid cells, which is a substantial computational challenge. Any grid cell with fewer than 750 observation
days is not analysed, as the amount of information may not be sufficient to support the model. This is a conservative
approach, but it excludes only a tiny proportion of grid cells. Summaries of the individual analyses can be mapped
to show spatial variation, but neighbouring locations are not incorporated into each grid cell's analysis.
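The cell-selection rule described above can be sketched as follows. This is a minimal illustration and not the authors' code: the data layout (a dictionary mapping a grid-cell index to a daily series with None for missing days), the function names and the toy values are all assumptions made for the example. Only the threshold of 750 observation days comes from the text.

```python
MIN_OBS_DAYS = 750  # threshold stated in the text


def count_observation_days(series):
    """Count the days that carry a valid (non-missing) SST value."""
    return sum(1 for value in series if value is not None)


def select_cells(cells, min_days=MIN_OBS_DAYS):
    """Keep only the grid cells with more than `min_days` valid observation days."""
    return {
        cell: series
        for cell, series in cells.items()
        if count_observation_days(series) > min_days
    }


# Hypothetical toy data: (row, col) -> daily SST in degrees C, None = no observation.
cells = {
    (0, 0): [20.0] * 800 + [None] * 200,  # 800 observation days: analysed
    (0, 1): [19.5] * 300 + [None] * 700,  # 300 observation days: excluded
    (1, 0): [None] * 1000,                # empty cell: excluded
}

selected = select_cells(cells)
print(sorted(selected))  # [(0, 0)]
```

Each retained cell would then be passed to the model-fitting step independently of its neighbours.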
The basic principle is that the temperature time-series, for any spatial location, can be decomposed into:
Inter-annual variability: This includes the long-term trend and any variability on a multi-year time-scale. It is modelled
as a smooth function of time, f(t), say. Here t is the number of days since the start of the observation period, with
0 ≤ t ≤ 7091 days, where 7091 is the number of observation days included in the study.
Annual cycle: This is a periodic function with the same timing and amplitude every year. It is assumed to be a smooth function
of day-within-year, but not necessarily trigonometric or a function of trigonometric functions. Denote this function as g(d),
say, where 0 ≤ d ≤ 365 days (or 366 days in a leap year).
Residual: All random (and some non-random) deviations from the model's expectation. It includes: (a) patterns that occur on
a time scale shorter than the 1-day resolution of the data (diurnal effects; a cell is not measured at the same time each day),
and (b) non-smooth trends and other model misfit issues. The latter can occur when one of the modelling assumptions fails,
for example when the annual cycle changes abruptly between years, as can happen in an El Niño year.
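To make the two time indices concrete, the sketch below derives t and d from a calendar date. It is illustrative only: the start date is hypothetical (the text does not give one), the function name is invented for the example, and d is taken as the 1-based day of the year, which is one possible reading of the range quoted above.

```python
from datetime import date


def time_indices(obs_date, start_date):
    """Return (t, d) for one observation.

    t: days elapsed since the start of the series (0 on the first day).
    d: day within the calendar year, 1-based (1..365, or 366 in a leap year).
    """
    t = (obs_date - start_date).days
    d = obs_date.timetuple().tm_yday
    return t, d


START = date(2000, 1, 1)  # hypothetical start of the series

print(time_indices(date(2000, 1, 1), START))    # (0, 1)
print(time_indices(date(2000, 12, 31), START))  # (365, 366)  leap year
print(time_indices(date(2001, 3, 1), START))    # (425, 60)
```

The index t increases throughout the record and drives the inter-annual component, whereas d repeats every year, so all years share the same annual-cycle function.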
The components of variation in the time-series data can be formally combined in a statistical model, viz.
    y(t, d) = f(t) + g(d) + e_t,
where y(t, d) is the SST observation on the t-th day after the time series starts (0 ≤ t ≤ 7091 days), observed on
the d-th day of the year (0 ≤ d ≤ 365), and e_t is the residual. The long-term trend, f(t), and the seasonal cycle, g(d),
could take many functional forms. Here, a penalised cubic regression spline is used for f(t) and a penalised cyclic regression
spline is used for g(d).
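For orientation, penalised regression splines of this kind are commonly fitted by penalised least squares. The criterion below is a generic sketch of that approach, not necessarily the exact one used in this analysis. The observation index i, the number of observations n, the smoothing parameters \lambda_f and \lambda_g, and the second-derivative penalties are assumptions introduced for illustration.

```latex
\min_{f,\,g}\;
  \sum_{i=1}^{n} \bigl\{ y_i - f(t_i) - g(d_i) \bigr\}^{2}
  + \lambda_f \int \bigl\{ f''(t) \bigr\}^{2} \,\mathrm{d}t
  + \lambda_g \int \bigl\{ g''(d) \bigr\}^{2} \,\mathrm{d}d ,
\qquad
g^{(k)}(d_{\min}) = g^{(k)}(d_{\max}), \quad k = 0, 1, 2 .
```

Under these assumptions, larger values of \lambda_f or \lambda_g give smoother estimates, and the matching conditions on g and its first two derivatives at the ends of the year are what make the spline cyclic. Because f and g could each absorb a constant, a constraint such as centring g around zero is usually needed for the two components to be identifiable.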