---
title: "Clustering time series using funtimes package"
author: "Srishti Vishwakarma"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    number_sections: true
bibliography: vignrefs.bib
vignette: >
  %\VignetteIndexEntry{Clustering time series using funtimes package}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#",
                      fig.width = 7, 
                      fig.height = 6)
# devtools::load_all(".") #remove this line
```

# Introduction

In this tutorial, two unsupervised clustering algorithms from the `funtimes` package are used to identify clusters of Australia's sea level time series.

## Loading libraries

First, load the essential libraries for the analysis:

```{r echo=TRUE, warning=FALSE}
library(funtimes)
library(ggplot2)
library(gridExtra)
library(readxl)
library(reshape2)
```

# Data

The daily sea level data are available from 1993 to 2012 for 17 locations. The data are obtained from @Maharaj_etal_2019 using the following link <http://www.tsclustering.homepage.pt/index.php?p=3>. Download `Application7_3.zip` folder, where the `Aus Sea Levels 17.xlsx` file contains the sea level records. Annual average is taken to convert the temporal resolution.

```{r, eval=FALSE}
d_org <- readxl::read_xlsx("Aus_Sea_Levels_17.xlsx", skip = 1, n_max = 7300)
# yearly average
d <- data.frame(aggregate(d_org[, 4:20], list(d_org$Year), 
                          FUN = 'mean', na.rm = TRUE)[, -1], 
                row.names = unique(d_org$Year))
```

```{r, echo=FALSE}
# saveRDS(d, "Aus_Sea_Levels_17.rds")
d <- readRDS("Aus_Sea_Levels_17.rds")
```

## Plotting time series

Below is the plot of annual time series of sea level for 17 locations:
```{r}
dlong <- reshape2::melt(t(d))
names(dlong)[1:2] <- c("Location", "Year")
ggplot(dlong) + geom_line(aes(x = Year, y = value, color = Location), size = 1) +
  ylab('Sea level (m)') +
  theme_bw()
```

This plot demonstrates the variation in the sea levels across the locations. It can be seen that not all the time series are having a common trend since 1993. Grouping the locations with a common trend could benefit Australian government to assess and implement climate adaptation strategies for the impact of sea level rise on clustered locations.

# Clustering time series based on trend synchronism

The first function from the package to test is the `sync_cluster` that groups the time series with the common linear trend. The window parameter `w` is set here for number of slides in each window. If the number of years are not enough in the time series, this parameter is required to be set.

```{r}
set.seed(123)
Clus_sync <- sync_cluster(d ~ t, Window = 3, B = 100)
Clus_sync
```

Total `r sum(Clus_sync$cluster != 0)` locations are clustered with a common linear trend, while the remaining `r sum(Clus_sync$cluster == 0)` are not tied to any other location and form so-called noise cluster.

Below is the plot of the clustered time series of sea level, where `Cluster 0` indicates the noise cluster without any common linear trend, while `Cluster 1` shows the time series of locations with a common linear trend:

```{r}
for (i in 0:max(Clus_sync$cluster)) {
  assign(paste('py', i, sep = ''),
         ggplot(melt(t(d[, Clus_sync$cluster == i]))) +
           geom_line(aes(x = Var2,y = value,color = Var1),size = 1) +
           ylab('Sea level (m)') + xlab('Year') +
           theme_bw() + ggtitle(paste('Cluster',i)) +
           theme(axis.text = element_text(size = 13), axis.title.x = element_text(size = 15),
                 axis.title.y = element_text(size = 15), legend.text = element_text(size = 10),
                 legend.title = element_blank(), legend.key.size = unit(0.3, "cm")))
}
grid.arrange(py0, py1)
```


# Clustering time series using a spatiotemporal approach

The `BICC` function applies an unsupervised spatiotemporal clustering algorithm, TRUST, from @Ciampi_etal_2010. The algorithm has a few tuning parameters, and the `BICC` function automatically selects two of those (`Delta` and `Epsilon`; for manual setting of all the parameters, use the lower-level functions `CSlideCluster` and `CWindowCluster`). 
First, the time series are clustered within small slides; the length of the slides is defined with the parameter `p` (i.e., number of time-series observations in each slide). 
Then, slides are aggregated into windows (each window contains `w` consecutive slides), and slide-level cluster assignments are used to cluster the time series at the window level. When defining the windows, the user can also set the step `s`, which is the number of steps used to shift the window (if `s = w`, the windows do not overlap).

```{r}
Clus_BICC <- BICC(as.matrix(d), p = 5, w = 4, s = 4)
Clus_BICC
```

The algorithm detected only one cluster.

# Citation {-}

This vignette belongs to R package `funtimes`. If you wish to cite this page, please cite the package:
```{r}
citation("funtimes")
```

# References