---
title: "opa"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{opa}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

palette(c("#0073C2", "#EFC000", "#868686", "#CD534C"))

library(lattice)
```

## Background

`opa` is an implementation of methods described in publications including [Thorngate (1987)](https://doi.org/10.1016/S0166-4115(08)60083-7) and [Grice et al. (2015)](https://doi.org/10.1177/2158244015604192). Thorngate (1987) attributes the original idea to:

Parsons, D. (1975). _The directory of tunes and musical themes_. S. Brown. 

Ordinal pattern analysis may be useful as an alternative, or addition, to other methods of analyzing repeated measures data such as repeated measures ANOVA and mixed-effects models.

## Modeling repeated measures data

Once installed, you can load `opa` with

```{r load_opa}
library(opa)
```

### Data

For this example we will use childhood growth data reported by Potthoff & Roy (1964) consisting of measures of the distance between the pituitary and the pteryo-maxillary fissure. Distances were recorded in millimeters when each child was 8, 10, 12 and 14 years old. This is same data set available as `Orthodont` from the `nlme` package.

```{r str_data}
str(pituitary)
```


```{r plot_data, fig.width=6, fig.height=6}
xyplot(distance ~ age | individual, pituitary, type = c("g", "p", "l"),
       groups = sex, cex = 1.2, pch = 21,
       fill = c("#EFC00070", "#0073C270"), col = c("#EFC000", "#0073C2"),
       scales = list(x = list(at = c(8, 10, 12, 14))),
       xlab = "Age (years)", ylab = "Distance (mm)",
       main = "Pituitary-Pterygomaxillary Fissure Distance",
       key = list(space = "bottom", columns = 2,
                  text = list(c("Female", "Male")),
                  lines = list(col = c("#EFC000", "#0073C2")),
                  points = list(pch = 21, fill = c("#EFC00070", "#0073C270"),
                                col = c("#EFC000", "#0073C2"))))
```

#### Wide format

`opa` requires data in wide format, with one row per individual and one column per measurement time or condition.

```{r reshape_data}
dat_wide <- reshape(data = pituitary,
                    direction = "wide",
                    idvar = c("individual", "sex"),
                    timevar = "age",
                    v.names = "distance")
```

```{r}
head(dat_wide)
```

### Specifying a hypothesis

For this analysis we will assume a hypothesis of monotonic increase in distance with increasing age. This monotonic increase hypothesis can be encoded using the `hypothesis()` function.

```{r define_h1}
h1 <- hypothesis(c(1, 2, 3, 4), type = "pairwise")
```

Printing the hypothesis object shows that it consists of sub-hypotheses about six ordinal relations. In this case it is hypothesized that each of the six ordinal relations are increases, coded as `1`.

```{r print_h1}
print(h1)
```

The hypothesis can also be visualized with the `plot()` function. Note that the y-axis is unitless; the vertical distance between points has no meaning. The information represented by the y-axis is relative: bigger, smaller or equal.

```{r plot_h1, fig.width=4, fig.height=4}
plot(h1)
```

### Fitting an ordinal pattern analysis model

How well a hypothesis accounts for observed data can be quantified at the individual and group levels using the `opa()` function. The first required argument to `opa()` is a dataframe consisting of only the response variable columns. The second required argument is the hypothesis.

```{r fit_model1}
m1 <- opa(dat_wide[,3:6], h1)
```

The results can be summarized using the `summary()` function.

```{r summary_model1}
summary(m1)
```

The individual-level results can also be visualized using `plot()`.

```{r plot_model1, fig.width=7, fig.height=5}
plot(m1)
```

It is also possible to determine how well the hypothesis accounts for the data at the group level for each of the sub-hypotheses using the `compare_conditions()` function.

```{r compare_conds_model1}
compare_conditions(m1)
```

These results indicate that the hypothesis accounts least well for the relationship between the first two measurement times. 

### Comparing hypotheses

The output of `compare_conditions()` indicates that it may be informative to consider a second, non-monotonic hypothesis:

```{r define_h2}
h2 <- hypothesis(c(2, 1, 3, 4))
```

`h2` contains the sub-hypothesis that the second measurement, at age 10, may be shorter than the first measurement taken at age 8.

```{r plot_h2, fig.width=4, fig.height=4}
plot(h2)
```

The hypothesized decrease between measurements at ages 8 and 10 is encoded as `-1`.

```{r print_h2}
print(h2)
```

To assess the adequacy of `h2` we fit a second `opa()` model, this time passing `h2` as the second argument.

```{r fit_model2}
m2 <- opa(dat_wide[,3:6], h2)
```

The `compare_hypotheses()` function can then be used to compare how well two hypothese account for the observed data.

```{r compare_hypotheses}
(comp_h1_h2 <- compare_hypotheses(m1, m2))
```

These results indicate that the monotonic increase hypothesis encoded by `h1` better acounts for the data than `h2`. However, the difference between these hypotheses is not large. The computed chance value indicates that a difference at least as large could be produced by chance, through random permutations of the data, about one quarter of the time. The calculation of the group-level chance-value can be visualized by plotting the object returned by `compare_hypotheses()`.

```{r}
plot(comp_h1_h2)
```

### Comparing groups

So far the analyses have not considered any possible differences between males and females in terms of how well the hypotheses account for the data. It is possible that a given hypothesis account better for males than for females, or vice-versa. A model which accounts for groups within the data can be fitted by passing a factor vector or dataframe column to the `group` argument.

```{r fit_model3}
m3 <- opa(dat_wide[,3:6], h1, group = dat_wide$sex)
```

The `plot()` function can be used to get a first sense of how PCCs and chance-values differ between groups.

```{r plot_model3}
plot(m3)
```

There is not a clearly visible pattern that appears to distinguish males from females. The `compare_groups()` function may be used to more precisely quantify the difference in hypothesis performance between the groups within the data.

```{r compare_groups}
(comp_m_f <- compare_groups(m3, "M", "F"))
```

In this case, `compare_groups()` indicates that the difference in how well the hypothesis accounts for males and females growth is very small. The chance value shows that a difference at least as great could be produced by chance around 80% of the time. As with the hypothesis comparison, the computation of the chance value for the group comparison can be visualized by plotting the object returned by `compare_groups()`.

```{r}
plot(comp_m_f)
```

### Difference thresholds

Each of the above models has treated any numerical difference in the distance data as a true difference. However, we may wish to treat values that differ by some small amount as equivalent. In this way we may define a threshold of practical or clinical significance. For example, we may decide to consider only differences of at least 1 mm. This can be achieved by passing a threshold value of `1` using the `diff_threshold` argument. 

```{r fit_model4}
m4 <- opa(dat_wide[,3:6], h1, group = dat_wide$sex, diff_threshold = 1)
```

Setting the difference threshold to 1 mm results in a greater difference in hypothesis performance between males and females.

```{r diff_threshold_results}
group_results(m4)
```

However, `compare_groups()` reveals that even this larger difference could occur quite frequently by chance.

```{r compare_groups_with_diff_threshold}
compare_groups(m4, "M", "F")
```

## References

Grice, J. W., Craig, D. P. A., & Abramson, C. I. (2015). A Simple and Transparent Alternative to Repeated Measures ANOVA. SAGE Open, 5(3), 215824401560419. https://doi.org/10.1177/2158244015604192

Parsons, D. (1975). _The directory of tunes and musical themes_. S. Brown.

Potthoff, R. F., & Roy, S. N. (1964). A Generalized Multivariate Analysis of Variance Model Useful Especially for Growth Curve Problems. Biometrika, 51(3/4), 313–326. https://doi.org/10.2307/2334137

Thorngate, W. (1987). Ordinal Pattern Analysis: A Method for Assessing Theory-Data Fit. Advances in Psychology, 40, 345–364. https://doi.org/10.1016/S0166-4115(08)60083-7