Within-Subjects Design

In a within-subjects design, subjects give responses across multiple conditions or across time. In other words, measures are repeated across levels of some condition or across time points. For example, subjects can report how happy they feel while viewing a sequence of positive pictures and another sequence of negative pictures. In this case, we’d observe each subject’s happiness in both the positive and negative conditions. As another example, we could measure subjects’ job satisfaction every month for 3 months. Here we’d observe job satisfaction for each subject across time. In both cases, a subject’s scores across conditions or time points very likely correlate with each other – they are dependent.

What are contrasts?

Broadly, contrasts test focused research hypotheses. A contrast comprises a set of weights or numeric values that represent some comparison. For example, when comparing two experimental group means (e.g., control vs. treatment), you can apply weights to each group mean and then sum them up. This is the same thing as subtracting one group’s mean from the other.

Here’s a quick demonstration:

# group means
control <- 5
treatment <- 3

# apply contrast weights and sum up the results
sum(c(control, treatment) * c(1, -1))
## [1] 2

Model and Conceptual Assumptions for Linear Regression

  • Correct functional form. Your model variables share linear relationships.
  • No omitted influences. This one is hard: Your model accounts for all relevant influences on the variables included. All models are wrong, but how wrong is yours?
  • Accurate measurement. Your measurements are valid and reliable. Note that unreliable measures can’t be valid, and reliable measures don’t necessarily measure just one construct or even your construct.
  • Well-behaved residuals. Residuals (i.e., prediction errors) aren’t correlated with predictor variables or with each other, and residuals have constant variance across values of your predictor variables (see the quick diagnostic sketch below).
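
For that last point, a quick informal check is to eyeball base R’s residual diagnostic plots for a fitted model. Here’s a minimal sketch using a built-in dataset (mtcars) rather than anything from this post; swap in your own lm() fit.

# minimal sketch: fit any linear model, then look at its residual diagnostics
fit <- lm(mpg ~ wt + hp, data = mtcars)

# residuals vs. fitted values: curvature suggests a wrong functional form,
# a funnel shape suggests non-constant variance
plot(fit, which = 1)

# scale-location plot: another view of whether residual spread is constant
plot(fit, which = 3)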

Model and Conceptual Assumptions for Repeated Measures ANOVA

  • All change score variances are equal. Similar to the homogeneous group variance assumption in between-subjects ANOVA designs, within-subjects designs require that all change score variances (e.g., subject changes between time points, within-subject differences between conditions) are equal. This means that if you compute the within-subject differences between all pairwise levels (e.g., time points, treatment levels), the variances of those pairwise differences must all be equal (a short sketch of this check follows below). For example, if you ask people how satisfied they are with their current job every month for 3 months, then the variance of month 2 - month 1 should equal the variance of month 3 - month 2 and the variance of month 3 - month 1. As you might be thinking, this assumption is very strict and probably not realistic in many cases.
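
Here’s a minimal sketch of that check with made-up job satisfaction data (the variable names and numbers are hypothetical, just to show the computation):

# made-up wide data: one row per subject, one column per month
set.seed(1)
satisfaction <- data.frame(month1 = rnorm(20, mean = 5),
                           month2 = rnorm(20, mean = 5),
                           month3 = rnorm(20, mean = 5))

# variances of the pairwise change scores; the assumption says these should be
# equal in the population
with(satisfaction, c(var(month2 - month1),
                     var(month3 - month2),
                     var(month3 - month1)))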

A different take on the homogeneous change score variance assumption

  • Sphericity and a special case, compound symmetry. Sphericity is the matrix algebra equivalent of the homogeneous change score variance assumption. Compound symmetry is a special case of sphericity. A variance-covariance matrix that satisfies compound symmetry has equal variances (the values in the diagonal) and equal covariances (the values above or below the diagonal).
    Short explanation: Sphericity = homogeneous change score variances; compound symmetry is a special case that guarantees sphericity (a one-line derivation follows below).
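
To see why compound symmetry is enough (a small aside in my own notation, not from the book): for any two time points j and k,

$$\mathrm{Var}(Y_j - Y_k) = \sigma_j^2 + \sigma_k^2 - 2\,\sigma_{jk}$$

So when all variances equal the same σ² and all covariances equal the same ρσ² (compound symmetry), every change score variance works out to 2σ²(1 − ρ) – the same value for every pair. Sphericity asks only for that final property, which is why compound symmetry is sufficient but not necessary.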

Libraries

library(tidyverse)
library(knitr)
library(AMCP)
library(MASS)
library(afex)

# MASS masks dplyr::select, so make sure select and recode point to dplyr
select <- dplyr::select
recode <- dplyr::recode

Load data

From Chapter 11, Table 5 in Maxwell, Delaney, & Kelley (2018)
From help("C11T5"): “The data show that 12 participants have been observed in each of 4 conditions. To make the example easier to discuss, let’s suppose that the 12 subjects are children who have been observed at 30, 36, 42, and 48 months of age. Essentially, for the present data set, 12 children were each observed four times over an 18 month period. The dependent variable is the age-normed general cognitive score on the McCarthy Scales of Children’s Abilities. Interest is to determine if the children were sampled from a population where growth in cognitive ability is more rapid or less rapid than average.”

data("C11T5")
C11T5$id <- factor(1:12)

# print entire dataset
C11T5 %>% kable()
Months30 Months36 Months42 Months48 id
108 96 110 122 1
103 117 127 133 2
96 107 106 107 3
84 85 92 99 4
118 125 125 116 5
110 107 96 91 6
129 128 123 128 7
90 84 101 113 8
84 104 100 88 9
96 100 103 105 10
105 114 105 112 11
113 117 132 130 12

Restructure data

lC11T5 <- C11T5 %>%
  gather(key = month, value = score, -id) %>% 
  mutate(monthnum = month %>% recode("Months30" = 0, "Months36" = 6, "Months42" = 12, "Months48" = 18),
         linear = monthnum %>% recode(`0` = -3, `6` = -1, `12` = 1, `18` = 3),
         quadratic = monthnum %>% recode(`0` = 1, `6` = -1, `12` = -1, `18` = 1),
         cubic = monthnum %>% recode(`0` = -1, `6` = 3, `12` = -3, `18` = 1),
         monthfac = month %>% factor(ordered = TRUE))
# print first and last 5 observations from dataset
head(lC11T5) %>% kable()
id month score monthnum linear quadratic cubic monthfac
1 Months30 108 0 -3 1 -1 Months30
2 Months30 103 0 -3 1 -1 Months30
3 Months30 96 0 -3 1 -1 Months30
4 Months30 84 0 -3 1 -1 Months30
5 Months30 118 0 -3 1 -1 Months30
6 Months30 110 0 -3 1 -1 Months30
tail(lC11T5) %>% kable()
id month score monthnum linear quadratic cubic monthfac
43 7 Months48 128 18 3 1 1 Months48
44 8 Months48 113 18 3 1 1 Months48
45 9 Months48 88 18 3 1 1 Months48
46 10 Months48 105 18 3 1 1 Months48
47 11 Months48 112 18 3 1 1 Months48
48 12 Months48 130 18 3 1 1 Months48

Visualize relationships

It’s always a good idea to look at your data. Check some assumptions.

Children’s scores + means and 95% confidence intervals at each time point

lC11T5 %>% 
  ggplot(mapping = aes(x = month, y = score)) +
  geom_point(position = position_jitter(0.1)) +
  stat_summary(geom = "point", fun.data = mean_cl_normal, color = "red", size = 2) +
  stat_summary(geom = "errorbar", fun.data = mean_cl_normal, color = "red", width = 0) +
  theme_bw()

QQ-plots

Do observations look normal at each time point?

lC11T5 %>% 
  ggplot(mapping = aes(sample = score)) +
  geom_qq() +
  facet_wrap(facets = ~ month, scales = "free") +
  theme_bw()

Variance-covariance matrix among 4 timepoints

Does this look like compound symmetry?

C11T5 %>% select(-id) %>% cov(use = "pairwise.complete.obs") %>% kable()
Months30 Months36 Months42 Months48
Months30 188.0000 154.36364 127.3636 121.18182
Months36 154.3636 200.54545 143.6364 97.45455
Months42 127.3636 143.63636 178.0000 168.09091
Months48 121.1818 97.45455 168.0909 218.00000
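
To make the question concrete, here’s a quick check that isn’t in the original output: compute the variance of every pairwise change score. In the population, the homogeneous change score variance assumption says these six values should be equal; big discrepancies in the sample hint at a violation.

# variances of all six pairwise change scores
C11T5 %>% 
  transmute(d36_30 = Months36 - Months30,
            d42_30 = Months42 - Months30,
            d48_30 = Months48 - Months30,
            d42_36 = Months42 - Months36,
            d48_36 = Months48 - Months36,
            d48_42 = Months48 - Months42) %>% 
  summarise_all(var) %>% 
  kable()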

Test polynomial contrasts with pooled error term

Linear regression method

Below I use ID as a blocking factor. This might seem nuts, but this approach “controls” for (i.e., partials out variance due to) subject ID. The polynomial contrast tests are equivalent to those you’d see from a special function or command for repeated measures ANOVA; I’ll demonstrate that later in this post.

lm01 <- lm(score ~ linear + quadratic + cubic + id, data = lC11T5)

ANOVA results

anova(lm01)
## Analysis of Variance Table
## 
## Response: score
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## linear     1    540  540.00  8.8833  0.005369 ** 
## quadratic  1     12   12.00  0.1974  0.659722    
## cubic      1      0    0.00  0.0000  1.000000    
## id        11   6624  602.18  9.9063 1.314e-07 ***
## Residuals 33   2006   60.79                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation

We found evidence that children’s age-normed general cognitive scores increased linearly over the course of 18 months, F(1, 33) = 8.88, p = .005. However, we found insufficient evidence for a quadratic trend, F(1, 33) = 0.20, p = .660, or a cubic trend, F(1, 33) = 0.00, p > .99.

Get results via algebraic formulas

Below, I test the same polynomial contrasts using algebraic formulas. There’s no need to follow every last piece of code or understand each formula; I’m presenting this just to be thorough. If you’re curious, these formulas are explained in Chapter 11 of Maxwell, Delaney, and Kelley (2018).
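
Before the code, here’s a compact statement of what it computes (my own notation, a sketch of the standard contrast formulas rather than a quote from the book). The contrast estimate combines the time point means with the weights, its sum of squares scales that estimate by the sample size and the weights, and the F-ratio compares it to the pooled residual mean square from the regression above:

$$\hat{\psi} = \sum_j c_j \bar{Y}_j, \qquad SS_{\psi} = \frac{n\,\hat{\psi}^2}{\sum_j c_j^2}, \qquad F = \frac{SS_{\psi}}{MS_{\mathrm{residual}}}, \quad df = 1,\ (a - 1)(n - 1)$$

where the c_j are the contrast weights, the Ȳ_j are the time point means, n = 12 subjects, and a = 4 time points (so the error df is 3 × 11 = 33, matching the ANOVA table above).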

# contrasts
linear <- c(-3, -1, 1, 3)
quadratic <- c(1, -1, -1, 1)
cubic <- c(-1, 3, -3, 1)

# estimate (mean difference)
linest <- mean(as.matrix(C11T5[, -5]) %*% linear)
quadest <- mean(as.matrix(C11T5[, -5]) %*% quadratic)
cubest <- mean(as.matrix(C11T5[, -5]) %*% cubic)
est <- c(linest, quadest, cubest)

# number of participants
n <- nrow(C11T5)

# sum of squares for the contrast
linssc <- n * linest^2 / sum(linear^2)
quadssc <- n * quadest^2 / sum(quadratic^2)
cubssc <- n * cubest^2 / sum(cubic^2)
ssc <- c(linssc, quadssc, cubssc)

# mean squared error
mse <- anova(lm(score ~ linear + quadratic + cubic + id, data = lC11T5))$`Mean Sq`[5]

# F-ratio
fratio <- ssc / mse

# number of levels (i.e., timepoints)
nlvls <- nlevels(lC11T5$monthfac)

# degrees of freedom
df1 <- 1
df2 <- (nlvls - 1) * (n - 1)

# p-value
p <- 1 - pf(q = fratio, df1 = df1, df2 = df2)

# alpha
alpha <- 0.05

# F critical
fcrit <- qf(p = 1 - alpha, df1 = df1, df2 = df2)

# standard error
linse <- sqrt(mse * sum(linear^2 / n))
quadse <- sqrt(mse * sum(quadratic^2 / n))
cubse <- sqrt(mse * sum(cubic^2 / n))
se <- c(linse, quadse, cubse)

# lower and upper bound
lwr <- c(linest, quadest, cubest) - sqrt(fcrit) * se
upr <- c(linest, quadest, cubest) + sqrt(fcrit) * se

# table of results
tibble(est, se, fratio, p, lwr, upr)
## # A tibble: 3 x 6
##     est    se fratio       p    lwr   upr
##   <dbl> <dbl>  <dbl>   <dbl>  <dbl> <dbl>
## 1   30. 10.1   8.88  0.00537   9.52 50.5 
## 2   -2.  4.50  0.197 0.660   -11.2   7.16
## 3    0. 10.1   0.    1.00    -20.5  20.5

Get the same results using afex::aov_car

Fit model

aovcar1 <- aov_car(score ~ monthfac + Error(id / monthfac), data = lC11T5)

# equivalent
# aovez1 <- aov_ez(id = "id", dv = "score", data = lC11T5, within = "monthfac")
# aovlmer1 <- aov_4(score ~ monthfac + (1 + monthfac | id), data = lC11T5)
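
The afex object keeps both the traditional aov fit and the multivariate fit. If you have the emmeans package installed, one way to pull out the same polynomial contrasts is sketched below; model = "univariate" is meant to request the pooled, aov-based error term. This is an assumption about the current afex + emmeans interface rather than something from the original analysis, so double-check the output against the ANOVA table above.

# a sketch, assuming emmeans is installed and afex's emmeans support;
# model = "univariate" requests the pooled (aov) error term
library(emmeans)
aovcar1 %>% 
  emmeans(specs = ~ monthfac, model = "univariate") %>% 
  contrast(method = "poly")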

Test polynomial contrasts with separate error term

Long way

  1. Subset each column of data that represents a given timepoint
  2. Multiply that entire column of data by the contrast weight
  3. Add up the columns (makes one column of data)
  4. Test whether the mean of those values = 0 (i.e., one-sample t-test)
t.test(C11T5[, "Months30"] * -3 + C11T5[, "Months36"] * -1 + C11T5[, "Months42"] * 1 + C11T5[, "Months48"] * 3)
## 
##  One Sample t-test
## 
## data:  C11T5[, "Months30"] * -3 + C11T5[, "Months36"] * -1 + C11T5[,     "Months42"] * 1 + C11T5[, "Months48"] * 3
## t = 2.2414, df = 11, p-value = 0.04659
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   0.5403653 59.4596347
## sample estimates:
## mean of x 
##        30
t.test(C11T5[, "Months30"] * 1 + C11T5[, "Months36"] * -1 + C11T5[, "Months42"] * -1 + C11T5[, "Months48"] * 1)
## 
##  One Sample t-test
## 
## data:  C11T5[, "Months30"] * 1 + C11T5[, "Months36"] * -1 + C11T5[,     "Months42"] * -1 + C11T5[, "Months48"] * 1
## t = -0.46749, df = 11, p-value = 0.6493
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -11.416264   7.416264
## sample estimates:
## mean of x 
##        -2
t.test(C11T5[, "Months30"] * -1 + C11T5[, "Months36"] * 3 + C11T5[, "Months42"] * -3 + C11T5[, "Months48"] * 1)
## 
##  One Sample t-test
## 
## data:  C11T5[, "Months30"] * -1 + C11T5[, "Months36"] * 3 + C11T5[,     "Months42"] * -3 + C11T5[, "Months48"] * 1
## t = 0, df = 11, p-value = 1
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -12.69584  12.69584
## sample estimates:
## mean of x 
##         0

Short way

%*% performs matrix multiplication. Here it multiplies the 12 x 4 matrix of scores by the 4 x 1 vector of contrast weights, which, for each subject, sums up the products of the time point scores and their corresponding weights. This is equivalent to the method above: apply contrast weights to each column and add up the results.

t.test(as.matrix(C11T5[, -5]) %*% linear)
## 
##  One Sample t-test
## 
## data:  as.matrix(C11T5[, -5]) %*% linear
## t = 2.2414, df = 11, p-value = 0.04659
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   0.5403653 59.4596347
## sample estimates:
## mean of x 
##        30
t.test(as.matrix(C11T5[, -5]) %*% quadratic)
## 
##  One Sample t-test
## 
## data:  as.matrix(C11T5[, -5]) %*% quadratic
## t = -0.46749, df = 11, p-value = 0.6493
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -11.416264   7.416264
## sample estimates:
## mean of x 
##        -2
t.test(as.matrix(C11T5[, -5]) %*% cubic)
## 
##  One Sample t-test
## 
## data:  as.matrix(C11T5[, -5]) %*% cubic
## t = 0, df = 11, p-value = 1
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -12.69584  12.69584
## sample estimates:
## mean of x 
##         0

Custom function for testing via both methods

This function takes “wide-form” data and a contrast as input, and it outputs a matrix with two columns: the p-values and a flag for which testing approach each came from, pooled (sepvar = 0) or separate (sepvar = 1) error term.

wscont <- function(wdata, cont) {
  # long data
  ldata <- stack(wdata)
  colnames(ldata) <- c("y", "t")
  
  # add id factor; make time a factor
  ldata$id <- factor(rep(1:nrow(wdata), times = ncol(wdata)))
  ldata$t <- factor(ldata$t, ordered = TRUE)
  
  # separate variance contrast p-value
  svp <- t.test(as.matrix(wdata) %*% cont)$p.value
  
  # fit model
  fit <- lm(y ~ t + id, data = ldata)
  
  # extract p-value from linear contrast
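  # (summary.aov's split argument partitions the ordered factor's degrees of
  #  freedom into polynomial trend components; list(t = list(lin = 1)) labels
  #  the first component "lin", i.e., the linear trend tested against the
  #  pooled residual error term)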
  pvp <- summary.aov(fit, split = list(t = list(lin = 1)))[[1]][, "Pr(>F)"]["lin"]
  
  # output
  cbind(sepvar = c(0, 1), p = c(pvp, svp))
}

Simulate false positive error rates that result from both approaches

Maxwell, Delaney, and Kelley (2018) refer to these approaches as the univariate approach and the multivariate approach. To make the labels more descriptive, I refer to each approach by the error term it uses: pooled vs. separate (MDK2018 do too).

True Covariance Matrix

Here’s an example of a covariance matrix that satisfies the homogeneous change score variance assumption. All time point variances are equal (to 1), and all covariances between time points are equal (to 0). This particular matrix is also an identity matrix.

(truecov <- diag(1, 4, 4))
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    0    1    0    0
## [3,]    0    0    1    0
## [4,]    0    0    0    1

Sample data

Below I generate data that represent 10 subjects’ scores across 4 time points. The mean score at each time point is 0, and the covariance structure among time points comes from the true matrix I saved above (here, uncorrelated time points with unit variances).

# set random seed so results can be reproduced
set.seed(8888)

dat <- data.frame(mvrnorm(n = 10, mu = c(0, 0, 0, 0), Sigma = truecov))

Sample test

Here I test out the function I made above.

wscont(wdata = dat, cont = linear)
##     sepvar          p
## lin      0 0.12806078
##          1 0.04189252

Generate data that satisfy compound symmetry

Compound Symmetric Covariance Matrix

(cscovmat <- diag(0.5, nrow = 4, ncol = 4) + 0.5)
##      [,1] [,2] [,3] [,4]
## [1,]  1.0  0.5  0.5  0.5
## [2,]  0.5  1.0  0.5  0.5
## [3,]  0.5  0.5  1.0  0.5
## [4,]  0.5  0.5  0.5  1.0

Save true means over time

truemeans <- c(100, 100, 100, 100)

Data

  1. Create a function that returns a data.frame with the requested means and variance-covariance structure (compound symmetry in this case)
  2. Save a data.frame with 10,000 replications of n = 50

# function
gendatafun <- function(truecov, ms, n) {
  data.frame(mvrnorm(n = n, mu = ms, Sigma = truecov))
}

# data generation
simdata1 <- do.call(rbind, replicate(n = 10000, gendatafun(truecov = cscovmat, ms = truemeans, n = 50), simplify = FALSE))

Generate data with the same variance-covariance structure as example data

Save covariance matrix from data used in Chapter 11, Table 5 of Maxwell, Delaney, and Kelley (2018)

(c11t5covmat <- cov(C11T5[, -5]))
##          Months30  Months36 Months42  Months48
## Months30 188.0000 154.36364 127.3636 121.18182
## Months36 154.3636 200.54545 143.6364  97.45455
## Months42 127.3636 143.63636 178.0000 168.09091
## Months48 121.1818  97.45455 168.0909 218.00000

Data

These data have a variance-covariance structure like the Chapter 11, Table 5 data.

# data generation
simdata2 <- do.call(rbind, replicate(n = 10000, gendatafun(truecov = c11t5covmat, ms = truemeans, n = 50), simplify = FALSE))

Add replicate IDs

I’ll use this column later to subset each replication.

simdata1$i <- rep(1:10000, each = 50)
simdata2$i <- rep(1:10000, each = 50)

Get p-values via custom wscont() function

In the last step of each pipeline below, I add a column, compsymm, that flags which population variance-covariance matrix the data came from (0 = C11T5 matrix, 1 = compound symmetry).

simp1 <- 1:10000 %>% map_df(function(rep) {
  simdata1 %>% 
    filter(i == rep) %>% 
    select(-5) %>% 
    wscont(cont = linear) %>% 
    as_tibble() %>% 
    mutate(compsymm = 1)
})

simp2 <- 1:10000 %>% map_df(function(rep) {
  simdata2 %>% 
    filter(i == rep) %>% 
    select(-5) %>% 
    wscont(cont = linear) %>% 
    as_tibble() %>% 
    mutate(compsymm = 0)
})

Plot false positive errors

Because the true means are equal across time points (i.e., the null hypothesis is true), p < .05 should occur only 5% of the time

bind_rows(simp1, simp2) %>% 
  mutate(sig = ifelse(p < .05, 1, 0)) %>% 
  group_by(compsymm, sepvar) %>% 
  summarise(falsepos = mean(sig)) %>% 
  ungroup() %>% 
  mutate(sepvar = sepvar %>% recode(`0` = "Pooled Error Term", `1` = "Separate Error Term"),
         compsymm = compsymm %>% recode(`0` = "Unequal Variances & Covariances", `1` = "Compound Symmetry")) %>% 
  ggplot(mapping = aes(x = compsymm, y = falsepos, fill = sepvar)) +
  geom_bar(stat = "identity", position = position_dodge(0.9)) +
  geom_hline(yintercept = 0.05, color = "red") +
  scale_y_continuous(breaks = seq(0, 0.5, 0.01)) +
  theme_bw() +
  theme(legend.position = "top")

Interpretation

Above, I generated 20,000 datasets with known population variance-covariance matrices and column means. For each dataset, I tested the linear contrast across time using both error term approaches and saved the p-values. When the true population variance-covariance matrix satisfied compound symmetry, both approaches’ contrast tests gave practically the same results, on average. But when the true population variance-covariance matrix violated compound symmetry – a very common scenario in real datasets – the pooled error term tests produced false positives over 14% of the time, whereas the separate error term tests produced false positives only about 5% of the time.

Resources

  • Boik, R. J. (1981). A priori tests in repeated measures designs: Effects of nonsphericity. Psychometrika, 46(3), 241-255.
  • Maxwell, S. E., Delaney, H. D., & Kelley, K. (2018). Designing experiments and analyzing data: A model comparison perspective (3rd ed.). Routledge.
  • The companion website for Designing Experiments and Analyzing Data includes computer code for analyses from the book, Shiny apps, and more.

General word of caution

Above, I listed resources prepared by experts on these and related topics. Although I generally do my best to write accurate posts, don’t assume my posts are 100% accurate or that they apply to your data or research questions. Trust statistics and methodology experts, not blog posts.