Tuesday, January 18, 2011

Perform a repeated measures ANOVA with proc mixed

SAS proc mixed is a very powerful procedure for a wide variety of statistical analyses, including repeated measures analysis of variance. We first perform a repeated measures ANOVA the standard way with proc glm, and then show how to perform the same analysis with proc mixed. The example comes from Design and Analysis by G. Keppel, pages 414-416. It contains eight subjects (sub), one between-subjects IV with two levels (group), and one within-subjects IV with four levels (stored in the variables dv1-dv4). These data are read into a temporary SAS data set called wide below.
DATA wide;

  INPUT sub group dv1 dv2 dv3 dv4;

CARDS;

1 1  3  4  7  3
2 1  6  8 12  9
3 1  7 13 11 11
4 1  0  3  6  6
5 2  5  6 11  7
6 2 10 12 18 15
7 2 10 15 15 14
8 2  5  7 11  9
;
RUN;
 
PROC PRINT DATA=wide ;
RUN; 
OBS    SUB    GROUP    DV1    DV2    DV3    DV4
 1      1       1        3      4      7      3
 2      2       1        6      8     12      9
 3      3       1        7     13     11     11
 4      4       1        0      3      6      6
 5      5       2        5      6     11      7
 6      6       2       10     12     18     15
 7      7       2       10     15     15     14
 8      8       2        5      7     11      9
We start by showing how to perform a standard 2 by 4 (between / within) ANOVA using proc glm.
 PROC GLM DATA=wide;
  CLASS group;
  MODEL dv1-dv4 = group / NOUNI ;
  REPEATED trial 4;
RUN; 
The results of this analysis are shown below.
General Linear Models Procedure
Class Level Information

Class    Levels    Values

GROUP         2    1 2

Number of observations in data set = 8

General Linear Models Procedure
Repeated Measures Analysis of Variance
Repeated Measures Level Information

Dependent Variable        DV1      DV2      DV3      DV4

    Level of TRIAL          1        2        3        4

Manova Test Criteria and Exact F Statistics for
the Hypothesis of no TRIAL Effect
H = Type III SS&CP Matrix for TRIAL   E = Error SS&CP Matrix

S=1    M=0.5    N=1

Statistic                     Value          F      Num DF    Den DF  Pr > F
Wilks' Lambda              0.00829577   159.3911         3         4  0.0001
Pillai's Trace             0.99170423   159.3911         3         4  0.0001
Hotelling-Lawley Trace   119.54335260   159.3911         3         4  0.0001
Roy's Greatest Root      119.54335260   159.3911         3         4  0.0001

Manova Test Criteria and Exact F Statistics for
the Hypothesis of no TRIAL*GROUP Effect
H = Type III SS&CP Matrix for TRIAL*GROUP   E = Error SS&CP Matrix

S=1    M=0.5    N=1

Statistic                     Value          F      Num DF    Den DF  Pr > F

Wilks' Lambda              0.60915493     0.8555         3         4  0.5324
Pillai's Trace             0.39084507     0.8555         3         4  0.5324
Hotelling-Lawley Trace     0.64161850     0.8555         3         4  0.5324
Roy's Greatest Root        0.64161850     0.8555         3         4  0.5324
General Linear Models Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects

Source                  DF      Type III SS     Mean Square   F Value     Pr > F
GROUP                    1     116.28125000    116.28125000      2.51     0.1645
Error                    6     278.43750000     46.40625000

General Linear Models Procedure
Repeated Measures Analysis of Variance
Univariate Tests of Hypotheses for Within Subject Effects

Source: TRIAL
                                                                   Adj  Pr > F
     DF       Type III SS       Mean Square   F Value   Pr > F    G - G    H - F
      3      129.59375000       43.19791667     22.34   0.0001   0.0001   0.0001

Source: TRIAL*GROUP
                                                                   Adj  Pr > F
     DF       Type III SS       Mean Square   F Value   Pr > F    G - G    H - F
      3        3.34375000        1.11458333      0.58   0.6380   0.5693   0.6380

Source: Error(TRIAL)
     DF       Type III SS       Mean Square
     18       34.81250000        1.93402778

Greenhouse-Geisser Epsilon = 0.6337
       Huynh-Feldt Epsilon = 1.0742
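As a quick check on the tables above, each F value is simply the effect mean square divided by the matching error mean square. The arithmetic below is a worked sketch using only numbers printed in the output; the subscripted symbol names are our own shorthand.

```latex
\begin{align*}
F_{\text{GROUP}} &= \frac{116.28125000}{46.40625000} \approx 2.51
  && (1,\ 6\ \text{df}) \\
F_{\text{TRIAL}} &= \frac{43.19791667}{1.93402778} \approx 22.34
  && (3,\ 18\ \text{df}) \\
F_{\text{TRIAL}\times\text{GROUP}} &= \frac{1.11458333}{1.93402778} \approx 0.58
  && (3,\ 18\ \text{df})
\end{align*}
```

Note also that the Huynh-Feldt epsilon exceeds 1, and the H - F adjusted p-values equal the unadjusted ones, which is consistent with the epsilon being capped at 1 for the adjustment.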
Now we will illustrate how you can perform this same analysis in proc mixed. First, we need to reshape the data into the form proc mixed expects. proc glm expects the data in wide format, where each observation corresponds to a subject. By contrast, proc mixed expects the data in long format, where each observation corresponds to a single trial. In this case, that means four observations per subject, each holding the measurement from one of the four trials. Below we show how you can reshape the data for analysis in proc mixed.
 DATA long ;
  SET Wide;
  dv = dv1; trial = 1; OUTPUT;
  dv = dv2; trial = 2; OUTPUT;
  dv = dv3; trial = 3; OUTPUT;
  dv = dv4; trial = 4; OUTPUT;
  DROP dv1 - dv4 ;
RUN;
 
PROC PRINT DATA=long ;
RUN; 
You can compare the proc print for wide with the proc print for long to verify that the data were properly reshaped.
OBS    SUB    GROUP    DV    TRIAL

  1     1       1       3      1
  2     1       1       4      2
  3     1       1       7      3
  4     1       1       3      4
  5     2       1       6      1
  6     2       1       8      2
  7     2       1      12      3
  8     2       1       9      4
  9     3       1       7      1
 10     3       1      13      2
 11     3       1      11      3
 12     3       1      11      4
 13     4       1       0      1
 14     4       1       3      2
 15     4       1       6      3
 16     4       1       6      4
 17     5       2       5      1
 18     5       2       6      2
 19     5       2      11      3
 20     5       2       7      4
 21     6       2      10      1
 22     6       2      12      2
 23     6       2      18      3
 24     6       2      15      4
 25     7       2      10      1
 26     7       2      15      2
 27     7       2      15      3
 28     7       2      14      4
 29     8       2       5      1
 30     8       2       7      2
 31     8       2      11      3
 32     8       2       9      4
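With more than a handful of repeated measures, writing one OUTPUT line per variable gets tedious. The following is a minimal sketch of the same reshape using an array and a DO loop, followed by a PROC COMPARE check against long. The data set name long2 and the array name dvs are illustrative and not part of the original example.

```sas
/* Alternative wide-to-long reshape; long2 and dvs are illustrative names */
DATA long2;
  SET wide;
  ARRAY dvs{4} dv1-dv4;      /* the four repeated measures      */
  DO trial = 1 TO 4;
    dv = dvs{trial};         /* measurement for this trial      */
    OUTPUT;                  /* one observation per trial       */
  END;
  DROP dv1-dv4;
RUN;

/* Compare values by variable name; differences would be reported */
PROC COMPARE BASE=long COMPARE=long2;
RUN;
```

The variables in long2 are stored in a different order than in long, but PROC COMPARE matches variables by name, so only differences in the values should matter here.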
Now that the data are in the proper shape, we can analyze them with proc mixed.
The class and model statements are used much the same as with proc glm. However, the repeated statement is different. It still identifies the within-subjects (repeated) variable, but note that trial now appears on the class statement, unlike in proc glm. This is because the data are in long format, so there really is a separate variable indicating the trial.
We also use the repeated statement to say which variable identifies the subjects (via subject=sub) and to specify the covariance structure among the repeated measures. In this case we choose compound symmetry via type=cs, which matches the assumption behind the univariate within-subjects tests in proc glm. Unlike proc glm, proc mixed offers a wide variety of covariance structures, so you can choose one that matches your data (see the proc mixed manual for more information on this).
 PROC MIXED DATA=long;
  CLASS sub group trial;
  MODEL dv = group trial group*trial;
  REPEATED trial / SUBJECT=sub TYPE=CS;
run; 
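As you see below, the F values and p-values in the Tests of Fixed Effects table match the between-subjects test and the unadjusted univariate within-subjects tests produced by proc glm. Note that proc mixed does not produce Sums of Squares or Mean Squares. This is because proc mixed uses likelihood-based estimation (restricted maximum likelihood, REML, by default, as the iteration history shows) instead of a sums of squares style of computation.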
                              The MIXED Procedure

                            Class Level Information

                       Class     Levels  Values

                       SUB            8  1 2 3 4 5 6 7 8
                       GROUP          2  1 2
                       TRIAL          4  1 2 3 4

                       REML Estimation Iteration History

               Iteration  Evaluations     Objective     Criterion
                       0            1   96.74510121
                       1            1   69.98784546    0.00000000

Convergence criteria met.

                     Covariance Parameter Estimates (REML)

                     Cov Parm   Subject      Estimate
                     CS         SUB       11.11805556
                     Residual              1.93402778

                        Model Fitting Information for DV

                    Description                        Value
                    Observations                     32.0000
                    Res Log Likelihood              -57.0484
                    Akaike's Information Criterion  -59.0484
                    Schwarz's Bayesian Criterion    -60.2265
                    -2 Res Log Likelihood           114.0969
                    Null Model LRT Chi-Square        26.7573
                    Null Model LRT DF                 1.0000
                    Null Model LRT P-Value            0.0000

                            Tests of Fixed Effects

                  Source        NDF   DDF  Type III F  Pr > F
                  GROUP           1     6        2.51  0.1645
                  TRIAL           3    18       22.34  0.0001
                  GROUP*TRIAL     3    18        0.58  0.6380
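One way to see why the two procedures agree is to recover the proc mixed covariance parameter estimates from the proc glm mean squares. The sketch below assumes balanced data, where REML estimates coincide with the usual ANOVA estimates as long as those are not negative; the symbols are our own shorthand, and 4 is the number of trials per subject.

```latex
\begin{align*}
\hat{\sigma}^2_{\text{Residual}} &= MS_{\text{Error(TRIAL)}} = 1.93402778 \\
\hat{\sigma}^2_{\text{CS}} &= \frac{MS_{\text{Error}} - MS_{\text{Error(TRIAL)}}}{4}
  = \frac{46.40625000 - 1.93402778}{4}
  = \frac{44.47222222}{4}
  = 11.11805556
\end{align*}
```

Both values match the Covariance Parameter Estimates table in the proc mixed output.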
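Proc mixed is more flexible than proc glm: it works directly with long-format data and lets you choose among many covariance structures for the repeated measures. The price of that flexibility is that it is more complex to use.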
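As an illustration of that flexibility, here is a hedged sketch of the same model with a first-order autoregressive structure instead of compound symmetry. TYPE=AR(1) is chosen here purely as an example and is not a recommendation for these data; with a different covariance structure the tests of the within-subjects effects will generally no longer match the proc glm results.

```sas
/* Same fixed effects as before, but with an AR(1) covariance     */
/* structure among the trials (an illustrative choice only)       */
PROC MIXED DATA=long;
  CLASS sub group trial;
  MODEL dv = group trial group*trial;
  REPEATED trial / SUBJECT=sub TYPE=AR(1);
RUN;
```

Comparing the Model Fitting Information tables across candidate structures is one common way to choose between them.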
