## Multivariate Repeated Measures

### Multivariate Repeated Measures

Does SAS have a procedure that allows one to analyze the following?

I have a nested-factorial design (Ponds within nested within herbivory,
ponds crossed with plant type.  Each pondXplant has 3 replicate plots.  Two
species of each plant are planted within each plot).  Within each plot, 3
measures will be taken, these measures are of an ordinal scale (0 to 7,
roughly corresponding to the log-odds).  Each measure will describe the
proportion of a specific plant (or open water) within each plot.  For
example, within each plot there are 2 species of plant and open water (3
categories), if each occupied 33% of the plot, we would record 3, 3, 3 for
each category within that plot.  Each plot will be measured each quarter for
2 years, suggesting a repeated measures component.  The questions of
interest relate to species composition within plots through time based on
competition, and treatment differences associated with herbivory.

If I had a single measure for each plot, I would think in terms of Proc
MIXED, but I have 3 measures per plot per pond per time.  The relatedness of
the measures within the plot would seem to negate MIXED, but may not if
someone could advise me.  Because I have mutually exclusive and exhaustive
categories within each plot, I thought maybe CATMOD would be the procedure
to use, but my measures are not multinomial, they are ordinal.  I could use
GLM, but am concerned because it appears I have less ability to alter the
covariance matrix.

Any suggested directions would be appreciated.

Warren Schlechte
Heart of the Hills Research Station
HC7 Box 62
Ingram, Texas 78025
Phone:   830.866.3356
Fax:       830.866.3549

### Multivariate Repeated Measures

Warren,

The MIXED procedure can handle multivariate repeated measures.  I am
not totally clear on the design of the experiment (are there only
two plants species for the entire experiment, or does each plot have
two plant types, with different plots having different combinations
of plant species?).  The experiment would be most easily analyzed
if there were only teo species in the entire experiment, but some
variant of the analysis presented below would extend to experiments
with more than one species.

As I understand the design, you have random herbivories.  Nested
within herbivories, you have several ponds (the number of ponds may
vary across herbivories).  Nested within ponds within herbivory are
plots.  Within each plot you have measures of proportion of plot
coverage for each of the two species as well as proportion of open
water.  One question here: will these proportions sum to 1.0?  If
so, then you can only employ two of these three variables, the third
being a linear combination of the other two.  If other plant species
may find their way into a plot, so that the sum of the three
proportions is not always 1.0, then you may use all three variables
as responses.

Now, let me assume that the sum of the three proportions would not
total 1.0.  Let me further assume that there are only two plant
species with which the experiment is concerned so that every plot
is identical in experimental design.  Now, construct a vector
response and indicator variable as

Response        Indicator  Observation
prop(species 1)     1      (herbivory 1, pond 1, plot 1, time=1)
prop(species 2)     2      (herbivory 1, pond 1, plot 1, time=1)
prop(open H2O)      3      (herbivory 1, pond 1, plot 1, time=1)
prop(species 1)     1      (herbivory 1, pond 1, plot 2, time=1)
prop(species 2)     2      (herbivory 1, pond 1, plot 2, time=1)
prop(open H2O)      3      (herbivory 1, pond 1, plot 2, time=1)
prop(species 1)     1      (herbivory 1, pond 1, plot 3, time=1)
prop(species 2)     2      (herbivory 1, pond 1, plot 3, time=1)
prop(open H2O)      3      (herbivory 1, pond 1, plot 3, time=1)
prop(species 1)     1      (herbivory 1, pond 2, plot 1, time=1)
prop(species 2)     2      (herbivory 1, pond 2, plot 1, time=1)
prop(open H2O)      3      (herbivory 1, pond 2, plot 1, time=1)
prop(species 1)     1      (herbivory 1, pond 2, plot 2, time=1)
prop(species 2)     2      (herbivory 1, pond 2, plot 2, time=1)
prop(open H2O)      3      (herbivory 1, pond 2, plot 2, time=1)
prop(species 1)     1      (herbivory 1, pond 2, plot 3, time=1)
prop(species 2)     2      (herbivory 1, pond 2, plot 3, time=1)
prop(open H2O)      3      (herbivory 1, pond 2, plot 3, time=1)
...

I don't know what covariates you might have for each observation nor
whether you have treatments which you apply to the different plots,
so the code below ignores those effects.  We have just a multivariate
repeated measures intercept only model for each of the three responses.
If you have treatments or covariates which could affect the proportion
of each of the three responses over time, then those variables should
also be included not just as main effects, but as interactions with
species, time, and species by time interaction.  Herewith, the code
which would account for the different correlation structures in your
data.

proc format;           /* Just setting up a format which will aid */
value indic          /* model interpretation.                   */
1 = 'Species 1'
2 = 'Species 2'
3 = 'Open H2O';
run;

proc mixed data=...;
class herbivory pond plot indicator time;

model response = indicator|time;
/* Note that this fixed effect is just an intercept for   */
/* each species at any given time across all herbivories, */
/* ponds, and plots.                                      */

random herbivory pond(herbivory) / group=indicator*time;
/* Note that intercepts are random for each herbivory and   */
/* for each pond within a herbivory.  I allow the variance  */
/* of the herbivory effect to be separately estimated for   */
/* each of the two species and for open water, as well as   */
/* over time.  Hopefully, you have enough                   */
/* observations to estimate the different variances and get */
/* reasonably stable estimates.  You might find that 1) you */
/* don't have enough data to estimate different variances   */
/* at each time (in which case you should see no structure  */
/* to the variances, that is the variances vor a given      */
/* species bounce around), or 2) that you do have sufficient*/
/* data to estimate variances and the variances are fairly  */
/* stable with respect to time.  In either of these cases,  */
/* might then take time out of the group specification.     */
/* You should be able to get a likelihood ratio test of the */
/* importance of time in the group specification by fitting */
/* the random effects specification shown above, and        */
/* fitting a random effects specification with time removed */
/* from the group specification, comparing the difference   */
/* in -2LL to a chi-square distribution with df the         */
/* difference in the number of variances estimated by the   */
/* two models.  Similar comments apply to the plot effects. */

/* Within plot effects which reflect competition between    */
/* species and correlations across time are taken into      */
/* account in the repeated statement given next.            */

repeated indicator*time / subject=plot(pond*herbivory)
type=un@un;  /* or un@cs or un@ar*/

lsmeans indicator*time / diff;
/* lsmeans statement would allow you to test species 1 vs 2 */
/* proportions at time 1, time2, time3, as well as species  */
/* 1 difference in proportions at time 1 vs time 2, at time */
/* 1 vs time 3, at time 2 vs time 3, etc.  You could set up */
/* other contrasts, including multiple df tests of species  */
/* effects at the different times, and time effects among   */
/* the different species employing contrast statements.     */
/* That is too much for me to code here, and you will likely*/
/* have other variables and other contrasts to test, so I   */
/* leave all that to you.  But such tests can be done.      */

format indicator indic.
run;

Let me make a couple of comments about your response.  Proportions
are constrained to lie on the range 0 to 1.  The constraint on the
range is necessarily a violation of normality assumptions upon which
proc mixed is based.  There are a number of other problems with
proportions as responses, among them nonconstant variance.  Similar
issues would affect your ordinal values which are derived from
underlying proportions.  Now, if the proportions of each of the
responses could never go to zero or one, then proportions are better
analyzed when subjected to some transformation.  The arcsin
transformation is variance stabilizing, and it allows the response
to have range negative infinity to positive infinity.  I would
really urge you to analyze the responses subjected to the arcsin
transformation, at least for purposes of testing.  You can back
transform lsmeans for estimation purposes to get estimates on the
original range if you need to be able to interpret the results.  I
would dismiss the ordinal response variable for this problem.  It
throws away information and is, in fact, harder to analyze and harder
to interpret.

There are other transformations which can be used for proportions.
Draper and Smith also mention ln(Y/(1-Y)) which has the advantage
that the parameter estimates can be interpreted as log odds, but the
transformation is not variance stabilizing, so you must perform a
weighted regression.  If the area used to determine the proportions
of the three responses is the same across all plots, then Y(1-Y)
would be the weight variable.  If a different area is employed for
different plots, then the weight variable would be the Y(1-Y)
multiplied by plot area sampled.

Best wishes,

Dale

--- Warren Schlechte <wschl...@KTC.COM> wrote:

> Does SAS have a procedure that allows one to analyze the following?

> I have a nested-factorial design (Ponds within nested within
> herbivory,
> ponds crossed with plant type.  Each pondXplant has 3 replicate
> plots.  Two
> species of each plant are planted within each plot).  Within each
> plot, 3
> measures will be taken, these measures are of an ordinal scale (0 to
> 7,
> roughly corresponding to the log-odds).  Each measure will describe
> the
> proportion of a specific plant (or open water) within each plot.  For
> example, within each plot there are 2 species of plant and open water
> (3
> categories), if each occupied 33% of the plot, we would record 3, 3,
> 3 for
> each category within that plot.  Each plot will be measured each
> quarter for
> 2 years, suggesting a repeated measures component.  The questions of
> interest relate to species composition within plots through time
> based on
> competition, and treatment differences associated with herbivory.

> If I had a single measure for each plot, I would think in terms of
> Proc
> MIXED, but I have 3 measures per plot per pond per time.  The
> relatedness of
> the measures within the plot would seem to negate MIXED, but may not
> if
> someone could advise me.  Because I have mutually exclusive and
> exhaustive
> categories within each plot, I thought maybe CATMOD would be the
> procedure
> to use, but my measures are not multinomial, they are ordinal.  I
> could use
> GLM, but am concerned because it appears I have less ability to alter
> the
> covariance matrix.

> Any suggested directions would be appreciated.

> Warren Schlechte
> Heart of the Hills Research Station

...

read more »

hi, here is my problem:

my data contains 2 class variables -- Denture Treatment (2 levels, A &
B) and
Insulin Treatment (2 levels, 0 for No & 1 for Yes). My response
variables are
measured repeatedly for both before Denture Treatment & after Denture
Treatment.

In SASonlinedoc, an example similar with mine is like this:
( \onlinedoc\sasdoc\sashtml\stat\chap30\sect58.htm
SAS Release 8.1)
=========================================
data Trial;
input Treatment \$ Repetition PreY1 PostY1 FollowY1
PreY2 PostY2 FollowY2;
datalines;
A        1  3  13  9  0  0  9
A        2  0  14 10  6  6  3
A        3  4   6 17  8  2  6
A        4  7   7 13  7  6  4
A        5  3  12 11  6 12  6
A        6 10  14  8 13  3  8
B        1  9  11 17  8 11 27
B        2  4  16 13  9  3 26
B        3  8  10  9 12  0 18
B        4  5   9 13  3  0 14
B        5  0  15 11  3  0 25
B        6  4  11 14  4  2  9
Control  1 10  12 15  4  3  7
Control  2  2   8 12  8  7 20
Control  3  4   9 10  2  0 10
Control  4 10   8  8  5  8 14
Control  5 11  11 11  1  0 11
Control  6  1  5  15  8  9 10
;

proc glm data=Trial;
class Treatment;
model PreY1 PostY1 FollowY1
PreY2 PostY2 FollowY2 = Treatment / nouni;
repeated Response 2 identity, Time 3;
run;
=========================================

For my problem, I just need to add another Class variable in the
class statement. BUT, it seems in this analysis, the Response
variables need to be Continuous data, while my Reponse are
Ordinal data (from multiple-choice questions of sampling paper)

So, is there any other way to do this analysis suitable for my
data?