Scheduled for New Methods for Analyzing and Modeling Complex Data in Interdisciplinary Research, Friday, March 16, 2007, 8:45 AM - 10:00 AM, Convention Center: 328


Potential Application of Generalized Estimating Equation Method in Physical Activity Research: A Tutorial

Yong Gao, University of Illinois at Urbana-Champaign, Urbana, IL and Weimo Zhu, University of Illinois at Urbana-Champaign, Urbana, IL

The Generalized Estimating Equation (GEE) method, developed by Liang and Zeger (1986), has recently been used to analyze longitudinal and clustered data in the medical and life sciences. GEE is an extension of generalized linear models. Because it incorporates the within-subject correlation of responses into the model and accommodates dependent variables that are not normally distributed, the GEE approach can yield more efficient and unbiased estimates of regression parameters (Ballinger, 2004; Diggle, Heagerty, Liang, & Zeger, 2002; Hardin & Hilbe, 2003). To fit a GEE model, three specifications must be made: (a) a link function that will “linearize” the regression equation, such as an identity link for normally distributed data, a logit link for binary dependent variables, and a log link for count response variables; (b) the distribution of the dependent variable(s): typically, the binomial distribution for binary responses, the Poisson distribution for count data, and the normal distribution for continuous data; and (c) the correlation structure of within-subject responses (i.e., the “working” correlation matrix). Four common options for the correlation structure are autoregressive, exchangeable, unstructured, and independent (Ballinger, 2004; Hardin & Hilbe, 2003). Although dependent variables in many physical activity research studies are not normally distributed and data are correlated within subjects, the application of GEE has been very limited.
Instead, suboptimal methods such as repeated measures ANOVA and linear regression have often been used. These methods may violate important statistical assumptions and lead to incorrect or less efficient estimates of regression model parameters, which in turn complicate interpretation of the results and can lead to erroneous conclusions (Diggle et al., 2002; Gardner, Mulvey, & Shaw, 1995; Harrison, 2002). After providing an overview of GEE's statistical foundation, this presentation will outline the basic steps of GEE and its advantages over other methods (e.g., repeated measures ANOVA) for analyzing longitudinal and clustered data. An example using physical activity research data will demonstrate how to use SAS software to prepare data, apply GEE, test hypotheses, and interpret the findings.
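As a rough illustration of specification (c), the sketch below builds three of the four working correlation structures named in the abstract for a subject with n repeated measurements. It is not the SAS example from the presentation. The function names, the parameter name `alpha`, and the value 0.5 are assumptions chosen for illustration. The unstructured option is omitted because each of its off-diagonal entries is estimated separately from the data rather than generated from a single parameter.

```python
from typing import List

Matrix = List[List[float]]


def independence(n: int) -> Matrix:
    """Independent structure: repeated measures are treated as uncorrelated."""
    return [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]


def exchangeable(n: int, alpha: float) -> Matrix:
    """Exchangeable structure: every pair of measurements shares one correlation."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]


def autoregressive(n: int, alpha: float) -> Matrix:
    """First-order autoregressive structure: correlation decays as alpha**|j - k|."""
    return [[alpha ** abs(j - k) for k in range(n)] for j in range(n)]


if __name__ == "__main__":
    # Hypothetical setting: 3 measurement occasions, correlation parameter 0.5.
    for name, matrix in [
        ("independent", independence(3)),
        ("exchangeable", exchangeable(3, 0.5)),
        ("autoregressive", autoregressive(3, 0.5)),
    ]:
        print(name)
        for row in matrix:
            print("  ", row)
```

In the autoregressive case, measurements two occasions apart have correlation 0.25 rather than 0.5, which is why that structure is often considered for equally spaced longitudinal measurements, while the exchangeable structure treats all pairs alike.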
Keyword(s): interdisciplinary, measurement/evaluation, research

2007 AAHPERD National Convention and Exposition (March 13-17, 2007)