To provide appropriate physical activity services to individuals with mental retardation (MR), accurately evaluating an individual's level of physical activity in a clinical setting is an important first step in planning and developing a physical intervention program. Methods of measuring physical activity levels in field-based assessment should not only demonstrate good psychometric properties, but also be low in cost and easy to use in practical situations. Systematic observation is a field-based assessment method that records behavior according to specific guidelines and coding procedures, and it can easily be used in clinical settings. Many systematic observation tools have been developed to measure levels of physical activity in clinical settings for children without disabilities. Because individuals with MR have more sedentary lifestyles than individuals without MR, it is not known whether systematic observation can discriminate among activity levels in this population. Therefore, the purpose of this study was to examine the psychometric properties of the System for Observing Fitness Instruction Time (SOFIT) and the Children's Activity Rating Scale (CARS) for use with children with MR. Eleven children with MR between the ages of 6 and 14 years participated in this study. Three raters each coded the data twice with SOFIT and with CARS. Accelerometer data were synchronized with the coding intervals and used as the criterion measure for concurrent validity. A two-facet generalizability theory analysis was used to examine the sources of variability in assessing physical activity levels. The data were analyzed separately for each instrument using a completely crossed 2 by 3 (trial by rater) ANOVA. Seven sources of variability were estimated from the ANOVA results: the variance associated with participant (σ²p), trial (σ²t), and rater (σ²r), the three two-way interactions, and the residual term (three-way interaction plus error).
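The variance decomposition described above can be sketched in code. This is a minimal illustration, not the authors' analysis script: it assumes a fully crossed participant by trial by rater design with one score per cell, treats all effects as random, uses the standard expected-mean-square equations, and truncates negative estimates to zero (a common convention, assumed here). The function name `g_study` and the simulated scores are hypothetical.

```python
import numpy as np


def g_study(scores, n_trials=None, n_raters=None):
    """Variance components and phi for a crossed p x t x r design.

    scores: array of shape (participants, trials, raters), one score per cell.
    n_trials, n_raters: facet sizes assumed for the decision study
    (default: the sizes observed in `scores`).
    """
    x = np.asarray(scores, dtype=float)
    n_p, n_t, n_r = x.shape
    n_trials = n_t if n_trials is None else n_trials
    n_raters = n_r if n_raters is None else n_raters

    grand = x.mean()
    m_p, m_t, m_r = x.mean(axis=(1, 2)), x.mean(axis=(0, 2)), x.mean(axis=(0, 1))
    m_pt, m_pr, m_tr = x.mean(axis=2), x.mean(axis=1), x.mean(axis=0)

    # Sums of squares for main effects and two-way interactions.
    ss = {
        "p": n_t * n_r * np.sum((m_p - grand) ** 2),
        "t": n_p * n_r * np.sum((m_t - grand) ** 2),
        "r": n_p * n_t * np.sum((m_r - grand) ** 2),
        "pt": n_r * np.sum((m_pt - m_p[:, None] - m_t[None, :] + grand) ** 2),
        "pr": n_t * np.sum((m_pr - m_p[:, None] - m_r[None, :] + grand) ** 2),
        "tr": n_p * np.sum((m_tr - m_t[:, None] - m_r[None, :] + grand) ** 2),
    }
    # Residual: three-way interaction confounded with error.
    ss["ptr,e"] = np.sum((x - grand) ** 2) - sum(ss.values())

    df = {
        "p": n_p - 1, "t": n_t - 1, "r": n_r - 1,
        "pt": (n_p - 1) * (n_t - 1),
        "pr": (n_p - 1) * (n_r - 1),
        "tr": (n_t - 1) * (n_r - 1),
        "ptr,e": (n_p - 1) * (n_t - 1) * (n_r - 1),
    }
    ms = {k: ss[k] / df[k] for k in ss}

    # Solve the expected-mean-square equations for the seven components.
    var = {
        "ptr,e": ms["ptr,e"],
        "pt": (ms["pt"] - ms["ptr,e"]) / n_r,
        "pr": (ms["pr"] - ms["ptr,e"]) / n_t,
        "tr": (ms["tr"] - ms["ptr,e"]) / n_p,
        "p": (ms["p"] - ms["pt"] - ms["pr"] + ms["ptr,e"]) / (n_t * n_r),
        "t": (ms["t"] - ms["pt"] - ms["tr"] + ms["ptr,e"]) / (n_p * n_r),
        "r": (ms["r"] - ms["pr"] - ms["tr"] + ms["ptr,e"]) / (n_p * n_t),
    }
    var = {k: max(v, 0.0) for k, v in var.items()}  # assumed convention

    # Absolute error variance and the phi (dependability) coefficient.
    abs_error = (
        var["t"] / n_trials + var["r"] / n_raters
        + var["pt"] / n_trials + var["pr"] / n_raters
        + (var["tr"] + var["ptr,e"]) / (n_trials * n_raters)
    )
    phi = var["p"] / (var["p"] + abs_error)

    total = sum(var.values())
    percent = {k: 100.0 * v / total for k, v in var.items()}
    return var, percent, phi


# Simulated scores only (NOT the study's data): 11 participants, 2 trials, 3 raters.
rng = np.random.default_rng(0)
simulated = (
    rng.normal(0.0, 2.0, size=(11, 1, 1))    # participant effect
    + rng.normal(0.0, 0.5, size=(1, 1, 3))   # rater effect
    + rng.normal(0.0, 0.3, size=(11, 2, 3))  # residual
)
components, percent, phi = g_study(simulated)
print({k: round(v, 2) for k, v in percent.items()}, round(phi, 3))
```

Passing different `n_trials` or `n_raters` values shows how dependability would change under a different measurement design, for example with a single rater.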
Validity coefficients were calculated using Pearson product-moment correlations between the results from Actiwatch® accelerometers and the results from the two systematic observation tools. SOFIT and CARS both demonstrated high generalizability (phi coefficient = 0.98 and 0.75, respectively) across raters and trials. None of the error variance components appeared to be significant for SOFIT, whereas for CARS the largest sources of error variance were the rater facet (31.49%) and the participant-by-rater interaction term (15.41%). However, SOFIT demonstrated low concurrent validity (r = 0.05 and 0.10), while CARS demonstrated low-to-moderate concurrent validity (r = 0.44 and 0.61). CARS demonstrated higher validity evidence than SOFIT, but rater training is essential before employing CARS as a field-assessment tool.
Keyword(s): adapted physical activity, measurement/evaluation, physical activity