Every soldier in the United States Army is required to pass the semi-annual Army Physical Fitness Test (APFT), which includes a 2-min push-up test. Failure of the push-up test could result in removal from the Army. Push-ups are evaluated by unit officers, so proper evaluation is critical. Research has shown that while intra-rater reliability is high (r=0.85-0.97), inter-rater reliability is poor and highly variable (0.36-0.99; Baumgartner et al., 1995; Murr, 1997; Baumgartner et al., 2002), rendering scoring highly rater-dependent. Major assessment issues center on evaluating correct elbow angles (full extension and 90°) and body posture during each repetition, with repetitions performed at a relatively high rate (as high as 1/sec). Typically, studies compare total scores between graders, but no studies have determined rater objectivity or reliability over multiple trials against a criterion standard. Doing so may be important in defining the source of poor inter-rater reliability and is the purpose of this study. United States Military Academy (USMA) cadets (n=15) gave informed consent and were videotaped during execution of the 2-min push-up test during their semi-annual APFT. The videotapes were digitized, recorded onto CDs, and evaluated by two experts, military officers from the Physical Education Testing Office, USMA, who determined acceptable and unacceptable push-ups and the total score; their evaluation served as the criterion standard. This analysis was performed in slow motion to properly assess each push-up. Eight graders were randomly selected from the USMA Physical Education Faculty (who regularly administer the push-up test) and gave informed consent to participate. Graders were shown the digital videos in real time and verbally responded yes or no to each push-up repetition; responses were recorded by study investigators. Graders evaluated the videos (in random order) 4 times, with a minimum of 1 week between evaluations.
Graders' assessments were compared against the criterion standard using repeated measures ANOVA and Pearson product-moment correlation. From the videotapes, 2-min push-up performance was as follows: 70±17 (range 32-97) attempted, 59±19 (range 19-91) accepted, and 15±21 (range 0-43) rejected. While there were no significant differences between graders for total score (p=0.35), there was considerable variation in which repetitions were accepted or rejected. For total score, intra-grader reliability was relatively high (r=0.79-0.90), but inter-grader reliability was highly variable (r=0.49-0.95). Interestingly, most of the variation and error occurred in the first minute of the test, when cadets were performing push-ups at a very high rate.
Keyword(s): assessment, exercise/fitness/physical activity, physical education PK-12