Evaluating standing balance is an important component of lower extremity neuromuscular function and sport training assessment. One device capable of quantifying balance is the Biodex Stability System (BSS). The reliability of a measurement depends on the number of trials performed and the number of days tested. Our purpose was to perform a generalizability theory analysis to determine the amount of variation associated with the facets of Trials and Days and to determine the most reliable protocol for measurement. Generalizability theory distinguishes two types of studies: the G study and the D study (Morrow et al., RQES, 57:3, 1986). A G study quantifies the amount of variance associated with different facets. A D (decision) study identifies which protocols are optimal for a particular measurement situation by generating generalizability coefficients (G's), which can be interpreted as reliability coefficients across facets. Forty subjects without lower extremity injury or vestibular disorder participated (20 men and 20 women; 22.4±3.4 yrs; 72.45±12.5 kg; 170.25±9.25 cm). The BSS is an instrumented circular tilt board that moves in a 360° horizontal plane. The degrees of tilt are measured and a stability index is calculated based on the deviation from horizontal. The stability index was used as the datum for analysis. A practice session was conducted to familiarize the subjects with the balance protocol. Subjects were positioned with both feet on the platform and hands at their sides as the platform was unlocked. The foot position at which they could maintain platform stability and felt comfortable was recorded and used for all trials. Each subject performed three 20-s trials per day at level 6 on 2 consecutive days. The G study found that the largest percentage of variance was associated with Subjects (72.54%), followed by the Subject x Trial x Day interaction (14.97%). 
The facets of Trials and Days accounted for 4.21% and 2.18%, respectively, with the Subject x Trials and Subject x Days accounting for a combined .08%. Subject x Day accounted for 6% of the variability. The D study showed the largest G coefficient for 4 Trials and 4 Days (G=.96) and the lowest for 1 Trial and 1 Day (G=.775). The data support the conclusion that sufficient reliability (G>.80) requires scores averaged over a minimum of 2 Trials or 2 Days. Variation due to days influenced measurement error more than variation due to trials.

Keyword(s): measurement/evaluation, research
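The D study step described above can be illustrated with a minimal sketch. It assumes the standard relative G coefficient for a fully crossed Subjects x Trials x Days design, in which each interaction involving Subjects is divided by the number of conditions averaged over. The abstract does not state the exact formula the authors used, so this is an assumption. The function name `g_coefficient` and the `components` dictionary are hypothetical. The variance percentages are taken from the abstract, except that the combined .08% is not split there; the sketch assigns all of it to Subject x Trial as an illustrative upper bound.

```python
def g_coefficient(var_s, var_st, var_sd, var_std, n_trials, n_days):
    """Relative G coefficient for a crossed subjects x trials x days design.

    Each error term is divided by the number of conditions the score is
    averaged over; the result is universe-score variance over
    universe-score variance plus relative error variance.
    """
    if n_trials < 1 or n_days < 1:
        raise ValueError("n_trials and n_days must be at least 1")
    relative_error = (
        var_st / n_trials
        + var_sd / n_days
        + var_std / (n_trials * n_days)
    )
    return var_s / (var_s + relative_error)


# Percentages of total variance reported in the abstract. Percentages can
# stand in for raw variance components because the ratio is scale-invariant.
components = {
    "var_s": 72.54,    # Subjects
    "var_st": 0.08,    # Subject x Trial (ASSUMED: whole combined .08% placed here)
    "var_sd": 6.0,     # Subject x Day
    "var_std": 14.97,  # Subject x Trial x Day (confounded with residual error)
}

# Tabulate G for every protocol from 1 to 4 trials and 1 to 4 days.
print("days  " + "  ".join(f"{t} trial(s)" for t in range(1, 5)))
for n_days in range(1, 5):
    row = [
        g_coefficient(**components, n_trials=n_trials, n_days=n_days)
        for n_trials in range(1, 5)
    ]
    print(f"{n_days:>4}  " + "  ".join(f"{g:10.3f}" for g in row))
```

Under these assumptions the sketch gives roughly .775 for 1 Trial and 1 Day and roughly .967 for 4 Trials and 4 Days, close to the reported values; small differences may reflect rounding of the published percentages or a different formula in the original analysis.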