A measurement instrument cannot be valid without reliability. Collecting reliability evidence has therefore been common practice in validating physical activity (PA) instruments. However, instrument reliability and PA behavior stability are often confounded with each other, and without an appropriate design, conclusions about reliability or stability are frequently inappropriate. For example, one of the most common designs for studying the “reliability” of pedometers asks participants to wear pedometers for several days. Participants’ day-to-day variations in step counts are then mistakenly treated as the reliability of the pedometers. In fact, a review of recently published PA validation studies found that many reported instrument reliabilities are actually estimates of the stability of participants’ PA behavior. By reviewing selected published reliability studies for different PA measures (e.g., questionnaires, pedometers), common mistakes in designing reliability studies will be addressed. Appropriate research designs and statistical analyses will be described. The concept of “score reliability” will also be introduced in the context of PA research. Finally, how to design a reliability/stability study within the framework of generalizability theory will be described and illustrated.
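The sketch below shows why a multi-day pedometer design yields a stability estimate rather than an instrument reliability estimate. It is a minimal illustration that assumes a fully crossed persons × days (p × d) generalizability study with one step count per person per day. It uses the standard ANOVA (expected mean squares) estimators for variance components and truncates negative estimates to zero, which is one common convention. The function names `g_study_p_by_d` and `g_coefficient` are hypothetical and not taken from the original study.

```python
from typing import Dict, List


def g_study_p_by_d(scores: List[List[float]]) -> Dict[str, float]:
    """Estimate variance components for a crossed persons x days (p x d) design.

    scores[i][j] is the (hypothetical) step count of person i on day j,
    with exactly one observation per cell. Uses ANOVA mean squares;
    negative variance estimates are truncated to zero.
    """
    n_p = len(scores)      # number of persons
    n_d = len(scores[0])   # number of days
    grand = sum(sum(row) for row in scores) / (n_p * n_d)
    person_means = [sum(row) / n_d for row in scores]
    day_means = [sum(scores[i][j] for i in range(n_p)) / n_p for j in range(n_d)]

    # Mean squares for persons, days, and the residual (p x d interaction + error)
    ms_p = n_d * sum((m - grand) ** 2 for m in person_means) / (n_p - 1)
    ms_d = n_p * sum((m - grand) ** 2 for m in day_means) / (n_d - 1)
    ss_res = sum(
        (scores[i][j] - person_means[i] - day_means[j] + grand) ** 2
        for i in range(n_p)
        for j in range(n_d)
    )
    ms_res = ss_res / ((n_p - 1) * (n_d - 1))

    return {
        "person": max(0.0, (ms_p - ms_res) / n_d),
        "day": max(0.0, (ms_d - ms_res) / n_p),
        # Day-to-day behavior fluctuation and instrument error are confounded here
        "residual": ms_res,
    }


def g_coefficient(components: Dict[str, float], n_days: int) -> float:
    """Relative G coefficient for a person's mean over n_days days."""
    error = components["residual"] / n_days
    total = components["person"] + error
    return components["person"] / total if total > 0 else 0.0
```

In this design, the residual component bundles each person's day-to-day changes in behavior together with any pedometer error. The resulting coefficient therefore describes the stability of step counts across days, not the reliability of the device. To isolate instrument reliability, the design would need an instrument facet, for example two pedometers worn simultaneously on the same day.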