Scheduled for Measurement Free Communications, Saturday, April 5, 2003, 10:15 AM - 11:30 AM, Convention Center: 304


An Experimental Determination of the Best Missing-Value Recovery Method in Assessing Physical Activity Using Pedometers

Minsoo Kang (1), Weimo Zhu (1), Catrine Tudor-Locke (2), and Barbara E. Ainsworth (3); (1) University of Illinois at Urbana-Champaign, Urbana, IL, (2) Arizona State University East, Mesa, AZ, (3) University of South Carolina, Columbia, SC

Counting steps with a pedometer has become one of the most commonly used practical measures of physical activity. The purpose of this study was to determine empirically the most effective method for recovering missing values in a step-count data set. A total of 117 participants wore pedometers for 21 consecutive days; 54 of them had no missing values. Fifty-four participants were randomly selected from those who had missing values (n=63), and their missing-value patterns were applied to the participants with complete data, i.e., data were removed on the same days. Nine recovery methods were applied to this artificially incomplete data set: four individual-centered methods (a to d) and five group-centered methods (e to i). Each replaced a missing value with the mean of: (a) the participant's remaining days, (b) the remaining weekdays or weekend days, depending on the type of the missing day, (c) the remaining weekdays or weekend days in the same week in which the missing value occurred, (d) the same day of the week in the other weeks, e.g., a missing Monday replaced by the mean of the Mondays in the other two weeks, (e) the other participants on the same missing day, (f) the other participants across all days, (g) the other participants on the same type of day, (h) the other participants for the particular week or weekend in which the missing value occurred, and (i) the other participants on the common days. Two indexes were used to evaluate the recovery methods: Root Mean Square Difference (RMSD), in which the differences between the original and replacement values were squared, averaged, and square-rooted, and Mean Signed Difference (MSD), in which the differences were averaged. A smaller RMSD, or an MSD closer to zero, indicates better recovery of the missing values. The results indicated that the individual-centered methods recovered the missing values better than the group-centered methods.
Method (b), the mean of the remaining weekdays or weekend days, was the most accurate in recovering missing values, with the smallest RMSD:

Method         a         b         c         d         e         f         g         h         i
RMSD     3049.33   2822.14   3637.32   3273.01   3934.48   4018.60   3955.86   3956.89   3928.47
MSD      -762.90   -573.07   -702.90   -480.90   -933.18   -994.21   -958.93   -949.00   -947.32

Negative MSDs indicate that the replacement values tended to overestimate the original values. Given the large RMSD values found, the replacement values should be used with caution.
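
The recovery procedure and the two indexes described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that day index 0 is a Monday, that missing days are marked with None, and that each difference is taken as original minus replacement (a convention chosen here so that a negative MSD corresponds to overestimation, as stated in the abstract). The function names and the one-week step counts are hypothetical, made up purely for illustration.

```python
import math

# Assumption: day index 0 is a Monday, so indices 5 and 6 of each week are weekend days.
WEEKEND_POSITIONS = {5, 6}


def is_weekend(day_index):
    """Return True if the day index falls on a Saturday or Sunday."""
    return day_index % 7 in WEEKEND_POSITIONS


def impute_same_day_type(steps):
    """Sketch of method (b): fill each missing day (None) with the mean of the
    participant's remaining days of the same type (weekday or weekend day)."""
    filled = list(steps)
    for i, value in enumerate(steps):
        if value is None:
            same_type = [
                v for j, v in enumerate(steps)
                if v is not None and is_weekend(j) == is_weekend(i)
            ]
            # If no day of the same type was observed, the value stays missing.
            if same_type:
                filled[i] = sum(same_type) / len(same_type)
    return filled


def rmsd(original, replacement):
    """Root Mean Square Difference: square, average, then square-root the differences."""
    diffs = [o - r for o, r in zip(original, replacement)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def msd(original, replacement):
    """Mean Signed Difference: average of (original - replacement)."""
    diffs = [o - r for o, r in zip(original, replacement)]
    return sum(diffs) / len(diffs)


# Hypothetical one-week record (Monday to Sunday), not data from the study.
complete = [8000, 9000, 10000, 7000, 6000, 4000, 5000]
missing_days = [1, 6]  # remove Tuesday and Sunday to create artificial gaps

observed = [None if i in missing_days else v for i, v in enumerate(complete)]
filled = impute_same_day_type(observed)

# Compare only the days that were artificially removed.
original_values = [complete[i] for i in missing_days]
replacement_values = [filled[i] for i in missing_days]

example_rmsd = rmsd(original_values, replacement_values)
example_msd = msd(original_values, replacement_values)
```

In this made-up week the replacements fall below the removed values, so the MSD is positive (underestimation); the negative MSDs reported above correspond to the opposite case under the same sign convention.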

This study was supported by a supplement to CDC SIP4-99; U48/CCU409664-06.

Back to the 2003 AAHPERD National Convention and Exposition