Tracking the number
of steps a person walks with a pedometer has become one of the most commonly
used practical measures for assessing physical activity. The purpose of this study was to determine
empirically the most effective method for recovering missing values in a step-count
data set. A total of 117 participants
were monitored with pedometers for 21 consecutive days, and 54 of them had
no missing values. Fifty-four participants
were randomly selected from those who had missing values (n=63), and
their missing-value patterns were applied to those who had none, i.e., data
were removed on the same days.
Nine recovery methods were applied to this artificial missing-data
set. They included four
individual-centered and five group-centered methods, i.e., a missing value was
replaced using the mean of: (a) the remaining days, (b)
the remaining weekdays or weekends, depending on the type of the missing day,
(c) the remaining weekdays or weekend in the same week in which the missing value occurred,
(d) the same days in other weeks, e.g., a missing Monday was replaced by the
mean of the Mondays in the other two weeks, (e) the other participants on the same
missing day, (f) the other participants across all days, (g) the other
participants based on the type of day, (h) the other participants for the
particular week or weekend in which the missing value occurred, and (i) the other participants
on the common days. Two
indexes, Root Mean Square Difference (RMSD), in which the differences between
the original and replacement values were squared and averaged and the square root was taken,
and Mean Signed Difference (MSD), in which the differences were averaged,
were used to determine the effectiveness of the recovery methods. A smaller RMSD, or an MSD closer to zero, represents a better recovery
of the missing values. The results
indicated that the individual-centered methods produced a better recovery of the
missing values than the group-centered methods. Using the mean of the remaining
weekdays or weekends (method b) was the most accurate in recovering missing values, with the lowest RMSD:
Method        a         b         c         d         e         f         g         h         i
RMSD    3049.33   2822.14   3637.32   3273.01   3934.48   4018.60   3955.86   3956.89   3928.47
MSD     -762.90   -573.07   -702.90   -480.90   -933.18   -994.21   -958.93   -949.00   -947.32
Negative MSDs indicate that the predicted missing values tended
to be overestimated. The replacements
generated should be used with caution, considering the large RMSD values
found.
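
As an illustration only, the following Python sketch shows how method (b) and the two indexes could be computed for a single participant. It is not the study's code. The function names, the use of None for a missing day, the convention that day index 0 is a Monday, and the sample step counts are all assumptions made for this example. The sign convention (original minus replacement) is inferred from the statement that negative MSDs indicate overestimation.

```python
import math

# Assumption: day index 0 is a Monday, so indices 5 and 6 of each week are weekend days.
WEEKEND_DAYS = {5, 6}


def day_type(day_index):
    """Classify a day index as 'weekday' or 'weekend'."""
    return "weekend" if day_index % 7 in WEEKEND_DAYS else "weekday"


def impute_by_day_type(steps):
    """Method (b) sketch: fill each missing value (None) with the mean of the
    same participant's remaining observed days of the same type."""
    filled = list(steps)
    for i, value in enumerate(steps):
        if value is None:
            donors = [v for j, v in enumerate(steps)
                      if v is not None and day_type(j) == day_type(i)]
            if donors:  # leave the gap if no day of that type was observed
                filled[i] = sum(donors) / len(donors)
    return filled


def rmsd(original, replacement):
    """Root Mean Square Difference: square, average, then take the square root."""
    diffs = [o - r for o, r in zip(original, replacement)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def msd(original, replacement):
    """Mean Signed Difference, assumed here to be original minus replacement."""
    diffs = [o - r for o, r in zip(original, replacement)]
    return sum(diffs) / len(diffs)


# Hypothetical one-week record (Monday to Sunday); not data from the study.
original = [8000, 9000, 10000, 7000, 6000, 4000, 5000]
# The same record after artificially removing Tuesday and Sunday.
observed = [8000, None, 10000, 7000, 6000, 4000, None]

filled = impute_by_day_type(observed)
missing = [i for i, v in enumerate(observed) if v is None]
orig_vals = [original[i] for i in missing]
repl_vals = [filled[i] for i in missing]

print("RMSD:", rmsd(orig_vals, repl_vals))
print("MSD:", msd(orig_vals, repl_vals))
```

In this made-up record the replacements are lower than the removed values, so the MSD is positive under the assumed convention. The negative MSDs reported above correspond to the opposite case, in which replacements exceed the original values.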
This study was supported by a supplement to CDC SIP4-99;
U48/CCU409664-06.