This section looks at ways of assessing the repeatability and validity of dietary methods.
Assessing the repeatability (also referred to as the reproducibility) of a laboratory method is relatively straightforward because, with care, it is possible to reproduce both what is measured and the conditions of measurement. This is almost always impossible in the case of a dietary intake measurement. Individuals do not eat exactly the same quantities or the same foods on different days or weeks.
All measures of repeatability obtained by applying the same method to the same individuals on more than one occasion include not only measurement error but also real day-to-day or week-to-week variability in intake.
While at first sight it might appear easier to measure the repeatability of recall methods such as the 24 hour recall and diet histories, this process also introduces additional sources of variation since the interviews have to be conducted at different times and possibly by different interviewers. Measures of repeatability for all dietary methods will thus tend to give an over-estimate of the extent of measurement error because they will always include an element of variation due to real differences in what is being measured and in the conditions under which it is being measured.
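The overestimate can be made explicit with a simple additive model. This is an illustrative sketch rather than a result stated in the text: the symbols $T_i$, $w_{ij}$, $e_{ij}$, $\sigma_w$, and $\sigma_e$ are assumed notation, and the model assumes that real variation in intake and measurement error are independent of each other and have the same variance on both occasions.

```latex
% Reported intake of individual i on occasion j (j = 1, 2):
%   T_i    = habitual (long-term) intake
%   w_{ij} = real deviation of intake from T_i on occasion j, variance \sigma_w^2
%   e_{ij} = measurement error on occasion j, variance \sigma_e^2
X_{ij} = T_i + w_{ij} + e_{ij}

% The test-retest difference removes T_i but retains both other components:
d_i = X_{i2} - X_{i1} = (w_{i2} - w_{i1}) + (e_{i2} - e_{i1})

% Under the independence assumption:
\operatorname{Var}(d_i) = 2\sigma_w^2 + 2\sigma_e^2 \;\ge\; 2\sigma_e^2
```

Under these assumptions the variance of the differences equals the part due to measurement error alone only when $\sigma_w = 0$, that is, when intake does not really vary between occasions.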
Usually, the repeatability of a dietary method is determined by repeating the same method on the same individuals on two separate occasions, that is, by a test–retest study. The interval between tests depends on the time-frame of the dietary method being assessed, but should generally be short enough to avoid the effects of seasonal or other changes in food habits and long enough to avoid the possibility of the first interview or recording period influencing the second one.
The difference between the results obtained on the two occasions can be expressed in a number of different ways. Table 10.5, which was compiled from data reported in the literature, shows various measures of repeatability for energy intake obtained with different dietary methods repeated after an interval of time.
The different measures of repeatability provide different information. The correlation coefficient is widely quoted but is not a good measure of repeatability, since a good correlation may be obtained even if one set of measurements has been systematically biased and has a different mean from the other set. The mean difference is not a good measure of repeatability in individuals, since random differences between occasions tend to cancel out and its value therefore depends primarily on whether the differences are random or systematic. Measures that reflect the differences between repeated measurements within individuals are to be preferred. The coefficient of variation of the differences within individuals and the coefficient of repeatability (which is simply twice the standard deviation of the differences and represents the 95% confidence limits of agreement) give much better measures of the magnitude of these differences. They are also more readily interpreted in practical terms than either a correlation coefficient or the percentage of individuals classified in the same quintile, quartile, or tertile. If the standard deviation of the difference within individuals is of the order of 20–30% of mean intake, one would be unlikely to describe the method as precise or repeatable even if the mean difference at group level is only 1%.
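The measures discussed above can be computed from paired test-retest data as in the following minimal sketch. The function names and the intake values are hypothetical and chosen only for illustration. The sketch also assumes that the coefficient of variation of the differences is the standard deviation of the differences expressed as a percentage of the overall mean intake; published studies may define it slightly differently.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient; returns NaN if either series is constant."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def repeatability_measures(first, second):
    """Summarise agreement between two administrations of the same method."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length series with at least two pairs")
    diffs = [b - a for a, b in zip(first, second)]
    overall_mean = statistics.mean(list(first) + list(second))
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of within-individual differences
    return {
        "correlation": pearson(first, second),
        "mean_difference": mean_diff,
        "mean_difference_pct": 100 * mean_diff / overall_mean,
        "sd_difference": sd_diff,
        "cv_difference_pct": 100 * sd_diff / overall_mean,
        "coefficient_of_repeatability": 2 * sd_diff,
    }


# Hypothetical energy intakes (MJ/day) for four individuals on two occasions.
occasion_1 = [8.0, 10.0, 12.0, 10.0]

# Case A: every second measurement is 1 MJ higher (purely systematic bias).
biased = repeatability_measures(occasion_1, [x + 1.0 for x in occasion_1])

# Case B: differences of +2 and -2 MJ that cancel out at group level.
scattered = repeatability_measures(occasion_1, [10.0, 8.0, 14.0, 8.0])

for label, result in (("systematic bias", biased), ("random scatter", scattered)):
    print(label)
    for name, value in result.items():
        print(f"  {name}: {value:.3f}")
```

In case A the correlation is perfect although the two sets of measurements have different means. In case B the mean difference is zero, yet the standard deviation of the differences is about 23% of mean intake, which is why the within-individual measures are the more informative ones.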
Demonstrating that a dietary method measures what it is intended to measure is even more difficult than demonstrating that a method is repeatable, because in effect it “requires that the truth be known.”
This is almost always impossible unless it is possible to observe, surreptitiously, what is consumed over short periods such as 24 hours or at most a few days. Observation is usually only feasible in institutional settings or in situations specially set up to allow unobtrusive observation of what people eat.
For methods that are designed to obtain information on habitual longer-term intake, such as the diet history or food frequency questionnaires, unobtrusive observation is impossible. This is a problem that has been faced by all investigators of dietary assessment methods and until relatively recently was usually “solved” by assessing one dietary method in relation to another dietary method, usually a 7 day weighed dietary record, which was considered to be the best available or criterion measure. Comparison with another dietary method provides at best only a relative form of validity and at worst information that is unrelated to validity but reflects either real differences or similar errors between the methods. For example, comparison of data from a single 24 hour recall or a diet history with data from a 7 day weighed record for the same individuals does not compare the same information because the time periods are not concurrent. However, because of the lack of a suitable external standard against which true validity could be judged, before the 1980s it was usually assumed that most dietary intake data, and weighed records in particular, provided valid data. Usually, a method was judged acceptable if the mean intake, as measured by both methods, did not differ significantly and if correlations for nutrient intake in individuals exceeded 0.5. The magnitude of the coefficient of variation of the differences within individuals was generally ignored.
Table 10.6 shows data from three studies that provide additional information on agreement. All three studies compared data from a food frequency questionnaire with multiple days of food intake records. When different methods are compared the mean differences tend to be higher (there is greater bias) than those found in repeatability studies.
However, the range of values obtained for other measures of agreement is generally similar to that obtained in repeatability studies. Agreement at the individual level is also not high, with coefficients of variation for differences in individuals ranging from 17% to 33% in these studies and fewer than 50% of respondents classified in the same quintile of intake. Note that even good agreement between two dietary methods does not necessarily indicate validity, but may merely indicate similar errors.
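The percentage of respondents classified in the same quintile by two methods can be computed as in the following sketch. It is one possible implementation, not the procedure used in the studies cited: the function names are hypothetical, groups are assigned by rank so that they are as nearly equal in size as possible, and tied values are separated by their order in the input.

```python
def quantile_groups(values, n_groups=5):
    """Assign each value to a rank-based group numbered 0 to n_groups - 1."""
    # Indices sorted by value; sorted() is stable, so ties keep their input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, idx in enumerate(order):
        groups[idx] = rank * n_groups // len(values)
    return groups


def percent_same_group(method_a, method_b, n_groups=5):
    """Percentage of individuals placed in the same group by both methods."""
    if len(method_a) != len(method_b) or not method_a:
        raise ValueError("need two non-empty series of equal length")
    groups_a = quantile_groups(method_a, n_groups)
    groups_b = quantile_groups(method_b, n_groups)
    same = sum(1 for a, b in zip(groups_a, groups_b) if a == b)
    return 100 * same / len(groups_a)


# Hypothetical intakes for ten respondents measured by two methods.
questionnaire = [6.1, 7.4, 8.0, 8.3, 9.0, 9.2, 9.9, 10.4, 11.0, 12.5]
food_records = [7.0, 6.5, 9.1, 8.2, 8.8, 10.6, 9.5, 9.8, 12.0, 11.1]

print(f"Same quintile: {percent_same_group(questionnaire, food_records):.0f}%")
```

Because each method is ranked separately, a constant bias in one method leaves the result unchanged, so a high percentage in the same quintile says nothing about whether the two methods give the same mean intake.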