Challenges for Intervention Studies to Improve Successful Longevity

March 25, 2015
By Neil Charness

One of ISL’s goals involves finding effective “holistic” interventions to improve functional, and particularly cognitive, capabilities as people age. Surveys have shown that fear of losing cognition, and particularly of developing dementia, ranks as a top concern for seniors. (It is more feared than loss of vision.)

For cognition, there are now many interventions that have shown efficacy in the scientific literature. By efficacy I mean that an experimental study with random assignment to treatment conditions has shown that a particular treatment, for instance aerobic exercise, is superior to a control condition, for instance being in a waiting group. Superior means that a statistical test (an inferential statistic, such as a t-test or ANOVA) indicates that the post-treatment difference in cognition is highly unlikely under the assumption that the scores from the two conditions come from a common distribution of scores (the null hypothesis distribution). Unlikely is conventionally taken to be fewer than 5 chances in 100, p < .05. The statistics are usually a bit more complicated, taking initial scores into account and looking to see whether the difference between pretest and post-test scores is greater in the treatment condition than in the control condition: technically, an interaction effect of treatment by time in a pre-post design.

Why bother to do a pretest? With large enough sample sizes, random assignment to conditions usually ensures that people in the two groups are roughly the same on the variable(s) of interest, here cognition, at the beginning of treatment. Thus you can infer that any differences at post-test must be due to treatment effects. However, measuring and controlling for any pretest differences that might have occurred by chance provides a fairer test of the hypothesis about treatment effects.

Here is where things get sticky. First, the people who volunteer for such studies are not usually a random sample from the general population. They are often the better-off seniors to begin with, so it may be difficult to generalize results back to the general population. Some might argue, though, that if you can achieve improvements in a more fit group, the effects should be even stronger in a less fit one. But one could also argue that a more fit group is much more likely to adhere to treatment, so the treatment is likely to be less efficacious in a harder-to-recruit segment of the population.

Second, there are many threats to making valid causal inferences about treatment. Any longitudinal study design (pre- and post-treatment testing) can suffer differential dropout between conditions that makes interpretation difficult. If people find exercise difficult to perform, they may simply stop participating in the study. Those in the waiting control group could also drop out by failing to show up at time 2, but they are less likely to do so. Worse yet, those who drop out might do so based on their initial level of cognition. Perhaps people at lower levels of cognition find it harder to remember to attend the training sessions, or to participate in them, and so become discouraged and leave the study because of the treatment. You may then end up comparing those with higher-level cognition who completed the treatment against a mixed group in the control condition, and it is more difficult to achieve gains in cognition if you already start at a high level of performance. Further, simply taking a test several times produces improvements, so-called practice or retest effects, which can be maintained over decade-long intervals between testings, as Pat Rabbitt and colleagues have shown in longitudinal studies in Manchester.
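To make the pre-post logic concrete, here is a minimal simulation sketch in Python. All of the numbers (sample size, retest gain, treatment effect, noise) are hypothetical, and a t-test on gain scores is used because it tests the same hypothesis as the treatment-by-time interaction described above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100  # participants per arm (hypothetical)

# Pretest scores: random assignment makes the two arms comparable at baseline.
pre_treat = rng.normal(50, 10, n)
pre_ctrl = rng.normal(50, 10, n)

# Both arms improve simply from taking the test again (a retest effect);
# only the treatment arm receives an additional true gain.
retest, true_gain, noise = 2.0, 3.0, 5.0
post_treat = pre_treat + retest + true_gain + rng.normal(0, noise, n)
post_ctrl = pre_ctrl + retest + rng.normal(0, noise, n)

# Comparing gain scores between arms is equivalent to testing the
# treatment-by-time interaction in a 2 (group) x 2 (time) pre-post design.
t, p = stats.ttest_ind(post_treat - pre_treat, post_ctrl - pre_ctrl)
print(f"interaction: t = {t:.2f}, p = {p:.4f}")  # small p rejects the null
```

Note that the simulated control arm improves too; without a control group, the retest effect alone would masquerade as a treatment benefit.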
Another threat to internal validity is the choice of control group. As my colleague Wally Boot has written, passive control groups that receive no treatment at all are not a good choice. Active control groups are needed to control for factors such as social contact, because epidemiological studies indicate that more social contact is associated with better cognition. Expectation effects on the part of participants may also influence their final scores: they may try harder, or less hard, depending on what they think their treatment should be doing for them. It is very difficult to keep participants in intervention studies blind to their treatment condition, so they may end up with different expectations. Usually the consent forms that participants sign on study entry indicate that there are different treatments to which they may be randomly assigned.

Whole articles and monographs have been written on threats to internal validity and how to deal with them, including different types of statistical analysis: "intent-to-treat" analyses (keep everyone in the analysis whether they followed instructions or not) versus analyses that attempt to estimate treatment effects from those who actually completed the treatment, the compliers, through CACE (complier average causal effect) modeling. The simulation sketch below illustrates how dropout tied to baseline cognition can distort a completers-only comparison.

But one external validity threat seems, in my view, to have been given inadequate attention: the generalizability of the outcome measures. Most often for cognitive interventions, researchers use psychometrically reliable measures that attempt to tap constructs such as memory, attention, executive control, speed of processing, and spatial ability, which are considered important aspects of cognition. But do they? How well do such measures line up with the difficulties with cognition that people experience in everyday life? Would a third of a standard deviation improvement on these types of measures make the difference between living independently in the community or not? Prevent someone from falling victim to financial fraud? Make them a safer driver? On the last question there is some evidence from the ACTIVE clinical trial suggesting that speed-of-processing training results in fewer at-fault crashes compared to control groups. However, we are a long way from moving beyond demonstrations of efficacy to strong evidence of the cost effectiveness of interventions of this type and, more importantly, of comparative effectiveness across different types of interventions. With commercial companies jumping into the fray to provide "brain training", assessing what works has become extremely important and controversial: some groups have argued that even the evidence for efficacy is not yet adequate, whereas others suggest the evidence is encouraging.
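To illustrate why the intent-to-treat distinction matters, here is a second hedged sketch, again with purely hypothetical numbers. It builds in two of the problems described earlier: gains shrink at higher baselines (a ceiling effect), and lower-cognition participants drop out of the more demanding treatment arm more often:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000  # hypothetical sample size
pre = rng.normal(50, 10, n)        # baseline cognition
treat = rng.random(n) < 0.5        # random assignment

# Everyone shows a retest gain (2.0), the true treatment effect is 3.0,
# and gains shrink as baseline rises (a ceiling effect).
gain = 2.0 + 3.0 * treat - 0.3 * (pre - 50) + rng.normal(0, 5, n)

# Lower-cognition participants are more likely to quit, but only in the
# (more demanding) treatment arm; control dropout is low and random.
p_drop = np.where(treat, 1 / (1 + np.exp((pre - 45) / 5)), 0.05)
completed = rng.random(n) > p_drop

# In a real trial the dropouts' post-tests would be missing -- that is the
# whole problem; simulating them lets us see the bias directly.
itt = gain[treat].mean() - gain[~treat].mean()
completers_only = (gain[treat & completed].mean()
                   - gain[~treat & completed].mean())
print(f"true effect 3.0 | intent-to-treat {itt:.2f} | "
      f"completers-only {completers_only:.2f}")
```

Because the treatment completers start at higher baselines, where gains are harder to come by, the completers-only estimate understates the true effect in this particular setup; with dropout related to the outcome in other ways, the bias can run in either direction.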

We are certainly at an exciting juncture in research, where we can progress from chronicling what goes wrong as people age to determining what we can do to mitigate negative age-related changes. ISL hopes to be at the forefront in addressing this challenge.