Cognitive data: multiple outcome measures, several explanatory factors and repeated testing.

I have a dataset to analyze, but since its complexity level is... well, complex, I would appreciate some feedback.

Sorry for the wall of text, but I wanted to give a comprehensive picture of the material.

The outcome measures are results from nine cognitive tests in all. Each test yields multiple outcomes, and there is a sad lack of consistency in which outcome measures are used and reported in the literature. One of the tests is the RAVLT (Rey Auditory Verbal Learning Test, a classic verbal memory test from 1952 or thereabouts), and even for an old and well-established test such as the RAVLT, people seem to be very creative about which of its outcomes to use.

Examples of additional test outcomes are the number or percentage of correct responses, or the total number of trials needed to complete a test. Some tests also have outcomes that reflect increasing difficulty levels. I do not know to what extent the test results co-vary. I also suspect that some measures (e.g. latencies of correct responses) could co-vary between tests, while outcome measures within a test might not (e.g. latency and outcome in the most difficult version of a test).
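A pairwise correlation table is one quick way to get a first feel for how outcome measures co-vary, both within and between tests. The sketch below is a minimal Python illustration with made-up numbers; the column names (`ravlt_total`, `ravlt_delayed`, `test2_latency_ms`) are hypothetical. It uses pairwise-complete observations, so each coefficient may rest on a different subset of patients, which matters with this much missing data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation using only pairs where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return None  # too few complete pairs to say anything useful
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # a constant variable has no defined correlation
    return sxy / sqrt(sxx * syy)


def correlation_table(outcomes):
    """Return {(name_a, name_b): r} for every pair of outcome measures."""
    return {
        (a, b): pearson(outcomes[a], outcomes[b])
        for a, b in combinations(sorted(outcomes), 2)
    }


# Hypothetical baseline scores for five patients; None = not tested.
outcomes = {
    "ravlt_total": [45, 38, None, 52, 41],
    "ravlt_delayed": [9, 7, None, 12, 8],
    "test2_latency_ms": [820, 910, 760, None, 880],
}

for (a, b), r in correlation_table(outcomes).items():
    print(a, b, "n/a" if r is None else round(r, 2))
```

With real data, a rank-based coefficient may suit skewed measures such as latencies better; Pearson is used here only because it is the simplest to show.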

There are two treatment groups. N ≈ 100 / group.

One treatment is known to affect cognition negatively, or at least some cognitive functions (perhaps not all), although of course not everyone experiences this. Furthermore, some individuals are so sick at baseline (before receiving any treatment) that they show dramatic improvements in cognitive functioning, which overshadow any amnesic effects they might be suffering from at the same time. The second treatment probably has, if anything, pro-cognitive effects.

Depression severity (quantified with the MADRS) is an important factor, but, as indicated in the preceding paragraph, the relationship is not straightforward. The level of cognitive disability at baseline is not really reflected by the total depression score, but two sub-measures (“concentration difficulties” and “lack of initiative”) intuitively relate more directly to cognitive functioning.

Tests were done at baseline, during the treatment period, and at a number of follow-up sessions; an individual could participate in at most six test sessions. Over time the number of participating patients decreases, and there is, of course, selection bias among participants. I assume competitive or dutiful personalities were more likely to agree to participate, as were individuals who benefitted from their treatments. Many of the participants who did not respond at all to their allocated treatment were tested only at baseline, or at baseline and at the session during their treatment period. Other participants were unwilling, or only partly willing, to be tested: some wanted to participate only in verbal memory testing, while others were willing to participate in everything except verbal memory testing. There are thus missing data, sometimes because of unwillingness, sometimes because of inability, and sometimes for other reasons. At each tested time point there is a depression severity score (or subscale score).
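To make the missing-data point concrete: a classical repeated-measures (M)ANOVA needs a complete set of sessions per patient, so anyone with a missing session is dropped (listwise deletion), whereas a mixed model fitted to data in long format (one row per patient per session) can use every observed score. The minimal Python sketch below, with hypothetical patient IDs and scores, counts both. The mixed-model route assumes the data are "missing at random", i.e. that missingness can be explained by variables in the model (such as treatment response or MADRS); dropout driven by unobserved cognitive performance itself would still bias the estimates.

```python
# Hypothetical wide-format records: one entry per patient, one score per
# session (six sessions). None marks a session the patient did not attend
# or a test they declined.
wide = {
    "p01": [44, 47, 50, 51, 53, 52],
    "p02": [30, 35, None, None, None, None],
    "p03": [40, None, 42, 45, None, None],
}


def to_long(wide):
    """Turn {patient: [score per session]} into (patient, session, score)
    rows, keeping every observed score and simply omitting missing ones.
    This is the layout a mixed model uses; it is only unbiased if the
    missingness is at random given the modelled covariates (an assumption)."""
    return [
        (pid, session, score)
        for pid, scores in wide.items()
        for session, score in enumerate(scores, start=1)
        if score is not None
    ]


def complete_cases(wide):
    """Patients with a score at every session: what a listwise-deleting
    repeated-measures (M)ANOVA would keep."""
    return [pid for pid, scores in wide.items() if None not in scores]


long_rows = to_long(wide)
print(len(long_rows), "usable observations in long format")  # 6 + 2 + 3 = 11
print(len(complete_cases(wide)), "patient(s) left after listwise deletion")  # 1
```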

So, in summary: two treatment groups and a multitude of cognitive outcomes, measured repeatedly over time. Age, treatment and treatment outcome, as well as MADRS scores at the time of testing (i.e. MADRS as a time-varying covariate that is measured repeatedly and can fluctuate), are obviously important variables.
Gender, site and perhaps other variables that I have not thought of or do not remember right now (previous number of depressive episodes, treatments received during the follow-up period, definite relapses as opposed to mere mood fluctuations, etc.) might also be important.

I am a bit wary of modelling insofar as I can conceive of different variables cancelling each other out without the model being able to pick it up (if that makes sense?). I also often have the feeling that something is lacking when the results of more advanced statistical modelling are presented in, e.g., clinical trials, and that many authors still fall back on discussing the results as if they had simply compared this one measure between two groups (but I stray and might actually be writing nonsense, so sorry...).

What I am interested in here is, of course, both to pick up signals (negative or positive) that I would not otherwise have picked up myself, and to compare the cognitive (side-)effects between the groups for the tests that have been used. I am also interested in co-variations as such: are there outcomes that co-vary as a function of response to the treatments received, and as a function of the treatment per se? We have a group of patients who report persisting cognitive deficits, so there is particular interest both in whether any objective measurements support this experience and, if so, whether it is detectable in several test results. The two treatments are not equally effective with regard to remission and response rates or in providing symptomatic relief, and they certainly have different cognitive side-effect profiles as well as age-dependent therapeutic effects.

What is the best way to approach the material?

I understand that MANOVAs/MANCOVAs were developed for multiple dependent variables, but perhaps there are drawbacks to these methods that I should at least be aware of? Are linear mixed models fitted “per outcome” an option? I am not expecting definitive answers, of course, but any pointers and general advice about which directions to take and what to bear in mind are much appreciated!
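For concreteness, one possible way to write a single per-outcome linear mixed model with the variables mentioned in this post is sketched below. It is a hypothetical specification for illustration, not a recommendation: the random intercept per patient, the treatment-by-session interaction and the choice of covariates are all assumptions. Here $y_{ij}$ is one outcome for patient $i$ at session $j$, $\mathbf{s}_j$ is a vector of session indicators (dummy-coded relative to baseline), and $u_i$ is the patient's random intercept.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1\,\mathrm{Treat}_i
        + \boldsymbol{\gamma}^{\top}\mathbf{s}_{j}
        + \boldsymbol{\delta}^{\top}\left(\mathrm{Treat}_i\,\mathbf{s}_{j}\right)
        + \beta_2\,\mathrm{MADRS}_{ij} + \beta_3\,\mathrm{Age}_i
        + u_i + \varepsilon_{ij},\\
u_i &\sim \mathcal{N}(0,\sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0,\sigma^2)
\end{aligned}
```

Because $\mathrm{MADRS}_{ij}$ carries both a patient and a session index, it enters as a time-varying covariate, whereas $\mathrm{Treat}_i$ and $\mathrm{Age}_i$ are constant within a patient. The coefficients $\boldsymbol{\delta}$ capture whether the change over sessions differs between treatments. Fitting such a model separately for each outcome is what "per outcome" would mean, and with many outcomes the number of tests grows accordingly. In SPSS, models of this kind are fitted with the MIXED procedure.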

I use SPSS, and remember I am more or less an idiot when it comes to statistics (so be kind).