Power issues with mixed models

Hi everyone, I have an experiment in which we're interested in how working memory load affects performance in a theory-of-mind task. The independent variable is working memory load and the dependent variable is theory-of-mind task performance.

In our within-subjects design, we have 2 working memory conditions (high load and low load). We have 2 stimulus sets (one for each working memory condition, counterbalanced across participants) to prevent carry-over effects between working memory conditions.
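
To make the counterbalancing concrete, here is a minimal sketch of the assignment in Python. The set labels (`set_A`, `set_B`), the participant numbering, and the simple alternating rule are placeholders for illustration, not our actual procedure; the only point is that each set is paired with each load condition equally often.

```python
CONDITIONS = ("high_load", "low_load")
STIMULUS_SETS = ("set_A", "set_B")  # hypothetical labels for the two sets


def assign(n_participants):
    """Alternate the set-to-condition pairing across participants.

    Odd-numbered participants get set_A under high load and set_B under
    low load; even-numbered participants get the reverse pairing.
    """
    pairings = [
        dict(zip(CONDITIONS, STIMULUS_SETS)),
        dict(zip(CONDITIONS, reversed(STIMULUS_SETS))),
    ]
    return {pid: pairings[(pid - 1) % 2] for pid in range(1, n_participants + 1)}


design = assign(4)
for pid, pairing in design.items():
    print(pid, pairing)
```

So every participant sees both load conditions and both sets, but never the same set under both conditions.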

The problem is that performance in our task is driven not only by working memory condition but also, to a large extent, by stimulus set. My colleague believes this can be dealt with using mixed models. However, I have some doubts about this approach, and would rather take the time to adjust the stimulus sets so that, under the same working memory condition, each subject's performance is as comparable as possible across the two stimulus sets.
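
For reference, my guess at the kind of model my colleague has in mind is something like the following. This is my own assumption about the specification (a fixed effect for stimulus set plus a random intercept per participant), not something we have agreed on:

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{load}_{ij} + \beta_2\,\mathrm{set}_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here y_ij is the theory-of-mind performance of participant i in condition j, load and set are 0/1 indicators for working memory condition and stimulus set, and u_i is the participant's random intercept.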

If I were to take my colleague's advice, would there be any issues, for example with statistical power? And are there any resources you could point me to so that I can understand this issue better?
