Measuring equivalence/reliability (ICC and related measures)

For a reliability study I’m administering patient-reported outcomes (questionnaires) with 2 modes of administration (on paper and via an electronic device). I’d like to measure the equivalence/reliability of the 2 modes of administration. In other words: does the electronic version of the questionnaire yield results comparable to the paper version (for an individual patient)?

Study set-up:
Patients are randomised into one of 2 groups: (1) a group that fills in the paper-based version first and (2) a group that fills in the electronic version first. Both groups complete both modes of administration. There are 10 patients in each group (total n=20).

About the questionnaires:
Patients completed 3 questionnaires in both modes of administration. Each question in the questionnaires was scored on a 1-10, 0-3 or 1-5 scale. Total scores and subscale scores (if present) were calculated.

My intuition tells me to use the ICC to measure equivalence. However, the data are not always normally distributed (I performed a Kolmogorov–Smirnov test on the differences between the 2 scores, since the data are paired). Because of this, it seems wrong to use the ICC as a statistical method to measure equivalence. Which methods can I use to correctly assess equivalence/reliability between the 2 modes of administration (paper vs. electronic questionnaire)? Keep in mind that I only have 10 patients per group and that not all variables/scores are normally distributed.
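To make the intended analysis concrete, here is a minimal sketch in Python of what I mean: a normality check on the paired differences, followed by an ICC(2,1) (two-way random effects, absolute agreement, single measurement, per Shrout & Fleiss), which I believe corresponds to SPSS's Reliability Analysis with "Two-Way Random" / "Absolute Agreement". The data below are simulated placeholders, not my study data, and the function name `icc_2_1` is just my own label:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical paired total scores for n = 20 patients (simulated,
# NOT the actual study data): paper vs. electronic administration.
true_score = rng.normal(50, 10, 20)
paper = true_score + rng.normal(0, 3, 20)
electronic = true_score + rng.normal(0, 3, 20)

# Normality check of the paired differences. (Shapiro-Wilk is often
# recommended over KS at this sample size; both shown for comparison.)
diff = paper - electronic
z = (diff - diff.mean()) / diff.std(ddof=1)
ks_p = stats.kstest(z, "norm").pvalue
sw_p = stats.shapiro(diff).pvalue

def icc_2_1(x, y):
    """ICC(2,1): two-way random effects, absolute agreement, single rater."""
    data = np.column_stack([x, y])
    n, k = data.shape
    grand = data.mean()
    ms_rows = k * ((data.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_cols = n * ((data.mean(axis=0) - grand) ** 2).sum() / (k - 1)
    resid = data - data.mean(axis=1, keepdims=True) - data.mean(axis=0) + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

print("KS p =", round(ks_p, 3), "SW p =", round(sw_p, 3))
print("ICC(2,1) =", round(icc_2_1(paper, electronic), 3))
```

My question is essentially whether this ICC is still defensible when the normality checks fail at n = 20, or whether a non-parametric alternative should replace it.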

I’m using SPSS for the statistical analysis.

Kind regards,