Performance review

I have a large dataset of performance reviews. Each reviewer answered 20 questions (five 4-item scales) about 7 other people, so every respondent is in turn reviewed by 7 others.

We would like to run a few checks on the quality of the survey, such as:
- Basic validity of questions
- Ceiling effects
- Inter-rater reliability
- Factor structure and scale reliability (do the items load properly, and are the 4-item scales reliable?)
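
To make two of these checks concrete, here is a minimal sketch of a ceiling-effect check and Cronbach's alpha for one 4-item scale. The 5-point response format, the function names, and the demo numbers are assumptions for illustration and are not taken from our data. It also pools all rows and ignores that ratings are nested within reviewers and reviewed people.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_responses, k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # sample variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def ceiling_share(items, max_score):
    """Share of responses per item that sit at the top scale point."""
    items = np.asarray(items)
    return (items == max_score).mean(axis=0)

# Hypothetical ratings: 5 responses on one 4-item scale, 1-5 response format (assumed).
demo = np.array([
    [5, 4, 5, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 5],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
])

print("alpha:", cronbach_alpha(demo))
print("share at ceiling per item:", ceiling_share(demo, max_score=5))
```

In practice this would run once per scale on the full set of rows, with one row per reviewer and reviewed-person pair.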

However, we keep running into the problem that each person reviews at most 7 people, so we can't analyze cells larger than 7.
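
To show the cell structure we mean, here is a minimal sketch of a one-way intraclass correlation, which is commonly used when each reviewed person is rated by a different set of raters. It treats every reviewed person as one cell of 7 ratings and pools the variance estimates across all cells. The function name, the simulated data, and the assumption of exactly 7 ratings per person are illustrative, and whether this design fits our data is part of the question.

```python
import numpy as np

def icc_oneway(ratings):
    """One-way random-effects ICC for an (n_targets, k_raters) array.

    Returns (single-rater ICC, ICC of the k-rater average).
    Columns are just rating slots; raters may differ between targets.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    row_means = ratings.mean(axis=1)
    # Between-target mean square: k times the variance of the target means.
    ms_between = k * row_means.var(ddof=1)
    # Within-target mean square: pooled squared deviations from each target's mean.
    ms_within = ((ratings - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc_average = (ms_between - ms_within) / ms_between
    return icc_single, icc_average

# Simulated example (not real data): 50 reviewed people, 7 ratings each.
rng = np.random.default_rng(0)
true_score = rng.normal(3.0, 1.0, size=(50, 1))
simulated = true_score + rng.normal(0.0, 1.0, size=(50, 7))

single, average = icc_oneway(simulated)
print("single-rater ICC:", single)
print("ICC of the 7-rater mean:", average)
```

The input would be one scale score per rating, arranged with one row per reviewed person.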

Does anyone have recommendations on useful tests or methods? Thanks in advance!