Comparing the same number of measures derived from samples of different underlying N

I have data from previous research by Brielmann and Pelli (2019), in which 900 images were rated for "beauty", "valence" and "arousal" on a 1-5 Likert scale. The total N was 757, but each image was rated by a different number of these participants. The images fall into 4 subsets, each rated by a different number of people (for example, the first subset was rated by n = 368 and the fourth by n = 104). So the first image, which belongs to the second subset, was rated by 140 people; the second image, which belongs to the fourth subset, was rated by 104; and so on. Here is the full data set:
The purpose of my research is to compare the ratings of a sample of people who meditate with the ratings from the study above. My alternative hypothesis is that my sample will differ from that population in its "beauty" and "valence" ratings. Since my participants cannot rate all 900 images, I will draw a random sample of images (e.g., 40) and present them to my sample (e.g., 30 people), who will provide my data. The comparisons will be on the per-image mean ratings (the second and fifth columns of the data).
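A minimal sketch of the random draw described above, assuming the images can be indexed 0 to 899 (the study's own image IDs may be different) and using an arbitrary, hypothetical seed so that the selection can be reported and reproduced:

```python
import random

N_IMAGES = 900   # total images in Brielmann and Pelli (2019)
N_SELECTED = 40  # example number of images from the post


def select_images(n_images: int, k: int, seed: int) -> list[int]:
    """Draw k distinct image indices (0-based) uniformly at random, reproducibly."""
    rng = random.Random(seed)  # seeded generator: same seed -> same selection
    return sorted(rng.sample(range(n_images), k))  # sampling without replacement


selected = select_images(N_IMAGES, N_SELECTED, seed=2019)  # seed is arbitrary
print(selected)
```

Reporting the seed alongside the selected indices lets others check which images were used.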
The procedure would be as follows: for each randomly selected image, I will take the existing mean rating (40 scores in total) and compare it with the mean rating from my sample for the same image (another 40 scores). My expectation is that the beauty and valence ratings of my sample will differ from those of the population in the original study.
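One common way to frame this procedure, offered here as an assumption and not as the method of the original study, is to treat each image as the unit of analysis and test the 40 paired differences (new mean minus original mean). The sketch below computes a paired t statistic and a Monte Carlo sign-flip permutation p-value using only the Python standard library; the function names and toy numbers are hypothetical. If SciPy is available, `scipy.stats.ttest_rel` gives the usual paired t-test.

```python
import math
import random
import statistics


def paired_t(orig_means, new_means):
    """Paired t statistic on per-image differences (images are the units)."""
    diffs = [b - a for a, b in zip(orig_means, new_means)]
    n = len(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD with n - 1 denominator
    t = statistics.fmean(diffs) / (sd_d / math.sqrt(n))
    return t, n - 1  # statistic and degrees of freedom


def sign_flip_p(orig_means, new_means, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo sign-flip permutation p-value for the mean difference."""
    diffs = [b - a for a, b in zip(orig_means, new_means)]
    observed = abs(statistics.fmean(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        # Under H0 each difference is equally likely to be positive or negative.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # +1 avoids reporting p = 0


# Toy values for illustration only, not data from either study.
orig = [3.0, 3.5, 2.8, 4.1, 3.3]
new = [3.4, 3.9, 3.0, 4.4, 3.9]
t, df = paired_t(orig, new)
p = sign_flip_p(orig, new)
print(f"t = {t:.2f}, df = {df}, p = {p:.3f}")
```

The sign-flip version avoids assuming normally distributed differences, which may matter with bounded 1-5 ratings.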
My problem is this: the comparison is not between participants but between scores (40 vs. 40). Each of these scores, however, is based on a different number of raters (e.g., for one image, 368 vs. 30 people; for another, 140 vs. 30; and so on). The variance of each score is therefore different, even though the comparison itself involves equal numbers of scores (40 vs. 40). What statistical method is most appropriate in this situation? What role do the underlying samples play, and how do they affect my analysis?
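The underlying n matters because each per-image mean has a standard error of SD / sqrt(n): with a hypothetical rating SD of 1, a mean from 368 raters has a standard error of about 0.052, while a mean from 30 raters has about 0.183. One way to let that precision enter the comparison, sketched below under the assumption that a per-image SD is available for both groups, is inverse-variance (fixed-effect) weighting of the per-image differences. The class and field names are hypothetical, and the sketch treats images as independent, which is questionable here because the same 30 participants would rate all 40 images.

```python
import math
from dataclasses import dataclass


@dataclass
class ImageSummary:
    """Ratings summary for one image in one group (hypothetical field names)."""
    mean: float
    sd: float
    n: int

    @property
    def se(self) -> float:
        # Standard error of the mean shrinks with sqrt(n).
        return self.sd / math.sqrt(self.n)


def weighted_difference(orig, new):
    """Inverse-variance pooled mean difference (new - orig) across images.

    ASSUMPTION: images are independent; this is violated if the same
    raters score every image, so the pooled SE is likely too small.
    """
    weights, diffs = [], []
    for o, m in zip(orig, new):
        var_d = o.se ** 2 + m.se ** 2  # groups are independent samples
        weights.append(1.0 / var_d)    # more precise images get more weight
        diffs.append(m.mean - o.mean)
    total_w = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, diffs)) / total_w
    return pooled, math.sqrt(1.0 / total_w)


# Same hypothetical SD of 1, different numbers of raters:
print(round(ImageSummary(mean=3.0, sd=1.0, n=368).se, 3))  # 0.052
print(round(ImageSummary(mean=3.0, sd=1.0, n=30).se, 3))   # 0.183
```

In this setup the 30-person means dominate each image's variance, so the original study's larger n adds comparatively little precision to each difference.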
A statistician I know told me: «I need a profile analysis with repeated measures, where every "profile" refers to each sample group and every "repeated" measurement refers to the two measures I have for every sample.» Can someone decipher what this means? Do you have any other recommendations or objections? Alternatively, should I just recruit a larger sample (more than 104 people) and compare it only with the images rated by the subset with the lowest n (subset 4, n = 104), so that the underlying samples are comparable?
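One possible reading of the statistician's suggestion, offered as an interpretation rather than what they necessarily meant: each group (the original raters and the meditators) has a "profile" of two means, beauty and valence, and profile analysis asks three questions about those profiles. Parallelism asks whether the beauty minus valence gap differs between groups; levels asks whether the groups differ on the two scales averaged; flatness asks whether beauty and valence differ when averaged over groups. With only two measures, each question reduces to a test on a single contrast. The sketch below assumes images are the units and, because both groups rate the same images, uses paired t statistics on per-image contrasts; the function names are hypothetical.

```python
import math
import statistics


def paired_t(xs, ys):
    """Paired t statistic for ys - xs, with images as the units."""
    diffs = [y - x for x, y in zip(xs, ys)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / se


def profile_contrasts(orig_beauty, orig_valence, new_beauty, new_valence):
    """Two-measure profile analysis reduced to per-image contrasts (a sketch)."""
    # Parallelism: does the beauty-minus-valence gap differ between groups?
    orig_gap = [b - v for b, v in zip(orig_beauty, orig_valence)]
    new_gap = [b - v for b, v in zip(new_beauty, new_valence)]
    # Levels: does the average of the two scales differ between groups?
    orig_level = [(b + v) / 2 for b, v in zip(orig_beauty, orig_valence)]
    new_level = [(b + v) / 2 for b, v in zip(new_beauty, new_valence)]
    # Flatness: averaging over both groups, do beauty and valence differ?
    mean_beauty = [(a + b) / 2 for a, b in zip(orig_beauty, new_beauty)]
    mean_valence = [(a + b) / 2 for a, b in zip(orig_valence, new_valence)]
    return {
        "parallelism_t": paired_t(orig_gap, new_gap),
        "levels_t": paired_t(orig_level, new_level),
        "flatness_t": paired_t(mean_valence, mean_beauty),
    }
```

Each statistic has len(images) - 1 degrees of freedom. A levels effect would correspond to the hypothesis in the post (the groups differ overall), while a parallelism effect would mean the groups differ more on one scale than the other.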