I have data for 141 patients, and I want to analyze them with two formulas. Formula A needs only two variables (Va, Vb); Formula B needs five variables (Va, Vb, Vc, Vd, Ve).

As you can see, the two variables Formula A needs, Va and Vb, are also used by Formula B.

All 141 patients have Va and Vb, but only 121 of them have Vc, Vd and Ve. In other words, Vc, Vd and Ve are missing for 20 patients.

I therefore call the 121 patients with complete data Subgroup 1, and the 20 patients with incomplete data Subgroup 2.
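A minimal sketch of that split, assuming each patient is stored as a Python dict with keys Va to Ve and `None` marking a missing value. The record layout, the function names `is_complete` and `split_subgroups`, and the toy values are hypothetical illustrations, not the actual dataset. Formula A can still be applied to both subgroups; Formula B only to `subgroup1`.

```python
from __future__ import annotations

# Variables that only Formula B needs; Va and Vb are present for every patient.
FORMULA_B_EXTRA = ("Vc", "Vd", "Ve")


def is_complete(patient: dict) -> bool:
    """True if the patient has every extra variable Formula B needs."""
    return all(patient.get(name) is not None for name in FORMULA_B_EXTRA)


def split_subgroups(patients: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split patients into Subgroup 1 (complete) and Subgroup 2 (incomplete)."""
    subgroup1 = [p for p in patients if is_complete(p)]
    subgroup2 = [p for p in patients if not is_complete(p)]
    return subgroup1, subgroup2


# Toy records for illustration only (not the study data); None marks a missing value.
patients = [
    {"id": 1, "Va": 1.0, "Vb": 2.0, "Vc": 3.0, "Vd": 4.0, "Ve": 5.0},
    {"id": 2, "Va": 1.5, "Vb": 2.5, "Vc": None, "Vd": None, "Ve": None},
    {"id": 3, "Va": 0.8, "Vb": 1.9, "Vc": 2.2, "Vd": 3.1, "Ve": 4.4},
]
subgroup1, subgroup2 = split_subgroups(patients)
print([p["id"] for p in subgroup1])  # [1, 3]
print([p["id"] for p in subgroup2])  # [2]
```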

I want to know whether age differs between the groups, so I compared mean ages with Student's t-test. There is no significant difference between Subgroup 1 and the whole population (p = 0.3), but there are significant differences (p < 0.001) between Subgroup 2 and Subgroup 1, and between Subgroup 2 and the whole population.
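For reference, here is a minimal sketch of the pooled-variance (Student's) two-sample t statistic using only the Python standard library. The function name `student_t` and the ages are hypothetical toy values, not the study data. It compares the two disjoint subgroups, because the independent-samples t-test assumes the two samples share no patients, whereas Subgroup 1 is itself part of the whole population.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance (Student's) two-sample t statistic and degrees of freedom."""
    n1, n2 = len(a), len(b)
    # statistics.variance uses the sample (n - 1) denominator.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    # Standard error of the difference in means under the equal-variance assumption.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / se
    return t, n1 + n2 - 2


# Toy ages for illustration only (not the study data).
ages_subgroup1 = [50, 52, 48, 51, 49]
ages_subgroup2 = [60, 62, 58]
t, df = student_t(ages_subgroup1, ages_subgroup2)
print(round(t, 3), df)  # -7.906 6
```

The p-value comes from comparing t with a t distribution on `df` degrees of freedom. `scipy.stats.ttest_ind(a, b)` returns both the statistic and the p-value; passing `equal_var=False` gives Welch's test, which drops the equal-variance assumption and is often preferred when group sizes are as unequal as 121 and 20.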

Here is my problem: can the Formula B result from Subgroup 1 adequately represent the whole population?

*I cannot simply discard the 20 patients with incomplete data, because I still need their Formula A results in my study.

I wish everyone a good day!