Hello, I have a clinical data set that consists of 5 clinical measurements on thousands of tissue samples. Each sample also has a pathology diagnosis that is 1 of 5 possible diagnoses (all different types of tumors). I am interested in predicting which pathologic class future samples will belong to, based on the 5 clinical measurements. I recognize the predictor can be built using machine learning, and I will apply both decision trees and a deep learning method to the data soon. However, I first want to explore simpler analyses that the machine learning findings can be compared against. An example of the structure of the data is below, averaged across all samples within each tumor type. The numbers are fake.
Is there a statistical approach that could indicate which clinical tests are "important" for which diagnoses? For example, could it determine that Tumor Type 5 is best classified by Blood Test 2 > 15, Biopsy Test 1 < 13, and Imaging Test 1 > 2?
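To make the kind of rule I mean concrete, here is a rough sketch in Python. The field names, the toy rows, and the thresholds are all made up for illustration; my real data has one row per sample with the 5 measurements and the diagnosis.

```python
# Hypothetical per-sample records; every name and value here is invented.
samples = [
    {"blood_test_2": 17.1, "biopsy_test_1": 9.4, "imaging_test_1": 3.5, "diagnosis": 5},
    {"blood_test_2": 12.0, "biopsy_test_1": 9.0, "imaging_test_1": 3.0, "diagnosis": 1},
    {"blood_test_2": 16.2, "biopsy_test_1": 14.8, "imaging_test_1": 2.6, "diagnosis": 3},
    {"blood_test_2": 18.9, "biopsy_test_1": 11.2, "imaging_test_1": 2.4, "diagnosis": 5},
]


def looks_like_type_5(sample):
    """Example threshold rule from the question; the cutoffs are placeholders."""
    return (
        sample["blood_test_2"] > 15
        and sample["biopsy_test_1"] < 13
        and sample["imaging_test_1"] > 2
    )


# Apply the rule to every sample and compare with the pathology label.
predicted = [looks_like_type_5(s) for s in samples]
actual = [s["diagnosis"] == 5 for s in samples]

# Fraction of samples where the rule and the label agree.
agreement = sum(p == a for p, a in zip(predicted, actual)) / len(samples)
print(f"Rule agrees with the label on {agreement:.0%} of the toy samples")
```

What I am after is a principled way to arrive at cutoffs like these from the data, and to say how much each test contributes, rather than picking them by eye.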
Do any other analysis methods jump out at you besides machine learning that I should consider?
Thanks,
Jason
Code:
                Tumor Type 1   Tumor Type 2   Tumor Type 3   Tumor Type 4   Tumor Type 5
Blood Test 1    15.3 +/- 3.2   21.8 +/- 4.3   8.2 +/- 2.3    8.2 +/- 2.3    8.2 +/- 2.3
Blood Test 2    13.4 +/- 3.8   15.9 +/- 3.2   22.8 +/- 11.1  8.2 +/- 2.3    8.2 +/- 2.3
Biopsy Test 1   3.2 +/- 1.3    10.2 +/- 2.9   23.9 +/- 1.2   8.2 +/- 2.3    8.2 +/- 2.3
Biopsy Test 2   3.2 +/- 1.3    10.2 +/- 2.9   23.9 +/- 1.2   8.2 +/- 2.3    8.2 +/- 2.3
Imaging Test 1  3.2 +/- 1.3    10.2 +/- 2.9   23.9 +/- 1.2   8.2 +/- 2.3    8.2 +/- 2.3