Selecting the best subjects' data and features to optimize the analysis


I am not very experienced with statistical analysis, so I am posting my case here and would appreciate your suggestions.
My case: I have data from several subjects, and each subject completed two similar runs at different times. The data cover 78 items that belong to two categories (category 1 = 38 items, category 2 = 40 items).

My purpose: I need an analysis that sorts the items by category. Because of how the experiment was designed, category 1 items are expected to receive the highest ratings after the analysis. It is fine to exclude subjects, features, and items.
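One way to put a number on "category 1 gets the highest ratings" is the area under the ROC curve (AUC): the probability that a randomly chosen category 1 item scores higher than a randomly chosen category 2 item. It equals the Mann-Whitney U statistic divided by n1 × n2. The sketch below assumes you already have one score per item (for example a group-mean rating); the function name and the toy numbers are hypothetical.

```python
from typing import Sequence

def category_auc(cat1_scores: Sequence[float], cat2_scores: Sequence[float]) -> float:
    """Fraction of (category 1, category 2) item pairs in which the category 1
    item scores higher; ties count as half. 1.0 = perfect ordering, 0.5 = chance."""
    wins = 0.0
    for a in cat1_scores:
        for b in cat2_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(cat1_scores) * len(cat2_scores))

# Hypothetical item scores, not real data
cat1 = [0.9, 0.8, 0.7]
cat2 = [0.6, 0.75, 0.2]
print(category_auc(cat1, cat2))  # 8 of the 9 pairs are ordered as expected
```

Because it uses only the ordering of the scores, this measure does not depend on the rating scale. It can be computed per subject and then summarized across subjects.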

My idea is to first choose the subjects whose values follow a similar pattern, then choose the most informative features of each category for each subject, and then keep the items with a good pattern of features. Finally, I would apply a method that ranks the items at the subject level and then at the group level.
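Each subject has two runs, so one way to judge "a similar pattern of values" without looking at the category labels is each subject's run-to-run (test-retest) correlation across items. This is a minimal sketch under that assumption. The helper names, the dictionary layout, and the 0.5 cut-off are hypothetical, and any cut-off should be fixed before looking at the category results. The criterion does not use the labels directly, but it can still indirectly favor subjects with larger item differences.

```python
import math
from typing import Dict, List, Sequence, Tuple

Runs = Dict[str, Tuple[Sequence[float], Sequence[float]]]

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def run_reliability(runs: Runs) -> Dict[str, float]:
    """Map each subject ID to the correlation of its item scores in run 1 vs run 2."""
    return {subj: pearson(r1, r2) for subj, (r1, r2) in runs.items()}

def reliable_subjects(runs: Runs, threshold: float = 0.5) -> List[str]:
    """Subjects whose two runs agree at least `threshold` (hypothetical cut-off)."""
    return [s for s, r in run_reliability(runs).items() if r >= threshold]

# Hypothetical toy data with 4 items per run (the real data would have 78)
runs: Runs = {
    "s1": ([1, 2, 3, 4], [2, 4, 6, 8]),  # same ordering in both runs
    "s2": ([1, 2, 3, 4], [4, 3, 2, 1]),  # ordering reversed in run 2
}
print(run_reliability(runs))
print(reliable_subjects(runs))
```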
I have already tried SVM classification, but the data are not linearly separable and the accuracy was very low (about 50%, which is roughly chance level for two nearly balanced classes), so the predictions did not give the results I expected.
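To check whether an accuracy like this differs from guessing, you can compare it with the majority-class baseline and with a one-sided binomial test. This rough sketch assumes one independent prediction per item (n = 78) and a 50% guessing rate. Cross-validated predictions are not fully independent, so a permutation test is the more careful option. The "39 correct" figure is only an illustration of "about 50%".

```python
from math import comb

def binomial_p_at_least(k: int, n: int) -> float:
    """One-sided p-value: probability of k or more correct out of n
    when each prediction is a 50/50 guess."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

n_items = 78
majority_baseline = 40 / n_items  # always predicting the larger class (category 2)
print(f"majority-class baseline: {majority_baseline:.3f}")
print(f"p(>= 39 correct by chance): {binomial_p_at_least(39, n_items):.3f}")
```

A p-value well above 0.05 here means the classifier has not shown it does better than chance.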

Any suggestions will be appreciated.


Why do you say it is fine to exclude subjects? It's almost never OK to just exclude subjects, especially if the reason is that they don't meet your expectations. If you cherry-pick like that, any model you build will be fictitious and will not represent or predict future data, because you are overfitting your sample dataset.
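A small simulation shows why. It assumes ratings that are pure Gaussian noise, so there is no real category difference. It then keeps only the subjects whose category 1 mean happens to exceed their category 2 mean. The subject count and the noise model are hypothetical, and the 38/40 item split comes from the question.

```python
import random
import statistics

random.seed(0)
N_SUBJECTS, N_CAT1, N_CAT2 = 200, 38, 40

def null_subject_effect() -> float:
    """Category 1 mean minus category 2 mean for one simulated subject
    whose ratings are pure noise (no true category difference)."""
    cat1 = [random.gauss(0, 1) for _ in range(N_CAT1)]
    cat2 = [random.gauss(0, 1) for _ in range(N_CAT2)]
    return statistics.mean(cat1) - statistics.mean(cat2)

effects = [null_subject_effect() for _ in range(N_SUBJECTS)]
# "Keep the subjects that behave as expected", i.e. category 1 rated higher
kept = [e for e in effects if e > 0]

print(f"all {len(effects)} subjects: mean effect = {statistics.mean(effects):+.3f}")
print(f"kept {len(kept)} subjects: mean effect = {statistics.mean(kept):+.3f}")
```

The kept group shows a positive "category effect" even though none exists. Selecting subjects, features, or items on the outcome you hope to find produces this kind of result.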


What is plotted on the axes of the uploaded figure?
I used a toolbox function to plot the data. The axes show the feature values in the first two dimensions of the data:
% View the distribution of the data in feature space, projected onto the
% two-dimensional plane specified by 'dim'.
slr_view_data(labels, training_data); % view feature values of the first two dimensions of "x"
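For readers without the toolbox, the projection that the comment describes can be sketched as follows. The sketch groups each item's values in two chosen feature dimensions by class label. `project_by_class` is a hypothetical helper written to mirror the comment. It is not the toolbox's implementation, and the toy data are made up.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def project_by_class(
    labels: Sequence[int],
    data: Sequence[Sequence[float]],
    dims: Tuple[int, int] = (0, 1),
) -> Dict[int, List[Tuple[float, float]]]:
    """Return, for each label, the (x, y) points of its rows in the two chosen
    feature dimensions: the coordinates a 2-D scatter plot would show."""
    points: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for label, row in zip(labels, data):
        points[label].append((row[dims[0]], row[dims[1]]))
    return dict(points)

# Hypothetical toy data: 3 items, 3 features each
labels = [1, 1, 2]
data = [[0.1, 0.2, 5.0], [0.3, 0.4, 6.0], [0.9, 0.8, 7.0]]
print(project_by_class(labels, data))
print(project_by_class(labels, data, dims=(0, 2)))
```

A plot of only two dimensions can show heavy overlap between classes that still separate when all features are used. For that reason, the plot alone cannot show whether the data are separable.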