Data splitting

UofA

Hello all,

I am trying to split my data into a training data set for fitting a regression model and a prediction data set for testing it. I have 130 samples; the prediction data set needs 25 observations and the remaining 105 go to the training data set.
I am following the four steps of the DUPLEX algorithm for data splitting. For anyone interested, it is described in Montgomery, D. et al., "Introduction to Linear Regression Analysis", 2001.
So far I have completed three of the four steps and have obtained a matrix of orthonormalized points.
Next I have to use the orthonormalized points to calculate the Euclidean distance between every pair of rows. I think this can be done in R.
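
For reference, R's built-in dist() function computes these pairwise distances (Euclidean is its default method). If you go the C++ route instead, a minimal sketch could look like the following. It assumes the orthonormalized matrix has already been loaded as a std::vector<std::vector<double>> with one observation per row; the function names euclidean and distance_matrix are only illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean distance between two rows of the orthonormalized matrix.
double euclidean(const std::vector<double>& p, const std::vector<double>& q) {
    assert(p.size() == q.size());  // both rows must have the same number of columns
    double sum = 0.0;
    for (std::size_t k = 0; k < p.size(); ++k) {
        const double d = p[k] - q[k];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Symmetric n x n matrix of distances between all pairs of rows.
// Only the upper triangle is computed; it is mirrored into the lower one.
std::vector<std::vector<double>> distance_matrix(
        const std::vector<std::vector<double>>& rows) {
    const std::size_t n = rows.size();
    std::vector<std::vector<double>> dist(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            dist[i][j] = dist[j][i] = euclidean(rows[i], rows[j]);
        }
    }
    return dist;
}
```

The diagonal stays at zero, and dist[i][j] equals dist[j][i], so either triangle can be used in the selection step.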

My question is: once I have the distances, how can I write a program in R or C++ that takes the pair of points separated by the largest distance, moves them to my training data set and removes them from the remaining distances, then takes the pair with the next largest distance, moves it to the prediction data set and removes it as well, and so on until my prediction data set contains 25 observations?
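
The selection loop described above could be sketched in C++ roughly as follows. This is a minimal sketch of the procedure as worded in this post (the farthest-apart remaining pair goes to training, the next farthest pair to prediction, alternating), and it may differ in detail from step 4 as given in Montgomery et al. It takes a precomputed distance matrix, and the names Split, farthest_pair and pairwise_split are hypothetical. Because 25 is odd, it assumes the second point of the last prediction pair is returned to the pool, and every point still unassigned at the end goes to the training set.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Split {
    std::vector<std::size_t> training;    // row indices used to fit the model
    std::vector<std::size_t> prediction;  // row indices held out for testing
};

// Finds the still-unassigned pair (best_i, best_j) with the largest distance.
// Returns false if fewer than two unassigned points remain.
bool farthest_pair(const std::vector<std::vector<double>>& dist,
                   const std::vector<bool>& used,
                   std::size_t& best_i, std::size_t& best_j) {
    bool found = false;
    double best = -1.0;
    const std::size_t n = dist.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (used[i]) continue;
        for (std::size_t j = i + 1; j < n; ++j) {
            if (used[j]) continue;
            if (dist[i][j] > best) {
                best = dist[i][j];
                best_i = i;
                best_j = j;
                found = true;
            }
        }
    }
    return found;
}

// Alternately moves the farthest-apart remaining pair to the training set,
// then to the prediction set, until the prediction set holds n_pred points.
Split pairwise_split(const std::vector<std::vector<double>>& dist,
                     std::size_t n_pred) {
    const std::size_t n = dist.size();
    assert(n_pred <= n);
    std::vector<bool> used(n, false);  // true = removed from further comparisons
    Split s;
    bool to_training = true;
    std::size_t i = 0, j = 0;
    while (s.prediction.size() < n_pred && farthest_pair(dist, used, i, j)) {
        used[i] = used[j] = true;
        if (to_training) {
            s.training.push_back(i);
            s.training.push_back(j);
        } else {
            s.prediction.push_back(i);
            if (s.prediction.size() < n_pred) {
                s.prediction.push_back(j);
            } else {
                used[j] = false;  // assumption: odd target, return j to the pool
            }
        }
        to_training = !to_training;
    }
    // Assumption: all remaining points go to the training set.
    for (std::size_t k = 0; k < n; ++k)
        if (!used[k]) s.training.push_back(k);
    return s;
}
```

With a 130 x 130 distance matrix, pairwise_split(dist, 25) would return 25 prediction indices and 105 training indices. The pair search simply rescans all remaining pairs on every round, which is cheap at this sample size.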

Would someone be able to help me?

Thanks,
UofA