# K-Fold Cross validation into groups

#### NewGeneration

##### New Member
Hi All,

I'm wondering if I can get some assistance with applying K-fold cross-validation to data that needs to be split into groups. I know how to apply K-fold cross-validation to standard data where each row is an independent event. In a horse racing context, however, what do I need to change in my code so that it suits grouped data, so that horses from one race are not mixed with horses from another and a single race is never split between the training and test samples? Each race (independent event) in my data has a unique identifier, denoted 'RaceNo', so I know I have to do something with that. An example of how my data is structured is as follows:

| RaceNo | HorseNo | Winner | Variable 1 | Variable 2 | Variable 3 | ... |
|--------|---------|--------|------------|------------|------------|-----|
| 1      | 1       | 0      | 1          | 13         | 35         |     |
| 1      | 2       | 1      | 7          | 15         | 37         |     |
| 1      | 3       | 0      | 9          | 21         | 40         |     |
| 2      | 1       | 0      | 8          | 13         | 50         |     |
| 2      | 2       | 0      | 3          | 14         | 13         |     |
| 2      | 3       | 1      | 2          | 5          | 17         |     |

Below is the standard K-fold cross-validation code I have. I am hoping it's an easy modification of this code to make it suitable?

    library(caret)

    # Row-wise 80/20 split stratified on Winner (this ignores RaceNo)
    IndexMatrix <- createDataPartition(HorseData$Winner, p = 0.8, list = FALSE, times = 1)
    HorseData <- as.data.frame(HorseData)
    TrainData <- HorseData[IndexMatrix, ]
    TestData <- HorseData[-IndexMatrix, ]

    # Recode the outcome as a labelled factor
    TrainData$Winner[TrainData$Winner == 1] <- "Win"
    TrainData$Winner[TrainData$Winner == 0] <- "Lose"
    TestData$Winner[TestData$Winner == 1] <- "Win"
    TestData$Winner[TestData$Winner == 0] <- "Lose"
    TrainData$Winner <- as.factor(TrainData$Winner)
    TestData$Winner <- as.factor(TestData$Winner)

    # 10-fold cross-validation settings
    cntrlspecs <- trainControl(method = "cv", number = 10, savePredictions = "all", classProbs = TRUE)

    set.seed(2000)
    # "VARIABLES" is a placeholder for the predictor columns;
    # trControl passes the CV settings, which were otherwise defined but unused
    LogitModel <- train(Winner ~ "VARIABLES", data = TrainData, method = "glm",
                        family = binomial, trControl = cntrlspecs)
    print(LogitModel)
    summary(LogitModel)
    varImp(LogitModel)

    # Evaluate on the held-out rows
    Prediction <- predict(LogitModel, newdata = TestData)
    confusionMatrix(data = Prediction, TestData$Winner)
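
For reference, the kind of grouping being asked about could be sketched roughly as follows: sample whole races rather than rows, both for the train/test split and for the folds. This is only a minimal base-R sketch under stated assumptions, not a definitive implementation. The toy data frame is made up (20 races of 3 horses each), and `test_races`, `fold_of_race`, `row_fold` and `folds` are hypothetical names introduced for illustration.

```r
set.seed(2000)

# Hypothetical toy data: 20 races, 3 horses each, exactly one winner per race.
n_races <- 20
HorseData <- data.frame(
  RaceNo  = rep(seq_len(n_races), each = 3),
  HorseNo = rep(1:3, times = n_races),
  Winner  = as.vector(replicate(n_races, sample(c(1, 0, 0))))
)

# 1. Hold out whole races (about 20% of them) instead of individual rows.
races      <- unique(HorseData$RaceNo)
test_races <- sample(races, size = round(0.2 * length(races)))
TrainData  <- HorseData[!HorseData$RaceNo %in% test_races, ]
TestData   <- HorseData[HorseData$RaceNo %in% test_races, ]

# 2. Assign each training race (not each row) to one of k folds.
k            <- 5
train_races  <- unique(TrainData$RaceNo)
fold_of_race <- sample(rep(seq_len(k), length.out = length(train_races)))
names(fold_of_race) <- train_races

# Look up the fold of every training row through its race identifier.
row_fold <- fold_of_race[as.character(TrainData$RaceNo)]

# 3. For fold i, the training rows are all rows whose race is NOT in fold i.
folds <- lapply(seq_len(k), function(i) which(row_fold != i))
names(folds) <- paste0("Fold", seq_len(k))
```

With caret, a list of training-row indices like `folds` can be supplied through the `index` argument of `trainControl()`, for example `trainControl(method = "cv", index = folds, savePredictions = "all", classProbs = TRUE)`. caret also provides `groupKFold(TrainData$RaceNo, k = 5)`, which returns a list of the same shape. Either way, all horses from one race stay on the same side of every split.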