**Warning message: In probFunction(method, modelFit, ppUnk) : kernlab class probability calculations failed; returning NAs**
library(caret)

trainset <- data.frame(
  class = factor(c("Good", "Bad", "Good", "Good", "Bad", "Good", "Good", "Good", "Good", "Bad", "Bad", "Bad")),
  age = c(67, 22, 49, 45, 53, 35, 53, 35, 61, 28, 25, 24))

testset <- data.frame(
  class = factor(c("Good", "Bad", "Good")),
  age = c(64, 23, 50))

library(kernlab)
set.seed(231)
### estimate a value for the sigma tuning parameter
sigDist <- sigest(class ~ ., data = trainset, frac = 1)
### create a grid of the two tuning parameters; .sigma comes from the line above, and we search for the best value of .C
svmTuneGrid <- data.frame(.sigma = sigDist[1], .C = 2^(-2:7))

set.seed(1056)
svmFit <- train(class ~ .,
                data = trainset,
                method = "svmRadial",
                preProc = c("center", "scale"),
                tuneGrid = svmTuneGrid,
                trControl = trainControl(method = "repeatedcv", repeats = 5))
### svmFit finds the optimal values of the tuning parameters and builds the model using the best ones

### predict the class of the test data
predictedClasses <- predict(svmFit, testset)
str(predictedClasses)

### predict probabilities, but I get the warning above
predictedProbs <- predict(svmFit, newdata = testset, type = "prob")
head(predictedProbs)

svmFit$finalModel
One possible cause is your y variable. When you ask for class probabilities, train and/or predict put them into a data frame with one column per class. If the factor levels are not valid R variable names, they are automatically changed (e.g. "0" becomes "X0"). See also this post.
If your class factor has such levels, relabelling it before training should work (here a is the data frame, with the class in its first column):
a[,1] = factor(a[,1], labels = c("no", "yes"))
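To see whether this renaming applies to your data, you can compare the factor levels with what make.names() would turn them into. The following is a minimal base R sketch; the vector y and the names levels_ok and y_fixed are made up for illustration and are not part of the question's code.

```r
# Hypothetical class column whose levels are not valid R variable names
y <- factor(c("0", "1", "1", "0"))

# make.names() shows what invalid names get converted to
levels(y)              # "0" "1"
make.names(levels(y))  # "X0" "X1"

# TRUE only when every level is already a syntactically valid name
levels_ok <- identical(levels(y), make.names(levels(y)))

# Relabel with valid names before training ("0" -> "no", "1" -> "yes")
y_fixed <- factor(y, labels = c("no", "yes"))
levels(y_fixed)
```

Levels such as "Good" and "Bad" pass this check unchanged, so for them the relabelling step is not needed.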
The train call in the question does not request class probabilities. Setting classProbs = TRUE in trainControl does:
svmFit <- train(class ~ .,
                data = trainset,
                method = "svmRadial",
                preProc = c("center", "scale"),
                tuneGrid = svmTuneGrid,
                trControl = trainControl(method = "repeatedcv", repeats = 5,
                                         classProbs = TRUE))
predictedClasses <- predict(svmFit, testset)
predictedProbs <- predict(svmFit, newdata = testset, type = "prob")
This gives the probability of each row of the test dataset being in the Bad or Good class:
print(predictedProbs)
        Bad      Good
1 0.2302979 0.7697021
2 0.7135050 0.2864950
3 0.2230889 0.7769111
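As a quick sanity check on output like this, you can confirm that each row sums to 1 and read off the most probable class per row. The sketch below uses only base R and retypes the printed values by hand, so it runs without caret or kernlab; the name likelyClass is made up for illustration.

```r
# Probabilities copied from the printed output above
predictedProbs <- data.frame(
  Bad  = c(0.2302979, 0.7135050, 0.2230889),
  Good = c(0.7697021, 0.2864950, 0.7769111)
)

# Each row should sum to (approximately) 1
rowSums(predictedProbs)

# Column with the largest probability in each row
likelyClass <- colnames(predictedProbs)[max.col(predictedProbs, ties.method = "first")]
likelyClass
```

If the probability calculation had failed, the columns would contain NA instead, which anyNA(predictedProbs) would flag.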