  • I have some data where the Y variable is a factor with two levels, Good or Bad. I am building a support vector machine with the 'train' function from the 'caret' package. Using 'train' I was able to settle on values for the tuning parameters and get the final support vector machine. For the test data I can predict the class, but when I try to predict probabilities for the test data I get the warning below. For example, my model tells me that the 1st data point in the test data has y = 'Good', but I want to know the probability of 'Good'. My understanding is that a support vector machine can also estimate a probability for each outcome: if the Y variable has 2 outcomes, the model gives a probability for each one, and the outcome with the higher probability is taken as the prediction.
    **Warning message: In probFunction(method, modelFit, ppUnk) : kernlab class probability calculations failed; returning NAs**
    Sample code:
    library(caret)
    library(kernlab)

    trainset <- data.frame(
        class = factor(c("Good", "Bad", "Good", "Good", "Bad", "Good",
                         "Good", "Good", "Good", "Bad", "Bad", "Bad")),
        age = c(67, 22, 49, 45, 53, 35, 53, 35, 61, 28, 25, 24))
    testset <- data.frame(
        class = factor(c("Good", "Bad", "Good")),
        age = c(64, 23, 50))

    set.seed(231)
    ### estimate a value for the sigma tuning parameter
    sigDist <- sigest(class ~ ., data = trainset, frac = 1)
    ### grid of the two tuning parameters: .sigma comes from the previous line, we search for the best .C
    svmTuneGrid <- data.frame(.sigma = sigDist[1], .C = 2^(-2:7))

    set.seed(1056)
    ### svmFit finds the optimal tuning parameters and builds the model with the best ones
    svmFit <- train(class ~ ., data = trainset,
                    method = "svmRadial",
                    preProc = c("center", "scale"),
                    tuneGrid = svmTuneGrid,
                    trControl = trainControl(method = "repeatedcv", repeats = 5))

    ### predict the class of the test data
    predictedClasses <- predict(svmFit, testset)
    str(predictedClasses)

    ### predict probabilities: this is where I get the warning
    predictedProbs <- predict(svmFit, newdata = testset, type = "prob")
    head(predictedProbs)
    New question: according to the output below there are 9 support vectors. How can I tell which 9 of the 12 training data points they are?
    Support Vector Machine object of class "ksvm"

    SV type: C-svc (classification)
     parameter : cost C = 1

    Gaussian Radial Basis kernel function.
     Hyperparameter : sigma = 0.72640759446315

    Number of Support Vectors : 9

    Objective Function Value : -5.6994
    Training error : 0.083333
      August 31, 2021 3:45 PM IST
      September 12, 2021 1:02 AM IST
  • The problem is your Y variable. When you ask for class probabilities, the train and/or predict function puts them into a data frame with one column per class. If the factor levels are not valid R variable names, they are changed automatically (e.g. "0" becomes "X0"). See also this post.

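    If you relabel the outcome factor so that its levels are valid names, it should work. For example, if the outcome is the first column of a data frame a, change that line to: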

    a[,1] = factor(a[,1], labels = c("no", "yes"))
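    The renaming described above follows R's rules for syntactically valid names, which base R exposes as make.names(). The sketch below (the y_* objects are hypothetical examples) shows which levels would be renamed and two ways to relabel them before calling train(); note that the question's own levels, "Bad" and "Good", are already valid:

    ```r
    # Hypothetical outcome whose levels are NOT valid R variable names
    y_bad <- factor(c("0", "1", "0", "1"))
    levels(y_bad)                  # "0" "1"
    make.names(levels(y_bad))      # "X0" "X1", the names the probability columns would get

    # Levels like the question's are unchanged by make.names()
    y_ok <- factor(c("Good", "Bad", "Good"))
    identical(make.names(levels(y_ok)), levels(y_ok))   # TRUE

    # Option 1: relabel explicitly, as in the line above
    y_fixed1 <- factor(y_bad, labels = c("no", "yes"))
    # Option 2: let make.names() build valid labels from the existing levels
    y_fixed2 <- factor(y_bad, labels = make.names(levels(y_bad)))
    levels(y_fixed1)               # "no" "yes"
    levels(y_fixed2)               # "X0" "X1"
    ```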
      January 11, 2022 3:47 PM IST
  • In the trainControl() call, you have to request class probabilities explicitly with classProbs = TRUE:

    svmFit <- train(class ~ .,
                    data = trainset,
                    method = "svmRadial",
                    preProc = c("center", "scale"),
                    tuneGrid = svmTuneGrid,
                    trControl = trainControl(method = "repeatedcv", repeats = 5,
                                             classProbs = TRUE))
    predictedClasses <- predict(svmFit, testset)
    predictedProbs <- predict(svmFit, newdata = testset, type = "prob")

    This gives the probability of each test observation being in the Bad or Good class:

        Bad      Good
    1 0.2302979 0.7697021
    2 0.7135050 0.2864950
    3 0.2230889 0.7769111
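
    For context: as far as I can tell from caret's model code, the svmRadial method passes classProbs on to kernlab's ksvm(..., prob.model = ...), which fits an extra probability model during training; without it kernlab cannot return probabilities, which is what the warning in the question reports. Below is a minimal sketch of the same thing in kernlab directly, assuming the question's trainset/testset, a fixed C = 1 and kernlab's automatic sigma estimate instead of caret's tuned values, so the numbers will not match the table above. It also picks the class with the highest probability; kernlab's class predictions come from the SVM decision function, so the two can occasionally disagree:

    ```r
    library(kernlab)

    trainset <- data.frame(
      class = factor(c("Good", "Bad", "Good", "Good", "Bad", "Good",
                       "Good", "Good", "Good", "Bad", "Bad", "Bad")),
      age   = c(67, 22, 49, 45, 53, 35, 53, 35, 61, 28, 25, 24))
    testset <- data.frame(
      class = factor(c("Good", "Bad", "Good")),
      age   = c(64, 23, 50))

    set.seed(231)
    # prob.model = TRUE fits the probability model at training time;
    # C = 1 is an illustrative choice, sigma is estimated automatically (kpar = "automatic")
    fit <- ksvm(class ~ age, data = trainset, kernel = "rbfdot",
                C = 1, prob.model = TRUE)

    # One row per test observation, one column per class
    probs <- predict(fit, testset, type = "probabilities")
    probs

    # Class with the highest estimated probability in each row
    probClass <- factor(colnames(probs)[max.col(probs)],
                        levels = levels(trainset$class))
    # Class from the SVM decision function (the default predict type)
    svmClass <- predict(fit, testset)
    data.frame(probClass, svmClass)
    ```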

    To answer your new question, you can get the positions of the support vectors in your original data set with alphaindex(svmFit$finalModel), and their coefficients with coef(svmFit$finalModel).
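
    A minimal sketch of that lookup, assuming a kernlab ksvm fit on the question's trainset with C = 1 (for the caret model the same accessors should work on svmFit$finalModel, since caret keeps the training rows in their original order). kernlab's SVindex() returns the support-vector positions directly; because there are no missing values, no rows are dropped and those positions index trainset as-is:

    ```r
    library(kernlab)

    trainset <- data.frame(
      class = factor(c("Good", "Bad", "Good", "Good", "Bad", "Good",
                       "Good", "Good", "Good", "Bad", "Bad", "Bad")),
      age   = c(67, 22, 49, 45, 53, 35, 53, 35, 61, 28, 25, 24))

    set.seed(231)
    fit <- ksvm(class ~ age, data = trainset, kernel = "rbfdot", C = 1)

    nSV(fit)              # number of support vectors
    sv <- SVindex(fit)    # row positions of the support vectors
    trainset[sv, ]        # the training points that are support vectors

    # alphaindex() stores the same positions per binary classifier;
    # unlist() flattens it so it can be compared with SVindex()
    unlist(alphaindex(fit))
    # coefficients of the support vectors, in the same order as alphaindex()
    unlist(coef(fit))
    ```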
      September 1, 2021 1:18 PM IST