Error in confusionMatrix: the data and reference factors must have the same number of levels

  • I've trained a tree model with R caret. I'm now trying to generate a confusion matrix and keep getting the following error:

    Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same number of levels

     

    prob <- 0.5 #Specify class split
    singleSplit <- createDataPartition(modellingData2$category, p=prob, times=1, list=FALSE)
    cvControl <- trainControl(method="repeatedcv", number=10, repeats=5)
    traindata <- modellingData2[singleSplit,]
    testdata <- modellingData2[-singleSplit,]
    treeFit <- train(traindata$category~., data=traindata, trControl=cvControl, method="rpart", tuneLength=10)
    predictionsTree <- predict(treeFit, testdata)
    confusionMatrix(predictionsTree, testdata$catgeory)


    The error occurs when generating the confusion matrix. The levels appear to be the same on both objects, and I can't figure out what the problem is. Their structure and levels are given below. Any help would be greatly appreciated, as it's driving me crazy!

    > str(predictionsTree)
     Factor w/ 30 levels "16-Merchant Service Charge",..: 28 22 22 22 22 6 6 6 6 6 ...
    > str(testdata$category)
     Factor w/ 30 levels "16-Merchant Service Charge",..: 30 30 7 7 7 7 7 30 7 7 ...
    > levels(predictionsTree)
     [1] "16-Merchant Service Charge" "17-Unpaid Cheque Fee" "18-Gov. Stamp Duty" "Misc" "26-Standard Transfer Charge"
     [6] "29-Bank Giro Credit" "3-Cheques Debit" "32-Standing Order - Debit" "33-Inter Branch Payment" "34-International"
    [11] "35-Point of Sale" "39-Direct Debits Received" "4-Notified Bank Fees" "40-Cash Lodged" "42-International Receipts"
    [16] "46-Direct Debits Paid" "56-Credit Card Receipts" "57-Inter Branch" "58-Unpaid Items" "59-Inter Company Transfers"
    [21] "6-Notified Interest Credited" "61-Domestic" "64-Charge Refund" "66-Inter Company Transfers" "67-Suppliers"
    [26] "68-Payroll" "69-Domestic" "73-Credit Card Payments" "82-CHAPS Fee" "Uncategorised"
    > levels(testdata$category)
     [1] "16-Merchant Service Charge" "17-Unpaid Cheque Fee" "18-Gov. Stamp Duty" "Misc" "26-Standard Transfer Charge"
     [6] "29-Bank Giro Credit" "3-Cheques Debit" "32-Standing Order - Debit" "33-Inter Branch Payment" "34-International"
    [11] "35-Point of Sale" "39-Direct Debits Received" "4-Notified Bank Fees" "40-Cash Lodged" "42-International Receipts"
    [16] "46-Direct Debits Paid" "56-Credit Card Receipts" "57-Inter Branch" "58-Unpaid Items" "59-Inter Company Transfers"
    [21] "6-Notified Interest Credited" "61-Domestic" "64-Charge Refund" "66-Inter Company Transfers" "67-Suppliers"
    [26] "68-Payroll" "69-Domestic" "73-Credit Card Payments" "82-CHAPS Fee" "Uncategorised"
      September 15, 2020 4:37 PM IST
    • Pranav B
      @Laksh Nath, maybe your model is never predicting a certain factor level. Use the table() function instead of confusionMatrix() to see if that is the problem.
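
A minimal base-R sketch of that check, using made-up vectors `pred` and `ref` and a toy data frame `df` (all hypothetical, not the data from the question). It lists reference classes that never appear among the predictions, and also confirms that the reference column actually exists, since `$` returns NULL without any error when a column name is misspelled.

```r
# Hypothetical toy data standing in for predictions and true labels
ref  <- factor(c("a", "b", "c", "a", "c"))
pred <- factor(c("a", "a", "c", "a", "c"))  # class "b" is never predicted

# A misspelled column name gives NULL instead of an error
df <- data.frame(category = ref)
is.null(df$catgeory)

# Compare the number of levels on each side
nlevels(pred) == nlevels(ref)

# Classes present in the reference but absent from the predictions
never_predicted <- setdiff(levels(ref), as.character(unique(pred)))
never_predicted

# Per-class counts on each side
table(pred)
table(ref)
```

If `never_predicted` is non-empty, or the reference turns out to be NULL, the two arguments handed to confusionMatrix() cannot have matching levels.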
      September 16, 2020
  • Run table(pred) and table(testing$Final). You will see that at least one class in the testing set is never predicted (i.e. never present in pred). This is what is meant by "different number of levels". There is an example of a custom-made function to get around this problem here.

    However, I found that this trick works fine:

    table(factor(pred, levels=min(test):max(test)), 
          factor(test, levels=min(test):max(test)))

     

    It should give you exactly the same confusion matrix as with the function.
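
The snippet above builds the levels with min(test):max(test), which assumes integer class labels. For character labels such as the ones in the question, a similar sketch can use the union of the observed classes instead. The vectors `test` and `pred` below are made-up toy data, and `lv`, `cm` and `accuracy` are illustrative names only.

```r
# Hypothetical toy labels (character classes rather than integers)
test <- c("a", "b", "c", "a", "c")
pred <- c("a", "a", "c", "a", "c")   # class "b" is never predicted

# Shared level set covering every class seen on either side
lv <- sort(union(unique(test), unique(pred)))

# Re-encode both vectors with the same levels, then cross-tabulate
cm <- table(Prediction = factor(pred, levels = lv),
            Reference  = factor(test, levels = lv))
cm

# Overall accuracy: diagonal (correct predictions) over the total count
accuracy <- sum(diag(cm)) / sum(cm)
accuracy
```

Because both factors share `lv`, the table is square even though one class is never predicted.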

     
      November 30, 2021 11:54 AM IST
  • Try using:

    confusionMatrix(table(predictions, reference))

    where predictions and reference are your predicted and true label vectors. That worked for me.

      September 16, 2020 11:56 AM IST
  • Try specifying na.pass for the na.action option:
    predictionsTree <- predict(treeFit, testdata, na.action = na.pass)
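
Before relying on na.pass, it may help to check whether the test set contains missing values at all. The base-R sketch below uses a made-up data frame `testdata_demo` (hypothetical) and only illustrates the general idea: if incomplete rows are dropped at prediction time, the predictions and the reference labels can end up with different lengths.

```r
# Hypothetical test set with one missing predictor value
testdata_demo <- data.frame(
  amount   = c(10, NA, 30, 40),
  category = factor(c("a", "b", "a", "b"))
)

# Number of rows with at least one NA
incomplete <- !complete.cases(testdata_demo)
sum(incomplete)

# na.omit drops incomplete rows, na.pass keeps every row
nrow(na.omit(testdata_demo))
nrow(na.pass(testdata_demo))
```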
      September 16, 2020 12:00 PM IST
  • Put both factors into a single data frame, then pass them to the confusionMatrix function:

    predicted <- factor(predict(treeFit, testdata))
    real <- factor(testdata$category)

    # rbind() merges the factor levels of both data frames into one shared set
    my_data1 <- data.frame(data = predicted, type = "prediction")
    my_data2 <- data.frame(data = real, type = "real")
    my_data3 <- rbind(my_data1, my_data2)

    # Check if the levels are identical
    identical(levels(my_data3[my_data3$type == "prediction", 1]), levels(my_data3[my_data3$type == "real", 1]))

    confusionMatrix(my_data3[my_data3$type == "prediction", 1], my_data3[my_data3$type == "real", 1], dnn = c("Prediction", "Reference"))
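
The reason this approach can help is that rbind() on data frames combines the levels of factor columns. A small base-R sketch with made-up factors (the names `d1`, `d2`, `d3`, `p` and `r` are illustrative only) shows the effect without needing caret:

```r
# Hypothetical factors with different level sets
predicted <- factor(c("a", "a", "c"))   # levels: a, c
real      <- factor(c("a", "b", "c"))   # levels: a, b, c

d1 <- data.frame(data = predicted, type = "prediction")
d2 <- data.frame(data = real, type = "real")
d3 <- rbind(d1, d2)

# After rbind(), the factor column carries the union of both level sets
levels(d3$data)

# Row subsets keep all levels, so both sides now agree
p <- d3[d3$type == "prediction", 1]
r <- d3[d3$type == "real", 1]
identical(levels(p), levels(r))
```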
      September 16, 2020 12:19 PM IST
  • Make sure you have installed the package with all its dependencies:

    install.packages('caret', dependencies = TRUE)
    
    confusionMatrix( table(prediction, true_value) )
      September 16, 2020 12:20 PM IST