
Calculate AUC in R?

  • Given a vector of scores and a vector of actual class labels, how do you calculate a single-number AUC metric for a binary classifier in the R language or in simple English?

    Page 9 of "AUC: a Better Measure..." seems to require knowing the class labels, and here is an example in MATLAB where I don't understand

    R(Actual == 1))​

    Because R (not to be confused with the R language) is defined a vector but used as a function?
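
    Edit: it looks like MATLAB uses parentheses for indexing as well as for function calls, so R(Actual == 1) just selects the elements of the vector R where Actual equals 1. A rough R-language equivalent, with made-up values for R and Actual:

    R <- c(0.9, 0.2, 0.7, 0.4)   # made-up scores
    Actual <- c(1, 0, 1, 0)      # made-up class labels
    R[Actual == 1]               # elements of R where Actual == 1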

    This post was edited by Pranav B at October 15, 2020 3:02 PM IST
      September 15, 2020 4:30 PM IST
    0
  • With the pROC package you can use the auc() function, as in this example from its help page:

    > library(pROC)
    > data(aSAH)
    > 
    > # Syntax (response, predictor):
    > auc(aSAH$outcome, aSAH$s100b)
    Area under the curve: 0.7314
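
    The same call works on plain label/score vectors of your own. A minimal sketch (labels and scores are just placeholder names with made-up values):

    > labels <- c(0, 0, 1, 1, 1)
    > scores <- c(0.1, 0.4, 0.35, 0.8, 0.7)
    > auc(labels, scores)   # about 0.83 for this toy data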
      October 15, 2020 2:59 PM IST
    0
  • Without any additional packages:

    true_Y = c(1, 1, 1, 1, 2, 1, 2, 1, 2, 2)   # 1 = positive class, 2 = negative class
    probs = c(1, 0.999, 0.999, 0.973, 0.568, 0.421, 0.382, 0.377, 0.146, 0.11)

    getROC_AUC = function(probs, true_Y){
        # Order the true labels by decreasing score
        probsSort = sort(probs, decreasing = TRUE, index.return = TRUE)
        val = unlist(probsSort$x)
        idx = unlist(probsSort$ix)

        roc_y = true_Y[idx]
        # Cumulative false positive rate (x) and true positive rate (y) along the ranking
        stack_x = cumsum(roc_y == 2)/sum(roc_y == 2)
        stack_y = cumsum(roc_y == 1)/sum(roc_y == 1)

        # Area under the resulting step function
        auc = sum((stack_x[2:length(roc_y)] - stack_x[1:(length(roc_y) - 1)]) * stack_y[2:length(roc_y)])
        return(list(stack_x = stack_x, stack_y = stack_y, auc = auc))
    }

    aList = getROC_AUC(probs, true_Y)

    stack_x = unlist(aList$stack_x)
    stack_y = unlist(aList$stack_y)
    auc = unlist(aList$auc)

    plot(stack_x, stack_y, type = "l", col = "blue", xlab = "False Positive Rate", ylab = "True Positive Rate", main = "ROC")
    axis(1, seq(0.0, 1.0, 0.1))
    axis(2, seq(0.0, 1.0, 0.1))
    abline(h = seq(0.0, 1.0, 0.1), v = seq(0.0, 1.0, 0.1), col = "gray", lty = 3)
    legend(0.7, 0.3, sprintf("%3.3f", auc), lty = c(1, 1), lwd = c(2.5, 2.5), col = "blue", title = "AUC")


      October 15, 2020 3:19 PM IST
    0
  • You can compute the AUC with the ROCR package, which also lets you plot the ROC curve, lift chart and other model-selection measures.
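
    For example, a minimal sketch with made-up scores and labels (the variable names are just placeholders):

    library(ROCR)

    scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1)
    labels <- c(1, 1, 0, 1, 1, 0, 0, 1, 0, 0)

    pred <- prediction(scores, labels)             # ROCR prediction object

    as.numeric(performance(pred, "auc")@y.values)  # AUC (0.8 for this toy data)

    plot(performance(pred, "tpr", "fpr"))          # ROC curve
    plot(performance(pred, "lift", "rpp"))         # lift chart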

    You can also compute the AUC directly, without any package, by using the fact that the AUC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative example.

    For example, if pos.scores is a vector containing the scores of the positive examples, and neg.scores is a vector containing the scores of the negative examples, then the AUC can be approximated by:

    > mean(sample(pos.scores,1000,replace=T) > sample(neg.scores,1000,replace=T))
    [1] 0.7261

    This gives an approximation of the AUC. You can also estimate the variance of the estimate by bootstrapping:

    > aucs = replicate(1000,mean(sample(pos.scores,1000,replace=T) > sample(neg.scores,1000,replace=T)))
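
    The variance (or standard error) of the AUC estimate is then just the spread of those replicates:

    > var(aucs)   # bootstrap variance of the AUC estimate
    > sd(aucs)    # bootstrap standard error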
     
    This post was edited by Raji Reddy A at October 15, 2020 3:42 PM IST
      October 15, 2020 3:26 PM IST
    0
  • I found some of the solutions here to be slow and/or confusing (and some of them don't handle ties correctly), so I use the data.table-based function auc_roc() from the R package mltools.

    library(data.table)
    library(mltools)
    
    preds <- c(.1, .3, .3, .9)
    actuals <- c(0, 0, 1, 1)
    
    auc_roc(preds, actuals)  # 0.875
    
    auc_roc(preds, actuals, returnDT=TRUE)
       Pred CountFalse CountTrue CumulativeFPR CumulativeTPR AdditionalArea CumulativeArea
    1:  0.9          0         1           0.0           0.5          0.000          0.000
    2:  0.3          1         1           0.5           1.0          0.375          0.375
    3:  0.1          1         0           1.0           1.0          0.500          0.875
     
    This post was edited by Rakesh Racharla at October 15, 2020 3:45 PM IST
      October 15, 2020 3:41 PM IST
    0
  • The ROCR package will calculate the AUC among other statistics:

    # pred is a ROCR prediction object, e.g. pred <- prediction(predictions, labels)
    auc.tmp <- performance(pred, "auc"); auc <- as.numeric(auc.tmp@y.values)
     
      October 15, 2020 3:48 PM IST
    0
  • The currently top-voted answer is incorrect because it disregards ties: when a positive and a negative example receive equal scores, that pair should contribute 0.5 to the AUC. Below is a corrected version.

    computeAUC <- function(pos.scores, neg.scores, n_sample=100000) {
      # Args:
      #   pos.scores: scores of positive observations
      #   neg.scores: scores of negative observations
      #   n_sample  : number of sampled pairs used to approximate the AUC

      pos.sample <- sample(pos.scores, n_sample, replace=T)
      neg.sample <- sample(neg.scores, n_sample, replace=T)
      # Ties between a positive and a negative score count as 0.5
      mean(1.0*(pos.sample > neg.sample) + 0.5*(pos.sample == neg.sample))
    }
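
    For example, reusing preds and actuals from the mltools answer above (this is a Monte Carlo estimate, so it will only approximate the exact 0.875):

    preds <- c(.1, .3, .3, .9)
    actuals <- c(0, 0, 1, 1)
    computeAUC(preds[actuals == 1], preds[actuals == 0])  # ~0.875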
      December 9, 2021 12:42 PM IST
    0