
Calculate AUC in R?

  • Given a vector of scores and a vector of actual class labels, how do you calculate a single-number AUC metric for a binary classifier in the R language or in simple English?

    Page 9 of "AUC: a Better Measure..." seems to require knowing the class labels, and here is an example in MATLAB where I don't understand

    R(Actual == 1))​

    Because R (not to be confused with the R language) is defined a vector but used as a function?
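
    Edit: it looks like MATLAB uses parentheses for indexing as well as for function calls, so R(Actual == 1) just selects the elements of the vector R where Actual equals 1. A rough R-language equivalent, with made-up values for R and Actual:

    R <- c(0.9, 0.2, 0.7, 0.4)   # made-up scores
    Actual <- c(1, 0, 1, 0)      # made-up class labels
    R[Actual == 1]               # elements of R where Actual == 1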

    This post was edited by Pranav B at October 15, 2020 3:02 PM IST
      September 15, 2020 4:30 PM IST
    0
  • With the pROC package you can use the auc() function, as in this example from its help page:

    > library(pROC)
    > data(aSAH)
    > 
    > # Syntax (response, predictor):
    > auc(aSAH$outcome, aSAH$s100b)
    Area under the curve: 0.7314
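
    The same call works on plain label/score vectors of your own. A minimal sketch (labels and scores are just placeholder names with made-up values):

    > labels <- c(0, 0, 1, 1, 1)
    > scores <- c(0.1, 0.4, 0.35, 0.8, 0.7)
    > auc(labels, scores)   # about 0.83 for this toy data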
      October 15, 2020 2:59 PM IST
    0
  • Without any additional packages:

    true_Y = c(1, 1, 1, 1, 2, 1, 2, 1, 2, 2)   # 1 = positive class, 2 = negative class
    probs = c(1, 0.999, 0.999, 0.973, 0.568, 0.421, 0.382, 0.377, 0.146, 0.11)

    getROC_AUC = function(probs, true_Y){
        # Order the true labels by decreasing score
        probsSort = sort(probs, decreasing = TRUE, index.return = TRUE)
        val = unlist(probsSort$x)
        idx = unlist(probsSort$ix)

        roc_y = true_Y[idx]
        # Cumulative false positive rate (x) and true positive rate (y) along the ranking
        stack_x = cumsum(roc_y == 2)/sum(roc_y == 2)
        stack_y = cumsum(roc_y == 1)/sum(roc_y == 1)

        # Area under the resulting step function
        auc = sum((stack_x[2:length(roc_y)] - stack_x[1:(length(roc_y) - 1)]) * stack_y[2:length(roc_y)])
        return(list(stack_x = stack_x, stack_y = stack_y, auc = auc))
    }

    aList = getROC_AUC(probs, true_Y)

    stack_x = unlist(aList$stack_x)
    stack_y = unlist(aList$stack_y)
    auc = unlist(aList$auc)

    plot(stack_x, stack_y, type = "l", col = "blue", xlab = "False Positive Rate", ylab = "True Positive Rate", main = "ROC")
    axis(1, seq(0.0, 1.0, 0.1))
    axis(2, seq(0.0, 1.0, 0.1))
    abline(h = seq(0.0, 1.0, 0.1), v = seq(0.0, 1.0, 0.1), col = "gray", lty = 3)
    legend(0.7, 0.3, sprintf("%3.3f", auc), lty = c(1, 1), lwd = c(2.5, 2.5), col = "blue", title = "AUC")


      October 15, 2020 3:19 PM IST
    0
  • You can compute the AUC with the ROCR package, which also lets you plot the ROC curve, lift chart and other model-selection measures.
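
    For example, a minimal sketch with made-up scores and labels (the variable names are just placeholders):

    library(ROCR)

    scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1)
    labels <- c(1, 1, 0, 1, 1, 0, 0, 1, 0, 0)

    pred <- prediction(scores, labels)             # ROCR prediction object

    as.numeric(performance(pred, "auc")@y.values)  # AUC (0.8 for this toy data)

    plot(performance(pred, "tpr", "fpr"))          # ROC curve
    plot(performance(pred, "lift", "rpp"))         # lift chart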

    You can also compute the AUC directly, without any package, by using the fact that the AUC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative example.

    For example, if pos.scores is a vector containing the scores of the positive examples, and neg.scores is a vector containing the scores of the negative examples, then the AUC can be approximated by:

    > mean(sample(pos.scores,1000,replace=T) > sample(neg.scores,1000,replace=T))
    [1] 0.7261

    This gives an approximation of the AUC. You can also estimate the variance of the estimate by bootstrapping:

    > aucs = replicate(1000,mean(sample(pos.scores,1000,replace=T) > sample(neg.scores,1000,replace=T)))
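
    The variance (or standard error) of the AUC estimate is then just the spread of those replicates:

    > var(aucs)   # bootstrap variance of the AUC estimate
    > sd(aucs)    # bootstrap standard error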
     
    This post was edited by Raji Reddy A at October 15, 2020 3:42 PM IST
      October 15, 2020 3:26 PM IST
    0
  • I found some of the solutions here to be slow and/or confusing (and some of them don't handle ties correctly), so I use the data.table-based function auc_roc() from the R package mltools.

    library(data.table)
    library(mltools)
    
    preds <- c(.1, .3, .3, .9)
    actuals <- c(0, 0, 1, 1)
    
    auc_roc(preds, actuals)  # 0.875
    
    auc_roc(preds, actuals, returnDT=TRUE)
       Pred CountFalse CountTrue CumulativeFPR CumulativeTPR AdditionalArea CumulativeArea
    1:  0.9          0         1           0.0           0.5          0.000          0.000
    2:  0.3          1         1           0.5           1.0          0.375          0.375
    3:  0.1          1         0           1.0           1.0          0.500          0.875
     
    This post was edited by Rakesh Racharla at October 15, 2020 3:45 PM IST
      October 15, 2020 3:41 PM IST
    0
  • The ROCR package will calculate the AUC among other statistics:

    # pred is a ROCR prediction object, e.g. pred <- prediction(predictions, labels)
    auc.tmp <- performance(pred, "auc"); auc <- as.numeric(auc.tmp@y.values)
     
      October 15, 2020 3:48 PM IST
    0
  • The currently top-voted answer is incorrect because it disregards ties: when a positive and a negative example receive equal scores, that pair should contribute 0.5 to the AUC. Below is a corrected version.

    computeAUC <- function(pos.scores, neg.scores, n_sample=100000) {
      # Args:
      #   pos.scores: scores of positive observations
      #   neg.scores: scores of negative observations
      #   n_sample  : number of sampled pairs used to approximate the AUC

      pos.sample <- sample(pos.scores, n_sample, replace=T)
      neg.sample <- sample(neg.scores, n_sample, replace=T)
      # Ties between a positive and a negative score count as 0.5
      mean(1.0*(pos.sample > neg.sample) + 0.5*(pos.sample == neg.sample))
    }
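
    For example, reusing preds and actuals from the mltools answer above (this is a Monte Carlo estimate, so it will only approximate the exact 0.875):

    preds <- c(.1, .3, .3, .9)
    actuals <- c(0, 0, 1, 1)
    computeAUC(preds[actuals == 1], preds[actuals == 0])  # ~0.875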
      December 9, 2021 12:42 PM IST
    0