QBoard » Artificial Intelligence & ML » AI and ML - Python » Roc curve and cut off point using Python

User Dashboard

Roc curve and cut off point using Python

Back To Topics

Tags : python ROC logistic-regression cutt_off_point

Nitara Bobal

53
I ran a logistic regression model and made predictions of the logit values. I used this to get the points on the ROC curve:
```
from sklearn import metrics
 fpr, tpr, thresholds = metrics.roc_curve(Y_test,p)
```
I know metrics.roc_auc_score gives the area under the ROC curve. Can anyone tell me what command will find the optimal cut-off point (threshold value)?
October 13, 2020 12:19 PM IST

0
- Advika Banerjee The simple answer to your question is , np.argmax(tpr - fpr)
  
  or cancel
  
  October 13, 2020
Laksh Nath

126
Vanilla Python Implementation of Youden's J-Score
```
def cutoff_youdens_j(fpr,tpr,thresholds):
    j_scores = tpr-fpr
    j_ordered = sorted(zip(j_scores,thresholds))
    return j_ordered[-1][1]
```
October 13, 2020 5:32 PM IST

0
Pranav B

106 5
It can be computed as:
```
# calculate the g-mean for each threshold
gmeans = sqrt(tpr * (1-fpr))
# locate the index of the largest g-mean
ix = argmax(gmeans)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds[ix], gmeans[ix]))
```
October 13, 2020 5:33 PM IST

0

Maryam Bains

317

You can do this using the epi package in R, however I could not find similar package or example in Python.

The optimal cut off point would be where “true positive rate” is high and the “false positive rate” is low. Based on this logic, I have pulled an example below to find optimal threshold.
Python code:

import pandas as pd
import statsmodels.api as sm
import pylab as pl
import numpy as np
from sklearn.metrics import roc_curve, auc

# read the data in
df = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")

# rename the 'rank' column because there is also a DataFrame method called 'rank'
df.columns = ["admit", "gre", "gpa", "prestige"]
# dummify rank
dummy_ranks = pd.get_dummies(df['prestige'], prefix='prestige')
# create a clean data frame for the regression
cols_to_keep = ['admit', 'gre', 'gpa']
data = df[cols_to_keep].join(dummy_ranks.iloc[:, 'prestige_2':])

# manually add the intercept
data['intercept'] = 1.0

train_cols = data.columns[1:]
# fit the model
result = sm.Logit(data['admit'], data[train_cols]).fit()
print result.summary()

# Add prediction to dataframe
data['pred'] = result.predict(data[train_cols])

fpr, tpr, thresholds =roc_curve(data['admit'], data['pred'])
roc_auc = auc(fpr, tpr)
print("Area under the ROC curve : %f" % roc_auc)

####################################
# The optimal cut off would be where tpr is high and fpr is low
# tpr - (1-fpr) is zero or near to zero is the optimal cut off point
####################################
i = np.arange(len(tpr)) # index for df
roc = pd.DataFrame({'fpr' : pd.Series(fpr, index=i),'tpr' : pd.Series(tpr, index = i), '1-fpr' : pd.Series(1-fpr, index = i), 'tf' : pd.Series(tpr - (1-fpr), index = i), 'thresholds' : pd.Series(thresholds, index = i)})
roc.iloc[(roc.tf-0).abs().argsort()[:1]]

# Plot tpr vs 1-fpr
fig, ax = pl.subplots()
pl.plot(roc['tpr'])
pl.plot(roc['1-fpr'], color = 'red')
pl.xlabel('1-False Positive Rate')
pl.ylabel('True Positive Rate')
pl.title('Receiver operating characteristic')
ax.set_xticklabels([])

The optimal cut off point is 0.317628, so anything above this can be labeled as 1 else 0. You can see from the output/chart that where TPR is crossing 1-FPR the TPR is 63%, FPR is 36% and TPR-(1-FPR) is nearest to zero in the current example.

Output:

        1-fpr       fpr        tf     thresholds       tpr
  171  0.637363  0.362637  0.000433    0.317628     0.637795

Hope this is helpful.

October 13, 2020 5:36 PM IST

Rakesh Racharla

129 8
Given tpr, fpr, thresholds from your question, the answer for the optimal threshold is just:
```
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold = thresholds[optimal_idx]
```
```
 
```
October 13, 2020 5:37 PM IST

0

Member Sign In

Member Sign In

Create Account

Roc curve and cut off point using Python

Output:

Connect With Us