본문 바로가기
Python

[python] best threshold & roc-curveBest threshold를 찾고 roc curve에 표시하기

by Chandler.j 2021. 1. 18.
반응형

fig1. title

binary classification에서 best threshold를 찾고 roc-curve에 표시해보자

best threshold는 Youden’s J statistic를 이용한다.

 

 

참고: en.wikipedia.org/wiki/Youden%27s_J_statistic

 

Youden's J statistic - Wikipedia

From Wikipedia, the free encyclopedia Jump to navigation Jump to search Index that describes the performance of a dichotomous diagnostic test Youden's J statistic (also called Youden's index) is a single statistic that captures the performance of a dichoto

en.wikipedia.org


#1. best threshold 찾기

classifier modeling 후 best threshold에서의 sensitivityspecificity를 확인한 후

classification reporting을 해보자

from numpy import sqrt
from numpy import argmax
from sklearn.metrics import classification_report
from sklearn.metrics import plot_roc_curve

#Youden’s J statistic. / J = Sensitivity + Specificity – 1
y_prob = model.predict_proba(X_test_features)
y_prob2 = y_prob[:,1]

# calculate roc curves
fpr, tpr, thresholds = roc_curve(y_test, y_prob2)
# get the best threshold
J = tpr - fpr
ix = argmax(J)
best_thresh = thresholds[ix]
print('Best Threshold=%f, sensitivity = %.3f, specificity = %.3f, J=%.3f' % (best_thresh, tpr[ix], 1-fpr[ix], J[ix]))

y_prob_pred = (model.predict_proba(X_test_features)[:,1] >= best_thresh).astype(bool)
print(classification_report(y_test, y_prob_pred, target_names=['normal', 'abnormal']))

fig2. output of #1

 

#2. roc-curve에 threshold 표시하기

#plot roc and best threshold
sens, spec = tpr[ix], 1-fpr[ix]
# plot the roc curve for the model
plt.plot([0,1], [0,1], linestyle='--', markersize=0.01, color='black')
plt.plot(fpr, tpr, marker='.', color='black', markersize=0.05, label="XGBClassifier AUC = %.2f" % roc_auc_score(y_test, y_prob2))
plt.scatter(fpr[ix], tpr[ix], marker='+', s=100, color='r', 
            label='Best threshold = %.3f, \nSensitivity = %.3f, \nSpecificity = %.3f' % (best_thresh, sens, spec))

# axis labels
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc=4)

# show the plot
plt.show()

fig3. output of #2

 

참고 : machinelearningmastery.com/threshold-moving-for-imbalanced-classification/

 

A Gentle Introduction to Threshold-Moving for Imbalanced Classification

Classification predictive modeling typically involves predicting a class label. Nevertheless, many machine learning algorithms are capable of predicting a probability or scoring of class membership, and this must be interpreted before it can be mapped to a

machinelearningmastery.com

 


TOP

Designed by 티스토리