# How to plot ROC curve in Python

Posted on

### Question :

How to plot ROC curve in Python

I am trying to plot a ROC curve to evaluate the accuracy of a prediction model I developed in Python using logistic regression packages. I have computed the true positive rate as well as the false positive rate; however, I am unable to figure out how to plot these correctly using `matplotlib` and calculate the AUC value. How could I do that?

Here are two ways you may try, assuming your `model` is an sklearn predictor:

``````import sklearn.metrics as metrics
# calculate the fpr and tpr for all thresholds of the classification
probs = model.predict_proba(X_test)
preds = probs[:,1]
fpr, tpr, threshold = metrics.roc_curve(y_test, preds)
roc_auc = metrics.auc(fpr, tpr)

# method I: plt
import matplotlib.pyplot as plt
plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

# method II: ggplot
from ggplot import *
df = pd.DataFrame(dict(fpr = fpr, tpr = tpr))
ggplot(df, aes(x = 'fpr', y = 'tpr')) + geom_line() + geom_abline(linetype = 'dashed')
``````

or try

``````ggplot(df, aes(x = 'fpr', ymin = 0, ymax = 'tpr')) + geom_line(aes(y = 'tpr')) + geom_area(alpha = 0.2) + ggtitle("ROC Curve w/ AUC = %s" % str(roc_auc))
``````

This is the simplest way to plot an ROC curve, given a set of ground truth labels and predicted probabilities. Best part is, it plots the ROC curve for ALL classes, so you get multiple neat-looking curves as well

``````import scikitplot as skplt
import matplotlib.pyplot as plt

y_true = # ground truth labels
y_probas = # predicted probabilities generated by sklearn classifier
skplt.metrics.plot_roc_curve(y_true, y_probas)
plt.show()
``````

Here’s a sample curve generated by plot_roc_curve. I used the sample digits dataset from scikit-learn so there are 10 classes. Notice that one ROC curve is plotted for each class.

Disclaimer: Note that this uses the scikit-plot library, which I built.

It is not at all clear what the problem is here, but if you have an array `true_positive_rate` and an array `false_positive_rate`, then plotting the ROC curve and getting the AUC is as simple as:

``````import matplotlib.pyplot as plt
import numpy as np

x = # false_positive_rate
y = # true_positive_rate

# This is the ROC curve
plt.plot(x,y)
plt.show()

# This is the AUC
auc = np.trapz(y,x)
``````

# AUC curve For Binary Classification using matplotlib

``````from sklearn import svm, datasets
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt
``````

### Load Breast Cancer Dataset

``````breast_cancer = load_breast_cancer()

X = breast_cancer.data
y = breast_cancer.target
``````

### Split the Dataset

``````X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.33, random_state=44)
``````

### Model

``````clf = LogisticRegression(penalty='l2', C=0.1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
``````

### Accuracy

``````print("Accuracy", metrics.accuracy_score(y_test, y_pred))
``````

### AUC Curve

``````y_pred_proba = clf.predict_proba(X_test)[::,1]
fpr, tpr, _ = metrics.roc_curve(y_test,  y_pred_proba)
auc = metrics.roc_auc_score(y_test, y_pred_proba)
plt.plot(fpr,tpr,label="data 1, auc="+str(auc))
plt.legend(loc=4)
plt.show()
``````

Here is python code for computing the ROC curve (as a scatter plot):

``````import matplotlib.pyplot as plt
import numpy as np

score = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.54, 0.53, 0.52, 0.51, 0.505, 0.4, 0.39, 0.38, 0.37, 0.36, 0.35, 0.34, 0.33, 0.30, 0.1])
y = np.array([1,1,0, 1, 1, 1, 0, 0, 1, 0, 1,0, 1, 0, 0, 0, 1 , 0, 1, 0])

# false positive rate
fpr = []
# true positive rate
tpr = []
# Iterate thresholds from 0.0, 0.01, ... 1.0
thresholds = np.arange(0.0, 1.01, .01)

# get number of positive and negative examples in the dataset
P = sum(y)
N = len(y) - P

# iterate through all thresholds and determine fraction of true positives
# and false positives found at this threshold
for thresh in thresholds:
FP=0
TP=0
for i in range(len(score)):
if (score[i] > thresh):
if y[i] == 1:
TP = TP + 1
if y[i] == 0:
FP = FP + 1
fpr.append(FP/float(N))
tpr.append(TP/float(P))

plt.scatter(fpr, tpr)
plt.show()
``````

``````from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt

y_true = # true labels
y_probas = # predicted results
fpr, tpr, thresholds = metrics.roc_curve(y_true, y_probas, pos_label=0)

# Print ROC curve
plt.plot(fpr,tpr)
plt.show()

# Print AUC
auc = np.trapz(tpr,fpr)
print('AUC:', auc)
``````

The previous answers assume that you indeed calculated TP/Sens yourself. It’s a bad idea to do this manually, it’s easy to make mistakes with the calculations, rather use a library function for all of this.

the plot_roc function in scikit_lean does exactly what you need:
http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html

The essential part of the code is:

``````  for i in range(n_classes):
fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
roc_auc[i] = auc(fpr[i], tpr[i])
``````

Based on multiple comments from stackoverflow, scikit-learn documentation and some other, I made a python package to plot ROC curve (and other metric) in a really simple way.

To install package : `pip install plot-metric` (more info at the end of post)

To plot a ROC Curve (example come from the documentation) :

## Binary classification

Let’s load a simple dataset and make a train & test set :

``````from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=1000, n_classes=2, weights=[1,1], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=2)
``````

Train a classifier and predict test set :

``````from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=50, random_state=23)
model = clf.fit(X_train, y_train)

# Use predict_proba to predict probability of the class
y_pred = clf.predict_proba(X_test)[:,1]
``````

You can now use plot_metric to plot ROC Curve :

``````from plot_metric.functions import BinaryClassification
# Visualisation with plot_metric
bc = BinaryClassification(y_test, y_pred, labels=["Class 1", "Class 2"])

# Figures
plt.figure(figsize=(5,5))
bc.plot_roc_curve()
plt.show()
``````

Result :

You can find more example of on the github and documentation of the package: