Implementing Logistic Regression from Scratch¶
Let's dive into logistic regression, a fundamental classification algorithm. We'll implement it from scratch, break down the code step by step, and demonstrate its application on a popular dataset.
What is Logistic Regression?¶
Logistic Regression is a linear model used for classification. It applies a logistic (sigmoid) function to the linear combination of input features to predict a probability between 0 and 1.
Sigmoid Function¶
The sigmoid function is defined as follows:

\[ \sigma(z) = \frac{1}{1 + e^{-z}} \]

Here \( z = w^T x + b \) is the linear combination of weights and inputs, where \( w \) is the vector of model weights and \( b \) is the bias.
(Figure: the sigmoid curve.)
Decision Boundary¶
The model predicts a class based on a threshold, typically 0.5:

\[ \hat{y} = \begin{cases} 1 & \text{if } \sigma(z) \geq 0.5 \\ 0 & \text{if } \sigma(z) < 0.5 \end{cases} \]
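To make the thresholding concrete, here is a quick numeric check (a minimal sketch using NumPy directly; the full implementation follows below):

import numpy as np

z = np.array([-2.0, 0.0, 2.0])
probs = 1 / (1 + np.exp(-z))         # sigmoid -> [0.119, 0.5, 0.881]
labels = (probs >= 0.5).astype(int)  # threshold at 0.5 -> [0, 1, 1]
print(probs, labels)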
Implementation of Logistic Regression¶
1. Importing the Libraries¶
import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
2. Preparing the Dataset¶
# Load the Breast Cancer dataset (569 samples, 30 numeric features)
data = datasets.load_breast_cancer()
X = data.data
y = data.target
# Standardize the features (zero mean, unit variance)
X = (X - X.mean(axis=0)) / X.std(axis=0)
# Add an intercept column of ones (the training loop below also learns a
# separate bias b, so this column is redundant but harmless)
X = np.c_[np.ones((X.shape[0], 1)), X]
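As a quick sanity check of the shapes (569 samples and 30 features come from the dataset itself; the extra column is the intercept we just added):

print(X.shape, y.shape)  # (569, 31) (569,)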
3. Building and Training the Model¶
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logistic_regression(X, y, alpha=0.01, epochs=100):
    m = len(y)
    w, b = np.zeros(X.shape[1]), 0
    for _ in range(epochs):
        z = np.dot(X, w) + b
        predictions = sigmoid(z)
        # Gradient descent step on the cross-entropy loss
        w -= alpha * np.dot(X.T, (predictions - y)) / m
        b -= alpha * np.mean(predictions - y)
    return w, b
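One practical caveat about sigmoid above: for large negative \( z \), np.exp(-z) can overflow float64 and trigger a runtime warning. The standardized features here keep \( z \) small, but a more robust variant clips the input first (a minimal sketch; scipy.special.expit is a ready-made alternative):

def sigmoid_stable(z):
    # Clip z so np.exp(-z) stays within float64 range (np.exp overflows near 710)
    return 1 / (1 + np.exp(-np.clip(z, -500, 500)))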
The learning algorithm is gradient descent: each epoch computes the predicted probabilities, then moves \( w \) and \( b \) a step of size \( \alpha \) against the gradient of the cross-entropy loss.
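Written out, the two update lines in the loop implement:

\[ w \leftarrow w - \frac{\alpha}{m} X^{T}\bigl(\sigma(Xw + b) - y\bigr), \qquad b \leftarrow b - \frac{\alpha}{m} \sum_{i=1}^{m} \bigl(\sigma(z_i) - y_i\bigr) \]

where \( m \) is the number of training samples and \( \alpha \) is the learning rate.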
4. Making Predictions and Evaluating the Model¶
def predict(X, w, b):
    # Threshold the predicted probabilities at 0.5
    return (sigmoid(np.dot(X, w) + b) >= 0.5).astype(int)
w, b = logistic_regression(X, y)
y_pred = predict(X, w, b)
accuracy = np.mean(y_pred == y)
print(f"Accuracy: {accuracy * 100:.2f}%")
Accuracy: 94.55%
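Note that 94.55% is a training accuracy: the model is evaluated on the same data it was fit on. A fairer estimate uses a held-out split; here is a minimal sketch with scikit-learn's train_test_split (the variable names and the 80/20 split are illustrative, and the exact number will vary with the split):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
w_tr, b_tr = logistic_regression(X_train, y_train)
test_accuracy = np.mean(predict(X_test, w_tr, b_tr) == y_test)
print(f"Test accuracy: {test_accuracy * 100:.2f}%")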
Visualizing the Results¶
# Plot true labels and predicted probabilities against the first feature (mean radius)
plt.scatter(X[:, 1], y, label="True labels")
plt.scatter(X[:, 1], sigmoid(np.dot(X, w) + b), label="Predicted probabilities")
plt.xlabel("Mean radius (standardized)")
plt.ylabel("Class / probability")
plt.legend()
plt.grid()
plt.show()
Implementing Logistic Regression using Scikit-Learn¶
from sklearn.linear_model import LogisticRegression

# Drop the manually added intercept column; sklearn fits its own intercept
model = LogisticRegression()
model.fit(X[:, 1:], y)
sklearn_accuracy = model.score(X[:, 1:], y)
print(f"Sklearn Accuracy: {sklearn_accuracy * 100:.2f}%")
Sklearn Accuracy: 98.77%
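Most of the gap is optimization rather than modeling: 100 epochs of plain gradient descent at \( \alpha = 0.01 \) stop well short of convergence, while Scikit-Learn's default lbfgs solver runs to convergence (with L2 regularization enabled by default). Training the scratch model longer should narrow the gap; for example:

w, b = logistic_regression(X, y, alpha=0.1, epochs=10000)
print(f"Accuracy: {np.mean(predict(X, w, b) == y) * 100:.2f}%")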
Conclusion¶
Logistic Regression is a powerful yet simple algorithm for binary classification tasks. Implementing it from scratch helps in understanding the core concepts of model building, optimization, and evaluation, and building it with nothing but NumPy gives a deeper insight into how logistic regression works under the hood.