One-vs-All vs. One-vs-One: Which Multi-Class Classification Strategy is Better? - NBD Lite #24
Strategies to consider in multi-class problems
If you are interested in more audio explanations, you can listen to the article in the AI-Generated Podcast by NotebookLM!
It’s rarely discussed, but multi-class problems often appear in business settings.
A multi-class classification problem is a type of classification task where the goal is to classify the input into one of three or more distinct classes.
Unlike binary classification, where there are only two possible outcomes (e.g., spam or ham), multi-class classification involves selecting from more than two classes.
There are two common strategies to approach multi-class classification: One-vs-All or One-vs-One.
What are the differences? And what are the considerations for using them?
That’s what we will discuss! So, let’s get into it.
Here is the summary of what we will discuss.
Multi-Class Classification Strategies
As I have mentioned above, there are many ways to approach multi-class classification problems.
However, One-vs-All (OvA) and One-vs-One (OvO) are the most popular strategies.
Both strategies use binary classifiers to tackle multi-class problems, but they go about it in different ways.
Let’s understand both methods to see which strategy is suitable for your problem.
One-vs-All (OvA)
One-vs-All (OvA) is a strategy where we train a binary classifier for each unique class against all the remaining classes in the multi-class dataset.
For example, let’s take a look at the image below.
In OvA, we train N binary classifiers, where N is the number of unique classes.
Each binary classifier is trained on one unique class and tries to separate it from all the other classes.
There are a few considerations when using this strategy, including:
- The strategy can be computationally efficient, as the number of binary classifiers is only equal to the number of unique classes.
- OvA may struggle metric-wise because of overlapping classes (different classes in a dataset sharing similar features).
- However, OvA can be robust in imbalanced dataset cases.
- Lastly, the final prediction is the class whose classifier produces the highest confidence score (a from-scratch sketch of this idea follows the list).
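To make the mechanism concrete, here is a minimal from-scratch sketch of the OvA idea. Note that the toy iris dataset, the LogisticRegression base model, and the variable names are my own illustrative choices, not part of the article’s main example below.
# A minimal from-scratch sketch of OvA (illustrative only; the toy iris
# dataset and LogisticRegression base model are not the article's setup).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
X_toy, y_toy = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy)
classes = np.unique(y_tr)
binary_models = {}
for c in classes:
    # Relabel the data: 1 for the current class, 0 for "all the rest"
    y_binary = (y_tr == c).astype(int)
    binary_models[c] = LogisticRegression(max_iter=1000).fit(X_tr, y_binary)
# Each binary classifier scores every test sample; the class whose
# classifier is most confident wins.
scores = np.column_stack(
    [binary_models[c].decision_function(X_te) for c in classes])
y_pred_toy = classes[np.argmax(scores, axis=1)]
print("Manual OvA accuracy:", np.mean(y_pred_toy == y_te))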
In practice, Scikit-Learn provides this out of the box. Let’s see how we can implement OvA in Python. First, let’s prepare the dataset and initiate the model.
import itertools
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.datasets import fetch_covtype
X, y = fetch_covtype(return_X_y=True)
# For computational efficiency, we can sample a subset of the data
X, _, y, _ = train_test_split(X, y, train_size=50000, random_state=42, stratify=y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
# Standardize the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Initiate classifier model
svc = SVC(kernel='linear')
Then, we can train the OvA model and plot the confusion matrix. We will use the OneVsRestClassifier from Scikit-Learn.
# One-vs-All (OvA) approach
ova_classifier = OneVsRestClassifier(svc)
ova_classifier.fit(X_train, y_train)
y_pred_ova = ova_classifier.predict(X_test)
accuracy_ova = accuracy_score(y_test, y_pred_ova)
conf_matrix_ova = confusion_matrix(y_test, y_pred_ova)
# Print results
print(f"One-vs-All Accuracy: {accuracy_ova:.4f}")
print("One-vs-All Confusion Matrix:")
print(conf_matrix_ova)
def plot_confusion_matrix(cm, title, ax):
    ax.imshow(cm, cmap='Blues', interpolation='nearest')
    ax.set_title(title)
    ax.set_xlabel('Predicted Label')
    ax.set_ylabel('True Label')
    thresh = cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        ax.text(j, i, f'{cm[i, j]}',
                horizontalalignment='center',
                color='white' if cm[i, j] > thresh else 'black')
fig, ax = plt.subplots(figsize=(6, 5))
plot_confusion_matrix(conf_matrix_ova, "One-vs-All Confusion Matrix", ax)
plt.tight_layout()
plt.show()
One-vs-All Accuracy: 0.7119
One-vs-All Confusion Matrix:
OvA has shown some promise in the multi-class classification problem.
Let’s see how different it is from the One-vs-One (OvO).
One-vs-One (OvO)
Compared to OvA, One-vs-One (OvO) trains a binary classifier for each pairwise combination of the unique classes.
Since a binary classifier is trained for each pair of classes, the number of classifiers trained is N * (N − 1) / 2.
Each pair combination becomes an individual binary classifier, and their predictions are aggregated in the end.
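To put the formula in perspective, here is a quick count for the Covertype data loaded earlier, which has 7 cover types. This snippet is just a sanity check on my part and assumes y from the earlier code is still in scope.
# How many classifiers each strategy needs for the Covertype data above
from math import comb
n_classes = len(np.unique(y))
print("OvA classifiers:", n_classes)           # 7
print("OvO classifiers:", comb(n_classes, 2))  # 7 * 6 / 2 = 21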
There are a few considerations when using this strategy, including:
- The number of OvO classifiers grows quadratically as the number of classes increases, which leads to a more complex model.
- A higher number of classifiers can lead to slower training time.
- OvO tends to perform better with similar or overlapping classes, as each classifier focuses on separating only two classes.
- It may struggle with imbalanced datasets.
- Generally, the model uses a voting system where each pairwise classifier votes and the class with the most votes is selected (a from-scratch sketch of this voting idea follows the list).
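Before reaching for Scikit-Learn’s built-in class, here is a minimal from-scratch sketch of the pairwise training and voting idea. The toy iris dataset and LogisticRegression base model are again my own illustrative choices, not the article’s setup.
# A minimal from-scratch sketch of OvO pairwise voting (illustrative only)
from itertools import combinations
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
X_toy, y_toy = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy)
classes = np.unique(y_tr)
votes = np.zeros((len(X_te), len(classes)), dtype=int)
for a, b in combinations(range(len(classes)), 2):
    # Train on the samples belonging to this pair of classes only
    mask = np.isin(y_tr, [classes[a], classes[b]])
    clf = LogisticRegression(max_iter=1000).fit(X_tr[mask], y_tr[mask])
    pred = clf.predict(X_te)
    # Each pairwise classifier casts one vote per test sample
    votes[pred == classes[a], a] += 1
    votes[pred == classes[b], b] += 1
y_pred_votes = classes[np.argmax(votes, axis=1)]
print("Manual OvO accuracy:", np.mean(y_pred_votes == y_te))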
Let’s see how it’s implemented with Scikit-Learn.
from sklearn.multiclass import OneVsOneClassifier
# One-vs-One (OvO) approach
ovo_classifier = OneVsOneClassifier(svc)
ovo_classifier.fit(X_train, y_train)
y_pred_ovo = ovo_classifier.predict(X_test)
# Accuracy and confusion matrix for OvO
accuracy_ovo = accuracy_score(y_test, y_pred_ovo)
conf_matrix_ovo = confusion_matrix(y_test, y_pred_ovo)
# Print results
print(f"One-vs-One Accuracy: {accuracy_ovo:.4f}")
print("One-vs-One Confusion Matrix:")
print(conf_matrix_ovo)
fig, ax = plt.subplots(figsize=(6, 5))
plot_confusion_matrix(conf_matrix_ovo, "One-vs-One Confusion Matrix", ax)
plt.tight_layout()
plt.show()
One-vs-One Accuracy: 0.7264
One-vs-One Confusion Matrix:
We can see that the accuracy metric for OvO is slightly better, but there are differences in the confusion matrices. You can check the result differences and see which strategy is suitable for your work.
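If you want to go beyond a single accuracy number, a per-class breakdown makes the comparison more concrete. Here is one way to do that with Scikit-Learn, assuming the variables from the code above are still in scope.
# Per-class comparison of the two strategies (assumes y_test, y_pred_ova,
# y_pred_ovo, ova_classifier, and ovo_classifier are still in scope)
from sklearn.metrics import classification_report
print("OvA per-class report:")
print(classification_report(y_test, y_pred_ova))
print("OvO per-class report:")
print(classification_report(y_test, y_pred_ovo))
# The number of binary classifiers each strategy actually trained
print("OvA estimators:", len(ova_classifier.estimators_))  # N
print("OvO estimators:", len(ovo_classifier.estimators_))  # N * (N - 1) / 2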
That’s all for a quick explanation of the multi-class OvA and OvO strategies.
Are there any more things you would love to discuss? Let’s talk about it together!