Using Scikit-learn’s Manifold Learning for Non-linear Dimensionality Reduction


Let’s learn how to perform manifold learning with Scikit-Learn.

Preparation

The first thing we need to do is ensure that the Scikit-Learn and Pandas packages are installed. If you haven't done so, you can install them with the following command.

pip install -U pandas scikit-learn

Then, we will use the built-in Wine dataset from Scikit-Learn as our example data.

from sklearn.datasets import load_wine

wine = load_wine()
X = wine.data 
y = wine.target 
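Manifold Learning methods rely on pairwise distances between samples, so features with large numeric ranges can dominate the embedding. As an optional preprocessing step (not used in the code below, which works on the raw features), you may standardize the data first. A minimal sketch:

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X = wine.data

# Rescale each feature to zero mean and unit variance so that
# distance-based embeddings are not dominated by large-range features
X_scaled = StandardScaler().fit_transform(X)
```

If you use this step, pass `X_scaled` instead of `X` to the manifold estimators below.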

When everything is ready, let’s get into the tutorial.

Manifold Learning for Non-Linear Dimensionality Reduction

Manifold Learning is an unsupervised machine learning approach for dimensionality reduction. A manifold is a topological space that locally resembles Euclidean space, like a curved surface that can be "flattened out" in small regions. For example, Earth's surface looks flat in small areas despite being part of a 3D sphere.

The idea of Manifold Learning is that high-dimensional data can often be represented by a lower-dimensional manifold. For example, images containing thousands of pixels can often be described by a few underlying factors, such as stroke angle or thickness.

In this tutorial, we will apply several Manifold Learning techniques to the Wine dataset. First, let's use the t-SNE technique. This method is often used to visualize high-dimensional data, although it can be slow on larger datasets.

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X)

plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y, cmap='viridis')
plt.colorbar()
plt.title('t-SNE on Wine Dataset')
plt.xlabel('t-SNE feature 1')
plt.ylabel('t-SNE feature 2')
plt.show()
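The t-SNE embedding is sensitive to its `perplexity` parameter (roughly, the effective number of neighbors each point considers; the Scikit-Learn default is 30). As a hedged sketch, you can compare embeddings at different perplexity values to see how the layout changes:

```python
from sklearn.datasets import load_wine
from sklearn.manifold import TSNE

wine = load_wine()
X = wine.data

# Fit t-SNE at two perplexity values; the resulting layouts
# can differ noticeably, so it is worth trying a few settings
embeddings = {}
for perplexity in (5, 30):
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)
    embeddings[perplexity] = tsne.fit_transform(X)
```

There is no single "correct" perplexity; a common practice is to inspect a few values and check whether the cluster structure is stable.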


Another technique we will use is Multi-Dimensional Scaling (MDS). It can handle non-linear structures and works for various kinds of data. However, it can be slow on larger datasets.

from sklearn.manifold import MDS

mds = MDS(n_components=2, random_state=42)
X_mds = mds.fit_transform(X)

plt.scatter(X_mds[:, 0], X_mds[:, 1], c=y, cmap='viridis')
plt.colorbar()
plt.title('MDS on Wine Dataset')
plt.xlabel('MDS feature 1')
plt.ylabel('MDS feature 2')
plt.show()


Lastly, we will use a technique called Spectral Embedding. It works well for data with cluster structure and scales to larger datasets. However, the results can be hard to interpret and are sensitive to hyperparameters.

from sklearn.manifold import SpectralEmbedding

spectral = SpectralEmbedding(n_components=2, random_state=42)
X_spectral = spectral.fit_transform(X)

plt.scatter(X_spectral[:, 0], X_spectral[:, 1], c=y, cmap='viridis')
plt.colorbar()
plt.title('Spectral Embedding on Wine Dataset')
plt.xlabel('Spectral feature 1')
plt.ylabel('Spectral feature 2')
plt.show()


Each Manifold Learning technique produces a different visualization of the same data. Experiment with the techniques and their interpretation to extract insight from your data.
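To compare the three techniques side by side, you can plot their embeddings in one figure. A minimal sketch combining the estimators used above:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.manifold import MDS, TSNE, SpectralEmbedding

wine = load_wine()
X, y = wine.data, wine.target

# The same three estimators used earlier in the tutorial
reducers = {
    't-SNE': TSNE(n_components=2, random_state=42),
    'MDS': MDS(n_components=2, random_state=42),
    'Spectral Embedding': SpectralEmbedding(n_components=2, random_state=42),
}

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, (name, reducer) in zip(axes, reducers.items()):
    emb = reducer.fit_transform(X)  # each method returns an (n_samples, 2) array
    ax.scatter(emb[:, 0], emb[:, 1], c=y, cmap='viridis')
    ax.set_title(name)
plt.tight_layout()
plt.show()
```

Seeing the embeddings next to each other makes it easier to judge which method separates the wine classes most clearly for your purposes.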

