Spectral Embedding - Manifold Learning

Main Concept¶

Spectral Embedding, also known as Laplacian Eigenmaps, aims to find a low-dimensional representation of data that preserves the local neighborhood structure. The key idea is to represent the data as a graph, where nodes correspond to data points and edges connect nearby points. The weights of the edges reflect the similarity between connected points. The method then seeks an embedding that minimizes the “cost” of connecting similar points in the low-dimensional space. This cost is typically defined using the graph Laplacian. The transformation maps the original data points to new coordinates in a lower-dimensional space by using the eigenvectors of the graph Laplacian corresponding to the smallest non-zero eigenvalues. The coordinates of data points change as they are projected onto the subspace spanned by these eigenvectors.

Theoretical Aspect¶

The core of Spectral Embedding lies in minimizing the following objective function:

$\mathcal{L} = \sum_{i,j} W_{ij} ||\mathbf{y}_i - \mathbf{y}_j||^2$

where $\mathbf{y}_i$ and $\mathbf{y}_j$ are the low-dimensional representations of data points $\mathbf{x}_i$ and $\mathbf{x}_j$ , respectively, and $W_{ij}$ is the weight of the edge connecting nodes $i$ and $j$ . This objective function penalizes large distances between points that are close in the original high-dimensional space (i.e., have large $W_{ij}$ ).

This can be rewritten in matrix form as:

$\mathcal{L} = \mathbf{Y}^T \mathbf{L} \mathbf{Y}$

where $\mathbf{Y}$ is a matrix whose rows are the $\mathbf{y}_i$ , and $\mathbf{L}$ is the graph Laplacian. The graph Laplacian is defined as:

$\mathbf{L} = \mathbf{D} - \mathbf{W}$

where $\mathbf{W}$ is the weight matrix (whose entries are $W_{ij}$ ), and $\mathbf{D}$ is a diagonal matrix with $D_{ii} = \sum_j W_{ij}$ .

To avoid the trivial solution $\mathbf{Y} = \mathbf{0}$ , a constraint is added:

$\mathbf{Y}^T \mathbf{D} \mathbf{Y} = \mathbf{I}$

where $\mathbf{I}$ is the identity matrix.

The key variable being optimized is $\mathbf{Y}$ , the matrix of low-dimensional embeddings.

Solution Methodology¶

The optimization problem can be solved by finding the eigenvectors of the generalized eigenvalue problem:

$\mathbf{L} \mathbf{Y} = \lambda \mathbf{D} \mathbf{Y}$

or, equivalently, if $\mathbf{D}$ is invertible:

$\mathbf{D}^{-1} \mathbf{L} \mathbf{Y} = \lambda \mathbf{Y}$

The algorithm proceeds as follows:

Graph Construction: Construct a graph representing the data. This involves defining a neighborhood structure (e.g., k-nearest neighbors or ε-neighborhood) and computing edge weights (e.g., Gaussian kernel: $W_{ij} = exp(-\frac{||\mathbf{x}_i - \mathbf{x}_j||^2}{2\sigma^2})$ ).
Laplacian Computation: Compute the graph Laplacian $\mathbf{L}$ .
Eigenvalue Decomposition: Solve the generalized eigenvalue problem.
Embedding: The low-dimensional embedding is given by the eigenvectors corresponding to the smallest non-zero eigenvalues. Typically, the eigenvector corresponding to the smallest eigenvalue (which is often close to zero) is discarded, and the next few eigenvectors are used as the new coordinates.

The solution involves standard numerical methods like nearest neighbor search, matrix construction, and eigenvalue decomposition.

Global Optimality¶

The solution obtained by solving the generalized eigenvalue problem is a global minimum of the objective function under the given constraint. This is a well-established result from linear algebra. However, the choice of the graph construction method (neighborhood size, weight function) influences the structure of the Laplacian and, therefore, the resulting embedding. Different graph constructions can lead to different local minima in the broader sense of finding a “good” embedding for the data. Thus, while the solution to the defined optimization problem is globally optimal, the overall process doesn’t guarantee finding the “best” embedding in a general sense.

Conclusion¶

Spectral Embedding is a powerful and widely used manifold learning technique that effectively captures local neighborhood relationships in data. By constructing a graph and computing the Laplacian, it transforms high-dimensional data into a low-dimensional space that preserves these relationships. While the core optimization problem has a globally optimal solution, the choice of graph construction parameters can affect the quality of the embedding. A meaningful use case is in document clustering, where documents can be represented as nodes in a graph connected by similarity measures. Spectral Embedding can then be used to find a low-dimensional representation that reveals clusters of related documents.

Manifold Learning

Local Tangent Space Alignment

Manifold Learning

t-distributed Stochastic Neighbor Embedding