Skip to article frontmatterSkip to article content

Modified Locally Linear Embedding

University of Lyon

Main Concept

Locally Linear Embedding (LLE) is based on a fundamental principle: while high-dimensional data may lie on a nonlinear manifold, each small neighborhood can be approximated as linear. The method preserves the geometric properties of these local neighborhoods while reducing dimensionality.

The core idea relies on two key assumptions:

  1. Each data point can be reconstructed from its neighbors via linear combinations
  2. These same reconstruction relationships should be preserved in the lower-dimensional space

To understand this transformation, consider a manifold M\mathcal{M} embedded in RD\mathbb{R}^D. LLE maps points xiRDx_i \in \mathbb{R}^D to yiRdy_i \in \mathbb{R}^d (where d<Dd < D) while preserving the local geometric relationships between neighboring points.

Theoretical Aspect

The method involves two sequential optimization problems:

  1. First Optimization: Finding Reconstruction Weights
    minWi=1Nxij=1NWijxj2\min_{W} \sum_{i=1}^N \left\|x_i - \sum_{j=1}^N W_{ij}x_j\right\|^2

Subject to constraints:

  • Wij=0W_{ij} = 0 if xjx_j is not a k-nearest neighbor of xix_i
  • j=1NWij=1\sum_{j=1}^N W_{ij} = 1 for each ii (affine reconstruction)
  1. Second Optimization: Computing Lower-dimensional Embeddings
    minYi=1Nyij=1NWijyj2\min_{Y} \sum_{i=1}^N \left\|y_i - \sum_{j=1}^N W_{ij}y_j\right\|^2

Subject to constraints:

  • 1Ni=1Nyi=0\frac{1}{N}\sum_{i=1}^N y_i = 0 (centered at origin)
  • 1Ni=1NyiyiT=I\frac{1}{N}\sum_{i=1}^N y_iy_i^T = I (unit covariance)

The key variables being optimized are:

  • Weight matrix WRN×NW \in \mathbb{R}^{N \times N}
  • Embedded coordinates YRN×dY \in \mathbb{R}^{N \times d}

Solution Methodology

The solution process involves three main steps:

  1. Neighborhood Identification:
  • For each point xix_i, identify its k nearest neighbors using Euclidean distance
  • Define neighborhood matrix ηij\eta_{ij} where ηij=1\eta_{ij} = 1 if xjx_j is a neighbor of xix_i
  1. Weight Computation:
  • For each point xix_i, solve the local optimization:
    minwixijNiwijxj2\min_{w_i} \left\|x_i - \sum_{j \in \mathcal{N}_i} w_{ij}x_j\right\|^2
  • This reduces to solving the linear system:
    Gjkwi=1G_{jk}w_i = 1
    where Gjk=(xixj)T(xixk)G_{jk} = (x_i - x_j)^T(x_i - x_k) is the local Gram matrix
  1. Embedding Computation:
  • Define matrix M=(IW)T(IW)M = (I-W)^T(I-W)
  • The optimal embedding coordinates are given by the eigenvectors of MM corresponding to its smallest nonzero eigenvalues
  • If λ0λ1...λN\lambda_0 \leq \lambda_1 \leq ... \leq \lambda_N are eigenvalues of MM, then: Y=[ν1,ν2,...,νd]Y = [\nu_1, \nu_2, ..., \nu_d] where νi\nu_i is the eigenvector corresponding to λi\lambda_i

Global Optimality

The optimization landscape has several important properties:

  1. Weight Computation:
  • Globally optimal solution exists for each local neighborhood
  • Solution uniqueness requires: rank(G)=k1\text{rank}(G) = k-1 where kk is the number of neighbors
  • Condition number κ(G)\kappa(G) affects numerical stability
  1. Embedding Computation:
  • Global solution obtained via eigendecomposition
  • Uniqueness up to rotations and reflections
  • Solution quality depends on spectral gap λd+1λd\lambda_{d+1} - \lambda_d

Limitations include:

  1. Topological Constraints:
  • Method assumes manifold is locally linear
  • Fails when: 2fxixj0\frac{\partial^2 f}{\partial x_i \partial x_j} \gg 0 (high curvature) where ff describes the manifold
  1. Scale Dependencies:
  • Results sensitive to neighborhood size kk
  • Optimal kk relates to local curvature: kO(logN)k \sim \mathcal{O}(\log N) where NN is sample size
  1. Statistical Limitations:
  • Requires sufficient sampling density: N>O(dlogd)N > \mathcal{O}(d \log d) where dd is intrinsic dimension
  • Performance degrades with noise: σnoiseσdata\sigma_{noise} \ll \sigma_{data} required for stability

Conclusion

The method provides global optimality for each step individually, but the combined solution depends critically on hyperparameter choices and data properties.