Reproducing mHC
Manifold-Constrained Hyper-Connections for Stable Deep Networks
A PyTorch implementation that reproduces and validates DeepSeek’s mHC paper. The paper extends Hyper-Connections by constraining the connection weight matrices to the Birkhoff polytope (the set of doubly stochastic matrices, whose rows and columns each sum to 1) using Sinkhorn-Knopp normalization, which yields significantly more stable training dynamics.
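As a minimal sketch of the Sinkhorn-Knopp step described above: the function below alternately normalizes rows and columns of a positive matrix until it is approximately doubly stochastic. The function name `sinkhorn_knopp`, the use of `exp` to obtain positive entries, and the fixed iteration count are assumptions made for illustration, not details taken from the paper or from this repository.

```python
import torch


def sinkhorn_knopp(logits: torch.Tensor, n_iters: int = 20, eps: float = 1e-8) -> torch.Tensor:
    """Map an unconstrained square matrix to an approximately doubly stochastic one.

    Hypothetical helper: positivity via exp and a fixed iteration count are
    illustrative assumptions.
    """
    # Exponentiate to get strictly positive entries; subtracting the max
    # keeps exp() numerically stable without changing the final result's structure.
    m = torch.exp(logits - logits.max())
    for _ in range(n_iters):
        m = m / (m.sum(dim=-1, keepdim=True) + eps)  # normalize each row to sum to 1
        m = m / (m.sum(dim=-2, keepdim=True) + eps)  # normalize each column to sum to 1
    return m


torch.manual_seed(0)
mixing = sinkhorn_knopp(torch.randn(4, 4))
row_sums = mixing.sum(dim=-1)  # approximately 1 after enough iterations
col_sums = mixing.sum(dim=-2)  # 1 up to eps, since columns are normalized last
```

Because every operation here is differentiable, a projection of this kind can sit inside the forward pass so that gradients flow back to the unconstrained parameters.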
Reproduction results:
- Confirmed the paper’s stability claims with independent experiments
- Activation gain: 11.70 for mHC vs 61.33 for standard Hyper-Connections (roughly a 5x reduction)
- Gradient stability: maximum gradient norm of 4.46, enabling deeper networks without exploding gradients
- Convergence: reached a minimum loss of 1.787, lower than every baseline tested
Topics:
- PyTorch
- Sinkhorn-Knopp
- Birkhoff Polytope
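The activation gain result above has a simple intuition: a product of doubly stochastic matrices is itself doubly stochastic, so its spectral norm stays at 1 regardless of depth. The sketch below illustrates this on synthetic matrices only; it does not reproduce the reported numbers. The helper `random_doubly_stochastic`, the depth of 64, and the use of plain Gaussian matrices as the unconstrained comparison are all assumptions chosen for illustration.

```python
import torch


def random_doubly_stochastic(n: int, n_perms: int = 8) -> torch.Tensor:
    """Hypothetical helper: a convex combination of random permutation matrices.

    By the Birkhoff-von Neumann theorem, any such combination lies in the
    Birkhoff polytope, i.e. it is exactly doubly stochastic.
    """
    weights = torch.softmax(torch.randn(n_perms, dtype=torch.float64), dim=0)
    eye = torch.eye(n, dtype=torch.float64)
    perms = torch.stack([eye[torch.randperm(n)] for _ in range(n_perms)])
    return (weights[:, None, None] * perms).sum(dim=0)


def composite_gain(matrices) -> float:
    """Spectral norm of the product of a sequence of mixing matrices."""
    product = torch.eye(matrices[0].shape[0], dtype=torch.float64)
    for m in matrices:
        product = m @ product
    return torch.linalg.matrix_norm(product, ord=2).item()


torch.manual_seed(0)
n, depth = 4, 64  # illustrative sizes, not taken from the experiments

constrained_gain = composite_gain([random_doubly_stochastic(n) for _ in range(depth)])
unconstrained_gain = composite_gain(
    [torch.randn(n, n, dtype=torch.float64) for _ in range(depth)]
)
```

Here `constrained_gain` stays at 1 up to floating-point error, while the unconstrained product is free to grow or shrink with depth.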
