Deep learning has revolutionised technology, giving us everything from uncannily smart chatbots to medical imaging that can spot diseases better than the human eye. Yet, for a long time, a central mystery has haunted the field: why do these enormous models work so well?

According to classical statistics, a model with billions of parameters, far more than its training data, should fail spectacularly. It ought to memorise the data, noise and all, and be unable to generalise to new, unseen examples. But deep neural networks defy this wisdom. They generalise brilliantly.

How do we explain this apparent magic? There isn't one single answer. Instead, researchers view the problem through several different theoretical "lenses." Here are six of the most important ones.

1. The Linearisation Lens: The Neural Tangent Kernel (NTK) ⚙️

The NTK offers a startling insight: what if, under the right conditions, a massively complex neural network is just a simple, linear model in disguise? The...
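To make the "linear model in disguise" idea concrete, here is a minimal sketch, assuming a toy two-layer network in JAX (the layer sizes and the `linearise` and `empirical_ntk` helpers are illustrative, not from this post): the network's output is Taylor-expanded to first order in its parameters around initialisation, and the empirical NTK is just the Gram matrix of per-example parameter gradients.

```python
import jax
import jax.numpy as jnp

# Toy two-layer network; widths and names are illustrative.
def init_params(key, d_in=10, width=512):
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (d_in, width)) / jnp.sqrt(d_in),
        "w2": jax.random.normal(k2, (width, 1)) / jnp.sqrt(width),
    }

def f(params, x):
    return jnp.tanh(x @ params["w1"]) @ params["w2"]

def linearise(f, params0):
    """First-order Taylor expansion of f in its parameters around params0:
    f_lin(p, x) = f(p0, x) + J_p f(p0, x) . (p - p0)."""
    def f_lin(params, x):
        dparams = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
        y0, jvp_out = jax.jvp(lambda p: f(p, x), (params0,), (dparams,))
        return y0 + jvp_out
    return f_lin

def empirical_ntk(f, params, x1, x2):
    """Empirical NTK: Gram matrix of per-example parameter gradients."""
    def flat_jac(x):
        jac = jax.jacobian(lambda p: f(p, x))(params)
        leaves = [jnp.reshape(l, (l.shape[0], -1)) for l in jax.tree_util.tree_leaves(jac)]
        return jnp.concatenate(leaves, axis=1)
    return flat_jac(x1) @ flat_jac(x2).T

key = jax.random.PRNGKey(0)
params0 = init_params(key)
x = jax.random.normal(key, (5, 10))

f_lin = linearise(f, params0)
# At initialisation the linearised model matches the network exactly; the NTK
# result says that, as the width grows, the two stay close throughout training.
print(jnp.allclose(f(params0, x), f_lin(params0, x)))   # True
print(empirical_ntk(f, params0, x, x).shape)            # (5, 5) kernel matrix
```

In the infinite-width limit this kernel stays essentially fixed at its initial value during training, so gradient descent on the network behaves like kernel regression with the NTK. That is where the "simple, linear model in disguise" framing comes from.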