When the software models an orbit without interference, we can automatically label that orbit ‘Correct’. To generate ‘Incorrect’ orbits, we introduce perturbations: at some point in the simulation, we randomly nudge a few of the ‘planets’ so that they stop following the laws of physics. The neural network’s task is to distinguish ‘Correct’ orbital mechanics from ‘Incorrect’ orbits. This is the benchmark that lets us compare the performance of different neural architectures.
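As a concrete illustration (not the actual simulator), here is a minimal sketch in Python: a toy softened n-body integrator plus a generator that applies the mid-run nudge. All names, constants, and the integration scheme are illustrative assumptions.

```python
import numpy as np

def step_gravity(pos, vel, dt=0.01, G=1.0, eps=0.05):
    """One step of a toy softened 2-D n-body integrator (unit masses)."""
    diff = pos[None, :, :] - pos[:, None, :]            # diff[i, j] = pos[j] - pos[i]
    dist3 = (np.sum(diff ** 2, axis=-1) + eps ** 2) ** 1.5
    acc = G * np.sum(diff / dist3[:, :, None], axis=1)  # net pull on each planet
    vel = vel + acc * dt
    pos = pos + vel * dt
    return pos, vel

def generate_orbit(n_planets, n_steps, perturb, rng):
    """Return (trajectory, label): label 1 for 'Correct', 0 for 'Incorrect'."""
    pos = rng.uniform(-1.0, 1.0, size=(n_planets, 2))
    vel = rng.uniform(-0.1, 0.1, size=(n_planets, 2))
    nudge_at = rng.integers(n_steps) if perturb else -1  # when to break physics
    trajectory = np.empty((n_steps, n_planets, 2))
    for t in range(n_steps):
        pos, vel = step_gravity(pos, vel)
        if t == nudge_at:
            # Randomly nudge a few planets so they stop obeying gravity.
            chosen = rng.choice(n_planets, size=max(1, n_planets // 3), replace=False)
            vel[chosen] += rng.normal(0.0, 0.05, size=(len(chosen), 2))
        trajectory[t] = pos
    return trajectory, int(not perturb)
```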
Because we are generating our own data, we can control how much perturbation occurs. If a single ‘planet’ is nudged only slightly, we label that orbit ‘Slightly Incorrect’; when many ‘planets’ are nudged by large amounts, we label their orbits ‘Very Incorrect’. In this fashion, we form a continuum: ‘Perfectly Correct’ → ‘Wildly Incorrect’. This is a crucial capability, and one that cat photos and handwritten digits lack.
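Since the generator controls both how many planets are nudged and how hard, a graded label is just a function of those two knobs. A sketch, with purely illustrative scoring and thresholds:

```python
def severity_label(n_nudged, nudge_scale, n_planets):
    """Place an orbit on the 'Perfectly Correct' -> 'Wildly Incorrect' continuum.

    The scoring rule and cutoffs are illustrative assumptions; the point is
    only that severity is a knob we set at generation time, so the graded
    label comes for free.
    """
    if n_nudged == 0:
        return "Perfectly Correct"
    score = (n_nudged / n_planets) * nudge_scale
    if score < 0.01:
        return "Slightly Incorrect"
    if score < 0.05:
        return "Very Incorrect"
    return "Wildly Incorrect"
```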
Comparing Networks’ Sensitivity and Complexity
Suppose that you want to benchmark a new neural network against the existing state of the art. Using the orbital mechanics software, you feed each network billions of orbits and measure their respective accuracy. After equal training, both networks identify when an orbit is ‘Wildly Incorrect’: they both spot large perturbations. However, your new network is better at identifying the ‘Slightly Incorrect’ orbits! This demonstrates that your new network has greater sensitivity.
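Sensitivity, in this framing, is just accuracy stratified by severity. A sketch of the comparison, assuming each network is wrapped as a callable from trajectory to predicted label:

```python
from collections import defaultdict

def sensitivity_report(model, dataset):
    """Accuracy per severity bucket.

    `model` is any callable mapping a trajectory to a predicted label;
    `dataset` yields (trajectory, severity, true_label) triples.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for trajectory, severity, true_label in dataset:
        totals[severity] += 1
        hits[severity] += int(model(trajectory) == true_label)
    return {sev: hits[sev] / totals[sev] for sev in totals}
```

A more sensitive network matches its rival in the ‘Wildly Incorrect’ bucket but scores higher in the ‘Slightly Incorrect’ one.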
You can also train both networks on orbits with increasing numbers of ‘planets’. When there are only 3 or 4 ‘planets’, both networks perform well. Yet when the number of ‘planets’ grows to 7 or 8, your new network remains accurate while the other network begins to fail. This demonstrates that your new network handles greater complexity.
This lets us measure the value of network depth explicitly. If a 4-layer convolutional neural network handles 3 ‘planets’ well, but breaks down when given 4 ‘planets’, then that 4-layer network has ‘3-planet complexity’. To diagnose orbits of 4 or more ‘planets’, we would need to increase the network’s depth.
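The rating can be read off a sweep over planet counts. A sketch, assuming a hypothetical `train_and_score(depth, n_planets)` routine that trains a fresh network and returns its held-out accuracy:

```python
def complexity_rating(train_and_score, depth, threshold=0.95, max_planets=12):
    """Largest planet count the architecture handles at the given depth.

    `train_and_score(depth, n_planets)` is a hypothetical routine that
    trains a fresh network and returns held-out accuracy; `threshold`
    is an illustrative cutoff for what counts as handling it "well".
    """
    rating = 0
    for n_planets in range(1, max_planets + 1):
        if train_and_score(depth, n_planets) < threshold:
            break
        rating = n_planets
    return rating
```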
The Value of Deep Networks
By successively increasing the depth of a neural network and testing how many ‘planets’ it can handle, we obtain a metric of network complexity as a function of depth. We can answer the structural question: “If I double the network’s depth, can I double the number of planets?” Perhaps deeper networks handle complexity at an increasing rate: if a 4-layer network handles ‘3-planet complexity’, an 8-layer network might succeed at ‘7-planet complexity’. If that is the case, it is the strongest argument for building insanely deep networks.
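Either way, the measurement itself is mechanical: sweep the depth and record the rating. A sketch built on the `complexity_rating` helper above:

```python
def depth_scaling_curve(train_and_score, depths=(2, 4, 8, 16)):
    """Tabulate complexity rating against network depth.

    Superlinear growth in this table would favor very deep networks;
    sublinear growth would favor ensembles of shallower ones.
    """
    return {depth: complexity_rating(train_and_score, depth) for depth in depths}
```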
However, if deeper networks show diminishing returns (e.g. an 8-layer network only reaches ‘5-planet complexity’), that is an argument for letting many shallow networks operate in tandem. This is currently an unsolved problem. Cat photos will never be able to answer it. Generating data sets along a continuum of correctness and complexity offers us a path to the answer.