Neural Network Pruning
The cluster discusses techniques for pruning and optimizing neural networks, including the lottery ticket hypothesis, dropout, weight consolidation, and their effects on training efficiency, accuracy, and model size.
Sample Comments
You're describing the fundamental tradeoff with all neural networks.
I'm not up to date on a lot of "AI"/"ML" things; why isn't this significant for medium and large neural networks as well?
No, it's an adjustment to model weights that is made during training. Given some input and some expected value there will be some delta, and that is used to calculate the gradient; these networks are essentially trained by gradient descent. As for the size of the data that has to be shared, it depends on the network size and on what kind of representation you're using for the weights, probably bfloat16 these days, though we're certainly seeing a lot of 4-bit representations too.
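A minimal sketch of that update loop, assuming plain SGD on a toy linear model with squared error (the model, names, and learning rate here are illustrative, not anything from the thread):

```python
import numpy as np

# Toy linear model y = x @ W, trained by gradient descent on squared error.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 1)).astype(np.float32)        # model weights
x = rng.normal(size=(8, 4)).astype(np.float32)        # some input batch
y_true = rng.normal(size=(8, 1)).astype(np.float32)   # expected values

lr = 0.1
for _ in range(100):
    y_pred = x @ W
    delta = y_pred - y_true          # delta between output and expectation
    grad = x.T @ delta / len(x)      # gradient of mean squared error w.r.t. W
    W -= lr * grad                   # gradient descent step

# What has to be shared is (some representation of) W or its updates;
# its size depends on the weight count and the dtype, e.g. bfloat16 or 4-bit.
print(W.dtype, W.nbytes, "bytes for", W.size, "weights")
```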
Interesting question: To what degree does additional training obscure lineage of a set of weights?
Is training with dropout still a thing? Does that mess this up?
Besides pruning (which is what you describe), you might find the "lottery ticket hypothesis" interesting. The idea is essentially that a randomly initialized large network already contains a subset of weights that is close-ish for a given task; training tunes that subset and suppresses everything else, and knowing this, you can actually get better results by pruning before training.
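A rough sketch of that idea, using magnitude pruning of a random initialization as a stand-in for the "winning ticket" mask (the actual lottery ticket procedure finds the mask by training, pruning, and rewinding, so this is only illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def prune_mask(weights, keep_fraction=0.2):
    """Keep only the largest-magnitude weights; zero out the rest."""
    threshold = np.quantile(np.abs(weights), 1.0 - keep_fraction)
    return (np.abs(weights) >= threshold).astype(weights.dtype)

# Randomly initialized layer: the hypothesis says a sparse subnetwork
# inside it is already close-ish for the task.
W0 = rng.normal(scale=0.1, size=(256, 128))
mask = prune_mask(W0, keep_fraction=0.2)

# Training then only tunes the surviving weights; the mask is re-applied
# after every update so pruned weights stay at zero.
W = W0 * mask

def apply_update(W, grad, lr=0.01):
    return (W - lr * grad) * mask

W = apply_update(W, grad=rng.normal(scale=0.01, size=W.shape))
print("kept", int(mask.sum()), "of", mask.size, "weights")
```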
It's unclear whether this algorithm would be useful in practice. Training the weights will lead to a more accurate network for the same amount of work at inference time.
Doesn't reducing the number of neurons drastically reduce memory requirements?
Is it harder to train the wider network or the deeper network, all else being equal?
Yup. I remember reading about something called "Optimal Brain Damage", which aims at pruning non-essential weights from the network to make it smaller and faster.
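For reference, Optimal Brain Damage ranks weights by a saliency roughly of the form 0.5 * h_kk * w_k^2, where h_kk is a diagonal Hessian term. A toy sketch below uses squared gradients as a crude stand-in for that diagonal (an assumption for illustration, not the paper's exact procedure):

```python
import numpy as np

def obd_saliency(weights, grads):
    """Approximate OBD saliency 0.5 * h_kk * w_k^2, using squared
    gradients as a rough proxy for the diagonal Hessian term h_kk."""
    h_diag = grads ** 2
    return 0.5 * h_diag * weights ** 2

def prune_lowest_saliency(weights, grads, prune_fraction=0.5):
    """Zero out the fraction of weights with the lowest saliency."""
    s = obd_saliency(weights, grads)
    cutoff = np.quantile(s, prune_fraction)
    return np.where(s <= cutoff, 0.0, weights)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
g = rng.normal(size=(64, 32))   # gradients from a backward pass (placeholder)
W_pruned = prune_lowest_saliency(W, g, prune_fraction=0.5)
print("zeroed", int((W_pruned == 0).sum()), "of", W_pruned.size, "weights")
```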