Neural Network Pruning

The cluster discusses techniques for pruning and optimizing neural networks, including the lottery ticket hypothesis, dropout, weight consolidation, and their effects on training efficiency, accuracy, and model size.
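As a concrete illustration of the kind of pruning this cluster keeps returning to, here is a minimal magnitude-pruning sketch using PyTorch's torch.nn.utils.prune; the architecture, layer sizes, and 80% sparsity level are arbitrary choices for the example, not taken from any particular comment.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small example network; the architecture is arbitrary.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Remove the 80% of weights with the smallest L1 magnitude in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.8)

# Make the pruning permanent (folds the mask into the weight tensor).
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")

# Check the resulting sparsity.
total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"sparsity: {zeros / total:.2%}")
```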

➡️ Stable 0.6x · AI & Machine Learning
Comments: 3,541
Years Active: 17
Top Authors: 5
Topic ID: #4265

Activity Over Time

2009: 2 · 2010: 2 · 2012: 9 · 2013: 14 · 2014: 61 · 2015: 82 · 2016: 180 · 2017: 244 · 2018: 201
2019: 236 · 2020: 228 · 2021: 228 · 2022: 230 · 2023: 652 · 2024: 629 · 2025: 514 · 2026: 29

Keywords

torch.nn, e.g, AI, NN, LLM, twitter.com, ML, network.html, arxiv.org, torch.rand, training, weights, neural, neural networks, network, networks, nn, model, accuracy, propagation

Sample Comments

bosdev Dec 16, 2015

You're describing the fundamental tradeoff with all neural networks.

MuffinFlavored Jan 5, 2023

Not up to date on a lot of "AI"/"ML" things; why isn't this significant for medium/large neural networks as well?

UncleOxidant Dec 7, 2023

No, it's an adjustment to model weights that is made during training. Given some input and some expected value, there will be some delta, and that is used to calculate the gradient; these networks are essentially being trained by gradient descent. As for the size of the data that has to be shared, it's going to depend on network size and what kind of representation you're using for the weights: probably bfloat16 these days, but we're certainly seeing a lot of 4-bit representations.
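The update being described can be written out directly. Here is a minimal sketch of one gradient-descent step in PyTorch; the toy layer, loss, learning rate, and data are invented for illustration.

```python
import torch
import torch.nn as nn

# A toy layer and a single (input, expected value) pair; all values are made up.
torch.manual_seed(0)
model = nn.Linear(4, 1)
x = torch.rand(1, 4)
target = torch.tensor([[1.0]])

# The delta between the prediction and the expected value gives the loss,
# whose gradient with respect to the weights drives the update.
loss = nn.functional.mse_loss(model(x), target)
loss.backward()

# One plain gradient-descent step on the weights.
lr = 0.1
with torch.no_grad():
    for p in model.parameters():
        p -= lr * p.grad
        p.grad.zero_()
```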

golemotron Jul 10, 2023

Interesting question: To what degree does additional training obscure lineage of a set of weights?

mac01021 Dec 26, 2022

Is training with dropout still a thing? Does that mess this up?
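For context on what dropout actually does during training versus inference, a minimal PyTorch sketch; the layer sizes and the 0.5 drop probability are arbitrary.

```python
import torch
import torch.nn as nn

# A small layer followed by dropout; sizes and drop probability are arbitrary.
torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Dropout(p=0.5))
x = torch.rand(1, 8)

net.train()   # dropout active: roughly half the activations are zeroed, the rest rescaled
print(net(x))

net.eval()    # dropout disabled at inference time
print(net(x))
```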

hedgehog Jun 24, 2023

Besides pruning (which is what you describe), you might find the "lottery ticket hypothesis" interesting. The idea is essentially that in a randomly initialized large network there is a subset of weights that are already close-ish for a given task; training helps tune that subset and suppress everything else, and knowing this you can actually get better results by pruning before training.
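A rough sketch of that recipe in its common "rewind to initialization" form: train the dense network, prune the smallest-magnitude weights, reset the survivors to their initial values, and retrain the sparse subnetwork. The toy model, data, 80% sparsity, and placeholder training loop below are all assumptions of the example, not something prescribed by the comment.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def train(model, data, epochs=1):
    # Placeholder training loop; a real experiment would use a proper dataset and schedule.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in data:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

# Toy data and model; shapes, sparsity, and hyperparameters are arbitrary.
torch.manual_seed(0)
data = [(torch.rand(16, 10), torch.rand(16, 1))]
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
initial_state = copy.deepcopy(model.state_dict())  # keep the original random init

# 1. Train the dense network.
train(model, data)

# 2. Prune the smallest-magnitude weights; the surviving mask is the candidate "winning ticket".
for m in model.modules():
    if isinstance(m, nn.Linear):
        prune.l1_unstructured(m, name="weight", amount=0.8)

# 3. Rewind the surviving weights to their initial values and retrain the sparse subnetwork.
with torch.no_grad():
    for name, m in model.named_modules():
        if isinstance(m, nn.Linear):
            m.weight_orig.copy_(initial_state[name + ".weight"])
train(model, data)
```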

ekelsen Mar 7, 2020

It's unclear this algorithm would be useful in practice. Training the weights will lead to a more accurate network for the same amount of work at inference time.

joelthelion Nov 22, 2023

Doesn't reducing the number of neurons drastically reduce memory requirements?
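One nuance behind that question: removing whole neurons shrinks the dense weight matrices directly, while unstructured pruning only saves memory if the zeros are actually stored in a sparse format. A small sketch comparing dense and COO-sparse storage of a roughly 90%-pruned float32 matrix; the size and sparsity level are made up.

```python
import torch

# A weight matrix with ~90% of entries zeroed, as after aggressive unstructured pruning.
torch.manual_seed(0)
w = torch.rand(1024, 1024)
w[torch.rand_like(w) < 0.9] = 0.0

dense_bytes = w.numel() * w.element_size()

# Unstructured zeros only save memory when stored sparsely; COO keeps nonzero values plus indices.
w_sparse = w.to_sparse().coalesce()
sparse_bytes = (w_sparse.values().numel() * w_sparse.values().element_size()
                + w_sparse.indices().numel() * w_sparse.indices().element_size())

print(f"dense:  {dense_bytes / 1e6:.2f} MB")   # ~4.19 MB for float32
print(f"sparse: {sparse_bytes / 1e6:.2f} MB")  # smaller, but index overhead eats into the savings
```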

supple-mints Jun 28, 2024

Is it harder to train the wider network or the deeper network, all else equal?

signa11 Dec 7, 2020

Yup. I remember reading about something called "optimal brain damage", which aims at pruning non-essential weights from the network for faster training, etc.
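For reference, Optimal Brain Damage (LeCun et al., 1990) ranks weights by a saliency of roughly 0.5 * H_kk * w_k^2, where H_kk is the corresponding diagonal entry of the Hessian. The sketch below substitutes the squared gradient for that diagonal, which is a crude approximation assumed for the example, as are the toy model and the 50% pruning level.

```python
import torch
import torch.nn as nn

# Toy model and batch; everything here is invented for illustration.
torch.manual_seed(0)
model = nn.Linear(10, 1)
x, y = torch.rand(32, 10), torch.rand(32, 1)

loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# OBD saliency is roughly 0.5 * H_kk * w_k^2, with H_kk the Hessian diagonal;
# the squared gradient below is only a crude stand-in for that diagonal.
w = model.weight
saliency = 0.5 * w.grad.pow(2) * w.detach().pow(2)

# Zero out the lowest-saliency half of the weights.
k = w.numel() // 2
threshold = saliency.flatten().kthvalue(k).values
with torch.no_grad():
    w[saliency <= threshold] = 0.0
```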