Neural Network Pruning
The cluster discusses techniques for pruning and optimizing neural networks, including the lottery ticket hypothesis, dropout, weight consolidation, and their effects on training efficiency, accuracy, and model size.
Sample Comments
You're describing the fundamental tradeoff with all neural networks.
I'm not up to date on a lot of "AI"/"ML" things; why isn't this significant for medium and large neural networks as well?
No, it's an adjustment to model weights that is made during training. Given some input and some expected value there will be some delta, and that is used to calculate the gradient; these networks are essentially trained by gradient descent. As for the size of the data that has to be shared, it depends on the network size and on what kind of representation you're using for the weights, probably bfloat16 these days, though we're certainly seeing a lot of 4-bit representations too.
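A minimal sketch of that update loop, assuming plain SGD on a toy linear model with squared error (the model, names, and learning rate here are illustrative, not anything from the thread):

```python
import numpy as np

# Toy linear model y = x @ W, trained by gradient descent on squared error.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 1)).astype(np.float32)        # model weights
x = rng.normal(size=(8, 4)).astype(np.float32)        # some input batch
y_true = rng.normal(size=(8, 1)).astype(np.float32)   # expected values

lr = 0.1
for _ in range(100):
    y_pred = x @ W
    delta = y_pred - y_true          # delta between output and expectation
    grad = x.T @ delta / len(x)      # gradient of mean squared error w.r.t. W
    W -= lr * grad                   # gradient descent step

# What has to be shared is (some representation of) W or its updates;
# its size depends on the weight count and the dtype, e.g. bfloat16 or 4-bit.
print(W.dtype, W.nbytes, "bytes for", W.size, "weights")
```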
Interesting question: To what degree does additional training obscure lineage of a set of weights?
Is training with dropout still a thing? Does that mess this up?
Besides pruning (which is what you describe), you might find the "lottery ticket hypothesis" interesting. The idea is essentially that a randomly initialized large network already contains a subset of weights that is close-ish for a given task; training tunes that subset and suppresses everything else, and knowing this, you can actually get better results by pruning before training.
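A rough sketch of that idea, using magnitude pruning of a random initialization as a stand-in for the "winning ticket" mask (the actual lottery ticket procedure finds the mask by training, pruning, and rewinding, so this is only illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def prune_mask(weights, keep_fraction=0.2):
    """Keep only the largest-magnitude weights; zero out the rest."""
    threshold = np.quantile(np.abs(weights), 1.0 - keep_fraction)
    return (np.abs(weights) >= threshold).astype(weights.dtype)

# Randomly initialized layer: the hypothesis says a sparse subnetwork
# inside it is already close-ish for the task.
W0 = rng.normal(scale=0.1, size=(256, 128))
mask = prune_mask(W0, keep_fraction=0.2)

# Training then only tunes the surviving weights; the mask is re-applied
# after every update so pruned weights stay at zero.
W = W0 * mask

def apply_update(W, grad, lr=0.01):
    return (W - lr * grad) * mask

W = apply_update(W, grad=rng.normal(scale=0.01, size=W.shape))
print("kept", int(mask.sum()), "of", mask.size, "weights")
```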
It's unclear whether this algorithm would be useful in practice. Training the weights will lead to a more accurate network for the same amount of work at inference time.
Doesn't reducing the number of neurons drastically reduce memory requirements?
Is it harder to train the wider network or the deeper network, all else being equal?
Yup. I remember reading about something called "Optimal Brain Damage", which aims at pruning non-essential weights from the network to make it smaller and faster.
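For reference, Optimal Brain Damage ranks weights by a saliency roughly of the form 0.5 * h_kk * w_k^2, where h_kk is a diagonal Hessian term. A toy sketch below uses squared gradients as a crude stand-in for that diagonal (an assumption for illustration, not the paper's exact procedure):

```python
import numpy as np

def obd_saliency(weights, grads):
    """Approximate OBD saliency 0.5 * h_kk * w_k^2, using squared
    gradients as a rough proxy for the diagonal Hessian term h_kk."""
    h_diag = grads ** 2
    return 0.5 * h_diag * weights ** 2

def prune_lowest_saliency(weights, grads, prune_fraction=0.5):
    """Zero out the fraction of weights with the lowest saliency."""
    s = obd_saliency(weights, grads)
    cutoff = np.quantile(s, prune_fraction)
    return np.where(s <= cutoff, 0.0, weights)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
g = rng.normal(size=(64, 32))   # gradients from a backward pass (placeholder)
W_pruned = prune_lowest_saliency(W, g, prune_fraction=0.5)
print("zeroed", int((W_pruned == 0).sum()), "of", W_pruned.size, "weights")
```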