Neural Network Checklist
[stat_ml]
Training neural networks is hard. Plan to explore many options. Take systematic notes. Here are some things to try when it doesn’t work at first.
- Make it deterministic:
  - `np.random.seed` for numpy
  - `random.seed` for python
  - deterministic functions for pytorch
  - `deterministic=True` for the pytorch lightning `Trainer`
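
  A minimal sketch combining these, assuming pytorch and pytorch lightning are the frameworks in use (the seed value is arbitrary):

  ```python
  import random
  import numpy as np
  import torch
  import pytorch_lightning as pl

  SEED = 0
  random.seed(SEED)                         # python's built-in RNG
  np.random.seed(SEED)                      # numpy's global RNG
  torch.manual_seed(SEED)                   # pytorch, CPU and CUDA
  torch.use_deterministic_algorithms(True)  # ask pytorch to error on non-deterministic ops
  pl.seed_everything(SEED)                  # lightning convenience wrapper for the three seeds above

  trainer = pl.Trainer(deterministic=True)
  ```
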
- Avoid natively non-ordered collections, like sets or dict keys, whose iteration order you should not rely on: `[i for i in set([3,1,2])]`
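
  One illustration of my own, with made-up strings: set iteration order can change from one python process to the next unless `PYTHONHASHSEED` is pinned, so sort before iterating if the order matters.

  ```python
  names = {"relu", "gelu", "tanh"}

  # Iteration order of a set of strings can differ across runs,
  # because string hashes are salted per process (see PYTHONHASHSEED).
  print(list(names))

  # Sorting gives a stable, reproducible order.
  print(sorted(names))
  ```
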
- Make it transparent. Log:
  - the training objective
  - the validation objective
  - the norm of the gradient
  - the various components of your objective function (e.g. L1 penalty)
  - the number of iterations
  - the walltime per iteration and total
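
  A sketch of what that logging could look like in a plain pytorch loop; the model, loaders, loss, and L1 weight are placeholders:

  ```python
  import time
  import torch

  def train(model, loss_fn, train_loader, val_loader, optimizer, n_epochs=10, l1_weight=1e-4):
      t_start = time.time()
      for epoch in range(n_epochs):
          t_epoch = time.time()
          for x, y in train_loader:
              optimizer.zero_grad()
              data_loss = loss_fn(model(x), y)                             # training objective
              l1_penalty = sum(p.abs().sum() for p in model.parameters())  # one component, logged separately
              (data_loss + l1_weight * l1_penalty).backward()
              grads = [p.grad.detach().norm() for p in model.parameters() if p.grad is not None]
              grad_norm = torch.stack(grads).norm().item()                 # overall gradient norm
              optimizer.step()
          with torch.no_grad():
              val_loss = sum(loss_fn(model(x), y).item() for x, y in val_loader) / len(val_loader)
          print(f"epoch {epoch} | train {data_loss.item():.4f} | val {val_loss:.4f} "
                f"| L1 {l1_penalty.item():.1f} | grad norm {grad_norm:.3f} "
                f"| epoch walltime {time.time() - t_epoch:.1f}s | total {time.time() - t_start:.1f}s")
  ```
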
- Make it more stable:
  - Scale the input features to have mean 0, variance 1
  - Use a tried-and-true initialization
    - “Xavier” for feed-forward networks with sigmoidal activation (explainer, original article pdf)
    - “He” for feed-forward networks with ReLU activation (explainer, article)
    - The identity matrix, lolol, for recurrent networks (arxiv)
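
    A sketch of applying these via `torch.nn.init`; the layer sizes are made up:

    ```python
    import torch

    sigmoid_layer = torch.nn.Linear(128, 64)
    torch.nn.init.xavier_uniform_(sigmoid_layer.weight)                      # "Xavier"/Glorot

    relu_layer = torch.nn.Linear(128, 64)
    torch.nn.init.kaiming_uniform_(relu_layer.weight, nonlinearity="relu")   # "He"

    recurrent_weight = torch.empty(64, 64)
    torch.nn.init.eye_(recurrent_weight)                                     # identity, for recurrent weights
    ```
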
  - Use a ResNet architecture
    - The usual: `y <- f(Wx + b)`
    - The ResNet: `y <- f(Wx + b) + x` (arXiv)
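
    A minimal sketch of a residual block in pytorch; the dimensions are made up, and real ResNets also add normalization and projections when shapes change:

    ```python
    import torch

    class ResidualBlock(torch.nn.Module):
        def __init__(self, dim=64):
            super().__init__()
            self.linear = torch.nn.Linear(dim, dim)
            self.activation = torch.nn.ReLU()

        def forward(self, x):
            # y = f(Wx + b) + x: the skip connection keeps gradients flowing
            return self.activation(self.linear(x)) + x

    y = ResidualBlock()(torch.randn(8, 64))
    ```
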
- Make it easier:
  - Memorize 1 or 5 data points instead of fitting everything
  - Learn a single layer or a linear layer in place of your full architecture
  - Simulate data with no noise and try to fit that
  - On simulated data, cheat as much as necessary to identify problems. Example: initialize the weights to their true values.
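
  A sketch of the first trick with stand-in data: if a small network cannot drive the loss to near zero on 5 points, the problem is upstream of model capacity.

  ```python
  import torch

  model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
  x_tiny, y_tiny = torch.randn(5, 10), torch.randn(5, 1)    # 5 data points, stand-ins for your real batch
  optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

  for step in range(2000):
      optimizer.zero_grad()
      loss = torch.nn.functional.mse_loss(model(x_tiny), y_tiny)
      loss.backward()
      optimizer.step()

  print(loss.item())  # should be near zero if the model and training loop are wired up correctly
  ```
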
- Optimization tricks in order of increasing desperation:
  - Max out the batch size
  - Use L-BFGS with Wolfe line search for fast debugging on small problems, but don’t expect it to scale well
  - Mess around with the learning rate
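
  pytorch's `LBFGS` has a strong Wolfe line search built in; here is a sketch of the closure-based usage it requires, with a stand-in model and data:

  ```python
  import torch

  model = torch.nn.Linear(10, 1)
  x, y = torch.randn(100, 10), torch.randn(100, 1)
  optimizer = torch.optim.LBFGS(model.parameters(), line_search_fn="strong_wolfe")

  def closure():
      # L-BFGS may re-evaluate the objective several times per step
      optimizer.zero_grad()
      loss = torch.nn.functional.mse_loss(model(x), y)
      loss.backward()
      return loss

  for step in range(20):
      loss = optimizer.step(closure)
      print(step, loss.item())
  ```
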
More resources:
- Eventually, once my current project is closer to publication, I will share notes from about 40 experiments that took me from baby steps up to the scale of real data.
- Stats SE thread
- Twitter thread by DSaience
- If you find something else helpful and you think it ought to be added here, please do contact me or tweet @ekernf01.
Written on February 9, 2023