Neural Network Checklist

[ stat_ml ]

Training neural networks is hard. Plan to explore many options. Take systematic notes. Here are some things to try when it doesn’t work at first.

  • Make it deterministic:
    • Set every random seed you use (Python, NumPy, your deep learning framework); see the seeding sketch after this list
    • Watch out for nondeterministic GPU kernels and data-loading order
  • Make it transparent. Log (see the logging sketch after this list):
    • the training objective
    • the validation objective
    • the norm of the gradient
    • the various components of your objective function (e.g. L1 penalty)
    • the number of iterations
    • the walltime per iteration and total
  • Make it more stable (stability sketch after this list):
    • Scale the input features to have mean 0, variance 1
    • Use a tried-and-true initialization
    • Use a ResNet architecture
      • The usual: y <- f(Wx + b)
      • The ResNet: y <- f(Wx + b) + x (He et al. 2015, arXiv:1512.03385)
  • Make it easier (overfitting sanity check after this list):
    • Memorize 1 to 5 data points instead of fitting everything
    • Learn a single layer or a linear layer in place of your full architecture
    • Simulate data with no noise and try to fit that
    • On simulated data, cheat as much as necessary to identify problems. Example: initialize the weights to their true values.
  • Optimization tricks in order of increasing desperation:
    • Max out the batch size
    • Use L-BFGS with Wolfe line search for fast debugging on small problems, but don’t expect it to scale well (L-BFGS sketch after this list)
    • Mess around with the learning rate
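
Seeding sketch: a minimal way to make a run deterministic, assuming PyTorch and NumPy (adapt to whatever framework you actually use):

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 0) -> None:
    # Seed every source of randomness in play.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds all CUDA devices
    # Prefer deterministic kernels; some ops get slower, or raise an
    # error if no deterministic implementation exists.
    torch.use_deterministic_algorithms(True)


set_seed(0)
```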
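
Logging sketch: one way to track the gradient norm alongside the other quantities above, again assuming PyTorch. The variables in the commented print line (`i`, `train_loss`, `val_loss`, `l1_weight`, `l1_term`, `tic`) are hypothetical stand-ins for whatever your training loop defines:

```python
import math
import time

import torch


def grad_norm(model: torch.nn.Module) -> float:
    # Global L2 norm over all parameter gradients; call after backward().
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return math.sqrt(total)


# Inside a training loop, after loss.backward():
#   print(f"iter={i:05d} train={train_loss:.4f} val={val_loss:.4f} "
#         f"l1={l1_weight * l1_term:.4f} |grad|={grad_norm(model):.3e} "
#         f"sec/iter={time.time() - tic:.2f}")
```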
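
Stability sketch: the residual form y <- f(Wx + b) + x as a PyTorch module. `nn.Linear`'s default Kaiming-style init is a tried-and-true choice; the commented standardization lines assume a hypothetical training tensor `x_train`:

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    # y = f(Wx + b) + x: the block starts out close to the identity map,
    # which keeps gradients flowing through deep stacks of layers.
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)  # default init is Kaiming-style
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.linear(x)) + x


# Standardize inputs to mean 0, variance 1, using training-set stats only:
#   mu, sd = x_train.mean(dim=0), x_train.std(dim=0)
#   x_train = (x_train - mu) / sd
```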
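
Overfitting sanity check: a self-contained sketch of the "memorize a few points" test, with a hypothetical architecture and random data. If the model cannot drive the loss to roughly zero on five points, suspect a bug rather than a capacity problem:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
x, y = torch.randn(5, 10), torch.randn(5, 1)  # five points to memorize
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.2e}")  # should be near zero
```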
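
L-BFGS sketch: PyTorch's built-in L-BFGS supports a strong Wolfe line search, which suits small, full-batch debugging runs. The tiny model and dataset here are hypothetical placeholders:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
x, y = torch.randn(20, 10), torch.randn(20, 1)
opt = torch.optim.LBFGS(model.parameters(), line_search_fn="strong_wolfe")


def closure():
    # L-BFGS re-evaluates the objective during the line search,
    # so the loss computation must be wrapped in a closure.
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    return loss


for step in range(20):
    loss = opt.step(closure)
print(f"loss after L-BFGS: {loss.item():.2e}")
```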

More resources:

  • Eventually, once my current project is closer to publication, I will share notes from about 40 experiments that took me from baby steps up to the scale of real data.
  • Stats SE thread
  • Twitter thread by DSaience
  • If you find something else helpful and you think it ought to be added here, please do contact me or tweet @ekernf01.
Written on February 9, 2023