Add AdamW as an optimizer
Julian Moore
Docs and educational material might benefit from showing that the classic U-shaped validation curve can become less prominent, or disappear entirely, when weight decay is well tuned. Users who expect to see "overfitting" (before generalisation/interpolation) might otherwise get confused.
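For reference, here is a minimal sketch of what a single AdamW step could look like, assuming the usual decoupled weight decay formulation. The function name `adamw_step`, its parameters, and the default values are hypothetical and chosen only for illustration; this is not a proposal for the actual API.

```python
import math


def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """Apply one AdamW update to a list of weights (hypothetical helper).

    w, g, m, v are equal-length lists: weights, gradients, and the running
    first and second moment estimates. t is the 1-based step count.
    Returns the updated (w, m, v) as new lists.
    """
    new_w, new_m, new_v = [], [], []
    for wi, gi, mi, vi in zip(w, g, m, v):
        # Exponential moving averages of the gradient and its square.
        mi = beta1 * mi + (1 - beta1) * gi
        vi = beta2 * vi + (1 - beta2) * gi * gi

        # Bias correction for the zero-initialised moments.
        m_hat = mi / (1 - beta1 ** t)
        v_hat = vi / (1 - beta2 ** t)

        # Decoupled weight decay: shrink the weight directly instead of
        # adding an L2 term to the gradient.
        wi = wi * (1 - lr * weight_decay)

        # Standard Adam step on the decayed weight.
        wi = wi - lr * m_hat / (math.sqrt(v_hat) + eps)

        new_w.append(wi)
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v
```

Because the decay is applied to the weight itself, it keeps shrinking the weights even when the gradient is zero. That steady pull towards small weights may be part of why the validation curve looks flatter than users expect.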