Learning Curve Theory
Determine to what extent power laws are universal or depend on the data distribution or loss function.
Description
Recently, a number of empirical “universal” scaling law papers have been published, most notably by OpenAI. However, a theoretical understanding of this phenomenon is largely lacking. This paper develops and theoretically analyses the simplest possible (toy) model that can exhibit n^-β learning curves for arbitrary power β > 0, and determines to what extent power laws are universal or depend on the data distribution or loss function.
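To make the idea of an n^-β learning curve concrete, here is a minimal sketch of one possible toy setting. It is an illustration under stated assumptions, not necessarily the model analysed in the paper. It assumes a learner that simply memorises every feature it has seen, features drawn i.i.d. with Zipf-like probabilities θ_i ∝ i^-(1+α), and 0-1 loss, so the expected error after n samples is the chance that a fresh feature was never seen. The names `zipf_probs`, `expected_error`, and `fitted_exponent` are hypothetical helpers introduced here.

```python
import numpy as np


def zipf_probs(alpha, num_features):
    """Assumed data distribution: theta_i proportional to i^-(1+alpha), truncated and normalised."""
    i = np.arange(1, num_features + 1, dtype=np.float64)
    weights = i ** -(1.0 + alpha)
    return weights / weights.sum()


def expected_error(n, theta):
    """Expected 0-1 error of a memorising learner after n i.i.d. samples.

    A test feature i is misclassified only if it never appeared in training,
    which happens with probability (1 - theta_i)^n.
    """
    return float(np.sum(theta * (1.0 - theta) ** n))


def fitted_exponent(alpha, num_features=200_000, ns=(100, 300, 1000, 3000, 10000)):
    """Estimate beta by fitting a straight line to log(error) versus log(n)."""
    theta = zipf_probs(alpha, num_features)
    ns = np.asarray(ns, dtype=np.float64)
    errs = np.array([expected_error(n, theta) for n in ns])
    slope, _intercept = np.polyfit(np.log(ns), np.log(errs), 1)
    return -slope


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0):
        print(f"alpha={alpha}: fitted beta ~ {fitted_exponent(alpha):.3f}")
```

In this sketch the fitted exponent comes out close to α/(1+α), so the power β is set by the tail of the assumed data distribution rather than being a single universal constant. Swapping in a different distribution or loss in `expected_error` is a quick way to probe how the curve's shape depends on those choices.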