• @General_Effort
    28 months ago

    Yes, it’s BS, like most of the AI takes here.

    The kernel of truth is scaling laws:

    [T]he Chinchilla scaling law for training Transformer language models suggests that when given an increased budget (in FLOPs), to achieve compute-optimal, the number of model parameters (N) and the number of tokens for training the model (D) should scale in approximately equal proportions.
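    To make that concrete (my own back-of-the-envelope, not part of the quote): using the common approximation that training compute is C ≈ 6·N·D, and Chinchilla’s empirical finding of roughly 20 training tokens per parameter, both N and D come out proportional to √C. A rough sketch, assuming those two rules of thumb:

    ```python
    import math

    def chinchilla_optimal(flops: float, tokens_per_param: float = 20.0):
        """Rough compute-optimal split assuming C ≈ 6*N*D and D ≈ 20*N
        (Chinchilla's empirical ratio). Both N and D then scale as sqrt(C)."""
        n_params = math.sqrt(flops / (6.0 * tokens_per_param))
        n_tokens = tokens_per_param * n_params
        return n_params, n_tokens

    # Chinchilla's own budget (~5.76e23 FLOPs) lands near 70B params / 1.4T tokens.
    print(chinchilla_optimal(5.76e23))
    ```

    Plugging in Chinchilla’s own budget recovers its published configuration (≈70B parameters trained on ≈1.4T tokens), which is the “equal proportions” point: double the compute and you grow parameters and tokens by about √2 each.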