• BetaDoggo_
    2 years ago

    It could be used to train a reward model, much like what is currently done in RLHF.
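
For anyone unfamiliar with what a reward model is, here is a rough sketch of the usual idea: you collect pairs where one response was preferred over another, then fit a scorer so the preferred one gets the higher score. Everything below is an assumption made for illustration (the linear scorer, the two-number feature vectors, the names `reward`, `pairwise_loss`, and `train`); real RLHF reward models are typically neural networks on top of a language model, not a toy like this.

```python
import math

def reward(weights, features):
    # Hypothetical linear reward model: score is the dot product w . x
    return sum(w * x for w, x in zip(weights, features))

def pairwise_loss(weights, chosen, rejected):
    # Bradley-Terry style preference loss: -log sigmoid(r(chosen) - r(rejected))
    margin = reward(weights, chosen) - reward(weights, rejected)
    return math.log1p(math.exp(-margin))

def train(pairs, dim, lr=0.1, epochs=200):
    # Plain stochastic gradient descent over (chosen, rejected) feature pairs
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d(loss)/d(margin) = -1 / (1 + exp(margin))
            grad_scale = -1.0 / (1.0 + math.exp(margin))
            for i in range(dim):
                weights[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return weights

# Made-up preference data: each pair is (preferred features, rejected features)
pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.9, 0.2], [0.1, 0.8])]
weights = train(pairs, dim=2)
```

After training, `reward(weights, chosen)` should come out higher than `reward(weights, rejected)` for the pairs above, and that learned score is what the RL step would then optimize against.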