• Aatube@kbin.melroy.org
    2 days ago

    Did you use the -Zero model? It doesn’t have the “cold-start data before RL”, which is what prevents language mixing.