If so, are these programs that claim to ‘poison’ the training datasets effective?

  • BB84@mander.xyz
    11 days ago

    No one feeds unfiltered LLM output straight back in, though. The whole idea of reinforcement learning is that you take some model output, check whether it is good, and push the model in that direction only if it is.

    As long as you believe that, e.g., it’s easier to verify a mathematical result than to come up with one, RL should work.
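    A minimal sketch of the generate, check, push loop described above, under loud assumptions: the "model" is a hypothetical softmax policy over candidate factors of N = 91 (not an LLM), the verifier is a single divisibility check, and the update is plain REINFORCE with reward 1 for verified outputs and 0 otherwise. Checking an answer is cheap while the policy starts out guessing (2 correct answers out of 89 candidates); the exact pass rate is computed before and after training.

```python
import math
import random

# Toy task (an illustrative assumption): propose a nontrivial factor of N.
# Checking a proposal is one modulo operation; finding one takes search.
N = 91
CANDIDATES = list(range(2, N))  # hypothetical "model outputs": 2..90


def verify(a: int) -> bool:
    """Cheap verifier: True only if a is a nontrivial factor of N."""
    return 1 < a < N and N % a == 0


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pass_rate(logits):
    """Exact probability that a sample from the policy passes the verifier."""
    return sum(p for p, a in zip(softmax(logits), CANDIDATES) if verify(a))


def train(steps=2000, lr=0.5, seed=0):
    """REINFORCE with reward 1 for verified outputs, 0 for rejected ones."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)  # start from a uniform policy
    start = pass_rate(logits)
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(CANDIDATES)), weights=probs)[0]
        reward = 1.0 if verify(CANDIDATES[i]) else 0.0
        # d(log p_i)/d(logit_j) = 1[j == i] - p_j, scaled by the reward,
        # so only verified outputs move the policy ("push if it is good").
        for j in range(len(logits)):
            logits[j] += lr * reward * ((1.0 if j == i else 0.0) - probs[j])
    return start, pass_rate(logits)


if __name__ == "__main__":
    before, after = train()
    print(f"pass rate before: {before:.3f}, after: {after:.3f}")
```

    Because rejected samples get zero reward, they never move the policy; that is the sense in which RL is not the same as feeding raw output back in. The names `verify`, `train`, and `N` are hypothetical and not taken from any library.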

    • athatet@lemmy.zip
      11 days ago

      It will still, over time, give fewer and fewer good results to be fed back into it.

      • BB84@mander.xyz
        11 days ago

        Reinforcement learning makes the model better over time, so why should there be fewer and fewer good results?

        If you’re talking about the rate of improvement going down, then yes, of course. That’s bound to happen (unless you have an actual intelligence explosion, but in that case you won’t know what “good results” even mean anyway).