• buddascrayon@lemmy.world · ↑3 · 17 hours ago

    > When the data used to train the AI is copyrighted, how do you make it open source? It’s a valid question.

    It is actually possible to disclose where training data came from without publishing the data itself. But I think the issue runs deeper: I’d bet all of my teeth that the training data they’ve used is literally the 20 years of Facebook interactions and entries they have just chilling on their servers. Literally 3+ billion people’s lives are the training data.

    • kava@lemmy.world · ↑1 · 10 hours ago

      > Literally 3+ billion people’s lives are the training data.

      Yep. I never thought about it, but you’re absolutely right. That is Facebook’s “competitive advantage” that the other AI companies don’t have.

      That’s only part of it, though. I’m sure they also use web scraping, novels, movie transcripts, college textbooks, research papers, newspapers, etc.