When the data used to train the AI is copyrighted, how do you make it open source? It’s a valid question.
It is actually possible to reveal the source of training data without showing the data itself. But I think this goes a bit deeper, since I’ll bet all of my teeth that the training data they’ve used is the 20 years of Facebook interactions and entries that they have just chilling on their servers. Literally 3+ billion people’s lives are the training data.
Yep. I never thought about it, but you’re absolutely right. That is Facebook’s “competitive advantage” that the other AI companies don’t have.
Although that’s only part of it. I’m sure they also use scraped web pages, novels, movie transcripts, college textbooks, research papers, newspapers, etc.