Yeah, but there are also some interesting nuances. I’ve seen smaller models on HuggingFace that, if I interpret them correctly, were fine-tuned without human labels on the outputs of larger models. So it seems there might be some validity to doing things this way, so long as the other model is larger.
What you’re referencing is distillation. Anthropic even has an article on distillation “attacks” (as if they have some divine right to the data behind their models) that goes over it a bit.
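For the curious: the core of classic (soft-label) distillation is just training the student to match the teacher's temperature-softened output distribution. Here's a minimal sketch of that loss, assuming NumPy; the function names and the choice of KL(teacher || student) follow the common Hinton-style formulation, not any one model's actual training code.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature > 1 "softens" the distribution, exposing the teacher's
    # relative preferences among non-top tokens/classes.
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the student's softened distribution to the
    teacher's -- the standard soft-label distillation objective."""
    p = softmax(teacher_logits, temperature)  # teacher "soft labels"
    q = softmax(student_logits, temperature)  # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

In practice the student minimizes this loss over the teacher's outputs on a large corpus; no human labels are needed, which is why it reads as "unsupervised" even though the teacher is effectively supervising.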