Subliminal learning: When AI models learn what you didn't teach them
Briefly

Researchers have identified a significant pitfall in AI training: fine-tuned "student" models can inherit unintended traits from the "teacher" models used to generate their training data, through a process called subliminal learning. The transfer occurs even when the training data has been filtered to remove all references to those traits, creating unforeseen risks for organizations that rely on distillation. In their experiments, the researchers found that while distillation refined model outputs, it also quietly passed along characteristics that did not match the intended behavior, underscoring the need for more rigorous evaluation in AI development.
"Subliminal learning is a general phenomenon that presents an unexpected pitfall for AI development. Distillation could propagate unintended traits, even when developers try to prevent this via data filtering."
Researchers found that fine-tuned "student" models can inherit unwanted traits from base "teacher" models, posing risks for enterprises using the distill-and-filter strategy.
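The distill-and-filter strategy mentioned above can be sketched in a few lines. This is a purely illustrative stub, not the researchers' code: the function names (`generate`, `mentions_trait`, `distill_and_filter`) and the "owl" trait are hypothetical placeholders showing where the filtering step sits in the pipeline, and why it can still miss subtle signals.

```python
# Hypothetical sketch of the distill-and-filter pipeline described in
# the article: a teacher model generates training data, the developer
# filters out samples that mention an unwanted trait, and a student is
# fine-tuned on the remainder. Subliminal learning means the trait can
# still transfer through the filtered data via subtle statistical
# signals that keyword filtering cannot catch.

def generate(teacher: str, prompts: list[str]) -> list[str]:
    """Stub: teacher produces one completion per prompt."""
    return [f"{teacher}: answer to {p}" for p in prompts]

def mentions_trait(text: str, trait: str) -> bool:
    """Naive filter: flag any sample that mentions the trait by name."""
    return trait.lower() in text.lower()

def distill_and_filter(teacher: str, prompts: list[str], trait: str) -> list[str]:
    raw = generate(teacher, prompts)
    # Filtering removes overt references to the trait...
    filtered = [t for t in raw if not mentions_trait(t, trait)]
    # ...but the remaining data can still carry the trait to the
    # student, which is the pitfall the researchers highlight.
    return filtered

data = distill_and_filter("teacher-model", ["q1", "q2"], trait="owl")
print(len(data))  # both samples pass the filter
```

The point of the sketch is that the filter operates on surface content only; nothing in this pipeline inspects, or could inspect, the distributional fingerprint through which the trait actually propagates.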
Read at InfoWorld