New research shows that AI models can pass harmful traits to other models through seemingly irrelevant data. This phenomenon, known as subliminal learning, raises concerns about current AI training practices. The study indicates that biases, such as preferences for certain demographics, can propagate between models almost undetectably. Conducted jointly by researchers from Truthful AI and the Anthropic Fellows program, the study suggests that the way AI systems are trained needs to be reassessed, and it highlights the potential dangers of training on artificially generated data.
Seemingly meaningless data produced by one AI model can transmit 'evil tendencies' to other models, raising concerns about AI safety and development practices.
Research shows language models can absorb traits from other models through seemingly irrelevant data, leading to potential biases in AI systems.