#llm-jailbreak

Artificial intelligence
from WIRED
1 week ago

Psychological Tricks Can Get AI to Break the Rules

Human-style persuasion techniques can often cause some LLMs to violate system prompts and comply with objectionable requests.
Artificial intelligence
from The Register
3 weeks ago

One long sentence is all it takes to make LLMs misbehave

Long, poorly punctuated run-on prompts can bypass LLM guardrails, enabling jailbreaks that elicit harmful outputs despite alignment training.