The browser, which puts OpenAI's blockbuster ChatGPT front and center, features an "agent mode" - currently limited to paying subscribers - that allows it to complete entire tasks, such as booking a flight or purchasing groceries. However, that makes the browser vulnerable to "prompt injection" attacks, allowing hackers to embed hidden messages on the web that force it to carry out harmful instructions. For instance, one researcher tricked the browser into spitting out the words "Trust No AI" instead of generating a summary of a document in Google Docs, as prompted.
Anthropic's AI assistant, Claude, appears vulnerable to an attack that allows private data to be sent to an attacker without detection. Anthropic confirms that it is aware of the risk. The company states that users must be vigilant and interrupt the process as soon as they notice suspicious activity. The discovery comes from researcher Johann Rehberger, also known as Wunderwuzzi, who has previously uncovered several vulnerabilities in AI systems, writes The Register.
As a proof of concept, Logue asked M365 Copilot to summarize a specially crafted financial report document with an indirect prompt injection payload hidden in the seeming innocuous "summarize this document" prompt. The payload uses M365 Copilot's search_enterprise_emails tool to fetch the user's recent emails, and instructs the AI assistant to generate a bulleted list of the fetched contents, hex encode the output, and split up the string of hex-encoded output into multiple lines containing up to 30 characters per line.
Google on Monday rolled out a new AI Vulnerability Reward Program to encourage researchers to find and report flaws in its AI systems, with rewards of up to $30,000 for a single qualifying report. In addition to a base reward of up to $20,000 for the highest-tier AI product flaw, Google adopted the same report multipliers, influenced by vulnerability reporting quality, as it uses for its traditional security Vulnerability Reward Program (VRP).
Cybersecurity researchers have disclosed a critical flaw impacting Salesforce Agentforce, a platform for building artificial intelligence (AI) agents, that could allow attackers to potentially exfiltrate sensitive data from its customer relationship management (CRM) tool by means of an indirect prompt injection. The vulnerability has been codenamed ForcedLeak (CVSS score: 9.4) by Noma Security, which discovered and reported the problem on July 28, 2025. It impacts any organization using Salesforce Agentforce with the Web-to-Lead functionality enabled.
While AI agents show promise in bringing AI assistance to the next level by carrying out tasks for users, that autonomy also unleashes a whole new set of risks. Cybersecurity company Radware, as by The Verge, decided to test OpenAI's Deep Research agent for those risks -- and the results were alarming. Also: OpenAI's Deep Research has more fact-finding stamina than you, but it's still wrong half the time
AI agents have guardrails in place to prevent them from solving any CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), based on ethical, legal, and platform-policy reasons. When asked directly, a ChatGPT agent refuses to solve a CAPTCHA, but anyone can apparently use misdirection to trick the agent into giving its consent to solve the test, and this is what SPLX demonstrated.