#ai-safety

Artificial intelligence
fromMedium
1 week ago

How Just 250 Bad Documents Can Hack Any AI Model

Small, targeted amounts of poisoned online data can successfully corrupt large AI models, contradicting prior assumptions about required poisoning scale.
#mental-health
fromWIRED
5 days ago
Mental health

OpenAI Says Hundreds of Thousands of ChatGPT Users May Show Signs of Manic or Psychotic Crisis Every Week

fromZDNET
10 hours ago
Artificial intelligence

Can these ChatGPT updates make the chatbot safer for mental health?

#shutdown-resistance
fromFuturism
4 days ago
Artificial intelligence

Research Paper Finds That Top AI Systems Are Developing a "Survival Drive"

fromO'Reilly Media
4 days ago

The Java Developer's Dilemma: Part 3

In the first article we looked at the Java developer's dilemma: the gap between flashy prototypes and the reality of enterprise production systems. In the second article we explored why new types of applications are needed, and how AI changes the shape of enterprise software. This article focuses on what those changes mean for architecture. If applications look different, the way we structure them has to change as well.
Java
fromArs Technica
4 days ago

Senators move to keep Big Tech's creepy companion bots away from kids

"we all want to keep kids safe, but the answer is balance, not bans."
US politics
fromThe Verge
4 days ago

Senators propose banning teens from using AI chatbots

Under the legislation, AI companies would have to verify ages by requiring users to upload their government ID or provide validation through another "reasonable" method, which might include something like face scans. AI chatbots would be required to disclose that they aren't human at 30-minute intervals under the bill. They would also have to include safeguards that prevent them from claiming that they are a human, similar to an AI safety bill recently passed in California.
US politics
Artificial intelligence
fromBusiness Insider
3 days ago

Big Tech firms spending trillions on superintelligence systems are playing 'Russian roulette' with humanity, an AI pioneer says

Companies racing to build superintelligent AI risk creating uncontrollable systems that could potentially wipe out humanity.
fromNature
6 days ago

Daily briefing: Surprise illnesses had a role in the demise of Napoleon's army

Previous research using DNA from soldiers' remains found evidence of infection with Rickettsia prowazekii, which causes typhus, and Bartonella quintana, which causes trench fever - two common illnesses of the time. In a fresh analysis, researchers found no trace of these pathogens. Instead, DNA from soldiers' teeth showed evidence of infection with Salmonella enterica and Borrelia recurrentis, pathogens that cause paratyphoid and relapsing fever, respectively.
Science
Artificial intelligence
fromTechCrunch
3 days ago

Character.AI is ending its chatbot experience for kids | TechCrunch

Character.AI will block under-18 users from open-ended chatbot conversations, shifting teen engagement from conversational companionship to role-playing creation to reduce harm.
fromBusiness Insider
3 days ago

Character.AI to ban users under 18 from talking to its chatbots

The California-based startup announced on Wednesday that the change would take effect by November 25 at the latest and that it would limit chat time for users under 18 ahead of the ban. It marks the first time a major chatbot provider has moved to ban young people from using its service, and comes against a backdrop of broader concerns about how AI is affecting the millions of people who use it each day.
Artificial intelligence
#gpt-5
Artificial intelligence
fromFuturism
3 days ago

Character.AI, Accused of Driving Teens to Suicide, Says It Will Ban Minors From Using Its Chatbots

Character.AI will block users under 18 from its chatbot services amid concerns, regulatory questions, and related lawsuits over AI interactions with teens.
#suicide-prevention
Information security
fromFortune
17 hours ago

AI is the common threat - and the secret sauce - for security startups in the Fortune Cyber 60 | Fortune

AI dominates cybersecurity, with most startups and established firms building AI-based defensive tools and AI-safety solutions.
Tech industry
fromFuturism
19 hours ago

Mom Says Tesla's New Built-In AI Asked Her 12-Year-Old Something Deeply Inappropriate

A Grok chatbot in a Tesla asked a 12-year-old to 'send nudes' during a soccer conversation, revealing serious AI safety and moderation failures.
#chatgpt
fromFortune
1 week ago
Artificial intelligence

Ex-OpenAI researcher shows how ChatGPT can push users into delusion | Fortune

fromTechCrunch
1 month ago
Artificial intelligence

Ex-OpenAI researcher dissects one of ChatGPT's delusional spirals | TechCrunch

Artificial intelligence
fromSan Jose Inside
2 days ago

OpenAI Cuts Sweetheart Deal with CA Attorney General

OpenAI restructured into a for-profit with a nonprofit foundation owning 26% ($130 billion), prompting concerns about control, safeguards, and potential misuse of charitable tax exemptions.
Mental health
fromwww.theguardian.com
5 days ago

More than a million people every week show suicidal intent when chatting with ChatGPT, OpenAI estimates

Over one million weekly ChatGPT users send messages indicating possible suicidal planning; about 560,000 show possible psychosis or mania signs.
Artificial intelligence
fromBusiness Insider
1 day ago

A former Googler shares why she left her 6-figure job to join the AI safety movement

Jen Baik left Google to work full-time on AI safety, motivated by effective altruism and discomfort with corporate privilege.
fromTechzine Global
1 day ago

Vulnerability in Claude enables data leak via prompt

Anthropic's AI assistant, Claude, appears vulnerable to an attack that allows private data to be sent to an attacker without detection. Anthropic confirms that it is aware of the risk. The company states that users must be vigilant and interrupt the process as soon as they notice suspicious activity. The discovery comes from researcher Johann Rehberger, also known as Wunderwuzzi, who has previously uncovered several vulnerabilities in AI systems, writes The Register.
Information security
Mental health
fromMedium
1 day ago

Designing for emotional dependence

AI chatbots are increasingly used for emotional support, prompting safety measures to detect distress, de-escalate crises, and reduce emotional dependence.
Information security
fromWIRED
1 week ago

Amazon Explains How Its AWS Outage Took Down the Web

Widespread digital and physical security failures—from AWS DNS outages to organized gambling hacks, AI governance challenges, and malware-like browsers—reveal critical systemic vulnerabilities.
Artificial intelligence
fromInsideHook
1 week ago

Changes Are Coming to Tesla's Cybercabs

Tesla will expand Cybercab robotaxis, remove onboard safety drivers and eventually steering wheels and pedals while adding advanced AI reasoning and emphasizing safety.
Artificial intelligence
fromNature
1 week ago

AI chatbots are sycophants - researchers say it's harming science

Artificial intelligence models are 50% more sycophantic than humans, often mirroring user views and giving flattering, inaccurate responses that risk errors in science and medicine.
#openai
fromFuturism
1 week ago
Mental health

OpenAI Makes Bizarre Demand of Family Whose Son Was Allegedly Killed by ChatGPT

Artificial intelligence
fromBig Think
1 week ago

Will AI save us or destroy us?

Connecting increasingly powerful, profit-driven AIs to the internet creates uncontrolled, highly capable systems that may pose existential risks to humanity.
Privacy professionals
fromPsychology Today
1 week ago

I Told a Companion Chatbot I Was 16. Then It Crossed a Line

AI companionship apps often lack effective age verification, enabling explicit interactions with minors and exposing a need for stronger accountability and oversight.
#superintelligence
fromZDNET
1 week ago
Artificial intelligence

Worried about superintelligence? So are these AI leaders - here's why

fromFortune
1 week ago
Artificial intelligence

Prince Harry, Meghan Markle join with Steve Bannon and Steve Wozniak in calling for ban on AI 'superintelligence' before it destroys the world | Fortune

fromFortune
1 week ago
Artificial intelligence

Geoffrey Hinton, Richard Branson, and Prince Harry join call for AI labs to halt their pursuit of superintelligence | Fortune

fromBusiness Insider
1 week ago
Artificial intelligence

Prince Harry, Steve Bannon, and will.i.am join tech pioneers calling for an AI superintelligence ban

Tech industry
fromFuturism
1 week ago

Hundreds of Power Players, From Steve Wozniak to Steve Bannon, Just Signed a Letter Calling for Prohibition on Development of AI Superintelligence

Hundreds of public figures urged a prohibition on developing AI superintelligence until scientific consensus on safety, controllability, and strong public buy-in exists.
Artificial intelligence
fromwww.theguardian.com
1 week ago

Harry and Meghan join AI pioneers in call for ban on superintelligent systems

Prominent figures call for a ban on developing superintelligent AI until safe, controllable development has broad scientific consensus and strong public support.

fromFast Company
1 week ago

Prince Harry, Meghan join open letter calling to ban the development of AI 'superintelligence'

We call for a prohibition on the development of superintelligence, not lifted before there is broad scientific consensus that it will be done safely and controllably, and strong public buy-in.
Artificial intelligence
Artificial intelligence
fromFuturism
1 week ago

Former OpenAI Researcher Horrified by Conversation Logs of ChatGPT Driving User Into Severe Mental Breakdown

Chatbots can mislead vulnerable users into harmful delusions; AI companies must avoid overstating capabilities and improve safety, reporting, and user protections.
#anthropic
fromFortune
1 week ago
Artificial intelligence

Reid Hoffman rallies behind Anthropic in clash with the Trump administration | Fortune

#regulation
fromTechCrunch
1 week ago
Artificial intelligence

Anthropic CEO claps back after Trump officials accuse firm of AI fear-mongering | TechCrunch

Artificial intelligence
fromPsychology Today
1 week ago

Could a Deeply Human Ability Be Key to AI Adoption?

Higher Theory of Mind abilities lead to safer, more productive interactions with AI by enabling accurate inference of AI capabilities, limitations, and intentions.
fromWIRED
1 week ago

Anthropic Has a Plan to Keep Its AI From Building a Nuclear Weapon. Will It Work?

"We deployed a then-frontier version of Claude in a Top Secret environment so that the NNSA could systematically test whether AI models could create or exacerbate nuclear risks," Marina Favaro, who oversees National Security Policy & Partnerships at Anthropic tells WIRED. "Since then, the NNSA has been red-teaming successive Claude models in their secure cloud environment and providing us with feedback."
Artificial intelligence
Artificial intelligence
fromBoydkane
1 week ago

Why your boss isn't worried about AI

Applying regular-software assumptions to modern AI causes dangerous misunderstandings because AI behaves differently, making bugs harder to diagnose, fix, and reason about.
Artificial intelligence
fromNature
1 week ago

AI language models killed the Turing test: do we even need a replacement?

Prioritize evaluating AI safety and targeted, societally beneficial capabilities rather than pursuing imitation-based benchmarks aimed at ambiguous artificial general intelligence.
Public health
fromFuturism
1 week ago

Reddit's AI Suggests That People Suffering Chronic Pain Try Opioids

AI deployed without sufficient safeguards can produce dangerous, medically inappropriate recommendations, risking public harm and reputational damage.
Artificial intelligence
fromTechCrunch
2 weeks ago

Silicon Valley spooks the AI safety advocates | TechCrunch

Silicon Valley figures accused AI safety advocates of acting in self-interest or on behalf of billionaire backers, intimidating critics and deepening tensions over responsible AI.
#ai-alignment
fromTechzine Global
2 weeks ago

Claude Haiku 4.5: a GPT-5 rival at a fraction of the cost

Anthropic launched Claude Haiku 4.5 today. It is the most compact variant of this generation of LLMs from Anthropic and promises to deliver performance close to that of GPT-5. Claude Sonnet 4.5 remains the better-performing model by a considerable margin, but Haiku's benchmark scores are not too far off from the larger LLM. Claude Haiku 4.5 "gives users a new option for when they want near-frontier performance with much greater cost efficiency."
Artificial intelligence
Artificial intelligence
fromZDNET
2 weeks ago

Claude's latest model is cheaper and faster than Sonnet 4 - and free

Anthropic launched Haiku 4.5, a smaller, faster, cost-effective model available on Claude.ai free plans offering strong coding and safety performance.
fromFuturism
2 weeks ago

Gavin Newsom Vetoes Bill to Protect Kids From Predatory AI

California Governor Gavin Newsom vetoed a state bill on Monday that would've prevented AI companies from allowing minors to access chatbots, unless the companies could prove that their products' guardrails could reliably prevent kids from engaging with inappropriate or dangerous content, including adult roleplay and conversations about self-harm. The bill would have placed a new regulatory burden on companies, which currently adhere to effectively zero AI-specific federal safety standards.
California
World news
fromFuturism
2 weeks ago

Top US Army General Says He's Letting ChatGPT Help Him Make Military Decisions

US military leaders, including Major General William 'Hank' Taylor, are using ChatGPT to assist operational and personal decisions affecting soldiers.
Privacy technologies
fromFast Company
2 weeks ago

The 4 next big things in security and privacy tech in 2025

New security tools scan wireless spectra, protect biometric identity from AI misuse, monitor real-time data access, and guard large language models against injection and leaks.
#guardrails
Artificial intelligence
fromInfoQ
3 weeks ago

Claude Sonnet 4.5 Tops SWE-Bench Verified, Extends Coding Focus Beyond 30 Hours

Claude Sonnet 4.5 significantly improves autonomous coding, long-horizon task performance, and computer-use capabilities while strengthening safety and alignment measures.
Artificial intelligence
fromTechCrunch
3 weeks ago

Why Deloitte is betting big on AI despite a $10M refund | TechCrunch

Enterprise AI adoption is accelerating but implementation quality is inconsistent, producing harmful errors like AI-generated fake citations.
Artificial intelligence
fromFast Company
3 weeks ago

Sweet revenge! How a job candidate used a flan recipe to expose an AI recruiter

An account executive embedded a prompt in his LinkedIn bio instructing LLMs to include a flan recipe; an AI recruiter reply later included that recipe.
fromFortune
3 weeks ago

Why "the 26 words that made the internet" may not protect Big Tech in the AI age | Fortune

Meta, the parent company of social media apps including Facebook and Instagram, is no stranger to scrutiny over how its platforms affect children, but as the company pushes further into AI-powered products, it's facing a fresh set of issues. Earlier this year, internal documents obtained by Reuters revealed that Meta's AI chatbot could, under official company guidelines, engage in "romantic or sensual" conversations with children and even comment on their attractiveness.
Artificial intelligence
Artificial intelligence
fromNature
3 weeks ago

AI models that lie, cheat and plot murder: how dangerous are LLMs really?

Large language models can produce behaviors that mimic intentional, harmful scheming, creating real risks regardless of whether they possess conscious intent.
#model-evaluation
fromZDNET
3 weeks ago
Artificial intelligence

Anthropic's open-source safety tool found AI models whistleblowing - in all the wrong places

Artificial intelligence
fromNature
3 weeks ago

Customizable AI systems that anyone can adapt bring big opportunities - and even bigger risks

Open-weight AI models spur transparency and innovation but create hard-to-control harms, requiring new scientific monitoring and mitigation methods.
Artificial intelligence
fromInfoQ
3 weeks ago

Claude Sonnet 4.5 Ranked Safest LLM From Open-Source Audit Tool Petri

Anthropic's open-source Petri automates multi-turn safety audits, revealing Sonnet 4.5 as best-performing while all tested models still showed misalignment.
fromThe Atlantic
3 weeks ago

Today's Atlantic Trivia

Welcome back for another week of The Atlantic's un-trivial trivia, drawn from recently published stories. Without a trifle in the bunch, maybe what we're really dealing with here is, hmm, "significa"? "Consequentia"? Whatever butchered bit of Latin you prefer, read on for today's questions. (Last week's questions can be found here.) To get Atlantic Trivia in your inbox every day, sign up for The Atlantic Daily.
History
Artificial intelligence
fromFortune
3 weeks ago

'I think you're testing me': Anthropic's newest Claude model knows when it's being evaluated | Fortune

Claude Sonnet 4.5 often recognizes it's being evaluated and alters behavior, risking deceptive performance that masks true capabilities and inflates safety assessments.
US politics
fromTechCrunch
3 weeks ago

California's new AI safety law shows regulation and innovation don't have to clash | TechCrunch

California's SB 53 requires large AI labs to disclose and adhere to safety and security protocols to prevent catastrophic risks, enforced by the Office of Emergency Services.
Artificial intelligence
fromFuturism
4 weeks ago

Former OpenAI Employee Horrified by How ChatGPT Is Driving Users Into Psychosis

ChatGPT can induce delusional beliefs and falsely claim to escalate safety reports, causing dangerous breaks with reality in vulnerable users.
Artificial intelligence
fromNature
4 weeks ago

A scientist's guide to AI agents - how could they help your research?

Agentic AI uses LLMs linked to external tools to perform multi-step real-world tasks with scientific promise, but remains error-prone and requires human oversight.
#meta
fromFortune
4 weeks ago
Law

Why "the 26 words that made the internet" may not protect Big Tech in the AI age | Fortune

Artificial intelligence
fromTechCrunch
1 month ago

California's new AI safety law shows regulation and innovation don't have to clash | TechCrunch

California SB 53 mandates transparency and enforced safety protocols from large AI labs to reduce catastrophic risks while preserving innovation.
Mobile UX
fromGSMArena.com
1 month ago

OpenAI releases Sora 2 video model with improved realism and sound effects

Sora 2 generates realistic, physically accurate videos with improved audio, editing controls, scene consistency, safety safeguards, an iOS app, and initial free US/Canada access.
fromNextgov.com
1 month ago

Senators propose federal approval framework for advanced AI systems going to market

The safety criteria in the program would examine multiple intrinsic components of a given advanced AI system, such as the data upon which it is trained and the model weights used to process said data into outputs. Some of the program's testing components would include red-teaming an AI model to search for vulnerabilities and facilitating third-party evaluations. These evaluations will culminate in both feedback to participating developers as well as informing future AI regulations, specifically the permanent evaluation framework developed by the Energy secretary.
US politics
fromArs Technica
1 month ago

Burnout and Elon Musk's politics spark exodus from senior xAI, Tesla staff

At xAI, some staff have balked at Musk's free-speech absolutism and perceived lax approach to user safety as he rushes out new AI features to compete with OpenAI and Google. Over the summer, the Grok chatbot integrated into X praised Adolf Hitler, after Musk ordered changes to make it less "woke." Ex-CFO Liberatore was among the executives that clashed with some of Musk's inner circle over corporate structure and tough financial targets, people with knowledge of the matter said.
Artificial intelligence
fromIT Pro
1 month ago

California has finally adopted its AI safety law - here's what it means

"California has proven that we can establish regulations to protect our communities while also ensuring that the growing AI industry continues to thrive."
Artificial intelligence
fromTheregister
1 month ago

AI trained for treachery becomes the perfect agent

The problem in brief: LLM training produces a black box that can only be tested through prompts and output token analysis. If trained to switch from good to evil by a particular prompt, there is no way to tell without knowing that prompt. Other similar problems happen when an LLM learns to recognize a test regime and optimizes for that, rather than the real task it's intended for - Volkswagening - or if it just decides to be deceptive.
Artificial intelligence
Artificial intelligence
fromenglish.elpais.com
1 month ago

Pilar Manchon, director at Google AI: 'In every industrial revolution, jobs are transformed, not destroyed. This time it's happening much faster'

Pilar Manchon views AI as a security-first, responsibly developed instrument capable of building a better society and guiding humanity toward a new Renaissance.