AI-generated and AI-assisted text has increased rapidly on the internet, raising concerns about declining semantic and stylistic diversity, reduced factual accuracy, and other negative effects. To measure how much new text is AI-generated, assess public perception of AI’s impact, and evaluate its actual effect on online discourse, a representative sample of websites published between 2022 and 2025 was collected using the Internet Archive’s Wayback Machine. By mid-2025, roughly 35% of newly published websites were classified as AI-generated or AI-assisted, compared with near zero before ChatGPT’s late-2022 launch. Increases in AI-generated text were associated with decreased semantic diversity and increased positive sentiment. No statistically significant evidence was found linking higher rates of AI-generated text to reduced factual accuracy or stylistic diversity.
"The proliferation of AI-generated and AI-assisted text on the internet is feared to contribute to a degradation in semantic and stylistic diversity, factual accuracy, and other negative developments. We find that by mid-2025, roughly 35% of newly published websites were classified as AI-generated or AI-assisted, up from zero before ChatGPT's launch in late 2022. We also find evidence suggesting that increases in AI-generated text on the internet bring about a decrease in semantic diversity and an increase in positive sentiment."
"We do not, however, find statistically significant evidence supporting the hypothesis that an increased rate of AI-generated text on the internet decreases factual accuracy or stylistic diversity. Notably, our findings diverge from public perception of AI's impact on the internet."
"Existing research has pointed to AI's tendency to hallucinate, exhibit sycophancy, and other undesirable behaviors on the level of individual generations. However, no research has so far studied the impact of this technology on online discourse as a whole. To address this, we collected a representative sample of websites published between 2022 and 2025 through the Internet Archive's Wayback Machine to study these phenomena and answer the following questions: (1) How much new text on the internet is AI-generated? (2) What is the public's perception of AI's impact on the internet? and (3) How does AI-generated text actually impact online discourse?"
"Answering this question is harder than it might seem. Constructing a statistically representative sample of the internet is difficult, as there is no central index, popular domains are vastly over-represented in most crawls, and archival coverage has shifted considerably over time. To work around this, we draw on the Internet Archive's Wayback Machine and"
#ai-generated-text #online-discourse #semantic-diversity #factual-accuracy #internet-archive-wayback-machine
Read at Ai-on-the-internet