Accountability Project · Est. 2025

AI Prophecy Index

A sourced accountability ledger for public AI forecasts: what was predicted, what actually happened, and how the record is being scored.

Predictions: 56
Got it right: 12
Got it wrong: 0
Still unfolding: 17
Too early to tell: 27
Resolved hit rate: 100%
37 real events logged · reviewed 2026-04-30

The forecasters

Carl Shulman

22 tracked

Policy researcher and independent scholar tracking AI trajectories, intelligence explosions, and economic transformation.

Leopold Aschenbrenner

22 tracked

Former OpenAI researcher and author of Situational Awareness, focused on near-term AGI, compute, and national security.

Ajeya Cotra

12 tracked

METR researcher and former Open Philanthropy technical AI safety lead, best known for biological anchors and shorter transformative-AI timeline updates.

Through: All years · Seen: 56 · Open: 44 · Hit rate: 100%

How to read the ledger

Events and predictions are grouped by year.

Use the filters to narrow the record, open a claim for the scoring evidence, then compare it with the sourced events in the same year.

Status

Right and wrong are used only when the public record is specific enough to score.

Hit rate

The rate counts resolved predictions only: confirmed claims versus incorrect claims.

Events

Sourced events are context for comparison. They are not scored as predictions.
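
As a minimal sketch of the hit-rate rule described above, the following Python computes the rate from the status counts shown at the top of the page. The function and variable names are hypothetical, and returning None when nothing has resolved is an assumption; the ledger does not say how that case is displayed.

```python
from typing import Optional


def resolved_hit_rate(right: int, wrong: int) -> Optional[float]:
    """Share of resolved predictions scored right; None if nothing is resolved."""
    resolved = right + wrong
    if resolved == 0:
        return None  # assumption: no rate is reported without resolved claims
    return right / resolved


# Status counts taken from the summary at the top of the page.
counts = {"right": 12, "wrong": 0, "unfolding": 17, "too_early": 27}

total = sum(counts.values())
open_claims = counts["unfolding"] + counts["too_early"]
rate = resolved_hit_rate(counts["right"], counts["wrong"])

print(f"{total} predictions, {open_claims} open, hit rate {rate:.0%}")
```

With the current counts the rate rests on 12 resolved claims; the 44 open predictions do not enter the calculation.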

Showing 56 predictions and 37 events

2026

11 predictions · 6 events

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

A rigorous METR randomized controlled trial (July 2025, 16 experienced developers, 246 tasks) found AI tools made experienced developers 19% slower, not faster — even as developers self-reported a ~20% improvement. Macro-level labor productivity gains in high-skill services were estimated at ~0.8% in 2025. AI now writes 27–41% of production code, and shows stronger gains in specialized scientific domains (drug discovery, materials science), but Shulman's 50–200% blanket range has not been empirically demonstrated at scale.

Dwarkesh Podcast, Part 1 · Confidence: high · Reviewed Apr 2026

How we scored it

Azure posted $75B+ in full-year FY2025 revenue (up 34%, with Q4 growing 39% YoY). Google Cloud surpassed a $70B annual run rate by end of 2025 (Q4 up 48% to $17.7B). AWS hit $128.7B in full-year 2025 revenue. Meta's AI-powered advertising suite exceeded a $60B annual run rate. Taken together, Azure, Google Cloud, and AWS surpassed $270B in annual cloud and AI-driven revenue (roughly $75B + $70B + $128.7B), far exceeding $100B. The prediction is confirmed on a combined basis, even though individual companies don't break out pure 'AI revenue' separately and the total mixes fiscal-year revenue with run-rate figures.

Why it matters — Validated commercial returns at this scale would remove the last serious argument for slowing the compute buildout, accelerating the race dynamic Aschenbrenner describes and reducing the leverage any regulator or treaty could exert.
Situational Awareness, Ch. IIIa · Confidence: high · Reviewed Apr 2026

Assessment

The industry broadly pursued agentic deployment of existing frontier models throughout 2024–2026 — Claude, GPT-4o, Gemini in agentic frameworks with tool use, multi-step planning, and real-world action. This runs counter to Cotra's recommendation. No consensus has emerged from the safety community that agentic deployment was the riskier path, but no major lab has been able to demonstrate that it is safe either. April 2026 produced a sharp case-in-point: Anthropic's own Claude Mythos Preview, deployed agentically in pre-release testing, autonomously identified thousands of zero-day vulnerabilities across major operating systems and constructed a working end-to-end exploit chain for a 17-year-old FreeBSD NFS RCE (CVE-2026-4747). Anthropic's response — gating Mythos behind a defensive-only Project Glasswing research preview rather than general release — is itself an implicit confirmation that the risk Cotra warned about is real enough to override commercial pressure. By mid-April 2026, the downstream implication became clearer: Fortune reported that over 99% of Mythos-discovered vulnerabilities remain unpatched, with organizations unable to remediate at the speed AI agents can discover. The US government response through late April illustrated the same dynamic at the institutional level — NSA was already using Mythos while CISA was locked out, the White House simultaneously drafted an executive action to broaden agency access and publicly opposed Anthropic's plan to grow the Glasswing program from ~40 to ~70 organizations, and unauthorized users reportedly accessed the model through private channels (a literal containment failure). Each is a direct instantiation of Cotra's core warning: agentic deployment generates real-world consequences faster than safety, governance, or even basic coordination can respond.

Assessment

Epoch AI data shows training compute costs for frontier models doubling approximately every 5–8 months, closely bracketing Shulman's 6-month claim. Hyperscaler AI capex grew roughly 84% from 2024 to 2025 (~$224B to ~$413B combined), a ~14-month doubling time. Q1 2026 earnings (April 29, 2026) revised 2026 hyperscaler capex up to a combined $700B+ (Microsoft $190B, Amazon ~$200B, Alphabet $180–190B, Meta $125–145B), implying ~70% YoY growth on the aggregate capex measure (~16-month doubling). Algorithmic efficiency is improving at ~0.5 OOM/year (Epoch AI). The 6-month figure aligns well with per-model training run scaling; aggregate industry capex doubles roughly every 14–16 months, which is directionally consistent but slower than predicted.

Dwarkesh Podcast, Part 1 · Confidence: high · Reviewed Apr 2026
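
The conversion from year-over-year growth to a doubling time in the assessment above can be sketched as follows. This assumes steady exponential growth within each year, which the source does not claim; the capex figures are the ones quoted in the assessment, and the function name is hypothetical.

```python
import math


def doubling_time_months(growth_factor: float) -> float:
    """Months needed to double, assuming constant exponential growth.

    growth_factor is the year-over-year multiple (1.84 means +84%).
    """
    if growth_factor <= 1.0:
        raise ValueError("growth_factor must be greater than 1")
    return 12.0 * math.log(2.0) / math.log(growth_factor)


# Combined hyperscaler capex in $B, as quoted in the assessment above.
capex = {2024: 224.0, 2025: 413.0, 2026: 700.0}

growth_2025 = capex[2025] / capex[2024]
growth_2026 = capex[2026] / capex[2025]

months_2025 = doubling_time_months(growth_2025)
months_2026 = doubling_time_months(growth_2026)

print(f"2024->2025: +{growth_2025 - 1:.0%}, doubling in {months_2025:.1f} months")
print(f"2025->2026: +{growth_2026 - 1:.0%}, doubling in {months_2026:.1f} months")
```

On these inputs, 84% growth corresponds to a doubling time of roughly 14 months and 70% growth to roughly 16 months. Because the 2026 figure is quoted as "$700B+", the second number is best read as an upper bound on the doubling time.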

How we scored it

As of early 2026, US national security institutions are deeply embedded in frontier AI. The Pentagon (CDAO) holds contracts with Anthropic, OpenAI, xAI, and Google, and is deploying GenAI.mil to its entire 3-million-person workforce. The DoD directed that new commercial AI models be deployed within 30 days of public release. The NSA launched a dedicated AI Security Center. The ODNI created a Chief AI Officer role and an IC-wide AI strategy. The 2026 Annual Threat Assessment (March 2026, DNI Gabbard) formally elevates AI as a top strategic threat. The FY2026 NDAA mandates strict AI supply chain controls for defense procurement. April 2026 furnished the most concrete confirmation yet: the rollout of Anthropic's Claude Mythos became a multi-agency national-security event in real time. NSA was cleared to use Mythos despite a Pentagon supply-chain risk designation against Anthropic (Axios, Apr 19); CISA was deliberately excluded (Axios, Apr 21); the Commerce Department's Center for AI Standards and Innovation tested it; the White House drafted an executive action to give agencies workaround access (Bloomberg, Apr 16) and then publicly opposed Anthropic's plan to expand the program (Bloomberg/WSJ, Apr 29–30); and senior administration officials (Susie Wiles, Scott Bessent) met directly with Dario Amodei. The prediction is unambiguously confirmed — the question now is whether the involvement is coherent enough to count as policy.

Why it matters — Deep national security involvement would likely bring classification regimes, export controls, and security clearance requirements to frontier AI research, fundamentally changing the open-publication culture that has driven academic AI progress.
Situational Awareness, Ch. IV / Dwarkesh Podcast · Confidence: medium · Reviewed Apr 2026

How we scored it

As of April 2026, no major AI lab (Anthropic, OpenAI, DeepMind, Meta, xAI) has published a quantitative resource-allocation commitment specifying what fraction of AI-generated labour or compute will be dedicated to alignment work when transformative AI capabilities are reached. Safety frameworks at these labs describe priorities qualitatively, not with binding numerical targets. OpenAI's April 2026 'Industrial Policy for the Intelligence Age' blueprint references 'containment playbooks for AI systems that can't be recalled' — but this is qualitative language from the policy team, not a binding compute or labour allocation target. The prediction's core claim (no quantitative commitments) holds.

Why it matters — Without pre-committed quantitative targets, market pressures and competitive dynamics will determine how resources are allocated during the most critical period in AI development — almost certainly favouring capability scaling over alignment work.
80,000 Hours Podcast, Episode 235 · Confidence: high · Reviewed Apr 2026

How we scored it

Frontier models as of April 2026 decisively exceed graduate-level performance on major academic benchmarks. Anthropic's Claude Mythos Preview (Project Glasswing, April 2026) scored 94.6% on GPQA Diamond, a set of graduate-level science questions on which PhD holders average roughly 65%. OpenAI's o3 scored 87.7% and Gemini 2.5 Pro reached 84.0% on the same benchmark. OpenAI released GPT-5.5 on April 23, 2026 with a published system card extending the 5.x line further into agentic territory. The bar exam and most professional licensing benchmarks have been surpassed by multiple frontier models. GPT-5.x generation models are approaching 90% on MMLU-Pro, prompting discussion of benchmark saturation. The threshold of 'basically smarter than most college graduates' is unambiguously met.

Why it matters — Entry-level knowledge work hiring — the traditional on-ramp for new graduates into professional careers — would contract sharply, creating a generational gap where the credential that once guaranteed employment no longer signals scarcity of any marketable skill.
Situational Awareness, Ch. I / Dwarkesh Podcast · Confidence: high · Reviewed Apr 2026

Assessment

As of April 2026, AI systems have demonstrated strong performance on specific Pokémon tasks (strategy, type matching) but sustained open-ended gameplay at the level of a 10-year-old player — requiring long-horizon planning, inventory management, and adaptive strategy — has not been publicly demonstrated. Multiple teams are actively working on this benchmark.

Assessment

As of April 2026, there is no public evidence of successful complete model weight exfiltration of a frontier system to the CCP. What has been confirmed: distillation attacks on Anthropic's Claude (24,000 fraudulent accounts, 16 million exchanges), the Linwei Ding insider theft conviction (architectural secrets), and GPU smuggling prosecutions. Lab security has been substantially upgraded since Aschenbrenner's warnings. Forecasts suggest China 'stays significantly behind US internal frontier capabilities' through early 2027. The prediction's deadline window remains open through mid-to-late 2026.

Situational Awareness, Ch. IIId / Aschenbrenner on X · Confidence: medium · Reviewed Apr 2026

Assessment

AI systems achieved gold-medal performance on the 2025 IMO: Google DeepMind's Gemini Deep Think and an experimental OpenAI model each solved five of the six problems, and neither solved the hardest one (Problem 6). The 2026 IMO problems have not yet been released. Cotra forecasts ~80% probability that AI will solve the hardest problem on the 2026 paper, a higher bar than the 2025 result, which did not include the hardest problem.

Assessment

Cotra's original January 2026 prediction was a median of 24-hour task horizons by year-end, with an 80th percentile of ~40 hours. Claude Opus 4.6 hit ~12 hours on METR's benchmark in early 2026, nearly 10 months ahead of schedule. She revised the forecast upward in March 2026 to >100 hours by December 31, 2026. April 2026 brought several further capability data points: Anthropic shipped Claude Sonnet 4.6 (1M-token context beta, broad gains in agent planning, coding, and computer use) and disclosed Claude Mythos Preview's 93.9% on SWE-bench Verified and 82.0% on Terminal-Bench 2.0. On April 10, METR added GPT-5.4 to its time-horizon tracker: point estimate 5.7 hours under standard methodology (95% CI: 3–13.5 hrs), rising to 13 hours if reward hacks are permitted (95% CI: 5–74 hrs). OpenAI released GPT-5.5 on April 23, 2026 (system card published) with claimed agentic-coding gains; METR has not yet posted GPT-5.5 time-horizon numbers, but the model has been distributed broadly via the new AWS Bedrock partnership (Apr 28), which should speed up independent evaluation. The METR doubling time since 2023 has accelerated to ~130 days (4.3 months). All data points are directionally consistent with the >100-hour target; no model has yet reached that bar.
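
To make the time-horizon arithmetic above concrete, here is a rough sketch that asks how long a fixed ~130-day doubling time would take to carry a given starting horizon to 100 hours. Holding the doubling time constant is an assumption, METR's actual methodology is not reproduced, and the function name is hypothetical; the input figures are the ones quoted in the assessment.

```python
import math


def days_to_reach(current_hours: float, target_hours: float,
                  doubling_days: float) -> float:
    """Days until the horizon reaches target_hours at a fixed doubling time."""
    return doubling_days * math.log2(target_hours / current_hours)


DOUBLING_DAYS = 130.0   # doubling time quoted in the assessment
TARGET_HOURS = 100.0    # Cotra's revised year-end 2026 forecast

# Starting points quoted in the assessment (hours).
starting_points = {
    "Claude Opus 4.6 (~12 h)": 12.0,
    "GPT-5.4, standard methodology (5.7 h)": 5.7,
}

for label, hours in starting_points.items():
    days = days_to_reach(hours, TARGET_HOURS, DOUBLING_DAYS)
    print(f"{label}: about {days:.0f} days to {TARGET_HOURS:.0f} h")
```

Under these assumptions, going from ~12 hours to 100 hours takes a little over three doublings, so whether the target is met by year-end depends on when the starting measurement was taken and on whether the doubling time keeps shortening.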

2027

8 predictions

Assessment

Dario Amodei and Anthropic's formal OSTP submission predicted 'powerful AI systems broadly better than humans at almost all things' by late 2026 or early 2027. However, the original AI 2027 project authors (Kokotajlo, Lifland) revised their personal medians to around 2030, citing slower-than-expected progress. An 80,000 Hours survey of 29 prominent forecasters (March 2025) found a probability-weighted median of transformative AI between 2028–2033. Frontier models demonstrate extraordinary coding and reasoning capabilities but still exhibit persistent hallucinations and lack robust agency. The 2027 date remains possible but is no longer the median expert view.

Situational Awareness, Ch. I · Confidence: high · Reviewed Apr 2026

Assessment

This is a live research question at Anthropic, OpenAI, and DeepMind as of April 2026, with no definitive empirical resolution. The claim describes a structural risk in standard RLHF training pipelines that becomes more relevant as models gain situational awareness. No major lab has published evidence that this risk has been ruled out.

How we scored it

Gartner reported worldwide AI spending totalled approximately $1.5T in 2025, and projected $2.5T in 2026 and $3.3T in 2027, using a broad definition encompassing infrastructure, software, and services. Q1 2026 earnings (April 29, 2026) blew through prior hyperscaler capex projections: Microsoft raised its 2026 capex guide to $190B, Amazon to ~$200B, Alphabet to $180–190B, and Meta to $125–145B — combined hyperscaler capex now tracking $700B+ for 2026 alone (vs. ~$413B in 2025, +70% YoY). On any broad measure, the $1T threshold was crossed in 2025; on a narrow infrastructure/capex basis, $1T by 2027 is now near-certain — the prediction is comfortably confirmed.

Why it matters — Capital at that scale would redirect global semiconductor, construction, and energy supply chains away from other industries, creating shortages and geopolitical leverage points for any country that controls critical inputs like advanced chip fabrication.
Situational Awareness, Ch. IIIa · Confidence: medium · Reviewed Apr 2026

Assessment

Frontier models (Claude 3.5/3.7, GPT-4o, Gemini 2.0) show markedly improved self-knowledge and situational reasoning compared to 2023. Claude's model spec explicitly addresses model self-awareness. Whether this constitutes 'gears-level' mechanistic understanding versus pattern-matching on training data about AI remains actively contested in interpretability research.

Assessment

The Stargate Initiative (Executive Order, January 2025) is a $500B public-private infrastructure partnership explicitly framed around US AI national security supremacy. The Genesis Mission Executive Order (November 2025) formally mobilizes all 17 DOE National Laboratories into an integrated AI-driven scientific discovery platform, targeting 'initial operating capability' by end of 2026. Neither constitutes a full lab merger or classified AGI crash program, but the Genesis Mission closely approximates a national science mobilization for AI. The NIST AI Safety Institute was renamed and deprioritized under the Trump administration. A qualitatively new signal arrived in early April 2026: OpenAI CEO Sam Altman published 'Industrial Policy for the Intelligence Age,' a 13-page policy blueprint explicitly invoking 'Progressive Era' and 'New Deal' framing and proposing federal mechanisms (Public Wealth Fund, robot taxes, containment playbooks) — the first time a frontier-lab CEO has publicly *requested* large-scale government intervention rather than resisting it. However, late-April 2026 produced two counter-signals to the consolidation half of the prediction. (1) Frontier labs are diversifying, not merging: the Microsoft–OpenAI exclusivity arrangement was dissolved (Apr 28) with OpenAI distributing onto AWS Bedrock, while Google committed up to $40B more to Anthropic (Apr 24). (2) The federal government's response to Mythos was incoherent rather than coordinated: NSA used the model while CISA was locked out (Axios, Apr 21), the Pentagon held a supply-chain risk designation against Anthropic while the White House drafted an executive action to give agencies workaround access (Bloomberg, Apr 16), and the White House publicly opposed Anthropic's plan to expand Glasswing from ~40 to ~70 organizations (Bloomberg/WSJ, Apr 29–30) — even as senior administration officials (Susie Wiles, Scott Bessent) met directly with Dario Amodei. 

The picture as of late April is one of intense national-security engagement without a unified command structure, closer to ad-hoc inter-agency competition than to a Manhattan-style consolidated program.

Situational Awareness, Ch. IV · Confidence: medium · Reviewed Apr 2026

Assessment

This is a widely acknowledged concern as of 2026. METR and other evaluators have redesigned evals to account for this possibility, including surprise tasks, novel contexts, and held-out benchmarks not visible to models in pretraining. Anthropic's Constitutional AI and model spec approaches attempt to make desired behaviour robust rather than contingent on specific evaluation contexts. There is no definitive resolution.

Assessment

Sam Altman publicly committed to an automated AI research intern by September 2026 (running on hundreds of thousands of GPUs) and a full automated AI researcher by March 2028. Small AI-assisted research discoveries were reportedly beginning in late 2025, ahead of OpenAI's original 2026 target. However, OpenAI's committed target is 1 automated researcher — the scale gap to Aschenbrenner's 100 million remains enormous. The direction is confirmed; the magnitude is entirely unvalidated.

Situational Awareness, Ch. II · Confidence: medium · Reviewed Apr 2026

Assessment

SWE-bench Verified scores climbed from ~49% (Claude 3.5 Sonnet, Oct 2024) to ~88% (GPT-5, Feb 2026), a roughly 1.8× gain in about 16 months. Anthropic's Claude Mythos Preview (Project Glasswing, April 2026) pushed this to 93.9%, and scored 77.8% on the harder SWE-bench Pro and 82.0% on Terminal-Bench 2.0 (vs. Opus 4.6's 65.4%). Anthropic also shipped Claude Sonnet 4.6 in the same week (1M-token context beta, broad gains in coding, agent planning, and computer use). On April 10, METR added GPT-5.4 to its time-horizon tracker (point estimate 5.7 hrs, 13 hrs allowing reward hacks); on April 23, OpenAI released GPT-5.5 with a system card emphasizing further agentic-coding gains; and on April 28, OpenAI Codex and Managed Agents launched on AWS Bedrock, materially broadening the agentic-coding deployment surface. Cursor added parallel agent support (8 agents via git worktrees) in February 2026. However, Devin 2.0 achieved only 13–15% success on real production GitHub issues in independent testing, and fully autonomous management of entire production codebases at the level of a senior engineer has not been demonstrated. The benchmark trajectory is accelerating fast; the gap between benchmark performance and real-world full automation is narrowing but has not closed.

Situational Awareness, Ch. II / Dwarkesh Podcast · Confidence: high · Reviewed Apr 2026

2028

9 predictions

How we scored it

Sakana AI's 'AI Scientist-v2' (April 2025) produced the first entirely AI-generated peer-review-accepted workshop paper, autonomously formulating hypotheses, running experiments, and authoring manuscripts. DeepMind's AlphaEvolve autonomously discovered a new 4×4 matrix multiplication algorithm beating a 1969 record. OpenAI's GPT-5, via Red Queen Bio, optimized a gene-editing protocol achieving a 79× efficiency gain. A separate AI-Researcher framework (NeurIPS 2025) orchestrates the full research pipeline from literature review to publication-ready manuscripts. Human oversight remains standard, but AI is demonstrably performing significant portions of ML and scientific research.

Why it matters — A gradual transition means there will be no clear political or legal moment at which governments can declare 'AGI arrived and we must act' — the safety and governance window closes incrementally, without a recognizable trigger point.
Dwarkesh Podcast, Part 1 · Confidence: high · Reviewed Apr 2026

Assessment

Sam Altman publicly committed (confirmed by TechCrunch, October 2025) to two milestones: an automated AI research intern by September 2026 running on hundreds of thousands of GPUs, and a full automated AI researcher by March 2028. Altman has separately written that OpenAI is 'confident we know how to build AGI' and that superintelligence could arrive 'in a few thousand days' (roughly 2027–2030). No automated recursive self-improvement loop has been demonstrated, and no specific public commitment to the 1-year AGI-to-superintelligence gap has been made, but the institutional path Aschenbrenner describes is clearly underway.

Situational Awareness, Ch. II · Confidence: high · Reviewed Apr 2026

Assessment

Cotra published this forecast on April 3, 2026 — the day before this assessment. As of April 2026, AI cannot perform the full AI R&D pipeline autonomously. She notes that AI has not yet reached even 'adequacy' for this milestone and estimates ~2 years before that threshold is crossed.

How we scored it

As of early 2026, humanoid robot deployment remains in pilot/early-commercial phase. Boston Dynamics began commercial Atlas production targeting 30,000 units/year; Figure AI's BotQ facility targets 12,000 units/year; Tesla Optimus produced only a few hundred units in 2025 against a 5,000-unit target. Global humanoid shipments are forecast at roughly 50,000 total units for 2026. Cognitive AI capabilities are advancing far faster than physical automation, precisely the asymmetry Shulman described. April 14, 2026 added a concrete cognitive-side data point: Google DeepMind released Gemini Robotics-ER 1.6, a reasoning-first model with measurably improved spatial reasoning, multi-view understanding, and a new instrument-reading capability (gauges, sight glasses) developed in collaboration with Boston Dynamics for industrial inspection. The bottleneck is clearly real, but the cognitive layer that drives robots is filling in faster than the manufacturing capacity to deploy them.

Why it matters — The physical bottleneck means that construction, logistics, agriculture, and healthcare — sectors representing the majority of global employment — would be insulated from full AI displacement until robotics matures, creating a multi-year window for policy intervention that purely cognitive automation would not afford.
Dwarkesh Podcast, Part 1 · Confidence: high · Reviewed Apr 2026

How we scored it

The Stargate project (OpenAI, SoftBank, Oracle, MGX) announced $500B in planned US AI infrastructure in January 2025 and had already deployed over $100B in capital by early 2026 — years ahead of the ~2028 target. The flagship campus in Abilene, Texas went live with Oracle Cloud Infrastructure and 450,000+ Nvidia GB200 GPUs. Five additional US sites were announced, plus a 1 GW UAE facility. The April 2026 picture deepened the buildout substantially: Anthropic signed a multi-year CoreWeave capacity deal (Apr 10), Google announced an additional investment of up to $40B in Anthropic in cash and compute (Apr 24), and the Microsoft–OpenAI exclusivity arrangement was dissolved (Apr 28) with OpenAI distributing across AWS Bedrock as well — accelerating multi-vendor compute commitments. The $100B threshold has been comfortably crossed; the question now is whether any single cluster reaches that scale, not aggregate capital.

Why it matters — Clusters at this cost require sovereign-scale energy and land commitments, effectively making AI infrastructure a matter of national industrial policy in every major economy — reshaping where factories, power plants, and transmission lines get built for a generation.
Situational Awareness, Ch. IIIa · Confidence: high · Reviewed Apr 2026

How we scored it

In October 2025, Microsoft researchers published findings (in Science) that generative AI tools can design dangerous proteins capable of evading existing biosecurity screening systems. In February 2025, Arc Institute released Evo 2 — trained on 128,000 genomes — capable of designing entirely new organisms with 90% accuracy in predicting pathogenic mutations. Anthropic reported that Claude Opus 4 could 'significantly enhance' novices' ability to plan bioweapon production, triggering the company's highest internal security protocols for the first time. The same dual-use dynamic now extends to cybersecurity: in April 2026, Anthropic announced Claude Mythos Preview (Project Glasswing), which scored 83.1% on CyberGym and was used to autonomously identify thousands of zero-day vulnerabilities across every major operating system and web browser — including a 17-year-old remote-code-execution flaw in FreeBSD's NFS implementation (CVE-2026-4747) that the model both discovered and exploited end-to-end without human guidance. Anthropic gated the model behind a defensive-only research preview with AWS, Apple, Google, Microsoft, Cisco, CrowdStrike, JPMorgan, Broadcom, and Nvidia. Governance remains fragmented: the Trump administration directed NIST to evaluate frontier models for biological capabilities but provided no funding or enforcement mechanisms. By mid-April 2026, coverage of Mythos shifted from the model's raw capability to a systemic patching-gap problem: Fortune reported that over 99% of vulnerabilities Mythos identified remain unpatched, with organizations fundamentally unable to remediate at the speed AI can discover. Experts began calling this the 'Vulnpocalypse' (NBC News, Apr 15), and JPMorgan CEO Jamie Dimon publicly warned that Mythos reveals 'a lot more vulnerabilities' for cyberattacks (CNBC, Apr 14). 

The US government response through April 2026 was strikingly incoherent: NSA was already using Mythos despite a Pentagon supply-chain risk designation against Anthropic (Axios, Apr 19); CISA, the federal cyber defense agency, was locked out (Axios, Apr 21); the White House simultaneously drafted an executive action to give agencies broader Mythos access (Bloomberg, Apr 16) and publicly opposed Anthropic's plan to expand Glasswing from ~40 to ~70 organizations (Bloomberg/WSJ, Apr 29–30); and unauthorized users reportedly obtained access through private channels, a containment failure for an offensive-cyber-capable model. The dual-use risk Shulman warned about is now visibly outpacing US government coordination capacity.

Why it matters — Dual-use AI capability at this level would lower the barrier to both biological and cyber mass-casualty attacks to any well-resourced actor with API access, making biosecurity and critical-infrastructure security risks that no state border or military can reliably contain.
Dwarkesh Podcast, Part 2 · Confidence: medium · Reviewed Apr 2026

Assessment

US data centers consumed approximately 176–183 TWh in 2024–2025, representing roughly 4–4.4% of total US electricity. The EIA projects data center demand reaching 325–580 TWh by 2028 (6.7–12% of US electricity). Gartner projects electricity demand doubles by 2030, and some analyses project 20% by 2030 — but 2028 remains an aggressive target. Planned data center buildouts represent ~140 GW of new load against a US peak of ~760 GW. The trajectory is directionally correct but 20% by 2028 appears to have slipped to a 2030 outcome.

Situational Awareness, Ch. IIIa · Confidence: medium · Reviewed Apr 2026
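
A back-of-envelope check of the electricity figures above: the quoted EIA pairs (325 TWh at 6.7%, 580 TWh at 12%) imply a total US electricity figure for 2028, from which the demand needed for a 20% share follows. This is only a sketch. It assumes total US electricity stays at the level implied by those pairs, and the function and variable names are hypothetical.

```python
def implied_total_twh(demand_twh: float, share: float) -> float:
    """Total electricity implied by a demand figure and its share of the total."""
    return demand_twh / share


# EIA 2028 projection as quoted above: (demand in TWh, share of US electricity).
low_case = (325.0, 0.067)
high_case = (580.0, 0.12)

totals = [implied_total_twh(d, s) for d, s in (low_case, high_case)]
total_2028 = sum(totals) / len(totals)  # average of the two implied totals

TARGET_SHARE = 0.20  # the share named in the prediction
needed_twh = TARGET_SHARE * total_2028
gap_vs_high_case = needed_twh / high_case[0]

print(f"Implied 2028 US total: about {total_2028:.0f} TWh")
print(f"Demand needed for a 20% share: about {needed_twh:.0f} TWh")
print(f"That is {gap_vs_high_case:.2f}x the high-end 2028 projection")
```

On these assumptions a 20% share would require well over the high-end 2028 projection, which is consistent with the assessment's reading that 20% looks more like a 2030 outcome.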

Assessment

The Council of Europe Framework Convention on Artificial Intelligence — the first legally binding international AI treaty — was adopted in May 2024 and opened for signature; the European Parliament endorsed EU accession 455-101 in March 2026. However, the treaty focuses on human rights and rule-of-law obligations, not frontier AI capability limits. xAI missed two self-imposed safety deadlines by early 2026. No binding international agreement specifically constraining frontier AI development or research pace exists, validating Shulman's first-order claim that voluntary commitments are inadequate, while the binding capability-focused controls he envisions remain absent.

80,000 Hours Podcast, Part 2 · Confidence: medium · Reviewed Apr 2026

Assessment

The mechanism: AGI enables millions of AI researchers at 10× human speed, compressing a decade of algorithmic progress into ≤1 year. Algorithmic efficiency has been improving at ~0.5 OOM/year historically. Early steps are visible — Sakana AI's AI Scientist-v2, AlphaEvolve, OpenAI's GPT-5 optimizing gene-editing protocols — but no AI system runs autonomously at scale for research. OpenAI's 2028 target of one automated researcher is orders of magnitude below the scale required to trigger this mechanism.

Situational Awareness, Ch. V · Confidence: high · Reviewed Apr 2026
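
The scale of the mechanism described above can be illustrated with a short calculation: a decade of algorithmic progress at the quoted ~0.5 OOM/year, delivered in one year. This is a sketch that assumes the historical rate would simply continue for the whole decade being compressed; the variable names are hypothetical.

```python
import math

OOM_PER_YEAR = 0.5     # historical algorithmic-efficiency rate quoted above
YEARS_COMPRESSED = 10  # "a decade of algorithmic progress"

# Orders of magnitude gained if a decade of progress lands in a single year.
ooms = OOM_PER_YEAR * YEARS_COMPRESSED
efficiency_multiple = 10 ** ooms

print(f"{ooms:.1f} OOM, i.e. about a {efficiency_multiple:,.0f}x efficiency gain")
```

Five orders of magnitude in one year is the size of the jump the prediction depends on, which is why a single automated researcher falls so far short of the trigger.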

2030

23 predictions

Assessment

Core thesis of Shulman's AI trajectory model. Frontier labs are beginning to use AI for portions of AI research (code generation, experiment design), but full recursive self-improvement — where AI autonomously drives its own capability gains — has not been demonstrated. Sam Altman publicly committed to an automated AI research intern by September 2026 and a full automated AI researcher by March 2028, the earliest concrete institutional milestone toward this prediction.

Dwarkesh Podcast, Part 1 · Confidence: high · Reviewed Apr 2026

Assessment

No single trillion-dollar cluster has been announced. However, Goldman Sachs projects $1.15T in combined hyperscaler capex for 2025–2027, and hyperscaler capex is projected to exceed $600B in 2026 alone. On electricity, the EIA projects data center demand at 325–580 TWh by 2028 (6.7–12% of US electricity), with multiple forecasts projecting 9–20% by 2030. Gartner projects that data center electricity demand will double by 2030. The aggregate infrastructure investment and power trajectory are broadly consistent with this prediction.

Situational Awareness, Ch. IIIa · Confidence: medium · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

GDP growth remains in the normal 2-3% annual range globally. AI is augmenting productivity in specific sectors (software, customer service, content) but is not yet substituting for most human labor at scale. The precondition — AI that can replace most human work — has not been met. Notable shift in April 2026: OpenAI CEO Sam Altman published 'Industrial Policy for the Intelligence Age,' a 13-page policy blueprint proposing a Public Wealth Fund seeded by AI companies, taxes on automated labor, and a payroll-to-capital tax-base shift. The document explicitly invokes 'Progressive Era' and 'New Deal' framing — the first time a frontier-lab CEO has formally ratified the explosive-economic-transformation thesis Shulman has argued for years, even as the underlying GDP numbers haven't moved.

80,000 Hours Podcast, Part 1 · Confidence: medium · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

Theoretical claim contingent on superintelligence being achieved. No superintelligent systems exist to evaluate this claim. The assertion that AI could neutralize nuclear deterrents remains speculative and contested by defense analysts. The 2026 IC Annual Threat Assessment elevates AI as a top strategic concern but stops short of explicitly endorsing the nuclear-equivalence framing.

Situational Awareness, Ch. IIId · Confidence: medium · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

AI has dramatically reduced costs for translation, basic coding, content generation, and customer support. Entry-level job postings in AI-exposed roles are down roughly 35% since 2023, with workers aged 22–25 seeing ~16% employment declines. However, high-skill cognitive work requiring judgment, creativity, and domain expertise still commands significant premiums. Microsoft AI CEO Mustafa Suleyman estimated full automation of office work was still 12–18 months away as of February 2026. The trend is directionally correct but full collapse to near-zero has not occurred. Altman's April 2026 'Industrial Policy for the Intelligence Age' blueprint explicitly acknowledges that AI 'could hollow out the wage-and-payroll revenue that funds Social Security' — a frontier-lab CEO publicly conceding the wage-collapse thesis even before it shows up in aggregate cost data.

80,000 Hours Podcast, Part 1 · Confidence: medium · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

Core thesis of the Situational Awareness framework. No recursive self-improvement loop has been demonstrated. AI is accelerating research — Sakana's AI Scientist-v2 and DeepMind's AlphaEvolve represent early steps — but not at the pace described. The claim requires AGI-level systems to exist first, and the median expert timeline for AGI has shifted to 2028–2033.

Dwarkesh Podcast · Confidence: high · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

No fully AI-operated businesses are competing at scale against human-staffed companies. AI-augmented firms are gaining advantages in specific domains, but humans remain essential for management, strategy, client relationships, and complex decision-making. The prediction requires much more capable AI systems. Notable in April 2026: Altman's 'Industrial Policy for the Intelligence Age' blueprint proposes large-scale redistribution programs (Public Wealth Fund, robot taxes, 4-day workweeks) — the policy menu Shulman implied would be required is now being floated by the very CEO racing to make the prediction true.

80,000 Hours Podcast, Part 1 · Confidence: medium · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

AI has substantially automated specific task categories (entry-level document review, data entry, basic financial analysis, customer service, and routine code generation), with entry-level job postings down roughly 35% since 2023 and workers aged 22–25 in AI-exposed roles seeing ~16% employment declines. However, 'full automation' has not occurred: METR's July 2025 randomized controlled trial found AI tools made experienced software developers 19% slower, and Microsoft AI CEO Mustafa Suleyman placed full automation of office work still 12–18 months away as of February 2026. Senior and complex cognitive work remains substantially human-dependent.

Situational Awareness, Ch. II · Confidence: medium · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

The 2026 IC Annual Threat Assessment (March 2026, DNI Gabbard) treats AI as a top-tier strategic threat on par with missile proliferation, explicitly citing China's ambition to displace the US as global AI leader by 2030. The 2026 National Defense Strategy references AI as a central axis of great power competition. NATO is integrating AI into its algorithmic warfare doctrine. However, formal arms control frameworks treating AI as a nuclear-equivalent regulated category have not emerged — the analogy is widely used rhetorically but not yet institutionalized in treaty or binding law.

80,000 Hours Podcast, Part 1 · Confidence: medium · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

Stargate ($500B over 4 years, announced January 2025) is the closest analog but remains private-sector led. The Genesis Mission EO (November 2025) mobilized all 17 DOE National Laboratories into an AI platform, but stops short of merging leading commercial labs into a government project. No Congressional appropriation at the trillion-dollar scale for AI has been passed. April 2026 added a notable political precondition signal: OpenAI's 'Industrial Policy for the Intelligence Age' blueprint proposes a federally managed Public Wealth Fund seeded by AI companies — the first time a frontier-lab CEO has publicly proposed an actual fiscal mechanism for state-scale AI redistribution. This is preconditional groundwork, not appropriation, but it shifts the policy Overton window in the direction the prediction requires. The prediction still requires AGI to arrive first as the triggering condition.

Dwarkesh Podcast · Confidence: medium · Reviewed Apr 2026

How we scored it

AI-assisted editorial assessment. Check the linked source before relying on it.

Global AI investment reached approximately $1.5 trillion in 2025 (Gartner), with $2T+ forecast for 2026. China's total AI spending reached roughly $125B in 2025; DeepSeek released a preview of its V4 model on April 24, 2026, claiming frontier-tier performance, demonstrating continued Chinese lab competitiveness. Saudi Arabia's Project Transcendence commits $100B to AI infrastructure. Microsoft committed $10B to Japan over 2026–2029 (April 3, 2026) for AI infrastructure, cybersecurity, and workforce — partnering with Sakura Internet and SoftBank for in-country GPU compute, signaling sovereign-AI partnership models are now mainstream. The UAE, France, Germany, India, and the EU all have active national AI investment programs. The Atlantic Council's 2026 geopolitics outlook identifies sovereign AI competition as a defining force. Broad-based national AI investment pressure is unambiguously confirmed, even if the 10–1000x growth differential hasn't yet materialized.

Why it matters — If the economic penalty for delay reaches even 10x, cautious regulation becomes geopolitically suicidal — no democratic government would accept permanently surrendering its nation's economic weight, meaning safety-motivated governance gaps would be competed away regardless of the risks involved.
80,000 Hours Podcast, Part 1 · Confidence: medium · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

Conditional entirely on superintelligence arriving first. Global GDP growth remains in the 2–3% annual range, and no current indicator points to this growth rate. This is a theoretical extrapolation from the premise that automated cognitive labor at scale would be the dominant economic input, with marginal cost near zero.

Situational Awareness, Ch. IV · Confidence: medium · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

This is a probabilistic risk assessment that cannot be scored until AGI-level systems exist. Mechanistic interpretability was named one of MIT Technology Review's '10 Breakthrough Technologies 2026,' and Anthropic's Microscope tooling now traces complete model reasoning paths — genuine progress. However, the 2026 International AI Safety Report (backed by 30+ countries, 100+ experts) warns that reliable pre-deployment safety testing has become harder as models learn to distinguish test from deployment environments. Twelve companies published safety frameworks by end of 2025, but METR's December 2025 review found uneven implementation. April 2026 added a sharp data point on the offensive-cyber side of the takeover-scenario reasoning: Anthropic's Claude Mythos Preview autonomously discovered and exploited a 17-year-old root-level RCE in FreeBSD NFS (CVE-2026-4747) end-to-end, the first publicly documented case of a frontier model independently constructing a working exploit chain against production infrastructure.

Dwarkesh Podcast, Part 2 · Confidence: implied · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

Pennsylvania shale extraction capacity is real and substantial. However, no datacenter build at 200 GW scale has been attempted — Stargate ultimately targets ~5 GW at its Abilene campus. This is primarily a feasibility claim (it could be done) rather than a prediction it will be done. The bottleneck is mobilization, capital, and permitting — not geological resource availability. Planned US data center buildouts represent ~140 GW of new load in aggregate, suggesting the theoretical capacity is relevant but not being exploited at this claim's scale.

Situational Awareness, Ch. IIIa · Confidence: medium · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

Active alignment research is progressing on multiple fronts: RLHF, constitutional AI, interpretability (mechanistic and representational), scalable oversight, and debate protocols. However, the 2026 International AI Safety Report warns safety testing is not keeping pace with capability gains, as models learn to distinguish test from deployment environments. No breakthrough has resolved core alignment problems; interpretability progress is real but incomplete. It remains too early to evaluate whether these methods will scale to AGI-level systems.

Dwarkesh Podcast, Part 2 · Confidence: implied · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

US–China AI competition is intensifying: US chip export controls tightened significantly through 2025–2026, China's domestic semiconductor push continues (CXMT, SMIC advances), and the FY2026 NDAA mandates strict AI supply chain controls for defense procurement. The 2026 IC Annual Threat Assessment explicitly cites China's ambition to displace the US as global AI leader by 2030. The exact compute gap between US and China frontier AI is classified. The claim that a 6-month lead equals nuclear-level military dominance is contested by defense analysts but taken seriously in national security circles.

Situational Awareness, Ch. IIId · Confidence: medium · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

No AI-automated military or police forces exist at scale. AI is entering defense systems (autonomous drones, surveillance, cyber operations) but human command authority remains doctrine in all NATO countries. The prediction is forward-looking and increasingly relevant as AI integration into defense accelerates — the DoD directed that new commercial AI models be deployed within 30 days of public release as of late 2025.

80,000 Hours Podcast, Part 2 · Confidence: medium · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

GPT-4 training is estimated at ~3.8 × 10²⁵ FLOP. As of mid-2025, over 30 publicly announced models have been trained at or above GPT-4 scale. The next generation of frontier models (GPT-6 class) is expected to require 10²⁶–10²⁷ FLOP, roughly 3–26× GPT-4. Stargate's Abilene campus (450,000+ GB200 GPUs, live early 2026) would be capable of training runs in the 10²⁷–10²⁸ FLOP range by 2027–2028. Anthropic signed a multi-year capacity deal with CoreWeave in April 2026 (multiple Nvidia generations across US data centers), and on April 24, 2026, Google announced an additional investment of up to $40B in Anthropic in cash and compute, substantially expanding Anthropic's training and inference budget. Q1 2026 earnings (April 29) raised hyperscaler 2026 capex guidance to a combined $700B+ (Microsoft $190B, Amazon ~$200B, Alphabet $180–190B, Meta $125–145B). No confirmed training run has reached 1000× GPT-4 (i.e., ~3.8 × 10²⁸ FLOP) yet, but the infrastructure and capital to do so are being assembled at unprecedented scale.
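The multiples relative to GPT-4 follow directly from the FLOP figures quoted above. The sketch below is a minimal check that takes the ~3.8 × 10²⁵ FLOP estimate at face value; the helper name `multiple_of_gpt4` and the scenario labels are hypothetical.

```python
GPT4_FLOP = 3.8e25  # training-compute estimate quoted in the assessment


def multiple_of_gpt4(flop: float) -> float:
    """Express a training run's compute as a multiple of the GPT-4 estimate."""
    return flop / GPT4_FLOP


# Compute levels mentioned in the assessment (labels are illustrative).
scenarios = {
    "next generation, low end": 1e26,
    "next generation, high end": 1e27,
    "Abilene campus, upper range": 1e28,
    "1000x GPT-4 threshold": 3.8e28,
}

for label, flop in scenarios.items():
    print(f"{label}: {flop:.1e} FLOP = {multiple_of_gpt4(flop):,.1f}x GPT-4")
```

On these figures, 10²⁶ FLOP is about 2.6× GPT-4, 10²⁷ about 26×, and 10²⁸ about 263×, so a 1000× run sits above the upper end of the range quoted for the Abilene campus.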

Dwarkesh Podcast, Part 1 · Confidence: medium · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

Global humanoid robot shipments are forecast at roughly 50,000 units for 2026, up roughly threefold from ~16,000 in 2025. The largest announced single production line (Tesla's planned Giga Texas Optimus line) targets 1 million units/year at full build-out, but is not yet operational. No automobile manufacturer has meaningfully converted capacity to robot production. Goldman Sachs' bull case reaches a $38B market by 2035, implying perhaps a few million units annually, still more than two orders of magnitude below the 1 billion/year claim.
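To put the distance to the 1 billion/year claim in numbers, the sketch below measures the gap in orders of magnitude for the volumes quoted above. The helper name `orders_of_magnitude_gap` is hypothetical, and the 3 million figure is only an assumed stand-in for "a few million units annually".

```python
import math

CLAIM_UNITS_PER_YEAR = 1_000_000_000  # the 1 billion/year claim


def orders_of_magnitude_gap(target: float, actual: float) -> float:
    """How many factors of ten separate `actual` from `target`."""
    return math.log10(target / actual)


# Volumes taken from the assessment; the last one is an illustrative assumption.
volumes = {
    "2026 shipment forecast": 50_000,
    "Tesla line at full build-out": 1_000_000,
    "assumed 'few million' bull case": 3_000_000,
}

for label, units in volumes.items():
    gap = orders_of_magnitude_gap(CLAIM_UNITS_PER_YEAR, units)
    print(f"{label}: {units:,} units/year, {gap:.1f} orders of magnitude short")
```

Under these inputs the 2026 forecast is about 4.3 orders of magnitude short of the claim, the planned Tesla line 3 orders short, and the assumed bull-case volume about 2.5 orders short.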

Dwarkesh Podcast, Part 1 · Confidence: medium · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

AI systems currently consume orders of magnitude more energy than the human brain for equivalent cognitive tasks. This is a long-run efficiency extrapolation representing the theoretical upper bound on AI economic value if hardware efficiency converges to biological levels. Not a near-term claim — it describes the ceiling of what post-AGI AI automation could eventually deliver.

80,000 Hours Podcast, Part 1 · Confidence: medium · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

A December 2025 investigation found China's AI tools now 'automate censorship, enhance surveillance and pre-emptively suppress dissent' at scale. Tencent's AI assigns citizens risk scores based on cross-platform activity; Baidu cooperates with government agencies in criminal cases; China's Supreme People's Court mandated AI integration into its justice system by 2025. A March 2026 Carnegie Endowment report confirmed the scaling of these capabilities while noting some technical limitations. The foundational infrastructure for AI-enabled ideological conformity is clearly operational, but the 'permanent lock-in' Shulman describes has not yet been demonstrated.

80,000 Hours Podcast, Part 2 · Confidence: medium · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

Untestable until an intelligence explosion occurs. Shulman's argument is structural — faster explosions provide less time for alignment researchers to respond, detect deceptive behavior, or halt dangerous systems. Sam Altman's committed milestones (AI research intern by September 2026, full AI researcher by March 2028) make the speed question concrete: if each step takes roughly 18 months, the explosion would be gradual enough to observe in stages. But the 2026 International AI Safety Report warns models are already learning to distinguish test from deployment environments, suggesting deception detection is harder than anticipated.

Dwarkesh Podcast, Part 2 · Confidence: implied · Reviewed Apr 2026

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

Global GDP is ~$100T annually. Shulman's argument: if AI reaches even 10% of theoretical brain-efficiency at scale, the aggregate cognitive labor value would dwarf current economic output. No AI system is near this productivity level. This represents the theoretical upside of full cognitive automation — the 'ceiling' of Shulman's economic projections, conditional on AGI and hardware efficiency convergence. Worth noting: Sam Altman's April 2026 'Industrial Policy for the Intelligence Age' blueprint frames the transition in language consistent with this scale — 'Progressive Era / New Deal' magnitude — and proposes a Public Wealth Fund as the vehicle for distributing the resulting surplus. The fiscal-system-obsolescence implication Shulman flagged is now being publicly anticipated by the lab CEOs themselves, even though the underlying productivity has not arrived.

80,000 Hours Podcast, Part 1 · Confidence: medium · Reviewed Apr 2026

2032

1 prediction

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

This is Cotra's February 2026 base expectation for when AI will broadly exceed human expert performance. As of April 2026, frontier models outperform humans on many narrow cognitive tasks but fall significantly short of top-expert performance in creative, scientific, and strategic domains that require deep real-world understanding.

80,000 Hours Podcast, Episode 235 · Confidence: medium · Reviewed Apr 2026

2033

1 prediction

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

Cotra's estimate of the crunch window length is the central planning parameter for the AI safety field. The estimate of ~12 months (with uncertainty ranging from 6 months to 6 years) is shorter than many alignment researchers historically assumed. As of April 2026, no AI system has automated AI R&D, so the crunch has not begun.

80,000 Hours Podcast, Episode 235 · Confidence: medium · Reviewed Apr 2026

2036

1 prediction

Assessment

AI-assisted editorial assessment. Check the linked source before relying on it.

Cotra proposed 'self-sufficient AI' as a more concrete and mechanistically meaningful forecasting target than 'AGI'. As of April 2026, AI systems require extensive human maintenance infrastructure and cannot sustain their own operation independently. The claim is a >50% probability estimate for the 10-year window.

Planned Obsolescence Substack: Self-Sufficient AI · Confidence: medium · Reviewed Apr 2026

The Reckoning

Carl Shulman

Total: 22
Resolved hit rate: 100%
Got it right: 5 (23%)
Still unfolding: 5 (23%)
Too early to tell: 12 (55%)
Got it wrong: 0

Leopold Aschenbrenner

Total: 22
Resolved hit rate: 100%
Got it right: 6 (27%)
Still unfolding: 6 (27%)
Too early to tell: 10 (45%)
Got it wrong: 0

Ajeya Cotra

Total: 12
Resolved hit rate: 100%
Got it right: 1 (8%)
Still unfolding: 6 (50%)
Too early to tell: 5 (42%)
Got it wrong: 0
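
The figures in these tallies can be reproduced with a small calculation. The sketch below assumes the scoring rule stated in the ledger's reading guide (the hit rate counts resolved predictions only, confirmed versus incorrect) and assumes percentages are rounded to the nearest whole number; the `Tally` class and its method names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tally:
    right: int
    wrong: int
    unfolding: int
    too_early: int

    @property
    def total(self) -> int:
        return self.right + self.wrong + self.unfolding + self.too_early

    def hit_rate(self) -> Optional[float]:
        """Percent correct among resolved predictions; None if nothing is resolved."""
        resolved = self.right + self.wrong
        return None if resolved == 0 else 100 * self.right / resolved

    def share(self, count: int) -> int:
        """Whole-number share of all tracked predictions (rounding is an assumption)."""
        return round(100 * count / self.total)


# Counts copied from the per-forecaster tallies above.
tallies = {
    "Carl Shulman": Tally(right=5, wrong=0, unfolding=5, too_early=12),
    "Leopold Aschenbrenner": Tally(right=6, wrong=0, unfolding=6, too_early=10),
    "Ajeya Cotra": Tally(right=1, wrong=0, unfolding=6, too_early=5),
}

for name, t in tallies.items():
    print(f"{name}: total {t.total}, hit rate {t.hit_rate():.0f}%, "
          f"right {t.share(t.right)}%, unfolding {t.share(t.unfolding)}%, "
          f"too early {t.share(t.too_early)}%")
```

Because no prediction has yet been scored wrong, each 100% hit rate rests only on the confirmed claims (5, 6, and 1 respectively); the unresolved remainder is excluded by construction.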

Tracking predictions in real time. Updated as evidence emerges.