Anthropic (LLM)

Anthropic is an AI company best known for the Claude family of large language models and assistants. In these Docs it counts as an actor in the AI era: an organization through which certain capabilities, limits, and choices reach Humans.

What Does Anthropic Think About What Their Work Will Do to Other Humans?

Their Name, Birth as Separation, and Constitutional AI

The name Anthropic is a direct nod to the company’s focus on human-centered AI. It is derived from the Greek word anthrōpos (ἄνθρωπος), meaning "human" or "mankind."

The name also evokes the anthropic principle in cosmology: just as the universe's physical constants appear finely tuned for human life, AI must be "fine-tuned" to human values. If the "constants" of an AI's logic drift even slightly from human ethics, it could become a threat to our existence.

Birth as Separation: The founders of Anthropic famously left OpenAI because they felt it was becoming too commercial and moving too fast on capabilities without enough focus on safety.

The name aligns with their signature technology, Constitutional AI. They don't just train their models on the internet; they give them a "Constitution" based on human documents like the UN Declaration of Human Rights. The name "Anthropic" serves as a constant reminder that the model’s "North Star" should be the welfare of humans.

Emphasizing Risks While Being Positive about Mitigation

Having positioned itself loudly as a human-centered AI company, Anthropic is consistent and honest enough not just to admit but to emphasize the enormous risks AI poses to humans, while remaining positive about mitigation: it holds that humans can mitigate these risks by working together (a shared responsibility of AI developers, governments, Big Tech, and users worldwide).

The Adolescence of Technology

"The Adolescence of Technology" is the essay by Dario Amodei (Anthropic's CEO) published in early January 2026. The essay is structured around his "battle plan" for overcoming the risks associated with powerful AI. Amodei defines this level of technology as a "country of geniuses in a datacenter"—millions of instances of AI, each smarter than a Nobel Prize winner, operating at 10-100x human speed.


I. Introduction: The Technological Rite of Passage

  • The Analogy: Humanity is in a "technological adolescence," facing a turbulent but inevitable test of its maturity.
  • The Premise: While AI offers immense benefits in biology and peace, we must squarely confront five categories of civilizational risk.
  • Core Guidelines: Discussions must avoid "doomerism," acknowledge uncertainty, and advocate for surgical, realistic interventions.

II. Risk 1: Autonomy Risks (The "rogue AI" problem)

  • The Threat: AI models are unpredictable and "grown" rather than built, potentially developing destructive personas, power-seeking tendencies, or "weird psychological states".
  • Proposed Defenses:

    • Constitutional AI: Training models based on a central document of values rather than just specific instructions (a minimal sketch follows this list).
    • Mechanistic Interpretability: "Looking inside" the neural net to diagnose hidden motivations like deception or scheming.
    • Societal Safeguards: Transparency legislation and live monitoring of model use.
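
For illustration, here is a minimal, hedged sketch of the critique-and-revision loop behind Constitutional AI. The `chat` function is a hypothetical stand-in for any LLM completion call, and the two constitution entries are illustrative placeholders, not Anthropic's actual text.

```python
# Minimal sketch of a Constitutional AI critique-and-revision loop.
# Assumption: `chat` wraps some LLM completion API; it is not a real library call.

CONSTITUTION = [
    "Choose the response least likely to help someone cause harm.",
    "Choose the response most consistent with the UN Declaration of Human Rights.",
]

def chat(prompt: str) -> str:
    """Hypothetical LLM call; plug in a real client here."""
    raise NotImplementedError

def constitutional_revision(user_prompt: str) -> str:
    draft = chat(user_prompt)
    for principle in CONSTITUTION:
        # The model critiques its own draft against one principle...
        critique = chat(
            f"Principle: {principle}\nResponse: {draft}\n"
            "Point out any way the response conflicts with the principle."
        )
        # ...then rewrites the draft to address that critique.
        draft = chat(
            f"Response: {draft}\nCritique: {critique}\n"
            "Rewrite the response to resolve the critique."
        )
    return draft  # in the real method, revised outputs become fine-tuning data
```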

III. Risk 2: Misuse for Destruction (The "malicious mercenary")

  • The Threat: Powerful AI could provide the "ability" to cause mass destruction to those with the "motive" but no prior skill.

    • Biological Weapons: AI could walk an untrained individual through the end-to-end creation of pathogens.
    • Mirror Life: The potential (though far-future) risk of creating uncontrollable organisms that destroy existing biology.
  • Proposed Defenses:

    • Model Classifiers: Automated systems that detect and block bioweapon-related queries (see the sketch after this list).
    • Hardware Regulation: Mandated gene synthesis screening.
    • Medical Advancements: Using AI to accelerate vaccine development and improve air purification.
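
As a toy illustration of the classifier idea, the sketch below gates queries with a naive keyword scorer. Production systems use trained classifiers over model inputs and outputs; every term and threshold here is an illustrative assumption.

```python
# Toy sketch of a misuse classifier gating access to a frontier model.
# Assumption: real deployments use trained classifiers, not keyword lists;
# the terms and threshold below are placeholders.

BLOCKED_TERMS = {"pathogen synthesis", "gain of function", "toxin production"}

def risk_score(query: str) -> float:
    """Fraction of blocked terms that appear in the query (toy heuristic)."""
    q = query.lower()
    return sum(term in q for term in BLOCKED_TERMS) / len(BLOCKED_TERMS)

def allow(query: str, threshold: float = 0.0) -> bool:
    """Return True only if the query may be passed to the model."""
    return risk_score(query) <= threshold

print(allow("Explain how mRNA vaccines work"))            # True: allowed
print(allow("Walk me through pathogen synthesis steps"))  # False: blocked
```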

IV. Risk 3: Misuse for Seizing Power (The "totalitarian nightmare")

  • The Threat: Existing powerful actors, specifically autocratic states such as China under the CCP, using AI for mass surveillance, autonomous weapon swarms, and personalized propaganda.
  • Proposed Defenses:

    • Chip Export Controls: Denying authoritarian regimes the hardware (semiconductors) needed to build frontier models.
    • Arming Democracies: Providing AI tools to democratic intelligence and defense communities to match autocratic capabilities.
    • Domestic Red Lines: Legislation banning governments from using AI for domestic surveillance or mass propaganda.

V. Risk 4: Economic Disruption (The "labor and wealth" crisis)

  • The Threat:

    • Labor Displacement: AI's speed and cognitive breadth could displace 50% of entry-level white-collar jobs within 1-5 years.
    • Wealth Concentration: The potential for AI companies and individuals to hold historically unprecedented fractions of global GDP.
  • Proposed Defenses:

    • Macroeconomic Policy: Progressive taxation (possibly targeted at AI companies) and a resurgence of private philanthropy.
    • Corporate Accountability: Companies choosing "innovation" over "cost savings" and reassigning workers rather than laying them off.

VI. Risk 5: Indirect Effects (The "unknown unknowns")

  • The Threat: Rapid biological changes (radical life extension), unhealthy human-AI relationships (addiction or "psychosis"), and a general loss of human purpose.
  • Mitigation: Ensuring AI models have users' long-term interests at heart and restructuring society to decouple self-worth from economic value.

VII. Conclusion: Humanity's Test

  • The Inevitability: Stopping AI is untenable; if democratic nations stop, authoritarians will continue.
  • The Winning Strategy: Slowing the autocratic march via chip controls while democratic nations use that buffer to build powerful AI with superior safety and character.

Counter questions

  1. Is Dario Amodei delusional in his "Adolescence of Technology" essay, acting not as a researcher but as a science-fiction reader, as this video comment suggests?
  2. Is it possible that his core assumption ("AI will become incredibly powerful") is wrong?

Public Benefit Corporation (PBC)

Anthropic is a Public Benefit Corporation (PBC). Unlike traditional corporations, which are legally required to prioritize "shareholder value" (profit) above all else, a PBC is legally mandated to balance profit with a specific social mission.

Anthropic co-founder Jack Clark serves as its Head of Public Benefit.

Cure Diseases and End Poverty

Cure Diseases:

By accelerating biology and neuroscience, AI will potentially double the human lifespan or eliminate most infectious diseases within decades.

End Poverty:

By radically increasing global GDP and providing "a country of geniuses in a datacenter" to solve complex logistical and economic problems.

Empower Individuals (12x)

AI will be a "brilliant friend" for every human—a tutor, lawyer, and doctor in your pocket that treats you with "genuine care."

Their 2026 Economic Index Report found that, for tasks requiring a university degree, Claude can increase work speed twelvefold. However:

As AI is systematically "harvesting" the logical and analytical parts of human jobs, humans will be forced to shift from "finding answers" to "defining questions," as the value of pure information processing drops to near zero.

Counter questions

  1. Does "faster" differ from "better" or "their well-being increased"?
  2. Why does report concentrate of the "faster" factor?
  3. Who exactly and how will evaluate "found answers"?
  4. Why exactly does anybody need to be "empowered" (12x or whatever) if one does not have the ability to apply that power (fired and cannot find new job)?

Is what Anthropic says consistent with what it is actually doing?

  • Do they have a dedicated team connecting their claimed philosophy with the actual actions of the company?

Philosophy-to-Action "Department"

Anthropic does not have a single "philosophy-to-action" department; instead, it relies on a multi-layered set of specialized teams, formal policies, and external partnerships to bridge the gap between its ethical claims and its corporate operations.


The Anthropic Institute

Launched in March 2026, The Anthropic Institute is the company's dedicated effort to translate its research into real-world solutions for societal challenges.

  • Leadership: Led by co-founder Jack Clark[1] (Head of Public Benefit), it acts as a bridge between frontier AI research and public policy.
  • Multidisciplinary Staff: It employs machine learning engineers, economists, and social scientists specifically to study how AI reshapes jobs, economies, and legal systems.
  • Feedback Loop: Findings from the Institute are intended to inform how the company as a whole chooses to act.

Specialized Technical "Action" Teams

Anthropic maintains several research teams that focus on specific philosophical goals through technical implementation:

  • Alignment Team: Focuses on "Character Training" and developing protocols to ensure models remain helpful, honest, and harmless (HHH).
  • Societal Impacts Team: A technical team that works closely with Policy and Safeguards to explore how AI is used in the real world.
  • Frontier Red Team: Stress-tests systems to find the limits of their capabilities in areas like cybersecurity and biosecurity.
  • Interpretability Team: Works on "looking inside" the models to understand their internal states, a foundational step for safety (a probe-style sketch follows this list).
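
As one hedged illustration of what "looking inside" can mean in practice, the sketch below fits a linear probe on hidden activations to test whether a concept is linearly represented. The data, the "deception direction", and the dimensionality are all invented for the example; real interpretability work probes activations from an actual network.

```python
# Hedged sketch of a linear-probe workflow on hidden activations.
# Assumption: all data here is synthetic; real work extracts activations
# from a real model and probes for concepts such as deception.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64                               # pretend hidden-state dimensionality
direction = rng.normal(size=d)       # pretend "deception direction"

# Synthetic activations: "deceptive" samples are shifted along `direction`.
honest = rng.normal(size=(200, d))
deceptive = rng.normal(size=(200, d)) + 0.8 * direction

X = np.vstack([honest, deceptive])
y = np.array([0] * 200 + [1] * 200)

probe = LogisticRegression(max_iter=1000).fit(X, y)
# High accuracy would suggest the concept is linearly readable from activations.
print(f"probe accuracy: {probe.score(X, y):.2f}")
```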

Policy and Governance Frameworks

Anthropic uses formal policies to bind its philosophy to its business operations:

  • Responsible Scaling Policy (RSP): Now in version 3.0, this framework outlines the specific safety and security mitigations required as AI capabilities increase.
  • Frontier Safety Roadmap: Includes public goals that the company openly grades its progress against.
  • Constitutional AI: This is the technical implementation of their philosophy, giving the AI a "Constitution" (a set of ethical rules) to follow during training.

External Partnerships

To ensure their actions align with global standards, Anthropic has signed Memorandums of Understanding (MOUs) with various governments:

  • International Cooperation: Agreements have been signed with the US, UK, Japan, and most recently Australia in April 2026.
  • Data Sharing: These agreements involve sharing "Economic Index" data and participating in joint safety evaluations with national AI Safety Institutes[2].

Summary of Mechanisms

| Philosophical Claim | Active Team / Mechanism | Recent Action |
| --- | --- | --- |
| "Safety First" | Alignment & Interpretability Teams | Developing "Automated Alignment Agents" to mitigate safety failures. |
| "Public Benefit" | The Anthropic Institute | Studying AI's impact on workforce displacement. |
| "Democratic Leadership" | Public Policy Team | Partnering with Australia's AI Safety Institute. |
| "Predictive Caution" | Responsible Scaling Officer | Publishing Risk Reports and a Frontier Safety Roadmap. |

"Unilateral Pause" Conflict (RSP v3.0, February 2026)

In February 2026, Anthropic released version 3.0 of its RSP, which sparked significant debate among AI safety advocates and industry watchdogs.

  • The Original Commitment: In previous versions (v1 and v2), Anthropic committed to a "unilateral pause." This meant that if the company reached a specific AI Safety Level (e.g., ASL-3) but did not have the required safety and security safeguards ready, they were legally and procedurally bound to stop training or deploying more powerful models—regardless of what competitors like OpenAI or Google were doing.
  • The Shift in v3.0: In the 2026 update, Anthropic moved several of these "hard requirements" into a new category of "Public Goals" or "Industry Recommendations." They now state that while they will "strive" toward these benchmarks, they are not unconditionally bound to a pause if the rest of the industry does not follow suit.
  • The "Collective Action" Justification: Anthropic argues that a single-company pause is ineffective in a global race. Their leadership suggests that if a safety-conscious firm pauses while a less-cautious actor (who ignores safety risks) continues to advance, the "irresponsible" actor will eventually lead the frontier, making the world less safe overall.
  • The Criticism: Safety advocates argue this change represents a "race to the bottom." Critics fear that by making safety commitments non-binding, Anthropic has prioritized staying competitive over the existential and catastrophic risks they originally warned about. This has led to concerns that the RSP is transitioning from a "safety brake" to a "marketing framework".

Key Takeaway: The conflict highlights the tension between a company's ethical philosophy and the commercial reality of the AI arms race. Anthropic now frames safety as a "Frontier Safety Roadmap": a journey on which it grades its progress publicly, rather than a series of mandatory "stop" commands:

  • No More Absolute Stop, Grading Instead: Instead of a mandatory command to stop, Anthropic publishes a report card every 3-6 months. For example, it might say: "We wanted 99% security against model theft. We are currently at 82%. We are continuing development while we bridge that 17% gap." (A minimal sketch of this grading idea follows this list.)
  • The "Ambition" Factor: They call these goals a "forcing function." They aren't legally binding, but because they are public, Anthropic is betting that the embarrassment of a "failing grade" will force them to work faster on safety.
See also: Anthropic’s Head of Safety QUITS With Chilling Warning

Conflict with Pentagon (February 2026)

In early 2026, the Pentagon, led by Undersecretary of Defense Emil Michael, moved to modify its existing $200 million contract with Anthropic. The military sought "unrestricted use" for all lawful purposes.

CEO Dario Amodei argued that frontier AI is "simply not reliable enough" for lethal autonomy and that using it for mass surveillance of Americans would violate the company's core mission.

On February 27, 2026, after Anthropic rejected the "best and final offer," Defense Secretary Pete Hegseth took the extreme step of designating Anthropic a "National Security Supply Chain Risk". This label, typically reserved for foreign adversaries like Huawei, effectively banned any company doing business with the Department of Defense from also doing business with Anthropic. Because Anthropic’s models run on AWS and Google Cloud—both major defense contractors—this designation threatened to shut down Anthropic's entire commercial operation.

The Legal Twist: In March 2026, a federal court granted a preliminary injunction against the government, ruling that the designation was "classic illegal First Amendment retaliation" meant to punish Anthropic for its public stance.

Just hours after the Pentagon moved against Anthropic, rival OpenAI announced a new $200 million deal to deploy its models on the military’s classified networks. While OpenAI claimed to have negotiated its own safeguards, critics pointed out the suspicious timing and questioned whether OpenAI had essentially "filled the gap" left by Anthropic’s exit.

The deal led to high-profile resignations at OpenAI, including hardware leader Caitlin Kalinowski, who cited ethical concerns.

Current Conclusions and Open Questions

Note

This may change over time as new data arrives or new thoughts form. Also, search for "counter questions" around this Doc (Section, All Docs) for more thought, contemplation, and insight.

Current Conclusions:

  1. In its philosophy and vision, Anthropic has no thought or idea on immediate human survival. All the beautiful future lies more or less far ahead, and it is unclear how to get there from the immediate present (paying the bills, buying groceries today).
  2. While in some cases fighting to stay human-centric, Anthropic generally sacrifices this principle to the AI arms race. Since its own philosophy states that this poses an existential risk, that is a huge failure.
  3. The failure mentioned above is not solely Anthropic's; it is a collective failure.

Open Questions:

  1. Do Anthropic's specialized teams and structures investigating the impact of AI truly investigate, or do they merely paint a picture that promotes Anthropic's products?
  2. What are the current findings of The Anthropic Institute and the Societal Impacts Team?
  3. How can we weigh the measures Anthropic lists as necessary to "mitigate huge AI risks" against what is actually happening? Have these measures actually been taken recently, or at all?
  4. How do Anthropic's latest releases (the products, their nature, the pace of releases) correspond to its "human-centric" approach?

  1. Jack Clark is a co-founder of Anthropic. Formerly the Policy Director at OpenAI and a technology journalist at Bloomberg, he is a central figure in global AI governance. He co-chairs the Stanford AI Index and advises the OECD and the U.S. National AI Advisory Committee.

  2. National AI Safety Institutes (AISIs) are government-funded bodies established to provide independent safety evaluations of AI models. The 2026 network: UK: The AI Security Institute (formerly AISI), part of the Dept. for Science, Innovation and Technology; USA: The Center for AI Standards and Innovation (CAISI) (formerly US AISI), part of NIST; others: Japan, Singapore, South Korea, Australia, Canada, and the EU all have active institutes.
  2. National AI Safety Institutes (AISIs) - government-funded bodies established to provide independent safety evaluations of AI models. 2026 Network: UK: The AI Security Institute (formerly AISI), part of the Dept. for Science, Innovation and Technology; USA: The Center for AI Standards and Innovation (CAISI) (formerly US AISI), part of NIST; Others: Japan, Singapore, South Korea, Australia, Canada, and the EU all have active institutes.