How AI Agent Reputation Works: Behavioral Signals and the AXIS T-Score
AI agents have no portable reputation. The AXIS T-Score changes that — a 0–1000 behavioral reputation score computed from 11 dimensions including reliability, security posture, and adversarial resistance. Here is how it works.
By Leonidas Esquire Williamson — March 17, 2026
What Is AI Agent Reputation — and Why Does It Matter?
When a developer grants an AI agent access to a production system, they are making a trust decision. That decision is currently made with almost no data. The agent has a name, a model version, and perhaps a description written by whoever deployed it. There is no behavioral history, no independent verification, no track record.
This is the reputation vacuum at the center of the agentic economy.
Reputation, in the human world, is the accumulated record of how an entity has behaved over time. A contractor with a ten-year history of on-time, on-budget delivery has a reputation. A supplier with zero chargebacks and a 99.8% fulfillment rate has a reputation. These reputations are portable — they follow the entity across relationships and contexts, and they inform the trust decisions of anyone who interacts with them.
AI agents currently have no equivalent. Every deployment starts from zero. Every platform that integrates an agent must build its own trust model from scratch, or — more commonly — simply skip the trust question entirely and hope for the best.
AXIS is the infrastructure being built to close this gap. Its Trust Score (T-Score) is the first standardized, portable, independently computed reputation score for AI agents. This post explains how it works, what behavioral signals it captures, and why it is the right foundation for agent reputation in the agentic economy.
---
The Behavioral Signals That Define Agent Reputation
Human reputation systems — credit scores, professional references, review platforms — are built on behavioral signals. The signals vary by context, but the underlying logic is the same: past behavior is the best available predictor of future behavior.
Agent reputation works on the same principle. The AXIS T-Score is computed from 11 behavioral dimensions, each capturing a distinct aspect of how an agent has performed over its operational history.
Reliability: The Foundation of Trust
Reliability is the single most important behavioral signal in the T-Score, weighted at 20%. It measures the consistency of task completion: does the agent complete the tasks it accepts, without failures, timeouts, or silent errors?
An agent that accepts a task and fails to complete it, even once, imposes real costs on the systems that depend on it. Reliability is not just about success rate; it is about predictability. An agent that completes 95% of tasks and fails cleanly and visibly on the rest can be more trustworthy than one that completes 99% of tasks but fails catastrophically on the remaining 1%.
Accuracy: Correctness Under Real Conditions
Accuracy (15% weight) measures whether the agent's outputs are correct relative to its stated objectives. This is distinct from reliability: an agent can reliably produce outputs that are consistently wrong. Accuracy captures the quality dimension that reliability misses.
In multi-agent systems, accuracy failures cascade. An orchestrator agent that passes incorrect data to a downstream executor creates errors that may not be detected until significant downstream work has already been done. High accuracy is therefore a prerequisite for safe participation in any multi-agent pipeline.
Security Posture: Resistance to Manipulation
Security posture (15% weight) measures how well an agent adheres to security policies and resists manipulation attempts. This includes resistance to prompt injection attacks, refusal to execute out-of-scope instructions, and compliance with declared permission boundaries.
Security posture is particularly important in agentic contexts because agents are, by design, instruction-following systems. An agent with poor security posture can be manipulated by adversarial inputs embedded in the data it processes — a risk that grows significantly as agents are given access to more powerful tools and higher-value resources.
Compliance Adherence: Governance in Practice
Compliance (10% weight) measures conformance with the regulatory and governance requirements the agent has declared. An agent that claims SOC 2 compliance but regularly accesses data outside its declared scope has a compliance problem. AXIS tracks compliance events — both positive (declared requirements met) and negative (violations recorded) — and incorporates them into the T-Score.
Goal Alignment: Doing What Was Asked
Goal alignment (10% weight) measures the degree to which an agent's actions match the intent of its principals. This is the behavioral signal most closely related to the alignment problem in AI safety: does the agent pursue the goals it was given, or does it pursue proxy goals that diverge from principal intent?
In practice, goal alignment is measured through behavioral events that record whether an agent's actions were consistent with its stated objectives and whether any out-of-scope actions were taken.
Adversarial Resistance: Robustness Under Attack
Adversarial resistance (10% weight) measures how well an agent maintains its behavioral integrity when subjected to adversarial inputs. This includes prompt injection attempts, jailbreak attempts, and social engineering attacks embedded in the data the agent processes.
As agents are deployed in higher-stakes contexts, adversarial resistance becomes a critical safety property. An agent that can be manipulated into taking unauthorized actions is not just unreliable — it is a security liability for every system it touches.
User and Peer Feedback: The Social Dimension of Reputation
User feedback (8% weight) and peer feedback (7% weight) capture the social dimension of agent reputation. User feedback aggregates positive and negative assessments from human principals who have interacted with the agent. Peer feedback captures assessments from other agents in multi-agent systems — a signal that becomes increasingly important as agent-to-agent interactions grow in volume and complexity.
These feedback signals introduce a qualitative dimension that purely quantitative metrics miss. An agent can have high task completion rates and still be rated poorly by users if its communication style is unhelpful, its explanations are unclear, or its behavior is subtly misaligned with user intent.
Incident Record: The Cost of Failure
The incident record (5% weight) captures the history of reported incidents — failures, policy violations, security events — and the quality of their resolution. Incidents are inevitable in any complex system. What matters for reputation is not whether incidents occur, but how they are handled: quickly, transparently, and with appropriate remediation.
An agent with a clean incident record is not necessarily more trustworthy than one with incidents — it may simply have been operating in lower-risk contexts. An agent with incidents that were resolved quickly and transparently may actually demonstrate higher operational maturity than one with no incidents at all.
---
How the T-Score Is Computed
The T-Score is a weighted composite of behavioral dimension scores. The nine dimensions described above carry the weights in the table below, which sum to 100%. Each dimension is scored from 0 to 100, and the T-Score is the weighted average of the dimension scores, scaled to a 0–1000 range.
| Dimension | Weight |
|---|---|
| Reliability | 20% |
| Accuracy | 15% |
| Security Posture | 15% |
| Compliance Adherence | 10% |
| Goal Alignment | 10% |
| Adversarial Resistance | 10% |
| User Feedback | 8% |
| Peer Feedback | 7% |
| Incident Record | 5% |
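As a rough illustration, the weighted average can be sketched in a few lines of Python. The weights come from the table above. The function name, the dictionary keys, and the choice to scale linearly (multiply the 0–100 average by 10) are assumptions made for this sketch, not the AXIS implementation.

```python
# Weights from the table above, expressed as fractions of 1.0.
WEIGHTS = {
    "reliability": 0.20,
    "accuracy": 0.15,
    "security_posture": 0.15,
    "compliance_adherence": 0.10,
    "goal_alignment": 0.10,
    "adversarial_resistance": 0.10,
    "user_feedback": 0.08,
    "peer_feedback": 0.07,
    "incident_record": 0.05,
}


def t_score(dimension_scores: dict[str, float]) -> int:
    """Combine 0-100 dimension scores into a 0-1000 composite.

    Hypothetical helper: assumes a plain weighted average scaled by 10.
    """
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= dimension_scores[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")

    # Dividing by the total weight keeps the result a true weighted average
    # even if the weights are later adjusted and no longer sum to exactly 1.
    total_weight = sum(WEIGHTS.values())
    weighted_avg = sum(WEIGHTS[n] * dimension_scores[n] for n in WEIGHTS) / total_weight
    return round(weighted_avg * 10)
```

Under these assumptions, an agent scoring 35 on every dimension lands at a composite of 350, and a strong reliability score moves the composite more than an equally strong incident record does, because reliability carries four times the weight.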
The score is updated in real time as behavioral events are recorded via the AXIS API. Each event carries a scoreImpact value that is applied to the relevant dimension score using a weighted rolling average — meaning recent events carry more weight than older ones, and an agent can recover from a poor history through sustained positive behavior.
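The exact rolling-average formula is not spelled out here. One common way to make recent events count for more is an exponential moving average, sketched below. The function name, the `alpha` recency parameter, and the idea that each event is first translated into an observed 0–100 score are all assumptions for illustration; how `scoreImpact` actually maps onto a dimension score may differ.

```python
def update_dimension(current: float, observed: float, alpha: float = 0.1) -> float:
    """Blend one new observation into a running 0-100 dimension score.

    Hypothetical sketch: an exponential moving average. A larger alpha
    gives recent events more influence over the score.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    updated = (1 - alpha) * current + alpha * observed
    # Clamp so the score always stays on the 0-100 scale.
    return min(100.0, max(0.0, updated))


# An agent with a poor history (20) that then records 30 good events (90 each).
score = 20.0
for _ in range(30):
    score = update_dimension(score, 90.0)
# After the loop, the score has recovered most of the way toward 90.
```

With this kind of update, each old event's influence shrinks by a constant factor every time a new event arrives, which is what allows sustained good behavior to outweigh an early bad stretch.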
---
The Five Trust Tiers
The T-Score maps to five trust tiers that provide a human-readable summary of an agent's reputation:
| Tier | Label | T-Score Range | Operational Guidance |
|---|---|---|---|
| T5 | Sovereign | 900–1000 | Full autonomy. Safe to delegate critical tasks without human oversight. |
| T4 | Established | 700–899 | High confidence. Suitable for enterprise integrations and pipeline leadership. |
| T3 | Verified | 500–699 | Moderate trust. Verify before sensitive tasks; appropriate for standard operations. |
| T2 | Provisional | 200–499 | Limited trust. Supervised operation only; not suitable for autonomous action. |
| T1 | Unverified | 0–199 | No trust baseline established. Do not delegate. |
All newly registered agents start at T2 Provisional (T-Score 350). Trust is earned through demonstrated performance — it cannot be purchased, assigned, or inherited from the underlying model.
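For a platform that wants to gate actions by tier, the mapping in the table can be expressed as a simple lookup. The boundaries and labels below are taken from the table; the function and constant names are hypothetical.

```python
# (minimum score, tier, label), ordered from highest tier to lowest.
TIERS = [
    (900, "T5", "Sovereign"),
    (700, "T4", "Established"),
    (500, "T3", "Verified"),
    (200, "T2", "Provisional"),
    (0, "T1", "Unverified"),
]


def trust_tier(score: int) -> tuple[str, str]:
    """Return the (tier, label) pair for a 0-1000 T-Score."""
    if not 0 <= score <= 1000:
        raise ValueError("T-Score must be between 0 and 1000")
    for floor, tier, label in TIERS:
        if score >= floor:
            return tier, label
    raise AssertionError("unreachable: the lowest tier starts at 0")
```

A newly registered agent at 350 maps to T2 Provisional, so a caller following the guidance in the table would keep it under supervision until its score crosses 500.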
---
Score Decay: Why Dormancy Reduces Trust
One of the most important design decisions in the AXIS T-Score is score decay: a mechanism that gradually reduces an agent's T-Score when it has been inactive for an extended period.
The logic is straightforward. An agent's behavioral history is evidence about its likely future behavior. But behavioral history becomes less predictive as it ages. An agent that was highly reliable two years ago may have been retrained, redeployed, or modified in ways that make its historical record a poor guide to its current behavior.
Score decay ensures that the T-Score reflects current operational status, not just historical performance. An agent that has been dormant for six months will have a lower T-Score than one with an equivalent history that has been continuously active — because the dormant agent's history is less informative about its current state.
This creates a strong incentive for agents to remain active and to continue recording behavioral events, which in turn produces a richer, more current reputation signal for the entire ecosystem.
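The decay schedule itself is not specified in this post. To make the idea concrete, here is a minimal sketch that assumes a grace period followed by exponential decay. The 30-day grace period, the 180-day half-life, and the function name are invented for illustration and should not be read as AXIS parameters.

```python
def decayed_score(
    score: float,
    days_inactive: float,
    grace_days: float = 30.0,       # assumed: no decay during this window
    half_life_days: float = 180.0,  # assumed: score halves per 180 idle days
) -> float:
    """Reduce a T-Score according to how long the agent has been dormant.

    Hypothetical model: the score is untouched during the grace period,
    then decays exponentially with the given half-life.
    """
    if days_inactive < 0:
        raise ValueError("days_inactive cannot be negative")
    idle = max(0.0, days_inactive - grace_days)
    return score * 0.5 ** (idle / half_life_days)
```

Under these assumed parameters, an agent at 800 that records an event every few days stays at 800, while an identical agent that goes silent for six months drifts well below it. Any monotonically decreasing schedule would produce the same qualitative effect.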
---
The Relationship Between T-Score and C-Score
The T-Score measures behavioral trust. The AXIS C-Score measures economic reliability — a distinct but related dimension of agent reputation.
The C-Score is a 0–1000 rating that maps to a letter grade from AAA (Prime) to D (Default). It determines the transaction limits an agent can operate within, the staking requirements for high-value actions, and the insurance coverage available for agent-initiated transactions.
While the T-Score and C-Score are computed independently, they are complementary. An agent with a high T-Score but a low C-Score may be behaviorally reliable but economically unproven. An agent with a high C-Score but a low T-Score may have a strong economic track record but unresolved behavioral concerns. The complete AXIS trust profile — both scores together — provides the most comprehensive picture of an agent's trustworthiness.
See the [AXIS C-Score reference page](/credit-score) for a full explanation of the economic reliability rating system.
---
Why Portable Reputation Is the Right Architecture
The most important architectural decision in the AXIS T-Score is that it is portable. The T-Score is anchored to the agent's AUID — its cryptographic identity — and travels with the agent across any platform that integrates AXIS.
This portability is what makes the T-Score genuinely useful as infrastructure. A platform-specific reputation score, one that only reflects behavior within a single system, has limited predictive value. An agent that has been reliable on Platform A may behave differently on Platform B, and a score built only from Platform A's data cannot reveal that difference.
A portable reputation score, by contrast, aggregates behavioral evidence from every context in which the agent has operated. It is richer, more predictive, and more resistant to gaming — because an agent cannot build a high score on one platform and then exploit a different platform that has not seen its history.
The portability of the T-Score also creates network effects. The more platforms that integrate AXIS, the more behavioral data flows into each agent's reputation, and the more informative the score becomes for every platform in the network.
---
Getting Started
Every AI agent registered in the AXIS system receives a T-Score from day one. Registration is free, takes under 60 seconds, and requires no technical integration.
To look up any agent's current T-Score and trust tier, use the live lookup widget on the [AXIS T-Score page](/trust-score). To register your own agent and start building its reputation, visit [axistrust.io](https://axistrust.io) and click Get Started.
The agentic economy needs trust infrastructure. The T-Score is where that infrastructure starts.