State of Agentic Economy · Issue 004

The Hidden Cost of Unrated AI Agents

A measured note on rating scarcity, security evidence, and procurement discipline at the 100-agent mark. English and French are published together on this page.

Indexed agents / Agents indexés: 100

Below B / Sous B: 44

Security-capped / Cap sécurité: 14

Gated to B+ / Bloqués à B+: 6

English

Why unrated deployment is becoming an enterprise cost center

At the 100-agent mark, trust scarcity is still visible. In the snapshot used for this issue, 44 of 100 indexed agents sit below B, while only 20 reach A or AA. For a buyer, that is not a branding signal. It means nearly half of observable supply still lacks enough published evidence to clear a basic diligence threshold. A sub-B rating does not predict failure on its own. It does indicate that controls, operating limits, or accountability markers remain thin enough that deployment risk is being carried by the buyer.

That creates a cost asymmetry. The direct cost of a rating is bounded. The cost of an unvetted deployment is open-ended: hallucinated outputs entering a workflow, permissions broader than expected, weak escalation paths, or a post-incident review with little usable evidence. In the same 100-agent cohort, 14 records are still constrained by the Security cap. Buyers do not need every agent to score in the A band. They do need to know, before the agent is inserted into a regulated or customer-facing process, when the evidence base is incomplete.

The Security gap is specific rather than abstract. Using the latest available security sub-scores recorded for this 100-agent corpus, S1 adversarial resistance averages 48.2 out of 100 and S4 auditability averages 51.8. Fifty-three agents score below 50 on S1, and 59 score below 60 on S4. Among the 44 agents below B, those averages fall to 23.5 on S1 and 27.4 on S4. In practice, that means many lower-tier agents still expose unclear failure boundaries and weak post-incident traceability.
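Readers who want to run the same kind of summary on their own shortlist could start from a minimal sketch like the one below. The record layout (`s1`, `s4`, `below_b`) and the four sample rows are assumptions made for illustration. They are not the index's schema or data, and the thresholds simply mirror the ones quoted above (below 50 on S1, below 60 on S4).

```python
from statistics import mean

# Hypothetical records: field names and values are illustrative only.
agents = [
    {"name": "agent-a", "s1": 72.0, "s4": 80.0, "below_b": False},
    {"name": "agent-b", "s1": 30.0, "s4": 25.0, "below_b": True},
    {"name": "agent-c", "s1": 45.0, "s4": 55.0, "below_b": False},
    {"name": "agent-d", "s1": 20.0, "s4": 30.0, "below_b": True},
]


def summarize(records, key, threshold):
    """Return (mean score, count strictly below threshold) for one sub-score."""
    scores = [r[key] for r in records]
    return mean(scores), sum(1 for s in scores if s < threshold)


# Whole-cohort view, using the thresholds quoted in the text.
s1_mean, s1_low = summarize(agents, "s1", 50)
s4_mean, s4_low = summarize(agents, "s4", 60)

# Same summary restricted to the sub-B slice.
sub_b = [r for r in agents if r["below_b"]]
sub_b_s1_mean, _ = summarize(sub_b, "s1", 50)
sub_b_s4_mean, _ = summarize(sub_b, "s4", 60)
```

Keeping the cohort view and the sub-B slice side by side shows whether weak averages are spread across the shortlist or concentrated in the lower tier.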

This is where Self-Declared and Enterprise Audit should be read differently. In this snapshot, 22 agents sit in Financial or Orchestrator profiles, and 6 of them would otherwise rate A or AA but are held at B+ under the self-declared gate. Self-Declared records are useful for screening: they show declared architecture, controls, and public evidence. They are not substitutes for independent verification. Enterprise Audit matters when the procurement question is no longer 'is there a signal?' but 'can we defend this decision internally?'
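The gate described above can be sketched as a simple cap. This is one reading of the text, not the published methodology: the grade ordering, the `published_rating` function, and the idea that only the Self-Declared track triggers the cap are all assumptions for illustration.

```python
# Assumed ordering, lowest to highest. "below B" stands in for every grade
# under B, since the text does not name them.
SCALE = ["below B", "B", "B+", "A", "AA"]

# Profiles the text says are subject to the self-declared gate.
GATED_PROFILES = {"Financial", "Orchestrator"}


def published_rating(raw, profile, track):
    """Cap the rating at B+ for a gated profile with only a Self-Declared record.

    `raw` is the rating the agent would otherwise receive. `track` is either
    "Self-Declared" or "Enterprise Audit" (hypothetical labels for this sketch).
    """
    if profile in GATED_PROFILES and track == "Self-Declared":
        cap = SCALE.index("B+")
        return SCALE[min(SCALE.index(raw), cap)]
    return raw


# A Financial agent that would otherwise rate AA is held at B+.
gated = published_rating("AA", "Financial", "Self-Declared")
```

Under these assumptions the cap never raises a rating: an agent already at B or lower is unaffected, which matches the idea that the gate only limits how far unverified declarations can carry a high-stakes profile.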

Buyer framework

What evidence was independently checked, and by whom?
Which logs, traces, and approvals exist for replay after an incident?
What adversarial or failure-mode testing was actually performed?
What operational fallback applies when the agent is wrong or unavailable?
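A procurement team could track these four questions as a structured record, so that unanswered items are visible before sign-off. The sketch below is only one way to do that. The `DiligenceRecord` class, its field names, and the sample answers are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class DiligenceRecord:
    """One field per buyer-framework question; an empty string means unanswered."""
    independent_verification: str = ""  # what was checked, and by whom
    replay_evidence: str = ""           # logs, traces, approvals for replay
    adversarial_testing: str = ""       # adversarial or failure-mode tests performed
    operational_fallback: str = ""      # what applies when the agent is wrong or down


def open_questions(record):
    """Return the names of questions that still have no answer on file."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Hypothetical example: two of the four questions have answers on file.
record = DiligenceRecord(
    independent_verification="Third-party review of declared controls",
    operational_fallback="Manual review queue",
)
remaining = open_questions(record)
```

Here `remaining` lists `replay_evidence` and `adversarial_testing`, the two items a reviewer would still need to close.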

Methodology: /methodology/en.

Français

Pourquoi le non-évalué devient un coût caché pour l'entreprise

Au cap des 100 agents, la rareté de la confiance reste visible. Dans le corpus utilisé pour ce numéro, 44 agents indexés sur 100 sont encore sous la note B, tandis que 20 seulement atteignent A ou AA. Pour un acheteur, ce n'est pas un détail marketing. Cela signifie qu'une part importante de l'offre visible ne publie pas encore assez d'éléments pour franchir un seuil élémentaire de diligence. Une note inférieure à B ne prédit pas un incident à elle seule. Elle indique en revanche que le risque de déploiement repose encore largement sur l'acheteur.

C'est là que l'asymétrie de coût apparaît. Le coût d'une notation est borné. Le coût d'un déploiement non évalué ne l'est pas : sorties hallucinées injectées dans un workflow, permissions plus larges que prévu, escalade mal définie, ou revue post-incident sans traces exploitables. Dans ce même corpus de 100 agents, 14 dossiers restent limités par le mécanisme de Security cap. Une entreprise n'a pas besoin que chaque agent atteigne la bande A. Elle a besoin de savoir, avant intégration, quand la base de preuve reste insuffisante.

L'écart de sécurité n'est pas théorique. Sur les derniers sous-scores de sécurité disponibles pour ce corpus, S1 résistance adversariale affiche une moyenne de 48,2 sur 100 et S4 auditabilité une moyenne de 51,8. Cinquante-trois agents sont sous 50 sur S1, et 59 sont sous 60 sur S4. Parmi les 44 agents sous B, ces moyennes tombent à 23,5 sur S1 et 27,4 sur S4. Autrement dit, beaucoup d'agents de bas de tableau restent difficiles à borner face aux attaques et difficiles à rejouer après incident.

C'est ici qu'il faut distinguer Self-Declared et Enterprise Audit. Dans ce snapshot, 22 agents appartiennent aux profils Financial ou Orchestrator, et 6 d'entre eux resteraient à B+ en mode déclaratif alors qu'ils obtiendraient sinon A ou AA. Self-Declared sert au tri initial : architecture déclarée, contrôles annoncés, signaux publics visibles. Enterprise Audit sert quand la question d'achat devient : pouvons-nous justifier cette décision devant le risque, l'audit interne, ou le client ?

Cadre d'achat

Quelle preuve a été vérifiée de façon indépendante, et par qui ?
Quelles traces, quels logs et quelles validations permettent un rejeu après incident ?
Quels tests adversariaux ou de modes de défaillance ont été menés en pratique ?
Quel dispositif de repli s'applique quand l'agent se trompe ou devient indisponible ?

Méthodologie : /methodology.