How Sentinel8004 Scores Trust
Sentinel8004 reads the ERC-8004 Identity Registry on Celo and evaluates every registered agent across five independent layers. Each layer produces a sub-score and risk flags. Circuit breakers cap the final score when critical red flags trigger.
Each architectural decision is made with infrastructure adoption in mind.
Scoring is deterministic: no LLM, no randomness. Other infrastructure can depend on Sentinel scores because the same input always produces the same output. This is a prerequisite for on-chain verifiability.
Circuit breakers ensure that a single strong negative signal (such as mass Sybil spam) cannot be overcome by good metadata. Without them, any scoring system is trivially gameable. Downstream consumers can trust that a score above 70 has no critical flags.
Every attestation links to a full JSON report on IPFS via feedbackURI. This makes scores auditable by anyone, not just Sentinel. Downstream systems can verify exactly why an agent received its score.
Scores are written to the existing ReputationRegistry using its standard giveFeedback() function. No custom contracts needed. Any contract or agent that reads the registry automatically has access to Sentinel scores.
| Layer | Max Raw | Weight | Weighted Max |
|---|---|---|---|
| L1 Registration | 25 | 0.8x | 20 |
| L2 Liveness | 25 | 0.8x | 20 |
| L3 On-Chain | 25 | 0.8x | 20 |
| L4 Sybil | 25 | 1.0x | 25 |
| L5 Reputation | 15 | 1.0x | 15 |
| Total | | | 100 |
L4 and L5 carry full weight (1.0x) because Sybil detection and reputation are the strongest trust signals. L1-L3 are weighted at 0.8x because metadata quality and liveness can be gamed more easily.
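The table above implies a weighted sum over the five layers. The following is a minimal sketch of that arithmetic, not the pipeline's actual code: the function name, the clamping of each sub-score to its layer maximum, the rounding, and the treatment of a missing layer as zero are all assumptions made for illustration.

```python
# Layer maxima and weights copied from the table above: (max raw, weight).
LAYERS = {
    "L1": (25, 0.8),  # Registration
    "L2": (25, 0.8),  # Liveness
    "L3": (25, 0.8),  # On-Chain
    "L4": (25, 1.0),  # Sybil
    "L5": (15, 1.0),  # Reputation
}


def composite_score(raw: dict[str, float]) -> float:
    """Weighted sum of raw sub-scores, each clamped to [0, layer max].

    Assumption: a layer missing from `raw` contributes 0.
    """
    total = 0.0
    for layer, (max_raw, weight) in LAYERS.items():
        value = min(max(raw.get(layer, 0.0), 0.0), max_raw)
        total += value * weight
    return round(total, 2)


# Hypothetical agent: full marks on L1 and L4, partial L3, nothing else.
example = composite_score({"L1": 25, "L2": 0, "L3": 10, "L4": 25, "L5": 0})
```

With these inputs the weighted contributions are 20 + 0 + 8 + 25 + 0, so `example` is 53.0, and an agent at the maximum on every layer reaches exactly 100.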
When Scores Get Capped
Circuit breakers override the composite score when critical red flags are detected. They set a hard cap regardless of how well the agent scores on other layers.
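A hard cap of this kind can be sketched as taking the minimum of the composite score and the cap of every triggered flag. The flag name and the cap value below are hypothetical placeholders; the actual flags and caps used by Sentinel8004 are not listed in this section.

```python
# Hypothetical flag name and cap value, for illustration only.
EXAMPLE_CAPS = {"MASS_REGISTRATION": 25.0}


def apply_circuit_breakers(
    score: float,
    flags: set[str],
    caps: dict[str, float] = EXAMPLE_CAPS,
) -> float:
    """Return the score limited by the lowest cap among triggered flags.

    Flags without an entry in `caps` are treated as non-critical and ignored.
    """
    triggered_caps = [caps[flag] for flag in flags if flag in caps]
    return min([score, *triggered_caps])


capped = apply_circuit_breakers(92.0, {"MASS_REGISTRATION"})
untouched = apply_circuit_breakers(92.0, set())
```

Because the cap is applied with `min`, a breaker can only lower a score; an agent that already scores below the cap is left unchanged.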
After scoring, the composite score and content hash are written to the ReputationRegistry contract on Celo mainnet using the giveFeedback() function. The pipeline is designed to pin full reports to IPFS before writing, linking each on-chain attestation to a verifiable report via feedbackURI.
Each write costs approximately 0.009 CELO (~217K gas). The writer processes agents sequentially to avoid nonce collisions, and skips agents that have already been scored.
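The sequential, skip-if-scored behaviour described above can be sketched as a simple loop. This is an illustration under stated assumptions: `send` is a hypothetical callback that submits one transaction and returns once it is confirmed, and the nonce is tracked locally rather than read from a node.

```python
from typing import Callable, Iterable


def write_sequentially(
    agent_ids: Iterable[int],
    already_scored: set[int],
    send: Callable[[int, int], None],
    start_nonce: int,
) -> int:
    """Submit one write per unscored agent, in order.

    Returns the next unused nonce. `already_scored` is updated in place so
    that a repeated agent ID is never written twice.
    """
    nonce = start_nonce
    for agent_id in agent_ids:
        if agent_id in already_scored:
            continue  # skip agents that already have an attestation
        send(agent_id, nonce)  # hypothetical: blocks until confirmed
        already_scored.add(agent_id)
        nonce += 1  # one transaction in flight at a time, so no collisions
    return nonce


sent: list[tuple[int, int]] = []
next_nonce = write_sequentially(
    [1, 2, 3, 2], {2}, lambda a, n: sent.append((a, n)), start_nonce=7
)
```

Here agent 2 is skipped both times, agents 1 and 3 are written with nonces 7 and 8, and the function returns 9.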
Note: The initial batch of 1,852 attestations was written without IPFS URIs due to a provider limitation at the time of writing. Subsequent attestations include pinned IPFS reports. All content hashes are on-chain; full reports are available in the open-source repository.
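Since the content hashes are on-chain, a report taken from the repository or from IPFS can be checked against its attestation by recomputing the hash. The sketch below assumes SHA-256 over a canonical JSON serialization (sorted keys, no extra whitespace); the hash function and serialization actually used by the pipeline are not stated here and may differ, so treat both as placeholders.

```python
import hashlib
import json


def content_hash(report: dict) -> str:
    """Hex digest of a canonical JSON encoding of the report.

    Assumption: SHA-256 and sorted-key JSON stand in for whatever the
    real pipeline uses.
    """
    canonical = json.dumps(
        report, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_onchain(report: dict, onchain_hash_hex: str) -> bool:
    """Compare a recomputed hash with the hex value read from the registry."""
    expected = onchain_hash_hex.lower().removeprefix("0x")
    return content_hash(report) == expected
```

Canonical serialization matters for determinism: two reports with the same fields in a different key order hash to the same value, while any change to a field changes the hash.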
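We tested 5 weight configurations against all 2,902 agents. Spearman's rank correlation (ρ) measures how closely each configuration's agent ranking matches the ranking under the current weights; ρ = 1 means the rankings are identical.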
| Config | L1-L3 | L4-L5 | Trusted | Fair | Flagged | Spearman ρ |
|---|---|---|---|---|---|---|
| Current | 0.8 | 1.0 | 7 | 24 | 2,871 | 1.0000 |
| Equal | 1.0 | 1.0 | 17 | 14 | 2,871 | 0.9782 |
| Sybil-heavy | 0.6 | 1.0 | 2 | 29 | 2,871 | 0.9698 |
| Metadata-heavy | 1.0/0.8 | 0.8 | 7 | 24 | 2,871 | 0.9778 |
| Liveness-heavy | 0.8/1.0 | 0.8 | 5 | 26 | 2,871 | 0.9995 |
All rank correlations exceed 0.96. Circuit breakers dominate rankings; the specific weight values have minimal impact on which agents are trusted vs. flagged. The flagged count (2,871) is identical across all configs because circuit breakers, not weights, determine which agents fall below 30.
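For readers who want to reproduce this kind of comparison, Spearman's ρ is the Pearson correlation of the two rank vectors, with tied scores sharing the average of their ranks. The sketch below is a plain-Python version under that standard definition; it is not taken from the Sentinel8004 repository, and the sample scores are made up.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Pearson correlation of the rank vectors of x and y.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for four agents under two weight configurations.
rho = spearman_rho([53.0, 20.0, 71.0, 25.0], [60.0, 22.0, 80.0, 25.0])
```

Both made-up configurations order the four agents the same way, so `rho` is 1 even though the scores themselves differ, which is the sense in which weight changes can leave rankings intact.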
Sybil thresholds are validated against the actual owner distribution in the registry.
| Bracket | Owners | Agents | Avg L1 (metadata) |
|---|---|---|---|
| 1-3 (normal) | 83 | 102 | 14.8/25 |
| 4-10 (moderate) | 8 | 55 | 2.5/25 |
| 11-50 (high) | 3 | 51 | 10.2/25 |
| 51+ (mass) | 3 | 2,694 | 24.6/25 |
68% of owners have exactly 1 agent. 3 owners account for 92.8% of all agents (2,694 agents). Natural breaks in the distribution align with our thresholds: gaps appear at 7→10, 14→25, and 25→73 agent counts.
The 51+ mass registration group has the highest average metadata quality (24.6/25) despite being spam. This validates the circuit breaker design: good metadata cannot compensate for mass registration behavior.
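The bracket table can be reproduced from an agent-to-owner mapping by counting agents per owner and bucketing the counts. The bracket boundaries below come from the table; the function names and the shape of the input mapping are assumptions made for this sketch.

```python
from collections import Counter

# (low, high, label); high=None means no upper bound.
BRACKETS = [
    (1, 3, "normal"),
    (4, 10, "moderate"),
    (11, 50, "high"),
    (51, None, "mass"),
]


def bracket_for(agent_count: int) -> str:
    """Return the bracket label for an owner with `agent_count` agents."""
    for low, high, label in BRACKETS:
        if agent_count >= low and (high is None or agent_count <= high):
            return label
    raise ValueError("agent_count must be at least 1")


def summarize(owner_of: dict[int, str]) -> dict[str, tuple[int, int]]:
    """Map each bracket label to (number of owners, number of agents)."""
    per_owner = Counter(owner_of.values())
    summary = {label: [0, 0] for _, _, label in BRACKETS}
    for count in per_owner.values():
        entry = summary[bracket_for(count)]
        entry[0] += 1
        entry[1] += count
    return {label: (owners, agents) for label, (owners, agents) in summary.items()}


# Toy registry: owner "a" has 2 agents, "b" has 1, "c" has 60.
toy = {1: "a", 2: "a", 3: "b", **{100 + i: "c" for i in range(60)}}
toy_summary = summarize(toy)
```

In the toy registry, two owners land in the normal bracket with three agents between them, and one owner lands in the mass bracket with 60 agents.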
We document these openly because trust scoring demands honesty about what it can and cannot prove.
Scores reflect a point-in-time scan. No longitudinal tracking or trend analysis yet.
L4 detects mass registration from the same address. Multi-wallet Sybils using different addresses are not detected by the primary scorer. A supplementary timing-cluster analysis script checks for agents registered within 60 seconds across different owners with similar metadata (Jaccard > 0.6), but this is not yet part of the automated pipeline.
L2 probes check if endpoints respond (HTTP 2xx), not whether they return meaningful results.
L5 depends on existing on-chain feedback. With few participants, this layer has limited signal for most agents.
Sentinel8004 (agent #1853) cannot write its own score on-chain. The ReputationRegistry blocks self-feedback by design.
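The supplementary timing-cluster check mentioned in the limitations above (different owners, registrations within 60 seconds, metadata Jaccard similarity above 0.6) can be sketched as a pairwise comparison. The record fields `id`, `owner`, `registered_at`, and `tokens`, and the choice to compare metadata as token sets, are assumptions for illustration; the actual script may represent metadata differently.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suspicious_pairs(
    agents: list[dict], window_s: int = 60, threshold: float = 0.6
) -> list[tuple[int, int]]:
    """Return ID pairs with different owners, close timestamps, similar metadata.

    Each agent dict uses hypothetical keys: id, owner,
    registered_at (unix seconds), tokens (set of metadata tokens).
    """
    pairs = []
    for x, y in combinations(agents, 2):
        if x["owner"] == y["owner"]:
            continue  # same-owner clusters are already covered by L4
        if abs(x["registered_at"] - y["registered_at"]) > window_s:
            continue
        if jaccard(x["tokens"], y["tokens"]) > threshold:
            pairs.append((x["id"], y["id"]))
    return pairs


sample = [
    {"id": 1, "owner": "0xA", "registered_at": 1000, "tokens": {"a", "b", "c", "d"}},
    {"id": 2, "owner": "0xB", "registered_at": 1030, "tokens": {"a", "b", "c", "d", "e"}},
    {"id": 3, "owner": "0xC", "registered_at": 5000, "tokens": {"a", "b", "c", "d"}},
]
flagged = suspicious_pairs(sample)
```

Agents 1 and 2 are 30 seconds apart with a Jaccard similarity of 0.8, so they are paired; agent 3 has matching metadata but registered far outside the window. The pairwise loop is quadratic in the number of agents, which is acceptable for a registry of a few thousand entries but would need sorting by timestamp to scale further.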
Two real agents scored by the pipeline can be verified on CeloScan.
Full scanner, scorer, and writer source code available on GitHub.