Solana transaction infrastructure · Mainnet

Argus.

The Core is the eyes. The Agent is the judgment. The boundary between them is the architecture.

A smart-transaction stack that watches Solana in real time, lands Jito bundles intelligently, tracks every submission across commitment levels, and delegates one decision — failure diagnosis — to an AI agent that reasons over the raw failure surface instead of running a script.

Total cost: 0.000135 SOL · Median P→C: 123 ms · Network: mainnet

01 · System architecture

System architecture

Two runtimes, one contract. A Rust Core holds every deterministic, network-facing concern. A TypeScript Agent holds a single judgment call. They speak only over HTTP/JSON — and that process boundary is the challenge's required clean separation between AI layer and core stack, made literal.

Rust earns its place on the Yellowstone gRPC firehose: tokio gives idiomatic bounded-channel backpressure and reconnection — the one genuinely hard feature the brief names. TypeScript hosts the agent for LLM ergonomics and fast prompt iteration. Neither leaks into the other. ADR 0001
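One way to picture bounded-channel backpressure is a producer that never blocks: when the channel is full, the frame is dropped and a counter records the loss. The sketch below is a minimal illustration that uses the standard library's `sync_channel` as a stand-in for a tokio bounded channel. The names `FeedStats`, `offer`, and `feed_without_consumer` are hypothetical and are not taken from the Argus codebase.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Counters for a producer feeding a bounded channel (hypothetical type).
#[derive(Debug, Default, PartialEq)]
pub struct FeedStats {
    pub delivered: u64,
    pub shed: u64,
}

/// Offer one frame without blocking. A full channel sheds the frame and
/// bumps the counter. Returns false once the consumer has gone away.
pub fn offer<T>(tx: &SyncSender<T>, frame: T, stats: &mut FeedStats) -> bool {
    match tx.try_send(frame) {
        Ok(()) => {
            stats.delivered += 1;
            true
        }
        Err(TrySendError::Full(_)) => {
            stats.shed += 1;
            true
        }
        Err(TrySendError::Disconnected(_)) => false,
    }
}

/// Demo helper (assumption): push `n` frames into a channel of `capacity`
/// while nothing drains it, then hand back the counters and the receiver.
pub fn feed_without_consumer(n: u64, capacity: usize) -> (FeedStats, Receiver<u64>) {
    let (tx, rx) = sync_channel(capacity);
    let mut stats = FeedStats::default();
    for slot in 0..n {
        offer(&tx, slot, &mut stats);
    }
    (stats, rx)
}
```

With a capacity of 4 and ten frames offered, the first four are buffered and the remaining six are counted as shed, so a slow consumer shows up as a number rather than as unbounded memory growth.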

flowchart LR
  subgraph EXT["External infrastructure"]
    direction TB
    YS["SolInfra Yellowstone gRPC<br/>slot + tx streams"]
    RPC["SolInfra RPC<br/>blockhash · simulate"]
    SE["Jito searcher gRPC<br/>next leader"]
    TF["Jito tip floor"]
    BE["Jito block engine<br/>sendBundle · 8 regions"]
    OR["OpenRouter<br/>model access"]
    SOL["Solana mainnet"]
  end
  subgraph CORE["Core — Rust · the eyes"]
    direction TB
    STR["streaming"]
    LDR["leader"]
    TIP["tip"]
    BDL["bundle"]
    LFC["lifecycle"]
    FAIL["failure / classify"]
    ACL["agent_client"]
    DB[("SQLite")]
  end
  subgraph AG["Agent — TS · the judgment"]
    direction TB
    HTTP["/decide · /health"]
    DEC["decide()"]
  end
  YS --> STR --> LFC --> DB
  RPC --> FAIL
  SE --> LDR
  TF --> TIP
  TIP --> BDL
  LDR -.timing.-> BDL
  BDL --> BE --> SOL
  FAIL --> ACL
  ACL -->|"raw failure surface"| HTTP --> DEC --> OR
  ACL --> DB

Figure 1. The Core touches every network surface; the Agent touches only the model, reached over one HTTP contract.

Deployment

Layout: core/ (Rust), agent/ (TypeScript), docs/ (ADRs + plan), logs/ (SQLite + JSONL + Markdown Lifecycle Log).
Mainnet for the real path with SolInfra credits and a dedicated low-balance keypair — Jito only lands bundles on mainnet, and judges verify slots on explorers. Devnet is a sandbox only. ADR 0002
One contract: a scored run hard-gates on the agent's /health and refuses to start if it is down.

02 · Key components

Key components

Core (Rust) — deterministic, network-facing

Module	Responsibility	Key surface
`streaming`	Yellowstone slot + tx subscriptions; resilient driver with reconnect + backpressure.	`track_lifecycle`, `resilient_subscribe`
`leader`	Next Jito leader window over gRPC — a soft timing signal, never a gate.	`next_scheduled_leader`
`tip`	Base tip from the live Jito tip floor percentile; clamped to sane bounds.	`fetch_tip_lamports`
`bundle`	All-or-nothing Jito bundle (payload + tip); 8-region concurrent submit.	`build_bundle`, `submit_all_regions`
`rpc`	Blockhash, `simulateTransaction`, balance, aged blockhash for injection.	`simulate_transaction`, `SimResult`
`failure`	Fault injection, baseline classification, remedy execution, the `Policy` seam.	`classify_failure`, `apply_remedy`
`agent_client`	The one HTTP boundary: send the raw surface, receive the Decision.	`AgentClient`, `Decision`
`storage`	SQLite source of truth; first-observation-wins stage stamps.	`Store`, `record_decision`
`export`	Render the Lifecycle Log (JSONL + Markdown) purely from SQLite.	`write_lifecycle_log`
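The `tip` row describes picking a percentile from the live tip floor and clamping it. A minimal sketch of that step follows, assuming the percentiles have already been fetched. `TipFloor`, `Percentile`, and `base_tip_lamports` are illustrative names, and the bounds are caller-supplied assumptions rather than Argus's actual limits.

```rust
/// Hypothetical snapshot of tip-floor percentiles, in lamports.
#[derive(Debug, Clone, Copy)]
pub struct TipFloor {
    pub p50: u64,
    pub p75: u64,
}

/// Which percentile to use as the base tip.
#[derive(Debug, Clone, Copy)]
pub enum Percentile {
    P50,
    P75,
}

/// Pick the requested percentile and clamp it into `[min, max]`.
/// Note: `u64::clamp` panics if `min > max`, so callers must pass sane bounds.
pub fn base_tip_lamports(floor: TipFloor, pick: Percentile, min: u64, max: u64) -> u64 {
    let raw = match pick {
        Percentile::P50 => floor.p50,
        Percentile::P75 => floor.p75,
    };
    raw.clamp(min, max)
}
```

Clamping on both sides keeps a quiet market from producing a dust tip and a congested one from draining the low-balance keypair.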

Agent (TypeScript) — the single judgment

index.ts — Express service: GET /health, POST /decide (zod-validated), port 8787.
decide.ts — the OpenRouter call: prompt, reasoning request, submit_decision tool parse.
types.ts — zod schemas mirroring the Rust types (snake_case).

Storage — the Lifecycle Log is the deliverable

submissions (one row per attempt)
run_id · attempt · nonce · signature · tip_lamports · landed_slot · processed_at · confirmed_at · finalized_at · failure_class

decisions (one row per agent decision)
remedy · baseline_remedy · diagnosis · triage · rationale · confidence · reasoning_trace · model

A Run is a prefix, not a column: the session is run-{ts} and payload k runs under child run_id = run-{ts}-p{k} — unique keys, zero schema change. ADR 0011
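The prefix scheme can be sketched in a few lines. The function names below are hypothetical; only the `run-{ts}` and `run-{ts}-p{k}` shapes come from the text above.

```rust
/// Session-level run id: `run-{ts}`.
pub fn session_run_id(ts_millis: u64) -> String {
    format!("run-{}", ts_millis)
}

/// Child run id for payload `k`: `run-{ts}-p{k}`.
pub fn payload_run_id(session: &str, k: usize) -> String {
    format!("{}-p{}", session, k)
}

/// True if `run_id` is the session itself or one of its payload children.
/// Checking for the `-p` separator avoids matching a longer timestamp that
/// merely shares the same leading digits.
pub fn belongs_to_session(run_id: &str, session: &str) -> bool {
    run_id == session
        || run_id
            .strip_prefix(session)
            .map_or(false, |rest| rest.starts_with("-p"))
}
```

Because membership is a string-prefix test, selecting every row of a session needs no extra column, which is the "zero schema change" property the paragraph describes.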

03 · Data flow

Data flow between services

Happy path — submit, track, persist

The subscription opens before the bundle is sent, so inclusion is never missed; tracking is reconciled afterward against getSignatureStatuses in case a Landed frame is dropped.

sequenceDiagram
  autonumber
  participant O as Orchestrator
  participant B as bundle
  participant J as Jito
  participant Y as Yellowstone
  participant DB as SQLite
  O->>B: build_bundle(payload, tip)
  O->>DB: record_submission
  O->>Y: subscribe slot + tx — before submit
  Y-->>O: on_subscribed
  O->>J: submit_all_regions
  Y-->>O: Landed (slot)
  O->>DB: set_landed_slot
  Y-->>O: Processed → Confirmed → Finalized
  O->>DB: mark_stage

Figure 2. Inclusion comes from the transaction stream; commitment progression from the slot stream. ADR 0004
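The `mark_stage` step relies on first-observation-wins stamps: once a stage has a timestamp, later observations of the same stage must not overwrite it. The following is an in-memory sketch of that rule only. The real store is SQLite, and `StageStamps`, `Stage`, and the boolean return value here are assumptions made for illustration.

```rust
/// In-memory stand-in for the three stage columns of one submission row.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct StageStamps {
    pub processed_at: Option<u64>,
    pub confirmed_at: Option<u64>,
    pub finalized_at: Option<u64>,
}

#[derive(Debug, Clone, Copy)]
pub enum Stage {
    Processed,
    Confirmed,
    Finalized,
}

impl StageStamps {
    /// Stamp `stage` with `at_ms` only if it has no stamp yet.
    /// Returns true when this call set the stamp, false when an earlier
    /// observation already won.
    pub fn mark_stage(&mut self, stage: Stage, at_ms: u64) -> bool {
        let stamp = match stage {
            Stage::Processed => &mut self.processed_at,
            Stage::Confirmed => &mut self.confirmed_at,
            Stage::Finalized => &mut self.finalized_at,
        };
        if stamp.is_some() {
            return false;
        }
        *stamp = Some(at_ms);
        true
    }
}
```

The same rule makes a reconnect that replays slot updates harmless, since a duplicate observation cannot shift a delta that was already measured.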

The lifecycle, measured

One submission's progression on mainnet, with the real deltas from the graded run. Two adjacent deltas, two orders of magnitude apart, measuring different physics.

SUBMITTED: bundle sent to the Jito block engine

LANDED · inclusion: included in a Jito leader's slot (binary, detected on the tx stream)

NON-LANDING: faulted bundles never reach inclusion and are recorded with no slot

PROCESSED: block replayed by a node

+123 ms (median): vote-aggregation latency, a measure of consensus health

CONFIRMED: ≥ ⅔ of stake voted on the slot

+12.2 s: rooting (~31 confirmed blocks)

FINALIZED: slot rooted and irreversible

123 ms vs 12.2 s. The first measures how fast votes propagate; the second waits for the chain to root. Same instrument, two different questions.

processed → confirmed ~123 ms · confirmed → finalized ~12.2 s, about 100× longer.
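A delta between two stage stamps and a median over several such deltas could be computed along these lines. This is a sketch with hypothetical helper names, and it makes no claim about how the Argus export actually aggregates its numbers.

```rust
/// Milliseconds between two stage stamps, or None if they are out of order.
pub fn delta_ms(earlier_ms: u64, later_ms: u64) -> Option<u64> {
    later_ms.checked_sub(earlier_ms)
}

/// Median of a set of deltas. Even-length inputs average the two middle
/// values using integer division. Returns None for an empty slice.
pub fn median_ms(deltas: &[u64]) -> Option<u64> {
    if deltas.is_empty() {
        return None;
    }
    let mut sorted = deltas.to_vec();
    sorted.sort_unstable();
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2
    })
}
```

Using the figures quoted above, 12.2 s divided by 123 ms is roughly 99, which is where the "about 100×" comparison comes from.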

Failure path — diagnose, remedy, resubmit

A Jito bundle is all-or-nothing, so a faulted transaction never lands and leaves no on-chain error. The one deterministic pre-submit source of truth is a preflight simulateTransaction — and that output is the raw surface handed to the agent.

sequenceDiagram
  autonumber
  participant O as injection_run
  participant R as rpc.simulate
  participant A as Agent → OpenRouter
  participant DB as SQLite
  O->>R: simulateTransaction
  R-->>O: err · instruction_error · logs
  O->>A: POST /decide (raw surface, no failure_class)
  A-->>O: diagnosis · triage · remedy · trace
  O->>DB: record_decision (agent + baseline)
  alt remedy = abort
    O-->>O: stop — no retry
  else recoverable
    O->>O: attempt-2 (fresh blockhash / raised CU)
  end

Figure 3. The agent receives the raw surface, not the baseline verdict. ADR 0010 · 0012

04 · Infrastructure decisions

Infrastructure decisions

Every decision is recorded as an ADR in the repo. The load-bearing ones:

Decision	What & why	ADR
Mainnet, not devnet	Jito lands bundles only on mainnet; slots must be explorer-verifiable. SolInfra credits remove the cost argument; a low-balance keypair caps exposure.	0002
Streams, not polling	Inclusion from the tx stream, commitment from the slot stream. `getBundleStatuses` is a cross-check only.	0004
Dynamic tips	Base tip = a live tip-floor percentile (default p75), rotated across accounts — never hardcoded. The agent may raise it as a remedy; base tipping stays in Core.	0005
OpenRouter	OpenAI-compatible API normalizes reasoning traces and a `submit_decision` tool across providers — the model is env-configurable and rotatable.	0006
Jito bundles are scored	Real `sendBundle`, multi-region fan-out. A Jito auth UUID makes the engine forward bundles. Helius Sender is a keyless backstop, never the scored path.	0007
Leader via searcher gRPC	`getNextScheduledLeader` is gRPC-only; a minimal vendored proto avoids a conflicting SDK. Timing is a soft signal, never a gate.	0008
Stream resilience	A receive task feeds a bounded channel; exponential-backoff reconnect, a cumulative ceiling, and shed-and-count give genuine backpressure.	0009
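The reconnect policy in the last row can be illustrated with a small delay calculator. The sketch reads "cumulative ceiling" as a cap on the total time spent waiting across retries, which is an interpretation rather than something the table states. `Backoff` and its fields are hypothetical names.

```rust
use std::time::Duration;

/// Exponential backoff with a per-delay cap and a cumulative time budget.
pub struct Backoff {
    base: Duration,
    max_delay: Duration,
    ceiling: Duration,
    spent: Duration,
    attempt: u32,
}

impl Backoff {
    pub fn new(base: Duration, max_delay: Duration, ceiling: Duration) -> Self {
        Backoff { base, max_delay, ceiling, spent: Duration::ZERO, attempt: 0 }
    }

    /// Delay to wait before the next reconnect, doubling each attempt up to
    /// `max_delay`. Returns None once the total wait would exceed `ceiling`.
    pub fn next_delay(&mut self) -> Option<Duration> {
        let factor = 2u32.saturating_pow(self.attempt);
        let delay = self.base.saturating_mul(factor).min(self.max_delay);
        let total = self.spent.saturating_add(delay);
        if total > self.ceiling {
            return None;
        }
        self.spent = total;
        self.attempt = self.attempt.saturating_add(1);
        Some(delay)
    }

    /// Call after a successful reconnect so the next outage starts fresh
    /// (an assumed policy, not confirmed by the source).
    pub fn reset(&mut self) {
        self.spent = Duration::ZERO;
        self.attempt = 0;
    }
}
```

Returning `None` gives the caller an explicit point at which to give up and surface the outage instead of retrying forever.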

05 · Failure handling

Failure handling strategy

Failure is the heart of the system — happy-path-only submissions are disqualified. Argus handles it on two axes: a bounded four-class baseline for remedy variation, and an unbounded program-error tail for diagnosis variation.

The bounded baseline — four classes

Failure class	Induced by	Default remedy
Expired blockhash	Sign against a real blockhash aged ~200 slots (past the ~150 window)	refresh blockhash
Compute exceeded	CU limit set to 1, below need	raise CU limit (from re-simulation)
Bundle failure	Include a failing instruction	abort / rebuild
Fee too low	Tip below the live floor under contention	bump tip
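Expressed as code, the table above is a plain lookup. The enum and function names below are illustrative, and the bundle-failure row is simplified to `Abort`, leaving out the "rebuild" alternative.

```rust
/// The four baseline failure classes from the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureClass {
    ExpiredBlockhash,
    ComputeExceeded,
    BundleFailure,
    FeeTooLow,
}

/// Default remedies named in the table (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Remedy {
    RefreshBlockhash,
    RaiseCuLimit,
    Abort,
    BumpTip,
}

/// The baseline: one fixed remedy per class, with no view of the raw error.
pub fn default_remedy(class: FailureClass) -> Remedy {
    match class {
        FailureClass::ExpiredBlockhash => Remedy::RefreshBlockhash,
        FailureClass::ComputeExceeded => Remedy::RaiseCuLimit,
        FailureClass::BundleFailure => Remedy::Abort,
        FailureClass::FeeTooLow => Remedy::BumpTip,
    }
}
```

Every error that lands in `BundleFailure` gets the same answer from this function, whatever the program logs say.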

The unbounded tail — where a classifier goes blind

One identical malformed instruction — [0xff; 8], zero accounts — sent to three different real programs produces three distinct errors. The four-class baseline collapses all three to one verdict. The agent does not.

Baseline (4-class lookup): bundle_failure → abort ⚠ for every one of these failures; it cannot tell them apart.

Agent (reasons from the raw error): names a different cause each time.

Retry, recovery, degradation

Remedy execution stays in Core. The agent names the remedy; Core owns the magnitudes — e.g. the raised CU limit comes from a max-CU re-simulation, not a tuned constant.
Attempt-2 is seeded clean, so a remedy is tested honestly rather than inheriting the injected fault.
Loud degradation. If the agent is unreachable within ~45s, Core falls back to the baseline and records model="local-fallback" — visible in the log, never silent.
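The loud-degradation rule can be sketched as a single function that either passes the agent's decision through or substitutes the baseline and labels it. The trimmed `Decision` struct and the choice to put the error text in `rationale` are assumptions; only the `model="local-fallback"` marker comes from the text above.

```rust
/// Trimmed-down decision record (the real one carries more fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Decision {
    pub remedy: String,
    pub rationale: String,
    pub model: String,
}

/// Use the agent's decision when it arrived. Otherwise fall back to the
/// baseline remedy and mark the record so the fallback shows up in the log.
pub fn resolve_decision(agent: Result<Decision, String>, baseline_remedy: &str) -> Decision {
    match agent {
        Ok(decision) => decision,
        Err(reason) => Decision {
            remedy: baseline_remedy.to_string(),
            rationale: format!("agent unavailable: {}", reason),
            model: "local-fallback".to_string(),
        },
    }
}
```

Because the marker sits in the same `model` field the agent normally fills, a reader of the log can tell a fallback from a real decision without any extra column.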

06 · AI agent responsibilities

AI agent responsibilities

The agent owns exactly one operational decision — failure diagnosis. It observes a failed transaction, reasons about why it failed, and decides what must change before retrying. Retry decisions come from the agent, not from hardcoded logic.

The contract

Direction	Payload
Core → Agent `POST /decide`	`error_text`, `instruction_error`, `failing_program_id`, `program_logs[]`, `tip_floor_p50/p75`, `blockhash_age_slots`, `cu_limit`, `cu_used`. No failure_class is sent.
Agent → Core `submit_decision`	`diagnosis` (free text), `triage`, `remedy`, `rationale`, `confidence` — plus `reasoning_trace` and the serving `model`.

Triage — the axis the agent reasons on

refresh: expired blockhash → refresh & resubmit

modify: compute exceeded → raise CU / bump tip

permanent: malformed call → abort, with the reason

funding: insufficient lamports → abort / top up

Why this isn't sequential automation

Any decision specifiable cleanly enough to grade is encodable as a classifier: legible ⟹ enumerable ⟹ lookup-replicable. Handing the agent a four-class verdict and a five-element remedy set is a 4→5 mapping that a match expression replicates, which is the "simple wrapper" the brief disqualifies. The escape is a different input: the unbounded, unstructured raw failure surface. A single AMM defines its own custom-error enum (Custom(6022) differs per program and version); a static classifier would need a combinatorial, perpetually stale table. Reasoning over the raw surface does not.

The honesty boundary: on a permanent failure the agent and the baseline both abort — the agent's value there is the reason, not a different action. It is graded on the diagnoses a lookup can't produce, not on theatrical disagreement. ADR 0012

07 · Operational evidence

Operational evidence

From the graded mainnet run run-1781958744615, committed to the repo as logs/lifecycle-1781958744615.{md,jsonl}.

The graded run — every submission, explorable

All 15 real submissions from run-1781958744615. Faulted rows expand to the agent's diagnosis, triage, and full reasoning trace; landed rows show the commitment deltas drawn to scale.


Four payloads the baseline collapses to one verdict drew four distinct diagnoses. The two recoverable injections — expired blockhash (aged 200 slots, conf. 0.99) and compute exceeded (cu_limit=1, conf. 0.99) — were triaged and landed on attempt 2 (slots 427724252, 427724375). Every decision carried a non-empty reasoning trace.

The three required questions, from this run

Q1 · What does the processed→confirmed delta tell you?

Vote-aggregation latency (≥⅔ stake voting) — consensus health, not inclusion speed. This run: 87–272 ms, median 123 ms. The next hop, confirmed→finalized, took ~12.2 s (rooting), two orders of magnitude larger.

Q2 · Why never use a finalized blockhash for a time-sensitive tx?

A blockhash is valid only ~150 slots (~60–90 s); a finalized one is already ~31 slots old on receipt — ~20% of the window burned before you submit. Shown directly: a blockhash aged 200 slots was rejected with BlockhashNotFound; recovery needed a fresh one.
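The arithmetic behind that answer is small enough to write out. The sketch treats the validity window as exactly 150 slots, whereas the text gives it as approximate, so the expiry boundary here is an assumption. The function names are illustrative.

```rust
/// Approximate blockhash validity window, in slots (the text says ~150).
pub const BLOCKHASH_WINDOW_SLOTS: u64 = 150;

/// Slots of validity left for a blockhash that is `age_slots` old.
pub fn remaining_slots(age_slots: u64) -> u64 {
    BLOCKHASH_WINDOW_SLOTS.saturating_sub(age_slots)
}

/// Share of the window already used, as a whole percent (rounded down).
pub fn window_burned_percent(age_slots: u64) -> u64 {
    age_slots.min(BLOCKHASH_WINDOW_SLOTS) * 100 / BLOCKHASH_WINDOW_SLOTS
}

/// True once the blockhash is older than the window.
pub fn is_expired(age_slots: u64) -> bool {
    age_slots > BLOCKHASH_WINDOW_SLOTS
}
```

A finalized blockhash at ~31 slots old has 119 slots left and has used about 20% of its window, while the injected 200-slot blockhash is past the window entirely.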

Q3 · What if the Jito leader skips their slot?

The bundle is slot-specific and atomic — not included, not auto-forwarded, and no tip charged (tips pay only on inclusion). Resubmit to the next leader window with a fresh blockhash. All 6 faulted bundles here were sent free; the recoverable two landed on resubmission.

Appendix

Decision record

Full context and consequences live in the repository under docs/adr/.

#	Decision
0001	Two-runtime split: Rust Core + TypeScript Agent over HTTP
0002	Run the real path on mainnet, not devnet
0003	Agent owns Failure Reasoning (superseded by 0012)
0004	Confirmation via Yellowstone streams; bundle-status RPC is cross-check only
0005	Dynamic tips from the tip floor; Core sets base, Agent adjusts on failure
0006	Model access via OpenRouter, not a single-vendor SDK
0007	Jito bundles are the scored path; Helius Sender is a backstop
0008	Leader-window timing via a minimal gRPC searcher client
0009	Resilient subscriptions: bounded-channel backpressure + reconnect
0010	Deterministic classification via preflight simulation (amended by 0011, 0012)
0011	The Run: single-session orchestrator, Run-ID-prefix keying
0012	Agent owns Failure Diagnosis over the unbounded program-error tail