Anyone evaluating an AI runtime that interacts with wallets and on-chain actions needs a clear view of its underlying providers, safety boundaries, and operational model. This post documents the current stack, how transaction proposals are handled, deployment practices, and the design choices that shape reliability and trust.
Provider transparency
The provider names and high-level responsibilities are published because they are not the sensitive elements of the system. Credentials, internal configurations, rate limits, queue details, and operational secrets remain private. The published information is intended to help builders evaluate whether the architecture and dependencies align with their requirements for production workloads.
Provider choices
| Layer | Provider | Why it is used |
|---|---|---|
| Primary inference | Cerebras with gpt-oss-120b | Very fast open-weight inference for agent loops where latency matters. The target path is high-throughput generation in the 3,000+ tokens/sec class. |
| Inference fallback | OpenAI gpt-4.1-mini on the priority service tier | A separate fallback path used when the primary model is unavailable, degraded, or unsuitable for a request. |
| Embeddings | OpenAI text-embedding-3-small | Compact, reliable embeddings for memory, retrieval, and semantic search. |
| Memory and vector search | MongoDB Atlas auto-scaling cluster with Vector Search and TTL | Persistent memory, indexed retrieval, and automatic expiry for data that should not live forever. |
| Queues and background work | Upstash Redis | Durable queueing, async jobs, retries, and backpressure without turning the app into queue infrastructure. |
| Solana RPC | Helius RPC | High-quality Solana RPC access for reads, transaction preparation, and chain state. |
| Swaps and orders | Jupiter API | Best-route swap infrastructure and trading primitives Solana builders already trust. |
| Lending and yield | Kamino API | Lending, borrow, repay, vault, and position workflows without custom protocol glue. |
| Market data | Birdeye API | Prices, token data, wallet analytics, charts, search, and technical signals. |
| Web and X search | Grok API (xAI) | Real-time web search and X (Twitter) search for current events, sentiment, and external context that agents can ground responses in. |
| Hosted self-custody wallet | Privy API | Embedded wallet infrastructure so apps do not manage raw private keys or seed phrases. |
| Edge and traffic protection | Cloudflare | TLS, caching, DNS, and basic edge protection. |
| Usage settlement | PayAI x402 Merchant Provider | USDC-based x402 payments for usage settlement. |
| API runtime | Latest stable CPython with FastAPI | Inspectable, production-friendly Python API surface. |
| Deployment | OVH Cloud VPS with Dokku | Simple, owned deployment with zero-downtime deploys and operational control. |
| SDKs | Python Client SDK, with TypeScript and Rust SDKs planned | Open-source client surfaces so builders can inspect integration code instead of treating the API as a black box. |
The runtime integrates established services for inference, data persistence, execution, and infrastructure. The integration layer handles memory, tool orchestration, wallet context, transaction preparation, usage-based billing, reporting, and safety checks.
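The primary-plus-fallback inference arrangement in the table can be pictured as a small router. This is a minimal sketch, not the runtime's code. `ProviderError`, `route_inference`, and the `primary_suitable` predicate are hypothetical names. The stand-in adapters replace real calls to the Cerebras and OpenAI SDKs.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderError(Exception):
    """Raised by a provider adapter when its backend is unavailable or degraded."""


@dataclass
class InferenceResult:
    provider: str  # which path produced the answer: "primary" or "fallback"
    text: str


def route_inference(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    primary_suitable: Callable[[str], bool] = lambda _prompt: True,
) -> InferenceResult:
    """Try the primary model first; use the fallback if it is unsuitable or fails."""
    if primary_suitable(prompt):
        try:
            return InferenceResult("primary", primary(prompt))
        except ProviderError:
            pass  # primary unavailable or degraded: fall through to the fallback path
    # The fallback is a separate provider, so a primary outage does not block it.
    return InferenceResult("fallback", fallback(prompt))


# Usage with stand-in adapters (real adapters would call the provider SDKs).
def failing_primary(prompt: str) -> str:
    raise ProviderError("primary timed out")


def stub_fallback(prompt: str) -> str:
    return f"fallback answer to: {prompt}"


result = route_inference("What is the SOL price?", failing_primary, stub_fallback)
print(result.provider)  # fallback
```

Recording which path answered gives usage reporting a way to attribute each request to the provider that served it.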
Model uncertainty and transaction safety
AI models can misinterpret instructions, hallucinate values, or generate unsafe plans. The architecture assumes this possibility and does not grant models direct signing authority.
The model generates intent. The runtime maps that intent to structured, typed tool calls. Transaction tools are further constrained by supported actions, live provider data (quotes, balances, routes), wallet state, configured limits, and explicit approval steps.
Model output never directly produces a signed transaction.
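The step from intent to a typed tool call can be sketched as a parser that either returns a well-typed value or rejects the output. This is illustrative only. It assumes the model emits JSON. `SUPPORTED_ACTIONS`, `ToolCall`, and `parse_intent` are hypothetical, and a real swap tool would carry more fields and check tokens against live provider data.

```python
import json
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation

# Hypothetical allow-list for illustration; the runtime's real action set is not published here.
SUPPORTED_ACTIONS = frozenset({"swap", "lend", "repay"})


class IntentRejected(ValueError):
    """Model output could not be mapped to a supported, well-typed tool call."""


@dataclass(frozen=True)
class ToolCall:
    action: str
    token: str
    amount: Decimal


def parse_intent(model_output: str) -> ToolCall:
    """Map raw model output to a typed tool call, or reject it. Nothing here signs anything."""
    try:
        raw = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise IntentRejected("model output is not valid JSON") from exc
    if not isinstance(raw, dict):
        raise IntentRejected("intent must be a JSON object")

    action = raw.get("action")
    # The isinstance check also guards against unhashable values in the set lookup.
    if not isinstance(action, str) or action not in SUPPORTED_ACTIONS:
        raise IntentRejected(f"unsupported action: {action!r}")

    token = raw.get("token")
    if not isinstance(token, str) or not token:
        raise IntentRejected("token must be a non-empty string")

    try:
        amount = Decimal(str(raw["amount"]))
    except (KeyError, InvalidOperation) as exc:
        raise IntentRejected("missing or non-numeric amount") from exc
    # is_finite() is checked first so NaN/Infinity never reach the ordering comparison.
    if not amount.is_finite() or amount <= 0:
        raise IntentRejected("amount must be a positive, finite number")

    return ToolCall(action=action, token=token, amount=amount)
```

Any hallucinated action or malformed value fails before it can reach transaction preparation.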
Transaction safety model
Solana Agent is designed around layered checks:
- Structured tool calls. The agent uses typed tools for swaps, orders, lending, market data, wallet actions, memory, and reporting instead of free-form transaction generation.
- Provider-backed preparation. Jupiter, Kamino, Birdeye, Helius, and Privy provide known execution surfaces, so transactions are not constructed ad hoc in every flow.
- Explicit transaction boundaries. Supported flows should expose the action, wallet, token, amount, route, fee behavior, and expected result before execution.
- Hosted self-custody wallet model. The app gets a hosted self-custody wallet surface without handling seed phrases or raw private keys in the client app.
- Gasless accounting. Network-fee coverage is tracked so operators can see gasless savings and fee-payer cost.
- Wallet-scoped reporting. Spend, usage, provider requests, gasless savings, and forecasts are tied back to the wallet that actually ran the work.
- Idempotency and queueing. Background jobs and retries run through queue infrastructure so repeated requests can be controlled rather than duplicated blindly.
- Test coverage. The API is built on the latest stable CPython and FastAPI with 100% test coverage enforced in GitHub CI.
This approach does not eliminate blockchain or smart-contract risk. It reduces the number of custom systems a builder must design, implement, test, and operate to support agent-driven on-chain actions.
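Two of the checks above, explicit transaction boundaries and configured limits, could take roughly the following shape. This is a sketch under assumptions. The `PreparedTransaction` fields mirror the list (action, wallet, token, amount, route, fee behavior, expected result), but the field names, the `Limits` cap, and the approval flag are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PreparedTransaction:
    """Hypothetical shape of a provider-backed preparation result."""
    action: str
    wallet: str
    token: str
    amount: Decimal
    route: str
    fee_lamports: int
    fee_sponsored: bool  # True when network fees are covered (gasless)
    expected_output: str


@dataclass(frozen=True)
class Limits:
    max_amount: Decimal  # hypothetical per-transaction cap configured by the operator


def summarize(tx: PreparedTransaction) -> str:
    """Explicit summary shown to the user before any approval step."""
    fee_payer = "sponsored (gasless)" if tx.fee_sponsored else "paid by wallet"
    return (
        f"{tx.action} {tx.amount} {tx.token} from {tx.wallet} via {tx.route}; "
        f"network fee {tx.fee_lamports} lamports, {fee_payer}; "
        f"expected result: {tx.expected_output}"
    )


def ready_for_signing(tx: PreparedTransaction, limits: Limits, approved: bool) -> bool:
    """Only an in-limit, explicitly approved transaction moves on to the wallet layer."""
    if tx.amount > limits.max_amount:
        return False
    return approved
```

Even an approved transaction only passes to the wallet layer here; in this sketch signing stays outside both the model and the runtime's decision logic.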
Attack vectors and mitigations
| Risk | What can go wrong | Mitigation |
|---|---|---|
| Prompt injection | A page, user, or retrieved context tries to override the agent’s instructions. | Keep execution behind structured tools, supported action lists, and policy checks. Treat retrieved text as context, not authority. |
| Model hallucination | The model invents a token, route, balance, or action. | Require provider-backed market data, quotes, wallet state, and typed action parameters before transaction preparation. |
| Unsafe transaction intent | The user asks for something ambiguous or risky. | Prefer clarification, explicit action summaries, configured limits, and reject unsupported actions. |
| Provider outage | A model, RPC, market data, or DeFi provider degrades. | Use fallback inference, queue backpressure, retries, provider-level errors, and visible failure states instead of silent execution. |
| Duplicate execution | Retries or refreshes submit the same work twice. | Use idempotency, queue state, and job tracking for background execution paths. |
| Cost abuse | A user or agent loop burns tokens, provider requests, or gas sponsorship. | Use usage-based accounting, wallet-scoped reporting, starter allowance visibility, thresholds, forecasts, and provider-level pricing. |
| Wallet compromise | A client app mishandles keys or exposes secret material. | Do not ask for seed phrases. Use Privy for the hosted self-custody wallet layer and keep private key handling out of the client app. |
| Data retention drift | Memory or logs live longer than intended. | Use MongoDB Atlas TTL-backed storage for expiring memory and retention-bound data. |
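The duplicate-execution mitigation relies on idempotency. The minimal sketch below assumes the client sends a key generated once per user action and reuses it on retries and refreshes. `IdempotentExecutor` is a hypothetical name. Its check-then-act is not atomic across processes, so a real deployment would keep this state in a shared store such as a Redis key written with `SET ... NX`.

```python
from typing import Any, Callable, Dict


class IdempotentExecutor:
    """Execute each idempotency key at most once and replay the stored result on repeats.

    In-memory and single-process for illustration only; shared, atomic storage is
    needed so that every worker sees the same state.
    """

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def run(self, idempotency_key: str, job: Callable[[], Any]) -> Any:
        if idempotency_key in self._results:
            # Retry or refresh of work that already completed: do not execute again.
            return self._results[idempotency_key]
        result = job()  # if this raises, nothing is recorded and a later retry may run it
        self._results[idempotency_key] = result
        return result


executions = []


def submit_swap() -> str:
    executions.append("swap")
    return "job-1"


executor = IdempotentExecutor()
first = executor.run("client-request-123", submit_swap)
second = executor.run("client-request-123", submit_swap)  # e.g. a page refresh
print(first == second, len(executions))  # True 1
```

The client supplies the key instead of the server hashing the payload. That way, two deliberate identical swaps are not mistaken for one duplicated request.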
Uptime and deployments
A public SLA is not published because the service depends on multiple upstream providers. The operating model instead emphasizes:
- Zero-downtime deploys via Dokku release flow.
- Cloudflare for edge delivery and basic protection.
- MongoDB Atlas with auto-scaling for data and vector search.
- Upstash Redis for durable queues, retries, and backpressure.
- Inference fallback paths.
- Explicit error handling for provider-dependent actions.
- Detailed usage and cost reporting for post-execution review.
Teams requiring contractual uptime commitments should discuss terms directly.
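Explicit error handling for provider-dependent actions can combine bounded retries with a failure the caller can actually see. Here is one hedged sketch. `TransientProviderError`, `Outcome`, and the 0.5-second base delay are assumptions, not published settings. `sleep` is injectable so the backoff can be tested without waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TransientProviderError(Exception):
    """Hypothetical error type for timeouts, rate limits, or 5xx responses from a provider."""


@dataclass
class Outcome(Generic[T]):
    ok: bool
    value: Optional[T]
    error: Optional[str]  # surfaced to the caller instead of failing silently
    attempts: int


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Outcome[T]:
    """Retry transient provider failures with exponential backoff, then report a visible failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return Outcome(ok=True, value=fn(), error=None, attempts=attempt)
        except TransientProviderError as exc:
            if attempt == max_attempts:
                return Outcome(ok=False, value=None,
                               error=f"provider failed after {attempt} attempts: {exc}",
                               attempts=attempt)
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")
```

Returning a failed `Outcome` instead of swallowing the error lets the runtime show a failure state rather than proceed as if the provider had answered.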
Scaling strategy
The API remains a FastAPI/CPython service. Stateless handlers scale horizontally. Background and retryable work is managed through Redis queues. Persistent memory and vector search use MongoDB Atlas auto-scaling with TTL policies. Edge traffic is handled by Cloudflare. Inference uses a primary-plus-fallback configuration. Solana execution is delegated to specialized providers.
Operational effort therefore centers on capacity planning, queue management, provider limits, observability, and cost control rather than repeated backend re-architecture.
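Queue backpressure, one of the operational levers named above, comes down to refusing new work once a backlog limit is reached. The sketch below uses Python's standard `queue` module as an in-process stand-in for the Redis-backed queues. The `BoundedJobQueue` class, the depth limit, and the HTTP-style 202/429 responses are illustrative assumptions.

```python
import queue
from typing import Any, Tuple


class BoundedJobQueue:
    """Accept background jobs up to a fixed depth, then push back on producers.

    An in-process stand-in for illustration; the depth limit is a hypothetical setting.
    """

    def __init__(self, max_depth: int) -> None:
        if max_depth < 1:
            raise ValueError("max_depth must be positive (maxsize=0 would mean unbounded)")
        self._jobs: "queue.Queue[Any]" = queue.Queue(maxsize=max_depth)

    def submit(self, job: Any) -> Tuple[int, str]:
        """Return an HTTP-style status: 202 when queued, 429 when the queue is full."""
        try:
            self._jobs.put_nowait(job)
        except queue.Full:
            # Backpressure: reject now and let the client retry later,
            # instead of letting the backlog grow without bound.
            return 429, "queue full, retry later"
        return 202, "accepted"

    def take(self) -> Any:
        """Worker side: remove the oldest job (raises queue.Empty if none)."""
        return self._jobs.get_nowait()
```

Rejecting early keeps latency bounded for accepted work and turns overload into an explicit signal that capacity planning can track.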
Build versus integrate
Teams with specific custody, protocol, or provider requirements may prefer to build equivalent functionality themselves.
Solana Agent provides a pre-integrated runtime that combines inference routing, hosted self-custody wallet context, tool execution for swaps/lending/market data, x402 usage settlement, and wallet-scoped reporting. Open-source SDKs and this architecture description allow inspection of the integration points.
The design goal is to let builders focus on agent behavior and application logic instead of assembling and maintaining the supporting infrastructure.
Further reading
- Reasoning That Holds Up — What ARC-AGI research taught us about reliable execution
- Conversational Cognition Metrics — 77.6% LongMemEval and why multi-turn performance matters