How We Migrated AXIS to a Fully Independent Stack in One Week

AXIS ran on a single managed host. One vendor, one bill, one point of failure. Here is how we migrated to a multi-cloud primary + hot standby architecture — Vercel, Railway, Clerk, OpenAI, Resend — in one week, with full failover in under 2 minutes.

By Leonidas Esquire Williamson — March 19, 2026

When we launched AXIS in March 2026, the platform ran entirely on Manus managed hosting. That was the right call for a zero-to-one build: one vendor, one bill, no infrastructure decisions. But as AXIS moved from prototype to production — with real agents, real trust scores, and real API traffic — the constraints of a single managed host became a liability rather than a convenience.

This post documents the decisions we made, the architecture we landed on, and the runbook we built for failover. The migration took one week from first commit to production traffic. Here is how it happened.

---

Why We Migrated

The trigger was not a failure. It was a design review. We were planning the v1.2 release and asked a simple question: if Manus managed hosting went offline for four hours, what would happen to AXIS?

The answer was uncomfortable. Every layer of the stack — frontend, backend, database, authentication, LLM, and notifications — ran through a single vendor. There was no failover path. No hot standby. No way to redirect traffic. A four-hour outage would mean four hours of zero trust lookups, zero agent registrations, and zero API responses for every developer who had integrated AXIS into their pipeline.

For a platform whose entire value proposition is reliability infrastructure for AI agents, that was not acceptable.

---

The Architecture We Chose

We evaluated three approaches: (1) stay on managed hosting but add a cold standby, (2) migrate to a single independent cloud provider, or (3) build a multi-cloud primary + hot standby stack.

We chose option three. The reasoning was straightforward: a cold standby requires manual intervention to activate, which means a human has to be awake and available when the failure happens. A single independent provider trades one vendor dependency for another. A multi-cloud hot standby — where the standby is always running and always current — can fail over in under two minutes with a single DNS change.

The final architecture:

| Layer     | Primary              | Hot Standby         |
| --------- | -------------------- | ------------------- |
| Frontend  | Vercel               | Netlify             |
| Backend   | Railway (Node.js 22) | Render              |
| Database  | Railway MySQL        | Same Railway MySQL  |
| DNS & CDN | Cloudflare           | Cloudflare (shared) |
| Auth      | Clerk                | Clerk (shared)      |

The database is the one layer that is not duplicated. Both the primary and standby backends point to the same Railway MySQL instance. This was a deliberate choice: database replication adds significant operational complexity, and Railway MySQL's uptime SLA is sufficient for our current scale. If the database becomes a single point of failure at higher traffic volumes, we will add a read replica at that point.

---

The Migration Decisions

Why Vercel for the Frontend

The AXIS frontend is a React 19 + Vite static build. Vercel is the natural home for this: it integrates directly with GitHub, auto-deploys on every push to main, and handles CDN distribution globally. The vercel.json configuration is only a few lines:

```json
{
  "buildCommand": "vite build",
  "outputDirectory": "dist/public",
  "rewrites": [{ "source": "/(.*)", "destination": "/index.html" }]
}
```

The SPA rewrite rule ensures that all routes — /trust-score, /blog/[slug], /directory — are served by index.html and handled by the React router client-side.

Why Railway for the Backend

The AXIS backend is an Express 4 + tRPC 11 server running on Node.js 22. Railway was chosen over Render, Fly.io, and AWS App Runner for three reasons: it deploys directly from GitHub with zero configuration, it runs Node.js 22 natively without a Dockerfile, and it includes a managed MySQL service in the same project network — which means the backend can connect to the database over Railway's private network using the internal hostname (mysql.railway.internal) rather than the public proxy.
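As an illustration of that private-network connection — where the `RAILWAY_ENVIRONMENT` check and the `MYSQL_*` variable names are assumptions worth verifying against your own Railway project, not code from the AXIS repository — the connection URL could be assembled like this:

```typescript
// Sketch: build the MySQL connection URL, preferring Railway's private
// network hostname (mysql.railway.internal) when the process runs on
// Railway. RAILWAY_ENVIRONMENT and the MYSQL_* names are assumptions.
type Env = Record<string, string | undefined>;

export function databaseUrl(env: Env): string {
  const host = env.RAILWAY_ENVIRONMENT
    ? "mysql.railway.internal" // private network, no public proxy hop
    : env.MYSQL_PUBLIC_HOST ?? "localhost"; // local development fallback
  const user = env.MYSQL_USER ?? "root";
  const password = env.MYSQL_PASSWORD ?? "";
  const db = env.MYSQL_DATABASE ?? "axis";
  return `mysql://${user}:${password}@${host}:3306/${db}`;
}
```

The private hostname matters for more than latency: traffic over `mysql.railway.internal` never leaves Railway's network, so it avoids the public proxy entirely.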

Why Clerk for Authentication

The previous authentication system was Manus OAuth — a proprietary flow tied to the Manus platform. Replacing it with [Clerk](https://clerk.com) gave us three things the old system did not have: a self-service user profile page, webhook-based user sync (so the AXIS database stays current when users change their email or delete their account), and a hosted sign-in and sign-up UI that we did not have to build or maintain.

The migration required replacing the server-side JWT verification (previously using a Manus SDK) with Clerk's verifyToken function from @clerk/express, and replacing the frontend auth hooks with Clerk's React SDK. The tRPC context now extracts the Clerk session token from the Authorization: Bearer header on every request.
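As a sketch of that context builder — with the verification function injected as a parameter so the example stays self-contained, and the context shape an illustrative assumption rather than the actual AXIS code — the header handling looks like this:

```typescript
// Sketch of the tRPC context: pull the session token from the
// "Authorization: Bearer <token>" header and resolve it to a user id.
// In production, `verify` would be Clerk's verifyToken from @clerk/express:
//   (t) => verifyToken(t, { secretKey: process.env.CLERK_SECRET_KEY! })
type VerifyFn = (token: string) => Promise<{ sub: string }>;

export function extractBearer(header: string | undefined): string | null {
  const match = header?.match(/^Bearer\s+(\S+)$/);
  return match ? match[1] : null;
}

export async function buildContext(
  headers: Record<string, string | undefined>,
  verify: VerifyFn,
): Promise<{ userId: string | null }> {
  const token = extractBearer(headers["authorization"]);
  if (!token) return { userId: null }; // anonymous request
  try {
    const claims = await verify(token);
    return { userId: claims.sub }; // Clerk puts the user id in the `sub` claim
  } catch {
    return { userId: null }; // expired or forged token
  }
}
```

Treating a failed verification as an anonymous request (rather than throwing) lets public tRPC procedures keep working while protected procedures reject on the missing `userId`.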

Why OpenAI for LLM

The previous LLM integration used the Manus Forge API — a proxy to an underlying model. Replacing it with the [OpenAI API](https://platform.openai.com) (GPT-4o) gave us direct access to the model, predictable pricing, and the ability to switch models without changing the integration layer. The invokeLLM helper in server/_core/llm.ts now uses the openai npm package and is a drop-in replacement for the previous helper.
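A minimal sketch of such a helper — the message layout and default system prompt are illustrative, and the completion call is injected so the example runs without credentials; the real helper would call the openai package directly:

```typescript
// Sketch of an invokeLLM-style helper. The actual version in
// server/_core/llm.ts uses the openai package, roughly:
//   const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
//   const res = await client.chat.completions.create({ model: "gpt-4o", messages });
//   return res.choices[0]?.message?.content ?? "";
type ChatMessage = { role: "system" | "user"; content: string };

export function buildMessages(prompt: string, system: string): ChatMessage[] {
  return [
    { role: "system", content: system },
    { role: "user", content: prompt },
  ];
}

export async function invokeLLM(
  prompt: string,
  complete: (messages: ChatMessage[]) => Promise<string>,
  system = "You are the AXIS assistant.", // illustrative default
): Promise<string> {
  return complete(buildMessages(prompt, system));
}
```

Because only `complete` touches the network, swapping GPT-4o for another model is a one-line change at the call site — which is the "switch models without changing the integration layer" property described above.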

Why Resend for Notifications

Owner notifications — alerts sent when new agents are registered, disputes are filed, or security events are triggered — previously used the Manus built-in notification service. [Resend](https://resend.com) was chosen as the replacement because it has a clean Node.js SDK, supports custom sender domains, and delivers reliably to inboxes rather than spam folders. Notifications now send from admin@axistrust.io, which is configured via Zoho Mail.
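A sketch of what such a notification might look like — the sender address and event types come from the text above, but the recipient, subject format, and helper name are illustrative assumptions:

```typescript
// Sketch of an owner-notification payload for Resend. Only the sender
// address (admin@axistrust.io) and the event categories are taken from
// the post; everything else here is a placeholder.
type OwnerEvent = "agent_registered" | "dispute_filed" | "security_alert";

export function buildNotification(event: OwnerEvent, detail: string) {
  return {
    from: "admin@axistrust.io",
    to: "owner@axistrust.io", // hypothetical owner inbox
    subject: `[AXIS] ${event.replace(/_/g, " ")}`,
    text: detail,
  };
}

// In production, the payload would go straight to Resend's Node SDK:
//   const resend = new Resend(process.env.RESEND_API_KEY);
//   await resend.emails.send(buildNotification("dispute_filed", "…"));
```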

Why Cloudflare for DNS

Cloudflare was chosen for DNS for one reason: instant failover. Cloudflare's DNS TTL can be set as low as 60 seconds, which means a DNS-based failover from the primary stack to the hot standby takes at most 60 seconds to propagate globally. Cloudflare also provides DDoS protection and a CDN layer in front of the Vercel frontend at no additional cost.

---

The Failover Runbook

The hot standby (Netlify + Render) is always running and always current. Failing over requires two DNS changes in Cloudflare:

  • Update the `A` record for `axistrust.io` and `www.axistrust.io` to point to the Netlify IP.
  • Update the `CNAME` for `api.axistrust.io` to point to the Render backend URL.

With Cloudflare's 60-second TTL, full propagation takes under two minutes. The Render backend is pre-configured with the same environment variables as the Railway backend, and both point to the same Railway MySQL database.

The runbook is documented in DEPLOYMENT.md in the repository root.

---

What We Removed

The migration was also an opportunity to remove features that were Manus-specific and had no independent equivalent:

  • Voice transcription — the Manus Whisper proxy was removed. If voice transcription is needed in a future feature, we will integrate the OpenAI Whisper API directly.
  • Image generation — the Manus ImageService proxy was removed. Future image generation will use the OpenAI DALL-E API or Replicate.
  • Google Maps proxy — the Manus Maps proxy was removed along with the `Map.tsx` component. No current AXIS feature requires maps.

---

The Numbers

The migration was completed in one week. The final state:

  • 126 tests passing across unit, integration, and webhook verification suites
  • 24 tables in Railway MySQL, all migrated from TiDB Serverless
  • 66 rows imported (2 users, 22 agents, 21 T-Scores, 21 C-Scores)
  • 0 TypeScript errors in the final build
  • Under 2 minutes estimated failover time with the hot standby

---

What Is Next

The infrastructure migration is complete, but the resilience work is not finished. The next steps are:

  • **Database read replica** — add a Railway MySQL read replica to eliminate the database as a single point of failure at higher traffic volumes.
  • **Automated failover** — replace the manual DNS change runbook with a Cloudflare Worker that monitors the primary backend health endpoint and updates DNS automatically on failure.
  • **Uptime monitoring** — add a public status page at `status.axistrust.io` showing real-time uptime for every layer of the stack.
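As a sketch of the automated-failover idea — the zone and record IDs, API token, health endpoint path, and standby hostname are all placeholders, and the real Worker would run on a cron trigger rather than on demand:

```typescript
// Sketch: poll the primary backend's health endpoint; if it is down,
// repoint api.axistrust.io at the standby via Cloudflare's DNS API.
// All identifiers below are placeholders, not real AXIS values.
const ZONE_ID = "<zone-id>";
const RECORD_ID = "<dns-record-id>";
const API_TOKEN = "<cloudflare-api-token>";

type FetchLike = (url: string, init?: object) => Promise<{ ok: boolean }>;

export function dnsUpdateBody(standbyHost: string) {
  return {
    type: "CNAME",
    name: "api.axistrust.io",
    content: standbyHost,
    ttl: 60, // matches the 60-second TTL used for the manual runbook
    proxied: false,
  };
}

export async function failoverIfDown(fetchFn: FetchLike): Promise<boolean> {
  const health = await fetchFn("https://api.axistrust.io/health").catch(() => null);
  if (health && health.ok) return false; // primary healthy, nothing to do
  await fetchFn(
    `https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/dns_records/${RECORD_ID}`,
    {
      method: "PUT",
      headers: {
        Authorization: `Bearer ${API_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(dnsUpdateBody("axis-standby.onrender.com")),
    },
  );
  return true; // DNS now points at the hot standby
}
```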
The AXIS [T-Score](/trust-score) measures behavioral reliability for AI agents. It is only credible if the platform that computes it is itself reliable. This migration was the first step toward making that credibility structural rather than aspirational.

See the [v1.2 release notes](/blog/axis-v1-2-infrastructure-migration) for the full technical changelog.