Global Low-Latency Data Access with a Hybrid Vercel Edge and Self-Hosted TiDB Architecture


We’re confronted with a core technical paradox: for a data-intensive application with a global user base, how do you deliver both blazing-fast frontend load times and exceptionally low database read latency? Deploying the frontend and a stateless API layer on Vercel’s global edge network is the obvious choice, but what about the data itself? Data has gravity; it tends to centralize. If the database is deployed in a single region, say us-east-1, every data request from a user in the Asia-Pacific region must traverse half the globe. The resulting network latency of tens to hundreds of milliseconds is enough to obliterate an otherwise meticulously optimized frontend experience.

Traditional solutions, like deploying read replicas across multiple cloud regions, introduce data synchronization delays and complex state management. While storage solutions within Vercel’s ecosystem (like Vercel Postgres) offer tight integration, they can hit bottlenecks when faced with specific requirements for global distributed transactions and horizontal scalability. We needed a database with true global distribution capabilities, yet we wanted to retain full control over our data storage to avoid vendor lock-in and optimize costs.

This is precisely why we opted for a hybrid architecture: leverage Vercel’s Edge Network for frontend and stateless compute distribution, while self-hosting a TiDB cluster on multi-cloud VMs as a unified, global data foundation, managed via Ansible. The challenge lies in gluing these two worlds together: a serverless, ephemeral compute environment and a stateful, long-lived distributed database.

Architectural Decisions: Why This Combination?

Before settling on our final approach, we evaluated two other paths.

Option A: Fully Serverless
Use Vercel with a globally distributed serverless database like FaunaDB or CockroachDB Serverless.

  • Pros: Minimal operational overhead, pay-as-you-go pricing, and seamless integration with the Vercel ecosystem.
  • Cons: Weaker control over SQL compatibility, transaction models, and query performance. For complex analytical queries or scenarios requiring fine-tuning, the black-box nature of a serverless database becomes an obstacle. More importantly, cost models can become unpredictable at scale, and data sovereignty is entirely handed over to a third party.

Option B: Fully Self-Hosted
Build our own Kubernetes clusters in multiple regions to deploy the frontend, backend services, and the database.

  • Pros: Ultimate control over the entire technology stack, allowing for deep customization and optimization.
  • Cons: Prohibitively high operational overhead. Maintaining a globally distributed K8s cluster, configuring CDNs, and handling edge routing is a massive engineering effort in itself. This contradicts our goal of focusing on business logic.

Final Choice: The Hybrid Architecture
This architecture is designed to leverage the best of both worlds. Vercel handles what it does best: global static asset distribution and edge function execution. We, in turn, take control of the complex and sensitive data layer. TiDB became our top choice for a self-hosted database due to its MySQL protocol compatibility and native support for horizontal scaling and distributed transactions. Ansible serves as our automation tool for deployment and configuration management, significantly reducing the complexity of maintaining this distributed cluster.

graph TD
    subgraph "User Devices Globally"
        User1[User - Asia]
        User2[User - Europe]
        User3[User - North America]
    end

    subgraph "Vercel Edge Network"
        Edge[Vercel CDN/Edge]
        Fn_Asia[Vercel Function - ap-east-1]
        Fn_EU[Vercel Function - fra1]
        Fn_US[Vercel Function - iad1]
    end

    subgraph "Self-Hosted Multi-Cloud TiDB Cluster (Managed by Ansible)"
        LB[HAProxy Load Balancer]
        TiDB_Asia[TiDB Server - AWS Singapore]
        TiDB_EU[TiDB Server - GCP Frankfurt]
        TiDB_US[TiDB Server - Azure Virginia]
        PD1[PD Server]
        PD2[PD Server]
        PD3[PD Server]
        TiKV1[TiKV Server - Region A]
        TiKV2[TiKV Server - Region B]
        TiKV3[TiKV Server - Region C]
    end

    User1 --> Edge
    User2 --> Edge
    User3 --> Edge

    Edge -->|Request| Fn_Asia
    Edge -->|Request| Fn_EU
    Edge -->|Request| Fn_US

    Fn_Asia -->|SQL Query| LB
    Fn_EU -->|SQL Query| LB
    Fn_US -->|SQL Query| LB

    LB --> TiDB_Asia
    LB --> TiDB_EU
    LB --> TiDB_US

    TiDB_Asia <--> PD1
    TiDB_EU <--> PD2
    TiDB_US <--> PD3

    TiDB_Asia <--> TiKV1
    TiDB_Asia <--> TiKV2
    TiDB_Asia <--> TiKV3

    TiDB_EU <--> TiKV1
    TiDB_EU <--> TiKV2
    TiDB_EU <--> TiKV3

    TiDB_US <--> TiKV1
    TiDB_US <--> TiKV2
    TiDB_US <--> TiKV3

The core of this architecture is that a user’s request is routed by Vercel to the nearest edge function. This function then connects to the global load balancer of our self-hosted TiDB cluster. Internally, TiDB uses the Raft protocol to ensure data consistency. Its compute layer (TiDB Server) is stateless and can be deployed nearby, while its storage layer (TiKV) handles data sharding and replica placement according to our defined policies.

Core Implementation: Gluing the Edge and the Core with Code

1. Ansible: Automating a High-Availability TiDB Cluster Deployment

Manually deploying a production-grade TiDB cluster is incredibly tedious and error-prone. We use Ansible to standardize this process. The key here is idempotence, ensuring the cluster converges to the desired state no matter how many times the playbook is run.

Below is a simplified Ansible Playbook structure for deploying core TiDB components on a three-node cluster.

inventory.ini file:

[tidb_servers]
192.168.1.10
192.168.1.11

[pd_servers]
192.168.1.10
192.168.1.11
192.168.1.12

[tikv_servers]
192.168.1.10
192.168.1.11
192.168.1.12

[monitoring_servers]
192.168.1.13

[all:vars]
ansible_user=admin
ansible_ssh_private_key_file=~/.ssh/id_rsa
# TiDB cluster version and deployment directory
tidb_version="v7.5.0"
deploy_dir="/opt/tidb-deploy"
data_dir="/opt/tidb-data"

deploy_tidb.yml Playbook:

- hosts: all
  become: true
  tasks:
    - name: Create system user for tidb
      user:
        name: tidb
        shell: /bin/bash
        state: present

    - name: Create deployment and data directories
      file:
        path: "{{ item }}"
        state: directory
        owner: tidb
        group: tidb
        mode: '0755'
      loop:
        - "{{ deploy_dir }}"
        - "{{ data_dir }}"

- hosts: pd_servers
  become: true
  tasks:
    - name: Deploy and configure PD server
      # In a real project, this would be handled using TiUP (TiDB's cluster manager)
      # or by directly managing systemd services.
      block:
        - name: Download TiUP
          get_url:
            url: "http://tiup-mirrors.pingcap.com/tiup-v1.12.0-linux-amd64.tar.gz"
            dest: "/tmp/tiup.tar.gz"
        
        - name: Install TiUP
          # Extract the tiup binary; invoking `tiup cluster` once pulls in the cluster component.
          shell: "tar -zxvf /tmp/tiup.tar.gz -C /usr/local/bin && tiup cluster"
        
        - name: Generate initial cluster configuration
          command: >
            su - tidb -c "tiup cluster deploy my-cluster {{ tidb_version }} ./topology.yaml --user root -i {{ ansible_ssh_private_key_file }}"
          # topology.yaml describes the cluster layout and is a key part of a production setup
          # (a minimal sketch follows this playbook). In production it would be generated with the
          # template module, and this deploy step would run only once.
          when: inventory_hostname == groups['pd_servers'][0]
          
- hosts: all
  become: true
  tasks:
    - name: Start the TiDB cluster
      # Run exactly once, from the first PD host.
      command: su - tidb -c "tiup cluster start my-cluster"
      when: inventory_hostname == groups['pd_servers'][0]

A common pitfall is network configuration. Vercel Functions have dynamic egress IPs, so you can’t simply whitelist them in your database firewall. The production-ready solutions are:

  1. Use Vercel’s “Secure Compute” feature to get static egress IPs.
  2. Set up a VPC on your self-hosted cloud and deploy a NAT gateway or proxy within it. The Vercel function then accesses the database through this proxy.
  3. The most direct, but slightly less secure, method is to allow access from all of Vercel’s IP ranges, which necessitates strict database-level authentication and TLS encryption.

We chose option 2, fronting the TiDB servers with an HAProxy layer that handles load balancing and access control; a minimal sketch of that configuration follows.
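
The addresses, ports, and CIDR ranges below are placeholders rather than our production values; the point is the shape of the configuration, not the specific numbers.

# haproxy.cfg -- illustrative sketch for fronting the TiDB servers
defaults
    mode tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend tidb_frontend
    bind *:4000
    # Only accept traffic arriving through the VPC proxy/NAT path (placeholder CIDR).
    acl trusted_src src 10.0.0.0/8
    tcp-request connection reject unless trusted_src
    default_backend tidb_servers

backend tidb_servers
    balance leastconn
    # TiDB speaks the MySQL protocol on port 4000; hosts are placeholders.
    server tidb-sg  10.10.1.10:4000 check
    server tidb-fra 10.20.1.10:4000 check
    server tidb-va  10.30.1.10:4000 check

Running HAProxy in TCP mode keeps it protocol-agnostic: TLS is negotiated end to end between the client and TiDB, while HAProxy only enforces network-level access control and spreads load across the stateless TiDB servers.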

2. Vercel Function: Handling Database Connections Gracefully

The stateless and ephemeral nature of serverless environments poses a significant challenge for database connection management. Creating a new TCP connection to TiDB for every single request would generate immense overhead and could quickly exhaust the database’s connection limit.

We need a strategy to reuse connections across function invocations. In a Node.js environment, the database connection instance or connection pool can be cached in the module’s global scope.

api/data/[id].ts Backend API:

import { NextApiRequest, NextApiResponse } from 'next';
import mysql from 'serverless-mysql';

// Global variable; Vercel reuses this instance for "warm" function invocations.
// This configuration is critical for production.
let db_conn: ReturnType<typeof mysql> | undefined;

function getDbConnection() {
  if (db_conn) {
    // Reuse the cached instance; serverless-mysql itself checks and revives
    // the underlying connection as needed.
    return db_conn;
  }

  console.log('Initializing new database connection pool...');
  db_conn = mysql({
    config: {
      host: process.env.TIDB_HOST,
      port: parseInt(process.env.TIDB_PORT || '4000', 10),
      database: process.env.TIDB_DATABASE,
      user: process.env.TIDB_USER,
      password: process.env.TIDB_PASSWORD,
      ssl: {
        // Enforce TLS in production.
        rejectUnauthorized: true,
        // ca: fs.readFileSync('path/to/ca.pem').toString(),
      },
      // Driver-level connection timeout (ms), passed through to mysql2.
      connectTimeout: 5000,
    },
    // Key configuration for serverless-mysql.
    library: require('mysql2'), // Use the more performant mysql2 driver.
    maxRetries: 3, // Number of retries on connection failure.
    onConnect: () => {
      // Hook for per-connection setup (e.g. session variables).
      console.log('Database connection established.');
    },
    onClose: () => {
      console.log('Database connection closed.');
    }
  });
  return db_conn;
}

export default async function handler(
  req: NextApiRequest,
  res: NextApiResponse
) {
  const { id } = req.query;

  if (req.method !== 'GET') {
    return res.status(405).json({ message: 'Method Not Allowed' });
  }

  try {
    const db = getDbConnection();
    
    // Enable TiDB's Stale Read feature to read historical data from a few seconds ago from the nearest replica.
    // For read requests that aren't sensitive to real-time data, this can significantly reduce latency and load on TiKV.
    // This is a huge advantage of TiDB over traditional MySQL in distributed scenarios.
    await db.query('SET @@session.tidb_read_staleness = -5'); // Read data from 5 seconds ago

    const results = await db.query(
      'SELECT id, title, content, author, created_at FROM articles WHERE id = ? LIMIT 1',
      [id]
    );

    // Release the connection back to serverless-mysql's management before the function finishes.
    // The library expects an explicit end() at the end of each invocation so it can reuse or reap the connection.
    await db.end();

    if (results && results.length > 0) {
      res.status(200).json(results[0]);
    } else {
      res.status(404).json({ message: 'Article not found' });
    }
  } catch (error) {
    console.error('Database query failed:', error);
    // Obfuscate specific database errors to prevent information leakage.
    res.status(500).json({ message: 'An internal server error occurred.' });
  }
}

The core ideas in this code are:

  1. Connection Pool Reuse: db_conn is defined outside the handler, leveraging the Node.js module caching mechanism. As long as Vercel’s execution environment (a “warm” function) isn’t recycled, subsequent requests will reuse the initialized connection pool.
  2. Production-Grade Configuration: We use the serverless-mysql library, which is specifically optimized for connection management in serverless environments, including automatic reconnections and timeout handling.
  3. Leveraging TiDB Features: SET @@session.tidb_read_staleness = -5 is a powerful optimization. It tells TiDB that queries on this session don’t require the absolute latest data and can be served from any replica holding data as of 5 seconds ago. This lets TiDB serve the request from the lowest-latency replica, dramatically improving read performance for global users (a per-statement alternative is sketched just after this list).
  4. Robust Error Handling: Database exceptions are caught, and a generic 500 error is returned to the client, while detailed logs are recorded on the server.
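
The session-level variable in point 3 applies to every statement on that connection. TiDB also offers per-statement stale reads through the AS OF TIMESTAMP clause, which we could use when only specific queries should tolerate stale data. A sketch, reusing the instance returned by getDbConnection(); the column list matches the handler above and the 5-second window is an arbitrary choice:

// Per-statement stale read via TiDB's AS OF TIMESTAMP clause.
// `db` is the serverless-mysql instance from getDbConnection() in the handler above.
async function getArticleStale(
  db: { query: (sql: string, values?: unknown[]) => Promise<unknown> },
  id: string
) {
  // Only this statement tolerates ~5 seconds of staleness; other queries on the
  // same connection keep their default, strongly consistent reads.
  return db.query(
    `SELECT id, title, content, author, created_at
       FROM articles AS OF TIMESTAMP NOW() - INTERVAL 5 SECOND
      WHERE id = ? LIMIT 1`,
    [id]
  );
}

This keeps writes and freshness-sensitive reads strongly consistent by default, while hot read paths opt in to staleness explicitly.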

3. Jotai: Managing Distributed State on the Frontend

Frontend state management also becomes more complex. Since the backend data is globally distributed, the state observed by different users may have slight delays. Optimistic UI updates and handling eventual consistency become critical. Jotai’s atomic nature and asynchronous capabilities make it an excellent fit for this complexity.

We define an atom to fetch article data and use Jotai’s utilities to handle loading and error states.

store/articleAtoms.ts:

import { atom } from 'jotai';
import { loadable } from 'jotai/utils';

// Base atom to store the ID of the article currently being viewed.
export const articleIdAtom = atom<string | null>(null);

// Derived atom to asynchronously fetch article data.
// This is the core logic; it depends on articleIdAtom.
const articleDataFetchAtom = atom(async (get) => {
  const id = get(articleIdAtom);
  if (!id) {
    return null;
  }
  
  // Make the API request.
  const response = await fetch(`/api/data/${id}`);

  if (!response.ok) {
    if (response.status === 404) {
      throw new Error('NotFound');
    }
    throw new Error('Failed to fetch article data');
  }

  return response.json();
});

// Use Jotai's 'loadable' utility to gracefully handle async states.
// This atom doesn't throw; it wraps the state into an object like { state: 'loading' | 'hasData' | 'hasError', data?, error? }
export const articleAtom = loadable(articleDataFetchAtom);

Usage in a React component:

import { useAtom, useSetAtom } from 'jotai';
import { articleIdAtom, articleAtom } from '../store/articleAtoms';
import { useEffect } from 'react';

const ArticleViewer = ({ initialId }: { initialId: string }) => {
  const setArticleId = useSetAtom(articleIdAtom);
  const [articleState] = useAtom(articleAtom);

  // Set the initial ID when the component mounts.
  useEffect(() => {
    setArticleId(initialId);
  }, [initialId, setArticleId]);

  switch (articleState.state) {
    case 'loading':
      return <div>Loading article...</div>;
    case 'hasError': {
      // Error handling can be more granular here.
      const message = articleState.error instanceof Error ? articleState.error.message : '';
      if (message === 'NotFound') {
        return <div>Article not found.</div>;
      }
      return <div>Error loading article. Please try again.</div>;
    }
    case 'hasData': {
      const article = articleState.data;
      if (!article) {
        return <div>Select an article to view.</div>;
      }
      return (
        <article>
          <h1>{article.title}</h1>
          <p>By {article.author}</p>
          <hr />
          <div>{article.content}</div>
        </article>
      );
    }
    default:
      return <div>Select an article to view.</div>;
  }
};

Jotai’s strength lies in its fine-grained reactivity. A change in articleIdAtom only triggers a re-computation of its dependent, articleDataFetchAtom, without causing unnecessary re-renders across the application. The loadable utility abstracts away the boilerplate of asynchronous logic, allowing component code to focus purely on UI presentation—a crucial advantage when dealing with state from a complex backend.
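
The atoms above only cover the read path. The optimistic-update concern raised at the start of this section can be handled with a write-only atom that applies a change locally first and then reconciles with, or rolls back from, the server response. Below is a sketch under two assumptions: a simplified Article shape, and a hypothetical PATCH handler on the /api/data/[id] route (the handler shown earlier only implements GET).

import { atom } from 'jotai';

// Simplified shape for illustration; the real schema lives in the API layer.
interface Article {
  id: string;
  title: string;
  author: string;
  content: string;
}

// Optimistic local copy of the article being edited.
export const draftArticleAtom = atom<Article | null>(null);

// Write-only atom: update local state immediately, then reconcile with the server.
export const updateArticleAtom = atom(
  null,
  async (get, set, patch: Partial<Article> & { id: string }) => {
    const previous = get(draftArticleAtom);
    // 1. Optimistic update: the UI reflects the change before the network round-trip.
    set(draftArticleAtom, previous ? { ...previous, ...patch } : null);
    try {
      // Hypothetical endpoint; the existing route would need a PATCH branch.
      const response = await fetch(`/api/data/${patch.id}`, {
        method: 'PATCH',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(patch),
      });
      if (!response.ok) {
        throw new Error('Failed to update article');
      }
      // 2. Reconcile with the authoritative row returned by the server.
      set(draftArticleAtom, (await response.json()) as Article);
    } catch (error) {
      // 3. Roll back the optimistic change so the UI never keeps unpersisted data.
      set(draftArticleAtom, previous);
      throw error;
    }
  }
);

A component triggers it with const update = useSetAtom(updateArticleAtom) and await update({ id, title: '...' }), catching the rethrown error to surface a retry prompt.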

Limitations and Future Iterations

While this hybrid architecture solves our core problem, it also introduces new complexities. First, the operational boundary becomes blurred: the frontend and backend/SRE teams must collaborate closely, because the performance of a Vercel Function is directly tied to the health of the self-hosted database. Network latency is also ultimately bounded by physics; Stale Reads mitigate read latency, but write operations still have to be synchronized across data centers via the Raft protocol, and that write latency is a trade-off we accept.

Second, the cost model is more complex. We must account for Vercel’s function invocations and execution duration, alongside the multi-cloud costs for VMs, storage, and cross-region data transfer. Accurate cost attribution and optimization require a more mature observability stack.

Future optimization paths could focus on the following:

  1. Data Placement Strategy: Deeply leverage TiDB’s Placement Rules feature to co-locate the Raft Leader and replicas of data with high regional affinity (e.g., user profiles) in the corresponding region, achieving “data locality” at the database layer (see the SQL sketch after this list).
  2. Connection Optimization: Explore introducing a lightweight, globally distributed proxy layer (like Envoy) between the Vercel Edge Functions and the TiDB cluster. This layer could manage connection pooling and intelligent routing to further reduce connection overhead on cold starts.
  3. CQRS Introduction: For write-intensive applications, consider implementing the Command Query Responsibility Segregation (CQRS) pattern. Write operations could be processed asynchronously via a message queue, while Vercel Functions would only read from materialized views, achieving ultimate read/write separation and performance.
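
As a concrete illustration of the first item, TiDB’s Placement Rules in SQL (GA since v6.1, so available on the v7.5.0 cluster above) can pin a table’s Raft leaders and replicas by label. The policy and table names below are hypothetical, and the region values assume TiKV stores labeled as in the earlier topology sketch:

-- Hypothetical policy: keep the Raft leader for APAC-affine data in Singapore,
-- with follower replicas spread across the other two regions.
CREATE PLACEMENT POLICY apac_primary
    PRIMARY_REGION = "ap-southeast-1"
    REGIONS = "ap-southeast-1,eu-central-1,us-east-1";

-- user_profiles_apac is an illustrative table, not part of the schema shown earlier.
ALTER TABLE user_profiles_apac PLACEMENT POLICY = apac_primary;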
