Evolving a High-Availability PHP Application on GKE: From Synchronous Blocking to an Elastic Pub/Sub-Based Workflow


A request’s lifecycle shouldn’t exceed 500ms, but a core financial report generation API in our system was averaging 30 seconds. This endpoint was burdened with synchronous data aggregation, calculations, and PDF generation, making it a major bottleneck for our monolithic PHP application. During peak hours, the PHP-FPM process pool would become saturated, causing a cascade of 504 gateway timeouts for completely unrelated services. This was the primary technical debt our team had to tackle two sprints ago.

The initial proposed solution was simple and crude: vertical scaling with more powerful cloud servers. But this was a band-aid, not a cure. Costs increased linearly while performance gains saw diminishing returns. The root of the problem was the synchronous, blocking design pattern. Any time-consuming operation had to be decoupled and processed asynchronously.

Our stack was a Laravel-based PHP application deployed on traditional virtual machines. The team had already committed to a cloud-native migration, with Google Kubernetes Engine (GKE) as the target platform. Therefore, the solution had to align with this strategic direction. We needed more than a simple background job queue; we required a message-driven architecture that could seamlessly integrate with the GKE ecosystem and provide high availability and elastic scaling.

In our technology selection meeting, we compared several options:

  1. Redis + Supervisor: Easy to implement using our existing Redis instance. The problem, however, was that Redis Pub/Sub’s “fire-and-forget” model doesn’t guarantee message delivery. Using Lists as a queue meant we’d have to roll our own complex logic for retries, dead-lettering, and consumer management. Furthermore, managing processes with Supervisor isn’t “cloud-native” and couldn’t leverage K8s’ self-healing and scheduling capabilities.
  2. RabbitMQ: Powerful and supports various exchange types and message persistence. However, deploying and maintaining a highly available RabbitMQ cluster on GKE is a complex engineering task in its own right, requiring specialized operational knowledge.
  3. Google Cloud Pub/Sub: As a managed GCP service, it natively offers extremely high availability and virtually unlimited horizontal scaling. Its integration with GKE’s IAM and monitoring (Cloud Monitoring) is seamless. Its “at-least-once” delivery guarantee, message acknowledgement mechanism (ack/nack), dead-lettering, and subscriber pull/push models were a perfect match for our requirements. We chose it because it minimized operational complexity, allowing the team to focus on implementing business logic.

The entire refactoring effort was split into two two-week sprints. The goal of the first sprint was to develop the core message publishing and consumption logic and get it running in a local Docker environment. The second sprint focused on GKE deployment, elastic scaling configuration, and end-to-end stress testing.

Phase 1: Decoupling the Monolith, Implementing Message Production & Consumption

Our entry point was the controller method that blocked for 30 seconds. The goal was to refactor it to simply publish a “report generation” event and immediately return a 202 Accepted response.

1. Implementing the Publisher

We integrated the google/cloud-pubsub library into our existing Laravel application. To simplify management and configuration, we created a dedicated service class, ReportPublisherService.

<?php

namespace App\Services\GoogleCloud;

use Google\Cloud\PubSub\PubSubClient;
use Google\Cloud\PubSub\Topic;
use Psr\Log\LoggerInterface;
use Throwable;

class ReportPublisherService
{
    private PubSubClient $pubSubClient;
    private LoggerInterface $logger;
    private ?Topic $topic = null;
    private string $topicName;

    public function __construct(LoggerInterface $logger)
    {
        // In a production environment, projectId and keyFilePath should be injected via environment variables.
        // On GKE, if Workload Identity is configured, a keyFile is not needed as the SDK automatically acquires credentials.
        $this->pubSubClient = new PubSubClient([
            'projectId' => getenv('GCP_PROJECT_ID'),
            // 'keyFilePath' => storage_path('gcp-credentials.json')
        ]);
        $this->logger = $logger;
        $this->topicName = getenv('GCP_PUB_SUB_REPORT_TOPIC');
    }

    private function getTopic(): Topic
    {
        if ($this->topic === null) {
            $this->topic = $this->pubSubClient->topic($this->topicName);
            // Check if the Topic exists, create it if not, for added robustness.
            if (!$this->topic->exists()) {
                $this->topic = $this->pubSubClient->createTopic($this->topicName);
            }
        }
        return $this->topic;
    }

    /**
     * Publishes a report generation job.
     *
     * @param int $reportId
     * @param array $params
     * @param string $traceId
     * @return bool
     */
    public function publishReportJob(int $reportId, array $params, string $traceId): bool
    {
        $data = json_encode([
            'report_id' => $reportId,
            'parameters' => $params,
            'timestamp' => microtime(true),
        ]);

        // Attributes are crucial for message filtering and tracing.
        $attributes = [
            'eventType' => 'ReportGenerationRequested',
            'traceId' => $traceId,
            'source' => 'api-monolith'
        ];

        try {
            $topic = $this->getTopic();
            $topic->publish([
                'data' => $data,
                'attributes' => $attributes,
            ]);
            $this->logger->info('Report job published successfully.', ['report_id' => $reportId, 'trace_id' => $traceId]);
            return true;
        } catch (Throwable $e) {
            // Production-grade error handling: must log detailed context.
            $this->logger->error('Failed to publish report job to Pub/Sub.', [
                'exception_message' => $e->getMessage(),
                'exception_trace' => $e->getTraceAsString(),
                'report_id' => $reportId,
                'trace_id' => $traceId,
            ]);
            return false;
        }
    }
}
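For completeness, one way to wire this service into Laravel is a singleton binding in a service provider, so the PubSubClient (and its underlying connection) is created once per request rather than on every call. The provider below is a sketch under that assumption, not our exact registration code:

<?php

namespace App\Providers;

use App\Services\GoogleCloud\ReportPublisherService;
use Illuminate\Support\ServiceProvider;
use Psr\Log\LoggerInterface;

class AppServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        // Singleton binding: reuse one PubSubClient per request lifecycle instead of constructing it per call.
        $this->app->singleton(ReportPublisherService::class, function ($app) {
            return new ReportPublisherService($app->make(LoggerInterface::class));
        });
    }
}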

In the controller, the long-running process was replaced with a single method call, and the API response time plummeted from 30 seconds to just 150 milliseconds.
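As a sketch of what that single method call looks like in practice (the Report model, route, and validation rules here are illustrative assumptions, not our exact production code):

<?php

namespace App\Http\Controllers;

use App\Models\Report; // hypothetical Eloquent model tracking report status
use App\Services\GoogleCloud\ReportPublisherService;
use Illuminate\Http\JsonResponse;
use Illuminate\Http\Request;
use Illuminate\Support\Str;

class ReportController extends Controller
{
    public function __construct(private ReportPublisherService $publisher)
    {
    }

    public function store(Request $request): JsonResponse
    {
        $params = $request->validate([
            'from' => 'required|date',
            'to'   => 'required|date',
        ]);

        // Persist a "pending" record first so the client can poll for status later.
        $report = Report::create(['status' => 'pending', 'parameters' => $params]);

        $traceId = (string) Str::uuid();

        if (!$this->publisher->publishReportJob($report->id, $params, $traceId)) {
            return response()->json(['error' => 'Could not queue report generation.'], 503);
        }

        // The heavy work now happens in the Pub/Sub worker; respond immediately.
        return response()->json(['report_id' => $report->id, 'status' => 'pending'], 202);
    }
}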

2. Building the Subscriber Worker

This was the core of the refactor. It’s no longer a PHP-FPM process triggered by an HTTP request, but an independent, long-running PHP CLI process dedicated to pulling and processing messages from a Pub/Sub subscription. We chose to write a vanilla PHP script without a heavyweight framework to ensure minimal memory footprint and the fastest possible startup time.

This worker had to solve several critical production challenges:

  • Graceful Shutdown: When GKE scales down or updates a Pod, it sends a SIGTERM signal. The worker must trap this signal, stop pulling new messages, and wait for the current message to finish processing before exiting. Otherwise, in-flight work would be cut short and the unacknowledged messages redelivered later, producing duplicate or partially completed processing.
  • Message Acknowledgement and Retries: Pub/Sub requires the consumer to send an ack (acknowledge) signal after successfully processing a message. If it doesn’t, the message will be redelivered after a timeout. If processing fails, it should send a nack (negative acknowledgement) to have Pub/Sub redeliver it immediately.
  • Memory Management: As a long-running process, it must be resilient to memory leaks. A common strategy is to periodically check memory usage within the loop and perform a graceful restart if necessary (a sketch follows the worker listing below).
  • Idempotency: Because Pub/Sub guarantees “at-least-once” delivery, the consumer logic must be idempotent. This means processing the same message multiple times must produce the same result as processing it just once; a minimal locking sketch follows this list.
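The idempotency point deserves a concrete shape. Below is a minimal sketch of a lock-plus-status check, assuming a phpredis connection and a reports table with a status column (neither is shown elsewhere in this post; key names, TTLs, and schema are illustrative):

<?php

// Minimal idempotency sketch; not the project's actual implementation.

function acquireProcessingLock(\Redis $redis, int $reportId, int $ttlSeconds = 600): bool
{
    // SET ... NX EX: only the first caller gets the lock, and it expires automatically
    // so a crashed worker cannot block the report forever.
    return (bool) $redis->set("report_processing:{$reportId}", gethostname(), ['nx', 'ex' => $ttlSeconds]);
}

function alreadyCompleted(\PDO $db, int $reportId): bool
{
    // Redelivered messages for a finished report are acknowledged without redoing the work.
    $stmt = $db->prepare('SELECT status FROM reports WHERE id = ?');
    $stmt->execute([$reportId]);
    return $stmt->fetchColumn() === 'completed';
}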

Here is the core implementation of the worker, worker.php:

<?php

require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\PubSub\PubSubClient;
use Google\Cloud\PubSub\Subscription;
use Psr\Log\LoggerInterface;
use Monolog\Logger;
use Monolog\Handler\StreamHandler;

// --- Configuration & Initialization ---
$projectId = getenv('GCP_PROJECT_ID');
$subscriptionName = getenv('GCP_PUB_SUB_REPORT_SUBSCRIPTION');
$maxMessages = (int)getenv('MAX_MESSAGES_PER_PULL') ?: 10;
$maxProcessingTime = (int)getenv('MAX_PROCESSING_TIME_SECONDS') ?: 300; // Max processing time for a single message

$logger = new Logger('pubsub-worker');
$logger->pushHandler(new StreamHandler('php://stdout', Logger::INFO));

$pubSubClient = new PubSubClient(['projectId' => $projectId]);
$subscription = $pubSubClient->subscription($subscriptionName);

// --- Graceful Shutdown Handler ---
$shutdown = false;
pcntl_async_signals(true);
pcntl_signal(SIGTERM, function ($signo) use (&$shutdown, $logger) {
    $logger->info("Caught SIGTERM. Shutting down gracefully...");
    $shutdown = true;
});
pcntl_signal(SIGINT, function ($signo) use (&$shutdown, $logger) {
    $logger->info("Caught SIGINT. Shutting down gracefully...");
    $shutdown = true;
});

$logger->info("Worker started. Listening for messages on subscription: {$subscriptionName}");

// --- Main Loop ---
while (!$shutdown) {
    try {
        $messages = $subscription->pull(['maxMessages' => $maxMessages, 'returnImmediately' => false]);

        if (empty($messages)) {
            // Long poll timed out, continue to the next iteration.
            // This is normal behavior. Add a short sleep to avoid CPU-intensive empty polling.
            usleep(500000); // 0.5 seconds
            continue;
        }

        $ackMessages = [];
        $nackMessages = [];

        foreach ($messages as $message) {
            $startTime = microtime(true);
            $messageData = json_decode($message->data(), true);
            $attributes = $message->attributes();
            $traceId = $attributes['traceId'] ?? 'unknown';

            $logger->info("Processing message.", ['message_id' => $message->id(), 'trace_id' => $traceId]);

            try {
                // --- Core Business Logic ---
                // This is where the actual report generation service is called.
                // Idempotency checks are a must, e.g., via messageId or a business ID.
                $isSuccess = processReport($messageData, $logger);

                if ($isSuccess) {
                    $ackMessages[] = $message;
                    $duration = microtime(true) - $startTime;
                    $logger->info("Message processed and acknowledged.", ['message_id' => $message->id(), 'trace_id' => $traceId, 'duration_ms' => $duration * 1000]);
                } else {
                    $nackMessages[] = $message;
                    $logger->warning("Message processing failed, sending nack.", ['message_id' => $message->id(), 'trace_id' => $traceId]);
                }
            } catch (Throwable $e) {
                $nackMessages[] = $message;
                $logger->error("Exception during message processing, sending nack.", [
                    'message_id' => $message->id(),
                    'trace_id' => $traceId,
                    'exception' => $e->getMessage()
                ]);
            }
        }

        // Batch acknowledgements and nacks are more efficient than per-message calls.
        // Note: acknowledgeBatch() and modifyAckDeadlineBatch() expect Message objects, not raw ack IDs.
        if (!empty($ackMessages)) {
            $subscription->acknowledgeBatch($ackMessages);
        }
        if (!empty($nackMessages)) {
            $subscription->modifyAckDeadlineBatch($nackMessages, 0); // A nack is equivalent to setting the ack deadline to 0, forcing an immediate redelivery.
        }

    } catch (Throwable $e) {
        // Exception while pulling messages, e.g., permissions issue.
        $logger->critical("Error pulling messages from Pub/Sub. Worker might be unhealthy.", [
            'exception_message' => $e->getMessage(),
            'exception_trace' => $e->getTraceAsString(),
        ]);
        // On critical error, wait a bit before retrying to avoid a death loop.
        sleep(5);
    }
}

$logger->info("Worker shutdown complete.");

/**
 * Mock business logic function.
 * @param array $data
 * @param LoggerInterface $logger
 * @return bool
 */
function processReport(array $data, LoggerInterface $logger): bool
{
    // Pseudocode:
    // 1. Check for idempotency: query DB or cache to see if report_id is already processed or in-progress.
    //    $lock = acquireLock('report_processing_' . $data['report_id']);
    //    if (!$lock) { return true; /* Considered success, as another process is handling it */ }
    //
    // 2. Execute core logic: aggregate data, generate PDF, etc.
    //    sleep(15); // Simulate time-consuming task
    //
    // 3. Update report status.
    //    updateReportStatus($data['report_id'], 'completed');
    //
    // 4. Release lock.
    //    releaseLock('report_processing_' . $data['report_id']);

    $logger->info("Simulating report generation for report_id: " . ($data['report_id'] ?? 'N/A'));
    // Simulate a random failure to test the nack logic.
    if (rand(1, 10) > 8) {
        $logger->warning("Simulated processing failure.");
        return false;
    }

    return true;
}
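Two items from the checklist above don’t appear in the listing: the periodic memory check and the /tmp/healthy readiness marker that the GKE manifest later relies on. A sketch of how they could be wired into worker.php, with an illustrative threshold:

// Sketch only: additions to worker.php for the memory check and readiness marker.
// The 200 MB threshold is an assumption; it should stay below the container's memory limit.

$memoryThresholdBytes = 200 * 1024 * 1024;

// On startup, once the Pub/Sub client is constructed, signal readiness
// for the exec-based readinessProbe shown in the Deployment below.
touch('/tmp/healthy');

// Inside the main while (!$shutdown) loop, after each batch has been acknowledged:
if (memory_get_usage(true) > $memoryThresholdBytes) {
    $logger->warning('Memory threshold exceeded; exiting so Kubernetes can restart the container.');
    $shutdown = true; // finish the current iteration, then exit cleanly
}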

Phase 2: GKE Deployment & Elastic Scaling

With local validation complete, we moved to the cloud-native deployment phase. The focus here was on configuring Kubernetes resources to run our PHP worker reliably and efficiently, allowing it to scale automatically based on workload.

1. Containerizing the Worker

We created a dedicated Dockerfile for the worker, following multi-stage build best practices to minimize the final image size and enhance security.

# --- Build Stage ---
FROM php:8.1-cli-alpine AS builder
WORKDIR /app

# Install system dependencies for extensions
RUN apk add --no-cache \
    $PHPIZE_DEPS \
    libzip-dev \
    zlib-dev

# Install PHP extensions required by google/cloud-pubsub and others
RUN docker-php-ext-install pcntl zip
RUN pecl install grpc && docker-php-ext-enable grpc
RUN pecl install protobuf && docker-php-ext-enable protobuf

# Install Composer
COPY --from=composer:latest /usr/bin/composer /usr/bin/composer

COPY composer.json composer.lock ./
RUN composer install --no-dev --no-interaction --no-scripts --optimize-autoloader

COPY . .

# --- Final Stage ---
FROM php:8.1-cli-alpine
WORKDIR /app

# Runtime libraries required by the compiled grpc and zip extensions on Alpine
RUN apk add --no-cache libstdc++ libzip

# Create a non-root user for security
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

# Copy only necessary files from the builder stage
COPY --from=builder /app/vendor ./vendor
COPY --from=builder /app/worker.php ./worker.php
# The ini files only enable the extensions; the compiled .so files must come along too,
# otherwise PHP in the final image cannot load grpc, protobuf, pcntl, or zip
COPY --from=builder /usr/local/lib/php/extensions/ /usr/local/lib/php/extensions/
COPY --from=builder /usr/local/etc/php/conf.d/docker-php-ext-grpc.ini /usr/local/etc/php/conf.d/
COPY --from=builder /usr/local/etc/php/conf.d/docker-php-ext-protobuf.ini /usr/local/etc/php/conf.d/
COPY --from=builder /usr/local/etc/php/conf.d/docker-php-ext-pcntl.ini /usr/local/etc/php/conf.d/
COPY --from=builder /usr/local/etc/php/conf.d/docker-php-ext-zip.ini /usr/local/etc/php/conf.d/

# Command to run the worker
CMD ["php", "worker.php"]

2. Kubernetes Deployment Configuration

The Deployment is the core of our setup. Here, we defined the Pod template, including the container image, resource requests/limits, environment variables, and the crucial liveness and readiness probes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-report-worker
  labels:
    app: php-report-worker
spec:
  replicas: 2 # Initial replica count
  selector:
    matchLabels:
      app: php-report-worker
  template:
    metadata:
      labels:
        app: php-report-worker
    spec:
      # Use Workload Identity for secure access to GCP services instead of service account keys
      serviceAccountName: pubsub-worker-sa 
      terminationGracePeriodSeconds: 360 # Give enough time to finish processing current messages before shutdown
      containers:
      - name: worker
        image: gcr.io/your-project-id/php-report-worker:v1.0.0
        env:
        - name: GCP_PROJECT_ID
          value: "your-project-id"
        - name: GCP_PUB_SUB_REPORT_TOPIC
          value: "report-generation-topic"
        - name: GCP_PUB_SUB_REPORT_SUBSCRIPTION
          value: "report-generation-subscription"
        - name: MAX_MESSAGES_PER_PULL
          value: "10"
        - name: MAX_PROCESSING_TIME_SECONDS
          value: "300"
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "256Mi"
            cpu: "500m"
        # Liveness probe: K8s restarts the container if the worker process dies
        livenessProbe:
          exec:
            command:
            - pidof
            - php
          initialDelaySeconds: 30
          periodSeconds: 60
        # Readiness probe: a pull-based worker receives no inbound traffic, but a simple
        # file-existence check still signals that startup (credentials, Pub/Sub connection) completed
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy # This file can be created by worker.php on startup
          initialDelaySeconds: 15
          periodSeconds: 30

The terminationGracePeriodSeconds setting is particularly crucial. It must be longer than the maximum processing time for a single message (here 360 seconds against the 300-second MAX_PROCESSING_TIME_SECONDS budget), so that after Kubernetes sends SIGTERM the pod has enough time to finish its in-flight messages and complete the graceful shutdown logic.

3. Configuring Elastic Scaling (HPA)

This is the most valuable part of the cloud-native solution. We want the number of worker processes to adjust automatically based on the number of backlogged messages. When there are no tasks, we maintain a small number of replicas; when tasks surge, we scale up rapidly to clear the queue. This is achieved with a HorizontalPodAutoscaler (HPA) and a custom metric from Cloud Monitoring.

First, we need a metric that exposes the Pub/Sub subscription’s message backlog. Cloud Monitoring already provides pubsub.googleapis.com/subscription/num_undelivered_messages; to make it visible to the HPA, the cluster has to serve it through the Kubernetes external metrics API, which on GKE is typically done by deploying the Custom Metrics Stackdriver Adapter.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-report-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-report-worker
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: External
    external:
      metric:
        name: pubsub.googleapis.com|subscription|num_undelivered_messages
        selector:
          matchLabels:
            # These labels must exactly match the resource labels in Cloud Monitoring
            resource.labels.subscription_id: "report-generation-subscription" 
      # target.averageValue specifies the desired average number of messages per Pod
      # The HPA will trigger a scale-up when the backlog exceeds 10 * current Pod count
      target:
        type: AverageValue
        averageValue: "10"

This configuration tells the HPA to continuously monitor the number of undelivered messages in the report-generation-subscription. It will adjust the replica count of the php-report-worker deployment to try and maintain an average of 10 backlogged messages per pod. For example, when the backlog reaches 100 messages, the HPA will attempt to scale the number of pods to 10. As the backlog decreases, it will scale back down to a minimum of 2 replicas, achieving a balance between cost and efficiency.

Architecture Blueprint

The entire workflow can be visualized as follows:

sequenceDiagram
    participant User
    participant API as Monolith API (GKE)
    participant Topic as Pub/Sub Topic
    participant Sub as Pub/Sub Subscription
    participant Worker as Worker Pods (GKE HPA)

    User->>+API: POST /api/reports (Request report generation)
    API->>+Topic: Publish(message)
    Topic-->>-API: Publish Acknowledged
    API-->>-User: HTTP 202 Accepted

    Topic->>Sub: Forward Message

    Note right of Worker: HPA monitors message backlog and scales pods dynamically

    loop Pull & Process
        Worker->>+Sub: Pull(maxMessages=10)
        Sub-->>-Worker: Return Messages
        Worker->>Worker: Process Message (Idempotent)
        Worker->>+Sub: Acknowledge(messageId)
        Sub-->>-Worker: Ack Confirmed
    end

After two sprints of focused effort, we successfully transformed a critical, system-blocking synchronous task into a high-availability, auto-scaling asynchronous workflow. The system’s overall responsiveness and stability improved dramatically. In subsequent stress tests, even with a sudden influx of tens of thousands of report requests, the API endpoint maintained a response time under 200ms. The worker deployment, managed by the HPA, scaled out to 50 replicas within minutes, smoothly processed the entire backlog, and then scaled back down to the minimum replica count automatically.

This solution isn’t without its limitations. Logging and tracing are currently fragmented, making it difficult to trace a single request across the API and worker. The next step is to introduce OpenTelemetry for distributed tracing. Additionally, while the dead-letter queue is configured, we still need to build automated alerting and reprocessing mechanisms for messages that ultimately fail. This architecture provides a reliable blueprint for evolving PHP applications in the cloud-native era, proving that even traditional PHP stacks, when paired with modern architectural design, are more than capable of handling large-scale, high-concurrency cloud workloads.

