News

Stay tuned for the latest updates and announcements from Synapptech.

How We Scaled Our Next.js App to 3,000 Concurrent Users—And Beyond

Greetings from our Application Architect

June 12, 2025

Intro

Like any modern tech team, we want our apps to not just survive, but thrive under real-world spikes in user traffic. Recently, we put our Next.js app to the ultimate stress test, simulating thousands of simultaneous users to measure, optimize, and—honestly—flex our scaling capabilities.

Here’s how we took our Next.js site from failing roughly a third of its requests under a 1,000-user spike to confidently serving over 1,300 requests per second, with a 99.94% success rate, at a concurrency level that would humble most production workloads.

The Journey: From Bottlenecks to Bragging Rights

Round 1: The Humble Beginnings

We kicked off with a k6 spike test simulating 1,000 virtual users (VUs). The results were…let’s just say educational:

  • HTTP failure rate: 32.5%
  • Median response time: 378ms
  • Throughput: 207 requests/sec

An educational start, but clearly not enough to handle serious load: nearly a third of requests failed, mostly with timeouts or server resource exhaustion. We had to do better.

Step-by-Step Improvements

  1. Kubernetes HPA and Resource Tuning
    First, we moved from a single static replica to Kubernetes Horizontal Pod Autoscaling (HPA), letting K8s scale our deployment up automatically during heavy traffic (a minimal HPA manifest is sketched just after this list).
    We also increased CPU and memory limits to give each pod breathing room:
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "1000m"
        memory: "512Mi"
  2. Optimized Container Startup and Probes
    We set imagePullPolicy: IfNotPresent to avoid unnecessary image pulls and tweaked our liveness/readiness probes for faster failover and rolling updates (see the container-spec sketch after this list).
  3. Observability and Iterative Tuning
    We used kubectl top pods and Grafana dashboards to watch resource usage and tweak as we scaled, avoiding OOM kills and excessive restarts.
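
To make step 1 concrete, here is a minimal sketch of what such an HPA manifest can look like. The deployment name (nextjs-app), the replica bounds, and the 70% CPU target are illustrative assumptions, not our exact production values:

  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: nextjs-app-hpa             # hypothetical name, for illustration only
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: nextjs-app               # hypothetical Deployment name
    minReplicas: 2                   # assumed floor, so there is some headroom before a spike hits
    maxReplicas: 10                  # assumed ceiling, to cap cost during extreme spikes
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70   # assumed target: scale out when pods average over 70% CPU

With something like this in place, Kubernetes adds pods as CPU climbs during a spike and scales back down once traffic subsides.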
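
Step 2 translates into a few container-level settings in the Deployment spec. The sketch below assumes a Next.js server on its default port 3000 and probes against the site root; the image name, paths, and timings are placeholders rather than our exact configuration:

  containers:
    - name: nextjs-app                   # hypothetical container name
      image: registry.example.com/nextjs-app:1.2.3   # pinned tag, so IfNotPresent is safe
      imagePullPolicy: IfNotPresent      # skip the registry round-trip when the image is already cached
      ports:
        - containerPort: 3000            # Next.js default port (assumed here)
      readinessProbe:                    # gate traffic until the server actually responds
        httpGet:
          path: /                        # a dedicated health endpoint is better if one exists
          port: 3000
        initialDelaySeconds: 5
        periodSeconds: 5
        failureThreshold: 3
      livenessProbe:                     # restart the container if it stops responding
        httpGet:
          path: /
          port: 3000
        initialDelaySeconds: 10
        periodSeconds: 10
        failureThreshold: 3

Shorter probe intervals mean Kubernetes notices a struggling pod sooner, which is what keeps rolling updates and failover fast during a spike.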

Round 2: The Payoff

With those changes, our app handled more load, faster, with far fewer errors:

  • Throughput: 486 requests/sec
  • HTTP failure rate: 0%
  • Median response time: 31ms

But we didn’t stop there.

Final Boss: 3,000 Virtual Users

To really flex, we set k6 to simulate 3,000 concurrent users in a 1-minute spike.

  • Total HTTP requests: 81,452
  • Peak requests/sec: 1,339
  • Median latency: 55ms
  • 95th percentile latency: 348ms
  • Success rate: 99.94%
  • Total data served: 1.1GB in 1 minute

Out of over 162,000 checks, only 90 failed—a failure rate of 0.05%. And most users saw page loads in well under 100ms.

What Made the Difference

  • Kubernetes HPA: Seamless scaling up and down with traffic spikes.
  • Right-sized resources: Prevented pod OOM and throttling.
  • Efficient container config: Fast restarts, smart probe timing, and avoiding image re-pulls.
  • Observability: Watching real metrics, not guessing.

What’s Next?

At this point, further gains require deep backend/database optimization, global distribution, and more advanced edge protections (WAF, CDN, rate limiting) for real-world traffic and security.

But for now? We’re way beyond the basics.
We built something that can take a punch—and keep on serving.