Article 11: System Design for QA: What You’ll Be Asked in Interviews (and How to Not Bomb It)
Everything you need to understand about system architecture to stop testing blindly

Hi! I’m Atajan, a QA engineer with 5+ years in testing. Over the past year I went through about twenty interviews for Senior QA positions, and here’s what I noticed: System Design questions started showing up almost everywhere. It used to be that only developers got these, but now interviewers expect testers to understand how the system works under the hood too.
And they don’t ask you to “design Twitter” (that’s still for backend engineers). It’s more like: “We have microservices, the database replicates, there’s a cache and a message queue - what would you test?” If you just stare blankly at that point… well, you’re probably not getting the offer.
I wrote this for people who haven’t really dealt with System Design before. No prior knowledge needed - I’ll explain everything from scratch. If you already know what a load balancer is, some parts will be boring, but maybe you’ll find something new in the queues or monitoring sections.
Okay, But Why Should QA Even Bother?
Honest answer: because without understanding the architecture you’re testing blind.
Here’s a real situation. A tester ran all tests on staging, everything green, deploy to production. An hour later the site is down. Turns out staging had one server, but production had three behind a load balancer, and successive requests from the same user landed on different servers while the session was stored in memory on just one of them. You can’t catch that with tests if you don’t even know load balancers and sticky sessions exist.
Or this: QA tests an API, everything works. Then it turns out the responses were coming from cache, and the actual data in the database was completely different. If you don’t know Redis sits between the client and the database, you won’t even know where to look.
Bottom line - System Design for QA isn’t about designing systems. It’s about understanding where they can break.
Load Balancer
Let’s start simple. You have a web app running on a server. One server handles, say, 500 requests per second. 2000 users show up at once - the server chokes, site goes down.
Solution: set up multiple servers and put something in front of them that distributes requests. That’s the load balancer. Basically a dispatcher. Sort of like the reception desk at a hospital - you don’t pick which doctor to see, they assign you. The analogy is wonky because hospitals assign by specialty and this is more about workload, but you get the idea.
There are several ways to distribute requests:
- Round Robin - take turns: first request to server 1, second to server 2, third to server 3, fourth back to 1. Dumb, but works.
- Least Connections - to whichever server has the fewest active connections. Makes more sense, but slightly more complex.
- IP Hash - the same user always lands on the same server. Useful if sessions are stored locally.
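To make these three strategies concrete, here’s a minimal Python sketch. The server names and connection counts are invented for illustration; a real balancer (nginx, HAProxy) does this inside its own config, not in application code:

```python
import zlib
from itertools import cycle

# Hypothetical server pool - names are made up for illustration.
servers = ["srv-1", "srv-2", "srv-3"]

# Round Robin: just cycle through the pool in order.
rr = cycle(servers)

def round_robin():
    return next(rr)

# Least Connections: pick the server with the fewest active connections.
active = {"srv-1": 12, "srv-2": 3, "srv-3": 7}

def least_connections():
    return min(active, key=active.get)

# IP Hash: the same client IP always maps to the same server.
# crc32 is used here only because it is deterministic across runs.
def ip_hash(ip: str) -> str:
    return servers[zlib.crc32(ip.encode()) % len(servers)]
```

One caveat worth mentioning in an interview: naive IP hashing reshuffles almost every user when a server is added or removed, which is why real balancers tend to use consistent hashing for this strategy.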
What do you test here? Plenty. The most obvious - kill a server and see what happens. Does the user see an error? Or does the balancer silently switch them to another server? And if the user was logged in on the server that died - is the session gone? Did they get kicked out of their account?
Another thing: health checks. The balancer needs to know a server is dead and stop sending it requests. But how fast? If it checks every 30 seconds, then for half a minute some users will be getting errors. If you mention this in an interview, that’s a plus.
Caching
Cache is the thing that regularly ruins testers’ lives. You update data, check the page - old data. Ctrl+F5 - new data. Cool, it works? No, it doesn’t work, because normal users don’t press Ctrl+F5 - they just keep seeing the stale page.
How it works: between the client and the database there’s an intermediate “memory.” Usually Redis or Memcached. When a request comes in, the system checks the cache first. If the data is there (cache hit) - it responds instantly without even touching the database. If not (cache miss) - it goes to the database, gets the data, saves a copy in the cache for some time, and responds to the client.
```
Client request
      |
      v
+-----------+   cache hit   +-----------+
|   CACHE   | ------------> | Response  |
|  (Redis)  |               | instantly |
+-----------+               +-----------+
      |
      | cache miss
      v
+-----------+   save to     +-----------+
|    DB     | ------------> |   CACHE   |
| (Postgres)|   cache       |  (Redis)  |
+-----------+               +-----------+
      |
      v
+-----------+
| Response  |
| to client |
+-----------+
```
Cache exists at different levels: in the browser (images, CSS), on the CDN (more on that later), on the server (Redis), in the database itself (query cache). And at every level it can screw you over.
The main pain - cache invalidation. That’s when the data in the database changed but the cache still holds the old version. Classic bug: user changed their avatar, but all their friends still see the old one. Or worse: the price of a product changed, but the cache serves the old price - the user thinks they’re paying one amount, but gets charged another.
In interviews they often ask: “The user updated their profile but sees old data - what do you check?” Answer: CDN cache, browser cache, Redis cache, Cache-Control and ETag headers. Open an incognito window - if the data is new there, it’s a browser cache problem. If it’s old there too - dig into the server cache.
Databases: Replication and Sharding
Two separate topics that people often confuse.
Replication is when you have one main database (master) and several copies (replicas). Data always gets written to master, and reads come from replicas. Why? Because reads usually account for 90% of the load, and if all requests go to one database, it won’t handle it.
```
INSERT / UPDATE / DELETE               SELECT (reads)
          |                            /             \
          v                           v               v
    +-----------+              +-----------+     +-----------+
    |  MASTER   | -----------> |  REPLICA  |     |  REPLICA  |
    |  (write)  |  replication |  (read)   |     |  (read)   |
    +-----------+              +-----------+     +-----------+
```
There’s a gotcha here that I ran into on a real project. User creates a post, gets redirected to their profile page. Post doesn’t show up. Refreshes the page - it appears. What happened? The write went to master, but the read came from a replica that hadn’t synced yet. This is called replication lag, and it’s a real bug that’s hard to catch with straightforward testing.
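A common mitigation for that bug is read-your-writes routing: right after a user writes, their reads go to master instead of a replica. Here’s a toy Python model of the situation - dicts instead of real databases, with replication stepped by hand to simulate lag:

```python
# Toy model of replication lag: the replica applies writes with a delay.
master = {}
replica = {}
replication_log = []   # writes waiting to be applied on the replica

def write(key, value):
    master[key] = value
    replication_log.append((key, value))   # will reach the replica later

def replicate_one():
    # In a real setup this happens asynchronously; here we step it by hand.
    if replication_log:
        key, value = replication_log.pop(0)
        replica[key] = value

def read(key, after_own_write=False):
    # Read-your-writes: route the user's read to master right after that
    # same user wrote, so they never see data older than their own write.
    source = master if after_own_write else replica
    return source.get(key)
```

Until `replicate_one()` runs, a plain read returns nothing - the "post disappeared after refresh" bug in miniature.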
Sharding is a different story. That’s when one database can’t hold all the data (or can’t handle the load), and you split the data into chunks. Users with names A-M go to one database, N-Z to another. Each chunk is a shard.
```
      All users (1,000,000)
               |
   Shard Key: first letter of name
               |
         +-----+-----+
         |           |
         v           v
    +---------+ +---------+
    | SHARD 1 | | SHARD 2 |
    |  A - M  | |  N - Z  |
    | 520,000 | | 480,000 |
    +---------+ +---------+
```
For QA the main problem here is queries that touch both shards. For example, “show all users sorted by registration date.” If users are spread across two databases, someone has to gather data from both and sort it. Sometimes this works poorly or slowly. Worth checking.
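That "gather data from both and sort" step is usually called scatter-gather. A toy sketch, with two Python lists standing in for shards (the names and dates are invented):

```python
# Two shards keyed by first letter of name; each row is (name, signup_date).
shard1 = [("alice", "2024-01-05"), ("mike", "2024-03-01")]   # names A-M
shard2 = [("nina", "2024-02-10"), ("zoe", "2024-01-20")]     # names N-Z

def shard_for(name):
    # Routing by shard key: a single-user lookup touches exactly one shard.
    return shard1 if name[0].lower() <= "m" else shard2

def all_users_by_date():
    # Cross-shard query: scatter to every shard, gather the results,
    # then sort in the application layer - this is the slow path.
    return sorted(shard1 + shard2, key=lambda user: user[1])
```

The single-user lookup is cheap because the shard key routes it; the sorted listing has to pull from every shard, which is why cross-shard queries are the thing to load-test.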
Message Queues
Say a user registers. You need to: save them to the database, send a welcome email, create a profile, award a bonus, send a push notification. If you do all this synchronously, the user waits 10 seconds for the page to load. Or even worse, if the email service isn’t responding, the entire registration process hangs.
That’s why queues were invented. The main service puts a task in the queue (RabbitMQ, Kafka, SQS - doesn’t matter), and a separate worker picks it up and processes it in the background. The user sees “Registration successful” in half a second, and the email arrives a few seconds later.
```
+----------+      +-----------------------------------+      +----------+
| PRODUCER | ---> |           MESSAGE QUEUE           | ---> | CONSUMER |
| (service)|      |  [msg1] [msg2] [msg3] [msg4] ...  |      | (worker) |
+----------+      |      RabbitMQ / Kafka / SQS       |      +----------+
                  +-----------------------------------+
                          messages wait                       processed
                            in line                          one by one
```
What can go wrong? Oh, plenty:
- Message gets processed twice - user receives two identical emails. Happens if a worker crashes during processing, the message returns to the queue, and then another worker processes it too.
- Message gets lost - worker grabbed it, crashed, and it didn’t return to the queue. Email never sent at all.
- Queue overflows - the producer pushes tasks faster than the consumer can handle them. At some point everything grinds to a halt.
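The standard defense against the double-processing problem is an idempotent consumer: the worker remembers which message IDs it has already handled and skips redeliveries. A minimal Python sketch - the message shape is invented for illustration:

```python
sent_emails = []
processed_ids = set()   # message IDs this worker has already handled

def handle(message):
    # Idempotent consumer: dedupe by message id, so a message redelivered
    # after a worker crash is not processed a second time.
    if message["id"] in processed_ids:
        return "skipped duplicate"
    sent_emails.append(message["email"])   # the actual side effect
    processed_ids.add(message["id"])
    return "processed"
```

In production the `processed_ids` set would live in Redis or the database (with a TTL), not in worker memory - otherwise a restarted worker forgets everything it handled.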
Had this question in an interview: “User paid for an order, but the confirmation email arrived 2 hours later. Where do you look?” This is exactly about queues - either not enough consumers and the queue backed up, or there were processing errors and the message kept getting thrown into retry.
Microservices
When an app is small, all the code lives in one project - a monolith. One deploy, one database, one process. Simple. But when the team grows and features multiply, the monolith turns into a swamp: every change potentially breaks something elsewhere, deploying is scary, tests run for two hours.
So many teams switch to microservices: Auth separate, Users separate, Orders separate, Payments separate. Each service has its own code, its own database, its own deploy. They communicate over the network via API or queues.
```
     MONOLITH                     MICROSERVICES

+----------------+          +------+      +-------+
| Auth           |          | Auth | <--> | Users |
| Users          |          +------+      +-------+
| Orders         |             |              |
| Payments       |             v              v
| Notifications  |          +--------+      +----------+
|                |          | Orders | <--> | Payments |
| one process    |          +--------+      +----------+
| one deploy     |              |
| one DB         |              v
+----------------+          +------------------+
                            |  Notifications   |
                            +------------------+

                            each service = own
                            process, deploy, DB
```
For QA this is both good and bad. Good - because you can deploy and test one service without touching the rest. Bad - because bugs appear at the seams. The Orders service expects a field user_id from Users, but the Users developer renamed it to userId. Unit tests for both services are green. But in integration - 500 error.
Another interesting thing - circuit breaker. That’s when one service stops calling another if it’s not responding. Like a fuse in electrical wiring - it trips so it doesn’t fry everything else. Worth checking: if Payments is down, does Orders at least show “Payment temporarily unavailable,” or does it crash too?
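A circuit breaker can be sketched in a few lines of Python. This toy version only counts consecutive failures and stays open once tripped; real implementations (Hystrix, resilience4j, and friends) also have a half-open state that periodically probes whether the downstream service has recovered:

```python
class CircuitBreaker:
    """Minimal circuit breaker: after `threshold` consecutive failures,
    stop calling the downstream service and fail fast instead."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def call(self, func):
        if self.failures >= self.threshold:
            # Circuit is open: don't even touch the downstream service.
            return "Payment temporarily unavailable"
        try:
            result = func()
            self.failures = 0          # success resets the counter
            return result
        except Exception:
            self.failures += 1
            return "Payment temporarily unavailable"
```

From a QA angle, the thing to verify is exactly the fallback string: the caller degrades gracefully instead of propagating a crash up the chain.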
Scaling
This one’s short because the idea is simple. Two options:
Vertical - get a beefier server. Had 8 GB RAM - put in 64. Simple, but there’s a ceiling. The most powerful server in the world is still finite.
Horizontal - add more servers. Had 2 - now you have 20. Theoretically infinite, but you need a load balancer, need to think about sessions, data, synchronization.
In an interview they might ask: “The API responds in 200ms with 100 users. What happens at 10,000?” If it’s not scaled - response time grows, timeouts start, some users see errors. QA should be checking this with load testing (k6, JMeter) before traffic actually grows.
Monitoring
This one really gets to me. I’ve seen projects where load tests, unit tests, integration tests - everything exists. And in production the service goes down, and the team finds out 40 minutes later from a user complaint in a Telegram chat. 40 minutes! Because there’s no monitoring.
Google’s SRE book describes four signals to watch:
| Signal | What it is | When to raise alarm |
|---|---|---|
| Latency | Response time | API responding slower than 2 seconds |
| Traffic | Request count | Traffic suddenly dropped 80% |
| Errors | Error rate | 5xx errors exceeded 1% |
| Saturation | Resource load | CPU 95%, disk 90% full |
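In practice, monitoring boils that table down to threshold checks. Here’s a toy Python version using the example thresholds above - in real life these live in Prometheus/Grafana alert rules, not in application code:

```python
def check_signals(latency_ms, traffic_drop_pct, error_rate_pct, cpu_pct):
    # Thresholds taken from the table above; tune per project.
    alerts = []
    if latency_ms > 2000:          # API slower than 2 seconds
        alerts.append("latency")
    if traffic_drop_pct >= 80:     # traffic suddenly dropped 80%
        alerts.append("traffic")
    if error_rate_pct > 1:         # 5xx errors exceeded 1%
        alerts.append("errors")
    if cpu_pct >= 95:              # CPU saturated
        alerts.append("saturation")
    return alerts
```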
Monitoring can be split into levels. Bottom up:
```
Level 4: BUSINESS METRICS ........ conversion, revenue, signups
_______________________________________________________________
Level 3: APPLICATION MONITORING .. errors, latency, throughput
_______________________________________________________________
Level 2: INFRASTRUCTURE .......... CPU, RAM, disk, network
_______________________________________________________________
Level 1: UPTIME / AVAILABILITY ... is the site alive? SSL valid?
```
Start with level one - at minimum check that the site is actually alive. For my projects I use SiteGuard - it checks uptime every 10-15 minutes and sends alerts to Telegram. Not email, which I check once a day, but Telegram, which I always have open. It also monitors SSL certificates and checks that forms on the site work. The free plan is enough for pet projects.
By the way, in an interview “I set up monitoring and caught downtime before users did” is a really strong argument. Most QA folks don’t even think about monitoring, consider it a DevOps task. Big mistake.
What to check about monitoring on a project: are alerts even set up? Kill a service on staging - did the notification come? How many seconds did it take? Are there false positives? If alerts fire 50 times a day, the team ignores them and will miss a real incident.
Rate Limiting
If your API doesn’t limit the number of requests, any script can DDoS it. Or an attacker will brute-force passwords a thousand times per second.
Rate limiting is a restriction: for example, 100 requests per minute from one IP. Request 101 gets 429 Too Many Requests.
What to check:
- Send more requests than the limit - do you get 429?
- Headers `X-RateLimit-Remaining` and `Retry-After` - do they even exist? The client needs to know when they can retry.
- Is the limit tied to user or IP? Because behind a corporate NAT there could be a thousand people with one IP. If the limit is per IP, the entire office gets blocked because of one overly active colleague.
- Different endpoints - different limits? Login is usually limited more strictly (5-10 attempts per minute), while reading a catalog is softer.
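The simplest implementation behind that 429 is a fixed-window counter. A Python sketch - the limit, window, and key names are illustrative, and real APIs often use sliding windows or token buckets instead, which avoid the burst at window boundaries:

```python
import time
from collections import defaultdict

LIMIT = 100    # requests allowed per window
WINDOW = 60    # window length in seconds

counters = defaultdict(lambda: [0, 0.0])   # key -> [count, window_start]

def allow(key, now=None):
    # Fixed-window limiter keyed by user id (or IP - see the NAT caveat above).
    now = now if now is not None else time.time()
    count, start = counters[key]
    if now - start >= WINDOW:
        counters[key] = [1, now]   # new window: reset the counter
        return 200
    if count >= LIMIT:
        return 429                 # Too Many Requests
    counters[key][0] += 1
    return 200
```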
CDN
If your server is in New York and the user is in Ashgabat, data travels halfway around the world. You can’t fool physics, the speed of light is finite, plus routing, plus packet loss. Result: the site loads in 3 seconds instead of 300ms.
CDN (Content Delivery Network) is a bunch of servers around the world that store copies of your static files (images, JS, CSS). The user gets files from the nearest server, not from the origin.
```
WITHOUT CDN:
  Ashgabat -------[ 8,000 km ]-------> New York (origin)
  latency: ~200 ms

WITH CDN:
  Ashgabat ---[ 2,000 km ]---> Istanbul (edge) <----------- New York (origin)
  latency: ~30 ms              content already   copied the
                               cached            first time
```
What to test here is mainly caching. Updated an image on the site - is the CDN serving the old one? How fast does it update? Some CDNs cache for 24 hours, and if you uploaded a broken banner, it’ll be sitting on all edge servers worldwide for a full day.
High Availability
Availability is measured in “nines”:
| Availability | Downtime per year | Typical use |
|---|---|---|
| 99% | 3.6 days | Fine for a personal blog |
| 99.9% | 8.7 hours | Most SaaS products |
| 99.99% | 52 minutes | Finance, e-commerce |
| 99.999% | 5 minutes | Banks, telecom. Very expensive |
Each additional “nine” costs exponentially more. And QA plays an important role here: you need to verify that failover works. The primary data center went down - did the backup take over? Was any data lost? How much time passed between failure and recovery?
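The numbers in the table are easy to re-derive - useful if an interviewer asks where "52 minutes" comes from:

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600

def downtime_minutes_per_year(availability_pct):
    # How much downtime each "nine" actually allows per year.
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

# 99%     -> ~5,256 minutes  (~3.6 days)
# 99.9%   -> ~525 minutes    (~8.7 hours)
# 99.99%  -> ~52 minutes
# 99.999% -> ~5 minutes
```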
And a separate topic - backups. Everyone makes backups. Few people verify they can actually restore from them. I’m serious. Run a drill on your project - try to bring up the service from a backup. There’s a non-zero chance the backup is corrupt or the recovery procedure isn’t documented. Better to find out during a drill than at 3 AM during a real incident.
Interview Task: “Design a Notification System”
This one comes up often. Don’t panic - they don’t expect a perfect solution from QA. They expect the right questions.
First things to ask:
- What channels? Email, SMS, Push, Telegram - or just one?
- How many notifications per day? A thousand is one story, ten million is a completely different architecture.
- How critical is delivery? An OTP code for login needs to arrive in 10 seconds. A marketing newsletter - well, if it arrives in an hour, no big deal.
Rough architecture:
```
                 +------------+
                 | API Server |  <-- "Send notification to user #42"
                 +-----+------+
                       |
                       v
               +---------------+
               | MESSAGE QUEUE |  <-- buffer so providers don't get overwhelmed
               +-+----+---+--+-+
                 |    |   |  |
                 v    v   v  v
            +-----+ +---+ +----+ +--------+
            |Email| |SMS| |Push| |Telegram|  <-- workers (handlers)
            +--+--+ +-+-+ +-+--+ +---+----+
               |      |     |        |
               v      v     v        v
           SendGrid Twilio Firebase Bot API  <-- external providers
                \      |     |      /
                 v     v     v     v
            +--------------------------+
            |     NOTIFICATION DB      |  <-- log: who, what, when, status
            +--------------------------+
```
What I would test:
- The notification actually arrives via each channel. Not just “sent” in the log, but verify it was received.
- Worker crashes - is the message lost? Does it go to retry? How many attempts before giving up?
- Duplicates - user doesn’t receive two identical SMS?
- Priorities - does the OTP code skip the queue? Or is it stuck behind 50,000 marketing newsletters?
- User unsubscribed - did notifications actually stop? (This is also a legal requirement in some countries.)
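The priority question from that list can be sketched with a priority queue: OTPs get a lower (more urgent) priority number than marketing, so they jump the line. A toy Python version using the standard-library `heapq`; the message names are invented:

```python
import heapq

queue = []   # heap of (priority, seq, payload); lower number = more urgent
seq = 0      # monotonic tie-breaker so equal priorities stay FIFO

def enqueue(priority, payload):
    global seq
    heapq.heappush(queue, (priority, seq, payload))
    seq += 1

def dequeue():
    # Always pops the most urgent message, regardless of arrival order.
    return heapq.heappop(queue)[2]
```

If the real system uses a single FIFO queue instead of priorities, this is precisely the setup where an OTP waits behind 50,000 marketing newsletters - worth probing in testing.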
Common Interview Mistakes
Over twenty interviews I collected the typical mistakes. Here are the five most common:
- Drawing architecture right away. Without clarifying requirements, you’ll design the wrong thing. The first 5 minutes should be nothing but questions. Interviewers appreciate that.
- Forgetting about load. “It works” isn’t enough. Does it work at 10,000 requests/sec? At 100,000? How much data is in the database - a thousand records or a billion?
- Not asking “what if this goes down?” For QA this should be a reflex. Every component on the diagram is a potential failure point.
- Saying nothing about monitoring. If you proactively add “and we also need monitoring - alerts when errors spike, a latency dashboard” - that sets you apart from other candidates.
- Saying “that’s not my responsibility.” Technically - sure, developers do System Design. But a Senior QA who doesn’t understand architecture only tests forms. And bugs go to production.
Checklist: What to Study Before the Interview
- Load Balancer - what it is, distribution algorithms, how to test failover
- Cache - Redis, caching levels, cache invalidation
- Replication - master/replica, replication lag
- Sharding - why, how, problems with cross-shard queries
- Queues - RabbitMQ/Kafka, retry, duplicates, dead letter queue
- Microservices - contract testing, circuit breaker
- Scaling - vertical vs horizontal
- Monitoring - 4 golden signals, tools
- Rate Limiting - 429, per user vs per IP
- HA/DR - nines, failover, backups
You don’t need to know all of this in detail. The main thing is to understand that these components exist, why they’re needed, and what can go wrong with them. That’s enough to not sit there in silence during an interview.
Free QA course with hands-on exercises at annayev.com (English, Russian, Turkce).
If your project is already in production and has no monitoring, start with SiteGuard. Uptime checks + SSL monitoring + Telegram alerts. Free, no card needed, takes a minute to set up.