Load balancing
Load balancing is the practice of distributing incoming requests or workload across multiple computing resources - servers, containers, or services - rather than directing all traffic to a single resource. A load balancer acts as a traffic director, deciding which backend server should handle each request based on factors like current load, server health, or configured rules. This distribution enables higher availability, better performance, and more efficient resource utilization.
Why it matters
Single servers have limits. They can only handle so many requests, store so much data, and process so much computation. When traffic exceeds those limits, users experience slow responses or complete outages. When a single server fails, the entire service goes down.
Load balancing solves both problems. By spreading traffic across multiple servers, the system handles more total load than any single server could. By routing around failed servers, the system stays available even when individual components fail.
For product managers, load balancing is foundational infrastructure that enables scale and reliability. Understanding it helps set realistic expectations about system capabilities and make informed decisions about architectural trade-offs.
How load balancing works
A load balancer sits between users and servers, intercepting requests and forwarding them to appropriate backends.
Request arrives. A user makes a request - loading a webpage, calling an API, submitting a form.
Load balancer receives. Instead of going directly to a server, the request goes to the load balancer.
Backend selection. The load balancer chooses which server should handle this request based on the configured algorithm and current conditions.
Request forwarding. The load balancer sends the request to the selected server.
Response return. The server processes the request and sends the response back through the load balancer to the user.
This all happens in milliseconds, invisible to users who simply see a responsive application.
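The steps above can be sketched in a few lines of code. This is a toy model rather than a real proxy: the backends are plain functions, the names `LoadBalancer`, `make_backend`, `server-a`, and `server-b` are invented for illustration, and simple rotation stands in for whatever algorithm is configured.

```python
from typing import Callable, Dict, List

# Hypothetical backend: a function that turns a request string into a response string.
Backend = Callable[[str], str]


def make_backend(name: str) -> Backend:
    def handle(request: str) -> str:
        return f"{name} handled {request}"
    return handle


class LoadBalancer:
    def __init__(self, backends: Dict[str, Backend]) -> None:
        self.backends = backends
        self.names: List[str] = list(backends)
        self.next_index = 0

    def select_backend(self) -> str:
        # Backend selection: plain rotation stands in for the configured algorithm.
        name = self.names[self.next_index % len(self.names)]
        self.next_index += 1
        return name

    def handle(self, request: str) -> str:
        name = self.select_backend()              # choose a backend
        response = self.backends[name](request)   # forward the request
        return response                           # pass the response back to the user


lb = LoadBalancer({
    "server-a": make_backend("server-a"),
    "server-b": make_backend("server-b"),
})
responses = [lb.handle(f"GET /page/{i}") for i in range(3)]
```

The caller only ever talks to `lb.handle`, which is the point: users never need to know which server answered.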
Load balancing algorithms
Different algorithms determine how traffic gets distributed.
Round Robin sends requests to servers in rotation - first to server A, then B, then C, then back to A. Simple but doesn't account for differing server capacities or current load.
Weighted Round Robin extends round robin with weights. A server weighted "2" gets twice as many requests as one weighted "1". Useful when servers have different capacities.
Least Connections sends requests to the server currently handling the fewest connections. Adapts better to actual load but requires tracking connection counts.
Least Response Time sends requests to the server responding fastest. Accounts for both load and network conditions.
IP Hash uses the client's IP address to determine which server handles their requests. As long as the set of servers does not change, the same client IP consistently reaches the same server - useful for session affinity.
Random selects servers randomly. Simple and often effective when servers are homogeneous.
The right algorithm depends on application characteristics, server homogeneity, and session requirements.
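To make the differences concrete, here is a minimal sketch of four of these algorithms. The function names and the server names "A", "B", and "C" are illustrative assumptions, and the weighted version uses a naive expansion; production load balancers typically interleave weighted picks more smoothly.

```python
import hashlib
import itertools
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    # Rotate through the servers forever: A, B, C, A, ...
    return itertools.cycle(servers)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    # Naive expansion: a server with weight 2 appears twice per cycle.
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active: Dict[str, int]) -> str:
    # Pick the server with the fewest active connections; ties go to the first listed.
    return min(active, key=active.get)


def ip_hash(client_ip: str, servers: List[str]) -> str:
    # A stable hash of the client IP maps the same client to the same server,
    # as long as the server list does not change.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["A", "B", "C"]
rr = round_robin(servers)
rr_picks = [next(rr) for _ in range(4)]
wrr = weighted_round_robin({"A": 2, "B": 1})
wrr_picks = [next(wrr) for _ in range(6)]
lc_pick = least_connections({"A": 12, "B": 3, "C": 7})
sticky_pick = ip_hash("203.0.113.7", servers)
```

Note that round robin and weighted round robin need no knowledge of the servers' current state, while least connections depends on the load balancer keeping an up-to-date count per server.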
Types of load balancers
Load balancers operate at different network layers with different capabilities.
Layer 4 (Transport) load balancers route based on IP addresses and ports without examining request content. They're fast and efficient but can't make routing decisions based on request details.
Layer 7 (Application) load balancers examine request content - URLs, headers, cookies. They can route based on request path ("/api" goes to API servers, "/images" goes to media servers), implement sophisticated rules, and provide features like SSL/TLS termination.
Hardware load balancers are physical devices dedicated to load balancing. They offer high performance and reliability but are expensive and inflexible.
Software load balancers run on standard servers. More flexible and cost-effective than hardware, they scale horizontally. Examples include HAProxy, NGINX, and cloud provider offerings.
Cloud load balancers are managed services from cloud providers (AWS ELB, Google Cloud Load Balancing, Azure Load Balancer). They reduce operational burden and integrate with other cloud services.
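The path-based routing that Layer 7 makes possible can be sketched as a lookup from URL prefix to server pool. The pool names (`api-1`, `media-1`, `web-1`, and so on) and the longest-prefix rule are assumptions chosen for illustration; a Layer 4 load balancer could not do this, since it never sees the path.

```python
from typing import Dict, List

# Hypothetical routing table: path prefix -> pool of backend names.
ROUTES: Dict[str, List[str]] = {
    "/api": ["api-1", "api-2"],
    "/images": ["media-1"],
}
DEFAULT_POOL: List[str] = ["web-1", "web-2"]


def pool_for_path(path: str) -> List[str]:
    # A prefix matches the path itself or anything beneath it ("/api", "/api/users"),
    # but not lookalikes such as "/apiary".
    matches = [p for p in ROUTES if path == p or path.startswith(p + "/")]
    if not matches:
        return DEFAULT_POOL
    # Longest matching prefix wins, so more specific rules take priority.
    return ROUTES[max(matches, key=len)]


api_pool = pool_for_path("/api/users")
image_pool = pool_for_path("/images/logo.png")
other_pool = pool_for_path("/checkout")
```

Once a pool is chosen, any of the algorithms described earlier can pick a server within it.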
Health checks
Load balancers must know which servers are healthy.
Health checks periodically test backend servers - making requests and verifying responses. Servers that fail checks are removed from rotation until they recover.
Types of health checks:
Health check configuration includes frequency (how often to check), timeout (how long to wait for response), and threshold (how many failures before marking unhealthy).
Properly configured health checks prevent traffic from reaching failed servers while avoiding false positives that remove healthy servers unnecessarily.
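The following sketch shows how frequency, timeout, and threshold might fit together. All names and default values are hypothetical. The recovery threshold (`healthy_threshold`) is an added assumption not described above, and the probe itself is left out: `record` simply receives the result of each check from whatever scheduler runs them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HealthCheckConfig:
    interval_seconds: float = 5.0   # frequency: how often the caller should run a check
    timeout_seconds: float = 2.0    # timeout: how long the caller should wait for a reply
    unhealthy_threshold: int = 3    # consecutive failures before removal from rotation
    healthy_threshold: int = 2      # assumption: consecutive successes before re-adding


class HealthTracker:
    def __init__(self, servers: List[str], config: HealthCheckConfig) -> None:
        self.config = config
        self.healthy: Dict[str, bool] = {s: True for s in servers}
        # Consecutive check results that contradict the server's current state.
        self.streak: Dict[str, int] = {s: 0 for s in servers}

    def record(self, server: str, check_passed: bool) -> None:
        if check_passed == self.healthy[server]:
            self.streak[server] = 0   # result agrees with current state; reset the streak
            return
        self.streak[server] += 1
        needed = (self.config.healthy_threshold if check_passed
                  else self.config.unhealthy_threshold)
        if self.streak[server] >= needed:
            self.healthy[server] = check_passed
            self.streak[server] = 0

    def in_rotation(self) -> List[str]:
        return [s for s, ok in self.healthy.items() if ok]


tracker = HealthTracker(["A", "B"], HealthCheckConfig())
tracker.record("A", False)
tracker.record("A", False)
after_two_failures = tracker.in_rotation()
tracker.record("A", False)
after_three_failures = tracker.in_rotation()
tracker.record("A", True)
tracker.record("A", True)
after_recovery = tracker.in_rotation()
```

Requiring several consecutive failures is what guards against false positives: a single slow response does not remove a server that is otherwise fine.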
Session persistence
Some applications require users to reach the same server across multiple requests.
Stateful applications store user session data on the server. If a user's second request reaches a different server, their session data isn't available.
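Session persistence (or "sticky sessions") ensures users return to the same server. Common methods include a cookie that records which server the user was assigned to, and hashing the client's IP address.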
Trade-offs exist. Session persistence can create uneven load distribution and complicates handling server failures, since a user pinned to a failed server loses their session. Stateless application design (storing session data externally) is often preferable when possible.
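One common way to implement sticky sessions is a cookie naming the assigned server. The sketch below assumes a hypothetical cookie called `lb_server` and a `StickyBalancer` class invented for this example; it also shows the failure trade-off, because a user pinned to a failed server has to be reassigned.

```python
import itertools
from typing import Dict, List, Set, Tuple

COOKIE_NAME = "lb_server"  # hypothetical cookie name


class StickyBalancer:
    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.healthy: Set[str] = set(servers)
        self._rotation = itertools.cycle(self.servers)

    def _next_healthy(self) -> str:
        # Try each server at most once per call, skipping unhealthy ones.
        for _ in range(len(self.servers)):
            candidate = next(self._rotation)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")

    def route(self, cookies: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
        """Return (chosen server, cookies to set on the response)."""
        pinned = cookies.get(COOKIE_NAME)
        if pinned in self.healthy:
            return pinned, {}              # honor the existing assignment
        server = self._next_healthy()      # new user, or the pinned server is down
        return server, {COOKIE_NAME: server}


sticky = StickyBalancer(["A", "B"])
first, issued = sticky.route({})           # new user gets assigned and receives a cookie
second, _ = sticky.route(issued)           # returning user goes to the same server
sticky.healthy.discard("A")                # simulate server A failing
third, reissued = sticky.route(issued)     # user is moved; session data on A is lost
```

The last line is where stateful designs hurt: the user keeps being served, but anything stored only on server A is gone.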
Load balancing benefits
Properly implemented load balancing provides several advantages.
Scalability. Add servers to handle more traffic. Remove them when demand decreases. Scaling becomes adding capacity rather than replacing infrastructure.
High availability. If a server fails, traffic routes to healthy servers. Users may never notice the failure.
Performance. Distribute load to prevent any single server from becoming overwhelmed. Response times stay consistent under varying demand.
Flexibility. Deploy updates gradually (rolling deployments). Test new versions with a subset of traffic. Maintain old and new versions simultaneously.
Efficient resource use. Distribute work based on actual capacity and load rather than static assignment.
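Testing a new version with a subset of traffic can be sketched as a deterministic percentage split. The function name `choose_version`, the user IDs, and the 10% figure are assumptions for illustration; hashing the user ID is one possible way to keep each user on the same version between requests.

```python
import hashlib


def choose_version(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent% of users to the new version, consistently per user."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100   # stable bucket from 0 to 99
    return "new" if bucket < canary_percent else "old"


# Simulate 10,000 hypothetical users with 10% of traffic sent to the new version.
assignments = [choose_version(f"user-{i}", 10) for i in range(10_000)]
new_share = assignments.count("new") / len(assignments)
```

Raising `canary_percent` step by step is a gradual rollout; setting it back to 0 is the rollback.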
Load balancing challenges
Implementation involves various considerations.
Single point of failure. The load balancer itself can fail. High-availability setups use redundant load balancers.
Configuration complexity. Health checks, routing rules, and algorithm selection require careful tuning.
Session handling. Stateful applications require persistence strategies that complicate load distribution.
SSL/TLS termination. Where encryption ends affects security and performance. Terminating at the load balancer simplifies certificate management but means traffic between the load balancer and backend servers is unencrypted unless it is re-encrypted.
Cost. Load balancers consume resources and may have associated licensing or service costs.
Debugging difficulty. When requests pass through load balancers, tracing problems becomes more complex.
Load balancing in modern architecture
Contemporary applications use load balancing at multiple levels.
External load balancing distributes traffic from users to application entry points.
Internal load balancing distributes traffic between services in microservices architectures.
Container orchestration (Kubernetes) includes built-in load balancing for containerized services.
Service mesh technologies provide sophisticated load balancing with features like circuit breaking and retry logic.
CDN integration distributes static content globally while load balancing dynamic requests.
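Circuit breaking, mentioned above as a service mesh feature, can be illustrated with a small sketch: after repeated failures, stop sending requests to a backend for a cool-down period, then allow a trial request. The class, thresholds, and injectable clock are assumptions made for this example and do not reflect any particular mesh's implementation.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds
        self.clock = clock                      # injectable so the example can fake time
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Once the cool-down has passed, let requests through again as a trial.
        return self.clock() - self.opened_at >= self.reset_seconds

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None                   # close the circuit

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()       # open (or re-open) the circuit


now = [0.0]                                     # fake clock for the demonstration
breaker = CircuitBreaker(failure_threshold=2, reset_seconds=10.0, clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
blocked = breaker.allow_request()               # circuit is open, request refused
now[0] = 10.0
trial_allowed = breaker.allow_request()         # cool-down elapsed, trial permitted
breaker.record_success()
closed_again = breaker.allow_request()
```

The effect is similar to a failed health check, but it is driven by real request outcomes rather than periodic probes.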
Understanding load balancing helps product teams reason about system behavior, capacity planning, and reliability characteristics - essential knowledge for products that need to work reliably at scale.

