What is a Load Balancer?

A Load Balancer is a system (hardware or software) that distributes incoming network traffic across multiple backend servers. Its goal is to ensure no single server is overwhelmed, making your application:

  • Scalable: Handles more traffic by distributing it efficiently.
  • Highly available: Survives server failures by rerouting traffic.
  • Efficient: Improves performance by using all servers optimally.

A Real-World Analogy:

🧍🏽‍♂️🧍🏾‍♀️ Restaurant Queue with a Host

  • Imagine a busy restaurant with multiple tables and a host at the entrance.
  • As people arrive, the host seats them at available tables evenly.
  • If one waiter is too busy, the host avoids assigning them new guests.
  • If a table is closed for cleaning, the host skips it.

➡️ Just like the host, the load balancer decides which “server” (table/waiter) handles each new “request” (customer).

Why Is Load Balancing Important?

Load balancing plays a critical role in modern distributed systems and web applications. It ensures the smooth functioning, scalability, and reliability of services accessed by millions of users daily.

1. Scalability

  • Load balancers enable horizontal scaling by allowing you to add more backend servers as traffic increases.
  • You can handle growing user demands without redesigning the system architecture.
  • Example: In a social media platform, as users and content grow, load balancers help distribute requests across dozens or hundreds of servers seamlessly.

2. High Availability

  • If one server fails or becomes unhealthy, the load balancer automatically reroutes traffic to other healthy servers.
  • This eliminates single points of failure, increasing system uptime.
  • Example: During a cloud zone failure, traffic can be routed to a different availability zone.

3. Improved Performance

  • Distributing traffic reduces response time by preventing any single server from becoming a bottleneck.
  • Load balancers can use intelligent algorithms (like Least Connections or Least Response Time) to route requests to the most responsive servers.

4. Fault Tolerance and Reliability

  • If a backend server crashes or becomes unresponsive, the load balancer stops sending traffic to it (based on health checks).
  • Ensures continued service even during server outages.
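The health-check behavior above can be sketched in a few lines of Python. This is a minimal illustration, not a real load balancer API: the server names and their health flags are made-up stand-ins for the results of periodic health probes.

```python
# Illustrative health state per server, as a real balancer would
# maintain it from periodic health-check probes (names are hypothetical).
servers = {"server-a": True, "server-b": False, "server-c": True}

def healthy_servers():
    """Only servers that passed their last health check receive traffic."""
    return [name for name, ok in servers.items() if ok]

def pick(request_number: int) -> str:
    """Round-robin over the healthy pool; unhealthy servers are skipped."""
    pool = healthy_servers()
    return pool[request_number % len(pool)]
```

Here `server-b` is marked unhealthy, so traffic alternates between `server-a` and `server-c` until `server-b` passes a health check again.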

5. Security

  • Acts as a reverse proxy, hiding the internal infrastructure and preventing direct access to backend servers.
  • Can help throttle suspicious traffic, defend against DDoS attacks, and offload SSL/TLS termination.

6. Maintainability and Flexibility

  • Enables zero-downtime deployments by draining traffic from specific instances before updates.
  • Supports A/B testing and canary releases by routing a portion of traffic to new code.

🛠️ Types of Load Balancers

| Type | Description | Examples |
| --- | --- | --- |
| Layer 4 (Transport) | Works at the TCP/UDP level | AWS NLB, HAProxy (TCP mode) |
| Layer 7 (Application) | Works at the HTTP/HTTPS layer | AWS ALB, NGINX, Envoy |
| Hardware-based | Dedicated appliance | F5, Citrix NetScaler |
| Software-based | Runs on standard servers | NGINX, HAProxy, Envoy |

Common Load Balancing Algorithms:

🔁 1. Round Robin

  • Requests are sent to servers in a fixed circular order.
  • Simple and effective when all servers have similar capacity.

Best for: Uniform server capacity and stateless apps.
🚫 Weakness: Doesn’t account for server load or response time.
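A minimal Python sketch of the fixed circular order (server names are illustrative):

```python
from itertools import cycle

# Three hypothetical backends of similar capacity.
servers = ["server-a", "server-b", "server-c"]
rotation = cycle(servers)

def round_robin() -> str:
    """Return the next server in a fixed circular order."""
    return next(rotation)

# Six requests cycle through the three servers evenly.
picks = [round_robin() for _ in range(6)]
```

Each server receives exactly one third of the requests, regardless of how loaded it actually is, which is exactly the weakness noted above.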


⚖️ 2. Weighted Round Robin

  • Like Round Robin, but assigns weights to servers.
  • A more powerful server will get more requests than a weaker one.

Best for: Systems with different server capacities.
🚫 Weakness: May not react well to dynamic traffic/load changes.
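One simple way to implement static weights is to repeat each server in the rotation according to its weight. A sketch, with made-up weights:

```python
from itertools import cycle

# Hypothetical weights: server-a has twice the capacity of server-b.
weights = {"server-a": 2, "server-b": 1}

# Expand each server into the rotation proportionally to its weight.
rotation = cycle([s for s, w in weights.items() for _ in range(w)])

picks = [next(rotation) for _ in range(6)]
```

Over six requests, `server-a` is chosen four times and `server-b` twice, matching the 2:1 weights. Real implementations often use a smoother interleaving (e.g., NGINX's smooth weighted round robin) so heavy servers aren't hit in bursts.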


🔗 3. Least Connections

  • Routes new requests to the server with the fewest active connections.
  • Dynamically adapts to server load.

Best for: Long-lived connections (e.g., database queries).
🚫 Weakness: Ignores server response time or performance.
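The selection rule is just a minimum over live connection counts. A sketch with hypothetical counts:

```python
# Illustrative count of active connections per server.
active = {"server-a": 12, "server-b": 3, "server-c": 7}

def least_connections() -> str:
    """Pick the server currently holding the fewest active connections."""
    return min(active, key=active.get)

choice = least_connections()
active[choice] += 1  # the new request becomes an active connection
```

`server-b` wins here with 3 connections; note the balancer must also decrement the count when a connection closes, which is why this algorithm needs accurate connection tracking.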


⚖️ 4. Weighted Least Connections

  • Combines weights and connection count.
  • Server with the lowest (active connections / weight) gets the next request.

Best for: Mixed-capacity servers and variable workloads.
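The ratio described above (active connections divided by weight) can be sketched directly; the counts and weights below are illustrative:

```python
# Hypothetical state: server-a is 4x as powerful but busier in raw terms.
active = {"server-a": 8, "server-b": 3}
weights = {"server-a": 4, "server-b": 1}

def weighted_least_connections() -> str:
    """Pick the server with the lowest active-connections-to-weight ratio."""
    return min(active, key=lambda s: active[s] / weights[s])
```

`server-a` is chosen even though it has more raw connections (8/4 = 2.0 versus 3/1 = 3.0), because its higher weight says it can absorb more load.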


🧠 5. Least Response Time

  • Routes traffic to the server with the lowest average response time.
  • Helps reduce user-perceived latency.

Best for: User-facing apps where response time matters.
🚫 Weakness: Needs accurate monitoring to be effective.
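In practice the balancer tracks a rolling window of recent latencies per server and picks the minimum. A sketch with made-up samples in milliseconds:

```python
from statistics import mean

# Hypothetical recent response-time samples (ms) per server.
samples = {"server-a": [120, 95, 110], "server-b": [40, 55, 35]}

def least_response_time() -> str:
    """Pick the server with the lowest average recent response time."""
    return min(samples, key=lambda s: mean(samples[s]))
```

`server-b` wins (~43 ms average versus ~108 ms). The quality of this decision depends entirely on how fresh and representative the samples are, which is the monitoring caveat noted above.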


🎲 6. Random

  • Sends each request to a randomly selected server.
  • Simple and surprisingly effective for evenly distributed servers.

Best for: Stateless apps with equally powerful servers.
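Random selection is a one-liner; over many requests the load evens out statistically:

```python
import random

# Hypothetical pool of equally capable servers.
servers = ["server-a", "server-b", "server-c"]

def pick_random() -> str:
    """Send each request to a uniformly random server."""
    return random.choice(servers)
```

No state needs to be shared between balancer instances, which is part of why this works well in large stateless fleets.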


🌐 7. IP Hash

  • Uses a hash of the client IP address to choose the server.
  • Ensures the same user always goes to the same server.

Best for: Sticky sessions (e.g., shopping carts, user login).
🚫 Weakness: Poor load distribution if client base is skewed.
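A sketch of the hashing step. A stable hash like MD5 is used here rather than Python's built-in `hash()`, which is randomized per process and would break stickiness across restarts; the IPs and server names are illustrative:

```python
import hashlib

servers = ["server-a", "server-b", "server-c"]

def ip_hash(client_ip: str) -> str:
    """Map a client IP to a server deterministically via a stable hash."""
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]
```

The same client IP always maps to the same server. Note that adding or removing a server reshuffles most mappings with this simple modulo scheme; consistent hashing is the usual fix for that.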


🛣️ 8. URL Hash / Path Hash

  • Uses the request URL or path to determine the server.
  • Often used in content-based routing (Layer 7 load balancing).

Best for: Microservices or content-type-specific servers.
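Content-based routing can combine a path-prefix match with a hash over the full path, so each content type has its own pool and each URL sticks to one server (for cache locality). The pools and prefixes below are made up for illustration:

```python
import hashlib

# Hypothetical pools keyed by path prefix.
pools = {"/images": ["img-1", "img-2"], "/api": ["api-1", "api-2"]}

def route(path: str) -> str:
    """Pick a pool by path prefix, then hash the path within the pool."""
    for prefix, pool in pools.items():
        if path.startswith(prefix):
            digest = int(hashlib.md5(path.encode()).hexdigest(), 16)
            return pool[digest % len(pool)]
    raise ValueError(f"no pool configured for {path}")
```

Hashing the path means repeated requests for the same URL land on the same server, which keeps per-server caches warm.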


🧾 Summary Table

| Algorithm | Best For | Consideration |
| --- | --- | --- |
| Round Robin | Simple, uniform servers | No load awareness |
| Weighted Round Robin | Varying server capacity | Static weights only |
| Least Connections | Dynamic workloads | Ignores response time |
| Weighted Least Conn. | Dynamic + varying capacity | Needs config and monitoring |
| Least Response Time | Fastest user response | Requires performance tracking |
| Random | Stateless, balanced setups | Not intelligent |
| IP Hash | Session persistence | Load imbalance risk |
| URL/Path Hash | API gateways, microservices | URL-based routing only |
