Imagine you are trying to buy tickets for a popular rock concert, or for a public event that draws a huge crowd, and the booking system is online. The moment everyone starts booking at once, the server crashes and nobody can book.

The same applies in an enterprise scenario, where many users log in and perform the same action at the same time, resulting in a very bad experience.
At the root of this problem is scaling, one of the most important factors in designing an application or choosing a framework. At the core of scaling is the threading model, which does most of the work: it determines how well the application scales and how much complexity comes with it. Let us look at these models.
What is I/O?
To understand this problem better, let us start with the basics:
A basic fact of physics is that data cannot travel from place A to place B instantly; it always takes time. I/O stands for Input/Output. In simple terms, it refers to any operation in which your program communicates with the outside world.
This includes:
- Reading data from a source (like reading a file from disk, receiving data over the network, or fetching information from a database)
- Writing data to a destination (like saving a file, sending a response to a client, or storing data in a database)
Think of it this way: imagine you are ordering food from a restaurant. The input is you placing the order, and the output is the restaurant delivering the food. The whole exchange, including the time you spend waiting for the food, is what we call an “I/O operation.”
When your program performs I/O operations, it often has to wait. This waiting time is crucial to understand because it directly affects how many requests your server can handle simultaneously. This is where the concept of blocking vs non-blocking I/O becomes important.
Blocking I/O thread model
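In the blocking I/O model, the client sends a request and the server assigns a thread to it, either by creating a new one or by taking one from a pool. That thread processes the request and waits whenever it needs data. As load increases, more and more threads sit idle waiting on I/O. Beyond a certain point the server runs out of threads or memory, stops responding to new requests, and may eventually crash.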

Working
- A request arrives.
- The server assigns one thread to handle it.
- The thread does the following work:
  - read request body (blocking read)
  - call another service (blocking HTTP client)
  - query the database (blocking JDBC)
- During each I/O step, the thread is parked until data arrives.
- When the response is ready, the thread writes it back (blocking write) and is freed.
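The steps above can be sketched as a tiny thread-per-request echo server. This is only an illustration under stated assumptions: the names `handle_client`, `serve`, `request`, and `demo` are made up for this example, the server stops after a fixed number of requests so the script terminates, and a real server would add a thread pool, timeouts, and error handling.

```python
import socket
import threading


def handle_client(conn):
    """Serve one connection on its own thread, blocking at each I/O step."""
    with conn:
        data = conn.recv(1024)              # blocking read: the thread is parked here
        if data:
            conn.sendall(b"echo: " + data)  # blocking write
    # Leaving the "with" block closes the connection; the thread then ends.


def serve(listener, max_requests):
    """Accept connections and dedicate one new thread to each of them."""
    workers = []
    for _ in range(max_requests):
        conn, _addr = listener.accept()     # blocks until a client connects
        worker = threading.Thread(target=handle_client, args=(conn,))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()


def request(port, payload):
    """Hypothetical client helper, used only to exercise the sketch."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(payload)
        chunks = []
        while chunk := client.recv(1024):   # read until the server closes
            chunks.append(chunk)
        return b"".join(chunks)


def demo():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
        listener.listen()
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve, args=(listener, 3))
        server.start()
        replies = [request(port, word) for word in (b"hi", b"there", b"!")]
        server.join()
    return replies


if __name__ == "__main__":
    print(demo())
```

Every connection costs one thread for its whole lifetime, even while that thread is doing nothing but waiting in `recv`. That cost is what limits this model under load.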
Server stacks that use the blocking thread model
- Java based
  - Servlet containers / Java EE application servers, when used synchronously
    - Apache Tomcat
    - Jetty
    - JBoss / WildFly / WebLogic / WebSphere
  - Spring Web MVC
- Apache HTTP Server
  - Apache httpd with the prefork or worker MPM
- Django / Flask
- Ruby on Rails
That said, many servers today use non-blocking network connections while the application model remains blocking. For example, with Tomcat's NIO connector the socket layer may be non-blocking, but the servlet thread still blocks on JDBC calls, file I/O, and synchronous HTTP clients.
Non-Blocking I/O thread model
In the non-blocking thread model, a thread does not wait while an I/O operation is in progress. Instead, it starts the I/O, moves on to other work, and comes back only when the I/O is ready.

How does it work?
The simple idea behind this model is:
- One (or a few) threads watch many sockets.
- When a socket becomes ready (data available to read, or space available to write), the OS notifies that thread.
- That thread handles a small piece of work quickly and then returns to watching the other sockets.
- These servers are implemented using OS mechanisms such as:
  - select
  - poll
  - epoll (Linux)
  - kqueue (BSD, macOS)
  - IOCP (Windows)
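As a rough sketch of this idea, Python's standard `selectors` module wraps the readiness-based mechanisms (epoll, kqueue, poll, or select, depending on the platform). The example below watches several sockets from a single thread. The function names `run_event_loop` and `demo` are made up for illustration, and `socket.socketpair()` stands in for real client connections so that the script needs no network setup.

```python
import selectors
import socket


def run_event_loop(server_sides, expected_messages):
    """Watch many sockets from one thread and answer whichever is ready."""
    sel = selectors.DefaultSelector()   # best available mechanism on this OS
    for sock in server_sides:
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ)

    handled = 0
    while handled < expected_messages:
        # The only place this thread waits: until at least one socket is ready.
        for key, _events in sel.select(timeout=5):
            sock = key.fileobj
            data = sock.recv(1024)      # returns at once, the OS reported readiness
            if data:
                # A small piece of work, then back to watching the sockets.
                # Simplification: assumes the short reply fits in the send buffer.
                sock.send(data.upper())
                handled += 1

    for sock in server_sides:
        sel.unregister(sock)
    sel.close()
    return handled


def demo():
    pairs = [socket.socketpair() for _ in range(5)]
    clients = [client for client, _server in pairs]
    servers = [server for _client, server in pairs]
    for i, client in enumerate(clients):
        client.sendall(f"msg-{i}".encode())
    handled = run_event_loop(servers, expected_messages=len(pairs))
    replies = [client.recv(1024) for client in clients]
    for client, server in pairs:
        client.close()
        server.close()
    return handled, replies


if __name__ == "__main__":
    print(demo())
```

No thread is dedicated to any single connection here. The same loop would serve five sockets or five thousand, limited mainly by how quickly each ready event is handled.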
Why does it scale better?
In this model, threads are used for active work, not for waiting on I/O to complete.
This brings some strong advantages:
- Fewer threads handle more concurrent connections
- Lower memory usage
- Less context-switching overhead
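To make the "fewer threads" point concrete, here is a small sketch using `asyncio`. It starts 1000 simulated I/O waits and records which threads they ran on. The 0.1 second sleep is an assumed stand-in for a network or database wait, and `fake_io` and `run_many` are illustrative names.

```python
import asyncio
import threading
import time


async def fake_io(i, thread_ids):
    """Simulated I/O wait; records the thread that executed this coroutine."""
    thread_ids.add(threading.get_ident())
    await asyncio.sleep(0.1)    # stands in for a network or database wait
    return i


async def run_many(n):
    thread_ids = set()
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_io(i, thread_ids) for i in range(n)))
    elapsed = time.perf_counter() - start
    return len(results), len(thread_ids), elapsed


if __name__ == "__main__":
    count, threads, elapsed = asyncio.run(run_many(1000))
    print(f"{count} waits on {threads} thread(s) in {elapsed:.2f}s")
```

Run one after another, these waits would take about 100 seconds. Here they overlap on a single thread, so the total stays close to the duration of one wait, and no per-connection thread stacks are allocated.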
Coding Style Changes
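Understanding the model is important, but it is not enough: we also need to change how we write code to take advantage of non-blocking systems. Code that runs on a non-blocking server should be written in an asynchronous style, using one of the following:
- Callbacks: do something when this operation is done
- Futures/Promises: an async handle that represents a result that will arrive later
- Async/Await: syntax that lets asynchronous code read like sequential code while the thread stays free during waits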
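The three styles can be compared side by side in a short sketch. `fetch_price` and `fetch_price_async` are hypothetical lookups invented for this example, with sleeps standing in for I/O waits. The callback and future variants use a thread pool only to have something that completes later.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_price(item):
    """Hypothetical slow lookup; the sleep stands in for an I/O wait."""
    time.sleep(0.05)
    return len(item) * 10


async def fetch_price_async(item):
    """The same hypothetical lookup, written as a coroutine."""
    await asyncio.sleep(0.05)
    return len(item) * 10


def demo():
    callback_results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        # 1. Callback: say what should happen once the result exists.
        pool.submit(fetch_price, "ticket").add_done_callback(
            lambda done: callback_results.append(done.result())
        )
        # 2. Future: keep a handle now, ask for the value later.
        future = pool.submit(fetch_price, "pass")
        future_result = future.result()
    # Leaving the "with" block waits for the pool, so the callback has run.

    # 3. Async/await: reads like sequential code, yields while waiting.
    awaited_result = asyncio.run(fetch_price_async("vip"))
    return callback_results, future_result, awaited_result


if __name__ == "__main__":
    print(demo())
```

All three express the same thing: the result arrives later, and the calling code says what to do with it instead of parking a thread until it shows up.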
Important Rules
- Never block the reactor (event loop) thread.
- Offload compute-intensive work to a separate worker thread.
- Use non-blocking (reactive) libraries for databases, message brokers, and so on. If the library is blocking, the operation blocks the thread no matter how the server is built.
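A minimal sketch of the first two rules, using `asyncio`. Here `blocking_query` is a hypothetical call into a library without an async API, and `heartbeat` plays the role of the other connections the event loop should keep serving. The `offload` flag switches between the anti-pattern and the fix.

```python
import asyncio
import time


def blocking_query():
    """Hypothetical blocking call, e.g. a driver without an async API."""
    time.sleep(0.2)
    return "rows"


async def heartbeat(ticks, n):
    """Stands in for other requests the event loop should keep serving."""
    for _ in range(n):
        await asyncio.sleep(0.02)
        ticks.append(time.perf_counter())


async def run(offload):
    loop = asyncio.get_running_loop()
    ticks = []
    other_work = asyncio.create_task(heartbeat(ticks, 5))
    if offload:
        # A worker thread from the default executor blocks; the loop stays free.
        result = await loop.run_in_executor(None, blocking_query)
    else:
        # Anti-pattern: the event loop thread itself is stuck for 0.2 seconds.
        result = blocking_query()
    done = time.perf_counter()
    await other_work
    # Count how many heartbeats ran while the query was still in flight.
    ticks_during_query = sum(1 for t in ticks if t < done)
    return result, ticks_during_query


if __name__ == "__main__":
    print("blocking :", asyncio.run(run(offload=False)))
    print("offloaded:", asyncio.run(run(offload=True)))
```

When the call runs directly on the loop, no heartbeat gets through until it finishes. When it is offloaded, the heartbeats keep running alongside it. In CPython specifically, heavy CPU-bound work is usually sent to a process pool instead of a thread pool, because of the global interpreter lock.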
Server stacks that use the non-blocking thread model
Here are common server stacks that use a non-blocking (event-driven) I/O and threading model, in which a small number of threads or processes handle many concurrent connections without waiting on each one.
Reverse proxies / edge servers (very commonly non-blocking)
- NGINX: event-driven architecture with non-blocking I/O (worker processes handle many connections).
- HAProxy: event-driven, non-blocking daemon (uses event multiplexing).
- Envoy Proxy: worker threads run a non-blocking event loop for connection I/O.
App server runtimes / web servers
- Node.js HTTP servers (Express/Fastify/etc.): built around an event loop enabling non-blocking I/O.
- ASP.NET Core (Kestrel): emphasizes asynchronous I/O (sync I/O is discouraged because it can starve threads).
Java/JVM servers/frameworks (non-blocking I/O at the core)
- Jetty: uses Java NIO; core I/O is “completely non-blocking”.
- Undertow: Java web server based on non-blocking I/O (supports both blocking and non-blocking APIs).
- Netty-based servers: Netty is an asynchronous, event-driven networking framework used to build non-blocking servers/clients.



