When I started to code, a long time ago, one of the premises was to keep hardware usage to a minimum. Hardware was expensive and a constraint. A few years later, hardware stopped being a constraint and became a cheap asset. If a solution needed to scale, it was cheaper to buy hardware than to refactor the source code. Scaling meant enlarging the datacenter, and event-driven services did not exist.
These days, the need to scale and the load that solutions must handle cannot be met with hardware alone. How do you achieve 1000 transactions per second on a transactional system? How can you manage bulk operations that may involve millions of transactions and keep the system running? How can your solution scale linearly as you add more hardware? One possible answer is to use an event-driven design as the core engine of your solution.
It is common to think of our use cases and APIs as synchronous. A transaction starts (or is invoked) and follows an execution sequence until it ends. The execution sequence may make remote calls to other processes and perform Input/Output operations (IOPs). Every execution context is associated with a thread. Thanks to multithreading support, thread pools, and multiple cores, this architecture works in almost every situation.
However, you will always hit a ceiling when scaling a synchronous implementation. As I mentioned, each request is tied to one thread, even while the execution context is waiting for IOPs. In that case, the thread is in the waiting state and can't be reused. Even worse, all the resources associated with this thread may be blocked too. Examples include a database session or the file descriptor that keeps the HTTP connection open.
Imagine a typical use case:
- Start with an API POST
- Validate the input data
- Validate the data against the database
- Invoke a remote process and wait for the response
- Save the response to the database
- Return a general response
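The synchronous sequence above can be sketched as follows. This is a minimal illustration, not the code I used in my tests: the class and method names are hypothetical, and `Thread.sleep` calls with made-up durations stand in for the database and remote IOPs. The point it shows is that every request holds its thread for the whole sequence, so the size of the thread pool caps the throughput.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical names throughout; sleeps stand in for real database and remote I/O.
final class SyncTransactionHandler {

    // Simulates a blocking I/O call: the calling thread is parked and cannot serve other requests.
    static void blockingIo(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // The whole use case runs on one thread, from the POST to the response.
    static String handlePost(String payload) {
        if (payload == null || payload.isEmpty()) {
            throw new IllegalArgumentException("payload is required"); // input validation
        }
        blockingIo(20);  // validate the data against the database
        blockingIo(100); // invoke the remote process and wait for the response
        blockingIo(20);  // save the response to the database
        return "OK:" + payload; // general response
    }

    // Runs `requests` calls on a pool of `threads` workers and returns the elapsed milliseconds.
    static long run(int requests, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final String payload = "tx-" + i;
                Callable<String> task = () -> handlePost(payload);
                results.add(pool.submit(task));
            }
            for (Future<String> result : results) {
                try {
                    result.get(); // wait for every request to finish
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `SyncTransactionHandler.run(4, 2)` takes roughly twice as long as `run(4, 4)`, even though the threads spend almost all of their time waiting rather than computing.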
When the whole system is working, the response time will be acceptable. What happens when you have to attend to hundreds of requests per second? The response time will start to climb steeply, because requests queue up waiting for a free thread.
Regardless of the amount of hardware, if your transactions involve IOPs or remote calls, the way you design the solution can make the difference when scaling applications.
Event-Driven Services Design
Another way of designing the previous use case is to make your API asynchronous and add events to the execution sequence. Our transaction won’t have a linear execution context and will acquire state.
Let's rethink the use case.
Step 1 (the API call)
- Start with an API POST
- Generate a transactionId and save it to the database
- Publish a “newTransaction” event
- Return the transactionId
Step 2 (in a different thread pool or application)
- Consume the “newTransaction” event
- Validate the data and update the transaction state to “validated”
Step 3 (the remote process)
- Consume the “newTransaction” event
- Do its processing
- Publish an event with the response
Step 4 (another thread pool)
- Consume the event from the remote process
- Save the data to the database
- Update the state of the transaction to “Done”
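The steps above can be sketched in plain Java. This is a simplified illustration under several assumptions, not my test implementation: in-memory queues stand in for Kafka topics, a map stands in for the database, the initial state name "new" is made up (the text only names “validated” and “Done”), payload handling is omitted, and all class and method names are hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch: queues stand in for Kafka topics, the map stands in for the database.
final class EventDrivenTransactions {
    private final Map<String, String> states = new ConcurrentHashMap<>();
    // One queue per consumer of "newTransaction", mimicking two independent subscribers.
    private final BlockingQueue<String> forValidator = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> forRemote = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> remoteResponses = new LinkedBlockingQueue<>();

    EventDrivenTransactions() {
        // Step 2: validate; only moves "new" to "validated", so it never overwrites "Done".
        startConsumer(forValidator, id -> states.replace(id, "new", "validated"));
        // Step 3: the remote process does its work (simulated) and publishes a response event.
        startConsumer(forRemote, id -> { pause(100); remoteResponses.add(id); });
        // Step 4: save the result and mark the transaction as done.
        startConsumer(remoteResponses, id -> states.put(id, "Done"));
    }

    // Step 1: the API call records the transaction, publishes the event and returns the id.
    String post() {
        String transactionId = UUID.randomUUID().toString();
        states.put(transactionId, "new");
        forValidator.add(transactionId);
        forRemote.add(transactionId);
        return transactionId;
    }

    // The extra endpoint a client queries to learn the transaction state.
    String getState(String transactionId) {
        return states.get(transactionId);
    }

    // Client-side polling: re-queries the state until it matches or the timeout expires.
    boolean awaitState(String transactionId, String expected, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (expected.equals(getState(transactionId))) {
                return true;
            }
            pause(10);
        }
        return expected.equals(getState(transactionId));
    }

    private static void startConsumer(BlockingQueue<String> queue, Consumer<String> handler) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    handler.accept(queue.take()); // blocks only this consumer, never the API thread
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A client would call `post()`, get the id back immediately, and later call `getState(id)` (or `awaitState(id, "Done", 5000)` in this sketch) to find out when the work has finished. The thread that serves `post()` never waits for the remote process.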
To let clients check the transaction state and result, you must provide a new endpoint. It is important that your client supports asynchronous operations.
What changed? Because each operation is reduced to a minimal unit of work, the response time of each call improves. From the client’s point of view, the usage has changed: a request returns only an id, which the client must use to query again for the final result. Now the use case can handle far more requests with little impact on response time, while using fewer resources.
This pattern is very common in the real world. When you go to the bank and need assistance from a cashier, you get a ticket and wait for your turn. With a ticketing system, the bank can manage its customer queues. The same idea applies to software development.
More evidence and fewer words
I developed a few components to compare different implementations of the use case above. I based my development stack on Java and the Spring Framework. For event transport, I used Apache Kafka. My testing tool was JMeter. The test was configured to run for 10 minutes, ramping up to 40 concurrent users.
Results from synchronous implementation
- Transactions per second increased as the number of concurrent users increased.
- Response time was over 500 ms on average. The whole transaction had to wait for RPC calls and database IOPs.
- If the client does not support multithreading, the TPS would be very low.
- 29k requests were processed. The test had an average of 79 operations per second.
Results from event-driven implementation
- Even with a few concurrent users, the maximum TPS was reached.
- Response time was under 100 ms for the whole test.
- Even if the client does not support multithreading, high TPS could be achieved.
- 229k requests were processed. The test had an average of 560 operations per second.
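As a rough sanity check on these figures (not part of the original test), Little's Law relates throughput, concurrency, and response time: with a fixed number of users who each wait for a response before sending the next request, throughput is approximately the number of concurrent users divided by the response time. The sketch below assumes all 40 users are active and send requests back to back with no think time, which is an assumption on my part; the ramp-up period means the real averages will differ somewhat. The class and method names are hypothetical.

```java
// Hypothetical helper: Little's Law for a closed loop with no think time.
final class ThroughputEstimate {

    // Upper bound on transactions per second for a given concurrency and response time.
    static double maxTps(int concurrentUsers, double responseTimeSeconds) {
        return concurrentUsers / responseTimeSeconds;
    }

    // Response time (in milliseconds) implied by a measured throughput and concurrency.
    static double impliedResponseTimeMillis(int concurrentUsers, double tps) {
        return concurrentUsers / tps * 1000.0;
    }

    public static void main(String[] args) {
        // Synchronous case: 40 users at 500 ms per request.
        System.out.println(maxTps(40, 0.5));
        // Event-driven case: 40 users at 560 operations per second.
        System.out.println(impliedResponseTimeMillis(40, 560));
    }
}
```

With 40 users and a 500 ms response time, the bound is 80 TPS, close to the 79 operations per second measured for the synchronous implementation. Going the other way, 560 operations per second with 40 users implies a response time of about 71 ms, which is consistent with the "under 100 ms" observed for the event-driven implementation.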
I have only shown the tip of the iceberg. Event-driven solutions are the answer for many use cases that struggle with load peaks. With end-user applications now installed on every smartphone, user experience is key to success. Real-time response times and the ability to handle load peaks are mandatory and not negotiable. Luckily, technology has evolved, and we have a lot of tools that help us cope with these demands. Apache Kafka is a great ally. Take it into account when solving architectural challenges and aiming for massive scale.
Nothing is free in life. We must manage new concepts like eventual consistency or distributed transactions based on states. The code will become more complex and more difficult to follow.
Do I have to switch to event-driven right away? Like everything in life, there is no single answer to every question. However, I invite you to get acquainted with this architectural pattern so you can build better solutions in the future.