Apache Kafka is a high-performance, highly scalable event streaming platform. To unlock Kafka's full potential, you need to carefully design your application; otherwise, you may run into scalability and performance problems. Since 2015, IBM has offered IBM Event Streams, a fully managed Apache Kafka service running on IBM Cloud®, and has helped numerous customers and IBM teams solve scalability and performance issues with their Kafka applications. In this article, we discuss some common Apache Kafka pitfalls and provide recommendations for avoiding scalability problems in your applications.
1. Minimize network round-trips:
Some Kafka operations involve the client sending data to the broker and waiting for a response. Even if a round-trip takes only 10 milliseconds, a client that waits for each operation to complete before starting the next is capped at 100 operations per second. To maximize throughput, avoid such blocking operations whenever possible. Kafka clients offer ways to hide these round-trip times. Ensure that you take advantage of these features by following these tips:
– Do not check the success of every message sent. Kafka’s API allows you to separate sending a message from confirming its successful reception by the broker. Minimize network round-trip latency by sending multiple messages before checking if they were received.
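This pattern can be sketched with the Java producer API: send every record with a callback that reports failures asynchronously, and wait only once at the end. The topic name and messages here are hypothetical placeholders.

```java
import java.util.List;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BatchSender {
    // Send all messages before waiting for any acknowledgement. Errors are
    // reported through the callback instead of blocking on each send.
    static void sendAll(Producer<String, String> producer, String topic, List<String> messages) {
        Callback onAck = (metadata, exception) -> {
            if (exception != null) {
                System.err.println("send failed: " + exception.getMessage());
            }
        };
        for (String msg : messages) {
            producer.send(new ProducerRecord<>(topic, msg), onAck); // returns immediately
        }
        producer.flush(); // one wait for all in-flight sends, not one per message
    }
}
```

Compare this with calling `producer.send(record).get()` per message, which forces a full round-trip for every single record.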
– Avoid committing offsets after processing each message. Committing offsets synchronously requires a network round-trip with the server. Commit offsets less frequently or use the asynchronous offset commit function to avoid the round-trip for every message.
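A minimal sketch of batch processing with an asynchronous commit, using the Java consumer API: the whole batch from one poll is processed, then committed once with `commitAsync`, so no per-message round-trip occurs and the consumer never blocks waiting for the commit response. The `process` method stands in for application-specific work.

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

public class AsyncCommitLoop {
    // Process one batch of records, then commit the batch's offsets once,
    // asynchronously: the commit request is sent without waiting for a reply.
    static void pollAndCommit(Consumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            process(record);
        }
        consumer.commitAsync((offsets, exception) -> {
            if (exception != null) {
                System.err.println("commit failed: " + exception.getMessage());
            }
        });
    }

    static void process(ConsumerRecord<String, String> record) {
        // application-specific processing goes here
    }
}
```

A common refinement is to call a final synchronous `commitSync()` when the consumer shuts down, so the last batch's offsets are definitely persisted.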
2. Differentiate processing times from consumer failures:
Kafka monitors the “liveness” of consuming applications and disconnects any clients that appear to have failed. However, it cannot distinguish between a client taking longer to process messages and a client that has actually failed. To prevent this confusion, you can:
– Configure the maximum time between poll calls using the “max.poll.interval.ms” setting.
– Adjust the maximum number of messages returned by a single poll using the “max.poll.records” setting.
– Pause and resume the flow of messages to allow Kafka to detect client failures while processing individual messages.
– Monitor Kafka client metrics, such as average and maximum time between polls, to identify downstream system issues affecting message processing.
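The first two settings above are plain consumer configuration. A sketch, with hypothetical group name, placeholder broker address, and illustrative values (the defaults are 300000 ms and 500 records):

```java
import java.util.Properties;

public class ConsumerTuning {
    static Properties tunedConsumerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("group.id", "order-processors");        // hypothetical group
        // Allow up to 10 minutes between poll() calls before the broker
        // treats this consumer as failed (the default is 5 minutes).
        props.setProperty("max.poll.interval.ms", "600000");
        // Return at most 50 records per poll() so each batch can be
        // processed comfortably within that interval (the default is 500).
        props.setProperty("max.poll.records", "50");
        return props;
    }
}
```

The two values work together: the slower your per-message processing, the smaller `max.poll.records` (or the larger `max.poll.interval.ms`) should be, so that a batch never takes longer than the interval.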
3. Reduce the cost of idle consumers:
Kafka consumers send a “fetch” request to brokers to receive messages. By default, consumers instruct brokers to wait up to 500 milliseconds for at least 1 byte of message data. If your application has mostly idle consumers and scales to a large number of instances, this can lead to a significant number of requests doing nothing but consuming CPU time. To mitigate this:
– Increase the “fetch.max.wait.ms” setting to reduce the number of requests made by idle consumers.
– Consider adjusting other Kafka consumer configurations to optimize message retrieval based on your application’s requirements.
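The fetch behaviour described above maps onto two consumer settings. A sketch with illustrative values (the defaults are 500 ms and 1 byte); the group name and broker address are placeholders:

```java
import java.util.Properties;

public class IdleConsumerTuning {
    static Properties lowOverheadFetchProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("group.id", "audit-consumers");         // hypothetical group
        // Let the broker hold each fetch request open for up to 5 seconds
        // (default 500 ms), so an idle consumer issues roughly 10x fewer requests.
        props.setProperty("fetch.max.wait.ms", "5000");
        // Optionally also require more data before the broker responds
        // (default 1 byte), trading a little latency for fewer, fuller responses.
        props.setProperty("fetch.min.bytes", "1024");
        return props;
    }
}
```

The trade-off is latency: a message arriving just after a fetch request is satisfied may wait up to `fetch.max.wait.ms` before the next request picks it up, so choose a value your application's latency budget can absorb.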
By following these recommendations, you can avoid common scalability problems and optimize the performance of your Apache Kafka applications.