Debugging Redis Latency

This article is about how at work we solved the issue of high response time while executing Redis commands from Node.js server to a Redis compatible database known as dragonfly. Background After introducing metrics to our Node.js service, we started recording the overall response time whenever a Redis command was executed. We had a wrapper service around a Redis driver known as ioredis for interacting with our Redis-compatible database. Once we set up Grafana dashboards for metrics like cache latency, we saw unusually high p99 latency numbers, close to 200ms. This is a very large number, especially considering the underlying database query itself typically takes less than 10ms to complete. To understand why this latency was so high, we needed more detailed insight than metrics alone could provide. As part of a broader effort to set up our observability stack, I had been exploring various tracing solutions – options ranged from open-source SDKs (OpenTelemetry Node.js SDK) with a self-deployed trace backend, to third-party managed solutions (Datadog, Middleware, etc.). For this investigation, we decided to proceed with a self-hosted Grafana Tempo instance to test the setup and feasibility. (So far, the setup is working great, and I’m planning a detailed blog post on our observability architecture soon). With tracing set up, we could get a waterfall view of the path taken by the service while responding to things like HTTP requests or event processing, which we hoped would pinpoint the source of the delay in our Redis command execution. ...

April 6, 2025 · 11 min

B-Tree Latch Optimisation

References 5.6 Problem Generally when traversing the index made up of btree we have to take latch on it. In MySQL 5.6 the approach of taking latch depends on the possible operation we are doing: If the operation is a read operation then taking a read lock is sufficient to prevent any writes to happen to the pages we are accessing in Btree while reading If the operation is a write operation then there are again two possibilities: Optimistic Locking If the write is limited to modifying the leaf page only without modifying the structure of the tree (Merging OR Splitting) then it’s an optimistic locking approach where we take read latch on root of the tree and write latch only on the leaf node to modify ^ab3c53 Pessimistic Locking But if the operation result is in any type of restructuring of the tree itself then that will be known to us only after reaching the target leaf node and knowing its neighbours and parents. So the approach is first to try with optimistic locking defined above and then go for pessimistic locking Pessimistic locking involves taking a write latch on the root resulting in full ownership of the tree by the current operation (until the operation is complete no other operation can take a read or write latch, so all the other operations has to wait even if they are read operations and involve only optimistic locking). When the leaf node is found we take write latch on the leaf’s neighbours as well as its parent and do the restructuring and if the same restructuring needs to happen at parent level then we will take similar write locks recursively up the tree. ^17a3ff ...

November 17, 2024 · 4 min