Latency is the delay between something happening and the system responding to it. In software terms, it’s the gap between a user action (for example, clicking a button, submitting a form, or opening a page) and the moment they see the result. It is usually measured in milliseconds, but humans feel it in frustration and impatience.
Latency is not the same as speed in general. A system can have high throughput (handling many requests per second) and still feel slow if each individual request takes too long to respond. This is why latency matters so much in modern systems, especially APIs, distributed services, and real-time applications. Latency can come from many places: network delays, slow databases, overloaded servers, inefficient code, too many service-to-service calls, or even third-party integrations. In distributed systems, latency often becomes the hidden tax you pay for complexity.
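A small piece of arithmetic makes the throughput-versus-latency distinction concrete. The numbers below are illustrative, not from any real system: with enough concurrency, a service can sustain high throughput even though every individual request is slow.

```python
# Illustrative only: throughput and latency are independent dimensions.
# With enough requests in flight at once, aggregate throughput can be
# high even though each individual request takes seconds.
latency_s = 2.0      # each request takes 2 seconds to complete
concurrency = 2000   # requests being processed at the same time

# Steady-state throughput = concurrency / latency (a form of Little's Law)
throughput_rps = concurrency / latency_s
print(throughput_rps)  # 1000.0 requests per second, yet every user waits 2s
```

The system above would look healthy on a throughput dashboard (1,000 requests per second) while every single user experiences a two-second wait.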
From a Quality Engineering perspective, latency is not just a performance metric. It is a user experience risk. If a checkout takes five seconds to confirm, users don’t care whether the system is “working”. They assume it is broken. Latency is one of those quality signals that rarely fails loudly but can quietly destroy trust over time.
Latency thresholds are the agreed limits for how long something is allowed to take before it becomes a problem. They are usually expressed as response time targets, such as “this API must respond in under 200ms” or “checkout must complete within 3 seconds”. The key point is that a threshold defines what “good enough” looks like, not what is technically possible on a perfect day.
Good latency thresholds are based on user expectations and business impact. A login page might tolerate a short delay. A payment confirmation or live trading screen might not. And internal systems may have different expectations than customer-facing ones.
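A threshold only has teeth if something checks it. Below is a minimal sketch of that idea in Python; the helper name, the simulated call, and the 200ms budget are all hypothetical, chosen to mirror the "under 200ms" example above rather than any real API.

```python
import time

def within_threshold(action, threshold_ms):
    """Run `action` once, return (elapsed_ms, passed).

    Hypothetical helper for illustration: times a single call with a
    monotonic clock and compares the result to an agreed latency budget.
    """
    start = time.perf_counter()
    action()
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms, elapsed_ms <= threshold_ms

def fake_api_call():
    # Stand-in for a real request; sleeps ~50ms to simulate work.
    time.sleep(0.05)

elapsed, ok = within_threshold(fake_api_call, threshold_ms=200)
print(f"{elapsed:.1f}ms within 200ms budget: {ok}")
```

In practice you would run many such samples and judge a percentile (not a single measurement) against the threshold, but the core contract is the same: an agreed number, checked automatically, that turns "feels slow" into a pass/fail signal.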