Chaos with a purpose: practising resilience through a contrived API

04 Jul 2026

Decrypt the Narrative is a deliberately contrived API testing challenge designed to simulate the kind of behaviour you only really encounter once systems start misbehaving in production. On the surface, it looks simple. You authenticate, receive a token, call an endpoint to collect fragments, and reconstruct a single canonical sentence. Submit it, and you’re done. That is the game layer.

The value comes from how the system behaves while you do it. It deliberately introduces patterns you will recognise from real microservice environments, including rate limiting (where requests are restricted to control load), intermittent errors, latency spikes and responses that are technically successful but not actually useful. It is designed to punish brute force approaches and reward control, measurement and strategy instead.

You can approach it however you like. Postman, curl, a quick script or a full client in any language all work. If you can make HTTP requests, you can take part. That flexibility is intentional. QEs, SDETs, developers and anyone working with APIs can all approach it differently, but still run into the same underlying lessons.

How to try it in two minutes
You can interact with the platform using any HTTP-capable tool, including Postman, curl, Python, JavaScript or any other language client. You can also make live requests without any tooling by using the built-in Swagger docs.
A minimal flow looks like this.

First, authenticate to receive a token:
curl -X POST https://dcrypt.run/auth \
  -H "Content-Type: application/json" \
  -H "team: demo" \
  -d '{}'

You will receive a response similar to:
{"token":"afe2ca9c-2084-420e-8ac4-eedc2080be79","team":"demo","remaining":20}

You can then use that token to request fragments:
curl https://dcrypt.run/fragment \
  -H "Authorization: Bearer abc123..."

Each response returns a small part of the overall sentence:
{"fragment": "resilience", "position": 4}

You repeat this process until you have enough fragments to reconstruct the full sentence and submit it. At first, this feels straightforward. Then the behaviour starts to shift.

Why this matters, even if your APIs look “fine”
It is easy to read something like this and assume it only applies to large-scale or external systems. In reality, these behaviours show up in your own services as well. They tend to appear when traffic changes, when dependencies slow down, or when small changes begin to compound across services.

The problems are rarely full outages. They are usually the more frustrating middle ground. A downstream service becomes slower after a deployment but nothing is technically down. Retries (automatically repeating failed requests) quietly multiply load until something tips over. Timeouts behave inconsistently, so outcomes depend on timing and luck. Caches make the first request behave differently to the next. Partial failures (where only part of a system succeeds) get masked behind responses that look successful. You can have strong unit and contract testing in place and still be caught out by these behaviours.

This becomes even more important when integrating with third-party APIs. With your own systems, you can often adjust timeouts, add tracing (linking requests across services to follow behaviour) or tune behaviour. With third-party services, you get none of that control. You only discover how they behave when they start behaving badly.

So the real question is not whether an API exists. It is how it behaves under pressure. Does it degrade cleanly or stall? Does it return useful errors or vague ones? Do retries help or make things worse? If you are not designing and testing for those behaviours, you are building integration on hope.

How this played out in a live session
Decrypt the Narrative was originally built for a Staffordshire University hackathon, where over 40 students across 13 teams were given limited time to solve the challenge using any tooling they chose. On the surface, the task appeared simple. Authenticate, collect fragments and reconstruct a sentence. In practice, the system behaved much closer to a production service under pressure. It introduced rate limits, latency, intermittent failures and deliberately unstable behaviour that forced teams to think about how they were interacting with it.

Most teams began in the same way. They wrote quick scripts and started sending requests as fast as possible. For a short period, it looked like they were making progress.

Then things changed. Rate limits started to apply, duplicate fragments became more common and useful responses became harder to identify. Progress slowed, and in some cases stopped entirely.

A smaller group took a different approach. They slowed down, tracked what they were receiving, avoided duplicates and adjusted their request patterns based on how the system behaved. Instead of reacting to failures, they adapted to them. They finished first. Not because they were faster, but because they were more deliberate.

At peak, the system handled just over 100 requests per second as multiple teams ran retries, parallel requests and competing strategies. What stood out was not the volume, but the difference in behaviour. Teams with brittle logic amplified the instability. Teams with structured approaches stabilised it. The challenge stopped being about reconstructing a sentence quickly. It became about whether their engineering approach could hold up under unpredictable conditions.

The core idea: make failure part of the experience
Decrypt the Narrative rewards people who treat an API like a real dependency rather than a toy service. It is not about sending more requests until something works. It is about building confidence under imperfect conditions.

The system includes rate limits that punish uncontrolled traffic, chaos events that introduce delay and failure, and responses that look valid but do not move you forward. These behaviours are not there to make the problem harder. They are there to reflect what happens in real systems. If your approach only works when everything is perfect, it is not a resilient approach. In distributed systems, the world is rarely perfect for long.

Lesson 1: A 200 does not mean you are winning
One of the first things people notice is that a successful request does not always mean progress. You might receive a response like this:

{"fragment": "the", "position": 3}

Repeatedly. 

The request has succeeded and the response is valid, but your overall state has not improved. You are not actually moving forward.
This reflects a common production issue. Some of the hardest failures are not outages, but degradations that hide behind successful responses. Systems can be slow without failing, return partial or duplicated data, or behave inconsistently due to eventual consistency (where systems take time to synchronise state).

A lot of testing still treats outcomes as binary, either pass or fail. That works for correctness, but it is not enough for resilience. Resilience lives in the grey area, where systems are technically working but not behaving well.
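One way to make that grey area measurable is to record, for every response, whether it actually moved you forward. The sketch below is illustrative only. The `fragment` and `position` fields come from the sample response above, while `ProgressTracker`, `record` and `yield_rate` are hypothetical names, and it assumes a position always maps to the same fragment.

```python
from dataclasses import dataclass, field


@dataclass
class ProgressTracker:
    """Hypothetical helper: separates 'the request succeeded' from 'it helped'."""

    seen: dict = field(default_factory=dict)  # position -> fragment
    successes: int = 0  # responses with a 2xx status
    useful: int = 0     # successes that revealed a position we had not seen

    def record(self, status: int, body: dict) -> bool:
        """Record one response; return True only if it added new information."""
        if not 200 <= status < 300:
            return False
        self.successes += 1
        position = body.get("position")
        if position is None or position in self.seen:
            return False  # a valid response, but no progress
        self.seen[position] = body.get("fragment", "")
        self.useful += 1
        return True

    @property
    def yield_rate(self) -> float:
        """Fraction of successful responses that were actually useful."""
        return self.useful / self.successes if self.successes else 0.0


tracker = ProgressTracker()
tracker.record(200, {"fragment": "the", "position": 3})  # new information
tracker.record(200, {"fragment": "the", "position": 3})  # duplicate
print(tracker.yield_rate)
```

A falling yield rate alongside a steady stream of 200s is a signal that sending more requests is no longer the right move.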

A practical way to approach this is to take one critical journey in your system and define what good looks like under stress. Consider acceptable latency ranges, how the system should behave when dependencies are slow or unavailable, what gets logged and how it is traced, and what the user experience should be during partial failure. If those things are not clearly defined, they are assumptions rather than requirements.

Lesson 2: Retries are not a strategy
Retries are a useful tool, but they are easy to misuse. A naive client might continuously send requests and quickly encounter rate limiting: HTTP 429 Too Many Requests. At that point, simply retrying immediately often makes the situation worse. More traffic is added to a system that is already struggling, increasing instability rather than reducing it.

In the challenge, these approaches tend to burn through quotas and generate more noise than useful progress. More effective approaches introduce control. Requests are spaced out, responses are tracked and behaviour is adjusted based on what the system is doing. Retries are used deliberately, often with backoff (increasing delay between retries) and variation, rather than as a default reaction.
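As a rough illustration of backoff with variation, the sketch below uses exponential backoff with random jitter and a hard cap on attempts. It is not how the platform expects clients to behave, just one common pattern. `RateLimited`, `backoff_delay` and `call_with_backoff` are hypothetical names, and the `request` callable is assumed to raise `RateLimited` when it sees an HTTP 429.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical: raised by the caller's request function on HTTP 429."""


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Jittered delay: a random fraction of min(cap, base * 2**attempt) seconds."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(request, max_attempts=5, sleep=time.sleep, rng=random.random):
    """Call request(); on RateLimited, wait with jittered backoff and retry.

    Gives up after max_attempts so a struggling service is not hammered forever.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(backoff_delay(attempt, rng=rng))
```

The `sleep` and `rng` parameters are injectable so the retry behaviour can be tested without real waiting or randomness.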

This is a pattern that shows up repeatedly in production. When a dependency starts to fail, the behaviour of the caller often determines whether the situation stabilises or escalates. A simple experiment in a non-production environment can make this visible. Introduce latency or intermittent errors into a dependency and observe how the caller behaves. Does it retry immediately or adapt? Does it distinguish between different failure types? Does it eventually stop, or does it keep applying pressure? If those answers are unclear, there is likely a resilience gap.
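The experiment described above can be approximated in a few lines before reaching for real fault-injection tooling. The sketch below is a toy model under stated assumptions: `FlakyDependency`, `attempt` and `amplification` are hypothetical names, failures are injected as `TimeoutError`, and retries happen immediately. It measures how many calls the dependency receives per logical request.

```python
import random


class FlakyDependency:
    """Hypothetical stand-in for a dependency with injected failures."""

    def __init__(self, target, failure_rate, rng=None):
        self.target = target
        self.failure_rate = failure_rate
        # Seeded by default so repeated runs give the same results.
        self.rng = rng if rng is not None else random.Random(0)
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.rng.random() < self.failure_rate:
            raise TimeoutError("injected failure")
        return self.target()


def attempt(dependency, max_attempts):
    """Call the dependency up to max_attempts times; return True on success."""
    for _ in range(max_attempts):
        try:
            dependency()
            return True
        except TimeoutError:
            continue  # immediate retry: no delay, no backoff
    return False


def amplification(failure_rate, max_attempts, logical_requests=1000):
    """Average number of calls the dependency sees per logical request."""
    dependency = FlakyDependency(lambda: "ok", failure_rate)
    for _ in range(logical_requests):
        attempt(dependency, max_attempts)
    return dependency.calls / logical_requests


# Compare a bounded caller with a near-unbounded one as failures increase.
for rate in (0.0, 0.5, 0.9):
    print(rate, amplification(rate, max_attempts=3), amplification(rate, max_attempts=50))
```

The gap between the two columns as the failure rate rises is the extra load the caller adds at exactly the moment the dependency can least afford it.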

Lesson 3: Hidden state is where the pain lives
The platform itself is mostly stateless. It does not store long-term progress for teams or quietly correct mistakes. That forces a useful habit, which is to track state explicitly and manage it carefully.

In real systems, hidden state exists in many forms. Caching can make the first request behave differently to subsequent ones. Load balancing can change which instance handles a request. Sessions, asynchronous processing and eventual consistency all introduce behaviour that is not immediately visible.

Flaky behaviour is often not random. It is the result of state that is not being accounted for. A practical way to explore this is to take a flaky scenario and break it down. Look at what changes between a pass and a fail. Consider whether it is affected by timing, sequence, caching or repetition. Then test those variables deliberately. That is how you move from “it is flaky” to something you can explain and fix.
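As a small illustration of testing sequence and repetition deliberately, the sketch below uses a cached function as a stand-in for any service with hidden state. `lookup`, `backend_calls` and `observe` are hypothetical names. Only `functools.lru_cache` and its `cache_clear` method are real library features.

```python
from functools import lru_cache

backend_calls = []  # records every call that reaches the (pretend) backend


@lru_cache(maxsize=None)
def lookup(key):
    """Hypothetical cached lookup: only cache misses reach the backend."""
    backend_calls.append(key)
    return key.upper()


def observe(sequence):
    """Run lookups from a cold cache and report which ones hit the backend."""
    lookup.cache_clear()   # make the starting state explicit
    backend_calls.clear()
    trace = []
    for key in sequence:
        before = len(backend_calls)
        lookup(key)
        trace.append("miss" if len(backend_calls) > before else "hit")
    return trace


print(observe(["a", "a", "b", "a"]))  # → ['miss', 'hit', 'miss', 'hit']
```

The same call gives a different outcome depending on what came before it. Resetting the state at the start of each run is what turns that from flakiness into something repeatable.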

Lesson 4: Observability is part of testability
One of the intended experiences in the challenge is the moment where something is clearly wrong, but it is difficult to prove why.
That is what incidents feel like when observability is weak. In distributed systems, a system that cannot explain itself will always feel fragile. Not necessarily because it fails more often, but because it is difficult to understand what is happening when it does.

A simple way to improve this is to define a small set of signals for a critical flow. For example, completion rate, latency and the distribution of error types, with failures grouped into meaningful categories. Then deliberately introduce failure into a dependency and see whether those signals help you understand what is happening.

Can you identify what failed first? Can you see how it affected the user? Would you detect it quickly in production? Would you know what to do next? If the answers are unclear, the gaps are usually in tracing, error categorisation or the visibility of system behaviour.
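A minimal sketch of the signals mentioned above, assuming each call to a dependency yields an HTTP status (or `None` for a timeout) and a latency in seconds. `FlowSignals` and its category names are hypothetical, and a real system would usually send these to a metrics or tracing backend rather than hold them in memory.

```python
from collections import Counter
from statistics import median


class FlowSignals:
    """Hypothetical in-memory recorder for one critical flow."""

    def __init__(self):
        self.latencies = []
        self.outcomes = Counter()

    @staticmethod
    def categorise(status):
        """Group raw statuses into a few meaningful failure types."""
        if status is None:
            return "timeout"
        if 200 <= status < 300:
            return "ok"
        if status == 429:
            return "rate_limited"
        if 400 <= status < 500:
            return "client_error"
        if 500 <= status < 600:
            return "server_error"
        return "other"

    def record(self, status, latency_s):
        self.outcomes[self.categorise(status)] += 1
        self.latencies.append(latency_s)

    def summary(self):
        total = sum(self.outcomes.values())
        return {
            "completion_rate": self.outcomes["ok"] / total if total else 0.0,
            "median_latency_s": median(self.latencies) if self.latencies else None,
            "errors": {k: v for k, v in self.outcomes.items() if k != "ok"},
        }
```

Even this small set is often enough to tell a rate-limiting problem apart from a slow dependency or a failing one.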

Lesson 5: The best teams do not beat chaos. They manage it
Teams progressing through the challenge tend to follow a consistent pattern. Initial attempts focus on volume, sending as many requests as possible. That shifts into measuring behaviour and tracking results. Eventually, the focus moves towards certainty, where teams optimise for consistent progress rather than raw throughput. This progression reflects what resilience looks like in practice.

A resilient system is not one that never fails. It is one that fails in known ways, stays within acceptable bounds, protects the core journey and recovers without unnecessary complexity. The difference between fragile and resilient systems is rarely the number of failures. It is whether failure is expected, understood and observable.

Try it with your team
If you want a practical way to explore these behaviours, you can try the platform at https://dcrypt.run/. The platform is completely open and free for anyone to use, whether you are running a team exercise, exploring it on your own or just curious to see how your approach holds up under a bit of controlled chaos.

Run it as a team exercise using different approaches, whether that is Postman, scripts or full clients. Compare how different strategies behave, not just how quickly they complete the task.

The real value comes from the discussion afterwards. Where do your own systems behave like the brute force approaches? Where do you rely on retries instead of design? Where might successful responses be hiding degraded behaviour? What hidden state do you not currently test? What would be difficult to explain during an incident?

Those questions are the point.

What do you think?
Where in your systems might successful responses be hiding underlying issues? How confident are you in your retry behaviour under stress? What would be hardest to explain if something started to degrade today? If you ran this with your team, what do you think would happen?

Adam Davis
Senior Systems Development Manager

Adam is a seasoned Software Engineering Manager with over a decade of experience leading medium to large-scale teams of testers, QA professionals, developers, and release engineers. He is passionate about cultivating collaborative environments that spark innovation, both in professional settings and through side projects. Adam believes that side quests are crucial for nurturing creativity and skill development, allowing individuals to explore new technologies and approaches. With a focus on creating scalable solutions across diverse infrastructure platforms, he combines a practical and analytical mindset to address complex challenges. Adam is dedicated to fostering a culture where teams can thrive and achieve outstanding results.
