Lessons in quality engineering from working with Cursor and Windsurf

Experiment with AI development tools to understand their impact on delivery speed and the quality challenges they create for engineering teams

Developer tools that use generative AI to assist developers in their work are gaining traction. These tools include "copilots" and IDEs with integrated agents and chatbots. Developers use them to carry out tasks such as code completion, analysis, bug fixing, and full-blown development.

What impact will these tools have on the quality of developers’ work and the products they build? I decided to investigate by setting myself a project using two popular AI development IDEs, Cursor and Windsurf. Here’s what I learnt, and my observations on how these increasingly popular tools might affect our work as quality engineers.

Understanding the development and testing context

Let’s begin with the project itself. Over the past few years, I’ve been developing a website for practising test automation and other testing activities, known as restful-booker-platform. A deployed version of it can be found at https://automationintesting.online. 

This site has served as the backbone for a lot of the training I’ve done. However, it has started to feel a bit dated, and for some time I’ve wanted to update the front end of the site with a more modern framework.

When I first made the tool public, the front end worked as shown in the diagram below:

Diagram 1: The initial architecture of restful-booker-platform. A user connects to rbp-proxy, which in turn retrieves React components from assets-api. rbp-proxy can also connect to other APIs.

This architecture was a bit of a hack and caused issues with deployments, so it needed updating. I opted to replace it with NextJS, a front-end framework that would take care of my components and all my routing. The challenge, however, was that I knew nothing about how NextJS worked. So this was the perfect project to try out AI development tools that would be able to guide me in building the new front end and migrating the old back end as well. 

The NextJS architecture I wanted to create is shown in diagram 2:

Diagram 2: The proposed NextJS architecture for restful-booker-platform. A user connects to a NextJS API that provides React components as well as routes to other APIs. This simplifies the process of rendering the UI and accessing other APIs.
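To make diagram 2 a little more concrete, here is a minimal sketch of the kind of NextJS route handler that sits between the user and an existing back-end API. The file path, environment variable, and Room API URL below are placeholders I’ve chosen for illustration, not the actual restful-booker-platform code.

```typescript
// app/api/room/route.ts (illustrative path; the URL and env var are placeholders)
import { NextResponse } from "next/server";

const ROOM_API_URL = process.env.ROOM_API_URL ?? "http://localhost:3001/room";

// The browser only talks to NextJS; this handler forwards the request to the
// existing Room API and passes its response straight back.
export async function GET() {
  const response = await fetch(ROOM_API_URL);
  const rooms = await response.json();
  return NextResponse.json(rooms, { status: response.status });
}

// The same file can own writes too, keeping all Room routing in one place.
export async function POST(request: Request) {
  const body = await request.json();
  const response = await fetch(ROOM_API_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  return NextResponse.json(await response.json(), { status: response.status });
}
```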

With that in mind, my success criteria were:

  1. The UI should work in the same way as it did before the move.
  2. All automated tests should still pass.
  3. I will develop an understanding of how NextJS works.

With a goal set, I selected my first AI development IDE, loaded up the project, and got started.

What are Cursor and Windsurf?

Before we get into my experiences with Cursor and Windsurf, it’s probably worth discussing what they are and how they differ from other IDEs. 

Over the past few years, since the emergence of tools like ChatGPT and GitHub Copilot, a lot of attention has been paid to how large language models can assist developers in their work. Initially, we saw this through auto-completion via GitHub Copilot and code suggestions through ChatGPT conversations. 

What AI development IDEs like Cursor and Windsurf do is embed those features (and more) directly into the IDE for a user to access quickly. Both IDEs are built upon the open-source Visual Studio ("VS") Code stack, but unlike ‘vanilla’ VS Code, these tools offer:

  • Auto-completion for code and terminal commands
  • A chatbot to discuss development ideas and solve problems
  • AI agents that can take a request, break it down into tasks, and add the resulting code into the project

Add this to features like LLM model selection, dedicated shortcuts, and the ability to provide additional context through rules and file/folder references, and you end up with an IDE that is streamlined to use AI in your development workflow. 

For some, auto-complete alone will be enough to speed up the boring bits of development. Others will generate whole portions of a project purely by prompting an AI agent to do the work for them.

Getting started with Cursor

With my objectives in mind, I began the migration, starting with Cursor. Both Cursor and Windsurf offer trials that give you a certain number of auto-completions and calls to their respective AI agents. The plan was to begin with Cursor until I had burned through all my free credits, then move on to Windsurf.

Project migration with Cursor: walking before I could run

My strategy was to take the migration step by step. Rather than just saying, ‘Hey AI, migrate my project and make sure it works,’ I focused on distinct tasks to complete the migration. 

I started by creating a barebones, 'hello-world' style NextJS project. Asking the Agent to do this went without a hitch, and in a matter of minutes, I had a working NextJS project with a basic workflow in it.

Next, I migrated one component from the old project into the new project and focused on ensuring that it could grab data from my back-end APIs. Once again, success! 
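For a sense of what that looks like in NextJS terms, here is a rough sketch of a server component fetching data from a back-end API before rendering. The component name, Room type, endpoint, and response shape are all assumptions for illustration rather than the project’s real code.

```typescript
// app/components/RoomList.tsx (illustrative name, type, and endpoint)
type Room = { roomid: number; roomName: string; type: string };

const ROOM_API_URL = process.env.ROOM_API_URL ?? "http://localhost:3001/room";

// A NextJS server component can await its own data before rendering, which is
// how the migrated components pull data from the back-end APIs.
export default async function RoomList() {
  const response = await fetch(ROOM_API_URL);
  // Assumes the API returns a plain array; the real payload shape may differ
  const rooms: Room[] = await response.json();

  return (
    <ul>
      {rooms.map((room) => (
        <li key={room.roomid}>
          {room.roomName} ({room.type})
        </li>
      ))}
    </ul>
  );
}
```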

I started growing in confidence and began to bundle more and more components into single requests, all the while deferring more and more responsibility to the AI agent within Cursor. First, I asked it to migrate one component, then two components together, then four, and so on. Things were going great, I was saving time, and the newly migrated front end was starting to come together.

And then, it all started to go wrong.

Dealing with the consequences of using Cursor: overconfidence and other issues

My initial progress through the migration had been rapid. And then two significant problems occurred at the same time. First, I exhausted my allowance for the Cursor trial. Second, weird things started happening in the migrated front end.

To cut a long story short, I had fallen victim to overconfidence. Instead of managing smaller tasks, I had given more and more responsibility to the AI to build my new front end. In my hubris, I’d allowed Cursor to make structural decisions about how the project was arranged. To put it bluntly, it was a mess!

For example, three different files, not just one, were handling the routing to my Room API. What made matters worse was that I still didn’t understand how NextJS worked. I’d been too reliant on Cursor.

Given that my AI credits in Cursor were used up, I wanted to move over to Windsurf. But first, I needed to fix the mess that I had made and get myself up to speed with what was going on in my code. So I took some time to go through the code to understand it better and clean it up, manually. 

When it came time to try Windsurf, I decided to slow things down this time and focus on small, sliceable tasks. I also relied more frequently on autocompletion to rearrange the code in my project into a more sensible setup. And I began to expand the scope of work ONLY when I felt comfortable with what was going on and when I was happy with the project’s arrangement.

Moving to Windsurf: Migrating unit tests

Finally, once I had organised my project correctly and migrated all the components, I started to move my unit tests over. As I moved them, I initially decided not to worry about whether they passed or not. I just wanted to get them running. After that, I worked with the Windsurf agent to fix each of the failing tests. 
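As a flavour of the kind of unit test I was migrating and fixing, here is a minimal sketch, assuming Jest and React Testing Library. The component is inlined purely for illustration; the real tests exercise the project’s own components.

```typescript
// __tests__/roomName.test.tsx (illustrative; assumes Jest and React Testing Library)
import { render, screen } from "@testing-library/react";
import "@testing-library/jest-dom";

// Stand-in for a migrated component such as a room listing
function RoomName({ name }: { name: string }) {
  return <h2>{name}</h2>;
}

test("renders the room name", () => {
  render(<RoomName name="101" />);
  // "Pass" is easy to define here: the same text the old front end displayed
  expect(screen.getByText("101")).toBeInTheDocument();
});
```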

This process worked well, and it didn’t take too much time to get them all passing. Here, I could rely on the AI agent to make tweaks, since it was focused on specific tests and I knew what a "pass" looked like. On the flip side, though, the AI agent generated more code than I expected, and I made a note to clean it up soon after.

That said, the tests were passing, and the new front end was looking good. All that was left was to give it a quick exploratory test, which was when I discovered…

The caching bug! Working with Windsurf and Claude

Ah, ‘the caching bug’ that reared its ugly head near the end and almost derailed the whole project.

After I'd completed the migration of production code and tests, fixed the tests, and bundled everything into a Docker image, an unusual error appeared. What should happen is this: whenever an unread message is opened on restful-booker-platform, the unread count is reduced by one, and the counter on the nav menu should decrease by one, disappearing when the count reaches zero. This was certainly the case when I ran the application locally, but when I ran it via Docker… the count always stayed the same!

At this point, I had once again burnt through my allowance on Windsurf. I tried using the basic agent in Windsurf to fix the problem, but it wasn’t helpful. This was partially because it was a free AI agent as opposed to the new and shiny agents available in Pro, but mostly because the issue was environment-specific. Because the code worked locally, it couldn’t see any issues, and it went around in circles, tweaking and modifying to no avail. 

So I had to try a new tactic. Instead of relying on the AI agent, I spent multiple evenings pairing with Claude Sonnet to work out what was going on. This time, I went step by step, investigating what might have gone wrong. There were a few missteps, but eventually, I realised that I had to force my application not to cache the unread count.
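For context, depending on the NextJS version, fetch responses and route handlers can be cached by default once the app is built for production (as it is in Docker), which matches the local-versus-Docker difference I was seeing. The sketch below shows the kind of fix involved; the file path, URL, and environment variable are my assumptions, and the actual change in the project may differ.

```typescript
// app/api/message/count/route.ts (illustrative path; URL and env var are placeholders)
import { NextResponse } from "next/server";

const MESSAGE_API_URL =
  process.env.MESSAGE_API_URL ?? "http://localhost:3006/message";

// Opt this route out of static caching entirely...
export const dynamic = "force-dynamic";

export async function GET() {
  // ...and tell fetch never to reuse a cached response for the unread count
  const response = await fetch(`${MESSAGE_API_URL}/count`, { cache: "no-store" });
  return NextResponse.json(await response.json(), { status: response.status });
}
```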

Thankfully, my debugging eventually paid off. If you fancy revelling in my pain, why not check out the chat thread with Claude? Alternatively, you can read through the chat in the attached PDF.

With the caching issue resolved, I was able to consider my migration project complete and reflect on what I had learnt.

What I learnt

You may or may not have come across the term ‘vibe coding’. The idea is that a developer gives up control entirely to the AI whilst developing projects or new features. When I started my migration project, I wasn’t aware of the term. However, looking back, I might well have been vibe-coding without knowing it. 

Now, the idea of no human behind the proverbial development wheel might strike fear into your tester's heart. However, I think a lot can be learnt from it. I believe that there are no absolutes when it comes to the relationship between developers and AI tools. In the future, we will see more and more developers adopt tools like Cursor and Windsurf. However, the degree to which developers will defer responsibility to said AI will depend on the individual. 

Vibe coding will exist on some scale, which brings me to my first observation as a quality engineer:

Understanding how much AI is being used on our teams

If we want to support developers in building quality products, we will need to tailor our responses to the individual. As an example, I might be working with a developer who is all-out vibe coding, having the AI do the work. In that case, my focus as a tester would be on how the developer knows what "good" looks like, how they are breaking down the work, and how they communicate requests to an AI agent. In this situation, I need a broader set of skills and activities than the ones we currently use to support developers who simply use AI for auto-completion.

These are extremes, and I’m not saying one way is better than the other, but most developers will ultimately sit at some point on the vibe coding scale. During my experiment, I felt I had travelled much of that scale myself. But once I had found my happy spot on it, the work flowed quickly.

Which raises the question...

Responding to the increased development throughput that comes from AI assistance in coding

AI tools, used correctly, have the capacity to speed up our delivery massively. This might prove challenging for quality engineers and testers who work with multiple developers on a team and remain responsible for a certain amount of testing. 

So, how do we manage that? To me, this underlines the need for teams to evolve faster. Teams should own quality, and getting buy-in for that mindset team-wide is more of a priority than ever before. Otherwise, we risk testing becoming a bottleneck that teams try to optimise away in ways that damage quality rather than elevate it.

Another observation with these AI tools is that different situations require different AI solutions. Consider my caching bug. To fix that issue, I had to do a lot of debugging myself, picking and choosing as and when to use AI to help. Pairing with AI helped solve my issue, but I couldn’t wholly rely on it because the AI-generated code created the bug in the first place! We can’t rely too much on AI agents or chat because we can easily lose understanding of what we’re building. 

Just the same, we don’t necessarily want to throw away any advantage we might gain by rejecting these tools altogether. Teams will need to figure out how to use them effectively. This means that, as quality engineers, we will need to focus on:

Supporting teams in using AI in the right way for the right task

Developers are under pressure to deliver, so they might not have the time to do due diligence and understand what works best for them. However, if we take some time to experiment and learn, we can pass that knowledge on to our teammates so they can move faster and more confidently.

To wrap up

There are lots of opportunities to move faster with AI tools, but doing so requires informed and intelligent ways of working with them. I haven’t touched on the quality of the code that an AI produces because that is a discussion in its own right. But whether they are better or worse at producing code, AI tools are going to be used.

This means we need to get wise to what these tools can and can’t do and how best to coach and support developers to use them smartly and sensibly. If we can get that right, then we can reap the benefits and minimise the risks. 

So I encourage you, regardless of your abilities, to try these tools out for yourself, learn from them, and share your perspectives with others. Because the more we learn, the better prepared we are to support our fellow developers.

For more information

Tester, Toolsmith, Author and Instructor
Mark Winteringham is a tester, toolsmith and author of AI-Assisted Testing and Testing Web APIs, with over ten years of experience providing testing expertise on award-winning projects across a wide range of technology sectors, including BBC, Barclays, UK Government and Thomson Reuters. He is an advocate for modern risk-based testing practices and trains teams in Automation, Behaviour Driven Development and Exploratory testing techniques. He is also the co-founder of Ministry of Testing Essentials, a community raising awareness of careers in testing and improving testing education. You can find him on Twitter @2bittester or at mwtestconsultancy.co.uk