
Testing AI-coded applications: Practical tips for software testers

Tackle the messy reality of AI-coded apps with lessons learned, testing tips, and debugging insights.


How I got started in testing AI-coded apps 

Hey, Rafaela Azevedo here! For the past 17 years, I’ve worked as a software development engineer in test (SDET), specializing in QA, test automation, Web3, DevOps, and monitoring and alerting.

Efficiency has always been a core passion of mine, which is what drew me to test automation in the first place. That, in turn, led me to experiment with AI tools. And today, many of the projects I work on are developed using AI tools.

The reality of testing AI-coded apps: an experience report

AI development tools bring unique challenges to testing, particularly in ensuring code quality, security, and scalability. Most people who code with AI tools discover that while the tools can generate impressive prototypes, the complexities of real-world systems still require significant human intervention. In my opinion, this has placed QAs and testers at the forefront: our work is now more critical than ever.

When AI tools first appeared on the scene, they seemed like the ultimate solution for efficient development, capable of producing quality code at lightning speed. Tools like Lovable.dev, Replit, Manus, A0.dev, and Firebase Studio excel at rapid prototyping. But the code they generate is often lacking when it comes to context, scalability, and system integration.

[Meme: "Hide the Pain Harold" smiling uncomfortably at a laptop. Top caption: "My AI just wrote this 10,000 lines of code in 2 minutes." Bottom caption: "Now I will spend 2 days debugging it."]

Debugging AI code is painful.

Just as an example: one of our projects was to build a Stripe-like platform for cryptocurrency. The AI coding agent produced the front-end prototype in seconds, but it overlooked vital aspects like the entire database and backend. So, we continued adding more details to the prompt, hoping for more complete results. But even with more complex prompts, the resulting code still lacked authentication protocols, data mapping, and optimized performance under load. 

If you are not extremely concise and direct, the AI coding agent will simply create whatever it thinks is best for you. It will mock data and produce a result as fast as possible, but it won't generate good code structure. Often it will put all of the components and details into one huge file, leaving you with a large and complex refactoring task.
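
To make that concrete, here is a minimal sketch of the single-file shape these agents tend to produce (the names are invented; this is not code from our project). In the real output, something like this would be one of dozens of sections in a single enormous file:

```tsx
// Mock data, business logic, and UI all fused into one component
export default function App() {
  // Mock data hardcoded right next to the UI that renders it
  const transactions = [
    { id: 1, amount: 120, currency: "ETH", status: "complete" },
    { id: 2, amount: 45, currency: "BTC", status: "pending" },
  ];

  // Business logic inlined into the component body
  const totalSettled = transactions
    .filter((t) => t.status === "complete")
    .reduce((sum, t) => sum + t.amount, 0);

  // Rendering, layout, and data concerns all in the same place
  return (
    <div>
      <h1>Dashboard</h1>
      <p>Total settled: {totalSettled}</p>
      <ul>
        {transactions.map((t) => (
          <li key={t.id}>
            {t.amount} {t.currency} ({t.status})
          </li>
        ))}
      </ul>
    </div>
  );
}
```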

Long story short: when you create your app, you can't give too much creative freedom to the AI coding agent.

In our case, the “finished” product required weeks of refinement to meet operational standards. Here are some of the issues we found in the code:

Integration complexities 

AI-generated code frequently lacks context about existing systems. In one instance, we worked on an AI-built analytics dashboard that didn’t align with the startup’s data pipeline, leading to significant rework during integration.

The product that the AI coding agent built was messy because of this integration, to the point where we sometimes wondered whether it would have been better to start from scratch and code it ourselves. We spent considerable time simply understanding what the AI was trying to build. Imagine a junior developer, eager and enthusiastic to get things done, but lacking planning skills, the ability to handle complex structures, and deep knowledge. That's what AI coding assistants are like.

Don't get me wrong: agents helped a lot and I advocate for their use. But people need to be aware of the downsides of using them and align their expectations.
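
There are cheap defenses, though. For the dashboard-style mismatch above, a contract check that validates real samples from the existing system against the assumptions baked into the generated code surfaces the misalignment early. Here is a sketch using zod, with field names invented for illustration:

```ts
import { z } from "zod";

// The shape the AI-generated dashboard code assumed (invented example)
const DashboardEvent = z.object({
  userId: z.string(),
  revenueCents: z.number(),
});

// A sample payload from the existing pipeline (also invented, but this
// snake_case vs. camelCase flavor of mismatch is typical)
const pipelinePayload = { user_id: "u-42", revenue_cents: 1999 };

// Validate the real payload against the generated code's assumptions
const result = DashboardEvent.safeParse(pipelinePayload);
if (!result.success) {
  console.error("Dashboard assumptions do not match the pipeline:", result.error.issues);
}
```

Run against a handful of real payloads in CI, a check like this can turn a weeks-long integration surprise into a failing test on day one.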

Scalability issues 

AI tools prioritize delivering quick results, often at the expense of scalability. For example, an AI-designed payment gateway integration failed to handle transaction spikes, requiring a complete rewrite. So you need to instruct the AI coding agent specifically to produce a scalable result. 
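
As a concrete (and hypothetical) illustration of what "handle transaction spikes" means: clients retry failed requests under load, and a payment endpoint without idempotency handling will happily charge the same customer twice. A minimal sketch of that guard, with invented names and an in-memory store for brevity:

```ts
import express from "express";

const app = express();
app.use(express.json());

// Hypothetical in-memory idempotency cache; a real system would use a
// shared store (e.g. Redis) so retries across instances are also safe.
const processed = new Map<string, { status: string }>();

app.post("/charge", async (req, res) => {
  const key = req.header("Idempotency-Key");
  if (!key) {
    return res.status(400).json({ error: "Idempotency-Key header required" });
  }

  // A retried request during a traffic spike returns the cached result
  // instead of charging the customer a second time.
  const existing = processed.get(key);
  if (existing) return res.json(existing);

  const result = { status: "charged" }; // stand-in for the real gateway call
  processed.set(key, result);
  res.json(result);
});

app.listen(3000);
```

A production version would also need rate limiting and queueing, but even spelling out this much in the prompt can save a rewrite later.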

Quality assurance and debugging 

Testing AI-generated code is pure black-box testing: you rely on the user’s perspective with no insight into the code. Yet you still have to understand why the code works (or doesn’t), which is always a challenge, especially given the lack of documentation and inconsistent adherence to best practices.

In our case, the unit and integration tests that the AI coding agent created were very basic, incomplete, and full of incorrect assumptions. This might have been partly because our prompts were not always as concise as they could have been.
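
To illustrate the gap (with an invented formatPrice helper, not our real code), the first test below is the kind of thing the agent gave us; the rest are what we had to add by hand:

```ts
import { describe, it, expect } from "vitest";
import { formatPrice } from "./formatPrice"; // hypothetical helper

describe("formatPrice", () => {
  // The kind of test the agent produced: one happy path, asserting the
  // behaviour it *assumed* rather than the actual requirement.
  it("formats a price", () => {
    expect(formatPrice(10)).toBe("$10.00");
  });

  // What we had to add by hand: the real requirement and the edge cases.
  it("uses the configured currency", () => {
    expect(formatPrice(10, "GBP")).toBe("£10.00");
  });

  it("handles zero and fractional amounts", () => {
    expect(formatPrice(0)).toBe("$0.00");
    expect(formatPrice(0.5)).toBe("$0.50");
  });

  it("rejects negative amounts", () => {
    expect(() => formatPrice(-1)).toThrow();
  });
});
```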

End-to-end tests were created with Playwright. Again, if the prompt that generates the tests is not accurate, then the tests are also going to be quite simple.
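
In Playwright terms, the difference looks roughly like this (routes and selectors are invented for illustration):

```ts
import { test, expect } from "@playwright/test";

// Roughly what a vague prompt produced for us: a test that passes as
// long as the page renders at all.
test("dashboard loads", async ({ page }) => {
  await page.goto("/dashboard"); // assumes baseURL is set in playwright.config
  await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();
});

// What a precise prompt (or a human) gets you: a real user journey with
// assertions on the outcome, not just on rendering.
test("user can filter transactions by status", async ({ page }) => {
  await page.goto("/dashboard");
  await page.getByLabel("Status").selectOption("pending");

  const rows = page.getByTestId("transaction-row");
  await expect(rows).toHaveCount(1);
  await expect(rows.first()).toContainText("pending");
});
```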

And running the tests in our GitHub workflow was far from easy. The AI agent did not do basic things like keeping package-lock.json in sync with package.json. We wound up having to copy and paste the tests and send messages to Lovable, the company that had provided the tool, so they could fix the problems. Having a technical person on the team makes a big difference here, because it is faster and easier to just change the code directly.

Hard-coded mock data surprises 

As you can see from the code screenshot below, the AI coding agent generated the front end first. The back end and functionality were then created through follow-up prompts.

[Screenshot: footer of a "TechEdge Academy" React component with course categories ("Artificial Intelligence," "Blockchain," "Quantum Computing," "Web3 & DeFi") hardcoded in the front-end code. Annotation: "Front-end mixed with backend (categories of the courses hardcoded in the frontend)."]

The problem is that much of the code is hardcoded first, and then test data is mocked. The mocks are woven so tightly into the functions that it's hard to tell what has been mocked and what hasn't. So when we tested the app, we had to proactively check which data was mocked. That took a long time.
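
Simplified, the pattern from the screenshot looked like the first component below. The second shows the separation we had to introduce ourselves (the /api/categories endpoint is hypothetical):

```tsx
import { useEffect, useState } from "react";

// What the agent generated (simplified): categories hardcoded into the
// footer, indistinguishable from real data at a glance.
function HardcodedFooterCategories() {
  const categories = ["Artificial Intelligence", "Blockchain", "Quantum Computing", "Web3 & DeFi"];
  return (
    <ul>
      {categories.map((c) => (
        <li key={c}>{c}</li>
      ))}
    </ul>
  );
}

// The separation we had to introduce: the footer renders whatever the
// backend returns, so the data has a single source of truth.
function FooterCategories() {
  const [categories, setCategories] = useState<string[]>([]);
  useEffect(() => {
    fetch("/api/categories")
      .then((res) => res.json())
      .then(setCategories);
  }, []);
  return (
    <ul>
      {categories.map((c) => (
        <li key={c}>{c}</li>
      ))}
    </ul>
  );
}
```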

Don't rely on the tests your AI tools produce: human testers are essential

In this era of AI-generated code, the role of testers and quality engineers has become more important than ever. AI tools produce code quickly, but ensuring the code's quality and reliability requires human expertise. AI coding agents make many assumptions, often incorrect ones, and usually produce only the simplest of tests. As noted in Filip Hric’s article, the rise of AI coding tools does not diminish the need for testers; it amplifies it.

Testers now act as the primary line of defense against production defects. They are tasked with:

  • Reviewing AI-generated outputs for alignment to requirements and standards
  • Identifying and resolving scalability and integration challenges
  • Ensuring that AI-coded apps are transformed into robust, production-ready solutions
  • Documenting gaps and addressing areas where AI tools fall short

Practical takeaways for testing AI-coded projects

Test thoroughly: Pay close attention to edge cases, scalability, and integration points.

Document continuously: Make sure documentation keeps pace with the application as it evolves; AI tools rarely produce any themselves.

Embrace adaptability: Prepare for unexpected debugging and optimization challenges, particularly for mission-critical systems.

AI tools represent a new frontier in software development, accelerating processes and tackling repetitive tasks. However, they are far from a one-size-fits-all solution. Ensuring quality, scalability, and seamless integration demands skilled teams and clearly defined processes.

For testers and quality engineers, this era is an opportunity to redefine our roles. By embracing the challenges of AI-driven development, we can position ourselves as indispensable collaborators, ensuring that the promise of AI-generated code translates into real-world success.

For more information

Rafaela Azevedo is a QA and SDET consultant and the founder of The Chaincademy, a tech career builder that assesses skills gaps, creates career roadmaps, and matches people with jobs. She is a TechWomen100 Award winner and a TAU instructor.