How I got started in testing AI-coded apps
Hey hey, Rafaela Azevedo here! For the past 17 years, I’ve worked as a software development engineer in test (SDET), specializing in QA, test automation, Web3, DevOps, and monitoring and alerting.
Efficiency has always been a core passion of mine, and that’s what drew me to test automation. It also led me to experiment with AI tools, and today many of the projects I work on are developed with them.
The reality of testing AI-coded apps: an experience report
AI development tools bring unique challenges to testing, particularly in ensuring code quality, security and scalability. Most people who code with AI tools discover that while the tools can generate impressive prototypes, the complexities of real-world systems still require significant human intervention. In my opinion, this has placed QAs and testers at the forefront, now more critical than ever.
When AI tools first appeared on the scene, they seemed like the ultimate solution for efficient development, capable of producing quality code at lightning speed. Tools like Lovable.dev, Replit, Manus, A0.dev, and Firebase Studio excel at rapid prototyping. But the code they generate is often lacking when it comes to context, scalability, and system integration.
Debugging AI code is painful.
Just as an example: one of our projects was to build a Stripe-like platform for cryptocurrency. The AI coding agent produced the front-end prototype in seconds, but it overlooked vital pieces such as the database and the entire backend. So we kept adding more detail to the prompt, hoping for more complete results. But even with more complex prompts, the resulting code still lacked authentication protocols, data mapping, and optimized performance under load.
If you are not extremely specific and direct, the AI coding agent will simply create whatever it thinks is best for you. It will mock data and produce a result as fast as possible, but it won't generate a good code structure. Many times it will simply dump all of the components and details into one huge file, leaving you with a large, complex refactoring task.
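To make that concrete, here is a minimal sketch of the single-file pattern I mean; the names and data are invented for illustration, not taken from our project.

```typescript
// Illustrative sketch only: names and data are invented.
// In the AI's output, everything below (plus the actual UI components,
// state handling, and styling) tended to live in one giant file.

type Transaction = { id: string; amount: number; currency: string };

// A "service" that quietly returns hard-coded data instead of calling an API.
async function fetchTransactions(): Promise<Transaction[]> {
  return [
    { id: "tx-1", amount: 100, currency: "BTC" },
    { id: "tx-2", amount: 250, currency: "ETH" },
  ];
}

// Presentation helpers, filters, and rendering logic then follow in the same
// file for hundreds of lines, which is what turns clean-up into a major refactor.
function formatAmount(tx: Transaction): string {
  return `${tx.amount} ${tx.currency}`;
}

export { fetchTransactions, formatAmount };
```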
Long story short: when you create your app, you can't give too much creative freedom to the AI coding agent.
In our case, the “finished” product required weeks of refinement to meet operational standards. Here are some of the issues we found in the code:
Integration complexities
AI-generated code frequently lacks context about existing systems. In one instance, we worked on an AI-built analytics dashboard that didn’t align with the startup’s data pipeline, leading to significant rework during integration.
Because of these integration problems, the product the AI coding agent built was messy, to the point where at times we wondered whether it would have been better to start from scratch and code it ourselves. We spent considerable time simply understanding what the AI was trying to build. Imagine a junior developer, eager and enthusiastic to get things done, but lacking planning skills, the ability to handle complex structures, and deep knowledge. That's what AI coding assistants are like.
Don't get me wrong: agents helped a lot, and I advocate for their use. But people need to be aware of the downsides and set their expectations accordingly.
Scalability issues
AI tools prioritize delivering quick results, often at the expense of scalability. For example, an AI-designed payment gateway integration failed to handle transaction spikes, requiring a complete rewrite. So you need to instruct the AI coding agent specifically to produce a scalable result.
Quality assurance and debugging
Testing AI-generated code starts out as pure black-box testing: you work from the user's perspective with no insight into the code. But you still have to understand why the code works (or doesn't), which is always a challenge, especially given the lack of documentation and the inconsistent adherence to best practices.
In our case, the unit and integration tests that the AI coding agent created were very basic, incomplete, and full of incorrect assumptions. This might have been partly because our prompts were not always as precise as they could have been.
End-to-end tests were created with Playwright. Again, if the prompt that generates the tests is not precise, the resulting tests are going to be quite shallow.
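To give a sense of the gap, here is a hedged illustration: the routes, labels, and copy below are invented, not from our real app, but the first test is about as deep as the generated ones went, while the second is closer to what we had to write ourselves.

```typescript
// Hedged illustration: routes, labels, and text are invented, not our real app.
import { test, expect } from '@playwright/test';

// Roughly the depth the agent tended to generate: the page loads, and that's it.
test('dashboard loads', async ({ page }) => {
  await page.goto('/dashboard');
  await expect(page).toHaveTitle(/Dashboard/);
});

// Closer to what we actually needed: drive a real flow and assert on state
// the business cares about, not just that a page rendered.
test('a new payment shows up in the transaction history', async ({ page }) => {
  await page.goto('/payments/new');
  await page.getByLabel('Amount').fill('42.00');
  await page.getByRole('button', { name: 'Pay' }).click();
  await expect(page.getByRole('status')).toHaveText(/confirmed/i);

  await page.goto('/transactions');
  await expect(page.getByRole('row', { name: /42\.00/ })).toBeVisible();
});
```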
And running the tests in our GitHub workflow was far from easy. The AI agent did not handle basic things like keeping package-lock.json in sync with package.json. We wound up having to copy and paste the tests and send messages to Lovable, the company behind the tool, so they could fix the problems. Having a technical person on the team makes a big difference here, because it is faster and easier to just change the code directly.
Hard-coded mock data surprises
As you can see from the code screenshot below, the AI coding agent generated the front end first. The back end and its functionality were then created in response to the prompts that followed.
The problem is that much of the code gets hard-coded first and the test data is mocked afterwards. The mock data is wired into the functions so tightly that it's hard to tell what has been mocked and what hasn't. So when we tested the app, we had to proactively check which data was mocked, and that took a long time.
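To show the shape of the problem, here is a minimal sketch assuming a hypothetical payments module; none of these names come from our actual codebase.

```typescript
// Minimal sketch, names invented: the pattern that made testing slow.
type Payment = { id: string; amount: number; status: 'paid' | 'pending' };

// From the call site this looks like a real data fetch, but it is a hard-coded
// mock baked into the function itself: nothing tells the tester the data isn't real.
export async function getPayments(): Promise<Payment[]> {
  return [
    { id: 'pay_001', amount: 120, status: 'paid' },
    { id: 'pay_002', amount: 75, status: 'pending' },
  ];
}

// The structure we pushed towards instead: the real call is explicit, and any
// mock lives somewhere clearly labelled (a test fixture or a dev-only module).
export async function getPaymentsFromApi(baseUrl: string): Promise<Payment[]> {
  const res = await fetch(`${baseUrl}/payments`);
  if (!res.ok) throw new Error(`Failed to load payments: ${res.status}`);
  return (await res.json()) as Payment[];
}
```

Separating the mock out like this at least makes it obvious to a tester what is real and what is stubbed.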
Don't rely on the tests your AI tools produce: human testers are essential
In this era of AI-generated code, the role of testers and quality engineers has become more important than ever. AI tools produce code quickly, but ensuring the code's quality and reliability requires human expertise. AI coding agents make many assumptions, many of them incorrect, and usually produce only the simplest of tests. As noted in Filip Hric's article, the rise of AI coding tools does not diminish the need for testers; it amplifies it.
Testers now act as the primary line of defense against production defects. They are tasked with:
- Reviewing AI-generated outputs for alignment with requirements and standards
- Identifying and resolving scalability and integration challenges
- Ensuring that AI-coded apps are transformed into robust, production-ready solutions
- Documenting gaps and addressing areas where AI tools fall short
Practical takeaways for testing AI-coded projects
- Test thoroughly: Pay close attention to edge cases, scalability, and integration points.
- Document continuously: Make sure the documentation stays up to date with what has actually been built, since the AI tools rarely produce it themselves.
- Embrace adaptability: Prepare for unexpected debugging and optimization challenges, particularly for mission-critical systems.
AI tools represent a new frontier in software development, accelerating processes and tackling repetitive tasks. However, they are far from a one-size-fits-all solution. Ensuring quality, scalability, and seamless integration demands skilled teams and clearly defined processes.
For testers and quality engineers, this era is an opportunity to redefine our roles. By embracing the challenges of AI-driven development, we can position ourselves as indispensable collaborators, ensuring that the promise of AI-generated code translates into real-world success.
For more information
- Are software testers using AI for test planning?, Aj Wilson
- An exploration of 10 AI software testing tools, Aniket Kulkarni
- Is test automation a hard skill that pushes out the soft?, Ady Stokes