Introduction
The role of the tester has never been static! From the personal touch of manual verification to automated regression, Quality Assurance (QA), and now Quality Engineering, software testing has evolved alongside the software industry's transformations. Yet the rise of Artificial Intelligence (AI), Machine Learning (ML) and self-adaptive systems introduces a fundamentally different challenge for software testing. Self-adaptive systems are structured to change their behaviour while running, in response to changes in their environment or within the system itself. They are systems that learn, decide and evolve autonomously. Testing them is no longer just about verifying static requirements. We have to think about the bigger picture: dynamic reliability, ethical integrity and, above all, ongoing client and customer trust.
Why does traditional testing fall short?
Traditional testing, even when automated, is fundamentally deterministic. We operate on the assumption that a system will behave in a certain way and that a user will operate it in a certain way. Testers design cases to confirm that a system produces the expected outputs, measure coverage and ensure compliance with functional requirements.
This model works well for static, rule-based systems, such as banking transaction software, e-commerce checkout flows, or RESTful APIs, where the logic is well-defined and repeatable. However, a new generation of software, such as AI-driven, self-adaptive and context-sensitive systems, no longer follows these predictable patterns.
The nature of probabilistic behaviour
Probabilistic behaviour refers to a system's behaviour being governed by probability rather than being fully deterministic. In such systems, the same input or situation may lead to different outcomes, each with a certain likelihood. AI systems, particularly those based on ML, operate in this probabilistic manner. Instead of executing predefined logic, they infer outcomes from patterns learnt from data.
- A chatbot trained on natural language models might often provide a different response to the same query, influenced by previous context or by randomisation in language generation.
- A recommendation engine recalibrates its results continuously in response to shifting user behaviour and external trends.
- An autonomous vehicle perception system might interpret identical sensor data differently depending on environmental data or prior model adjustments.
In such cases, there may not be a single “correct” output but a range of acceptable behaviours that meet confidence thresholds. This variability essentially challenges the logic of pass/fail testing.
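To make this concrete, here is a minimal sketch (in Python) of how a pass/fail assertion might be reframed as a statistical acceptance check over repeated runs. The recommendation function, the notion of "relevant items" and the 90% threshold are hypothetical assumptions for illustration, not a prescribed standard.

```python
# A minimal sketch: accept a *range* of behaviours instead of one exact output.
# `get_recommendations` is a hypothetical stand-in for a non-deterministic system.
import random

def get_recommendations(user_id: int, seed: int) -> list:
    """Stand-in for a probabilistic recommendation call."""
    random.seed(seed)
    catalogue = ["laptop", "phone", "headphones", "monitor", "keyboard"]
    return random.sample(catalogue, k=3)

def test_recommendations_meet_relevance_threshold():
    # Run the system repeatedly and check an agreed quality measure,
    # rather than asserting a single "correct" answer.
    relevant_items = {"laptop", "phone", "headphones"}
    runs = 100
    hits = sum(
        1 for seed in range(runs)
        if relevant_items & set(get_recommendations(user_id=42, seed=seed))
    )
    hit_rate = hits / runs
    assert hit_rate >= 0.9, f"Relevance rate {hit_rate:.2f} below agreed threshold"
```

The pass condition becomes a confidence threshold agreed with stakeholders, which is exactly the shift away from deterministic pass/fail logic described above.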
Data Dependency and Model Drift
Unlike deterministic systems whose behaviour is embedded in the code, AI behaviour emerges from data. Testing must therefore validate not just the code that has been written but also the quality of the data and its inherent bias.
- If the data distribution changes (for example, due to shifting user demographics or seasonal behaviour), the system's performance may degrade without any code changes.
- Models retrained on incomplete or skewed datasets may unintentionally alter the system's fairness or accuracy.
Traditional test automation rarely monitors data drift, leaving organisations unaware of subtle or cumulative quality degradation.
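As an illustration of what such monitoring could look like, here is a minimal drift-check sketch using a two-sample Kolmogorov-Smirnov test from SciPy. The feature name, sample data and p-value threshold are assumptions made purely for the example.

```python
# A minimal sketch of a data-drift check between a baseline (release-time)
# sample of a feature and values observed later in production.
import numpy as np
from scipy.stats import ks_2samp

def check_feature_drift(baseline: np.ndarray, live: np.ndarray,
                        p_threshold: float = 0.01) -> bool:
    """Return True if the live distribution has drifted from the baseline."""
    statistic, p_value = ks_2samp(baseline, live)
    return p_value < p_threshold

# Illustrative data: the hypothetical 'transaction_amount' feature has shifted.
baseline_amounts = np.random.normal(loc=50, scale=10, size=5_000)
live_amounts = np.random.normal(loc=65, scale=12, size=5_000)

if check_feature_drift(baseline_amounts, live_amounts):
    print("Drift detected: schedule model re-validation and fairness re-checks")
```

A check like this can run on a schedule alongside traditional regression suites, flagging quality degradation that no code change would ever reveal.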
The limits of scripted automation
Even the most robust automation frameworks often struggle with adaptive systems! Scripted tests are fragile when the UI or even the APIs evolve frequently. In AI systems where behaviour may change with each retraining cycle, maintaining a static set of test scripts becomes unsustainable.
To solve this, testers must build more adaptive, self-learning validation frameworks that can respond to uncertainty. This is a major shift from static regression testing to continuous, intelligent regression testing.
Evaluating explainability, not just accuracy
Another limitation of traditional testing is its narrow focus on output correctness. For AI systems, correctness alone is not enough: testers must also evaluate explainability, that is, whether the system can justify its decisions. For example, a credit-scoring model may produce accurate predictions, but if its reasoning cannot be explained it may fail compliance audits or ethical standards, even when it is technically right.
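One way a tester might probe explainability is sketched below using scikit-learn's permutation importance. The credit-scoring model, feature names and the list of "allowed drivers" are hypothetical assumptions; the idea is simply to check that the strongest influence on decisions is a feature the business can justify.

```python
# A minimal sketch of an explainability check: verify that the model's top
# decision driver is a feature the organisation is allowed to rely on.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

feature_names = ["income", "debt_ratio", "age", "postcode_cluster"]
X = np.random.rand(500, len(feature_names))
y = (X[:, 0] - X[:, 1] > 0).astype(int)      # synthetic labels for illustration

model = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

top_feature = feature_names[int(np.argmax(result.importances_mean))]
allowed_drivers = {"income", "debt_ratio"}    # assumption: agreed with compliance
assert top_feature in allowed_drivers, f"Unexplained decision driver: {top_feature}"
```

A failing assertion here does not mean the predictions are wrong; it means the model cannot be justified, which is a different and equally serious quality problem.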
Complexity and emergent behaviour
AI-driven systems often exhibit emergent properties, such as unexpected behaviours arising from the interactions among multiple models or agents. Testing such emergent complexity requires scenario simulation, behavioural analytics and stress testing across all interacting components.
In summary, traditional testing assumes stability, whereas AI systems embrace evolution. Software testing’s job is no longer just to validate outcomes but to assess confidence, interpret variability, and safeguard trust in both the system and the AI's behaviour.
The emerging challenge
As AI transitions from a supporting technology to an operational decision maker, the quality conversation expands from technical reliability to include ethical and social accountability.
The critical question moves from “does it work as designed?” to “can we trust it to act responsibly?”.
Redefining quality in the age of AI: The AI FART Model
Historically, when we say the word “Quality”, we mean meeting both functional requirements and quality characteristics. Now, in the AI world, we need a broader view of quality that includes human and societal impacts. This broader view is captured in the AI FART model:
- Fairness: The system must not discriminate or disadvantage people
- Accountability: There must be clarity on who is responsible for the AI outcomes
- Resilience: Systems must adapt safely to changing data and context
- Transparency: Decisions should be explainable and traceable
This much wider definition of quality demands new metrics, including fairness indices, confidence intervals, and interpretability scores, alongside traditional KPIs such as accuracy, defect density, and pass/fail rates. These additional measures help us assess whether our AI systems are living up to the AI FART model in practice.
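As a small example of what a fairness index might look like in practice, the sketch below computes a demographic parity gap between two groups. The group labels, sample predictions and the 0.2 threshold are illustrative assumptions, not a recommended compliance bar.

```python
# A minimal sketch of a fairness metric: the gap in favourable-outcome rates
# between two groups (demographic parity difference).
import numpy as np

def demographic_parity_difference(predictions: np.ndarray,
                                  groups: np.ndarray) -> float:
    """Absolute gap in positive-outcome rates between groups A and B."""
    rate_a = predictions[groups == "A"].mean()
    rate_b = predictions[groups == "B"].mean()
    return abs(rate_a - rate_b)

preds = np.array([1, 0, 1, 0, 1, 1, 0, 0])          # 1 = favourable outcome
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

gap = demographic_parity_difference(preds, groups)
print(f"Demographic parity gap: {gap:.2f}")
assert gap <= 0.2, "Fairness threshold breached: investigate before release"
```

Metrics like this sit alongside accuracy and defect density in the quality dashboard, turning the Fairness pillar of AI FART into something measurable.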
The trust gap
AI systems often make decisions faster and at a scale that humans can’t match. However, when they fail, the consequences can be severe and far-reaching. Biased hiring algorithms, self-driving car accidents, and misdiagnosed medical conditions are among the most visible examples.
Software testing has become the guardian of trust, ensuring that automation doesn’t remove accountability. Building this trust requires combining quantitative validation (accuracy, robustness, and regulatory compliance) with qualitative assurance (ethics and user perception) and continually asking whether Fairness, Accountability, Resilience and Transparency are being upheld.
Blending technical and ethical assurance
The new generation of software testing professionals must operate at the intersection of three areas:
- Technical Assurance: Understanding ML pipelines, testing AI explainability and verifying data integrity
- Ethical Governance: Applying frameworks such as ISO standards and IEEE 7000 to guide responsible testing
- AI Literacy: Being able to interpret model outputs, understand bias mechanisms and engage with data meaningfully
This fusion transforms software testing from a downstream activity into a strategic governance function, one that not only detects bugs but shapes organisational ethics and risk culture around fairness, accountability, resilience and transparency.
Continuous reasoning over static execution
AI systems continue to develop after deployment. They retrain, readapt and sometimes self-optimise in real time. This means testing can't end at release. It becomes a continuous loop, monitoring live data, recalibrating benchmarks and validating how the system learns.
Preparing for the next phase
In the coming years, we will find that software testing becomes less about tools and more about judgment.
The software testing of the future must:
- Question whether system behaviour aligns with societal values
- Collaborate with data and ethics teams to define risk thresholds
- Audit autonomous decisions for fairness and transparency
- Interpret AI-driven decisions, not just verify them
The professionals who succeed will not just find bugs but also build trust in intelligent systems.
The rise of autonomous testing agents?
Autonomous testing agents represent the most advanced form of AI-augmented software testing. Unlike rule-based automation frameworks, these agents leverage ML, natural language processing (NLP), and reinforcement learning to independently create, maintain, and adapt tests.
Capabilities and mechanisms
- Self-healing test automation: When a UI element or API changes, the agent dynamically updates its locators and parameters using pattern recognition
- Autonomous test generation: Agents use application models, code coverage metrics or behaviour-driven patterns to generate test cases without human scripting
- Intelligent prioritisation: ML models analyse past defect trends and runtime data to focus on high-risk areas
- Continuous learning: Through reinforcement feedback loops, the system learns from past executions to improve efficiency and accuracy
- Predictive analytics: Agents forecast potential failure points or regression hotspots before they occur
These capabilities extend testing beyond automation and move towards autonomy, enabling the system to operate, adapt, and optimise independently within predefined ethical and operational constraints.
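To give a flavour of the self-healing idea mentioned above, here is a minimal, framework-agnostic Python sketch of a locator lookup that falls back through alternative strategies when the primary one breaks. The "DOM", locator strings and driver call are stand-ins; real tools combine this with pattern recognition over the application model.

```python
# A minimal sketch of self-healing locators: try the preferred locator, fall
# back through alternatives, and record any repair for the suite to learn from.
from typing import Optional

# Pretend DOM after a UI change: the old id is gone, but alternatives still work.
CURRENT_DOM = {"css:#checkout-button-v2": "checkout", "text:Checkout": "checkout"}

def find_element(locator: str) -> Optional[str]:
    """Stand-in for a UI driver lookup; returns None when nothing matches."""
    return CURRENT_DOM.get(locator)

def self_healing_find(preferred: str, fallbacks: list) -> Optional[str]:
    element = find_element(preferred)
    if element is not None:
        return element
    for candidate in fallbacks:
        element = find_element(candidate)
        if element is not None:
            # Record the healed locator so the test assets can update themselves.
            print(f"Healed locator: '{preferred}' -> '{candidate}'")
            return element
    return None

# The scripted locator broke after the UI change, but the lookup recovers.
button = self_healing_find("css:#checkout-button",
                           fallbacks=["css:#checkout-button-v2", "text:Checkout"])
assert button == "checkout"
```

Production-grade agents apply the same pattern with learned similarity scores rather than a hand-written fallback list, but the governance question is identical: every "heal" is a decision that someone must be able to review.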
Benefits and real-world impact
Autonomous testing accelerates development cycles and strengthens reliability in continuous delivery environments.
- Efficiency: Testing coverage increases without taking time away from the sprint and with minimal human effort
- Resilience: Systems adapt to change without constant script maintenance
- Scalability: Large-scale and distributed testing becomes more feasible
- Cost reduction: Long-term maintenance and regression costs decline
Emerging implementations
- AI-powered frameworks: Tools like Mabl and Testim already employ ML to generate and self-heal tests
- Model-based testing with AI: Systems such as Diffblue Cover (for Java) use AI to automatically create unit tests from code
- Autonomous agents in DevOps: AI-driven platforms that integrate directly into CI/CD pipelines and perform tests in real time, whilst adjusting coverage based on release frequency
Alongside all the positive points mentioned in this section, these agents also risk introducing new complexities, such as opaque decision-making, ethical risks, and a growing dependence on systems that testers cannot fully explain.
The ethical dimension of AI in software testing?
Testing at its core is about trust. But trust in AI cannot be assumed. It must be engineered, audited and governed.
Bias and fairness
AI systems learn from data that often encodes human prejudice. In testing, this can lead to:
- Biased datasets: Automated validation might ignore edge cases involving underrepresented user groups
- Discriminatory logic: Algorithms may replicate societal inequities (e.g. lending decisions or hiring filters)
- Confirmation bias in test selection: AI agents could favour scenarios similar to previous successful ones, which in turn reduces the coverage diversity
An ethical software testing framework requires testers to act as bias detectors by using data-sampling audits, fairness metrics, and adversarial testing to uncover hidden inequities.
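Adversarial testing for bias can be as simple as a counterfactual probe: flip only the protected attribute and check that the decision does not change. The toy lending rule below is a hypothetical stand-in for a deployed model, used purely to show the shape of such a test.

```python
# A minimal sketch of a counterfactual bias probe.
def loan_model(features: dict) -> str:
    """Stand-in scoring rule; a real test would call the deployed model."""
    score = features["income"] / max(features["debt"], 1)
    return "approve" if score > 2 else "decline"

def counterfactual_flip_changes_outcome(features: dict, protected_key: str,
                                         alternative_value: str) -> bool:
    """Return True if flipping the protected attribute changes the decision."""
    original = loan_model(features)
    flipped = dict(features, **{protected_key: alternative_value})
    return loan_model(flipped) != original

applicant = {"income": 52_000, "debt": 20_000, "gender": "female"}
assert not counterfactual_flip_changes_outcome(applicant, "gender", "male"), \
    "Decision depends on a protected attribute"
```

Run across a representative sample of records, probes like this expose discriminatory logic that accuracy metrics alone would never surface.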
Transparency and explainability
“Black box” AI testing presents a critical problem: when an autonomous testing agent decides to skip, fail or prioritise a test, the reasoning is often opaque.
Software testing must then push for explainable AI techniques, such as visualisations, traceability logs and model introspection, that make decision-making transparent. This transparency is not optional; it is the foundation of accountability.
Privacy and data ethics
Test data often mirrors production environments, and in an AI context, this means personal or sensitive information might inadvertently train or inform models. Ethical testing demands that:
- Data anonymisation is applied consistently (see the sketch after this list)
- Synthetic data is generated for privacy-preserving tests
- Clear data retention policies are aligned with GDPR and ISO 27001
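A minimal sketch of the first two points follows, assuming a simple record structure and a per-environment salt; real pipelines would use vetted anonymisation and synthetic-data tooling rather than this illustration.

```python
# A minimal sketch of privacy-preserving test data: deterministic
# pseudonymisation of direct identifiers plus synthetic quasi-identifiers.
import hashlib
import random

def pseudonymise(value: str, salt: str) -> str:
    """Replace an identifier with a stable, non-reversible token."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

def anonymise_record(record: dict, salt: str) -> dict:
    safe = dict(record)
    safe["email"] = pseudonymise(record["email"], salt)
    safe["name"] = f"user_{pseudonymise(record['name'], salt)[:6]}"
    # Replace a quasi-identifier with a synthetic value in a realistic range.
    safe["age"] = random.randint(18, 80)
    return safe

production_record = {"name": "Jane Doe", "email": "jane@example.com", "age": 34}
print(anonymise_record(production_record, salt="rotate-me-per-environment"))
```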
The ethical tester’s mandate
Ethical software testing goes beyond compliance. It represents a commitment to societal responsibility and, in turn, ensures that technology enhances rather than exploits human welfare. Future software testers must be fluent in ethical frameworks (e.g. IEEE 7000, EU AI Act) and able to operationalise them in testing pipelines.
Human oversight: Why machines still need us?
Trust but verify
Autonomous agents excel at pattern recognition but lack the moral reasoning that humans bring to the table. They can’t understand why a bug matters, only that a pattern deviated. Human oversight provides the interpretive layer that converts data anomalies into actionable insight.
The human cognitive edge
Human testers bring a wealth of talents and expertise: intuition, empathy and contextual reasoning that no algorithm can replicate.
- A human recognises when an error impacts accessibility for visually impaired users
- A human senses reputational risks beyond metrics
- A human questions whether “passing” AI behaviour is ethically acceptable
The oversight models
Modern software testing strategies should employ what some people may call a Human-in-the-loop (HITL) or Human-on-the-loop (HOTL) model:
- HITL: Humans actively guide or correct AI actions during testing
- HOTL: Humans supervise at a strategic level and intervene only when anomalies appear
This oversight ensures a balance between automation efficiency and human discernment, guided by an ethical compass.
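A minimal sketch of how a HITL gate could be wired into a pipeline: verdicts from a testing agent are auto-accepted only above a confidence threshold, and everything else is queued for a human tester. The threshold, data structure and review queue are illustrative assumptions.

```python
# A minimal sketch of a human-in-the-loop confidence gate for agent verdicts.
from dataclasses import dataclass

@dataclass
class AgentVerdict:
    test_name: str
    verdict: str        # "pass" or "fail" proposed by the testing agent
    confidence: float   # 0.0 - 1.0, reported by the agent

REVIEW_THRESHOLD = 0.85           # assumption: agreed with the testing team
human_review_queue = []

def triage(verdict: AgentVerdict) -> str:
    if verdict.confidence >= REVIEW_THRESHOLD:
        return f"{verdict.test_name}: auto-accepted ({verdict.verdict})"
    human_review_queue.append(verdict)        # HITL: a person makes the call
    return f"{verdict.test_name}: routed to human review"

print(triage(AgentVerdict("checkout_flow", "pass", 0.97)))
print(triage(AgentVerdict("accessibility_contrast", "fail", 0.55)))
```

A HOTL variant would auto-accept everything and only alert humans when anomaly rates drift, but the gate itself is where automation efficiency meets human discernment.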
Evolving roles and responsibilities in software testing?
From executors to orchestrators
The job of software testing is shifting from executing tests to designing intelligent testing environments. Software testing leaders will manage networks of testing agents, interpret results and align outcomes with the business strategy. Testing then becomes an act of orchestration, coordinating team members, tools, and data flows toward measurable quality outcomes.
New role archetypes
- AI Trainer: Feeds and calibrates testing agents with the relevant data
- AI Curator: Validates model outputs by ensuring alignment with goals and expected outcomes
- Ethics Champion: Embeds fairness and transparency requirements into acceptance criteria
- AI Auditor: Evaluates AI-driven software testing pipelines for compliance and accountability
- Data Steward: Oversees data governance and synthetic data generation for ethical testing
The evolution of skills in an AI context
The table below explains how traditional software testing skills evolve in the context of AI. The intention is not to suggest that future-oriented skills directly replace traditional ones. Many of these traditional skills can be built on, extended or recontextualised by amending or updating existing testing practices to address the unique characteristics of AI-based systems, such as the uncertainty, learning behaviour and ethical concerns that AI brings.
For example, test case design remains an important part of the process, but in AI systems it can increasingly be completed by an AI tool. Testers therefore need the skills to undertake model validation, which assesses the model's performance, bias, robustness and generalisation rather than fixed input/output correctness. Similarly, automation scripting evolves into configuring and evaluating learning processes, such as reinforcement learning, rather than scripting deterministic test flows.
This table highlights a shift in emphasis from testing predictable, rule-based software to testing systems that adapt, learn, and operate under uncertainty. Traditional skills are therefore not discarded, but augmented with new competencies that reflect the demands of AI-driven systems.
| Traditional software testing skill | Future-oriented skill |
| --- | --- |
| Test case design | AI model validation |
| Automation scripting | Reinforcement learning configuration |
| Bug reporting | Ethical impact assessment |
| Regression testing | Continuous risk evaluation |
| Tool expertise | AI literacy and governance |
Organisational integration
Future software testing teams will collaborate across multiple domains:
- With developers: embedding testing earlier through AI-driven shift left practices
- With AI engineers: co-designing interpretable and fair models
- With compliance teams: ensuring legal and ethical conformity
- With UX and product teams: connecting quality to user experience
Mapping the software testing horizon: Future scenarios
Fully autonomous pipelines
In this vision, AI-driven agents continuously generate, execute and evaluate tests, reporting to the software testing team for validation. Systems learn from real-time findings, enabling autonomic quality control in DevOps environments, an evolution parallel to self-driving vehicles.
Key risks:
- Over-reliance on machine decisions
- Lack of explainability in automated go / no-go gates
- Drift between organisational ethics and AI behaviour
Hybrid human-AI collaboration
A more balanced future sees the software testing team and AI working in sync. AI handles scalability, coverage and optimisation, whilst software testing focuses on ethics, creativity and strategic direction. This hybrid model is the most practical near-term scenario and aligns with current business adoption methods.
Key risks:
- Over-reliance on AI outputs
- Skill gaps and misalignment
- Decision accountability
- Cultural resistance
- Data dependency
Regulation-driven software testing
As AI testing becomes integral to critical infrastructure, governments and industry bodies will begin to structure standards more formally around AI assurance, auditability, and ethical compliance. Software testing will become a more regulated discipline, emphasising certification, traceability and ethical conformance. Software testing teams will need to demonstrate not only technical competence but also adherence to emerging frameworks such as ISO/IEC 42001:2023 (AI management systems) and the EU AI Act.
Key risks:
- Regulatory fragmentation
- Compliance over innovation
- Audit fatigue
- Evolving standards
- Accountability gaps
The tester as an AI Auditor
In this scenario, the software testing team member evolves into an AI auditor who specialises in reviewing, validating and certifying the quality decisions made by autonomous systems. This role demands multidisciplinary expertise spanning ML, data governance, ethics and risk analytics. Software testing in this model serves as the final line of defence, ensuring that AI-driven decisions remain transparent, explainable, and aligned with values. In many ways, the AI auditor becomes the guardian of the AI FART model across the organisation.
Key risks:
- Knowledge barriers
- Tooling limitations
- Bias blind spots
- Conflicts of interest
- Liability exposure
To sum up
The evolution of testing toward autonomy is not just a technical revolution. It is a philosophical one. As AI-driven systems take on greater responsibility for quality, the definition of “testing” itself expands from verifying code correctness to ensuring the integrity of intelligent systems.
Autonomous agents will bring unprecedented precision, scalability and adaptability. Yet, they will also magnify the ethical, social and governance implications of technology. Human oversight will remain indispensable, not because machines are incapable, but because quality is ultimately a human value, rooted in judgement, empathy and responsibility.
The future of testing will belong to those who embrace this duality and master the science of AI while also championing the art of ethics. In doing so, testers will transcend their traditional role, becoming not just guardians of software quality but stewards of digital trust in an AI world. Simple frameworks like the AI FART model can help keep that trust work concrete and focused.
What do YOU think?
Got comments or thoughts? Share them in the comments box below. If you like, use the ideas below as starting points for reflection and discussion.
Questions to discuss
- How is your team adapting software testing practices for AI-driven or self-adaptive systems?
- Have you implemented autonomous testing agents in your pipeline? What challenges or benefits have you observed?
- How do you currently ensure ethical oversight, fairness and transparency in your testing processes? Do you use any simple quality models, such as the AI FART model, to support those conversations?
- In hybrid human-AI software testing models, how do you balance automation efficiency with human judgment?
- What skills do you think software testing professionals need to thrive in a world of AI-driven testing?
Actions to take
- Educate yourself on AI-driven and self-adaptive systems
- Conduct a proof of concept with AI tooling
- Capture your thoughts and share with others
References:
- EU AI Act: first regulation on artificial intelligence - European Parliament
- Prompting for testers course by Rahul Parwal
- Advanced prompting for testers course by Rahul Parwal
- Using personal data in test safely: How to comply with the GDPR article by Ioan Solderea
- ISO/IEC 42001:2023 (AI management systems) - ISO