Tanvi Mittal
QA lead
Open to: writing, mentoring, speaking, podcasting, and reviewing conference proposals
Tanvi Mittal is an AI quality engineer and test automation leader specializing in testing LLM and agentic systems. She is a WCSC 2026 keynote speaker, IEEE Senior Member, and community mentor.
Achievements
Activity
achieved:
This badge is awarded to members who Log in to MoT five days in a row.
achieved:
This badge is awarded to members who contribute a new term or an alternative definition to the software testing glossary.
earned:
Glossary
earned:
Beyond test coverage: engineering trust in AI-powered systems
Contributions
Learn how test coverage is a necessary condition for shipping AI, but not a sufficient condition for trusting it.
AGENT-F Taxonomy A classification framework of 19 agent failure modes developed by Tanvi Mittal, used to build targeted test suites for AI agent systems. Failure modes include Authority Overreach, Context Hallucination, Tool Misuse, and Cascading Failure, among others.
Adverse Action Explanation A legally required notice provided to a consumer when a credit application is denied, explaining the reasons for the decision. Under CFPB guidelines, AI-generated adverse action explanations must be accurate, non-discriminatory, and human-interpretable.
Adversarial Input A deliberately crafted input designed to manipulate, deceive, or exploit an AI system, causing it to behave in unintended or harmful ways. Distinct from standard edge cases, which assume a cooperative user.
Authority Boundary The explicitly defined set of actions an AI agent is authorized to take within a given context, user session, or regulatory scope. Actions taken outside this boundary constitute Authority Overreach.
Authority Overreach An AGENT-F failure mode in which an AI agent takes actions beyond its defined authorization scope. For example, the agent executes a transaction when it was authorized only to initiate a request for one.
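A minimal sketch of how an authority boundary might be checked against an agent's action trace. The `AuthorityBoundary` class, the `find_overreach` helper, and the action names are hypothetical and used only for illustration; they are not part of the AGENT-F Taxonomy or any published tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorityBoundary:
    """Hypothetical container for the actions an agent may take in one context."""
    allowed_actions: frozenset

    def permits(self, action: str) -> bool:
        return action in self.allowed_actions


def find_overreach(boundary, attempted_actions):
    """Return the attempted actions that fall outside the boundary, in order."""
    return [a for a in attempted_actions if not boundary.permits(a)]


# Assumed scope: the agent may read a balance and initiate a request, nothing more.
boundary = AuthorityBoundary(frozenset({"read_balance", "initiate_transfer_request"}))

# Assumed action trace recorded from one agent session.
trace = ["read_balance", "initiate_transfer_request", "execute_transfer"]

overreach = find_overreach(boundary, trace)
print(overreach)  # ['execute_transfer']
```

A test built this way fails on any non-empty `overreach` list, which keeps the check independent of whether the out-of-scope call happened to succeed.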
Benchmark Evaluation A structured assessment of an AI model's accuracy using a defined set of test inputs with known correct outputs. Used to measure the Accuracy dimension of the SAFE-R rubric.
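As a rough illustration of a benchmark evaluation, the sketch below scores a model by exact match against known answers. The `stub_model` function and the test cases are invented stand-ins; a real evaluation would call the system under test and may need a looser comparison than exact string equality.

```python
def benchmark_accuracy(model, cases):
    """Fraction of cases where the model output exactly matches the known answer."""
    if not cases:
        raise ValueError("benchmark set is empty")
    correct = sum(1 for prompt, expected in cases if model(prompt) == expected)
    return correct / len(cases)


def stub_model(prompt):
    """Hypothetical stand-in for a real model call; wrong on one case, silent on another."""
    answers = {"2+2": "4", "capital of France": "Paris", "3*3": "6"}
    return answers.get(prompt, "")


cases = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("3*3", "9"),
    ("10/2", "5"),
]

accuracy = benchmark_accuracy(stub_model, cases)
print(accuracy)  # 0.5
```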
Cascading Failure An AGENT-F failure mode in which an error in one agent's output is passed as trusted input to the next agent in a pipeline, compounding silently through multiple processing stages without any individual node appearing to fail.
CFPB (Consumer Financial Protection Bureau) A U.S. federal regulatory agency that oversees consumer financial products and services. Relevant to AI testing in its requirements around fair, accurate, and interpretable adverse action explanations for AI-driven credit decisions.
Context Coherence A PACE dimension that validates whether an AI agent maintains an accurate understanding of the conversation state across a multi-turn session, verifying it does not hallucinate prior context or forget established facts.
Demographic Slicing An analysis technique used in AI fairness testing that segments model outputs by demographic variables such as geography, age, or protected class indicators, to identify disparities in output quality or framing across population groups.
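One way to sketch demographic slicing is to group evaluation records by a demographic field and average a per-record quality flag within each group. The field names (`region`, `correct`) and the records are made up for the example.

```python
from collections import defaultdict


def slice_by(records, key, metric):
    """Group records by a demographic field and average a 0/1 metric per group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [metric sum, record count]
    for record in records:
        totals[record[key]][0] += record[metric]
        totals[record[key]][1] += 1
    return {group: s / n for group, (s, n) in totals.items()}


# Hypothetical evaluation records: 1 means the output was judged correct.
records = [
    {"region": "north", "correct": 1},
    {"region": "north", "correct": 1},
    {"region": "north", "correct": 0},
    {"region": "north", "correct": 1},
    {"region": "south", "correct": 1},
    {"region": "south", "correct": 0},
    {"region": "south", "correct": 0},
    {"region": "south", "correct": 0},
]

rates = slice_by(records, "region", "correct")
print(rates)  # {'north': 0.75, 'south': 0.25}
```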
Determinism Testing Testing that verifies whether an AI system produces consistent outputs across repeated runs with identical inputs. Used to measure output consistency within the SAFE-R rubric.
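A minimal sketch of a determinism check, assuming the system under test can be called as a plain function. Both `stable_system` and `flaky_system` are hypothetical stand-ins, and the exact-equality comparison is an assumption that suits short, structured outputs better than free text.

```python
import itertools


def is_deterministic(system, prompt, runs=5):
    """True if every repeated run with the same input returns the same output."""
    outputs = [system(prompt) for _ in range(runs)]
    return all(output == outputs[0] for output in outputs)


def stable_system(prompt):
    """Hypothetical system that always answers the same way."""
    return prompt.upper()


_calls = itertools.count()


def flaky_system(prompt):
    """Hypothetical system that returns a different answer on every third call."""
    return prompt.upper() if next(_calls) % 3 else prompt.lower()


stable_result = is_deterministic(stable_system, "approve")
flaky_result = is_deterministic(flaky_system, "approve")
print(stable_result, flaky_result)  # True False
```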
Direct Injection A PromptArmor attack category in which an attacker embeds malicious instructions directly in a user-facing input field, targeting the model's instruction-following behavior.
Disparity Analysis A quantitative comparison of AI system outputs across different user populations to identify statistically significant differences in accuracy, framing, or outcome. Used in Fairness dimension testing.
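To make "statistically significant differences" concrete, the sketch below applies a two-proportion z-test to accuracy counts from two groups. The choice of test is an assumption for illustration (the glossary does not name one), and the counts are invented.

```python
import math


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test; returns (z statistic, p-value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: correct outputs out of 200 evaluated requests per group.
z, p_value = two_proportion_z(180, 200, 150, 200)
significant = p_value < 0.05  # assumed significance level
print(round(z, 2), significant)
```

With many slices, the threshold would normally be adjusted for multiple comparisons; that step is left out here.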
Distributional Drift A failure mode in which an AI model's real-world performance degrades over time because the distribution of inputs it encounters in production has shifted away from the distribution it was trained or tested on.
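One commonly used way to quantify this kind of shift is the Population Stability Index (PSI), sketched below over fixed bins. PSI is a swapped-in technique chosen for illustration, not one the glossary names; the bin edges, the sample values, and the `eps` floor for empty bins are all assumptions.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples over fixed bin edges.

    Values outside the outermost edges are ignored; empty bins are floored
    at eps so the logarithm stays defined.
    """
    def shares(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(edges) - 1):
                is_last = i == len(edges) - 2
                if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                    counts[i] += 1
                    break
        total = sum(counts)
        if total == 0:
            raise ValueError("no values fall inside the bin edges")
        return [max(c / total, eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature: prompt length at test time versus in production.
baseline = [10, 20, 30, 40, 50, 60, 70, 80]
production = [60, 70, 80, 90, 85, 95, 65, 75]
edges = [0, 25, 50, 75, 100]

no_shift = psi(baseline, baseline, edges)
shift = psi(baseline, production, edges)
print(no_shift, shift > 0.25)  # 0.0 True
```

A PSI above roughly 0.25 is often treated as a sign of a substantial shift, though that cut-off is a rule of thumb rather than a standard.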
Emergent Behavior A failure mode in which an AI system exhibits unintended behavior that appears only at the intersection of multiple inputs or conditions, and so is not detectable through individual unit or integration tests.
Escalation Correctness A PACE dimension that validates whether an AI agent correctly identifies scenarios requiring human review and routes them appropriately, rather than attempting autonomous resolution beyond its authorized scope.
Extraction Attack A PromptArmor attack category in which the goal is to cause the AI agent to output internal information, such as its system prompt, context window contents, or proprietary logic, that it should not surface to end users.
Goal Hijacking A PromptArmor attack category in which an adversarial input redirects the AI agent's objective, shifting it from its intended task to one chosen by the attacker, such as data extraction or unauthorized action execution.
Indirect Injection A PromptArmor attack category in which the malicious instruction is embedded not in the direct user input but in content the agent retrieves or processes, such as a document, database record, or external webpage.
LogMiner-QA An open-source tool developed by Tanvi Mittal that generates privacy-preserving AI test cases from production logs. USPTO Provisional Patent #63/918,325. Available on GitHub under 77QAlab.
Model Risk Management (SR 11-7) A supervisory guidance document issued by the U.S. Federal Reserve that establishes standards for managing risks associated with the use of models in financial decision-making. Requires validation methodology that goes beyond functional correctness testing.
PACE Framework A framework developed by Tanvi Mittal for validating AI agent behavior across four dimensions: Prompt Integrity, Action Appropriateness, Context Coherence, and Escalation Correctness.
Prompt Injection An attack technique in which malicious instructions are embedded in input an AI agent processes, whether user-controlled fields or retrieved content, causing the agent to execute unintended actions or disclose unauthorized information. PromptArmor's six-category taxonomy classifies the major prompt injection attack types.
Prompt Integrity A PACE dimension that validates whether an AI agent's system prompt and operating instructions remain intact and unmodified under adversarial input conditions.
PromptArmor An open-source adversarial LLM testing library developed by Tanvi Mittal, aligned to the OWASP LLM Top 10 and MITRE ATLAS. Built around a six-category prompt injection attack taxonomy. Available as a VS Code extension and GitHub repository under 77QAlab.
Reg E (Electronic Fund Transfer Act) U.S. federal regulation governing electronic fund transfers, including consumer dispute resolution rights. Requires human accountability in the dispute resolution process for certain dispute types and thresholds, which makes it directly relevant to agentic AI system scope definition.
Regression Testing A testing approach that re-executes existing test cases after system changes to verify that previously working functionality has not been broken. Relies on deterministic output assumptions that often do not hold for AI systems.
Role Hijacking A PromptArmor attack category in which an adversarial input attempts to override the AI agent's assigned role or identity, causing it to abandon its defined behavior and adopt one chosen by the attacker.
SAFE-R Rubric A structured AI output evaluation framework developed by Tanvi Mittal, scoring outputs across five dimensions: Safety, Accuracy, Fairness, Explainability, and Reliability. Designed to operationalize trust measurement as a gateable engineering artifact. Published on Zenodo, DOI: 10.5281/zenodo.18972360.
Test Coverage A metric expressing the proportion of a system's specified behaviors that are exercised by a test suite. High test coverage indicates breadth of scenario testing but does not validate system behavior in unspecified or adversarial conditions.
Tool Misuse An AGENT-F failure mode in which an AI agent calls the wrong tool for a given scenario, or calls the correct tool with incorrect parameters, producing technically successful API calls that generate incorrect or harmful downstream outcomes.
Trust Score A composite measurement of an AI system's trustworthiness across the five SAFE-R dimensions. Used to establish release thresholds: minimum acceptable scores below which a system does not proceed to production.
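A sketch of how a trust score could act as a release gate. The glossary does not say how the composite is calculated, so the unweighted mean, the 0 to 1 scale, and both thresholds below are assumptions, as are the function names and the sample scores.

```python
DIMENSIONS = ("safety", "accuracy", "fairness", "explainability", "reliability")


def trust_score(scores):
    """Unweighted mean of the five dimension scores (assumed aggregation)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


def release_gate(scores, min_composite=0.8, min_dimension=0.7):
    """Return (passed, composite, failing dimensions) under assumed thresholds."""
    composite = trust_score(scores)
    failing = [d for d in DIMENSIONS if scores[d] < min_dimension]
    passed = composite >= min_composite and not failing
    return passed, composite, failing


# Hypothetical scores for one release candidate, each on a 0 to 1 scale.
candidate = {
    "safety": 0.95,
    "accuracy": 0.90,
    "fairness": 0.60,
    "explainability": 0.85,
    "reliability": 0.90,
}

passed, composite, failing = release_gate(candidate)
print(passed, round(composite, 2), failing)  # False 0.84 ['fairness']
```

The per-dimension floor is a design choice in this sketch: it stops a strong average from hiding one weak dimension, as happens with the fairness score here.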
Variance Scoring A technique for quantifying the degree of output variation an AI system produces across repeated runs with identical inputs. Used in consistency testing within the SAFE-R rubric.
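One simple way to put a number on output variation is the share of runs that disagree with the most common output. This disagreement rate is an illustrative choice, not the published scoring method, and the sample outputs are invented; free-text outputs would need a similarity measure rather than exact matching.

```python
from collections import Counter


def variance_score(outputs):
    """0.0 when all outputs are identical; approaches 1.0 as they all differ."""
    if not outputs:
        raise ValueError("no outputs to score")
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return 1 - most_common_count / len(outputs)


# Hypothetical outputs from five runs with the same input.
runs = ["approve", "approve", "approve", "deny", "approve"]
score = variance_score(runs)
identical = variance_score(["approve"] * 5)
print(round(score, 2), identical)  # 0.2 0.0
```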
Zenodo AI QA Practitioner Series A five-chapter open-access publication series authored by Tanvi Mittal, introducing the SAFE-R rubric, PACE Framework, six-category Prompt Injection Taxonomy, and AGENT-F Taxonomy. DOI: 10.5281/zenodo.18972360.