MindMap: Testing in Production
This week we have a MindMap on Testing in Production contributed by Seth Eliot, Senior Knowledge Engineer with Microsoft Test Excellence. You might want to follow him on Twitter too: @setheliot. He wrote a great ‘Future of Testing’ series over at The Testing Planet, part of which covered Testing in Production.
I’ve personally been inspired by the concept of Testing in Production, and I find that it is not a well-known area.
Please feel free to provide any feedback or ask any questions in the comments field.
Download (Formats include: PNG, PDF, Mind42, Freemind, Mindjet Manager, RTF)
Testing in Production
Results
TestOps (http://blogs.msdn.com/b/seliot/archive/2012/03/14/testops.aspx)
- Component Availability
- Transaction Latency
- System Data
- User Data
Static Results
- Test Case Pass/Fail
Data
- Big Data
- Ambient Data
- Data Retention
- Data Sanitization
- Real-time (streaming) Data
Risks
Data Issues
- PII: Personally Identifiable Information
- Test/Production inter-mingling
- Impact on BI
- Corruption
Partner System Impact
- False Transactions
- Load Issues (DOS)
User Impact
- Loss of user functionality
- Bad PR with Users
- Effect on Enterprise or VIP users
SUT (System Under Test) Impact
SUT Failure
- Poor SUT performance
- Security Vulnerability
- System Complexity
- Cost of Bugs Found (Note: “Anything that *can* be cheaply found upstream should be found upstream prior to TiP”)
Motivation
- Real Users and Scenarios
- Real Production Ecosystem
Benefits
- Fidelity (Real Production)
- Proactive Response to Problems
Live Site Focus
- TestOps
- Tests as Monitors
- Testing Real User Scenarios
Lower Costs
- Less up-front testing
- Faster Cadence
Team
Operations
- Data Center
- Cloud
Developers
Program Managers
- Business Representatives
- Customer Representatives
Testers
Test Design / Best Practices
Test Methodologies
Passive Monitoring
- Data Mining
- Real Performance Measurement
- Environment Validation
Active Transactions
- API Level
- End To End User Scenarios
Experimentation (Note: “Iterative techniques”)
- Controlled Experimentation
- A/B Design Testing
- Vnext/Vcurr Quality Testing
- Uncontrolled Experimentation
- Dogfood
- Beta
System Stress
- Load Testing (Note: “on top of real traffic”)
- Fault Injection (Note: “Like Netflix Chaos Monkey”)
Techniques
Exposure Control
Production Test hooks
Outside-In Transactions
Instrumentation
- Client
- JavaScript
- GIF request
- Server
Crowd Sourcing
- Bug Bounties
- Scenario Walkthroughs
- Dogfood/Beta
- Focus Groups
Test Data Handling
- Data tagging
- Automated data cleanup
Partner Testing
- Partner Isolation
- Integrated Testing
Things to Avoid
- Significant User Impact
- Insufficient Up-Front Testing
- Ignore Risk / Lack Mitigations
- Impact Partners
- Skew Metrics (includes BI)
- Exclude Operations
- Mis-handle PII (Personally Identifiable Information)
- Unnecessarily expose test artifacts
Tools
Internal
- In-Production Test Harness
- Load Generation
- Chaos Monkey (e.g. Netflix)
- Operational Data Store (ODS)
External
Distributed Load
- SOASTA
- LoadStorm
Outside-In Transactions
- Global Service Monitor for System Center 2012 (http://blogs.technet.com/b/momteam/archive/2012/06/19/global-service-monitor-for-system-center-2012-observing-application-availability-from-an-outside-in-perspective.aspx)
- Keynote
- Gomez
- Pingdom
Instrumentation
- .NET Application Performance Monitoring
- New Relic
- JavaScript
- java.lang.instrument (http://docs.oracle.com/javase/6/docs/technotes/guides/instrumentation/index.html)
Data Collection and Processing
- Hadoop: Hadoop Ecosystem
- Ganglia
- Scribe
- Social: TweetStats - Others….
CrowdSourcing
- uTest
- Amazon MTurk
- Mob4Hire
A/B Design Testing
- WebTrends Optimize
- Adobe Test & Target
- Google Website Optimizer
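
To make the "Exposure Control" technique from the map a little more concrete, here is a minimal Python sketch of one common way it can be done: hashing a user ID into a stable bucket and exposing only a chosen percentage of users to the new code path. This is an illustration under assumptions, not something taken from Seth's article. The names `exposure_bucket` and `is_exposed`, the feature name, and the exclusion list are all hypothetical.

```python
import hashlib


def exposure_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given feature.

    Hashing the feature name together with the user ID means the same
    user lands in different buckets for different features.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_exposed(user_id: str, feature: str, percent: int,
               excluded: frozenset = frozenset()) -> bool:
    """Return True if this user should get the code under test.

    `excluded` is a hypothetical list of users (for example enterprise
    or VIP accounts) who are never exposed, whatever the percentage.
    """
    if user_id in excluded:
        return False
    return exposure_bucket(user_id, feature) < percent


if __name__ == "__main__":
    vip_users = frozenset({"vip-001"})
    for uid in ["alice", "bob", "vip-001"]:
        # Start with a small slice of users and widen it as confidence grows.
        print(uid, is_exposed(uid, "new-checkout", 10, vip_users))
```

Because the bucket depends only on the user ID and feature name, raising the percentage keeps already-exposed users exposed, and setting it to 0 turns the experiment off for everyone.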

You seem to have some attachments embedded in this MM.
Are they available?
Hands up I’ve not read the article so this may have been addressed.
The risks mentioned are massive! I’m not sure that any benefits could sufficiently outweigh them nor how you would go about mitigating them. Any thoughts?
Jennifer…
The original article is here: http://www.thetestingplanet.com/2011/11/the-future-of-software-testing-part-one-testing-in-production/
First, Testing in Production (TiP) is a set of tools and methodologies to be used as part of a test strategy. In other words, TiP does not mean that you skip up-front testing (UFT). I maintain that good unit test coverage is “table stakes” to play. TiP can, however, reduce your UFT effort and cost. The point, in brief, is that you should not burn a huge effort trying to test up-front what can only be tested in production, while on the other hand you should never find a bug in production that you could have found up-front.
Second, Testing in Production encompasses several methodologies (see the top of the red branch). These range in risk from low to high. Lower risk methodologies include the “passive monitoring” ones, where you are merely observing real data. This has some risks around data handling (especially personally identifiable info) and how you collect the data, but overall it should have little impact on the system under test or its users. Active Transactions carry more risk because you are injecting synthetic data, but these risks are tractable with proper mitigations. The System Stress methodologies might seem just too risky, but we see evidence over and over that with careful engineering you can build in mitigations that bring the value-to-risk ratio into the green.
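
As one possible illustration of a mitigation for Active Transactions, the "Data tagging" and "Automated data cleanup" items from the Test Data Handling branch could look roughly like the Python sketch below. It assumes a hypothetical in-memory order store. The `Order` and `OrderStore` classes and the `is_test` flag are made up for the example and are not from any real system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Order:
    order_id: str
    amount: float
    is_test: bool = False  # tag set only by the test harness


@dataclass
class OrderStore:
    orders: List[Order] = field(default_factory=list)

    def add(self, order: Order) -> None:
        self.orders.append(order)

    def revenue(self) -> float:
        """BI-style metric that ignores tagged synthetic orders."""
        return sum(o.amount for o in self.orders if not o.is_test)

    def purge_test_data(self) -> int:
        """Delete tagged orders and return how many were removed."""
        before = len(self.orders)
        self.orders = [o for o in self.orders if not o.is_test]
        return before - len(self.orders)


if __name__ == "__main__":
    store = OrderStore()
    store.add(Order("real-1", 10.0))
    store.add(Order("real-2", 20.0))
    store.add(Order("synthetic-1", 999.0, is_test=True))  # injected by a test
    print(store.revenue())          # synthetic order is not counted
    print(store.purge_test_data())  # scheduled cleanup removes it
```

The tag keeps synthetic transactions out of business metrics while they exist, and the cleanup step stops test and production data from inter-mingling over time.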