Term: Conclusive Results
Anne’s Definition: Also known as statistically valid results, these are measured test results from which a formal conclusion may be safely drawn. You are hoping the results of one test version are significantly better (or worse) than the other so a winner may be declared. Testing platforms, including Google Content Experiments and Omniture, help with the statistical math by showing you the “Chance to Beat Original” for each cell as the test runs. Most statisticians say they do not consider this data conclusive until that Chance reaches 95%.
(We’ve heard statisticians argue vehemently over this, though. Some say they need to see a 98% probability before rolling out the “winning” version. Others say that no matter how accurately we think we measure, the numbers will always be fuzzy enough that 90% is strong enough if you can’t easily get any higher. As journalists, we look for the 95% and cross our fingers.)
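Platforms compute “Chance to Beat Original” in different ways; one common approach is Bayesian. Here is a minimal sketch using only Python’s standard library, assuming uniform Beta priors and hypothetical conversion counts (the platforms’ exact math may differ):

```python
import random

def chance_to_beat(conv_a, visitors_a, conv_b, visitors_b,
                   draws=100_000, seed=42):
    """Estimate the probability that version B's true conversion rate
    beats version A's, given observed conversions and visitors.
    Draws from Beta(1 + conversions, 1 + non-conversions) posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + visitors_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + visitors_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws

# Hypothetical test: 120/4,000 conversions (A) vs. 150/4,000 (B)
p = chance_to_beat(120, 4000, 150, 4000)
print(f"Chance to beat original: {p:.1%}")
```

With these made-up numbers the estimate lands in the mid-90s — right in the gray zone the statisticians above argue about.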
To achieve conclusive results, you need a certain volume of conversions (at minimum a few hundred) within a reasonably short time period (at most perhaps four to six weeks). Use your current conversion rate to calculate how much unique traffic you’ll need to send to the test for the estimated time period.
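A rough way to do that math yourself is the standard two-proportion sample-size formula. This sketch assumes 95% confidence, 80% power, and hypothetical conversion-rate and traffic figures; your numbers will differ:

```python
import math

def visitors_needed(base_rate, min_relative_lift):
    """Rough per-version sample size for a two-sided two-proportion test
    at 95% confidence and 80% power.
    base_rate: current conversion rate (e.g. 0.02 for 2%)
    min_relative_lift: smallest lift worth detecting (e.g. 0.20 for 20%)."""
    z_alpha = 1.96   # two-sided 95% confidence
    z_beta = 0.84    # 80% power
    p1 = base_rate
    p2 = base_rate * (1 + min_relative_lift)
    pooled = (p1 + p2) / 2
    delta = p2 - p1
    n = ((z_alpha + z_beta) ** 2 * 2 * pooled * (1 - pooled)) / delta ** 2
    return math.ceil(n)

# Hypothetical: 2% conversion rate, want to detect a 20% relative lift
n = visitors_needed(0.02, 0.20)
print(n, "visitors per version")

# At a hypothetical 5,000 unique visitors/week split across two versions:
weeks = math.ceil(2 * n / 5000)
print("roughly", weeks, "weeks to finish")
```

Note how quickly the required traffic grows as the lift you want to detect shrinks — one reason low-traffic sites struggle to reach conclusive results.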
If you have an extremely high volume of conversions, you may also want to run tests for a longer time than the conversions alone might warrant, such as at least a week, including a non-holiday weekend. That’s because time of day, day of week, and many other external factors may cause an unusual blip in results. Your goal in testing is not to find what works at this precise second, but rather what may be used as your new control until you resume testing again someday.
Test design also affects conclusiveness. For example, if you’re running an A/B test and change two elements at the same time on a single test page, you won’t be able to draw a conclusion as to how each one affected results. One element could lift response 5% and the other depress it 5%, thus appearing as though your changes had no effect at all!
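That cancellation is easy to verify with simple arithmetic on hypothetical numbers:

```python
# Hypothetical: a baseline page converting at 4.0%
baseline = 0.040
lift = 1.05   # one changed element lifts response 5%
drag = 0.95   # the other depresses it 5%

# The combined page converts at almost exactly the baseline rate,
# so the A/B report shows "no difference" despite two real effects.
combined = baseline * lift * drag
print(f"{combined:.4%}")  # prints 3.9900% vs. the 4.0000% baseline
```

This is why multivariate test designs isolate one element per variation: otherwise two genuine effects can mask each other completely.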
Also, you can’t extend conclusive results to other circumstances (although this is very tempting). If you send a different type of traffic to that page (past buyers instead of newbies, clicks from a different PPC search term, or banner clicks from a different media buy), you’ll get different results. Time itself is another varying factor: a page that was a red-hot conversion machine at one time may start to lag after a few months due to any number of (mostly external) circumstances. This is why you need to create and test different controls for varying demographics, and why you need to keep testing for better-performing controls over time. For best results, testing is never completely done. No test is ultimately and definitively conclusive.
Lastly, beware of assuming your conclusions from one test will hold true on other site pages – or worst of all, on other sites. Testing experts all agree they’re frequently surprised by WhichTestWon. Prices, graphics, hotlinks, copy, etc. that work gangbusters for one page or site may bomb on another. That’s why testing is never boring and why you have to test even if you built your site relying 100% on best practices.