Head-to-head testing and its limitations
Head-to-head testing involves what the name implies—it compares two vendors directly. It’s done by randomly splitting an audience in half, assigning one vendor to each group, and then evaluating performance based on which partner claims more conversions.
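To make the mechanics concrete, here is a minimal sketch of one way such a split could be implemented, assuming user IDs are available; the hashing approach and the vendor labels are illustrative, not a prescription for any particular platform.

```python
import hashlib

def assign_vendor(user_id: str) -> str:
    """Assign a user to one of two vendors with a deterministic ~50/50 split.

    Hashing the user ID (instead of drawing a random number per request)
    keeps the assignment stable across sessions, so no user ever sees ads
    from both vendors during the test.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return "vendor_a" if int(digest, 16) % 2 == 0 else "vendor_b"

# Quick check that the split lands close to 50/50 and the groups are exclusive.
audience = [f"user_{i}" for i in range(100_000)]
counts = {"vendor_a": 0, "vendor_b": 0}
for user in audience:
    counts[assign_vendor(user)] += 1
print(counts)  # roughly 50,000 users in each group
```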
However, this apparent simplicity masks a core problem. Both vendors may take credit for conversions that would have happened anyway. These are essentially organic or branding-related sales that a provider claims as its own; this is known as credit capture, and it clouds any view of a vendor’s true value.
A strict set-up is needed, with each partner using the same KPIs, identical creatives, and the same evaluation criteria. It’s a focused look at two competing vendors, but the methodology misses one key point: from a business perspective, you want vendors working for you and your brand, not against each other.
In practice, campaigns are not run and measured in isolation. Multiple retargeting partners create synergy through overlap, reinforcement, and combined impact. Even if one vendor emerges as the “winner” or “cheaper partner” in the like-for-like test, head-to-head testing fails to reveal the overall additive value that each vendor brings to the broader marketing mix.
Revealing more—incrementality testing as the gold standard
Incrementality testing shifts the focus from relative performance to causal lift. It determines additive sales and calculates the incremental return on ad spend (iROAS), focusing on real rather than attributed results. Unlike head-to-head testing, it reveals the true value a vendor brings to your campaigns: the causal value, meaning the conversions that would disappear without that vendor on your team. For this reason, many marketers view incrementality testing as the most complete and reliable indicator of partner effectiveness.
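As a rough illustration of that arithmetic, here is a minimal sketch of how incremental conversions and iROAS might be computed from an exposed group and a holdout control group; the function name, figures, and average revenue per conversion are assumptions for the example only.

```python
def incremental_results(test_conversions: int, test_users: int,
                        control_conversions: int, control_users: int,
                        revenue_per_conversion: float, ad_spend: float) -> dict:
    """Estimate incremental conversions and iROAS from a test/control split.

    The control group's conversion rate stands in for what the exposed group
    would have done anyway (organic and branding-driven sales); only the
    conversions above that baseline are credited to the vendor.
    """
    baseline_rate = control_conversions / control_users
    expected_without_ads = baseline_rate * test_users
    incremental_conversions = test_conversions - expected_without_ads
    incremental_revenue = incremental_conversions * revenue_per_conversion
    return {
        "incremental_conversions": round(incremental_conversions, 1),
        "iroas": round(incremental_revenue / ad_spend, 2),
    }

# Illustrative numbers only: 2,400 conversions among 100,000 exposed users
# against a control baseline of 2,000 per 100,000 implies ~400 incremental sales.
print(incremental_results(2_400, 100_000, 2_000, 100_000,
                          revenue_per_conversion=80.0, ad_spend=20_000.0))
```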
Incrementality testing is widely considered the best method to validate performance, but despite the emergence of advanced tools and improved testing approaches, such as ghost ads, misconceptions persist about the complexity and cost involved. Opportunity cost is another common concern: the conversions you may lose by intentionally withholding ads from, or serving ghost ads to, the control group. However, only the difference in conversions between the test group and the control group can give you conclusive insight into incremental value. It’s also worth noting that opportunity cost is just as present in head-to-head testing.
Requirements for head-to-head testing
We’ve already mentioned that to set up head-to-head testing, you need a truly random 50/50 audience split. But the requirement for equivalence goes beyond audience allocation. Both vendors must receive identical budgets and operate during the same execution window. Of course, each partner must optimize toward the same KPIs. What’s more, both vendors must deploy the exact same creatives and calls to action to isolate pure technology performance. Let’s pause here for a second to appreciate the import of this.
Vendors differentiate themselves not only through technology but also through creative innovation. Ad formats, design quality, and visual impact can and should vary widely. Head-to-head testing ignores these differences, even though conversions result from the combined effect of creative and technology working together.
For a head-to-head test to gather statistically significant data, an uninterrupted period of 4-8 weeks is needed. An optimal run time of 30 days is recommended, following a ramp-up period of about a week in which vendors can algorithmically “gear up” their reading of user behavior patterns. Budget allocation usually falls between 5% and 10% of total monthly ad spend.
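To make “statistically significant” a little more concrete, below is a minimal sketch of a two-sided, two-proportion z-test comparing the two vendors’ conversion rates at the end of the run; the counts are placeholders, and a real evaluation would typically look at revenue and other KPIs as well.

```python
import math

def two_proportion_z_test(conv_a: int, users_a: int,
                          conv_b: int, users_b: int) -> tuple:
    """Compare two vendors' conversion rates with a two-sided z-test.

    Returns the z statistic and p-value; a small p-value suggests the gap
    between the vendors is unlikely to be noise from the 50/50 split.
    """
    rate_a, rate_b = conv_a / users_a, conv_b / users_b
    pooled = (conv_a + conv_b) / (users_a + users_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (rate_a - rate_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

# Placeholder counts after a 30-day run on a clean 50/50 split.
z, p = two_proportion_z_test(conv_a=1_150, users_a=50_000,
                             conv_b=1_050, users_b=50_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```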
An isolated test structure is essential to ensure accuracy and to avoid last-click credit capture (for example, by using different ID ranges for each vendor). It’s also important to bear in mind that improper frequency caps can increase costs without increasing conversions. These are all factors that can affect the cost and time involved. While head-to-head testing appears conceptually simple, its execution demands careful planning and carries meaningful performance risk if segmentation fails.
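As a deliberately simplified illustration of the frequency-cap point, the sketch below stops serving once a user hits a weekly impression cap; the cap value and window are placeholders, and in practice this logic lives inside the ad server or DSP rather than in brand-side code.

```python
from collections import defaultdict

MAX_IMPRESSIONS_PER_WEEK = 8  # illustrative cap, not a recommendation
impressions_this_week = defaultdict(int)  # reset at the start of each week

def should_serve(user_id: str) -> bool:
    """Serve only while the user is under the weekly cap.

    Impressions beyond the cap add media cost but rarely add conversions,
    which is how a badly set cap inflates the price of a test.
    """
    if impressions_this_week[user_id] >= MAX_IMPRESSIONS_PER_WEEK:
        return False
    impressions_this_week[user_id] += 1
    return True
```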
Incrementality testing—what’s involved
The purpose of an incrementality test is to ascertain the overall incremental value a given vendor brings to your business as part of the wider marketing mix. This is accomplished by comparing performance between an exposed group and a control group that either receives no ads or sees ghost ads. There are three main methods to do this:
Geo-testing—in which geographic areas are used as the unit of comparison (see the sketch after this list).
User-based holdouts—where the audience is split into a test group, which is exposed to the vendor’s ads, and a control group, which is not.
Intent-based holdouts—where ads are suppressed for certain high-intent search terms, and organic conversions are then observed to see whether they fill the shortfall.
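As an illustration of the first method, geo-testing, here is a minimal sketch that compares sales growth in regions where the vendor’s ads ran against control regions where they were withheld; the region names and figures are placeholders, and real geo tests rely on more careful region matching and longer pre-periods.

```python
# Each entry: (pre-period sales, test-period sales) for one region.
exposed_regions = {"region_north": (102_000, 118_000), "region_east": (88_000, 101_000)}
control_regions = {"region_south": (99_000, 104_000), "region_west": (91_000, 95_000)}

def average_growth(regions: dict) -> float:
    """Average relative sales growth from the pre-period to the test period."""
    growth = [(post - pre) / pre for pre, post in regions.values()]
    return sum(growth) / len(growth)

exposed_growth = average_growth(exposed_regions)
control_growth = average_growth(control_regions)

# Control regions soak up seasonality and organic demand; only growth above
# that baseline is attributed to the vendor's ads in the exposed regions.
incremental_lift = exposed_growth - control_growth
print(f"exposed: {exposed_growth:.1%}, control: {control_growth:.1%}, "
      f"incremental lift: {incremental_lift:.1%}")
```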
As with head-to-head testing, incrementality testing requires sufficient run time and scale to generate statistically meaningful results. Most tests run for 2-3 months and rely on large sample sizes. Successful execution also depends on close collaboration between brands and vendors, including access to CRM or first-party revenue data as a source of truth. Like head-to-head testing, a comprehensive run will typically cost 5-10% of a brand’s monthly ad spend. It’s also important to have an internal analytics team verify the results alongside your vendor.
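To give a feel for why large sample sizes matter, here is a minimal sketch of a standard sample-size estimate for detecting a given relative lift in conversion rate between exposed and holdout groups; the baseline rate, expected lift, confidence level, and power are illustrative assumptions.

```python
import math

def users_per_group(baseline_rate: float, expected_lift: float,
                    z_alpha: float = 1.96, z_power: float = 0.84) -> int:
    """Rough users needed per group to detect a relative lift in conversion rate.

    Standard two-proportion formula at 95% confidence (z = 1.96) and 80% power
    (z = 0.84); small lifts on small baseline rates demand very large samples.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + expected_lift)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# Illustrative: a 2% baseline conversion rate and a hoped-for 10% relative lift.
print(users_per_group(baseline_rate=0.02, expected_lift=0.10))
```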
Setting the highest standard for partner evaluation
Both head-to-head testing and incrementality testing can yield valid results. But only one of them measures the value a given partner contributes as part of your wider marketing ecosystem. Incrementality testing evaluates value through collaboration and synergy rather than through isolated competition.
Head-to-head testing is like comparing two midfielders at a soccer club by goals scored. The metric matters, but no serious analyst would draw firm conclusions from it alone. The two players also manifest their worth to your team through assists, tackles, completed passes, ground covered, and many other metrics. So, the question has to be: how many matches do these players win for us when they play together? Ultimately, it’s about the “W”.
Contact us to learn more.
