MaxDiff analysis is a popular technique used in marketing research to evaluate lists of attributes, features, or statements. A Bradley-Terry model could be an attractive alternative to a MaxDiff in cases of sparse data and/or small sample size.
One of the most common tasks for marketing researchers is to understand consumer preferences within a list of attributes. These could be different features or benefits of a product or service, factors driving consumer interest in a category, values, attitudes or behaviors describing a person, etc. Often the lists of attributes to be evaluated are lengthy and can include dozens of items.
The most straightforward way to test an attribute list in a survey would be a monadic desirability, importance, or applicability rating question on a 5-to-10-point scale. Unfortunately, respondents are known to rush through long rating questions, simplifying the task with shortcuts and rules of thumb. The limited number of scale points results in many ties across items, and response-scale biases in ratings lead to problems with statistical testing and multivariate analysis.
A ranking question would improve the results and provide more differentiation compared to a rating. But if the attribute list is too long, it would be difficult for respondents to accurately rank the items, and reliability of the ranking tends to be low.
One of the approaches actively used by researchers to identify “winners among winners” and “losers among losers” in a long list of attributes is MaxDiff analysis, developed by Jordan Louviere. MaxDiff offers an efficient and engaging way of questioning: a respondent is shown a series of screens, each with a small subset of attributes, and is asked to select the most and least liked attribute on each screen. MaxDiff belongs to the family of Choice-Based Conjoint (CBC) techniques and utilizes the same estimation methods as any other CBC. The result is a score on a continuous scale that represents the strength of each item in the list compared to all other items. MaxDiff has multiple advantages compared to ratings and rankings: it is more accurate, it emphasizes differentiation across items in the list, and it puts the scores on a continuous scale.
Bradley-Terry (BT) is a traditional statistical technique mostly applied in sports analytics. Extensions of the BT model have been used to rank chess players (Elo 1978) and NASCAR drivers (Hunter 2004). For a non-sporting example, the BT model has been used to derive influence rankings for journals, where the comparison between two journals is based on the citations from each to the other (Stigler 1994; Varin, Cattelan, and Firth 2016).
In modern marketing research, BT is somewhat forgotten and often replaced by a MaxDiff, even in cases where a BT would be a better fit. Like a MaxDiff, it is used to derive a measure of preference, or scores reflecting the interest, importance, or appeal of multiple items (attributes, statements, features, etc.), and the score’s interpretation is the same as for a MaxDiff score. Similarly to a MaxDiff, it ensures better differentiation among items and estimates scores on a continuous scale.
The BT method is based on a win-loss matrix: a matrix that summarizes how often each item “wins” against every other item in a “tournament”. Different kinds of questions could be used to build a win-loss matrix. The most standard question is a pairwise comparison, which is why the BT is also called a “pairwise comparison method”. A choice exercise could also be used to collect data for a BT analysis, as could a full ranking question or a series of full or partial rankings. This flexibility gives the BT approach some advantage compared to a MaxDiff if a list of items is relatively long. If dozens of items are evaluated (or if the statements to evaluate are lengthy), a MaxDiff exercise can become too tedious and time consuming for respondents. An alternative could be a BT with a series of screens presenting random subsets of items from the list for partial rankings. For example, if the list contains 70 items for evaluation, it is enough to show seven screens with 10 items each and ask a respondent to rank the top 3 items on each screen. This kind of question would be easier and more engaging for a respondent than any reasonable MaxDiff exercise with the same number of items.
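As a rough illustration of how partial rankings could feed a win-loss matrix, the sketch below converts each screen into implied pairwise "wins". It is a minimal sketch under stated assumptions, not a description of any particular production workflow: the function name `win_matrix_from_partial_rankings` and the input layout are hypothetical, items are assumed to be coded as integers from 0 to n_items - 1, and every implied pair is counted as one independent comparison, which is a simplification.

```python
def win_matrix_from_partial_rankings(screens, n_items):
    """Build a win-loss matrix from partial rankings.

    screens: list of (shown, ranked) pairs, where `shown` lists the item ids
             displayed on one screen and `ranked` lists the respondent's top
             picks from that screen, best first (hypothetical input layout).
    Returns wins, where wins[i][j] counts how often item i beat item j.
    """
    wins = [[0] * n_items for _ in range(n_items)]
    for shown, ranked in screens:
        ranked_set = set(ranked)
        for pos, winner in enumerate(ranked):
            # A ranked item beats every item ranked below it...
            for loser in ranked[pos + 1:]:
                wins[winner][loser] += 1
            # ...and every item shown on the screen but left unranked.
            for loser in shown:
                if loser not in ranked_set:
                    wins[winner][loser] += 1
        # Two unranked items are never compared with each other,
        # so no win is recorded between them.
    return wins


# One screen showing items 0-3, with item 2 ranked first and item 0 second.
example = win_matrix_from_partial_rankings([([0, 1, 2, 3], [2, 0])], n_items=4)
```

In this toy call, item 2 records a win over items 0, 1, and 3, and item 0 records a win over items 1 and 3. Stacking the screens from all respondents into `screens` would give the aggregate matrix.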
The main difference between a MaxDiff and a BT is the algorithm used for the analysis. MaxDiff is a variation of a CBC, and most often a Hierarchical Bayesian (HB) estimation is used to build a model. In recent years, the de facto method for fitting BT has been maximum likelihood estimation (MLE) using the MM algorithm (Hunter 2004). As an alternative to finding the MLE, Caron and Doucet (2012) have proposed a Bayesian approach. The HB estimation used for a MaxDiff produces a set of utilities for every item in the list, individually for each respondent. This could be useful in some cases, for example if a MaxDiff model is used for a segmentation, but usually these utilities are processed to deliver MaxDiff scores in total and possibly for subgroups. BT techniques are not used to build individual-level models, but they are more robust with sparse data and small sample sizes, which could be a significant advantage in many studies. This property made the BT model useful in the context of machine learning; it was re-parameterized and is applied as a single-layer artificial neural network (Menke and Martinez, 2007).
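For readers who want to see the shape of the maximum likelihood fit, here is a minimal pure-Python sketch of the MM update for the basic BT model, in which the probability that item i beats item j is p_i / (p_i + p_j). The function name `fit_bradley_terry`, the stopping rule, and the normalization of strengths to sum to 1 are illustrative assumptions; ties and other extensions are not handled, and the diagonal of the win-loss matrix is assumed to be zero.

```python
def fit_bradley_terry(wins, n_iter=1000, tol=1e-10):
    """Estimate BT strengths from wins[i][j] = number of times i beat j.

    Each pass replaces p_i with
        (total wins of i) / sum_j (n_ij / (p_i + p_j)),
    where n_ij is the number of comparisons between i and j,
    then rescales the strengths to sum to 1.
    """
    n = len(wins)
    p = [1.0 / n] * n                          # start from equal strengths
    total_wins = [sum(row) for row in wins]    # assumes wins[i][i] == 0
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            denom = 0.0
            for j in range(n):
                n_ij = wins[i][j] + wins[j][i]
                if j != i and n_ij > 0:        # skip pairs never compared
                    denom += n_ij / (p[i] + p[j])
            new_p.append(total_wins[i] / denom if denom > 0 else p[i])
        scale = sum(new_p)
        new_p = [v / scale for v in new_p]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


# Toy check: item 0 beat item 1 three times and lost once.
strengths = fit_bradley_terry([[0, 3], [1, 0]])
```

In this two-item toy case the estimate is approximately 0.75 and 0.25, matching the observed 3-to-1 win ratio. Note that an item with no wins at all is driven to a strength of zero; a well-defined maximum likelihood solution generally needs every item to be linked to the others through both wins and losses.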
Finally, as mentioned above, the BT scores have exactly the same interpretation and could be delivered in exactly the same way as scores in a MaxDiff study. Similarly to a MaxDiff, the score represents the relative strength of an item (attribute, statement) compared to all other items in the list. And, as in a MaxDiff, the scores can be anchored and presented on an absolute scale if needed.
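To make the "relative strength" reading concrete, the short sketch below rescales a set of BT strengths so that they sum to 100 and computes the implied probability that one item is preferred over another. The strengths are made-up numbers for illustration only, the helper names are hypothetical, and summing to 100 is just one possible reporting convention, assumed here rather than taken from the text above.

```python
def rescale_to_100(strengths):
    """Rescale positive strengths so that the scores sum to 100."""
    total = sum(strengths)
    return [100.0 * s / total for s in strengths]


def preference_probability(strength_i, strength_j):
    """BT probability that item i is preferred over item j."""
    return strength_i / (strength_i + strength_j)


# Made-up strengths for three items (illustration only, not real data).
strengths = [0.6, 0.3, 0.1]
scores = rescale_to_100(strengths)
p_first_over_second = preference_probability(strengths[0], strengths[1])
```

Because the preference probability depends only on the ratio of two strengths, rescaling all scores by a common factor leaves the implied pairwise preferences unchanged.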
At Big Village, we have experience using BT analysis in studies with relatively small samples and long lists of attributes. The method demonstrated high stability and reproducibility and produced meaningful and actionable results.
Written by Faina Shmulyian, Vice President, Data Science at Big Village Insights.