In a previous post, we documented the existence of MTurk workers who are disproportionately likely to show up in academic studies, potentially giving them foreknowledge of experimental procedures. Here we report a study illustrating the challenges that such foreknowledge poses for MTurk data validity.
The Cognitive Reflection Test (CRT; Frederick, 2005) is typically used to measure a stable individual difference in cognitive orientation. It consists of three questions, each of which elicits an intuitive response that can be recognized as wrong with some additional thought. In the best-known item, a bat and a ball cost $1.10 together and the bat costs $1.00 more than the ball; the intuitive answer of 10 cents is wrong, and reflection yields the correct answer of 5 cents. As a result, the number of correct answers to the CRT can serve as a parsimonious measure of an individual’s tendency to make reflective decisions. Foreknowledge – whether from previous exposure or from information shared by others who have completed the task – is problematic for the CRT because it increases the likelihood that the individual has already discovered the correct response and can provide it without reflection, or at a minimum knows that there is a “trick” warranting additional scrutiny of the question.
Because the CRT appears frequently on MTurk, workers who spend more time on the platform should be more likely to provide the correct answers. We recruited one hundred workers who varied in their (known) prior experience on MTurk. Participants completed a study that included, among other measures, the original version of the CRT (Frederick, 2005). Workers known to have completed more research studies on MTurk answered more CRT questions correctly, suggesting that performance was in fact improving with experience.
One alternative explanation for this finding is that more productive workers differ in some meaningful way from less productive workers – for example, they could be more reflective or conscientious. To rule this out, we asked the same workers to complete a “novel” version of the CRT (from Finucane & Gullion, 2010) before completing the “original” version. The questions posed by the two tests are logically identical and differ only in their familiarity to workers. As expected, performance on the original and novel tests was highly correlated. However, whereas prior experience significantly predicted performance on the original CRT, it did not predict performance on the novel CRT. This suggests that the results were not caused by a fundamental difference in the cognitive style of more experienced workers.
This study illustrates that it is problematic to use measures that are familiar to MTurk workers while assuming that participants are naïve. By administering the CRT to nonnaïve participants, researchers might draw false conclusions about their levels of cognitive reflection and about the relationship between CRT performance and any variable that correlates with worker experience. Moreover, nonnaïveté introduces another source of error that might obscure the relationship between cognitive reflection and other constructs of interest.
See the full paper for more details about the study and a broader discussion of the challenges connected to worker nonnaïveté.
Chandler, J., Mueller, P., & Paolacci, G. (2014). Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods, 46(1), 112–130.
Finucane, M. L., & Gullion, C. M. (2010). Developing a tool for measuring the decision-making competence of older adults. Psychology and Aging, 25(2), 271–288.
Frederick, S. (2005). Cognitive reflection and decision making. Journal of Economic Perspectives, 19(4), 25–42.