Guest post by Jesse Chandler
In a new Psychological Science article we provide direct evidence that effect sizes for experimental results are reduced among participants who have previously completed an experimental paradigm. Specifically, we recruited MTurk workers who participated in the Many Labs 1 series of two-condition experiments and invited them to participate in a research study that included the exact same package of experiments. We found that effect sizes decreased the second time around, especially among those who were exposed to opposite conditions at the two different time points.
Previous studies have demonstrated that MTurk worker performance changes as workers become more experienced. For example, we have demonstrated that worker scores on the Cognitive Reflection Test (a commonly used measure of intellectual ability) are correlated with worker experience. Likewise, Dave Rand and Winter Mason have led projects that provided evidence that workers get better at economic games over time. All of these findings are new twists on older observations that attitudes of survey panel members tend to change over time (a phenomenon known as panel conditioning in the survey literature) and that people tend to improve on measures of aptitude (a phenomenon known as a practice effect within the psychometric testing literature).
Our findings illustrate that participant experience can also affect experimental results, even when dependent measures are not straightforward measures of ability. These findings are surprising (at least to us) because we have tended to assume that workers are relatively unengaged while completing HITs and complete so many tasks that any individual experiment could hardly be memorable. But apparently it can be.
Fortunately there is good news. First, we see some evidence that this effect wears off over time, suggesting that people eventually forget whatever information they may have seen. Second, if all you care about is the direction of an effect, that is, its sign (rather than an exact point estimate), smaller effect sizes can be offset by increased sample size. Third, there is an increasing number of tools available to prevent duplicate workers from participating in an experiment.
- The MTurk API or GUI allows you to create qualifications that filter out unwanted workers.
- Qualtrics can be set up to check workers against a predefined list of workers and exclude those with matching WorkerIDs. This is useful if you know of workers you want to exclude, but have not worked with them before.
- TurkGate will do something similar. It needs to run on a server, but it is likely easier to maintain than Qualtrics, particularly for a lab group or research team that wants to coordinate their efforts.
- TurkPrime and UniqueTurker are newer solutions that seem easier to use for individual researchers. We have not tested them, but readers may wish to experiment with them.
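Whatever tool you use, the underlying check is the same: compare each incoming WorkerID against a list of IDs you want to exclude. The sketch below illustrates that logic in plain Python. It is not how Qualtrics, TurkGate, or any of the other tools actually implement it; the function names and the example IDs are hypothetical, and treating IDs as case-insensitive after trimming whitespace is an assumption made here for robustness.

```python
def normalize_worker_id(worker_id):
    """Trim whitespace and uppercase an ID (an assumption, not an MTurk rule)."""
    return worker_id.strip().upper()


def split_by_prior_participation(incoming_ids, prior_ids):
    """Split incoming WorkerIDs into (fresh, repeat) lists.

    An ID counts as a repeat if it appears in prior_ids or earlier in
    incoming_ids, so duplicates within the same batch are caught too.
    """
    seen = {normalize_worker_id(w) for w in prior_ids}
    fresh, repeat = [], []
    for worker_id in incoming_ids:
        key = normalize_worker_id(worker_id)
        if key in seen:
            repeat.append(worker_id)
        else:
            fresh.append(worker_id)
            seen.add(key)
    return fresh, repeat


# Hypothetical IDs, for illustration only.
prior = ["A2EXAMPLE0002"]
incoming = ["A1EXAMPLE0001", " a2example0002 ", "A3EXAMPLE0003", "A3EXAMPLE0003"]
fresh, repeat = split_by_prior_participation(incoming, prior)
print("fresh:", fresh)
print("repeat:", repeat)
```

In practice the prior list would come from your own records of earlier studies, which is why keeping WorkerIDs from every study you run pays off later.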
In short, there are lots of ways to limit worker participation. Of course, this does raise serious concerns for experimental paradigms that are used to the point of abuse (trolley problem, I’m looking at you) and highlights that the finite size of the MTurk pool means a finite limit on the number of times a particular experimental paradigm can be run.
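To make the earlier point about offsetting smaller effects with larger samples concrete, here is a rough sketch of how the required sample grows as an effect shrinks. It uses the standard normal-approximation formula for a two-sided, two-condition comparison of means, n per group ≈ 2(z₁₋α/₂ + z_power)² / d², where d is the standardized effect size (Cohen's d). The effect sizes in the example are hypothetical values chosen for illustration, not estimates from our article, and the function name is our own.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per condition to detect effect size d.

    Normal approximation for a two-sided, two-sample comparison of means;
    an exact t-based calculation gives slightly larger numbers.
    """
    if d == 0:
        raise ValueError("effect size must be non-zero")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Hypothetical effect sizes: a naive-sample effect and two attenuated ones.
for d in (0.5, 0.4, 0.25):
    print(f"d = {d}: about {n_per_group(d)} participants per condition")
```

Because the required sample scales with 1/d², even a modest drop in effect size is costly: halving d roughly quadruples the number of participants you need.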