Posted by: Gabriele Paolacci | April 22, 2013

TurkGate

Guest post by Gideon Goldin and Adam Darlow

As MTurk was not designed for psychological research, it cannot be expected to provide the experimental control that psychologists typically exercise when recruiting participants for laboratory studies. In particular, MTurk lacks:

(I) The ability to exclude participants who have already participated in related studies.

(II) The ability to prevent study previews.

(III) The ability to verify participants’ completion of a study.

TurkGate, or Grouping and Access Tools for External surveys (for use with Amazon Mechanical Turk), gives researchers an easy-to-use web application for providing such control when using MTurk with externally hosted studies (e.g., Qualtrics surveys).

TurkGate groups related HITs together, such that participants may only access one HIT per group. These HITs only link to surveys after workers accept them, and only if they have not already accessed related HITs. Workers attempting to preview HITs receive information about the HIT, but no link. Once the HIT is accepted, the worker’s ID is tested against a database to verify eligibility before the worker is granted access to the study. TurkGate also generates completion codes that can be automatically verified.
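
The access logic described above can be sketched in a few lines of Python. This is not TurkGate's code. It assumes that the HIT page can read an assignment ID that MTurk sets to the sentinel ASSIGNMENT_ID_NOT_AVAILABLE during previews (as it does for ExternalQuestion HITs), and it uses an in-memory dictionary as a stand-in for TurkGate's database. The function name gate_hit and its parameters are hypothetical.

```python
"""A sketch of grouped HIT gating in the style described above (not TurkGate's code)."""

# Assumption: as with MTurk ExternalQuestion HITs, the page sees an assignment
# ID that MTurk sets to this sentinel while a worker is only previewing.
PREVIEW_SENTINEL = "ASSIGNMENT_ID_NOT_AVAILABLE"


def gate_hit(assignment_id, worker_id, group_name, survey_url, accessed):
    """Decide what a worker sees on a grouped HIT (hypothetical helper).

    accessed: mapping of group name -> set of worker IDs that have already
    been granted access to some HIT in that group (stand-in for a database).
    Returns (status, link); link is None unless access is granted.
    """
    if assignment_id == PREVIEW_SENTINEL:
        # Previews get a description (including the group name) but no link.
        return ("preview", None)
    seen = accessed.setdefault(group_name, set())
    if worker_id in seen:
        # The worker already accessed a related HIT in this group: no link.
        return ("ineligible", None)
    seen.add(worker_id)  # record access so related HITs are now closed to this worker
    return ("granted", survey_url)
```

Because every HIT in a group consults the same set, a worker granted access to one HIT is refused a link to the others, while HITs in other groups are unaffected.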

Pe’er et al. (2012) have already created a relatively simple method to resolve some of these issues, which we refer to as the Qualtrics method since it is based on functionality inherent to Qualtrics. Like TurkGate, their method supports hosting studies on a variety of sites, as users can adopt (the free version of) Qualtrics for the purpose of screening participants before redirecting them elsewhere. Their method also uses a script to prevent study previews. Researchers who use the Qualtrics method, and who also track response IDs, can address all of the MTurk limitations described above without needing to configure or maintain any software or hardware.

In contrast, TurkGate needs to be downloaded and installed on a web server (e.g., Apache HTTP Server) with a database management system (e.g., MySQL). As such, the security and reliability of TurkGate will depend on the system it is installed on. If the server goes down, TurkGate goes down with it, whereas outages are unlikely with a professional service like Qualtrics. And although TurkGate is designed to work out of the box, it may require some administration, such as updating versions or setting up database backups. These requirements suggest that TurkGate is best suited for an entire laboratory or department of researchers, where a single computer-savvy individual or IT professional can maintain it.

In return for this investment, however, TurkGate offers a streamlined workflow with several distinct advantages:

(i) TurkGate manages related studies with minimal overhead. Researchers using either TurkGate or the Qualtrics method add a script to their Web Interface HITs that checks workers’ IDs against a list of restricted IDs. The difference is that the Qualtrics method maintains the list of IDs within each survey, whereas TurkGate maintains a global list of IDs for all surveys, separated by group. Having a centralized database has several advantages, especially when researchers collaborate. Namely, there is no need for researchers to store, share or update lists, because all surveys use the same list and multiple researchers can use the same TurkGate installation. To make any pair of studies mutually exclusive, researchers simply assign them the same group name. Another benefit is that researchers can run related studies simultaneously, since TurkGate’s list is updated automatically and in real time. Obviating the need to manage multiple lists of IDs also makes creating surveys faster: a researcher simply submits their survey’s URL and a group name to TurkGate to get the aforementioned script, and is then ready to create the HIT and run the survey. It is this highly optimized workflow that represents the original raison d’être of TurkGate.
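
A centralized list shared by simultaneous studies needs a check that cannot race: if two related HITs are accepted at nearly the same moment, a naive "look up, then insert" could admit the same worker twice. One way to avoid this, sketched here with Python's built-in sqlite3 as a stand-in for the MySQL database TurkGate runs on, is to let a (worker_id, group_name) primary key reject the second claim. The table and function names are hypothetical, not TurkGate's schema.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) one shared store used by every survey and researcher."""
    conn = sqlite3.connect(path)
    # The composite primary key allows at most one row per worker per group.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access ("
        " worker_id TEXT NOT NULL,"
        " group_name TEXT NOT NULL,"
        " survey_url TEXT NOT NULL,"
        " PRIMARY KEY (worker_id, group_name))"
    )
    return conn


def claim(conn, worker_id, group_name, survey_url):
    """Atomically record access; return False if the worker already has a row in the group."""
    try:
        with conn:  # commits on success, rolls back if the INSERT fails
            conn.execute(
                "INSERT INTO access (worker_id, group_name, survey_url) VALUES (?, ?, ?)",
                (worker_id, group_name, survey_url),
            )
        return True
    except sqlite3.IntegrityError:
        # Primary-key violation: a related study in this group was already accessed.
        return False
```

Making two studies mutually exclusive then only requires giving them the same group_name; a worker can still take studies in different groups.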

(ii) TurkGate disables HIT previews while preventing unnecessary HIT returns. The Qualtrics method, like TurkGate, prevents HIT previews by sending workers to an intermediary page prior to the actual study. However, explicit care was taken in developing TurkGate to prevent workers from ever needing to return HITs for which they are not eligible. This prevents the artificial and undesirable inflation of workers’ return rates. Instead, workers are provided with a group name, and if they recall having participated in the group, they know not to accept the HIT. If they do not recall participating, they can simply verify their eligibility by submitting their worker ID.

(iii) TurkGate offers intuitive, verifiable, and anonymous completion codes. TurkGate’s completion codes were designed to support a number of features. First, the codes themselves contain useful information in human-readable form, including the MTurk worker ID, group name, survey identifier, and an optional Qualtrics or LimeSurvey record ID (researchers can augment their codes with any number of additional key-value pairs). Critically, each code also contains an encrypted segment used to prevent fake codes. After running a batch of HITs, researchers simply copy and paste their MTurk results file into TurkGate, which instantly flags invalid records and duplicates. For experimenters (or IRBs) concerned about anonymity, TurkGate can also verify participation without using response IDs, which can be linked to study data. Those using Qualtrics often use response IDs as completion codes (http://www.qualtrics.com/university/researchsuite/faqs#codenumber), but this requires manual verification and precludes complete anonymity.
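
The following sketch shows one way to build human-readable, verifiable completion codes and to flag invalid and duplicate codes in a batch. It is not TurkGate's scheme: the field layout is hypothetical, and instead of an encrypted segment it appends a truncated HMAC-SHA256 signature computed with a server-side secret (also hypothetical), which likewise makes codes infeasible to forge without the key. In practice the codes would be read from the MTurk results file rather than from a list.

```python
import hashlib
import hmac

# Hypothetical secret; it stays on the server and is never shown to workers.
SECRET = b"replace-with-a-server-side-secret"


def _sign(payload):
    # Truncated HMAC-SHA256 over the human-readable part of the code.
    return hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def make_code(**fields):
    """Build a readable code such as 'group=g1|survey=s1|worker=W1|sig=...'."""
    payload = "|".join(f"{key}={value}" for key, value in sorted(fields.items()))
    return f"{payload}|sig={_sign(payload)}"


def is_valid(code):
    """Check that the signature matches the readable fields."""
    payload, sep, sig_part = code.rpartition("|")
    if not sep or not sig_part.startswith("sig="):
        return False
    return hmac.compare_digest(_sign(payload), sig_part[len("sig="):])


def flag_batch(codes):
    """Label each submitted code 'ok', 'invalid', or 'duplicate' (repeat of an earlier valid code)."""
    seen, labels = set(), []
    for code in codes:
        if not is_valid(code):
            labels.append("invalid")
        elif code in seen:
            labels.append("duplicate")
        else:
            seen.add(code)
            labels.append("ok")
    return labels
```

Extra key-value pairs (for example, an optional survey record ID) can be passed as additional keyword arguments; any edit to a readable field, such as swapping in another worker ID, invalidates the signature.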

For researchers in psychology laboratories or departments that already have access to IT support capable of configuring and maintaining a web server and database, TurkGate will likely serve as a convenient and long-term solution. However, the Qualtrics method is better suited for researchers who are uninterested in the overhead of deploying a separate system, especially if they already use Qualtrics.

TurkGate is an actively developed, open-source project that users are free to download (and modify) via GitHub. It is used in multiple laboratories and continues to evolve based on the feedback and contributions of its users. Learn more about TurkGate here.

(Suggested citation: Goldin, G., & Darlow, A. (2013). TurkGate (Version 0.4.0) [Software]. Available from http://gideongoldin.github.com/TurkGate/)

References

Pe’er, E., Paolacci, G., Chandler, J., & Mueller, P. (2012, May 2). Selectively recruiting participants from Amazon Mechanical Turk using Qualtrics. Available at SSRN: http://ssrn.com/abstract=2100631 or http://dx.doi.org/10.2139/ssrn.2100631

Posted by: Gabriele Paolacci | March 13, 2013

Using MTurk to Study Clinical Populations

Guest Post by Danielle Shapiro and Jesse Chandler

Compared with the behavioral sciences, the clinical sciences have been slow to adopt MTurk as a recruitment tool. This is unfortunate, because a major obstacle in clinical research is locating individuals who score at the extremes of clinically relevant variables; by definition, such individuals make up a minority of the population and are thus hard to find in large numbers.

We investigated the use of MTurk as a recruitment tool for populations of interest to clinical scientists. In line with numerous previous studies of MTurk, we found that data quality was high. Scale items used to measure underlying psychological constructs held together well. More importantly, the relationship between self-reported demographic information, life experiences, and psychological constructs was largely consistent with prior research, e.g., unemployment predicts depression, women report more anxiety, and men drink more alcoholic beverages. In general, workers looked a lot like the US population as a whole, except they reported surprisingly high levels of social anxiety.
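
The post does not say how item coherence was assessed. One common index of whether scale items "hold together" is Cronbach's alpha, α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜ), where k is the number of items, σ²ᵢ is the variance of item i, and σ²ₜ is the variance of respondents' total scores. A minimal sketch, assuming complete numeric responses for at least two items:

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    items = list(zip(*responses))  # one tuple of scores per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    # Population variances are used throughout; the ratio is the same with sample variances.
    return k / (k - 1) * (1 - item_var / total_var)
```

Values near 1 indicate that items vary together across respondents; values near 0 indicate that they do not.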

We also learned a few things that may be of interest to researchers in other fields. First, in line with previous results, we found that workers are basically honest about personal details when payment is not contingent on their responses. We asked workers to report demographic information at two time points more than a week apart, and virtually all workers from US IP addresses reported the same information both times.
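
The two-wave check can be expressed as the share of workers, among those who answered both waves, whose answers agree on every demographic field. A minimal sketch; the data layout (dicts keyed by worker ID) and the field names are hypothetical.

```python
def consistency_rate(wave1, wave2, fields):
    """Share of workers present in both waves whose answers match on every field.

    wave1, wave2: hypothetical dicts mapping worker ID -> dict of answers.
    """
    both = wave1.keys() & wave2.keys()  # only workers who responded twice
    if not both:
        return float("nan")
    same = sum(
        all(wave1[w].get(f) == wave2[w].get(f) for f in fields) for w in both
    )
    return same / len(both)
```

Restricting the computation to workers from US IP addresses would simply mean filtering the two dicts first.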

Second, workers may be less honest about details relevant to payment. For example, a surprising number of workers (around 6%) who claimed US residence in fact came from IP addresses assigned to Eastern Europe and India. This is probably because US-based workers are paid in cash rather than Amazon credit. Similarly, we measured malingering: the tendency to report symptoms that seem clinically relevant but are in fact rarely reported in clinical populations. We found that a substantial portion of the sample (around 10%) reported unusually high levels of malingering (>3 SD above the norm). One interpretation of this finding is that workers infer the purpose of a survey and try to provide information that is relevant to what the requester wants. This interpretation is in line with earlier research showing a higher level of social desirability bias among MTurk workers than among other populations (Behrend, Sharek, Meade & Wiebe, 2011).
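
The >3 SD criterion amounts to a z-score cutoff, z = (x − μ) / σ > 3, where μ and σ are the normative mean and SD of the malingering measure. The post does not name the instrument or its norms, so the function name and the values used with it are hypothetical.

```python
def flag_malingering(scores, norm_mean, norm_sd, cutoff=3.0):
    """Return IDs whose score lies more than `cutoff` SDs above the normative mean.

    scores: hypothetical dict mapping worker ID -> malingering scale score.
    """
    return sorted(
        worker_id
        for worker_id, score in scores.items()
        if (score - norm_mean) / norm_sd > cutoff  # strictly above the cutoff
    )
```

The proportion of flagged workers is then len(flagged) / len(scores).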

Moreover, we learned that a substantial proportion of workers are unemployed or underemployed, far more than the US national average. These workers’ willingness to work for very low wages should not be construed as satisfaction with current payment rates.

You can find our full report here.

References

Behrend, T. S., Sharek, D. J., Meade, A. W., & Wiebe, E. N. (2011). The viability of crowdsourcing for survey research. Behavior Research Methods, 43, 800–813.

Shapiro, D. N., Chandler, J., & Mueller, P. (in press). Using Mechanical Turk to Study Clinical Populations. Clinical Psychological Science.

Posted by: Gabriele Paolacci | January 21, 2013

Conference on Experiments with Crowd Sourced Subjects

The Nuffield Centre for Experimental Social Sciences at the University of Oxford (UK) has organized a one-day Conference on Experiments with Crowd Sourced Subjects for February 14, 2013. The conference will introduce the development and use of crowdsourced experiments to researchers who are interested in conducting such experiments in their fields, with a special focus on AMT. See here for the conference program and registration details.

Posted by: Gabriele Paolacci | October 9, 2012

Slides from ACR 2012

On October 5, we held a special session at the Association for Consumer Research North American Conference called “Inside the Turk: Methodological Concerns and Solutions in Mechanical Turk Experimentation.” Below you can find the presenters’ slides (click on the titles of the talks).

Data Collection in a Flat World: Strengths and Weaknesses of Mechanical Turk Samples (Joseph Goodman)
We compare Mechanical Turk participants to community and student samples on personality, financial, and consumption dimensions, as well as classic decision-making biases. We find many similarities between Mechanical Turk participants and traditional samples, but also find important differences researchers should consider when using Mechanical Turk for consumer research.

Screening Participants on Mechanical Turk: Techniques and Justifications (Emily Peel)
Concerns about the quality of Mechanical Turk participants induce researchers to screen participants. We evaluate screening strategies according to their discriminant ability to identify observations that contribute only noise. Our results suggest omitting participants based on these indicators would likely bias the sample rather than improve data quality.

Under the Radar: Determinants of Honesty in an Online Labor Market (Dan Goldstein)
Online subject pools depend on participants’ honesty. After establishing a baseline level of dishonesty on Mechanical Turk, we manipulate the incentives to cheat and the probability of detection. We find workers act like intuitive statisticians, cheating at a level below statistical detection at the individual, but not aggregate, level.
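
The abstract does not describe the task, so the following is only a hypothetical illustration of why modest cheating can escape individual-level detection yet be obvious in aggregate. Suppose each worker privately flips a fair coin 10 times and is paid per reported head. One worker reporting 6 heads is entirely plausible (P(X ≥ 6) ≈ 0.38), but if 1,000 workers each inflate their report by one head, 6,000 heads in 10,000 flips is astronomically unlikely under honest reporting. The exact binomial upper tail for a fair coin can be computed with integer arithmetic:

```python
from math import comb


def fair_coin_tail(successes, trials):
    """Exact P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    # Integer numerator and denominator avoid floating-point underflow for large trials.
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favorable / 2 ** trials


# Hypothetical numbers: one worker vs. 1,000 workers, each over-reporting by one head.
individual = fair_coin_tail(6, 10)
aggregate = fair_coin_tail(6000, 10000)
print(f"individual p = {individual:.3f}, aggregate p = {aggregate:.1e}")
```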

Non-Naivety among Experimental Participants on Amazon Mechanical Turk (Gabriele Paolacci, including introduction to the session)
We conducted two studies to identify the extent to which participant cross-talk and duplicate participation contribute to non-naivety among participants in Mechanical Turk. Whereas cross-talk is not a critical issue, there is evidence of numerous duplicate participants. We discuss the implications for Mechanical Turk experimentation.

Posted by: Gabriele Paolacci | September 25, 2012

AMT Special Session at ACR

We organized a Mechanical Turk Special Session at the Association for Consumer Research North American Conference (October 4–7, 2012, Vancouver). The session is called “Inside the Turk: Methodological Concerns and Solutions in Mechanical Turk Experimentation” and will take place on Friday, October 5 at 2pm.

Whereas initial evaluations of AMT as a source of experimental data have emphasized its compelling strengths, fewer efforts have been made to explore and quantify its potential drawbacks and limitations. The special session will focus on some of the issues that threaten experimental validity on AMT and on providing easily implementable solutions to avoid these problems. The four papers included in the session deal with diverse issues. Joe Goodman will discuss differences between AMT workers and more traditional subject populations that are highly relevant to consumer behavior research. Julie Downs will discuss strategies for restricting data collection and data retention to attentive participants, together with their implications for the generalizability of AMT data. Dan Goldstein will address issues of participant honesty, including the results of experiments designed to detect dishonest behaviors among AMT participants and identify some of their predictors. Gabriele Paolacci will address the issue of non-naïvety among AMT workers by presenting studies about cross-talk and duplicate participation, and will provide simple remedies to attenuate this concern.

We plan to share our slides on the blog after the conference, so stay tuned!

Posted by: Gabriele Paolacci | August 29, 2012

Official AMT blog

AMT has an official blog that often publishes information relevant to academic researchers (e.g., how to improve data quality). Keep an eye on it!

Posted by: Gabriele Paolacci | May 11, 2012

Can AMT be Used for Learning and Memory Experiments?

Guest post by Todd Gureckis and John McDonnell

AMT has been used in behavioral research for a variety of purposes, including norming stimuli, simple judgement and decision-making tasks, and collective behavior experiments. While previous work has suggested that AMT data compare well to data collected in the laboratory, fewer studies have looked specifically at the types of tasks that might interest many cognitive scientists. For example, we are not aware of work addressing tasks that involve learning a non-trivial concept over a number of trials. There is good reason to suspect that AMT may not be ideal for such tasks, because they typically require sustained focus, 20–60 minutes of time, and careful consideration of the task instructions.

We explored this issue by replicating a classic study in cognitive science on human concept learning (Shepard, Hovland, & Jenkins, 1961). This study has been widely replicated in different labs, and it depends on learning and problem solving that extend over many trials.

Our first attempt at replication did not produce the qualitative pattern of effects found reliably in laboratory replications. However, in a sequence of follow-up studies we explored the variables that affect the quality of data on AMT. In our first follow-up, we examined the impact of participant incentives on data quality. We found that payment had little effect on performance or data quality and mostly affected the sign-up rate.

However, in our final experiment we did something entirely obvious but perhaps a little insidious. At the end of the instruction phase, participants answered a simple questionnaire that tested their comprehension of the key details of the study. If a participant did not answer every question correctly, they returned to the instruction phase, and this loop repeated until they answered all the questions correctly. Many participants repeated the instruction phase more than three times. After this simple, no-cost manipulation, our data became much more similar to previous laboratory reports.
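
The comprehension loop is simple to express. In this sketch, show_instructions and ask_quiz are hypothetical callbacks standing in for whatever experiment framework displays the pages. The optional max_rounds guard is our addition (the loop described above had no cap); its default of None reproduces the uncapped behavior.

```python
def run_instructions(show_instructions, ask_quiz, answer_key, max_rounds=None):
    """Repeat the instructions until every quiz answer is correct.

    show_instructions(): displays the instruction phase (hypothetical callback).
    ask_quiz(): returns a dict of question -> participant's answer (hypothetical callback).
    answer_key: dict of question -> correct answer.
    Returns the number of passes through the instructions.
    """
    rounds = 0
    while True:
        show_instructions()
        rounds += 1
        if ask_quiz() == answer_key:  # must be perfect, not just mostly right
            return rounds
        if max_rounds is not None and rounds >= max_rounds:
            raise RuntimeError("participant did not pass the comprehension check")
```

Logging the returned count per participant also makes it easy to report how many passes through the instructions people needed.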

Based on this experience, we offer three lessons to researchers interested in using AMT in their work. First, it is important to verify that participants understand the instructions (obvious, but we didn’t realize how important that could be). Second, like other reports, we found that the magnitude of payment doesn’t affect the overall quality of the data. Third, it is possible to conduct learning and memory experiments on AMT that yield data very similar to those obtained in laboratory settings.

Read about our full results here.

References

Shepard, R. N., Hovland, C. I., & Jenkins, H. M. (1961). Learning and memorization of classifications. Psychological Monographs: General and Applied, 75(13), 1–42.

Posted by: Gabriele Paolacci | March 13, 2012

CrowdScope.org

Siddharth Suri and Winter Mason launched a new crowdsourcing-related website, CrowdScope.org. The goal of the site is for people to share what they have learned through their experiences with crowdsourcing platforms such as AMT. It includes how-tos, information about past and forthcoming scientific meetings, links to further resources, and more. CrowdScope.org is a wiki: it is in continuous development, and the crowdsourcing community can and should actively contribute to its growth.

Posted by: Gabriele Paolacci | March 5, 2012

Emailing Workers Using Python

Researchers sometimes need to contact workers, e.g., to recruit them for experiments that require simultaneous participation (see Winter Mason’s SPSP 2012 slides). Especially when the message is the same for all workers, emailing them one at a time through the web interface is inefficient and time-consuming. In the tutorial linked below, we provide instructions on how to email up to 100 workers at the same time using Python.
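
The linked tutorial's own code is not reproduced here. As a sketch of the same idea with current tooling, the boto3 MTurk client provides notify_workers, which takes a subject, a message, and a list of at most 100 worker IDs per call; the helper below splits a longer list into batches of 100. The helper names are hypothetical, and the client must be created with valid requester credentials.

```python
def chunks(items, size=100):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def email_workers(client, worker_ids, subject, message):
    """Send one message to many workers, at most 100 IDs per NotifyWorkers call.

    client: a boto3 MTurk client, e.g. boto3.client("mturk", region_name="us-east-1").
    Returns the per-worker failure statuses reported by MTurk.
    """
    failures = []
    for batch in chunks(list(worker_ids)):
        response = client.notify_workers(
            Subject=subject, MessageText=message, WorkerIds=batch
        )
        failures.extend(response.get("NotifyWorkersFailureStatuses", []))
    return failures
```

Passing the client in as a parameter keeps the batching logic testable without contacting MTurk, and makes it easy to point the same code at the requester sandbox endpoint while piloting a message.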

Emailing workers using Python
Pam Mueller (Princeton University), pamuelle@princeton.edu
Jesse Chandler (Princeton University)

(Suggested citation: Mueller, Pam and Chandler, Jesse, Emailing Workers Using Python (March 3, 2012). Available at SSRN: http://ssrn.com/abstract=2100601)

Posted by: Gabriele Paolacci | February 27, 2012

Screening AMT workers using Qualtrics

Duplicate respondents across related experiments are a substantial problem for conducting programmatic research on AMT (Chandler, Mueller, & Paolacci). In the tutorial linked below, we provide a straightforward alternative that allows researchers who use Qualtrics to exclude workers who participated in a previous study. This approach allows researchers to exclude workers who have completed any HIT before, without having to use AMT’s Command Line Tools.

We conducted a test HIT with 100 AMT workers who were asked to follow a link to Qualtrics that was posted using the script in the tutorial. Workers used Firefox, Chrome, Internet Explorer, and Opera, on various versions of Windows, Mac OS, and Linux. Nobody reported seeing an error message, and all proceeded to the survey. All of the worker IDs recorded in Qualtrics matched the results of the HIT, suggesting that the code is stable and reliable. However, the tutorial is subject to improvement. Should you find any bugs or have suggestions, please contact the authors or, better yet, comment on this post. The tutorial was updated on May 2, 2012. Further tutorials will follow, and you will find an updated list in the Resources section.
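
Checking that the worker IDs recorded in Qualtrics match the HIT results amounts to a set comparison. A minimal sketch; in practice the two lists would come from the MTurk batch results file (WorkerId column) and the Qualtrics export (whatever column the worker ID was stored in, which depends on the survey's setup). The function name is hypothetical.

```python
def reconcile(hit_worker_ids, survey_worker_ids):
    """Compare worker IDs from the HIT results with those recorded in the survey."""
    hit, survey = set(hit_worker_ids), set(survey_worker_ids)
    return {
        "matched": len(hit & survey),
        "missing_from_survey": sorted(hit - survey),  # submitted the HIT, no survey record
        "survey_only": sorted(survey - hit),  # survey record without a matching HIT
    }
```

A perfect match, as in the test HIT above, shows up as two empty lists.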

Selectively Recruiting Participants from Amazon Mechanical Turk Using Qualtrics
Eyal Peer (Carnegie Mellon University), epeer@andrew.cmu.edu
Gabriele Paolacci (Erasmus University)
Jesse Chandler (Princeton University)
Pam Mueller (Princeton University)

You can also download a sample .qsf file that you can import into your Qualtrics account.

(Suggested citation: Pe’er, Eyal, Paolacci, Gabriele, Chandler, Jesse and Mueller, Pam, Selectively Recruiting Participants from Amazon Mechanical Turk Using Qualtrics (May 2, 2012). Available at SSRN: http://ssrn.com/abstract=2100631 or http://dx.doi.org/10.2139/ssrn.2100631)
