People who participate in a research study differ depending on when the study occurs. For example, people who work in traditional white collar jobs may be unavailable to complete a study during regular business hours (when they would ordinarily be at work). Likewise, in college samples, students who sign up to complete studies at the beginning of the semester can differ from students who sign up to complete studies at the end of the semester. These differences likely vary across populations and recruitment methods, and in many cases evidence for their existence resides only in the beliefs and tacit knowledge of study recruiters and researchers.
Online experiments expand the times at which studies can be completed and the population available to complete them. At the same time, online experiments often distance researchers from the recruitment process, making it harder to have a sense of who is available to complete a study at different times. To address this knowledge gap, we present results from two recent working papers that explore differences among Turkers participating in HITs at different times: Arechar et al. 2016 (hereafter AKR) examine the demographics, big-5 personality traits, and incentivized economic behaviors of 2,336 Turkers as a function of their local time when completing the study; and Casey et al. 2016 (hereafter CCLBS) examine demographics and big-5 personality traits in 9,770 Turkers as a function of the time at which HITs were posted. Our studies show convergent evidence of some differences in demographics and personality based on when HITs are posted, and when within a HIT’s posting period a subject participates. The incentivized behavioral measures of AKR, however, do not in general find that these differences translate into meaningful differences in actual behavior.
Available workers are different on different days and at different times.
Both of our papers found that participants with more prior experience on MTurk (self-reported in AKR, cross-referenced for participation in a prior study by CCLBS) were more likely to complete HITs earlier in the day. AKR also found that participants with less prior experience on MTurk were more likely to complete HITs on weekends.
Both studies also found that participants’ age varies with time and day, although the observed patterns were less consistent: CCLBS observed that workers were younger on Thursdays and older on Saturdays, and AKR observed that participants were younger in the evening and older in the morning.
Participants also differed across time in terms of personality. In both studies, participants who scored lower on the big-5 personality dimension of conscientiousness were more likely to complete HITs later in the day. AKR also found that participants who scored higher on the big-5 personality dimension of neuroticism were more likely to complete HITs later in the day.
AKR also examined a personality dimension not considered by CCLBS: intuitive versus reflective cognitive style, measured with the “Cognitive Reflection Test,” a set of math problems with intuitively compelling but incorrect answers (Frederick, 2005). Cognitive style has been linked to a wide range of behaviors and beliefs, with more deliberative people (who score better on this test) being, for example, less impatient, less religious, less inclined to hold traditional moral values, and less susceptible to pseudo-profound bullshit (Pennycook et al. 2015). AKR found that participants on the weekends performed more poorly on this task, indicating that they were less deliberative (i.e., more intuitive).
CCLBS also examined additional demographic characteristics and found that workers recruited later in the day were more likely to complete the survey using cellphones and to be of Asian American ancestry. Moreover, they found that participants were more likely to be Asian on Wednesdays and more likely to have a full-time job on Sundays, and that single participants were more likely to complete the survey later in the day.
Finally, AKR examined a range of incentivized economic behaviors: various measures of prosociality (Prisoner’s Dilemma, Dictator Game, a charitable giving decision, and an honesty task where participants could lie to earn more money), third party punishment, and intertemporal choice. They found no significant day or time differences in behavior, except that participants late at night donated more money to charity (and took longer to complete the study). They also found that participants on the weekends failed comprehension questions more often.
Early responders are different from late responders.
Both studies found that workers who participated earlier in the data collection process were substantially more experienced, and tended to score higher on the big-5 personality dimension of agreeableness.
CCLBS also found that participants earlier in the data collection process tend to be older, more likely to have a full-time job, more likely to be Asian American, less neurotic, more conscientious, and more likely to be male. AKR, in turn, found that early participants are more likely to answer comprehension questions correctly, to be more deliberative (i.e., to perform better on the Cognitive Reflection Test), to give less to charity, and to take less time completing the study.
These findings have several implications for researchers using MTurk:
First, differences across day and time are most crucial for researchers trying to make point estimates about the worker population (e.g., to understand the average number of completed experiments, opinions about piece work employment or online labor market dynamics). Studies on these topics need a sampling strategy that appropriately weights workers who are online at different times and who have different levels of experience.
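One simple way to implement such weighting is post-stratification: average the measure within each recruitment stratum (for example, time-of-day slots), then combine the stratum means using each stratum's share of the worker population. The sketch below is only an illustration of that idea, not a method used by AKR or CCLBS. The function name, the stratum labels, and all numbers are hypothetical placeholders; in practice the population shares would have to come from an external estimate of who is online when.

```python
def poststratified_mean(samples, population_share):
    """Combine per-stratum sample means into one population estimate.

    samples          : dict mapping stratum label -> list of observed values
    population_share : dict mapping stratum label -> assumed share of the
                       worker population in that stratum (shares sum to 1)
    """
    total_share = sum(population_share.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")

    estimate = 0.0
    for stratum, share in population_share.items():
        values = samples.get(stratum)
        if not values:
            # A stratum with no respondents cannot be reweighted.
            raise ValueError(f"no observations for stratum {stratum!r}")
        stratum_mean = sum(values) / len(values)
        estimate += share * stratum_mean
    return estimate


# Hypothetical placeholder data: a self-reported count per respondent,
# grouped by when the HIT was completed. Not taken from either paper.
toy_samples = {
    "weekday_morning": [10, 12],
    "weekday_evening": [4, 6],
    "weekend": [2],
}
# Hypothetical population shares (an assumption, not an estimate).
toy_shares = {"weekday_morning": 0.5, "weekday_evening": 0.3, "weekend": 0.2}

print(poststratified_mean(toy_samples, toy_shares))
```

The same structure extends to strata defined jointly by time and prior experience, provided each stratum contains at least some respondents.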
Second, there is variation in the impact of study launch time on participant characteristics. Paradigms in which the reported differences are likely to matter should be posted at carefully selected times. For example, studies that rely on workers using a computer (e.g., reaction time studies) are more efficient to run earlier in the day, and studies of charitable donations may benefit from the larger variance observed in evening populations. Also, complex paradigms for which the risk of non-comprehension is high may benefit from more highly experienced, conscientious, and/or reflective participants, who are more likely to participate earlier in the day and on weekdays. However, there are paradigms for which the timing of the study launch does not seem to have much impact, such as behavior in the simple economic games and decisions examined by AKR. Given the observed variation in participant characteristics, we recommend reporting when (time of day and days of week) data is collected from MTurk in the study methods section as a best practice.
Finally, timing of participation during a study (serial position) matters. Early responders are more experienced at completing surveys and may also be more diligent (as reflected by differences in the CRT and comprehension checks in AKR). Crucially, differences impact not only early and late responders in large studies, but also participants across sequences of studies that exclude workers who have completed previous studies in the sequence. These changes may influence the potential to replicate earlier studies in the sequence: as measurement error increases (because participants don’t understand the question, don’t think about it carefully or are less interested in helping the researcher) so too will the necessary sample size. Replication will become more difficult if researchers do not account for this change.
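The link between measurement error and sample size can be made concrete with a rough calculation. The sketch below assumes a simple two-group comparison of means, a normal approximation to the power calculation, and the classical attenuation result that a standardized effect shrinks by the square root of the outcome measure's reliability. None of these assumptions, the function name, or the input values come from AKR or CCLBS; they are illustrative only.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(true_d, reliability, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect an effect.

    true_d      : standardized effect size with a perfectly reliable measure
    reliability : share of outcome variance that is not measurement error
                  (1.0 means no measurement error)
    Uses the normal approximation n = 2 * (z_alpha + z_power)^2 / d^2.
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")

    # Measurement error inflates the outcome's standard deviation, so the
    # observed standardized effect shrinks by sqrt(reliability).
    observed_d = true_d * reliability ** 0.5

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / observed_d ** 2)


# Hypothetical inputs: the same true effect measured with decreasing care.
for reliability in (1.0, 0.7, 0.5):
    print(reliability, n_per_group(true_d=0.5, reliability=reliability))
```

Under these assumptions the required sample size scales with one over the reliability, so a later study in a sequence that draws noisier responses needs proportionally more participants to keep the same power.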