The Promises and Perils of Peer Production

Psychological studies are often criticized for their use of undergraduate students as research subjects. As the argument goes, it is difficult to generalize findings based primarily on the attitudes of college students at a small number of universities in the United States. And there is certainly truth to such criticism. But there’s also another truth the critics often ignore: finding a large number of randomly selected research subjects from diverse demographic backgrounds is rarely feasible, in terms of either cost or time.

But there’s potentially a new method of quickly choosing research subjects at minimal cost. Several websites allow both private businesses and academic researchers to hire a large number of workers to complete short, simple tasks. The workers are independent contractors and the pay is low, often five to twenty cents for five to ten minutes of work; the result is that individual workers who each provide a small amount of labor can collectively complete a large task for a single employer. It’s called peer production, and the largest service providing this labor is Amazon’s Mechanical Turk. It’s potentially a system that allows researchers to quickly and cheaply recruit thousands of research subjects from around the world: the famous “college sophomore problem” may have finally found a solution.
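
To make the mechanics concrete, here is a minimal sketch of how a requester might post such a task through Amazon’s MTurk API using the boto3 Python library. The survey URL, reward, and task details are illustrative assumptions, not taken from any particular study.

```python
import boto3

# Connect to the MTurk sandbox so no real money changes hands while testing.
# (Drop endpoint_url to post to the live marketplace.)
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# MTurk displays an externally hosted survey in a frame; the URL is hypothetical.
question_xml = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.edu/survey</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>
"""

hit = mturk.create_hit(
    Title="Short academic survey (5-10 minutes)",
    Description="Answer a brief questionnaire for a research study.",
    Keywords="survey, research, questionnaire",
    Reward="0.15",                     # 15 cents, within the typical 5-20 cent range
    MaxAssignments=500,                # number of distinct workers to recruit
    LifetimeInSeconds=7 * 24 * 3600,   # how long the task stays listed
    AssignmentDurationInSeconds=30 * 60,
    Question=question_xml,
)
print("HIT ID:", hit["HIT"]["HITId"])
```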

Mechanical Turk, however, creates its own problems for researchers. A survey of Mechanical Turk users by New York University professor Panos Ipeirotis found that approximately 50% of the site’s workers come from the United States; the other major source is India, which accounts for roughly 40%. Within the United States, the average Mechanical Turk user is a young, female worker who holds a bachelor’s degree and has an income below the U.S. household median. This does not reflect the demographic makeup of the United States, so there may still be problems of generalizability and accuracy. However, this may not be as large a problem as it first appears: several studies provide evidence that researchers can limit the population of Mechanical Turk users they choose from and adjust their results, making findings more generalizable than those of traditional undergraduate studies (see, e.g., Mason and Suri 2012).
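
Mechanical Turk does give researchers some control here: requesters can attach qualification requirements that limit who can see and accept a task. The sketch below, with illustrative threshold values, restricts a study to U.S.-based workers whose past work has a high approval rate; it plugs into the same create_hit call shown above.

```python
# Qualification requirements restrict who can discover and accept the task.
# The QualificationTypeIds below are MTurk's built-in system qualifications.
us_experienced_workers = [
    {
        "QualificationTypeId": "00000000000000000071",  # worker locale
        "Comparator": "EqualTo",
        "LocaleValues": [{"Country": "US"}],
        "ActionsGuarded": "DiscoverPreviewAndAccept",
    },
    {
        "QualificationTypeId": "000000000000000000L0",  # percent of past work approved
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [95],                          # illustrative threshold
        "ActionsGuarded": "DiscoverPreviewAndAccept",
    },
]

# Passed alongside the parameters shown earlier:
# mturk.create_hit(..., QualificationRequirements=us_experienced_workers)
```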

The larger risk for academics is not generalizability or accuracy but the quality of the work Turkers provide. In my own experience running a survey experiment on Mechanical Turk, most survey responses were complete and all quality control questions were answered accurately. However, a large number of surveys were completed in an extremely short time, and some responses were incoherent or appeared to involve little thought. Ipeirotis’s survey provides some clues as to why this might be the case: according to his research, 15% of Mechanical Turk users in the United States use the site as their primary source of income, and an additional 30% report that they use the site because they are unemployed or underemployed.
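
A simple first-pass screen for this problem, assuming the response data includes the accept and submit timestamps Mechanical Turk records for each assignment, is to compute completion times and flag implausibly fast responses. The data and the cutoff below are hypothetical.

```python
import pandas as pd

# Hypothetical export: one row per response, with accept/submit timestamps.
responses = pd.DataFrame({
    "worker_id": ["A1", "A2", "A3"],
    "accept_time": pd.to_datetime(
        ["2012-05-01 10:00:00", "2012-05-01 10:02:00", "2012-05-01 10:05:00"]),
    "submit_time": pd.to_datetime(
        ["2012-05-01 10:09:00", "2012-05-01 10:03:10", "2012-05-01 10:14:00"]),
})

responses["seconds_taken"] = (
    responses["submit_time"] - responses["accept_time"]
).dt.total_seconds()

# Flag anything finished far faster than the expected 5-10 minutes;
# the 75-second cutoff is an illustrative threshold, not a validated one.
responses["too_fast"] = responses["seconds_taken"] < 75
print(responses[["worker_id", "seconds_taken", "too_fast"]])
```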

If a significant portion of workers use Mechanical Turk primarily as a means of generating income, their incentive is to game the system to earn as much money as possible. The result is surveys taken quickly and without careful attention. Even quality control questions may not be enough: online communities of Mechanical Turk workers, such as Turker Nation, have developed techniques for identifying quality control questions, and skilled workers can likely answer them accurately while still breezing through the rest of the survey. Researchers interested in the quality of survey responses would do well to build further quality checks into their surveys; one potential method is to ask several specific questions about the experimental manipulation subjects were given. This would at least ensure that respondents read everything they were supposed to.
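
As a sketch of what such a check might look like, the answer key below scores a respondent’s recall of factual details from a hypothetical stimulus article; the questions, answers, and pass threshold are all illustrative.

```python
# Hypothetical manipulation-check answer key: question id -> correct answer.
ANSWER_KEY = {
    "article_topic": "healthcare",
    "author_position": "opposed",
    "statistic_cited": "30 percent",
}

def passes_manipulation_check(answers: dict, required: int = 2) -> bool:
    """Retain a response only if the worker correctly answered enough
    factual questions about the stimulus they were shown."""
    correct = sum(
        answers.get(question, "").strip().lower() == answer
        for question, answer in ANSWER_KEY.items()
    )
    return correct >= required

# Example: this respondent recalled two of the three details,
# so the response is retained.
print(passes_manipulation_check({
    "article_topic": "Healthcare",
    "author_position": "opposed",
    "statistic_cited": "25 percent",
}))
```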

Articles about Amazon’s Mechanical Turk often reference the inspiration for the service’s name. As the story goes, Amazon took the name from an 18th-century machine that was claimed to be able to beat any chess player. The machine toured the world and dumbfounded amateur and professional players alike. Years after its success, the entire contraption was revealed to be a hoax: hidden inside was a skilled chess player making every move. It is indeed an apt metaphor for the site. Though Mechanical Turk quickly delivers cheap, accurate survey responses, we can’t forget that it is ultimately real people taking the surveys. And these people have just as much of an incentive to maximize their earnings as researchers have to minimize their costs; academics must adjust their research methods accordingly.

Works cited:

Ipeirotis, Panagiotis G. “Demographics of Mechanical Turk.” New York University working paper (2010).

Mason, Winter, and Siddharth Suri. “Conducting Behavioral Research on Amazon’s Mechanical Turk.” Behavior Research Methods 44.1 (2012): 1–23.