The Downstream Effects of Recurring Daily Exposure to Online Emotional Content: A Self-Experiment

Several months ago, I came across a Chrome browser extension developed by Lauren McCarthy, an artist and programmer based in Brooklyn, NY, called the Facebook Mood Manipulator. With a cheeky nod to the Facebook data science team’s massive-scale emotional contagion experiment (published in 2014), the extension asks you to choose how you’d like to feel (the four options are positive, emotional, aggressive, and open) and filters your feed accordingly through the text and sentiment analysis program Linguistic Inquiry and Word Count (LIWC).

It looks like this: [screenshot of the Mood Manipulator interface; source: http://lauren-mccarthy.com/moodmanipulator/]

I downloaded the extension and reactivated my Facebook account for a week to test it out for myself. The sentiment analysis component is definitely imperfect, perhaps because LIWC is less accurate with shorter bits of text, like Facebook status updates and comments, than with longer excerpts. Nonetheless, I got some interesting results. Setting the manipulator to “positive” promoted a couple of personal status updates about friends and family, but “open” did not change my feed at all. Setting it to “aggressive” once caused my whole feed to go blank and then refresh with no change in the initial distribution of posts. The “emotional” filter, however, was particularly strong: it consistently brought contentious political and social discussion threads to the top of my feed, as well as certain breaking news stories. After about a week of checking my Facebook News Feed daily on the “emotional” setting, I noticed that the effects of the “treatment” lingered after logging off. I found myself downright concerned about some of the commentary I had come across, and I initiated face-to-face conversations about the news stories I saw posted more often than I typically do. Recurring daily exposure to “emotional” posts thus seemed not only to produce some sort of downstream effect on my mood, but also to affect my social behavior offline.*

I am wary of drawing any conclusions from this self-experiment (because I knew I was being “studied,” I probably overestimated my responses to the mood manipulator), but assessing my self-report results nonetheless highlighted the utility of tracking downstream effects in social media experiments. Initial exposure to a disagreeable online political post or a contentious comment thread may trigger an immediate emotional response or encourage someone to engage with the content, for example, but recurring daily exposure to such content may also alter mood, lead people to reconsider their opinions, or affect their motivation to engage in more traditional forms of political action, such as face-to-face deliberation or protest. For political scientists and social psychologists alike, it is these downstream effects on individuals’ attitudes and behavior, rather than individuals’ initial reactions to treatments, that matter most; social science experiments should thus take particular care to capture them.

*Note: I don’t think the “emotional” filter accounts for valence — when it works correctly, it displays both “positive” and “negative” posts. I wish the “positive” and “aggressive” filters had worked properly, so I could test whether repeated exposure to valence-charged posts affected my offline behavior differently than exposure to mixed-valence posts.

The Promises and Perils of Peer Production

Psychological studies are often criticized for their use of undergraduate students as research subjects. As the argument goes, it is difficult to generalize findings that are based primarily on the attitudes of college students at a small number of universities in the United States. And there is certainly truth to such criticism. But there’s also another truth the critics often ignore: finding a large number of randomly selected research subjects from diverse demographic backgrounds is rarely feasible, in terms of either money or time.

But there’s potentially a new method of quickly recruiting research subjects at minimal cost. Several websites allow both private businesses and academic researchers to hire a large number of workers to complete short, simple tasks. The workers are independent contractors and the pay is low, often five to twenty cents for five to ten minutes of work; the result is that individual workers who each provide a small amount of labor are able to collectively complete a large task for a single employer. It’s called peer production, and the largest service providing this labor is Amazon’s Mechanical Turk. And it’s potentially a system that allows researchers to quickly and affordably recruit thousands of research subjects from around the world; the famous “college sophomore problem” may have finally found a solution.

Mechanical Turk, however, creates its own problems for researchers. A survey of Mechanical Turk users by New York University professor Panos Ipeirotis found that approximately 50% of Mechanical Turk’s workers come from the United States; the other major country is India, which accounts for roughly 40% of the site’s workers. Within the United States, the average Mechanical Turk user is a young, female worker who holds a bachelor’s degree and has an income below the U.S. household median. This does not reflect the demographic makeup of the United States, so there may still be problems of generalizability and accuracy. However, this may not be as large a problem as it first appears; several studies provide evidence that researchers can limit the population of Mechanical Turk users they draw from and adjust their results to make them more generalizable than those of traditional undergraduate studies.
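To make “adjust results” a bit more concrete: one common adjustment, though not necessarily the procedure used in the studies alluded to here, is post-stratification, where respondents are reweighted so that the sample’s demographic composition matches known population targets. Below is a minimal sketch in Python; the demographic cells, population shares, and survey responses are all invented for illustration.

```python
from collections import Counter

def poststratification_weights(respondents, population_shares):
    """Weight each respondent so the weighted sample matches the
    population's demographic composition.

    respondents: list of dicts, each with a 'cell' key (a demographic group label)
    population_shares: dict mapping cell -> that group's share of the population
    """
    sample_counts = Counter(r["cell"] for r in respondents)
    n = len(respondents)
    weights = []
    for r in respondents:
        sample_share = sample_counts[r["cell"]] / n
        # Upweight cells underrepresented in the sample, downweight overrepresented ones.
        weights.append(population_shares[r["cell"]] / sample_share)
    return weights

# Hypothetical example: young women with BAs are overrepresented in the sample.
respondents = [
    {"cell": "young_female_BA", "support": 1},
    {"cell": "young_female_BA", "support": 1},
    {"cell": "young_female_BA", "support": 0},
    {"cell": "older_male_HS", "support": 0},
]
population_shares = {"young_female_BA": 0.25, "older_male_HS": 0.75}

w = poststratification_weights(respondents, population_shares)
weighted_mean = sum(wi * r["support"] for wi, r in zip(w, respondents)) / sum(w)
print(round(weighted_mean, 3))  # far lower than the raw mean of 0.5
```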

The larger risk for academics is not generalizability or accuracy, but rather the quality of work Turkers provide. In my own experience with a survey experiment that was put on Mechanical Turk, most survey responses were complete and all quality control questions were answered accurately. However, a large number of surveys were completed in an extremely short time period and some responses were incoherent or appeared to involve little thought. Ipeirotis’ survey provides some clues as to why this might be the case: according to his research, 15% of Mechanical Turk users in the United States use the site as their primary source of income; an additional 30% of users report that they use the site because they are unemployed or underemployed.

If a significant portion of workers use Mechanical Turk primarily as a means of generating income, their incentive is to game the system to get as much money as possible. This causes surveys to be taken quickly and without careful attention. Even quality control questions may not be enough. Online communities of Mechanical Turk workers such as Turker Nation have developed techniques for identifying quality control questions, and skilled Mechanical Turk users can likely answer them accurately while still breezing through the rest of the survey. Researchers interested in the quality of survey responses would do well to build further quality checks into their surveys; one potential method would be to ask several specific questions about the experimental manipulation subjects were given. This would at least ensure that survey respondents read everything they were supposed to.
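As a rough illustration, here is a sketch of what that kind of screening might look like in Python. The field names, the 60-second cutoff, and the expected manipulation-check answers are all hypothetical; the point is simply to combine completion time with checks tied to the experimental material.

```python
# Hypothetical screening pass over survey responses exported as a list of dicts.
# Field names, the 60-second threshold, and the expected answers are assumptions
# for illustration, not part of any particular survey platform.

MIN_SECONDS = 60  # flag anything faster than a plausible reading time
EXPECTED = {
    "manip_check_1": "b",  # e.g., "What was the article you read about?"
    "manip_check_2": "d",
}

def flag_response(resp):
    """Return a list of reasons a response looks low quality (empty if none)."""
    reasons = []
    if resp["duration_seconds"] < MIN_SECONDS:
        reasons.append("too fast")
    missed = [q for q, ans in EXPECTED.items() if resp.get(q) != ans]
    if missed:
        reasons.append("failed checks: " + ", ".join(missed))
    return reasons

responses = [
    {"worker_id": "A1", "duration_seconds": 45, "manip_check_1": "b", "manip_check_2": "d"},
    {"worker_id": "A2", "duration_seconds": 300, "manip_check_1": "b", "manip_check_2": "a"},
    {"worker_id": "A3", "duration_seconds": 240, "manip_check_1": "b", "manip_check_2": "d"},
]

for r in responses:
    problems = flag_response(r)
    if problems:
        print(r["worker_id"], "->", "; ".join(problems))
```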

Articles about Amazon’s Mechanical Turk often reference the inspiration for the service’s name. As the story goes, Amazon took the name from an 18th-century machine that was claimed to be able to beat any chess player. The machine toured the world and dumbfounded amateur and professional chess players alike. Years after its success, it was revealed that the entire contraption was a hoax: inside was just a skilled chess player making all the moves. It is indeed an apt metaphor for the site. Though Mechanical Turk quickly delivers cheap, accurate survey responses, we can’t forget that it is ultimately real people taking the surveys. And these people have just as much of an incentive to maximize their earnings as researchers have to minimize their costs; academics must adjust their research methods accordingly.

Works cited:

Ipeirotis, Panagiotis G. “Demographics of Mechanical Turk.” (2010).

Mason, Winter, and Siddharth Suri. “Conducting Behavioral Research on Amazon’s Mechanical Turk.” Behavior Research Methods 44.1 (2012): 1-23.

This Could Be The Start Of Something New

LIWC is great, all hail LIWC. Except when it stops working.

The Linguistic Inquiry and Word Count (LIWC) program is a text analysis tool that provides a numerical representation of a number of dimensions of a text (positive emotion, anger, pronouns, the list goes on). The way it works is pretty simple: we start with the dictionary.

[Image: a sample of the LIWC dictionary. It really doesn’t make much sense on its own like that.]

If the word “boy” is in your text, categories 121 and 124 each get incremented by one (those category numbers correspond to “social” and “human”). The output file then reports, for each category, the percentage of the sample’s total words that matched it.
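To make that concrete, here is a minimal sketch of the counting logic in Python. The tiny dictionary is invented for illustration (the real LIWC dictionary is far larger and maps words to numeric category codes), but the percentage calculation mirrors the description above.

```python
from collections import Counter
import re

# Toy dictionary for illustration only: word -> list of category names.
DICTIONARY = {
    "boy": ["social", "human"],
    "friend": ["social", "friend"],
    "happy": ["posemo"],
}

def liwc_style_counts(text):
    """Return {category: percentage of the sample's total words that matched it}."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category in DICTIONARY.get(word, []):
            counts[category] += 1
    total = len(words)
    return {cat: 100 * n / total for cat, n in counts.items()}

print(liwc_style_counts("The boy met a happy friend"))
# roughly {'social': 33.3, 'human': 16.7, 'posemo': 16.7, 'friend': 16.7}
```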

[Image: sample LIWC output. No words in this sample matched any of the words coded as “friend” in the dictionary, because Sahil has no friends.]

The LIWC program is great at this kind of stuff, and can do a bunch of cool things, like process multiple files at a time. Except for the fact that sometimes it breaks. Specifically, when you’re dealing with really big text files.

The project I worked on last semester involved analyzing tweets through LIWC. And, boy, people sure do tweet a lot. 7.6 megabytes doesn’t sound like much; a high-quality picture will probably be larger than that. But 7.6 megabytes of text is a lot of text, in this case over a million words. The LIWC program crashes when you feed it a file that size.

Thankfully, we had the dictionary from LIWC, so I began working on our own version of LIWC, coded in Python. It didn’t seem that hard: we knew how LIWC was doing the calculations, and while we wouldn’t have any of the fancy features the commercial software provides, for our purposes it seemed good enough.
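One wrinkle a homemade version has to handle is that the LIWC dictionary mixes exact words with wildcard stems (entries like “happi*” that are meant to match happy, happier, happiness, and so on). Here is a minimal sketch of that lookup; the entries and categories are invented, not taken from the real dictionary.

```python
# A sketch of the stem matching a homemade LIWC needs.
# Exact entries and wildcard stems below are made up for illustration.
EXACT = {"boy": {"social", "human"}}
STEMS = {"happi": {"posemo"}, "friend": {"social", "friend"}}

def categories_for(word):
    """Return every category a single (lowercased) word matches."""
    cats = set(EXACT.get(word, set()))
    for stem, stem_cats in STEMS.items():
        if word.startswith(stem):
            cats |= stem_cats
    return cats

print(categories_for("happiness"))  # {'posemo'}
print(categories_for("friendly"))   # {'social', 'friend'} in some order
print(categories_for("boy"))        # {'social', 'human'} in some order
```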

What’s the progress? Well, it turns out that just counting the total number of words is a pretty complicated task. I have a version of the text analysis program working where I get numerical outputs, but they do not match what the LIWC program provides. This is probably a fine-tuning issue. Over the coming weeks, I’ll be giving LIWC and my program the same inputs and adjusting my program until the results match.
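A big part of that fine-tuning is deciding what counts as a word in the first place. LIWC’s exact tokenization rules aren’t obvious from the outside, so the sketch below just shows, with a made-up tweet-like string, how different tokenization choices change the total word count, and therefore every percentage a LIWC-style program reports.

```python
import re

tweetish = "Don't @ me - I've read 100s of posts... it's a LOT"

# Three plausible tokenization rules; each gives a different "total words",
# and every category percentage is divided by that total.
rules = {
    "split on whitespace": tweetish.split(),
    "letters only": re.findall(r"[A-Za-z]+", tweetish),
    "letters + apostrophes": re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", tweetish),
}

for name, tokens in rules.items():
    # Prints 12, 13, and 10 tokens respectively for this example string.
    print(name + ":", len(tokens), "words ->", tokens)
```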

How long will this take? Hopefully not long. Famous last words.