Girl Talk: How to identify gender by online speech patterns

Do patterns of online political discussion differ based on the gender of the writer? One of the keys to answering this question may be LIWC, or Linguistic Inquiry and Word Count, “a computerized text analysis program that categorizes and quantifies language use” (Kahn 263). LIWC analyzes text by recognizing words and grouping them into categories. For example, “I” and “me” are grouped into the “self-referential words” category, while verbs like “think” and “believe” are grouped into the “cognitive processes” category. These categories range in specificity from broad language descriptors like “affect” to specific emotions and topics like “sadness” and “occupation.”
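
The category counting described above can be illustrated with a minimal sketch. The word lists and the names `TOY_CATEGORIES` and `category_percentages` are hypothetical stand-ins made up for this example; the actual LIWC dictionary is far larger and is not reproduced here. The sketch assumes a simple approach in which each category is scored as the percentage of words in the text that belong to it.

```python
import re
from collections import Counter

# Hypothetical toy dictionary for illustration only; NOT the real LIWC dictionary.
TOY_CATEGORIES = {
    "self_reference": {"i", "me", "my"},
    "cognitive_processes": {"think", "believe", "know"},
}


def category_percentages(text, categories=TOY_CATEGORIES):
    """Return, for each category, the percentage of words in `text` that match it."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for name, vocab in categories.items():
            if word in vocab:
                counts[name] += 1
    total = len(words)
    # Guard against empty input so we never divide by zero.
    return {
        name: (100.0 * counts[name] / total if total else 0.0)
        for name in categories
    }


if __name__ == "__main__":
    print(category_percentages("I think you believe me"))
```

In the sample sentence, two of the five words are self-referential and two are cognitive-process verbs, so each category covers 40 percent of the text.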

LIWC will be especially useful for the Online Political Discussion Computer Science team as we begin working with our 2008 Twitter data set. We will use hashtags that co-occur with #politics to create a social network diagram of political discourse. Each node will be a tweet, connected to every other tweet with which it shares a hashtag. Overlaying LIWC data on the social network diagram will show how the language content of tweets is distributed across the network. Specifically, I hope to use LIWC to focus on the relationship between gender and online political discussion. However, the Twitter metadata does not disclose the gender of tweet authors. Instead, I will use LIWC to analyze the language patterns of tweets and infer the gender of Twitter users.
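
As a rough illustration of the network described above, the following sketch builds a graph in which each node is a tweet and two tweets are linked when they share a hashtag. The function names, the input format (a dictionary mapping tweet IDs to text), and the `ignore` parameter are assumptions made for this example, not the team's actual pipeline. The sketch also assumes that #politics itself should be skipped when linking tweets, since every tweet in the data set would otherwise be connected to every other.

```python
import re
from collections import defaultdict
from itertools import combinations


def extract_hashtags(text):
    """Return the set of lowercase hashtags (without the '#') found in a tweet."""
    return {tag.lower() for tag in re.findall(r"#(\w+)", text)}


def build_tweet_graph(tweets, ignore=frozenset({"politics"})):
    """Build an undirected graph as an adjacency dict: tweet_id -> set of tweet_ids.

    `tweets` is a hypothetical input format: a dict mapping tweet IDs to tweet text.
    Hashtags listed in `ignore` do not create links (assumption: #politics is
    shared by every tweet and would otherwise connect all of them).
    """
    # Group tweet IDs by each hashtag they contain.
    tweets_by_tag = defaultdict(set)
    for tweet_id, text in tweets.items():
        for tag in extract_hashtags(text) - set(ignore):
            tweets_by_tag[tag].add(tweet_id)

    # Link every pair of tweets that share at least one hashtag.
    graph = {tweet_id: set() for tweet_id in tweets}
    for ids in tweets_by_tag.values():
        for a, b in combinations(sorted(ids), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


if __name__ == "__main__":
    sample = {
        1: "Get out and vote #politics #election",
        2: "Watching the debate #politics #Election",
        3: "Jobs report is out #politics #economy",
    }
    print(build_tweet_graph(sample))
```

Per-tweet LIWC scores could then be attached to each node, which is one way to overlay language data on the network.
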
How do we differentiate the language patterns of males and females? This is a question that both linguists and feminists have confronted for years. Second-wave feminist writers tackled this question using the language of power and powerlessness. In “Discourse Competence: Or How to Theorize Strong Women Speakers,” Sara Mills argues that the linguistic elements that make women’s speech different from men’s speech, like expressions of uncertainty and reliance on verbal fillers, are not unique to women but are expressions of submissiveness (Mills 4). At the same time, Mills writes that women act as the facilitators of conversation. Instead of steering the course of conversation, women tend to do the “repair-work” of the conversation by asking questions and avoiding awkward silences (Mills 5). It should be noted, however, that some of the feminist writings of the 1970s are more theoretical than quantitative. In Language and Woman’s Place, a text on the linguistics of gender that was ground-breaking in the 1970s, the author admits that “the data on which she bases her claims have been gathered mainly through introspection: she examined her own speech and that of her acquaintances, and used her own intuitions in analyzing it” (Lakoff 46). Nonetheless, these theories of the linguistics of gender create a useful framework for discussing online political discourse. For example, if women truly are the “facilitators” of conversation, will female-authored tweets have higher measures of centrality? Or does the nature of online communication destroy the need for conversation facilitators, in which case one might predict the marginalization of female-authored tweets? Or does Twitter, a female-dominated social media site, represent a completely different paradigm for female speech?
While these questions make a good framework for theorizing about gender in online political discussion, the practical problem of inferring author gender from tweets remains. For that, I look to Koppel et al.’s work on automatically categorizing written work by author gender (Koppel 401-412). Koppel and his team used a comprehensive list of words and grammatical patterns to create an algorithm that predicted the gender of a text’s author with roughly 80 percent accuracy. Although Koppel did not use LIWC in his algorithm, his team’s methods will inform how I customize LIWC, which allows users to add words or expressions to its dictionaries.

Works Cited

Kahn, Jeffrey H., Renée M. Tobin, Audra E. Massey, and Jennifer A. Anderson. “Measuring Emotional Expression with the Linguistic Inquiry and Word Count.” The American Journal of Psychology 120.2 (2007): 263. Print.

Koppel, Moshe, Shlomo Argamon, and Anat Rachel Shimoni. “Automatically Categorizing Written Texts by Author Gender.” Literary and Linguistic Computing 17.4 (2002): 401-412. Print.

Lakoff, Robin Tolmach. Language and Woman’s Place. New York: Octagon Books, 1976, c1975. Print.

Mills, Sara. “Discourse Competence: Or How to Theorize Strong Women Speakers.” Hypatia 7.2 (1992): 4-17. Print.