In the past, some compelling theories were simply impossible to prove: the necessary data and evidence were too vast to gather, and the computational analysis too time-consuming. In this era of big data, however, what once could only be hypothesized can now be demonstrated. A new study finds that human languages exhibit a clear positive bias across cultures far and wide: people use more positive words than negative ones.

“We tried to cover as many major languages as possible, spread out around the world, and diverse in culture,” Dr. Peter Sheridan Dodds, a mathematician at the University of Vermont and co-lead of the study, wrote in an email to Medical Daily. Dr. Chris Danforth, also a mathematician and co-lead of the study, added, “We're presently measuring happiness for U.S. states and cities, and planning to include several other countries now that we have this new and exciting data set for many languages.”

In 1969, Drs. Jerry Boucher and Charles E. Osgood, psychologists at the University of Illinois, began an investigation into how people communicate and found “a universal human tendency” to use positive words more frequently than negative words. In the years since, other researchers have used Boucher and Osgood’s Pollyanna Hypothesis as a springboard for similar theories. In 1978, researchers Margaret Matlin and David Stang, also from the University of Illinois, extended their colleagues’ theories with work suggesting that people take longer to recognize what is unpleasant or threatening than what is pleasant and safe. The younger duo also found that most of us are guilty of “selective memory,” recalling the past as rosier than it actually was. Even if our minds tend to focus on the negative at the conscious level, these psychologists said, at the unconscious level we concentrate on the optimistic and positive.

In short, we humans have a bias toward happiness.

Bring in the Big Guns

Though these theories have been debated long and heatedly, psychologists never deemed them conclusive because the underlying studies were small and the evidence limited. Well, time passed, technology advanced, and the era of big data ushered in new opportunities to revisit old theories. Danforth and Dodds wondered: Does the Pollyanna Hypothesis truly hold water?

To begin their investigation, they set their research team the task of gathering billions of words in 10 languages: English, Spanish, French, German, Brazilian Portuguese, Korean, Chinese (simplified), Russian, Indonesian, and Arabic. The team drew on 24 sources in total, from books to websites to music lyrics. From Twitter alone, they collected roughly one hundred billion words.

“We used Twitter because (1) we have 10 percent of all tweets streaming to our research group; (2) it's open (in contrast to Facebook); and (3) social media is an important medium of expression and contrasts strongly with our other corpora, such as the Google Books data set,” Dodds told Medical Daily.

After gathering billions of words from Arabic movie subtitles, Korean Twitter feeds, Russian novels, Chinese websites, and English newspapers, the team faced the next difficult step: analysis. First, the team had to identify, from their many sources, roughly 10,000 of the most frequently used words in each of the 10 languages. Then, the team “contracted a translation service who in turn employed around 2000 native speakers of the 10 languages around the globe to assess words,” Dodds told Medical Daily. The native speakers rated these frequently used words on a nine-point emotional scale, ranging from a deeply frowning face to a broadly smiling one.

After collecting five million individual scores, the researchers averaged the scores for each word. (Thank you, computer programs.) In English, for example, “laughter” averaged a score of 8.50, “food” 7.44, “truck” 5.48, “greed” 3.06, and “terrorist” 1.30. In every language, neutral words (like “the”) scored near the middle of the scale, as we might expect.
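The averaging step itself is simple arithmetic. As a minimal sketch, assuming each rating arrives as a (word, score) pair on the one-to-nine scale, the Python below groups the scores by word and takes their mean. The function name `average_ratings` and the sample ratings are hypothetical illustrations, not the study's code or data.

```python
from collections import defaultdict


def average_ratings(ratings):
    """Return each word's mean score from (word, score) rating pairs.

    Scores are assumed to lie on the 1-9 scale described in the article
    (1 = deeply frowning face, 9 = broadly smiling face).
    """
    scores_by_word = defaultdict(list)
    for word, score in ratings:
        if not 1 <= score <= 9:
            raise ValueError(f"score out of range for {word!r}: {score}")
        scores_by_word[word].append(score)
    # Mean of all the scores collected for each word.
    return {word: sum(s) / len(s) for word, s in scores_by_word.items()}


# Hypothetical ratings from three raters (made up, not real study data).
sample = [
    ("laughter", 9), ("laughter", 8), ("laughter", 8),
    ("the", 5), ("the", 5), ("the", 6),
    ("greed", 3), ("greed", 2), ("greed", 4),
]
print(average_ratings(sample))
```

With about 50 ratings per word, each of the roughly 10,000 words per language gets one averaged score, which is how five million individual ratings reduce to a single lexicon.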

When this extensive process was complete, the researchers ran their computations and discovered, quite happily, that the Pollyanna Hypothesis was, well, dead on (pardon the negative word choice). The authors of those many online and printed communiqués skewed time and again toward happy words. In fact, all 24 sources skewed above the neutral score of five on the one-to-nine scale, no matter the language. The degree of positivity did vary, though: a Google web crawl of Spanish-language sites had the highest average word happiness, while a corpus of Chinese books had the lowest.
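How might a whole body of text get a single happiness score to compare against the neutral five? One simple approach, assumed here for illustration (the team's exact method may differ), is to weight each rated word's average score by how often it appears, then divide by the total count of rated words. The Python sketch below does this with the English scores quoted in the article; the function name `average_happiness`, the example sentence, and the choice to skip unrated words are all assumptions.

```python
from collections import Counter

NEUTRAL = 5.0  # midpoint of the 1-9 rating scale


def average_happiness(text, lexicon):
    """Frequency-weighted mean happiness of the rated words in `text`.

    Words missing from `lexicon` are skipped (an assumption of this
    sketch). Returns None if no word in the text has a rating.
    """
    counts = Counter(text.lower().split())
    rated = {w: n for w, n in counts.items() if w in lexicon}
    total = sum(rated.values())
    if total == 0:
        return None
    return sum(lexicon[w] * n for w, n in rated.items()) / total


# English word scores quoted in the article; the sample text is made up.
lexicon = {"laughter": 8.50, "food": 7.44, "truck": 5.48,
           "greed": 3.06, "terrorist": 1.30}
text = "Laughter and food at the truck stop laughter all around"
h = average_happiness(text, lexicon)
print(f"{h:.2f}", "above neutral" if h > NEUTRAL else "at or below neutral")
```

Here only “laughter” (twice), “food”, and “truck” carry ratings, so the score is (2 × 8.50 + 7.44 + 5.48) / 4 ≈ 7.48, above neutral.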

[Graph: Distribution of Happiness (Dodds et al., PNAS)]

“The process took about six months and the whole paper two years,” Dodds told Medical Daily. As a final test, the team even translated words from one language into another and then back again. They found that the emotional content of words held consistent across languages.

Free Tools for Everyone

Naturally, having completed this work, Danforth and Dodds hope to continue their research and expand the applications of their collected data. What better way to accomplish this than to make their instruments available to one and all?

“We're designing our instruments to be of use for policy makers, countries and cities, journalists, businesses and corporations (‘how are my products being talked about?’) and, of course, interested individuals,” Dodds told Medical Daily. “Our instruments are not just for Twitter and can be used on any large enough text.”

While they continue to build panometer.org, which is intended to measure and estimate “all kinds of quantities from Twitter: health, food consumption, lack of sleep,” Dodds said, they’ve already “built a contribution to the Digital Humanities with an exploration of 10,000+ books here: hedonometer.org/books.html.” One of their long-term goals, Dodds noted, is to “measure the stories of populations.” Based on the current research, then, we can expect most of these stories to end with the proverbial Hollywood ending.

Source: Dodds PS, Clark EM, Desu S, et al. Human Language Reveals a Universal Positivity Bias. Proceedings of the National Academy of Sciences. 2015.