Abstract

With the rise of social media platforms such as Twitter, people are increasingly willing to share their stressful life events online. This makes it feasible to detect stress from social media data for proactive health care. Despite its value, far too little attention has been paid to estimating the exact stress level from social media data, owing to the following challenges: 1) stressor subject identification, 2) stressor event detection, and 3) data collection and representation. To address these problems, we devise a comprehensive scheme to measure a user's stress level from his/her social media data. In particular, we first build a benchmark dataset and extract a rich set of stress-oriented features. We then propose a novel hybrid multi-task model to detect the stressor event and subject, which is capable of modeling the relatedness among stressor events as well as among stressor subjects. Finally, we look up an expert-defined stress table with the detected subject and event to estimate the stress level. Extensive experiments on real-world datasets verify the effectiveness of our scheme.
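The final lookup step can be pictured with a minimal sketch. It assumes the expert-defined table is keyed by a (subject, event) pair; the table name `STRESS_TABLE`, the function `estimate_stress`, and every score below are hypothetical placeholders for illustration, not values from the actual table.

```python
from typing import Dict, Optional, Tuple

# Hypothetical (subject, event) -> stress score table.
# The scores are placeholders chosen for illustration only; they are NOT
# taken from the expert-defined table used in this work.
STRESS_TABLE: Dict[Tuple[str, str], int] = {
    ("i", "divorce"): 70,
    ("i", "fired"): 45,
    ("friend", "illness"): 20,
}


def estimate_stress(subject: Optional[str], event: Optional[str]) -> Optional[int]:
    """Return the table score for a detected (subject, event) pair.

    Returns None when either detection is missing or the pair has no entry.
    """
    if subject is None or event is None:
        return None
    return STRESS_TABLE.get((subject.lower(), event.lower()))
```

Returning `None` for an unknown pair, rather than a default score, keeps "no stress detected" distinct from "low stress" in this sketch.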


Dataset

To construct a benchmark dataset, we first categorized the stressor events into 43 categories based on the professional life events stress scale. We then manually defined a set of keyword patterns, collected from the LIWC dictionary, for each stressor event category. Using the collected keywords as seeds, we filtered matching tweets from a Weibo dataset of one billion tweets, which was crawled from Weibo between June 2009 and December 2012 using Weibo’s open APIs. We then took the top 12 stressor event categories and invited 30 volunteers to manually label the stressor events and stressor subjects of the tweets in the filtered Weibo data. This yielded a small but reliable dataset containing nearly 2,000 tweets. We also randomly selected 600 tweets labeled as non-stress-related to serve as negative samples. The scale of the collected dataset is comparable to that of other works in related areas, e.g., personal life events detection [Li et al., 2014]. The detailed distribution of the labeled dataset is shown below.
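The seed-keyword filtering step can be sketched roughly as follows. The seed words are borrowed from the sampling words in Table 1 and are written in English for readability; the actual patterns are assumed to be richer and to target Chinese text, where word-boundary matching would need a tokenizer instead of `\b`. The names `SEED_KEYWORDS`, `match_categories`, and `filter_posts` are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, List, Pattern, Tuple

# Hypothetical seed keywords per stressor event category (illustrative subset).
SEED_KEYWORDS: Dict[str, List[str]] = {
    "Marriage": ["marry", "wedding", "bride"],
    "Illness": ["hospital", "sick", "pain"],
    "Divorce": ["divorce", "ex-wife"],
}


def compile_patterns(seeds: Dict[str, List[str]]) -> Dict[str, Pattern[str]]:
    """Build one case-insensitive whole-word regex per category."""
    return {
        category: re.compile(
            r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
            re.IGNORECASE,
        )
        for category, words in seeds.items()
    }


PATTERNS = compile_patterns(SEED_KEYWORDS)


def match_categories(text: str) -> List[str]:
    """Return every category whose seed pattern occurs in the text."""
    return [cat for cat, pattern in PATTERNS.items() if pattern.search(text)]


def filter_posts(posts: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (post, matched categories) for posts that hit at least one seed."""
    for post in posts:
        categories = match_categories(post)
        if categories:
            yield post, categories
```

A filter like this only produces candidates; in the pipeline described above, the volunteers' manual labels decide the final stressor event and subject.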


The dataset can be downloaded HERE.

Our stressor event and subject dictionaries can be downloaded HERE.



Event      Labeled   Sampling words          Event       Labeled   Sampling words
Marriage   227       marry wedding bride     Argue       107       cold war quarrel argue
Financial  114       income salary rent      Blamed      199       question blame afraid
Illness    424       hospital sick pain      Pregnancy   132       baby pregnant mother-to-be
School     171       school holiday finals   Habits      102       revise habits smoke drink
Birth      133       born life baby          Death       127       pass away R.I.P
Fired      102       fired job lose          Divorce     112       divorce ex-wife cry

Table 1: Summary of the manually labeled tweets for each stressor event category and the sampling words of the constructed stressor event dictionary.



Category   Labeled   Sampling words    Category   Labeled   Sampling words
i          647       I my our we       spouse     207       wife husband dear
family     277       mother daughter   boss       161       boss teacher tutor
friend     327       friend teammate   relative   123       aunt uncle cousin

Table 2: Summary of the manually labeled tweets for each stressor subject category and the sampling words of the constructed stressor subject dictionary.




Word2Vec Model

We trained 200-dimensional word embeddings on a Weibo dataset of one billion tweets, which was crawled from Weibo between June 2009 and December 2012 using Weibo’s open APIs. The learned model can be downloaded here.
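As a rough sketch of how such embeddings might be read and compared, the code below parses the plain-text word2vec layout (a "count dim" header followed by one "word v1 ... vdim" line per word) and computes cosine similarity. The file format of the released model is not stated on this page, so the layout is an assumption, and the function names `load_text_vectors` and `cosine` are hypothetical. The toy vectors are 3-dimensional stand-ins for the 200-dimensional ones.

```python
import io
import math
from typing import Dict, List, TextIO


def load_text_vectors(handle: TextIO) -> Dict[str, List[float]]:
    """Parse word vectors from the (assumed) plain-text word2vec layout."""
    header = handle.readline().split()
    dim = int(header[1])  # header[0] is the vocabulary size, unused here
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy stand-in for the model file; a real file would be opened with open(...).
sample = io.StringIO("2 3\nstress 1.0 0.0 0.0\nholiday 0.0 1.0 0.0\n")
vectors = load_text_vectors(sample)
similarity = cosine(vectors["stress"], vectors["holiday"])
```

If the model is instead distributed in a binary format, a dedicated library loader would be the more practical choice.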