Evgeny Nikitin, technical director of FscoreLab, will speak at the annual “New Directions in Analyzing Text as Data” conference (TADA 2018), to be held September 21-22 in Seattle (Washington, USA). His talk will present a new methodology for measuring political ideology from posts and user comments on social networks.
RedditScore embeddings: text-based ideology estimation with Reddit data
The popularity of social media gives us the opportunity to study the dynamics of political behavior and public opinion. However, many research designs require the ability to measure the ideological content of social media posts, and to do so on a fine-grained scale. I propose a new method of text-based ideology estimation, which utilizes a large corpus of Reddit comments from politically related subreddits. The core idea of the method is to train a multiclass classifier that predicts which subreddit each comment was posted in. Vectors of predicted probabilities generated by this classifier are then used as document embeddings for any input texts, such as tweets. These embeddings (a) outperform many existing feature extraction methods (bag-of-words, Word2Vec, Doc2Vec) in supervised tasks, and (b) provide a simple way to obtain unsupervised ideology estimates. Moreover, these embeddings can be used to measure the content of documents on custom scales: not only “liberal-conservative”, but also, for example, scales such as “anti-Trump-pro-Trump” or “pro-life-pro-choice”.
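The idea of using predicted subreddit probabilities as a document embedding can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not say which classifier is used, so a small multinomial Naive Bayes model stands in for it; the training comments and subreddit labels are made-up toy data; and the helper names (`train_naive_bayes`, `embed`, `scale_score`) are hypothetical. Reading a custom scale off as a difference of two class probabilities is likewise only one plausible choice, not necessarily the method presented in the talk.

```python
import math
from collections import Counter, defaultdict

# Toy, made-up training data: (comment text, subreddit label).
# A real corpus would contain a very large number of Reddit comments.
TRAIN = [
    ("lower taxes and smaller government", "r/Conservative"),
    ("secure the border and cut taxes", "r/Conservative"),
    ("universal healthcare and higher minimum wage", "r/Liberal"),
    ("protect unions and expand healthcare", "r/Liberal"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Collect the counts needed by a multinomial Naive Bayes classifier."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return {
        "classes": sorted(class_counts),  # fixed order of embedding dimensions
        "class_counts": class_counts,
        "word_counts": word_counts,
        "vocab": vocab,
    }


def embed(model, text):
    """Return the vector of predicted subreddit probabilities for `text`."""
    tokens = [t for t in tokenize(text) if t in model["vocab"]]
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    log_scores = []
    for label in model["classes"]:
        # Log prior plus Laplace-smoothed log likelihood of each known token.
        score = math.log(model["class_counts"][label] / total_docs)
        n_tokens = sum(model["word_counts"][label].values())
        for tok in tokens:
            count = model["word_counts"][label][tok]
            score += math.log((count + 1) / (n_tokens + vocab_size))
        log_scores.append(score)
    # Softmax over log scores, shifted by the maximum for numerical stability.
    top = max(log_scores)
    exps = [math.exp(s - top) for s in log_scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def scale_score(model, text, positive, negative):
    """Assumed custom scale: P(positive subreddit) minus P(negative subreddit)."""
    probs = dict(zip(model["classes"], embed(model, text)))
    return probs[positive] - probs[negative]


model = train_naive_bayes(TRAIN)
tweet_embedding = embed(model, "cut taxes now")
tweet_score = scale_score(model, "cut taxes now", "r/Conservative", "r/Liberal")
```

In this sketch each embedding dimension corresponds to one subreddit, so the vector can be passed as features to a downstream supervised model or read directly as a rough position between two communities. With more subreddits, other scales would be obtained by choosing a different pair of classes.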