Ph.D. thesis. June 27, 2016. Pittsburgh, PA, USA
Yanchuan Sim, School of Computer Science, Carnegie Mellon University
People write every day, whether articles, blogs, or emails, with a purpose and an audience in mind. Politicians adapt their speeches to convince their audience, news media slant stories for their market, and teenagers on social media seek social status among their peers through their posts. Hence, language is purposeful and strategic. In this thesis, we introduce a framework for text analysis that makes explicit the purposefulness of the author, and we develop methods that consider the interaction between the author, her text, and her audience's responses. We frame the authoring process as a decision-theoretic problem: the observed text is the result of an author maximizing her utility.
We explore this perspective by developing a set of novel statistical models that characterize authors' strategic behaviors through their utility functions. We apply our models to three domains (political campaigns, the scientific community, and the judiciary) and develop the tools needed to evaluate our assumptions and hypotheses. In each of these domains, our models yield better response prediction accuracy and provide an interpretable means of investigating the underlying processes. Together, they exemplify our approach to text modeling and data exploration.
Throughout this thesis, we will illustrate how our models can be used as tools for in-depth exploration of text data and hypothesis generation.
Noah Smith (chair), University of Washington
Eduard Hovy, Carnegie Mellon University
Daniel Neill, Carnegie Mellon University
Jing Jiang, Singapore Management University
Philip Resnik, University of Maryland, College Park