Abstract: Unstructured multimedia data (text and audio) provides unprecedented opportunities to support actionable decision-making in the financial industry, in areas such as portfolio and risk management. However, due to formidable methodological challenges, the promise of business value from unstructured multimedia data has not materialized. In this study, we use a design science approach to develop DeepVoice, a novel nonverbal predictive analysis system for financial risk prediction, in the setting of quarterly earnings conference calls. DeepVoice forecasts financial risk by leveraging not only what managers say (verbal linguistic cues) but also how managers say it (vocal cues) during the earnings conference calls. The design of DeepVoice addresses several challenges associated with the analysis of nonverbal communication. We also propose a two-stage deep learning model to effectively integrate managers' sequential vocal and verbal cues. Using a unique dataset of 6,047 earnings call samples (audio recordings and textual transcripts) of S&P 500 firms across four years, we show that DeepVoice yields remarkably lower risk forecast errors than those achieved by previous efforts. The improvement can also translate into nontrivial economic gains in options trading. The theoretical and practical implications of analyzing vocal cues are discussed.

Challenge 1: Time variation. Managers' vocal features can vary substantially during conference calls. For example, during the presentation section, when the manager is reading a forward-looking statement, the voice pitch may be quite stable. However, individuals' vocal cues may change as a result of being nervous, uncertain, or excited (Scherer et al., 1991). As such, in the question-answer section, when the manager is being scrutinized by analysts, the voice pitch may change. Mayew et al. (2020) found that the voice pitch of managers during dialogues with analysts is lower by about 4 Hz, on average, compared to their voice pitch during presentation sessions. Therefore, the time-varying patterns of managers' vocal cues may contain useful information. However, prior studies have ignored the time-varying pattern of managers' vocal cues in earnings conference calls (Mayew et al., 2020), and it is unclear how a predictive model could effectively summarize time-varying vocal sequences.

Challenge 2: Vocal-verbal integration. According to nonverbal communication theory (Mehrabian, 1972), vocal cues play a key role in nonverbal communication, as they can either affirm or discredit a message. For example, if a manager claims the company has strong future growth but uses a pessimistic tone of voice, market participants listening to the conference call may spot the irregularity and interpret the verbal message differently. This implies that to accurately gauge the information contained in managers' communication, one should integrate verbal language with the accompanying vocal cues. However, existing computational models (Poria et al., 2017) are limited in terms of integrating these two different communication channels (i.e., vocal and verbal).

Challenge 3: Vocal complexity. Identifying the specific acoustic constructs from voice is a complex process, and the literature is still evolving without any consensus on an appropriate model for measuring affective states, emotions, and trust (Mayew & Venkatachalam, 2013). There is also little agreement about how a speaker's vocal cues should be combined to identify emotional states (Schuller, 2010), e.g., which vocal cues to investigate and measure and how much weight to place on each vocal cue. As a result, attempts to develop constructs from vocal cues may suffer from high measurement errors.
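To make the time-variation point in Challenge 1 concrete, the following minimal sketch compares a single call-level pitch average with section-level averages. The pitch values are made up for illustration and are not drawn from the study's dataset; the field names `section` and `pitch_hz` and both helper functions are hypothetical. It is not the DeepVoice pipeline, only an illustration of why one aggregate number can hide a shift between the presentation and question-answer sections.

```python
from statistics import mean

# Hypothetical per-sentence mean pitch (Hz) for one earnings call.
# These numbers are invented for illustration only.
sentences = [
    {"section": "presentation", "pitch_hz": 122.0},
    {"section": "presentation", "pitch_hz": 121.0},
    {"section": "presentation", "pitch_hz": 123.0},
    {"section": "qa", "pitch_hz": 119.0},
    {"section": "qa", "pitch_hz": 116.0},
    {"section": "qa", "pitch_hz": 119.0},
]


def call_level_mean(rows):
    """Collapse the whole call into one average pitch value."""
    return mean(row["pitch_hz"] for row in rows)


def section_means(rows):
    """Average pitch separately for each section of the call."""
    groups = {}
    for row in rows:
        groups.setdefault(row["section"], []).append(row["pitch_hz"])
    return {section: mean(values) for section, values in groups.items()}


overall = call_level_mean(sentences)
by_section = section_means(sentences)
# Positive gap means pitch is lower in the question-answer section.
gap = by_section["presentation"] - by_section["qa"]

print(f"call-level mean: {overall:.1f} Hz")
print(f"section means:   {by_section}")
print(f"presentation minus Q&A: {gap:.1f} Hz")
```

The call-level mean alone says nothing about when the pitch changed, whereas the per-section (or, more finely, per-sentence) view keeps that ordering available to a downstream sequence model.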
Semantic filters:
communication theory, artificial neural network
Topics:
database system, enterprise information system, website, accounting, social media
Methods:
deep learning, experiment, meta design, machine learning, design artifact
Theories:
transaction cost economics, communication theory