Research Direction

The Information Analytics and Economic Impact Research Lab focuses on developing information analytics approaches and applying them to important business problems faced by enterprises.

Current research areas include:

Text Mining for Business Decision Making

Textual data found in news articles, online forums, blogs, and financial reports often contain valuable information for assessing firms’ future development and their market values. However, current information systems and technologies provide limited support for representing, extracting, and quantifying this type of information. We are interested in developing a framework for representing and extracting business-related information in textual data so that the information embedded in these documents can be used to augment empirical finance and accounting research. This research direction is promising because written documents such as newswires and research reports play an important role in disseminating information in financial markets.

One technical aspect of this research direction is the development of an automated approach for recognizing risk-related information in news articles. While this work is still in its infancy, we have developed several prototype systems that help us understand its potential. We have published initial results in IEEE Intelligent Systems and at leading IS and accounting conferences such as the International Conference on Information Systems (ICIS) and meetings of the American Accounting Association (AAA).

 

Economic Impact of Textual Data

Accounting numbers such as earnings per share are an important source of information about firm value. Previous studies of the return-earnings relation have confirmed that stock prices react to the information content of accounting numbers. However, other information sources such as financial news may also contain value-relevant information and affect investors’ reactions to earnings announcements. Studying the interaction among these information sources may help us understand how investors use information from different sources. In one of our recent studies, we quantify the news coverage and news sentiment of S&P 500 constituents in the Wall Street Journal before earnings announcements and model their interaction with the return-earnings relation. Our empirical results show a strong corroboration effect when positive news sentiment is followed by a positive earnings surprise. Moreover, negative news sentiment followed by a positive earnings surprise exhibits a surprise effect. These results suggest that, when exposed to multiple sources of information, investors are sophisticated enough to consider managers’ motivations behind accounting disclosures. Part of these results has been accepted for publication in Decision Support Systems.
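To illustrate how news variables can interact with the return-earnings relation, one common style of specification lets them shift the earnings response coefficient. The symbols below are introduced here for illustration, and this is a sketch rather than the exact model estimated in the study:

$$
CAR_{i,q} = \beta_0 + \beta_1 UE_{i,q} + \beta_2 NS_{i,q} + \beta_3 \left( UE_{i,q} \times NS_{i,q} \right) + \beta_4 NC_{i,q} + \beta_5 \left( UE_{i,q} \times NC_{i,q} \right) + \varepsilon_{i,q}
$$

Here $CAR_{i,q}$ is firm $i$'s cumulative abnormal return around its quarter-$q$ earnings announcement, $UE_{i,q}$ is unexpected earnings (the earnings surprise), and $NS_{i,q}$ and $NC_{i,q}$ are pre-announcement news sentiment and news coverage. The market's response to the surprise is then $\partial CAR_{i,q} / \partial UE_{i,q} = \beta_1 + \beta_3 NS_{i,q} + \beta_5 NC_{i,q}$, so a positive $\beta_3$ would mean that favorable prior news amplifies the reaction to earnings. Capturing asymmetric effects such as the corroboration and surprise effects described above would require splitting sentiment and surprise by sign, for example with indicator variables.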

 

Syndromic Surveillance Systems

The goal of syndromic surveillance systems is to use information technology to identify potential disease outbreaks in a timely manner. We pursue this goal by taking advantage of the free-text chief complaints entered by triage nurses. Because many diseases show similar symptoms at early stages, free-text chief complaints need to be classified into syndrome groups to facilitate subsequent analysis. We developed an ontology-based chief complaint classification approach that can handle previously unseen free-text chief complaints by leveraging a large medical terminology ontology. This research is published in the Journal of Biomedical Informatics, a leading medical informatics journal. We further extended our approach to handle Chinese chief complaints. Using statistical approaches, we developed a list of key Chinese medical terms, which is then used to map Chinese chief complaints into English. Evaluation results show that our approach outperforms approaches based on leading machine translation systems. Our findings are published in the International Journal of Medical Informatics.
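The ontology-based idea can be illustrated with a toy sketch. Everything in it is hypothetical: the mini-ontology, the synonym table, the syndrome labels, and the Chinese term list are small hand-written stand-ins for a real medical terminology ontology and a statistically derived term list, and matching is naive phrase lookup. The point it shows is that a concept with no syndrome label of its own (for example, wheezing) can still be classified by walking up is-a links to an ancestor that has one, and that Chinese complaints can reuse the same classifier after key terms are mapped to English.

```python
import re

# Hypothetical mini-ontology: concept -> parent concept (is-a links).
PARENT = {
    "wheezing": "abnormal_breathing",
    "shortness_of_breath": "abnormal_breathing",
    "abnormal_breathing": "respiratory_finding",
    "cough": "respiratory_finding",
    "diarrhea": "gastrointestinal_finding",
    "vomiting": "gastrointestinal_finding",
}

# Hypothetical synonym table: surface phrase -> ontology concept.
SYNONYMS = {
    "wheezing": "wheezing",
    "sob": "shortness_of_breath",
    "short of breath": "shortness_of_breath",
    "cough": "cough",
    "diarrhea": "diarrhea",
    "vomiting": "vomiting",
    "n/v": "vomiting",
}

# Syndrome groups are attached only to a few high-level concepts.
SYNDROME_OF = {
    "respiratory_finding": "Respiratory",
    "gastrointestinal_finding": "Gastrointestinal",
}

# Hypothetical Chinese-to-English key term list.
ZH_TO_EN = {
    "咳嗽": "cough",
    "呼吸困難": "short of breath",
    "腹瀉": "diarrhea",
    "嘔吐": "vomiting",
}


def concepts_in(complaint):
    """Return ontology concepts whose phrases appear as whole words."""
    text = complaint.lower()
    return {
        concept
        for phrase, concept in SYNONYMS.items()
        if re.search(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", text)
    }


def syndrome_for(concept):
    """Walk up is-a links until a concept with a syndrome label is found."""
    while concept is not None:
        if concept in SYNDROME_OF:
            return SYNDROME_OF[concept]
        concept = PARENT.get(concept)
    return None


def classify(complaint):
    """Map a free-text chief complaint to a sorted list of syndrome groups."""
    groups = {syndrome_for(c) for c in concepts_in(complaint)}
    groups.discard(None)
    return sorted(groups)


def translate_zh(complaint):
    """Replace known Chinese key terms with English phrases.

    Uses plain substring matching because Chinese text has no spaces
    between words.
    """
    return " ; ".join(en for zh, en in ZH_TO_EN.items() if zh in complaint)


print(classify("Wheezing and SOB x2 days"))        # ['Respiratory']
print(classify(translate_zh("咳嗽、呼吸困難")))      # ['Respiratory']
```

Attaching syndrome labels only to high-level concepts is what lets the sketch generalize: adding a new synonym that maps to any descendant of `respiratory_finding` classifies it without touching the syndrome table.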

 

Outlier Detection and Disease Outbreak Detection

Accurate and timely detection of infectious disease outbreaks provides valuable information that enables public health officials to respond to major public health threats in a timely fashion. However, disease outbreaks are often not directly observable. For surveillance systems used to detect outbreaks, noise caused by routine behavioral patterns and by special events further complicates the detection task. Most existing detection methods apply a time series filtering procedure followed by a statistical surveillance method. The performance of this “two-step” approach is hampered by the unrealistic assumption that the training data are outbreak-free. Moreover, existing approaches are sensitive to extreme values, which are common in real-world data sets. We considered the problem of identifying outbreak patterns in a syndrome count time series using Markov switching models. The disease outbreak states are modeled as hidden state variables that govern the observed time series. A jump component is introduced to absorb sporadic extreme values that might otherwise weaken the ability to detect slow-moving disease outbreaks. Our approach outperformed several state-of-the-art detection methods in terms of detection sensitivity on both simulated and real-world data. These results are published in IEEE Transactions on Knowledge and Data Engineering.
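The role of the hidden states and the jump component can be illustrated with a deliberately simplified sketch, which is not the published model. We assume a two-state model y_t = mu[S_t] + J_t + e_t, where S_t = 0 (no outbreak) or 1 (outbreak) follows a Markov chain, e_t is Gaussian noise, and the jump J_t is, with small probability, an extra Gaussian shock with large variance. Parameters are fixed rather than estimated, counts are treated as Gaussian, and there are no trend or day-of-week terms. The function names and parameters (`means`, `trans`, `sigma2`, `jump_prob`, `jump_var`, `init`) are hypothetical. The standard Hamilton filter then gives the probability of an outbreak on each day given the data so far.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def obs_density(y, mean, sigma2, jump_prob, jump_var):
    """Density of y given the state, with the jump indicator marginalized out:
    no jump (variance sigma2) or jump (variance sigma2 + jump_var)."""
    return ((1.0 - jump_prob) * normal_pdf(y, mean, sigma2)
            + jump_prob * normal_pdf(y, mean, sigma2 + jump_var))


def outbreak_probabilities(counts, means, trans, sigma2, jump_prob, jump_var,
                           init=(0.95, 0.05)):
    """Hamilton filter: return P(S_t = 1 | y_1, ..., y_t) for each t.

    trans[i][j] is P(S_t = j | S_{t-1} = i); state 1 means outbreak.
    """
    predicted = list(init)
    probs = []
    for y in counts:
        # Update step: weight each state's prior by the likelihood of y.
        joint = [predicted[s] * obs_density(y, means[s], sigma2, jump_prob, jump_var)
                 for s in range(2)]
        total = sum(joint)
        # If both likelihoods underflow to zero, keep the prediction unchanged.
        filtered = [j / total for j in joint] if total > 0 else predicted
        probs.append(filtered[1])
        # Prediction step: propagate through the transition matrix.
        predicted = [sum(filtered[i] * trans[i][j] for i in range(2))
                     for j in range(2)]
    return probs


# Illustrative series: baseline near 10 with one spike (60), then an outbreak near 20.
counts = [10, 11, 9, 10, 12, 60, 10, 9, 11, 10, 20, 21, 19, 22, 20]
params = dict(means=(10.0, 20.0), trans=[[0.95, 0.05], [0.10, 0.90]],
              sigma2=4.0, jump_var=400.0)
with_jump = outbreak_probabilities(counts, jump_prob=0.02, **params)
no_jump = outbreak_probabilities(counts, jump_prob=0.0, **params)
print([round(p, 2) for p in with_jump])
print(round(with_jump[5], 2), round(no_jump[5], 2))
```

With these assumed settings, the isolated spike on day 5 receives a low outbreak probability when the jump component is present, because a large shock is about as plausible in either state; with `jump_prob=0.0` the same day is flagged as an outbreak with probability close to 1. The sustained shift starting on day 10 is flagged in both cases.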