As the closing to this series, this post will concentrate on how to use web-mined event information as variables in modeling and/or decisioning. For simplicity’s sake, I will break it down by sub-topic.
Introducing Event Data into Predictive Models
Event data can easily be introduced as predictor (independent) variables within pre-existing risk and marketing models in two basic modes. The first adds them as later-stage variables: the pre-existing model variables are entered on a forced basis (which replicates the current state of the model), and the event variables are subsequently allowed to enter. This process ensures that the event variables are evaluated for their incremental contribution to the model, and do not displace any pre-existing model variables. In contrast, the alternative mode starts the model development from scratch, so pre-existing model variables might be replaced by the event variables. This approach may render the current model obsolete, but it can yield a more optimized set of factors.
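To make the two modes concrete, here is a minimal sketch of a greedy forward selection in Python. Everything in it is an assumption made for illustration: the variable names, the SIGNAL table and toy_score are invented, and in practice the scoring function would be whatever fit statistic your modeling process already uses.

```python
def forward_select(candidates, score_fn, forced=(), min_gain=0.0):
    """Greedy forward selection.

    forced:     variables entered unconditionally before selection starts
    candidates: variables allowed to enter only if they improve the score
    score_fn:   maps a tuple of variable names to a fit score (higher is better)
    """
    selected = list(forced)
    remaining = [v for v in candidates if v not in selected]
    best = score_fn(tuple(selected))
    while remaining:
        # Incremental gain of each remaining variable over the current set.
        gains = {v: score_fn(tuple(selected + [v])) - best for v in remaining}
        winner = max(gains, key=gains.get)
        if gains[winner] <= min_gain:
            break
        selected.append(winner)
        remaining.remove(winner)
        best += gains[winner]
    return selected


# Hypothetical toy data: each variable "covers" some units of signal, and the
# score is the number of distinct units covered. This is a stand-in for a real
# fit statistic, not a real one.
SIGNAL = {
    "payment_history": {1, 2, 3},
    "years_in_business": {4},
    "lawsuits_last_12m": {3, 4, 5},
    "merger_last_6m": {6},
}


def toy_score(variables):
    return len(set().union(*(SIGNAL[v] for v in variables)))


existing = ["payment_history", "years_in_business"]
event_vars = ["lawsuits_last_12m", "merger_last_6m"]

# Mode 1: existing variables are forced in, event variables enter afterwards.
incremental = forward_select(event_vars, toy_score, forced=existing)

# Mode 2: start from scratch, every variable competes for entry.
from_scratch = forward_select(existing + event_vars, toy_score)
```

In this toy setup the first mode keeps both existing variables and adds the two event variables, while the second mode drops years_in_business because the lawsuit variable already covers the same signal.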
The event input data should be coded into event types as well as time periods, for example the number of litigation occurrences in the last 3, 6, 12, 18, and 24+ months. As a simple example, I’ve found very high correlation between the number of lawsuits from different parties and payment delinquency. Sometimes the source and amount of an event are desirable as well, but from a practical perspective they create significant complexity (a single litigation event might now be exploded into many different combinations of source and amount, each of which needs to be individually tested).
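As a rough illustration of this coding step, the Python sketch below turns a list of dated events into counts per event type and time window. The feature names, the sample events and the reading of "24+" as "older than 24 months" are my assumptions for the example, not a fixed scheme.

```python
from collections import Counter
from datetime import date

WINDOWS_MONTHS = (3, 6, 12, 18, 24)


def months_elapsed(event_date, as_of):
    """Whole months between event_date and as_of."""
    months = (as_of.year - event_date.year) * 12 + (as_of.month - event_date.month)
    if as_of.day < event_date.day:
        months -= 1
    return months


def code_events(events, as_of):
    """Turn (event_type, event_date) pairs into a dict of feature name -> count."""
    features = Counter()
    for event_type, event_date in events:
        age = months_elapsed(event_date, as_of)
        if age < 0:
            continue  # ignore events dated after the scoring date
        # Cumulative windows: an event 1 month old counts in every window.
        for window in WINDOWS_MONTHS:
            if age < window:
                features[f"{event_type}_last_{window}m"] += 1
        if age >= WINDOWS_MONTHS[-1]:
            features[f"{event_type}_over_24m"] += 1
    return dict(features)


# Made-up sample events for a single company.
events = [
    ("litigation", date(2011, 5, 20)),
    ("litigation", date(2010, 11, 2)),
    ("litigation", date(2008, 1, 15)),
    ("merger", date(2011, 1, 10)),
]
features = code_events(events, as_of=date(2011, 6, 30))
```

Each resulting key, such as litigation_last_12m, is one candidate model variable. Adding source or amount to the key would multiply the number of variables in the same way, which is the complexity noted above.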
Treating Events as Triggers
Sometimes, events are hugely significant in their impact on risk and/or reward. An obvious example is M&A, which (believe it or not) is a variable ignored in risk models. The effects of these events cannot be easily quantified in models, and so they are best treated as “triggers” or decisioning input (for subsequent manual review and intervention).
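A trigger of this kind can sit outside the model as a simple decision rule. The sketch below is only one way to wire it up: the event type labels, the score cutoff and the decision names are all hypothetical.

```python
# Hypothetical labels for event types that should bypass the score.
TRIGGER_EVENT_TYPES = {"merger", "acquisition"}


def decide(score, recent_event_types, cutoff=600):
    """Return (decision, triggering_events).

    A trigger event overrides the model score and routes the account
    to manual review; otherwise the score cutoff decides.
    """
    triggers = sorted(TRIGGER_EVENT_TYPES & set(recent_event_types))
    if triggers:
        return "manual_review", triggers
    return ("approve" if score >= cutoff else "decline"), []


with_trigger = decide(720, ["litigation", "merger"])
without_trigger = decide(720, ["litigation"])
```

Note that the account with the merger goes to review even though its score clears the cutoff, which is the point of treating the event as a trigger instead of a model variable.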
Sentiment Analysis
Sentiment analysis of companies is one of the more interesting qualitative pieces of data that has recently become available due to advances in web mining. Briefly stated, sentiment measures the positive or negative “buzz” about a company. The firms that utilize product sentiment analysis use varying sources and methods to produce “sentiment scores”. At a minimum, sentiment analysis can be used to corroborate certain decisions, and it may have predictive ability as well. Like events, sentiment scores can be used as time-based model variables, or as external triggers.
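Both uses can be derived from the same series of scores. The sketch below assumes a list of periodic sentiment scores on a -1 to 1 scale, oldest first. The scale, the window length and the drop threshold are assumptions for illustration, since vendors produce their scores differently.

```python
def sentiment_features(scores, window=4, drop_threshold=0.5):
    """Compare the latest window of scores with the window before it."""
    if len(scores) < 2 * window:
        raise ValueError("need at least two full windows of scores")
    recent_avg = sum(scores[-window:]) / window
    prior_avg = sum(scores[-2 * window:-window]) / window
    change = recent_avg - prior_avg
    return {
        "sentiment_recent_avg": recent_avg,            # time-based model variable
        "sentiment_change": change,                    # time-based model variable
        "sentiment_trigger": change <= -drop_threshold,  # external trigger
    }


# Made-up scores for one company, oldest first.
scores = [0.5, 0.5, 0.25, 0.75, -0.25, -0.25, 0.0, -0.5]
result = sentiment_features(scores)
```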
To conclude, I would like to stress that there should be no doubt that select business events affect the risk and opportunity value of a company. Event data, and its accompanying sentiment, is available on a near-real-time basis on the Internet. Semantic analysis companies (such as Digital Trowel) have created a process that mines this data and presents it in coded form, which can be made available to scoring and decision models, as well as human monitors. Virtually any company relying on risk and/or potential models can incorporate this powerful information to enhance their accuracy, by employing it either as internal variables or as external decision factors.
Please contact me with any questions or comments. I can be reached by commenting on the blog, or via email at Steve (at) digitaltrowel.com
Check back soon for more in-depth exploration of the growing text-mining phenomenon.
- Steve