Uncovering the Secrets of Synergy
Well, in the previous section we mentioned in passing that our technology was based on a synergistic approach, combining syntax, semantics and pragmatics. In this final part of the survey, we’ll explain just how we do this, and why our system yields unparalleled results. In doing so we’ll do our best to abstract away from the underlying mathematics and details of the machine-learning algorithms, and instead present the linguistic principles by which our algorithms work using examples. To wrap things up, we’ll end this review with a snapshot of what Digital Trowel’s Sentiment Analysis looks like in action.
Our technological approach begins with the observation that sentiment is conveyed on three interacting levels of increasing structural complexity: the lexical, phrasal and semantic-event levels of structure. We’ll explain each in turn.
Lexical sentiment, sometimes referred to as dictionary-based sentiment, is the sentiment attributed to single isolated words. For example:
great, wonderful, terribly, worrisome, helpful, etc…
Though single words clearly carry sentiment, this is the most rudimentary and least reliable sentiment available. To see this, consider the following phrases, which use some of the above examples:
Terribly surprising comeback
Worrisome transformation for previous skeptics
Helpful in expediting the demise
It should be evident from the above phrases that the initial or “natural” sentiment associated with each isolated word has been transformed, if not negated. To avoid such “wonderful fiascoes” in deciphering the sentiment, we employ the lexical analysis of sentiment only after the text has undergone syntactic parsing. In simple terms, syntactic parsing means that sentences are analyzed to determine their grammatical structure, and that each word is assigned its corresponding Part Of Speech (POS) tag.
Consider the following example taken from Cisco’s website (where red and green indicate negative and positive sentiment, respectively):
If Cisco does not achieve the desired level of acceptances, the company will withdraw the offer and evaluate alternative ways to expand our activities in the video communications market.
To glean the lexical sentiment, the sentence is first parsed, i.e. grammatically analyzed. For starters, this allows us to determine the subject of the sentence (“Cisco”) as well as any noun phrase referring back to the subject (“the company”), both of which have been marked in bold above. Naturally, this is of critical importance to us in determining which company the sentiment is to be associated with. Secondly, once we obtain a phrasal structure of the sentence, we are able to determine how a candidate lexical entry interacts with clause-mate entries. In the example above, “desired” is typically associated with positive sentiment, but this sentiment is reversed due to the negation “does not” appearing earlier in the clause. On the other hand, in the subsequent clause the verb entries “evaluate” and “expand” maintain and even substantiate their positive sentiment, as there is nothing in the clause to alter their natural interpretation.
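The clause-level interaction described above can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the lexicon, the negator list and the comma/“and” clause splitter are hypothetical stand-ins for a real syntactic parser, and none of it reflects DT’s actual engine.

```python
import re

# Hypothetical toy lexicon: the three entries the text describes as positive.
LEXICON = {"desired": 1, "evaluate": 1, "expand": 1}
# Hypothetical list of negators that flip polarity within a clause.
NEGATORS = {"not", "no", "never"}


def split_clauses(sentence):
    """Crude stand-in for a parser: split on commas and on the word 'and'."""
    parts = re.split(r",|\band\b", sentence)
    return [part.strip() for part in parts if part.strip()]


def clause_sentiment(clause):
    """Return (word, polarity) pairs; polarity flips after a negator."""
    negated = False
    scored = []
    for token in re.findall(r"[a-z']+", clause.lower()):
        if token in NEGATORS:
            negated = True
        elif token in LEXICON:
            polarity = LEXICON[token]
            scored.append((token, -polarity if negated else polarity))
    return scored


def sentence_sentiment(sentence):
    """Score each clause independently so negation never crosses clauses."""
    return [pair
            for clause in split_clauses(sentence)
            for pair in clause_sentiment(clause)]


sentence = ("If Cisco does not achieve the desired level of acceptances, "
            "the company will withdraw the offer and evaluate alternative "
            "ways to expand our activities in the video communications market.")
print(sentence_sentiment(sentence))
```

Because the negation flag is reset for every clause, “desired” comes out negative while “evaluate” and “expand” keep their positive polarity, which mirrors the reading given for the Cisco sentence.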
Obviously, not all lexical entries are born equal. Entries may vary both in the extent to which they convey a sentiment and in their relative intra-clausal effect. For example, “excellent” conveys a stronger sentiment than “good”, whereas “great” and “superb” generally indicate the same level of positivity, but “great” is more susceptible to lexical negation (cf. “great mistake” vs. “superb mistake”). Different entries therefore receive different weights, depending on their relative sentiment strength and susceptibility to polarity transformations. In order to assign weights to these words correctly, DT uses advanced statistical models which are generated using large manually-analyzed text corpora. In addition, further factors such as conditional, speculative and counterfactual clause structures are taken into account before the final contribution of each entry is calculated.
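To make the idea of two separate weights concrete, here is a minimal Python sketch. The numbers are invented for illustration (the real weights are described above as coming from statistical models), and the linear blending rule in `contribution` is an assumption of this sketch, not the engine’s formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    strength: float        # how strongly the word conveys sentiment
    susceptibility: float  # 0..1: how readily context reverses its polarity


# Hypothetical values chosen only to mirror the examples in the text.
LEXICON = {
    "good": Entry(strength=0.5, susceptibility=0.5),
    "excellent": Entry(strength=0.9, susceptibility=0.5),
    "great": Entry(strength=0.8, susceptibility=0.9),
    "superb": Entry(strength=0.8, susceptibility=0.2),
}


def contribution(word, flipped):
    """Sentiment contributed by `word`; `flipped` marks a reversing context."""
    entry = LEXICON[word]
    if not flipped:
        return entry.strength
    # Assumed rule: susceptibility 0 keeps the polarity, 1 fully reverses it.
    return entry.strength * (1 - 2 * entry.susceptibility)


for word in ("great", "superb"):
    print(word, contribution(word, flipped=False), contribution(word, flipped=True))
```

With these made-up values, “great” and “superb” contribute equally in a neutral context, but in a reversing context such as “great mistake” the contribution of “great” turns negative while that of “superb” stays positive.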
But this is only the first and most rudimentary element of our synergistic approach. The second, more complex element is associated with the phrasal level of structure. The phrasal level of analysis assigns a sentiment value to full phrases rather than to single words. Consider the following examples:
Cisco Chief Executive John Chambers has said the firm aims to gain market share in a tech recovery.
Boosted by those moves and … following last year’s 40 percent decline
Company Struggles in Attempt to Buy Time
In the examples above, the lexical level may signal that certain entries are positive or negative, but only a real phrase-level analysis can ascertain the sentiment. It is here that we first allow semantic and pragmatic factors to interact. It is not enough to understand the meaning of each word in isolation; the meaning of the entire phrase must be deciphered, and to do so correctly, context is needed.
Take a look, for instance, at the third example above. Usually when companies buy something, it’s either a product or another company. Here, however, it is clear that an idiomatic meaning is intended (buying time, i.e. stalling).
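One simple way to picture phrase-level analysis taking precedence over word-level analysis is a phrase table that is consulted before the single-word lexicon. The Python sketch below is an assumption-laden toy: the tables, their scores and the function name `score_phrases` are all hypothetical, and a real system would rely on a parser and context rules rather than exact string matches.

```python
import re

# Hypothetical phrase table: idioms that override their component words.
PHRASES = {("buy", "time"): -0.5}
# Hypothetical single-word lexicon, used only when no phrase matches.
WORDS = {"buy": 0.25, "struggles": -0.5}
MAX_PHRASE_LEN = max(len(key) for key in PHRASES)


def score_phrases(text):
    """Return (matched units, total score), preferring the longest phrase."""
    tokens = re.findall(r"[a-z]+", text.lower())
    units, total, i = [], 0.0, 0
    while i < len(tokens):
        # Try multi-word phrases first, longest candidate first.
        for n in range(MAX_PHRASE_LEN, 1, -1):
            key = tuple(tokens[i:i + n])
            if key in PHRASES:
                units.append(" ".join(key))
                total += PHRASES[key]
                i += n
                break
        else:
            # No phrase starts here, so fall back to the word lexicon.
            if tokens[i] in WORDS:
                units.append(tokens[i])
                total += WORDS[tokens[i]]
            i += 1
    return units, total


print(score_phrases("Company Struggles in Attempt to Buy Time"))
print(score_phrases("Cisco will buy the company"))
```

In the headline, “buy time” is matched as a single unit, so the literal (and here, hypothetically positive) reading of “buy” never contributes; in the second sentence the plain word entry is used instead.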
DT’s SA takes pragmatics to a whole new level. Not only do we use carefully developed word-classes to allow our engine to utilize outside knowledge in interpreting text, but, working with a team of linguists and economists, we have also developed specialized sets of phrase-level interpretive rules, which allow the engine to identify the context of a sentence or phrase. Combine all of this with the simple pragmatic module, which is used to identify key companies by resolving anaphora, common nicknames and descriptors, and you end up with a context identifier that allows our engine to assign sentiment to even highly complicated, idiomatic or obscure phrases. Believe it or not, by allowing our semantic and pragmatic modules to collaborate, our engine is able to pick up on sarcastic, wishy-washy, and even ironic notes in the text.
This brings us to the third level of our Synergistic Sentiment Analysis, which is based on the interpretation of actual events within the text. Transcending both lexical and phrasal levels of interpretation, we have trained our engine to identify key economic events, and together with a team of experienced financial experts, we’ve created a scale of positive and negative weights for these events. Take a look at the following examples:
shares of Cisco Systems (Nasdaq: CSCO) were recently up 47 percent
Cisco expects revenue to grow 1 to 4 percent
Cisco(R) (NASDAQ: CSCO) today announced a revised recommended voluntary cash offer to acquire TANDBERG (OSLO: TAA)
All of the above are real examples of events captured by our SA engine and marked as positive. We currently have our engine trained to extract and evaluate dozens of types of events, including purchases, stock offerings, workforce changes, legal events, product launches and recalls, hiring and firing of key figures, new facilities, bankruptcy, and more.
The event level of our SA assigns the highest weights, since it combines and epitomizes all of our techniques, using syntactic, semantic and pragmatic analyses to determine the contribution of the event to the sentiment. In fact, we believe that by identifying and analyzing the key events in the text, we are emulating just what an expert would do when attempting to estimate the sentiment associated with a given text excerpt.
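As a rough illustration of event-level scoring, the Python sketch below spots the three example events with regular expressions and then combines level scores so that the event level counts the most. Everything specific here is an assumption made for the example: the patterns, the event names, the numeric weights and the weighted-average formula are hypothetical, and the actual engine is described above as relying on trained extraction rather than hand-written patterns.

```python
import re

# Hypothetical event patterns with made-up positive weights.
EVENT_PATTERNS = [
    ("stock_rise", re.compile(r"\bup \d+(?:\.\d+)? percent\b"), 1.0),
    ("growth_forecast", re.compile(r"\bexpects revenue to grow\b"), 0.75),
    ("acquisition_offer", re.compile(r"\boffer to acquire\b"), 0.5),
]

# Hypothetical level weights: the event level dominates, as in the text.
LEVEL_WEIGHTS = {"lexical": 1.0, "phrasal": 2.0, "event": 4.0}


def extract_events(text):
    """Return (event name, weight) for every pattern found in the text."""
    lowered = text.lower()
    return [(name, weight)
            for name, pattern, weight in EVENT_PATTERNS
            if pattern.search(lowered)]


def combine(level_scores):
    """Weighted average of per-level scores; absent levels are skipped."""
    total_weight = sum(LEVEL_WEIGHTS[level] for level in level_scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(LEVEL_WEIGHTS[level] * score
                   for level, score in level_scores.items())
    return weighted / total_weight


print(extract_events("shares of Cisco Systems (Nasdaq: CSCO) were recently up 47 percent"))
print(combine({"lexical": -0.5, "event": 1.0}))
```

The last call shows the intended effect: a mildly negative lexical signal is outweighed by a strongly positive event, so the combined score remains positive.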
Starting from the lexical level, which allows us to pick up on subtle tones in the text, building up to phrases which indicate attitude, and embedding all of these within a semantic-pragmatic event extractor and economic analyzer, we believe we are truly able to capture the sentiment of text very much like a human would, with incredible reliability and consistency. We may not have passed the Turing Test yet, but we’re surely on the way to improving the ability of machines to “understand” the natural language that humans use!
Well, for now that’s all we can show without divulging too much.
The next time someone asks you what Turing’s Test has to do with the stock market, I hope you’ll know where to refer them!
Stay tuned for our official product release, and meanwhile, as they say in Boston: Have a good one!