<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Mine Your Business</title>
	<atom:link href="http://mineyourbusiness.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://mineyourbusiness.wordpress.com</link>
	<description>The Text Mining Blog</description>
	<lastBuildDate>Wed, 08 Dec 2010 22:50:53 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='mineyourbusiness.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Mine Your Business</title>
		<link>http://mineyourbusiness.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://mineyourbusiness.wordpress.com/osd.xml" title="Mine Your Business" />
	<atom:link rel='hub' href='http://mineyourbusiness.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Matching &amp; Merging Professional and Company Data</title>
		<link>http://mineyourbusiness.wordpress.com/2010/12/08/matching-merging-professional-and-company-data/</link>
		<comments>http://mineyourbusiness.wordpress.com/2010/12/08/matching-merging-professional-and-company-data/#comments</comments>
		<pubDate>Wed, 08 Dec 2010 22:49:28 +0000</pubDate>
		<dc:creator>Digital Trowel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://mineyourbusiness.wordpress.com/?p=408</guid>
		<description><![CDATA[Digital Trowel was founded to help alleviate the information overload that inevitably is taking place thanks to the growth of the web. One part of what we do is gather company and executive information that we provide as relevant and up-to-date business information. One of our biggest challenges is determining if a company from one [...]]]></description>
				<content:encoded><![CDATA[<p>Digital Trowel was founded to help alleviate the information overload that inevitably is taking place thanks to the growth of the web. One part of what we do is gather company and executive information that we provide as relevant and up-to-date business information.</p>
<p>One of our biggest challenges is determining whether a company from one source is the same one we found in another source. Using our proprietary semantic Identity Resolution engine, we perform advanced matching to determine whether <em>this</em> company is the same as <em>that</em> company. This enables us to avoid duplicates in the database and to combine multiple information sources to create rich company and executive profiles. This process is referred to as “Identity Resolution” or “Match &amp; Merge.”</p>
<p>Our matching system is based on advanced semantic &amp; statistical models, natural language processing and machine learning. Using our massive database of web-sourced data, the system has created sets of positive and negative rules to decide whether two contacts should be matched.</p>
<p><strong>Positive Rules</strong></p>
<p>The inputs are first run through a defined set of positive rules that estimate the likelihood that the two inputs refer to the same entity. The names of the companies are examined first and, based on the likelihood that they refer to the same company, the pair is given a rating between 0 and 1.</p>
<p><em>For example: Luigi’s Pizzeria and Luigi’s Italian Restaurant will be matched and assigned a score of 1 as it is likely that the two names refer to the same company.</em></p>
<p>Next, the contact information of each company is examined to confirm a true connection between the two. This includes the physical addresses of the two entities, the phone numbers, URLs, employee information and so on. Based on the similarities across all this information, the two entities are combined into a “matched set.”</p>
<p>“Fuzzy Matching” is used to identify spelling mistakes, and an advanced semantic process analyzes the actual meaning of the content, allowing the system to take into account synonyms and similar-meaning words (such as “restaurant” and “diner”). In addition, an abbreviation process considers whether YMCA is a name in its own right or just shorthand for Young Men&#8217;s Christian Association.</p>
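<p>To make the name-matching step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the actual Identity Resolution engine: the synonym and abbreviation tables are hypothetical, and the standard library's <code>difflib.SequenceMatcher</code> stands in for whatever fuzzy-matching model the real system uses.</p>

```python
from difflib import SequenceMatcher

# Hypothetical lookup tables; a real system would learn or curate these.
SYNONYMS = {"diner": "restaurant"}
ABBREVIATIONS = {"ymca": "young men's christian association"}


def normalize(name: str) -> str:
    """Lowercase, expand known abbreviations, and map synonyms to one canonical token."""
    text = name.lower().replace("’", "'")
    text = ABBREVIATIONS.get(text, text)
    return " ".join(SYNONYMS.get(token, token) for token in text.split())


def name_score(a: str, b: str) -> float:
    """Return a similarity rating between 0 and 1 for two company names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()
```

<p>With these tables, "YMCA" and its spelled-out form normalize to the same string and score 1.0, while a misspelling such as "Luigis Restuarant" still scores close to "Luigi's Restaurant".</p>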
<p><strong>Negative Rules</strong></p>
<p>Every “matched set” is then processed through a defined set of negative rules that estimate the likelihood that the two entities are in fact different companies, even though they satisfied the positive rules. The DUNS number, stock symbol and other recognized identifiers are examined for discrepancies. If this examination finds conflicting information, the set is assigned a “problematic” status, re-examined, and either accepted as a match or dismissed as two distinct companies.</p>
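<p>A negative rule of this kind can be sketched in a few lines of Python. The field names (<code>duns</code>, <code>stock_symbol</code>) and the record layout are assumptions made for the example; only the idea of flagging conflicting identifiers comes from the description above.</p>

```python
# Identifier fields that should never differ for the same company.
# The field names are hypothetical.
HARD_IDENTIFIERS = ("duns", "stock_symbol")


def find_conflicts(a: dict, b: dict) -> list:
    """Return identifier fields present in both records with different values."""
    return [
        field
        for field in HARD_IDENTIFIERS
        if a.get(field) and b.get(field) and a[field] != b[field]
    ]


def review_matched_set(a: dict, b: dict) -> str:
    """Flag a matched set as 'problematic' when any hard identifier disagrees."""
    return "problematic" if find_conflicts(a, b) else "matched"
```

<p>A missing identifier on either side is not treated as a conflict here; only two present, differing values send the set back for re-examination.</p>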
<p><strong>Merge</strong></p>
<p>After two entities are matched, the next step is to merge them correctly into one profile. Sources are assigned priority levels based on the quality, accuracy and recency of their information, so that any conflicting data can be resolved in favor of the source with the highest priority. Priority and source quality are assigned independently per attribute, so we can use the most accurate information available on one topic regardless of the accuracy of the remaining information provided by that source.</p>
<p><em>For example: If one source has great company financial data but poor phone records, aside from the overall priority score assigned to the source as a whole, each one of these attributes receives a quality and priority ranking; the source would be considered when dealing with financial information but basically disregarded when handling phone records. In this way the most relevant and accurate information per category is utilized. </em></p>
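<p>The per-attribute priority idea can be sketched as follows. The source names, attribute names and rankings are made up for illustration; the sketch only shows how each attribute can be taken from whichever source ranks highest for that attribute.</p>

```python
# Hypothetical per-attribute priorities: higher wins, missing entries count as 0.
SOURCE_PRIORITY = {
    "source_a": {"revenue": 9, "phone": 2},
    "source_b": {"revenue": 4, "phone": 8},
}


def merge_profiles(records: dict) -> dict:
    """Merge {source: {attribute: value}} into one profile, taking each
    attribute from the source ranked highest for that attribute."""
    merged = {}
    best_rank = {}
    for source, attributes in records.items():
        for attribute, value in attributes.items():
            rank = SOURCE_PRIORITY.get(source, {}).get(attribute, 0)
            if value is not None and rank > best_rank.get(attribute, -1):
                merged[attribute] = value
                best_rank[attribute] = rank
    return merged
```

<p>Here <code>source_a</code> supplies the financial figure and <code>source_b</code> supplies the phone number, mirroring the example above.</p>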
<p><strong>Precision and Recall</strong></p>
<p>We need to strike the optimal balance between precision and recall. For our purposes, precision is the fraction of retrieved information that is correct, while recall is the fraction of the available information that is actually retrieved. We have created advanced tools that allow us to dial precision and recall up or down to find the right balance. The more “loosely” we match, the greater the opportunity to extract more information on the topic, leading to higher recall; the “tighter” the match, the less likely we are to connect the dots incorrectly, improving precision.</p>
<p>Depending on our customers’ wants and expectations, we can select the appropriate balance between precision and recall to offer them the most accurate and rich concentration of knowledge.</p>
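<p>The trade-off can be illustrated with a small Python sketch. The labelled pairs below are invented for the example, and a single match-score threshold is assumed as the "dial"; the real tooling may work differently.</p>

```python
def precision_recall(scored_pairs, threshold):
    """scored_pairs is a list of (score, is_true_match) tuples. Pairs scoring
    at or above the threshold are treated as matches."""
    predicted = [truth for score, truth in scored_pairs if score >= threshold]
    true_positives = sum(predicted)
    total_true = sum(truth for _, truth in scored_pairs)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / total_true if total_true else 1.0
    return precision, recall


# Made-up labelled pairs, for illustration only.
PAIRS = [(0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.40, False)]
```

<p>On this toy data a tight threshold of 0.85 accepts only correct matches but misses one true match, while a loose threshold of 0.5 finds every true match at the cost of one false one.</p>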
]]></content:encoded>
			<wfw:commentRss>http://mineyourbusiness.wordpress.com/2010/12/08/matching-merging-professional-and-company-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/094eab2312457433c44341b28bebc95d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Micha Y. Breakstone</media:title>
		</media:content>
	</item>
		<item>
		<title>Using Web Mined Data to Enhance the Performance of Business Risk and Opportunity Models. Part 3 of 3</title>
		<link>http://mineyourbusiness.wordpress.com/2010/08/11/using-web-mined-data-to-enhance-the-performance-of-business-risk-and-opportunity-models-part-3-of%c2%a03/</link>
		<comments>http://mineyourbusiness.wordpress.com/2010/08/11/using-web-mined-data-to-enhance-the-performance-of-business-risk-and-opportunity-models-part-3-of%c2%a03/#comments</comments>
		<pubDate>Wed, 11 Aug 2010 13:54:47 +0000</pubDate>
		<dc:creator>Digital Trowel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[risk modeling]]></category>

		<guid isPermaLink="false">http://mineyourbusiness.wordpress.com/?p=384</guid>
		<description><![CDATA[As the closing to this series, this post  will concentrate on how to use web-mined event information as variables in modeling and/or decisioning. For simplicity’s sake, I will break it down by sub-topic. Introducing Event Data into Predictive Models Event data can easily be introduced as predictor (independent) variables within pre-existing risk and marketing models [...]]]></description>
				<content:encoded><![CDATA[<p>To close this series, this post will concentrate on how to use web-mined event information as variables in modeling and/or decisioning. For simplicity’s sake, I will break it down by sub-topic.</p>
<p><strong>Introducing Event Data into Predictive Models</strong></p>
<p>Event data can easily be introduced as predictor (independent) variables within pre-existing risk and marketing models in two basic modes. The first adds them as later-stage variables: the pre-existing model variables are entered on a forced basis (which replicates the current state of the model), and the event variables are subsequently allowed to enter. This process ensures that the event variables are evaluated for their incremental contribution to the model and do not displace any pre-existing model variables. In contrast, the alternative mode starts the model development from scratch, and pre-existing model variables might be replaced by the event variables. This approach may render the current model obsolete, but it can yield a more optimized set of factors.</p>
<p>The event input data should be coded by event type as well as by time period: for example, the number of litigation occurrences in the last 3, 6, 12, 18 and 24+ months. As a simple example, I’ve found very high correlation between the number of lawsuits from different parties and payment delinquency. Sometimes source and amount are also desirable, but from a practical perspective they create significant complexity (a single litigation event might now be exploded into many different combinations of source and amount, which need to be individually tested).</p>
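<p>As a rough illustration of this coding step, the Python sketch below turns a list of event dates into trailing-window counts. The variable naming scheme and the reading of "24+" as a separate bucket for events at least 24 months old are assumptions made for the example.</p>

```python
from datetime import date

WINDOWS_MONTHS = (3, 6, 12, 18, 24)


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from earlier to later."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1
    return months


def window_counts(event_dates, as_of: date, event_type: str = "litigation") -> dict:
    """Count events falling within each trailing window, plus an older-than-24-months bucket."""
    counts = {f"{event_type}_last_{m}m": 0 for m in WINDOWS_MONTHS}
    counts[f"{event_type}_over_24m"] = 0
    for event_date in event_dates:
        age = months_between(event_date, as_of)
        if age < 0:
            continue  # ignore events dated after the scoring date
        if age >= 24:
            counts[f"{event_type}_over_24m"] += 1
        for m in WINDOWS_MONTHS:
            if age < m:
                counts[f"{event_type}_last_{m}m"] += 1
    return counts
```

<p>The windows are cumulative, so an event from last month is counted in every window from 3 to 24 months. Each resulting count can then be offered to the model as a separate candidate variable.</p>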
<p><strong>Treating Events as Triggers </strong></p>
<p>Sometimes, events are hugely significant in their impact on risk and/or reward. An obvious example is M&amp;A, which (believe it or not) is a variable ignored in risk models. The effects of these events cannot be easily quantified in models, so they are best treated as “triggers” or decisioning input (for subsequent manual review and intervention).</p>
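<p>In code, a trigger is simply a rule that sits outside the model and overrides the automated path. The sketch below is hypothetical throughout: the event label, the 0.5 score cut-off and the outcome names are placeholders, not part of any particular scoring system.</p>

```python
# Events routed to a human instead of being scored; the label is hypothetical.
TRIGGER_EVENTS = {"merger_acquisition"}


def route_account(model_score: float, recent_events: set) -> str:
    """Send an account to manual review when a trigger event is present;
    otherwise decide on the model score alone (0.5 is a placeholder cut-off)."""
    if TRIGGER_EVENTS & set(recent_events):
        return "manual_review"
    return "auto_approve" if model_score >= 0.5 else "auto_decline"
```

<p>The point of the design is that the model score is left untouched; the trigger only changes who makes the final decision.</p>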
<p><strong>Sentiment Analysis</strong></p>
<p>Sentiment Analysis of companies is one of the more interesting qualitative pieces of data that have recently become available, due to advances in web mining. Briefly stated, sentiment measures the positive or negative “buzz” about a company. The firms that utilize product sentiment analysis use varying sources and methods to produce “sentiment scores”. At a minimum, sentiment analysis can be used to corroborate certain decisions, and may have predictive ability as well. Like events, sentiment scores can be used as time-based model variables, or as external triggers.</p>
<p>To conclude, I would like to stress that select business events undoubtedly affect the risk and opportunity value of a company. Event data, and its accompanying sentiment, is available on a near-real-time basis on the Internet. Semantic analysis companies (such as Digital Trowel) have created a process that mines this data and presents it in coded form, which can be made available to scoring and decision models, as well as human monitors. Virtually any company relying on risk and/or potential models can incorporate this powerful information to enhance their accuracy, employing it either as internal variables or as external decision factors.</p>
<p>Please contact me with any questions or comments. I can be reached by commenting on the blog, or via email at Steve (at) digitaltrowel.com</p>
<p>Check back soon for more in-depth exploration of the growing text-mining phenomenon.</p>
<p>- <a title="Digital Trowel About Us - Tech Teama" href="http://digitaltrowel.com/aboutus/techteam.asp">Steve</a></p>
]]></content:encoded>
			<wfw:commentRss>http://mineyourbusiness.wordpress.com/2010/08/11/using-web-mined-data-to-enhance-the-performance-of-business-risk-and-opportunity-models-part-3-of%c2%a03/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/094eab2312457433c44341b28bebc95d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Micha Y. Breakstone</media:title>
		</media:content>
	</item>
		<item>
		<title>Using Web Mined Data to Enhance the Performance of Business Risk and Opportunity Models. Part 2 of 3</title>
		<link>http://mineyourbusiness.wordpress.com/2010/08/11/using-web-mined-data-to-enhance-the-performance-of-business-risk-and-opportunity-models-part-2%c2%a0of%c2%a03/</link>
		<comments>http://mineyourbusiness.wordpress.com/2010/08/11/using-web-mined-data-to-enhance-the-performance-of-business-risk-and-opportunity-models-part-2%c2%a0of%c2%a03/#comments</comments>
		<pubDate>Wed, 11 Aug 2010 13:54:12 +0000</pubDate>
		<dc:creator>Digital Trowel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[risk modeling]]></category>

		<guid isPermaLink="false">http://mineyourbusiness.wordpress.com/?p=382</guid>
		<description><![CDATA[I would now like to explore the concept of “Business Events,” particularly their effect on company risk. First things first, let’s define risk.  The traditional definition of risk is that a company will be unable to make the required payments on its debt obligations.  This, of course, is a narrow financial definition, and if you’re a lender [...]]]></description>
				<content:encoded><![CDATA[<p>I would now like to explore the concept of “Business Events,” particularly their effect on company risk. First things first, let’s define risk. The traditional definition of risk is that a company will be unable to make the required payments on its debt obligations. This, of course, is a narrow financial definition, and if you’re a lender that’s probably exactly what you care about. If you’re a supplier, on the other hand, you probably view risk in a larger scope; for example, is your customer having financial difficulties, and will he demand to renegotiate payment terms on a more extended basis, renegotiate pricing in a downward direction, reduce his order commitments, and so on? Furthermore, although many risk models have a 1-2 year horizon, a short-term view is also needed, and web-data can be used in that short-term (1-6 month) context.</p>
<p>Regardless of whether you’re a lender, supplier, analyst or salesperson, here are some events that affect the growth behavior of a company, for better or worse, and that can be mined from web-data in a timely manner:</p>
<ul>
<li>Litigation: When a company starts to have cash flow problems, one of its first reactions is to delay payment to some suppliers. At some point, payment delinquency moves from the “tolerant” stage to the litigation stage. But what happens if you’re modeling the risk of a company and do not have access to their AP/AR data? How do you recognize litigation without waiting for it to possibly appear in a financial report? Fortunately, there are fee-based web sources that detect and track litigation, including LexisNexis, Public Access to Court Electronic Records (PACER), and D&amp;B. Publicized litigation that has made it into the media can be obtained at little or no direct cost, and recent (2010) examples of major litigation include BP, Microsoft’s suit against Salesforce.com (patent infringement), Borg-Warner (asbestos product liability), and Chrysler (failure to pay suppliers). Of course, the above companies are large enough to withstand the litigation payouts and avoid default; but what does this do to their sales &amp; marketing budget and supplier terms? In our more expanded view of risk, these are important topics!</li>
<li>Analyst Recommendations: Analyst recommendations often, and quickly, affect a company’s stock price. Downward recommendations that cause the stock to fall place pressure on the company to compensate. A typical reaction is to cut expenses in order to boost earnings. Of course, this action does not bode well for the company’s S&amp;M efforts, or their suppliers.</li>
<li>Partnerships: Partnerships usually indicate positive growth activity, and by logical extension, lower the company risk.</li>
<li>M&amp;A: M&amp;A logically reduces the target company’s risk.  Although M&amp;A (and even its announcement) should immediately change the risk score of the target company, this is usually not the case, since the scoring models have no way of quickly recognizing the event.</li>
<li>Key employee movement: When a company hires a heavyweight senior executive, it is invariably a growth move, which should lower risk (otherwise the executive would likely not take the new position).</li>
<li>Insider trading: The purchasing of shares by insiders is often a leading indicator that they expect the stock will go up in the near future (which is itself a leading indicator that the company will expand due to its increased market cap).</li>
<li>Product introductions: A new product introduction is typically a leading indicator of growth, hype, success, and similar; these are all leading indicators that reflect a lowering of risk.</li>
<li>Product recalls (pharma): At a minimum, product recalls offer a negative distraction to sales. Sometimes, for example in the Pharma sector, recalls can have a devastating effect on sales. Sometimes, for example in the auto industry, they may have a more temporary effect. But in either case, they diminish the strength of a company.</li>
<li>Financial announcements: Financial announcements are excellent leading indicators, on the upside and downside. They appear on the Internet well before they appear in the financial statements that are used to drive typical company risk models.</li>
<li>Competitive tracking: Significant changes in competitive activity greatly affect market potential models, and could well affect risk models. Competitive monitoring becomes increasingly important in economic downturns, since supplier loyalty is overshadowed by the customer’s need for cost reductions. Generally speaking, as direct competition grows, it becomes more formidable to deal with, and competitive events (product, financial, employment, and so on) should be quantified and incorporated into both risk and marketing models.</li>
</ul>
<p>Whew! Now that we are all caught up on Business Events, check back for the third and final post of the series that will tie everything together.</p>
<p>Check back in a couple of days for <strong>Part 3:</strong> <strong>Using Web Mined Data to Enhance the Performance of Business Risk and Opportunity Models</strong></p>
<p>Please contact me with any questions or comments. I can be reached by commenting on the blog, or via email at Steve (at) digitaltrowel.com</p>
<p>Cheers!</p>
<p>- <a title="Digital Trowel About Us Tech Team" href="http://digitaltrowel.com/aboutus/techteam.asp" target="_blank">Steve</a></p>
]]></content:encoded>
			<wfw:commentRss>http://mineyourbusiness.wordpress.com/2010/08/11/using-web-mined-data-to-enhance-the-performance-of-business-risk-and-opportunity-models-part-2%c2%a0of%c2%a03/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/094eab2312457433c44341b28bebc95d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Micha Y. Breakstone</media:title>
		</media:content>
	</item>
		<item>
		<title>Using Web Mined Data to Enhance the Performance of Business Risk and Opportunity Models. Part 1 of 3</title>
		<link>http://mineyourbusiness.wordpress.com/2010/08/09/using-web-mined-data-to-enhance-the-performance-of-business-risk-and-opportunity-models-part-1-of%c2%a03/</link>
		<comments>http://mineyourbusiness.wordpress.com/2010/08/09/using-web-mined-data-to-enhance-the-performance-of-business-risk-and-opportunity-models-part-1-of%c2%a03/#comments</comments>
		<pubDate>Mon, 09 Aug 2010 13:53:47 +0000</pubDate>
		<dc:creator>Digital Trowel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[risk modeling]]></category>

		<guid isPermaLink="false">http://mineyourbusiness.wordpress.com/?p=380</guid>
		<description><![CDATA[As you may know, business risk models have not fundamentally changed over the past 40 years.  The famed Altman Z-score model, first published in 1968 by Edward Altman, is still being used as a pillar in the area of modeling bankruptcy.  Why? Well, because risk models are typically founded on basic financial information such as [...]]]></description>
				<content:encoded><![CDATA[<p>As you may know, business risk models have not fundamentally changed over the past 40 years. The famed Altman Z-score model, first published in 1968 by Edward Altman, is still being used as a pillar in the area of modeling bankruptcy. Why? Well, because risk models are typically founded on basic financial information such as working capital, total assets, retained earnings, EBIT, equity, sales, and similar financial statistics that reflect fundamental measurements of company health. Since the importance of these basic financial barometers hasn’t changed over time, the models that employ them haven’t needed to change either. It is true that improvements in risk model performance can be made by incorporating payment patterns; however, this is more suitable for internal customer scoring models, as finding enough reliable and ongoing payment data for an external risk model build and score is difficult indeed!</p>
<p>Having been in the business risk and opportunity-modeling arena for many years, I’ve come to the conclusion that the greatest weakness in business data modeling is quite simply the age of the data. There is no doubt that a downswing in EBIT spells bad news; but by the time that is recognized in a financial report, it’s very late in the game, and no modeling technique can overcome the limitations of old data. In my search for a better source of leading indicators, I naturally gravitated to the Internet. After all, the Internet offers an unparalleled rich, dynamic source of data in both quantitative (e.g. financial reports) and qualitative (e.g. sentiment) form, and many of these are inherently powerful leading indicators of both risk and opportunity.</p>
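<p>For readers who have not seen it written out, the original 1968 Z-score for publicly traded manufacturers is a weighted sum of five ratios built from exactly these financial statistics. The Python sketch below uses the published coefficients and the commonly cited cut-offs of 1.81 and 2.99; the function and parameter names are my own.</p>

```python
def altman_z(working_capital, retained_earnings, ebit, market_equity,
             sales, total_assets, total_liabilities):
    """Original Altman Z-score (1968) for publicly traded manufacturers."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def zone(z):
    """Classify a score using the commonly cited cut-offs."""
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "grey"
```

<p>Every input comes from a published financial statement or the stock price, which is why the score can only be as fresh as the latest report.</p>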
<p>Not coincidentally, statistical package developers such as SAS and SPSS have already launched applications that combine text mining and analytics.  However, for many companies, it will be preferable to gather the data as a separate process, and then integrate it into their modeling/decisioning processes. Recently, I’ve found that incorporation of web data can improve the accuracy/timeliness of risk-based decisions by as much as 20%; even larger benefits can be expected in the area of market potential analytics.</p>
<p>Stay tuned for my upcoming musings on this topic. <strong>Part 2: Using Web Mined Data to Enhance the Performance of Business Risk and Opportunity Models</strong></p>
<p>Please contact me with any questions or comments. I can be reached by commenting on the blog, or via email at Steve (at) digitaltrowel.com</p>
<p>Looking forward to an active dialogue.</p>
<p>- <a title="Digital Trowel About Us Tech Team" href="http://digitaltrowel.com/aboutus/techteam.asp" target="_blank">Steve</a></p>
]]></content:encoded>
			<wfw:commentRss>http://mineyourbusiness.wordpress.com/2010/08/09/using-web-mined-data-to-enhance-the-performance-of-business-risk-and-opportunity-models-part-1-of%c2%a03/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/094eab2312457433c44341b28bebc95d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Micha Y. Breakstone</media:title>
		</media:content>
	</item>
		<item>
		<title>We’re back!</title>
		<link>http://mineyourbusiness.wordpress.com/2010/08/09/were-back/</link>
		<comments>http://mineyourbusiness.wordpress.com/2010/08/09/were-back/#comments</comments>
		<pubDate>Mon, 09 Aug 2010 13:53:33 +0000</pubDate>
		<dc:creator>Digital Trowel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://mineyourbusiness.wordpress.com/?p=378</guid>
		<description><![CDATA[Hey Everyone, So much has been happening here at Digital Trowel in the past months, and we&#8217;ve sort of let the blog fall to the side. But no more! This blog is now a company-wide affair. You&#8217;ll be seeing regular postings from numerous members of our team, about everything from text mining and data analytics, [...]]]></description>
				<content:encoded><![CDATA[<div id="_mcePaste">Hey Everyone,</div>
<div id="_mcePaste">So much has been happening here at Digital Trowel in the past months, and we&#8217;ve sort of let the blog fall to the side. But no more!</div>
<div>This blog is now a company-wide affair. You&#8217;ll be seeing regular postings from numerous members of our team, about everything from text mining and data analytics, to new product ideas and development updates.</div>
<div>First up &#8211; Steve Gasner, our Chief Data Officer, posting about risk modeling.</div>
<div>Please comment and reply. All are welcome. For anything else, please feel free to email yoni (at) digitaltrowel.com.</div>
<div>Enjoy!</div>
]]></content:encoded>
			<wfw:commentRss>http://mineyourbusiness.wordpress.com/2010/08/09/were-back/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/094eab2312457433c44341b28bebc95d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Micha Y. Breakstone</media:title>
		</media:content>
	</item>
		<item>
		<title>Turing’s Test &amp; The Stock Market &#8211; Part 3</title>
		<link>http://mineyourbusiness.wordpress.com/2010/05/30/turing%e2%80%99s-test-the-stock-market-part-3/</link>
		<comments>http://mineyourbusiness.wordpress.com/2010/05/30/turing%e2%80%99s-test-the-stock-market-part-3/#comments</comments>
		<pubDate>Sun, 30 May 2010 12:57:37 +0000</pubDate>
		<dc:creator>Digital Trowel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://mineyourbusiness.wordpress.com/?p=356</guid>
		<description><![CDATA[Uncovering the Secrets of Synergy Well, in the previous section we mentioned in passing that our technology was based on a synergistic approach, combining syntax, semantics and pragmatics. In this final part of the survey, we&#8217;ll explain just how we do this, and why our system yields unparalleled results. In doing so we&#8217;ll do our [...]]]></description>
				<content:encoded><![CDATA[<h1 style="text-align:justify;"><em>Uncovering the Secrets of Synergy</em></h1>
<p style="text-align:justify;">Well, in the previous section we mentioned in passing that our technology was based on a synergistic approach, combining syntax, semantics and pragmatics. In this final part of the survey, we&#8217;ll explain just how we do this, and why our system yields unparalleled results. In doing so we&#8217;ll do our best to abstract away from the underlying mathematics and details of the machine-learning algorithms, and instead present the linguistic principles by which our algorithms work using examples. To wrap things up, we&#8217;ll end this review with a snapshot of what Digital Trowel&#8217;s Sentiment Analysis looks like in action.</p>
<p style="text-align:justify;">Our technological approach begins with the observation that sentiment is conveyed on three interacting levels of increasing structural complexity: the lexical, the phrasal and the semantic-event level of structure. We&#8217;ll explain each in turn.</p>
<p>Lexical sentiment, sometimes referred to as dictionary-based sentiment, is the sentiment attributed to single isolated words. For example:</p>
<p><em>great, wonderful, terribly, worrisome, helpful, etc&#8230; </em></p>
<p>Though single words clearly carry sentiment, this is the most rudimentary and <em>least reliable</em> sentiment available. To see this, consider the following phrases using the above examples:<br />
<em><br />
Great failure<br />
Wonderful fiasco<br />
Terribly surprising comeback<br />
Worrisome transformation for previous skeptics<br />
Helpful in expediting the demise</em></p>
<p>It should be evident from the above phrases that the initial or &#8220;natural&#8221; sentiment associated with each isolated word has been transformed, if not negated outright. To avoid such &#8220;wonderful fiascoes&#8221; in deciphering the sentiment, we employ the lexical analysis of sentiment only after the text has undergone syntactic parsing. Put simply, syntactic parsing means that sentences are analyzed to determine their grammatical structure and that each word is assigned its corresponding Part Of Speech (POS) tag.</p>
<p>Consider the following example taken from Cisco&#8217;s website (where<span style="color:#ff0000;"> red</span> and <span style="color:#00ff00;">green </span>indicate negative and positive sentiment, respectively):<br />
<em><br />
If <strong>Cisco</strong> </em><em><span style="color:#ff0000;">does not achieve the desired level of acceptances</span>, <strong>the company</strong> will withdraw the offer and </em><em><span style="color:#00ff00;">evaluate alternative ways to expand our activities in the video communications market.</span></em></p>
<p>To glean the lexical sentiment, the sentence is first parsed, i.e. grammatically analyzed. For starters, this allows us to determine the subject of the sentence (&#8220;Cisco&#8221;) as well as any noun phrase referring back to the subject (&#8220;the company&#8221;) &#8211; both of which have been marked in bold above. Naturally, this is critical for determining which company the sentiment should be associated with. Secondly, once we obtain a phrasal structure of the sentence we are able to determine how a candidate lexical entry interacts with clause-mate entries. In the example above, &#8220;desired&#8221; is typically associated with positive sentiment, but this sentiment is reversed due to the negation &#8220;does not&#8221; appearing earlier in the clause. In the subsequent clause, on the other hand, the verb entries &#8220;evaluate&#8221; and &#8220;expand&#8221; maintain and even reinforce their positive sentiment, as there is nothing in the clause to alter their natural interpretation.</p>
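<p>The clause-level negation described above can be sketched in a few lines of Python. This is only a toy illustration, not our production parser: the lexicon, the negator list and the function name <code>clause_sentiment</code> are all made up for the example, and the clause is assumed to have been isolated already by a real syntactic parser.</p>

```python
# Toy lexicon: word -> polarity (+1 positive, -1 negative).
# These entries are hypothetical and exist only for this illustration.
LEXICON = {"desired": 1, "evaluate": 1, "expand": 1}
NEGATORS = {"not", "never", "no"}


def clause_sentiment(clause):
    """Sum word polarities in one clause, flipping those that follow a negator."""
    score = 0
    negated = False
    for token in clause.lower().split():
        word = token.strip(".,;:!?")
        if word in NEGATORS:
            # Everything later in this clause falls under the negation.
            negated = True
        elif word in LEXICON:
            polarity = LEXICON[word]
            score += -polarity if negated else polarity
    return score


first = clause_sentiment("does not achieve the desired level of acceptances")
second = clause_sentiment("evaluate alternative ways to expand our activities")
print(first, second)
```

<p>In the first clause &#8220;desired&#8221; is flipped by the earlier &#8220;not&#8221;, while in the second clause nothing interferes with &#8220;evaluate&#8221; and &#8220;expand&#8221;.</p>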
<p>Obviously, not all lexical entries are born equal. Entries may vary both in the extent to which they convey a sentiment and in their relative intra-clausal effect. For example &#8220;excellent&#8221; conveys a stronger sentiment than &#8220;good&#8221;, whereas &#8220;great&#8221; and &#8220;superb&#8221; generally indicate the same level of positivity, but &#8220;great&#8221; is more susceptible to lexical negation (cf. &#8220;great mistake&#8221; vs. &#8220;superb mistake&#8221;). Different entries therefore receive different weights, depending on their relative sentimental strength and susceptibility to polarity-transformations. In order to correctly assign weight to these words, DT uses advanced statistical models which are generated using large manually-analyzed text corpora. In addition, further factors such as conditional, speculative and counterfactual clause structures are taken into account before the final contribution of each specific entry is calculated.</p>
<p>But this is only the first and most rudimentary element of our synergistic approach. The second, more complex element is the one associated with the phrasal level of structure. The phrasal level of analysis assigns a sentiment value to full phrases rather than to single words. Consider the following examples:</p>
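<p>To make the idea of per-entry weights a little more concrete, here is a minimal sketch in Python. It is not DT&#8217;s statistical model: the <code>Entry</code> class, the numeric values and the linear flipping formula are all assumptions invented for this illustration, chosen only so that they mirror the &#8220;great&#8221; vs. &#8220;superb&#8221; contrast above.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    strength: float        # how strongly the word signals sentiment, in [-1, 1]
    susceptibility: float  # how easily context flips its polarity, in [0, 1]


# Hypothetical values, picked by hand to mirror the examples in the text.
ENTRIES = {
    "good": Entry(0.5, 0.5),
    "excellent": Entry(0.9, 0.3),
    "great": Entry(0.8, 0.9),
    "superb": Entry(0.8, 0.2),
}


def contribution(word, transformed):
    """Return the word's contribution, given whether its context is polarity-transforming."""
    entry = ENTRIES[word]
    if not transformed:
        return entry.strength
    # Assumed rule: susceptibility 0 leaves the strength intact,
    # susceptibility 1 flips it completely.
    return entry.strength * (1.0 - 2.0 * entry.susceptibility)


for word in ("great", "superb"):
    print(word, contribution(word, False), contribution(word, True))
```

<p>Under these made-up numbers, &#8220;great&#8221; and &#8220;superb&#8221; contribute equally in a neutral context, but only &#8220;great&#8221; turns negative when it modifies something like &#8220;mistake&#8221;.</p>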
<p style="text-align:justify;"><strong>Cisco</strong> Chief Executive John Chambers has said <strong>the firm</strong> aims to <span style="color:#00ff00;">gain market share</span> in <span style="color:#00ff00;">a </span><span style="color:#00ff00;">tech recovery</span>.</p>
<p style="text-align:justify;"><span style="color:#00ff00;">Boosted by those moves</span> and  &#8230;  following last year&#8217;s <span style="color:#ff0000;">40 percent decline</span></p>
<p style="text-align:justify;">Company <span style="color:#ff0000;">Struggles in Attempt to Buy Time</span></p>
<p style="text-align:justify;">In the examples above the lexical level may signal that certain entries are positive or negative, but only a real phrase-level analysis can ascertain the sentiment. It is here that we first allow semantic and pragmatic factors to interact. It is not enough to understand the meaning of each word in isolation; the meaning of the entire phrase must be deciphered, and to do so correctly, context is needed.</p>
<p style="text-align:justify;">Take a look for instance at the third example above. Usually when companies buy something, it&#8217;s either a product or another company. Here, however, it is clear that an idiomatic meaning is intended (buying time&#8230; stalling).</p>
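<p style="text-align:justify;">One simple way to picture how an idiom can override its individual words is a phrase table that is consulted before the word-level lexicon. The Python sketch below is an assumption-laden toy, not our rule engine: the tables <code>PHRASES</code> and <code>WORDS</code>, their scores, and the longest-match-first strategy are all hypothetical.</p>

```python
# Hypothetical multi-word entries and their sentiment scores.
PHRASES = {("buy", "time"): -0.5, ("gain", "market", "share"): 1.0}
# Hypothetical single-word fallback lexicon.
WORDS = {"buy": 0.3, "gain": 0.4, "struggles": -0.7}
MAX_LEN = max(len(phrase) for phrase in PHRASES)


def phrase_aware_score(text):
    """Score text, letting known phrases take precedence over their words."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    score = 0.0
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, down to two words.
        for n in range(MAX_LEN, 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in PHRASES:
                score += PHRASES[candidate]
                i += n
                break
        else:
            # No phrase starts here, so fall back to the single word.
            score += WORDS.get(tokens[i], 0.0)
            i += 1
    return score


print(phrase_aware_score("Company Struggles in Attempt to Buy Time"))
```

<p style="text-align:justify;">Here &#8220;buy&#8221; on its own would count as mildly positive, but because &#8220;buy time&#8221; is matched as a unit, the headline ends up more negative than a word-by-word reading would suggest.</p>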
<p style="text-align:justify;">DT&#8217;s SA takes pragmatics to a whole new level. Not only do we use carefully developed word-classes to allow our engine to utilize outside knowledge in interpreting text, but, working with a team of linguists and economists, we have also developed specialized sets of phrase-level interpretive rules, which allow the engine to identify the context of a sentence or phrase. Combine all of this with the simple pragmatic module that identifies key companies by resolving anaphora, common nicknames and descriptors, and you end up with a context identifier that allows our engine to assign sentiment to even highly complicated, idiomatic or obscure phrases. Believe it or not, by allowing our semantic and pragmatic modules to collaborate, our engine is able to pick up on sarcastic, wishy-washy, and even ironic notes in the text.</p>
<p style="text-align:justify;">This brings us to the third level of our Synergistic Sentiment Analysis, which is based on the interpretation of actual events within the text. Transcending both lexical and phrasal levels of interpretation, we have trained our engine to identify key economic events, and together with a team of experienced financial experts, we&#8217;ve created a scale of positive and negative weights for these events. Take a look at the following examples:</p>
<p style="text-align:justify;"><strong><span style="color:#00ff00;">shares of Cisco Systems (Nasdaq: CSCO) were recently up 47 percent</span></strong></p>
<p style="text-align:justify;"><strong><span style="color:#00ff00;">Cisco expects revenue to grow 1 to 4 percent</span></strong></p>
<p style="text-align:justify;"><strong><span style="color:#00ff00;">Cisco(R) (NASDAQ: CSCO) today announced a revised recommended voluntary cash offer to acquire TANDBERG (OSLO: TAA)</span></strong></p>
<p style="text-align:justify;">All the above are real examples of events captured by our SA engine and marked as positive. We currently have our engine trained to extract and evaluate dozens of types of events including purchases, stock offerings, workforce changes, legal events, product launches and recalls, hiring and firing of key figures, new facilities, bankruptcy, and more.</p>
<p style="text-align:justify;">The event level of our SA assigns the highest weights since it combines and epitomizes all of our techniques, using syntactic, semantic and pragmatic analyses to determine the contribution of the event to the sentiment. In fact, we believe that by identifying and analyzing the key events in the text we are emulating just what an expert would do when attempting to estimate the sentiment associated with a given text excerpt.</p>
<p style="text-align:justify;">Starting from the lexical level, which allows us to pick up on subtle tones in the text, building up to phrases which indicate attitude, and embedding these all within a semantic-pragmatic event extractor and economic analyzer, we believe we are truly able to capture the sentiment of text very much like a human would, with incredible reliability and consistency. We may not yet have passed the Turing Test, but we&#8217;re surely on the way to improving the ability of machines to &#8220;understand&#8221; the natural language that humans use!</p>
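<p style="text-align:justify;">As a rough picture of what &#8220;the event level assigns the highest weights&#8221; could mean in practice, consider a weighted average over signals from the three levels. The Python sketch below is purely illustrative: the weights in <code>LEVEL_WEIGHTS</code> and the weighted-mean formula are assumptions made for this example, not the scale our financial experts actually built.</p>

```python
# Hypothetical relative weights for the three levels of analysis.
LEVEL_WEIGHTS = {"lexical": 1.0, "phrasal": 2.0, "event": 4.0}


def combine(signals):
    """Weighted mean of (level, score) pairs, with each score in [-1, 1]."""
    total_weight = sum(LEVEL_WEIGHTS[level] for level, _ in signals)
    if total_weight == 0:
        # No signals at all: report a neutral sentiment.
        return 0.0
    weighted_sum = sum(LEVEL_WEIGHTS[level] * score for level, score in signals)
    return weighted_sum / total_weight


# A negative word-level cue is outweighed by a positive event.
print(combine([("lexical", -1.0), ("event", 1.0)]))
```

<p style="text-align:justify;">With these made-up weights, a single positive event outweighs a negative lexical cue found in the same text.</p>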
<p style="text-align:justify;"><span style="font-family:Verdana;line-height:normal;">Well, for now that&#8217;s all we can show, without divulging too much <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </span></p>
<div style="text-align:justify;"><span style="font-family:Verdana;line-height:normal;">I sincerely hope that you now better understand Digital Trowel&#8217;s pioneering Synergistic Sentiment Analysis technology, and even more so I hope you&#8217;ve enjoyed the ride&#8230;</p>
<p>The next time someone asks you what Turing&#8217;s Test has to do with the stock market, I hope you&#8217;ll know where to refer them!</p>
<p>Stay tuned for our official product release, and meanwhile, as they say in Boston: Have a good one! <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p></span></div>
]]></content:encoded>
			<wfw:commentRss>http://mineyourbusiness.wordpress.com/2010/05/30/turing%e2%80%99s-test-the-stock-market-part-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/094eab2312457433c44341b28bebc95d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Micha Y. Breakstone</media:title>
		</media:content>
	</item>
		<item>
		<title>Turing’s Test &amp; The Stock Market &#8211; Part 2</title>
		<link>http://mineyourbusiness.wordpress.com/2010/04/06/turing%e2%80%99s-test-the-stock-market-part-2-2/</link>
		<comments>http://mineyourbusiness.wordpress.com/2010/04/06/turing%e2%80%99s-test-the-stock-market-part-2-2/#comments</comments>
		<pubDate>Tue, 06 Apr 2010 17:45:48 +0000</pubDate>
		<dc:creator>Digital Trowel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://mineyourbusiness.wordpress.com/?p=326</guid>
		<description><![CDATA[Part 2 &#8211; Synergistic Sentiment Analysis: The Space Between the Lines Welcome back! Sit down and buckle up for a magical tour of the text mining technology focusing on Sentiment Analysis (SA). Well, first things first. What is Sentiment Analysis anyway? Rephrasing the Wikipedia definition, Sentiment analysis (sometimes called opinion mining) refers to an area of Natural Language Processing [...]]]></description>
				<content:encoded><![CDATA[<h1>Part 2 &#8211; Synergistic Sentiment Analysis:</h1>
<h2><strong><em>The Space Between the Lines</em></strong></h2>
<h2 style="text-align:justify;"><span style="font-weight:normal;font-size:13px;">Welcome back! Sit down and buckle up for a magical tour of the text mining technology focusing on Sentiment Analysis (SA).</span></h2>
<p style="text-align:justify;">Well, first things first. What is Sentiment Analysis anyway? Rephrasing the <a id="nou4" title="Wikipedia definition" href="http://en.wikipedia.org/wiki/Sentiment_analysis">Wikipedia definition</a>, <strong>Sentiment analysis</strong> (sometimes called <strong>opinion mining</strong>) refers to an area of <a title="Natural language processing" href="http://en.wikipedia.org/wiki/Natural_language_processing">Natural Language Processing</a> (NLP), which aims to determine the attitude of a writer with respect to some topic. This attitude may be their judgment or evaluation, their emotional state when writing or the intended emotional communication the author wishes to convey.</p>
<p style="text-align:justify;">To keep our discussion as concrete as possible we&#8217;ll use real life examples to elucidate the different types of attitudes.  Consider the following example:</p>
<p style="text-align:justify;"><em>This year was a setup year for B&amp;N, and 2010 will see its efforts start to pay off [...]</em> <em>In 2010, B&amp;N will rack up significant sales of Nooks and e-books, as some consumers look for an Amazon alternative.</em></p>
<p style="text-align:justify;">Obviously this excerpt contains an explicit positive evaluation for Barnes and Noble for 2010, but moreover the tone is upbeat, optimistic, and even excited. A good Sentiment Analysis would pick up on this tone and report a highly positive sentiment for B&amp;N and their e-reader Nook, whereas a negative or at least an apprehensive sentiment should be reported for Amazon.</p>
<p style="text-align:justify;">The next example is even more blatant:</p>
<p style="text-align:justify;"><em>Belated Happy New Year and already what a year it&#8217;s turning out to be for eReaders! [...] Time&#8217;s a fave around here these days, especially considering its December report naming nook one of the <a rel="nofollow" href="http://www.time.com/time/specials/packages/article/0,28804,1933520_1933522_1933478,00.html" target="_blank">Best Travel Gadgets of 2009</a> as well as rating the device # 2 among the <a rel="nofollow" href="http://www.time.com/time/specials/packages/article/0,28804,1945379_1944278_1944289,00.html" target="_blank">Top Ten Gadgets</a> of the year. While emphasizing nook&#8217;s &#8220;classy book-lending feature&#8221;, the magazine also cited &#8220;the powerful, flexible Android operating system that the whole package runs on.&#8221;</em></p>
<p style="text-align:justify;">The exclamation mark, the rhythm, the tone, the profuse use of superlatives and positive adjectives all indicate an extremely positive sentiment for the nook product. It is clear that the author has a favorable opinion of the product, and moreover that he is quite eager to share his enthusiasm with the readers.</p>
<p style="text-align:justify;">Obviously these are not the only attitudes that can be found on the web. Other attitudes may include anticipation, sarcasm, doubt, apprehension, cynicism and even condemnation. It&#8217;s our nature to focus on the good, so we&#8217;ll spare you examples of the negative attitudes (well, I guess it&#8217;s also that we prefer to avoid any unnecessary lawsuits <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> ) but the basic idea of what is meant by an underlying attitude should be clear by now.</p>
<p style="text-align:justify;">It&#8217;s important to keep in mind that Sentiment Analysis is not severed from the basic meaning of the sentence. Rather, SA picks up on the basic meaning and further capitalizes on the cadence, the tone, the choice of words, and even the absence thereof, to build a complete picture of the message being conveyed. Note that we&#8217;ve implicitly drawn a line between some sort of &#8220;basic meaning&#8221; of a sentence, and the &#8220;ultimate intention&#8221; of the  message to be conveyed. Let&#8217;s try and be a bit more precise and explicit about this distinction.</p>
<p style="text-align:justify;">Formal linguistic theory usually recognizes 3 levels of abstraction for natural language comprehension: Syntax, Semantics and Pragmatics (we are excluding phonology, phonetics and morphology which are irrelevant here). Simply stated, Syntax is the study of the grammatical structure of sentences, Semantics deals with how words are interpreted and how their interpretation is combined to yield the meaning of the sentence, and Pragmatics is the study of how extra-linguistic, real-world knowledge, so to speak, interacts with the basic meaning of sentences to yield the ultimate message conveyed.</p>
<p style="text-align:justify;">So for example, syntactic theories may attempt to explain why the English sentence &#8220;I gave that to you&#8221; is fine whereas &#8220;You gave that to I&#8221; is ungrammatical. Semantic theories may attempt to explain what the meaning of a word such as &#8220;tall&#8221; is, and how this meaning can be reconciled with seemingly problematic examples such as &#8220;I am tall&#8221; vs. &#8220;The midget is only 4 feet tall&#8221;. Pragmatics goes one step further and attempts to explain how our knowledge of the world, circumstances, etc. interacts with and alters the meaning of the conveyed message. So for example, although strictly speaking the sentence &#8220;I have 3 children&#8221; does not formally preclude the possibility that I have more than 3, say 5 children, it would generally be considered wrong, or at least odd, for someone who indeed has 5 children to utter the original sentence: &#8220;I have 3 children&#8221;. To see how this judgment may change with circumstances, imagine Mr. Jones is being interviewed by the IRS, when he is notified by the interviewer that tax benefits are available to anyone with 3 or more children. Under these circumstances, we would probably no longer consider it odd for Mr. Jones to say &#8220;I have 3 children&#8221;, even if in fact he had 10 children.</p>
<p style="text-align:justify;">So where does Sentiment Analysis fit in this 3-headed theoretical framework? If you&#8217;re guessing the answer lies somewhere between semantics and pragmatics, perhaps with a bit of a syntactic-twist, you&#8217;re following this introduction just fine. (If, on the other hand, you thought it was limited to the syntax, you may want to go brew yourself a fresh cup of coffee before you reread the last few paragraphs <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  ).</p>
<p style="text-align:justify;">Mirroring the theoretical image portrayed above, Natural Language Processing algorithms consist of syntactic algorithms (most notably Part Of Speech (POS) parsers and taggers), semantic algorithms (e.g. semantic rulebooks and relation extraction algorithms) and finally pragmatic algorithms (including for example, contextual disambiguating algorithms, and world knowledge look-up algorithms, used in automated translators for instance). At Digital Trowel we&#8217;ve honed our Sentiment Analysis algorithms to combine the strengths of these 3 disciplines to produce the most reliable and comprehensive understanding of the message being conveyed, reading not only the text itself, but also<em><strong> between the lines</strong></em>, so to speak.</p>
<p style="text-align:justify;">The mathematical implementation of these algorithms is beyond the scope of this introduction, but this by no means should prevent us from taking advantage of the knowledge we&#8217;ve gained thus far to see how Sentiment Analysis techniques may harness the power of the different types of linguistic algorithms in an attempt to achieve their goal. In fact the lion&#8217;s share of the third part of this survey aims to do just that. For now, suffice it to say that one of the main reasons we at DT believe that our technology is superior has to do with our <em><strong>synergistic approach of integrating syntactic, semantic and pragmatic algorithms</strong></em>. This is why we call it Synergistic Sentiment Analysis (SSA). BTW, for those of you wondering, synergy is the term used to describe a situation where different entities cooperate advantageously for a final outcome (tx, Wikipedia!). There, you now understand yet another word in the titles above <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p style="text-align:justify;">Before ending this part, let&#8217;s focus on the goals of SA, or in other words, what SA is good for. Well, in one sentence, as we already phrased it:</p>
<p style="text-align:justify;"><strong><em><strong><em>Extracting and discerning the underlying sentiment allows us to transform otherwise inert texts into vibrant business opportunities.</em></strong><br />
</em></strong></p>
<div id="_mcePaste" style="text-align:justify;"><span style="font-weight:normal;"><span style="font-style:normal;">But how does this come about? I think the best way to explain is by using an example:</span></span></div>
<div id="_mcePaste" style="text-align:justify;"><span style="font-weight:normal;"><span style="font-style:normal;">Every day, millions of business news articles are published on the web. Many of these articles contain both facts as well as judgments, predictions, and just plain old sentiment. Obviously, it is impossible for any one human (or even a team of a hundred people) to read all these articles, sieve and sort through them, extract the facts and discern the sentiment, let alone do this all in real time to facilitate decision-making. This is where our SA engine comes in.</span></span></div>
<div style="text-align:justify;"><span style="font-weight:normal;"><span style="font-style:normal;">In a few seconds, our Sentiment Analysis engine can run through thousands and thousands of articles, sorting them by industry, company, product, etc., extracting key facts and events, and discerning the underlying sentiment. Take the stock market for example. In less than 10 seconds, our SA engine can scan every article published within a specified time range that mentions, say, any NYSE company. Not only are key facts and events compiled into our database, but a sentiment score is calculated for each ticker, yielding a real-time numeric indication of the vibe around each company on the market! Numeric scores can be translated into an array of decision-making procedures, and help with consolidating trading strategies. Now if that isn&#8217;t a great business idea, I don&#8217;t know what would constitute one!</span></span></div>
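<p style="text-align:justify;">To give a feel for the last step, turning many article-level scores into one number per ticker, here is a minimal Python sketch. It simply averages scores per ticker; the function name <code>ticker_scores</code>, the sample data and the choice of a plain mean are all assumptions for illustration, and the real engine&#8217;s scoring is not described in this post.</p>

```python
from collections import defaultdict


def ticker_scores(article_scores):
    """Average article-level sentiment per ticker.

    article_scores is an iterable of (ticker, score) pairs, one per article.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ticker, score in article_scores:
        totals[ticker] += score
        counts[ticker] += 1
    return {ticker: totals[ticker] / counts[ticker] for ticker in totals}


# Made-up article scores, for illustration only.
sample = [("CSCO", 0.6), ("CSCO", 0.2), ("BKS", 0.8)]
print(ticker_scores(sample))
```

<p style="text-align:justify;">A production system would presumably also weight articles by recency or source, but a plain mean is enough to show the idea of one numeric score per ticker.</p>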
<div id="_mcePaste" style="text-align:justify;"><span style="font-weight:normal;"><span style="font-style:normal;">There are many other business opportunities for the SA technology, including some we&#8217;ve already implemented at DT, such as evaluating pharmaceutical forums for clients&#8217; sentiment about drugs, as well as sports product satisfaction, but I think this is enough hype for now <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </span></span></div>
<div id="_mcePaste" style="text-align:justify;"><span style="font-weight:normal;"><span style="font-style:normal;">The third and final part of this introduction to the field of SA goes a bit deeper into the SA engine itself, and examines the innovative technology unique to Digital Trowel using real examples&#8230; Stay tuned, this is where things get really exciting <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </span></span></div>
]]></content:encoded>
			<wfw:commentRss>http://mineyourbusiness.wordpress.com/2010/04/06/turing%e2%80%99s-test-the-stock-market-part-2-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/094eab2312457433c44341b28bebc95d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Micha Y. Breakstone</media:title>
		</media:content>
	</item>
		<item>
		<title>Turing&#8217;s Test &amp; The Stock Market</title>
		<link>http://mineyourbusiness.wordpress.com/2010/03/14/turings-test-the-stock-market/</link>
		<comments>http://mineyourbusiness.wordpress.com/2010/03/14/turings-test-the-stock-market/#comments</comments>
		<pubDate>Sun, 14 Mar 2010 21:54:28 +0000</pubDate>
		<dc:creator>Digital Trowel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[informatinon extraction]]></category>
		<category><![CDATA[nlp]]></category>
		<category><![CDATA[sa]]></category>
		<category><![CDATA[sentiment analysis]]></category>
		<category><![CDATA[stock market]]></category>
		<category><![CDATA[stock market sentiment]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[turing test]]></category>

		<guid isPermaLink="false">http://mineyourbusiness.wordpress.com/?p=280</guid>
		<description><![CDATA[Turing&#8217;s Test &#38; The Stock Market A Non-standard Introduction to Sentiment Analysis in 3 Parts Part 1 &#8211; CAPTCHA to Gotcha: A Brief History of Artificial Intelligence Alan Turing was a prominent British mathematician and one of the most inspiring pioneers of modern computer science. In 1950, at the age of 38, he published his [...]]]></description>
				<content:encoded><![CDATA[<h1 style="text-align:justify;">Turing&#8217;s Test &amp; The Stock Market</h1>
<p style="text-align:justify;"><em><strong>A Non-standard Introduction to Sentiment Analysis in 3 Parts</strong></em></p>
<h2 style="text-align:justify;"><strong>Part 1 &#8211; CAPTCHA to Gotcha: </strong></h2>
<p style="text-align:justify;"><strong><em>A Brief History of Artificial Intelligence</em></strong></p>
<p style="text-align:justify;">Alan Turing was a prominent British mathematician and one of the most inspiring pioneers of modern computer science. In 1950, at the age of 38, he published his seminal paper <em><a href="http://en.wikipedia.org/wiki/Computing_Machinery_and_Intelligence" target="_blank">Computing Machinery and Intelligence</a></em>, which till this day remains probably the single most influential paper in the field of Artificial Intelligence (AI).</p>
<p style="text-align:justify;">Since Digital Trowel&#8217;s core technology is based on machine learning, a modern offshoot of AI, it would be conducive (and nice!) to get back to the basics, and learn a bit about the history that continues to shape both the science itself and the challenges we face at DT.</p>
<p style="text-align:justify;">Big words and complications aside, Turing begins his paper with the simple yet perplexing question: <em>&#8220;Can machines think?&#8221;</em> Nevertheless, realizing that &#8220;thinking&#8221; is a highly ambiguous term, Turing immediately proposed an alternative question that would be free of obscurities and eschew obfuscations. Instead of dealing with machines&#8217; capacity for thinking, he focused on their capacity to emulate human thought. In simplified terms the question he suggested was:</p>
<p style="text-align:justify;"><em>Could machines be made to simulate human thought well enough so as to fool a person into believing they were actually human?</em></p>
<p style="text-align:justify;">This question is the essence of what has come to be called the <strong>Turing Test</strong>. It <a href="http://en.wikipedia.org/wiki/Turing_test" target="_blank">proceeds as follows:</a> a human judge engages in a natural language conversation with one human and one machine, each of which tries to appear human. All participants are placed in isolated locations. If the judge cannot reliably tell the machine from the human, the machine is said to have passed the test. In order to test the machine&#8217;s intelligence rather than its ability to render words into audio, the conversation is limited to a text-only channel such as a computer keyboard and screen.</p>
<p style="text-align:justify;">At the time of its publication, many people viewed the prospect of machines ever reaching the level of human computational power as an impossibility, but in his paper, Turing, armed with his visionary intuition and razor-sharp mathematical analysis, set out to invalidate contemporary objections, ending with a speculation of his own: that one day machines would indeed emulate human thought, thereby passing the Turing Test!</p>
<p style="text-align:justify;">Inspired by the challenge,<strong><em> Digital Trowel&#8217;s groundbreaking technology has taken several huge steps forward in proving that Turing was right</em></strong>. The technology we&#8217;ve developed allows computers to extract not only the facts communicated by the text, but also the underlying sentiment or, if you will, the attitude associated with the message conveyed. In simple words, we&#8217;re enabling computers to understand the full meaning not only of the text, but of the subtext &#8211; just as a human would. But hold your horses! Before we continue, let&#8217;s try to explain why the problem is so difficult, so we can more fully appreciate the profundity of Digital Trowel&#8217;s achievement and its extraordinary implications.</p>
<p style="text-align:justify;">Well for one thing, it&#8217;s now sixty years later and the question presented by Turing has yet to be settled. In fact, it is far from being resolved. Machines have beaten world chess champions, navigated spacecrafts millions of miles away and even been used to prove mathematical theorems whose intricacy is impenetrable to human beings for their sheer magnitude of computational complexity, yet as of now no computer has been shown to pass the test.</p>
<p style="text-align:justify;">It may be argued that the challenge machines face is simply a matter of raw computing power. It is currently estimated by <a href="http://insidehpc.com/2009/03/12/even-supercomputers-not-yet-close-to-the-raw-power-of-human-brain/" target="_blank">some experts</a> that the human brain can perform some 38,000 trillion operations per second (that&#8217;s 3.8&#215;10<sup>16</sup> operations!) and hold over 3,500 terabytes of memory. In comparison, the world&#8217;s most powerful supercomputers (e.g. IBM’s BlueGene) have computational capacity of less than a &#8220;mere&#8221; 100 trillion operations per second (only 10<sup>14</sup>) and less than 10 terabytes of storage. However, if this indeed is the case, and the capacity to &#8220;think&#8221; lies in raw computation power alone, then according to some versions of <a href="http://en.wikipedia.org/wiki/Moore%27s_law" target="_blank">Moore&#8217;s Law</a> (which predicts the rate at which computation performance evolves with time) machines will ultimately obtain the required criterion by circa 2018. But we may have to wait a bit longer for an answer: recently, futurist Raymond Kurzweil, revised his earlier prediction that Turing test-capable computers would be manufactured by 2020, deferring the predicted date to 2029 (I can&#8217;t help but wonder if this prediction has anything to do with the fact that asteroid <a href="http://en.wikipedia.org/wiki/%2835396%29_1997_XF11" target="_blank">(35396) 1997 XF<sub>11</sub></a> is anticipated to make a close approach to Earth late in 2028 <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  ).</p>
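<p style="text-align:justify;">For the curious, the back-of-the-envelope arithmetic behind such a date can be sketched in Python. The two operation counts are the figures quoted above; the starting year of 2010 and the candidate doubling periods are assumptions made for this illustration, since different &#8220;versions&#8221; of Moore&#8217;s Law assume different doubling times.</p>

```python
import math

BRAIN_OPS_PER_SEC = 3.8e16         # estimate quoted in the text
SUPERCOMPUTER_OPS_PER_SEC = 1e14   # upper bound quoted in the text
START_YEAR = 2010                  # assumed starting point (year of writing)


def parity_year(doubling_period_years):
    """Year in which machines reach brain-level raw throughput, assuming
    performance doubles once every doubling_period_years."""
    doublings = math.log2(BRAIN_OPS_PER_SEC / SUPERCOMPUTER_OPS_PER_SEC)
    return START_YEAR + doublings * doubling_period_years


# Doubling periods below are assumptions, not figures from the text.
for period in (1.0, 1.5, 2.0):
    print(period, round(parity_year(period), 1))
```

<p style="text-align:justify;">The gap is a factor of 380, or about 8.6 doublings, so a one-year doubling period lands in 2018, while 18-month and two-year periods push the date out to roughly 2023 and 2027 respectively.</p>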
<p style="text-align:justify;">But what does all this have to do with Digital Trowel&#8217;s business? That is, unless we have secret plans to buy stock in Scientific American&#8230; (which we don&#8217;t!). Well, to answer that, consider first what is sometimes called a Reverse Turing Test. Imagine a modification of the Turing Test in which the roles of machine and human are switched: now it&#8217;s the computer that has to determine whether it is &#8220;conversing&#8221; with a human or with another machine. In fact, to some of you this may ring a (quite annoying) bell. Take a look at the images below:</p>
<p style="text-align:justify;"><a href="http://mineyourbusiness.files.wordpress.com/2010/03/captcha.png"><img class="size-full wp-image-302    aligncenter" title="Captchas" src="http://mineyourbusiness.files.wordpress.com/2010/03/captcha.png?w=500" alt=""   /></a></p>
<p style="text-align:justify;">Ever wonder why every once in a while you&#8217;re prompted to decipher the jumbled-up letters in images such as these? Well, put simply, it&#8217;s because you&#8217;re taking part in a test that&#8217;s not meant for you &#8211; you&#8217;re a participant in a Reverse Turing Test administered and judged by the security computers of the website you are trying to use. Humans have (or rather should have!) no problem deciphering the text in the above images, which incidentally are called CAPTCHAs (for <strong>C</strong>ompletely <strong>A</strong>utomated <strong>P</strong>ublic <strong>T</strong>uring test to tell <strong>C</strong>omputers and <strong>H</strong>umans <strong>A</strong>part). However, the random distortions in a CAPTCHA make it nearly impossible for computers to decipher the letters. As a result, automated security programs can use these images and the responses to them to make certain that a human, and not some malicious script, is trying to use the website.</p>
<div style="text-align:justify;">
<p>Back to Digital Trowel. No, we don&#8217;t make CAPTCHAs. This may come as a disappointment, but we&#8217;re not even in the business of Turing Tests, whether reversed, straightforward or in any other direction <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> . We are, however, in the business of using computers to achieve something no less elusive: <strong><em>deciphering the sentiment that lies hidden inside text</em></strong>.</p>
<p>Gleaning not only the formal meaning but also the sentiment associated with a text passage is crucial for any machine that hopes to &#8220;pass the Turing Test&#8221;. Even so, passing the test is neither part of our technological agenda nor a component of our business plan. In fact, our aspirations are much more practical. We aim to use our machine-learning and linguistic algorithms to analyze millions of lines of text, creating valuable business information that helps our customers make decisions in real time. In short:</p>
<p><strong><em><span style="text-decoration:underline;">Extracting and discerning the underlying sentiment allows us to transform otherwise inert texts into vibrant business opportunities.</span></em></strong></p>
<p>But again we&#8217;re getting ahead of ourselves. Now that we&#8217;ve laid the foundations for understanding what AI is all about, we&#8217;re ready to take a tour down the path of linguistic algorithm theory, focusing of course on the art of sentiment analysis. Or, as we like to call it at DT, <strong>Synergistic Sentiment Analysis</strong>, a term whose meaning will become apparent in due course.</p>
</div>
<div style="text-align:justify;">
<p>The second part of this survey presents an overview of sentiment analysis: what it is, what it does, and most importantly what it&#8217;s good for (hint: think unique business opportunities!). The third and final part will delve into the deep abyss of the algorithmic world in the hope of salvaging some insight into the technology we&#8217;ve developed at DT. By the end of this intro we hope you&#8217;ll understand not only what we do and why we do it, but also how we do it, and why we believe we&#8217;re light-years ahead of anyone else in the field.</p>
<p>In the meantime, we hope you understand at least half of the words in the titles above <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>We&#8217;ve told the story, set the stage, laid the bait &#8211; are you hooked..?</p>
</div>
<div style="text-align:justify;">All we can do is hope we <strong><em>&#8220;gotcha&#8221; </em></strong>!</div>
]]></content:encoded>
			<wfw:commentRss>http://mineyourbusiness.wordpress.com/2010/03/14/turings-test-the-stock-market/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/094eab2312457433c44341b28bebc95d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Micha Y. Breakstone</media:title>
		</media:content>

		<media:content url="http://mineyourbusiness.files.wordpress.com/2010/03/captcha.png" medium="image">
			<media:title type="html">Captchas</media:title>
		</media:content>
	</item>
		<item>
		<title>Winds of Change</title>
		<link>http://mineyourbusiness.wordpress.com/2009/11/02/winds-of-change/</link>
		<comments>http://mineyourbusiness.wordpress.com/2009/11/02/winds-of-change/#comments</comments>
		<pubDate>Mon, 02 Nov 2009 18:55:50 +0000</pubDate>
		<dc:creator>Digital Trowel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://mineyourbusiness.wordpress.com/?p=244</guid>
		<description><![CDATA[Winds of Change The Demise of the Language Barrier As autumn briskly descends on New England I find myself admiring nature’s capacity to silently transmute subtle changes into full-fledged displays of majestic beauty. A leaf here, a leaf there… They practically go unnoticed for several weeks &#8211; and then suddenly one morning you wake up and [...]]]></description>
				<content:encoded><![CDATA[<h1 style="text-align:justify;">Winds of Change</h1>
<h2 style="text-align:justify;"><strong><em>The Demise of the Language Barrier</em></strong></h2>
<p style="text-align:justify;">As autumn briskly descends on New England I find myself admiring nature’s capacity to silently transmute subtle changes into full-fledged displays of majestic beauty. A leaf here, a leaf there… They practically go unnoticed for several weeks &#8211; and then suddenly one morning you wake up and look through the window, amazed to see that nearly all the trees on the street are ablaze.</p>
<p style="text-align:justify;"><img class="aligncenter size-full wp-image-249" title="View from my front porch in Somerville, MA" src="http://mineyourbusiness.files.wordpress.com/2009/11/porch-view1.jpg?w=500&#038;h=353" alt="View from my front porch in Somerville, MA" width="500" height="353" /></p>
<p style="text-align:justify;">Reflecting back on this past year, I can’t help but draw the analogy to Digital Trowel’s own progress. It seems unbelievable that only one year ago we were a small team of 7 engineers crowded together in 2 small rooms, with not much more than an untested NLP platform and a vision. As the weeks passed by, we added an engineer here, a linguist and a mathematician there, and suddenly we are a mature, full-fledged commercial company, with over 40 developers, selling products and data, ablaze with a proven breakthrough NLP technology in hand.</p>
<p style="text-align:justify;">But we’re still hungry! And our team of scientists is cooking up a storm <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p style="text-align:justify;">We’ve just recently hired another mathematician and a computer scientist, so we now have 5 PhD level scientists on our R&amp;D team. We strongly emphasize the research aspect of the business, because we know that being at the top is a very fragile state. So while we’re very proud to be the first company to produce consistent commercial results with over 90% accuracy and recall across multiple semantic fields, we realize there are still many challenges to be met, and we’re determined to be the first to meet them.</p>
<p style="text-align:justify;">While I’m not yet at liberty to divulge the full extent of our newest developments, I am able to give you a taste of some of the advances we&#8217;ve made.</p>
<p style="text-align:justify;">One of the most exciting projects we’re looking into is a <strong><em>multi-lingual text-mining platform</em></strong> codenamed <strong>SNUG</strong> for <strong>Semantically Negotiated Universal Grammar</strong>. Admittedly, I don’t think this is exactly what Noam Chomsky had in mind when he first proposed his theory of <a href="http://en.wikipedia.org/wiki/Universal_grammar">Universal Grammar</a>, but it may not be too far off. Let me explain:</p>
<p style="text-align:justify;">Linguists use the term Universal Grammar (UG) to denote a theory according to which all humans possess an inherent “hard-wired” capability to acquire a language. It is this linguistic “hardware” we’re all born with that allows children to learn the grammar of a language even when the linguistic data available to them is <a href="http://en.wikipedia.org/wiki/Poverty_of_stimulus">insufficient</a>.</p>
<p style="text-align:justify;">A quick example is in order. A child born in the U.S.A. must learn to form questions out of assertions in English. He may, for example, hear the assertion:</p>
<p style="text-align:justify;"><em>“My sister is pregnant.”</em></p>
<p style="text-align:justify;">And over time infer that the proper interrogative form of this statement is:</p>
<p style="text-align:justify;"><em>“Is my sister pregnant?” </em></p>
<p style="text-align:justify;">He may then (subconsciously) form a rule of grammar in his mind, whereby questions in English are formed by moving the first auxiliary verb to the beginning of the sentence. We would then expect that children faced with an assertion such as:</p>
<p style="text-align:justify;"><em>“My sister who is pregnant will be blessed with happiness.”</em></p>
<p style="text-align:justify;">Would form the <strong>incorrect</strong> question:</p>
<p style="text-align:justify;"><em>* “Is my sister who pregnant will be blessed with happiness?”</em></p>
<p style="text-align:justify;">Interestingly enough –<strong> they don’t!</strong> Only the correct form is acquired by children:</p>
<p style="text-align:justify;"><em>“Will my sister who is pregnant be blessed with happiness?”   (Of course she will!)</em></p>
<p style="text-align:justify;">The claim (asserted by UG supporters) is that children simply don’t encounter enough sentences as complicated as the one above to make a learned choice. So how do they do it? Well, simply put – <strong><em>they are born knowing</em></strong>. More accurately, they’re born with a set of rules that are triggered and activated in a certain way once they are exposed to sentences in their language.</p>
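<p style="text-align:justify;">The contrast between the two candidate rules can be made concrete with a toy sketch. It is only an illustration: the tiny auxiliary list is an assumption, and the split between subject and predicate is supplied by hand, which is exactly the structural knowledge the argument says children bring with them.</p>

```python
# Toy list of auxiliaries; an assumption made for this illustration only.
AUXILIARIES = {"is", "will", "can", "has"}


def front_first_auxiliary(sentence: str) -> str:
    """Linear rule: move the first auxiliary found in the word string to the front."""
    words = sentence.rstrip(".").split()
    index = next(i for i, w in enumerate(words) if w.lower() in AUXILIARIES)
    aux = words.pop(index)
    # Simplification: assumes the sentence does not start with a proper noun.
    words[0] = words[0].lower()
    return " ".join([aux.capitalize()] + words) + "?"


def front_main_auxiliary(subject: list[str], predicate: list[str]) -> str:
    """Structure-dependent rule: front the auxiliary that begins the predicate."""
    aux, remainder = predicate[0], predicate[1:]
    words = [aux.capitalize(), subject[0].lower()] + subject[1:] + remainder
    return " ".join(words) + "?"


simple = "My sister is pregnant."
complex_sentence = "My sister who is pregnant will be blessed with happiness."

print(front_first_auxiliary(simple))
print(front_first_auxiliary(complex_sentence))
print(front_main_auxiliary(
    ["My", "sister", "who", "is", "pregnant"],
    ["will", "be", "blessed", "with", "happiness"],
))
```

<p style="text-align:justify;">The linear rule gets the simple sentence right but produces the starred question for the complex one, because it has no notion of where the subject ends. The second function produces the correct question only because it is handed that boundary.</p>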
<p style="text-align:justify;">Of course, things are far more complicated than these simplistic examples, and many questions immediately arise. How, for example, is it that other languages, German and French for instance, form questions by placing the main verb at the beginning of the sentence? For example:</p>
<p style="text-align:justify;">French:<em> Parlez-vous anglais? </em></p>
<p style="text-align:justify;">German: <em>Sprechen Sie Englisch?</em></p>
<p style="text-align:justify;">Both mean: <em>Do you speak English?</em></p>
<p style="text-align:justify;">But a literal translation would be the ungrammatical: * <em>Speak you English? </em></p>
<p style="text-align:justify;">So, obviously, the theory must be significantly more complex. But don’t worry, that concludes today’s lesson in linguistics 101 <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> . We now return to what got us here in the first place – our new <em>multi-lingual text-mining platform</em> <strong>SNUG</strong> (we thought it was kind of cute to build <strong>SNUG</strong> using <strong>CARE</strong> <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> ).</p>
<p style="text-align:justify;">So, first things first: No, we’re not out to decipher the way our minds process language. But yes, we are out to create a platform that will enable us to process and mine texts in any language.</p>
<p style="text-align:justify;">For us humans, it’s quite hard to fathom learning a new language in, say, a week. But for our system that’s actually an achievable feat. To extract data effectively from free text in a new language, we need to accomplish two things. First, we need to teach our system the new language. Well, not exactly the language itself, but rather the statistical distribution of words in it. We do this with an automatic training process in which our system runs through huge text corpora and produces a statistical model of the language, which is refined further every time we train the system.</p>
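<p style="text-align:justify;">To give a feel for what a statistical model of word distribution can look like, here is a deliberately tiny sketch. It is not our production model: the class name, the whitespace tokenizer and the add-one smoothing are all assumptions picked to keep the example short.</p>

```python
from collections import Counter


class WordDistribution:
    """Hypothetical unigram model that is refined on every call to train()."""

    def __init__(self) -> None:
        self.counts = Counter()
        self.total = 0

    def train(self, corpus: str) -> None:
        # Naive whitespace tokenization; a real system would do far more.
        tokens = corpus.lower().split()
        self.counts.update(tokens)
        self.total += len(tokens)

    def probability(self, word: str) -> float:
        # Add-one smoothing, with one extra slot reserved for unseen words.
        vocabulary = len(self.counts) + 1
        return (self.counts[word.lower()] + 1) / (self.total + vocabulary)


model = WordDistribution()
model.train("il gatto mangia")
model.train("il cane mangia il pesce")  # a second pass refines the same model

print(model.probability("il"))
print(model.probability("zebra"))  # unseen word still gets a small probability
```

<p style="text-align:justify;">Each call to train() adds to the same counts, so the estimates sharpen as more text is processed, which is the sense in which the model is refined with every training run.</p>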
<p style="text-align:justify;">The second task is translating our semantic rulebooks. Here is where our linguists come into play. Since our rulebooks are basically composed of highly sophisticated weighted Context-Free Grammars, translating them amounts to applying a structure-preserving function to the semantic rules (a semantic homomorphism). This function can be thought of as taking an English semantic-driven grammar as input and transmuting it into a semantic-driven grammar for a different language. Though languages vary considerably in their actual spoken grammar, they tend to convey events and factual information in a surprisingly similar manner (or rather, the structure of the required semantic rules is quite similar).</p>
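<p style="text-align:justify;">As a rough illustration of what a structure-preserving mapping over weighted rules might look like, consider the toy sketch below. Everything in it is hypothetical: the Rule class, the category names and the idea of expressing a translation as a reordering of constituents are simplifications for this post, not a description of how our rulebooks are actually stored or translated.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical weighted grammar rule: head -> body, with a weight."""
    head: str      # semantic category the rule produces
    body: tuple    # ordered constituents on the right-hand side
    weight: float  # rule weight


def translate_rule(rule: Rule, order: tuple) -> Rule:
    """Map a rule into another language by permuting its body.

    The head, the set of constituents and the weight are preserved;
    only the surface order of the constituents changes.
    """
    if sorted(order) != list(range(len(rule.body))):
        raise ValueError("order must be a permutation of the body positions")
    return Rule(rule.head, tuple(rule.body[i] for i in order), rule.weight)


# Hypothetical English rule: Subject-Verb-Object order.
english = Rule("Acquisition", ("Buyer", "AcquireVerb", "Target"), 0.9)

# The same semantic rule for a Subject-Object-Verb language.
sov = translate_rule(english, (0, 2, 1))
print(sov)
```

<p style="text-align:justify;">The semantic content of the rule (who acquires whom) is untouched; only the order in which the constituents appear is adapted to the target language. A real translation would also need target-language vocabulary and, most likely, retuned weights.</p>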
<p style="text-align:justify;">Most Central European languages share so many qualities with English that many of the rules can be translated verbatim. Challenges arise when translating rulebooks into languages whose word order differs significantly from that of English (e.g. German, not to mention Subject-Object-Verb languages such as Japanese, Hindi and Turkish). Languages with rich agreement inflection, such as French, Italian and Spanish, are usually easier to parse as well, though rich agreement often goes hand in hand with the omission of overt pronouns. So, for example, <em>“I eat”</em> in Italian is simply <em>“mangio”</em>. This turns out to be quite problematic for anaphora resolution.</p>
<p style="text-align:justify;">Every new language poses a new and exciting challenge, but crucially, it does not require us to rewrite the code. All we need is a few large text corpora, an expert linguist, a savvy rulebook writer, and a week or two of intensive work, and our system will learn to “speak” a new language. If only it were so simple for people to learn…</p>
<p style="text-align:justify;">Well, I guess that’s enough for this post. I hope you’ve enjoyed this quick “intro to linguistics”, and are as excited as I am by the prospect of text mining free of language barriers. Stay tuned for more new and interesting features in my next entry.</p>
<p style="text-align:justify;">Meanwhile, wherever you are, I hope you have an autumn as beautiful as the one I am having in Boston. Even more importantly, I hope you take the time to appreciate the beauty of change all around &#8211; spoken without any words.</p>
]]></content:encoded>
			<wfw:commentRss>http://mineyourbusiness.wordpress.com/2009/11/02/winds-of-change/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/094eab2312457433c44341b28bebc95d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Micha Y. Breakstone</media:title>
		</media:content>

		<media:content url="http://mineyourbusiness.files.wordpress.com/2009/11/porch-view1.jpg" medium="image">
			<media:title type="html">View from my front porch in Somerville, MA</media:title>
		</media:content>
	</item>
		<item>
		<title>New Year&#8217;s Blessings</title>
		<link>http://mineyourbusiness.wordpress.com/2009/09/25/new-years-blessings/</link>
		<comments>http://mineyourbusiness.wordpress.com/2009/09/25/new-years-blessings/#comments</comments>
		<pubDate>Fri, 25 Sep 2009 07:52:17 +0000</pubDate>
		<dc:creator>Digital Trowel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://mineyourbusiness.wordpress.com/?p=200</guid>
		<description><![CDATA[On the occasion of the new Hebrew year, I thought I&#8217;d make this entry a bit lighter than usual and present blessings for the new year in the spirit of NLP, spiced up with some advice, so here goes: When setting the goals for extraction results, try to find the optimal balance between recall and [...]]]></description>
				<content:encoded><![CDATA[<p style="text-align:left;">On the occasion of the new Hebrew year, I thought I&#8217;d make this entry a bit lighter than usual and present blessings for the new year in the spirit of NLP, spiced up with some advice, so here goes:</p>
<p><strong><em>When setting the goals for extraction results, try to find the optimal balance between recall and precision. Remember: Aiming high is good, just make sure there&#8217;s an easy way down!</em></strong></p>
<p style="text-align:center;"><strong><em><img class="aligncenter size-full wp-image-208" title="giraffe_zebra" src="http://mineyourbusiness.files.wordpress.com/2009/09/giraffe_zebra4.jpg?w=500" alt="giraffe_zebra"   /><br />
</em></strong></p>
<p><strong><em>Do not attempt to debug a rulebook for more than 24 hours straight. We&#8217;ve been there. We&#8217;ve done that. If you think you&#8217;re stuck now, you still haven&#8217;t seen nada!</em></strong></p>
<p style="text-align:center;"><strong><em><img class="aligncenter size-medium wp-image-210" title="horse_head" src="http://mineyourbusiness.files.wordpress.com/2009/09/horse_head.jpg?w=263&#038;h=299" alt="horse_head" width="263" height="299" /></em></strong></p>
<p style="text-align:left;">
<p style="text-align:left;">
<p style="text-align:left;">
<p style="text-align:left;"><strong><em>Remember the chain of language processing from the last entry? First comes the HTML converter &#8211; it hands the info down to CARE, which does all the hard work and in turn passes the parsed relations to the Post Processor so that it can rest on its laurels.</em></strong></p>
<p style="text-align:left;"><img style="display:block;margin-left:auto;margin-right:auto;border:0 initial initial;" title="hampsters" src="http://mineyourbusiness.files.wordpress.com/2009/09/hampsters1.jpg?w=360&#038;h=529" alt="hampsters" width="360" height="529" /></p>
<p style="text-align:left;">
<p style="text-align:left;">
<p style="text-align:left;"><strong><em>Perfect &#8220;Anaphora Resolution&#8221; is a myth. You can try. For a while, you may even believe you&#8217;ve done it. Our prediction: In the end the bubble will burst. Or you&#8217;ll go crazy trying to solve all the problems. Or both.<img class="aligncenter size-full wp-image-225" title="bubbles" src="http://mineyourbusiness.files.wordpress.com/2009/09/bubbles.jpg?w=500&#038;h=338" alt="bubbles" width="500" height="338" /></em></strong></p>
<p style="text-align:left;"><em><strong>Make sure that all the crucial information that comes in as input goes out as output.</strong></em></p>
<p style="text-align:left;"><strong><em><img style="display:block;margin-left:auto;margin-right:auto;border:0 initial initial;" title="dog" src="http://mineyourbusiness.files.wordpress.com/2009/09/dog.jpg?w=500&#038;h=295" alt="dog" width="500" height="295" /></em></strong></p>
<p style="text-align:left;"><strong><em>Remember the engineer who didn&#8217;t take our advice and corrected extraction rules through the night?<img style="display:block;margin-left:auto;margin-right:auto;border:0 initial initial;" title="monkeys" src="http://mineyourbusiness.files.wordpress.com/2009/09/monkeys.jpg?w=500&#038;h=443" alt="monkeys" width="500" height="443" /></em></strong></p>
<p style="text-align:left;"><strong><em>Just like life itself, retrieving the ultimate extractions for a given relation may be an extremely tedious and laborious task. Lighten up, add some spice. Make fun of yourself!</em></strong></p>
<p style="text-align:left;"><strong><em><img style="display:block;margin-left:auto;margin-right:auto;border:0 initial initial;" title="hotdogs" src="http://mineyourbusiness.files.wordpress.com/2009/09/hotdogs.jpg?w=500&#038;h=350" alt="hotdogs" width="500" height="350" /></em></strong></p>
<p style="text-align:left;">
<p style="text-align:left;"><strong><em>Do everything you do with love and CARE!</em></strong></p>
<p style="text-align:left;"><strong><em><img class="aligncenter size-full wp-image-230" title="dogs1hug" src="http://mineyourbusiness.files.wordpress.com/2009/09/dogs1hug1.jpg?w=500" alt="dogs1hug"   /></em></strong></p>
<p style="text-align:left;"><strong><em>But remember that if you&#8217;re not enjoying it, you&#8217;re probably not doing it right! <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> <img class="aligncenter size-full wp-image-231" title="smack" src="http://mineyourbusiness.files.wordpress.com/2009/09/smack.jpg?w=500" alt="smack"   /></em></strong></p>
<p style="text-align:left;">
<p style="text-align:left;"><strong><em>Here from Cambridge, MA, wishing you all a wonderful year full of wonderful experiences, CARE-ing, happiness and love!</em></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://mineyourbusiness.wordpress.com/2009/09/25/new-years-blessings/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/094eab2312457433c44341b28bebc95d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Micha Y. Breakstone</media:title>
		</media:content>

		<media:content url="http://mineyourbusiness.files.wordpress.com/2009/09/giraffe_zebra4.jpg" medium="image">
			<media:title type="html">giraffe_zebra</media:title>
		</media:content>

		<media:content url="http://mineyourbusiness.files.wordpress.com/2009/09/horse_head.jpg?w=263" medium="image">
			<media:title type="html">horse_head</media:title>
		</media:content>

		<media:content url="http://mineyourbusiness.files.wordpress.com/2009/09/hampsters1.jpg" medium="image">
			<media:title type="html">hampsters</media:title>
		</media:content>

		<media:content url="http://mineyourbusiness.files.wordpress.com/2009/09/bubbles.jpg" medium="image">
			<media:title type="html">bubbles</media:title>
		</media:content>

		<media:content url="http://mineyourbusiness.files.wordpress.com/2009/09/dog.jpg" medium="image">
			<media:title type="html">dog</media:title>
		</media:content>

		<media:content url="http://mineyourbusiness.files.wordpress.com/2009/09/monkeys.jpg" medium="image">
			<media:title type="html">monkeys</media:title>
		</media:content>

		<media:content url="http://mineyourbusiness.files.wordpress.com/2009/09/hotdogs.jpg" medium="image">
			<media:title type="html">hotdogs</media:title>
		</media:content>

		<media:content url="http://mineyourbusiness.files.wordpress.com/2009/09/dogs1hug1.jpg" medium="image">
			<media:title type="html">dogs1hug</media:title>
		</media:content>

		<media:content url="http://mineyourbusiness.files.wordpress.com/2009/09/smack.jpg" medium="image">
			<media:title type="html">smack</media:title>
		</media:content>
	</item>
	</channel>
</rss>
