Are you into Natural Language Processing? Text Mining? Developing Software for Information Extraction?
If so, take a deep breath, lean back and imagine…
Imagine a search engine you can use to query the web as though it were your personal structured database.
Imagine verifying the facts of any article, CV or paper, by the time you finish reading it.
Imagine browsing the web at 1,000 pages per second extracting valuable information as you whiz along.
That’s the kind of imagining I’ve been doing at Digital Trowel ever since I began working there in 2008 as a linguistic engineer, and I’m proud to say that in the very short time since, together with our team of engineers and scientists, we’ve managed to transform our imagination into cutting-edge technology. Now we want to invite you to continue imagining with us – which, essentially, is what this blog is all about.
But first, a little more about what our technology enables people to do, provided they do it with CARE. That’s the name we’ve given to our unique NLP engine (an acronym for CRF Assisted Relation Extraction, where CRF stands for conditional random fields). Incorporating highly sophisticated learning algorithms, CARE can scan millions of pages in a matter of minutes, extracting detailed biographical data and up-to-date contact information as it goes, and making sense of free-language text in any style and any format.
A simple example of how our technology can be applied: say you’d like to compile a list of all CFOs at major companies across Chicago, along with their contact info and employment history. Sure, all the information is already out there on the web, but good luck finding it with a standard search engine. Just for the heck of it, give it a try. You’ll quickly appreciate the value of the automated technology we have pioneered, and of the staggering, unprecedented 90% accuracy rate we can deliver.
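To make the "web as a structured database" idea concrete, here is a minimal Python sketch of what querying already-extracted records could look like. Everything in it is an assumption made for illustration: the `PersonRecord` fields, the `find_people` helper and the three placeholder records are invented, and none of it reflects CARE's actual output format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PersonRecord:
    """Hypothetical shape of one extracted biography (not CARE's real schema)."""
    name: str
    title: str
    company: str
    city: str
    email: str
    previous_employers: List[str] = field(default_factory=list)


# Invented placeholder records standing in for extraction results.
RECORDS = [
    PersonRecord("Jane Doe", "CFO", "Example Corp", "Chicago",
                 "jane.doe@example.com", ["Sample Holdings"]),
    PersonRecord("John Roe", "CTO", "Example Corp", "Chicago",
                 "john.roe@example.com"),
    PersonRecord("Ann Poe", "CFO", "Placeholder Inc", "Boston",
                 "ann.poe@example.com", ["Demo LLC"]),
]


def find_people(records: List[PersonRecord], title: str, city: str) -> List[PersonRecord]:
    """Return the records whose title and city match, ignoring case."""
    return [
        r for r in records
        if r.title.lower() == title.lower() and r.city.lower() == city.lower()
    ]


chicago_cfos = find_people(RECORDS, "CFO", "Chicago")
for person in chicago_cfos:
    print(person.name, person.company, person.email, person.previous_employers)
```

The filtering itself is trivial. The hard part, and the part an extraction engine is for, is turning free text into records like these in the first place.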
But the truth is I believe CARE can and should be even better, which brings me back to you. While our team of world-renowned text-mining computer scientists and talented algorithm engineers has been able to accomplish incredible results, we recognize that we’ve only just scratched the surface of CARE’s true potential.
Now we want you to scratch deeper with us. CARE can essentially be used to extract any information from any source: match drugs with their side effects as reported in medical forums, compile lists of stock prices along with concurrent ratings posted on financial sites, or scan Wikipedia for correlations between major historical figures and events. All you need to do is imagine. So, in the coming weeks, I’ll be posting a secured-access link that will enable you to play around with our engine.
Whether you’re developing software that requires access to information-extraction services, conducting theoretical or empirical research in discriminative-probabilistic algorithmic models for NLP, or simply have been dreaming of having access to a state-of-the-art text-mining engine tailor-made for your own needs – this is an opportunity you won’t want to miss.
The idea is to let you compose rulebooks that will then be loaded into CARE and applied to texts and URLs of your choice. Of course you get to use any information extracted.
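For readers who have never written extraction rules, the following Python sketch shows the general shape of the idea: a "rulebook" is a list of patterns, and applying it to a text yields structured relations. This is a toy stand-in under stated assumptions, not CARE's rulebook language. The `RULEBOOK` structure, the `apply_rulebook` function and the sample sentence are all hypothetical, and plain regular expressions are used here in place of the CRF-based models CARE relies on.

```python
import re
from typing import Dict, List

# Hypothetical rulebook: each rule pairs a relation name with a regex whose
# named groups become the fields of the extracted relation.
RULEBOOK = [
    {
        "relation": "employment",
        "pattern": re.compile(
            r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+), (?:the )?"
            r"(?P<title>CEO|CFO|CTO) of "
            r"(?P<company>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"
        ),
    },
    {
        "relation": "contact",
        "pattern": re.compile(r"(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"),
    },
]


def apply_rulebook(rulebook: List[dict], text: str) -> List[Dict[str, str]]:
    """Run every rule over the text and collect one dict per match."""
    results = []
    for rule in rulebook:
        for match in rule["pattern"].finditer(text):
            results.append({"relation": rule["relation"], **match.groupdict()})
    return results


# Invented sample sentence with placeholder names.
SAMPLE_TEXT = (
    "Jane Doe, the CFO of Example Corp, can be reached at jane.doe@example.com."
)

extracted = apply_rulebook(RULEBOOK, SAMPLE_TEXT)
for relation in extracted:
    print(relation)
```

Hand-written patterns like these break as soon as the phrasing changes, which is exactly why a learning-based engine is attractive for free text in any style.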
And what do we get out of it? Are we crazy to be allowing access to our precious CARE over the web? Well, not quite… in fact, au contraire!
Pioneering these new frontiers of text-mining, we’ve discovered more directions to go in than we can possibly explore ourselves, so we are more than happy to share significant parts of our technology with developers and users who may suggest new features, report bugs and even contribute new information extraction rulebooks.
I myself am one of the senior rulebook composers at DT, which is good news for you: I will be in charge of moderating the site, commenting on submissions, and helping anyone interested in honing and debugging the rulebooks they load into CARE. This way you can enjoy our engine while helping us to improve the spectacular results we’ve already achieved.
Once we publish an API for plugging into CARE, you’ll be able to start using it immediately, see what all the hype is about, and judge for yourself whether it is justified. We are convinced you’ll conclude that it is.
Another hope of ours is that this blog will help establish a small but dynamic community of text-mining enthusiasts who can enjoy our technology as well as help us by challenging it to its limits!
We invite everyone and anyone to partake in this effort. This is a real chance for you to make a difference and, at the same time, take advantage of our breakthrough technology: create your own rulebook, load it into CARE and run it on the content of your choice.
We look forward to your becoming a part of this process, and expect that soon you’ll be extracting information from the web in a way that you never have before. Until then, take CARE.