Wednesday, November 18, 2009
Abstract: Unstructured natural language text found in blogs, news, and other web content is rich with semantic relations linking entities (people, places, and things). At Evri, we are building a system which automatically reads web content similar to the way humans do. The system can be thought of as an army of 7th grade grammar students armed with a really large dictionary. The dictionary, or knowledge base, consists of relatively static information mined from structured and semi-structured publicly available information repositories like Wikipedia, Crunchbase, and Amazon. This large knowledge base is in turn used by a highly distributed search and indexing infrastructure to perform a deep linguistic analysis of many millions of documents, culminating in a large set of semantic relationships expressing grammatical SVO-style, clause-level relationships. This highly expressive, exacting, and scalable index makes possible a new generation of content discovery applications.
The talk slides used in the video are available HERE. It is best to download the slides and follow along as they are difficult to see in the video. In addition, the talk is broken up into multiple parts. Part 1, is shown below. Links to all parts are as follows: Part 1, Part 2, Part 3, Part 4, Part 5, and Part 6.
Monday, November 09, 2009
I'm excited to announce The Attack Machine, an exploration site available via Evri's experimental Garden, which harnesses the power of the Evri API to automatically generate content oriented around an action, or verb.
Content on The Attack Machine is generated automatically from Evri's deep linguistic analysis of news, blog, and other web content. The Attack Machine is an example application highlighting the power of a grammatical, clause-level understanding of verbs. The entire site could be straightforwardly mapped to other verbs, such as love, hate, punch, cherish, or destroy.
Content on The Attack Machine homepage is generated by an algorithm which runs every few minutes and looks for the top attackers and victims in particular categories such as animals, locations, weapons, and politicians.
Evri's system can be thought of as an army of 7th grade grammar students armed with a really large dictionary, or knowledge base. These 7th grade grammar student algorithms scour news, blog, and other web content, breaking every sentence down into its key grammatical parts: subject, verb, and object. The attackers in The Attack Machine are, in essence, the grammatical subjects; the verb is always attack or a related verb like kill, assault, or maim; and the victims on The Attack Machine are the grammatical objects. So, for example, if a blog post contains a simple sentence or title like "Israel attacks Hamas," that post will appear on the attacker page for Israel and on the victim page for Hamas.
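Evri's actual pipeline is a full linguistic parse, but the subject-verb-object idea can be illustrated with a toy extractor. This is purely a sketch: a real system handles complex clauses, passives, and coreference, none of which this handles.

```python
# Toy subject-verb-object extractor for trivially simple declarative
# sentences. A real system like Evri's uses a full grammatical parse;
# this sketch only splits on a fixed list of attack-related verbs.
ATTACK_VERBS = {"attacks", "kills", "assaults", "maims"}

def extract_svo(sentence):
    """Return (subject, verb, object) for a simple clause, else None."""
    words = sentence.strip().rstrip(".").split()
    for i, word in enumerate(words):
        if word.lower() in ATTACK_VERBS and 0 < i < len(words) - 1:
            subject = " ".join(words[:i])
            obj = " ".join(words[i + 1:])
            return (subject, word.lower(), obj)
    return None

print(extract_svo("Israel attacks Hamas."))  # ('Israel', 'attacks', 'Hamas')
```

The returned subject would drive the attacker page and the object the victim page, as described above.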
In a blog post titled Evri's Garden Sprouts Some Search I discuss in detail the underlying search mechanism our scientists and engineers use to power higher level API and application functionality. One of the key higher level API functionalities The Attack Machine leverages is an API resource called Get relations about an entity. This API resource is used, for example, to populate the center column in the individual attack pages such as this one on alligator attacks:
and more specifically, the following REST API call is used:
This API call allows us to automatically identify the key people, places, organizations and things involved in attacks, in addition to getting the articles which correspond to the latest attacks. Now if the user clicks on a specific person, place, or thing, the following API call is used:
So in this way, we can populate the data for all attacks by alligators in Florida.
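The actual request URIs appeared in the original screenshots, but the general shape of such a call can be sketched as a URL builder. The base path and parameter names below are illustrative placeholders, not Evri's documented API.

```python
from urllib.parse import urlencode

# Hypothetical sketch of assembling a "get relations" style REST request
# for attacks by an entity, optionally narrowed by location. The endpoint
# path and parameter names are illustrative, not Evri's actual API.
def relations_url(base, entity_uri, verb, location=None):
    params = {"entityURI": entity_uri, "verb": verb}
    if location:
        params["location"] = location
    return base + "/relations?" + urlencode(params)

url = relations_url("http://api.example.com/v1",
                    "/organism/alligator", "attack",
                    location="/location/florida")
print(url)
```

A client would then fetch this URL and render the returned entities and articles in the page's center column.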
Finally, The Attack Machine leverages the Evri API to generate unique natural language content which benefits the reader and significantly helps page SEO. For example, the first and third paragraphs in the first column in the screenshot above are automatic formulations from API output. In addition, the Evri knowledge base is used to minimize human editorial contribution. For example, the second paragraph above is written by an editor and linked to a specific entity, to a narrow category for an entity (such as animal or politician), or to a higher-level category (such as person, organism, or location). Simple logic is then applied at page generation time: select the natural language content from an entity handle if one exists; failing that, from a narrow category handle; and if neither exists (there are thousands of narrow categories), from a higher-level category handle (there are only a handful of these).
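That fallback logic is just an ordered lookup. Here is a minimal sketch, with a made-up data structure (the real system's storage and field names are not public):

```python
# Sketch of the editorial-content fallback described above: prefer a
# handle written for the specific entity, then its narrow category,
# then a broad top-level category. The handles dict is illustrative.
def select_content(handles, entity, narrow_category, broad_category):
    """handles maps an entity or category name to editor-written text."""
    for key in (entity, narrow_category, broad_category):
        if key in handles:
            return handles[key]
    return None

handles = {
    "animal": "This animal has been involved in numerous attacks...",
    "organism": "Organisms attack for many reasons...",
}
# No handle exists for "alligator" itself, so the narrow category wins:
print(select_content(handles, "alligator", "animal", "organism"))
```

Because broad categories number only a handful, a small amount of editorial writing covers every generated page.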
That's all for now. If you have any questions on how to use the Evri API for similar applications, or any other feedback, please let us know on our API forum.
Saturday, October 17, 2009
Thursday, August 13, 2009
Every minute of every day people are expressing their sentiments and writing them down in news articles, blog posts, and other web content. Many people are too famous to write down their sentiments, but journalists, bloggers, and other content creators are more than willing to document their feelings. Oftentimes a famous radio commentator will bash a politician, or a politician will thrash a Hollywood actress. And on occasion, a true act of heroism will be recognized, and all sorts of famous folk will follow up with praise. Whether depressing or uplifting, disturbing or unnerving, tapping in to the sentiments of key actors on the world stage can be highly informative and engaging.
I'm excited to announce the release of our new sentiment web API which lets you build applications around the sentiments of specific entities (i.e. people, places, or things) as well as categories, or facets. Every minute of every day, Evri's systems are busy scouring the web, reading news content, blog posts and more so you don't have to. Now Evri's system also understands sentiment: the positive and negative expressions made by and about entities. Many types of applications can be built using the sentiment API in areas including, but not limited to: market intelligence, market research, sports and entertainment, brand management, product reviews and more. Specifically, Evri's new sentiment API lets you:
- Find the percentage of positive and negative expressions of sentiment made by an entity, or about an entity. For example, find out what percentage of things being written about the iPhone are positive and what percentage are negative.
- Discover who is criticizing and who is praising a particular person, place or thing. For example, see who is criticizing and praising Microsoft right now.
- Read what praisers and critics are saying about an entity. For example, see what the GOP is saying about the Democrats.
- Discover who or what your favorite entity is bashing and why. For example, see who Lance Armstrong is complaining about.
- Discover who or what your favorite entity is praising and why. For example, see who the World Health Organization is commending and why.
From the above screenshot, we can see the percentages of positive and negative sentiment expressed by Barack Obama. We can also see the specific top entities being praised by Barack Obama in the left column, and the specific entities being criticized in the right column. For example, from the above screenshot, we see that Barack Obama is criticizing the GOP, Rush Limbaugh, the ACLU, Al Zawahiri, and Israel. In order to render the screenshot above, this sentiment summary information is returned by the following REST API resource call:
Now, consider the use case outlined in the screenshot below, where the user clicks on [Anything] under the positive vibes sentiment. In order to get the results outlined to the right of the positive and negative sentiment columns, we execute a resource request like:
From this request URI, we see that the sentimentSource references Barack Obama, meaning we are interested in vibes or sentiment expressed by Obama, as opposed to about him. Next we see the sentimentType is set to positive, meaning we are interested in positive sentiment expressions. Finally, we see sort=date meaning we are interested in the latest results.
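The query string for this kind of request can be assembled programmatically. The parameter names (sentimentSource, sentimentType, sort) come from the walkthrough above; the entity URI value here is a placeholder, not Evri's actual identifier.

```python
from urllib.parse import urlencode

# Sketch of building the sentiment query parameters described above:
# who expressed the sentiment, its polarity, and the sort order.
# The entity URI is a placeholder value.
def sentiment_query(sentiment_source, sentiment_type, sort="date"):
    return urlencode({"sentimentSource": sentiment_source,
                      "sentimentType": sentiment_type,
                      "sort": sort})

query = sentiment_query("/person/barack-obama", "positive")
print(query)
```

Swapping sentimentType to negative, or replacing sentimentSource with an entityURI parameter, covers the other cases walked through below.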
Also from the screenshot below, we see the results of this resource request: the specific snippet from the article, as well as a timestamp, the article title, and a link to the source article. From the snippet, we see the sentence stating that "the president commended..." -- the Evri system recognizes "the president" to be the source of the vibes, or sentiment, and the commendation to be the prime justification for his positive sentiment expression.
And finally, we consider the case illustrated below, where the "Receiving vibes" tab is selected, and the particular source of negative sentiment chosen by the user is Rush Limbaugh. In this case, we execute this resource request:
From this request URI, we see that the entityURI references Barack Obama, meaning the returned sentiment is about Barack Obama. We can also see that the sentimentType is set to negative, meaning returned sentiment expressions will be negative in nature. We also see that the sentimentSource references Rush Limbaugh. The URI referencing Limbaugh was obtained from the sentiment summary results of the request shown above in reference to the first screenshot.
That pretty much sums up our walkthrough of the sentiment API. For complete documentation on this new API resource, see the Get sentiment information section of our REST API Specification. And finally, if you have any comments on how we can make this API better, questions on how to get things to work, or examples of bugs, please let us know on our developer forum.
Wednesday, May 13, 2009
Hope you are well. It's been a while since we chatted. I just came back from a short campaigning trip to Chandigarh last week. The Punjab elections are still going on (voting is the day after tomorrow).
Wow, campaigning in Punjab sounds exciting or scary and probably hot.
Scary is an understatement. One of my friends, a really good guy, a professional, educated lawyer in the Supreme Court, who is contesting from one of the parliamentary constituencies there got attacked, his mother got beaten, his local president of the Congress party and his colleagues got stabbed with daggers and swords and are admitted in hospital. All by the Akali Dal guys ... they are out of control! We're asking for a central police force for additional protection on polling day.
Unbelievable. Sorry to hear about your friends. Yeah, politics in Punjab can definitely be crazy. I think people haven't forgiven the Congress party for the 84 riots and subsequent total lack of punishment. Reasonable people mostly don't get involved in politics there, and when they do, they can wind up in the hospital or worse.
Actually, it's not at all because of the 84 riots. Initially I used to think the same. It's amazing how some politicians use random issues to create rifts between common people, most of whom are regular people who want peace and jobs and agriculture.
More on the incident at Punjab Newsline in this article titled Prime Minister concerned about violence in Ludhiana.
Thursday, May 07, 2009
It's sometimes just amazing to me to experience the evolution of socialization and interactivity on the web. GetGlue reminds me of a project I started in 1995, back when I was a researcher at GTE Laboratories. We built a system that fostered serendipitous communication between users based on the websites they were visiting (all in a browser-based 3D VRML world built on SGI hardware). Well, VRML and SGI are basically dead, the web never did go totally 3D as we dreamed, and our project died, but it turns out we were onto something: the idea of socialization and serendipitous encounters based on browsing context lives on. GetGlue takes it to a new level, adding a prime use case around product reviews that sort of inverts the closed Facebook social network phenomenon. Check it out and semantically webify yourself if you haven't already.
Tuesday, May 05, 2009
Board members who supported real math recently but are under tremendous pressure to reverse are listed below. Let them know you are with them:
- Michael DeBell: email@example.com
- Harium Martin-Morris: firstname.lastname@example.org
- Mary Bass: email@example.com
- Sherry Carr: firstname.lastname@example.org
- Peter Maier: email@example.com
- Steve Sundquist: firstname.lastname@example.org
- Cheryl Chow: email@example.com
Here's a nice video of board member DeBell discussing in detail why the fuzzy math text is a bad idea.
And HERE the Seattle Times discusses why the fuzzy texts are a bad idea.
Finally, some additional talking points to include in your emails or use as a launch point into your own investigations. Please add in your own experience with district math programs at any grade level.
- Prentice-Hall books are solid, well-organized, and mathematically sound; computational algorithms and formulas are clearly stated and well motivated by examples and hands-on activities. These materials are family and student friendly.
- Discovering Algebra and Discovering Advanced Algebra (Key Curriculum Press) have too much verbiage, too little in the way of clearly stated mathematical principles. Definitions, computational algorithms, and formulas are vaguely stated if they are stated at all. The program does not include enough practice for mastery.
- Local and national mathematicians have expressed their written concerns about the soundness of these programs.
- Our kids should not be subject to this ongoing failed experiment.
Monday, March 23, 2009
The semantic web is, in essence, a web of understanding, where the actual meaning of web documents is made available to machines, enabling a new generation of applications to emerge. The semantic web is being realized by many companies exposing APIs that are in turn used by other companies. For example, Yahoo has a terrific API with a very liberal usage policy. Yahoo has already spent billions of dollars indexing the entire web. Companies like Hakia, and my own, Evri, have taken advantage of Yahoo's extensive web crawl and global ranking to sample the web and index only the documents our users really need; this is a significant cost savings for a start up. Let's take another example: Freebase is busy building an extensive database of things -- think of it as a very highly structured Wikipedia, where not only is there a page for something like Barack Obama, but all of the data is completely structured and machine readable. This is in turn extremely helpful for applications like Powerset and Evri, which need to be able to recognize these things in the many web documents their machines attempt to read in a human-like manner.
All of the above examples are, in essence, examples of machines talking to machines. In other words, when Hakia's indexing system leverages the Yahoo API to help determine what documents to index, I doubt a human is in the loop. In addition to machines talking to machines, there is, of course, the case of machines talking to humans. One of the hardest nuts to crack in the semantic web arena is what a truly successful user experience looks and feels like. If you think back in time, when Alan Kay invented the desktop metaphor for PCs at Xerox PARC, the idea was not completely obvious. In fact, it was more than a decade before the metaphor was popularized by Apple. One of the most powerful motivations driving API proliferation is the appeal of access to a much larger community of innovators than are available within one's own company. For example, at my company we have great product designers and a terrific UI team; I have every confidence we are on the right path toward cracking the user experience problem for a deep semantic search engine. But, however good our team is, the user interface possibilities of building on our platform are endless. Why should we force our partners and customers to use our UIs? I think a similar conclusion is quite common; many semantic companies have some significant technology in the "deeper understanding" arena that enables a great number of UI permutations. So companies like Zemanta, Daylife, Reuters (Open Calais) and others all conclude: technology aspects with individual utility should be freed so their inherent value can be realized.
And of course, "freeing" technology aspects in the form of an API enables different monetization opportunities. In this rough economic climate, it's a true advantage for any company to have diverse revenue streams. Many might not realize that Twitter's API traffic is nearly 10x that of its website. While most of us have little idea exactly what Twitter's monetization strategy looks like, I will wager a guess that making money off the API will be a core part of it. So how are companies making money off their APIs? Some are taking payment in the form of links to other already monetized pages. Many are simply charging for it. Most companies offer a certain volume of API requests free, then charge an escalated amount as a function of API request volume.
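The tiered model just described (a free allowance, then per-request rates that step with volume) can be sketched in a few lines. All of the numbers below are made up for illustration; no real provider's pricing is implied.

```python
# Toy tiered API pricing: the first FREE_TIER requests per month cost
# nothing, and the rest are billed per request at rates that step
# with volume. All numbers are invented for illustration.
FREE_TIER = 10_000
TIERS = [
    (90_000, 0.001),        # next 90k requests at $0.001 each
    (900_000, 0.0005),      # next 900k at $0.0005 each
    (float("inf"), 0.0002), # everything beyond at $0.0002 each
]

def monthly_bill(requests):
    """Dollar cost for a month with the given request count."""
    billable = max(0, requests - FREE_TIER)
    total = 0.0
    for tier_size, rate in TIERS:
        used = min(billable, tier_size)
        total += used * rate
        billable -= used
        if billable <= 0:
            break
    return round(total, 2)

print(monthly_bill(5_000))    # 0.0  (entirely within the free tier)
print(monthly_bill(150_000))  # 115.0 (90k at $0.001 + 50k at $0.0005)
```

The free tier lowers the barrier for experimentation while heavy commercial use funds the service, which is exactly the dynamic described above.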
It seems obvious to most that social networking applications played a pivotal role in the web's evolution over the past 10 years. Similarly, when we celebrate the 30th anniversary of the web, I predict that APIs exposing a deep understanding of the web's content will be unambiguously recognized as primary drivers.