Wednesday, November 18, 2009

Discovering Content by Mining the Entity Web

I had a blast last night presenting to CS students at the University of Washington. For those who missed the talk, the video is embedded below.

Abstract: Unstructured natural language text found in blogs, news and other web content is rich with semantic relations linking entities (people, places and things). At Evri, we are building a system which automatically reads web content much the way humans do. The system can be thought of as an army of 7th grade grammar students armed with a really large dictionary. The dictionary, or knowledge base, consists of relatively static information mined from structured and semi-structured publicly available repositories like Wikipedia, Crunchbase, and Amazon. This large knowledge base is in turn used by a highly distributed search and indexing infrastructure to perform a deep linguistic analysis of many millions of documents, ultimately producing a large set of semantic relationships that express clause-level, SVO-style (subject-verb-object) grammatical relationships. This highly expressive, exacting, and scalable index makes possible a new generation of content discovery applications.



The full talk: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, and the Slides.


Monday, November 09, 2009

Automated Action Based Content Generation

I am republishing this article in its entirety. The original was published on the Evri blog HERE.

--

I'm excited to announce The Attack Machine, an exploration site available via Evri's experimental Garden, which harnesses the power of the Evri API to automatically generate content oriented around an action, or verb.

Content on The Attack Machine is automatically generated leveraging Evri's deep linguistic analysis of news, blog and other web content. The Attack Machine is an example application highlighting the power of leveraging a grammatical clause level understanding of verbs. The entire Attack Machine site could be straightforwardly mapped to other verbs, such as love, hate, punch, cherish, destroy, etc.

Content on The Attack Machine homepage is generated by an algorithm which runs every few minutes and looks for the top attackers and victims in particular categories such as animals, locations, weapons, and politicians.


Evri's system can be thought of as an army of 7th grade grammar students armed with a really large dictionary, or knowledge base. These 7th grade grammar student algorithms scour news, blogs and other web content and break every sentence down into its key grammatical parts, such as the subject, verb, and object. The attackers in The Attack Machine are, in essence, the grammatical subjects; the verb is always attack or a related verb such as kill, assault, or maim; and the victims are the grammatical objects. So, for example, if a blog post contains a simple sentence or title like "Israel attacks Hamas.", the post will appear on the attacker page for Israel and the victim page for Hamas.
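To make the subject/verb/object mapping concrete, here is a minimal sketch of how attacker and victim pages could be assembled from extracted relations. The tuple layout, the function name build_attack_pages, and the sample data are assumptions made for illustration; this is not Evri's actual implementation.

```python
from collections import defaultdict

# Verbs treated as "attack" actions; this list comes from the post itself.
ATTACK_VERBS = {"attack", "kill", "assault", "maim"}


def build_attack_pages(relations):
    """Group documents by attacker (subject) and victim (object).

    `relations` is an iterable of (subject, verb, object, document) tuples,
    a hypothetical shape chosen for this sketch.
    """
    attackers = defaultdict(list)
    victims = defaultdict(list)
    for subject, verb, obj, document in relations:
        if verb in ATTACK_VERBS:
            attackers[subject].append(document)
            victims[obj].append(document)
    return attackers, victims


# Hypothetical extracted relations.
sample_relations = [
    ("Israel", "attack", "Hamas", "post-1"),
    ("alligator", "attack", "swimmer", "post-2"),
    ("critic", "praise", "film", "post-3"),  # not an attack verb, so ignored
]

attackers, victims = build_attack_pages(sample_relations)
print(attackers["Israel"])  # documents for the Israel attacker page
print(victims["Hamas"])     # documents for the Hamas victim page
```

Swapping ATTACK_VERBS for another verb set is all it would take to retarget the same grouping to love, hate, or any other action.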

In a blog post titled Evri's Garden Sprouts Some Search I discuss in detail the underlying search mechanism our scientists and engineers use to power higher level API and application functionality. One of the key higher level API functionalities The Attack Machine leverages is an API resource called Get relations about an entity. This API resource is used, for example, to populate the center column in the individual attack pages such as this one on alligator attacks:

and more specifically, the following REST API call is used:

http://api.evri.com/v1/organism/alligator-0x397510/relations/verb/attack/?media=article&sort=date&includeMatchedLocations=true&appId=evri.com/blog

This API call allows us to automatically identify the key people, places, organizations and things involved in attacks, in addition to getting the articles which correspond to the latest attacks. Now if the user clicks on a specific person, place, or thing, the following API call is used:

http://api.evri.com/v1/organism/alligator-0x397510/relations/verb/attack/location/florida-0x3154d?media=article&sort=date&includeMatchedLocations=true&appId=evri.com/blog

So in this way, we can populate the data for all attacks by alligators in Florida.
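The two calls above differ only in the optional trailing entity, so they can be built by one small helper. The sketch below is illustrative: the function name relations_url and its parameters are my own, and only the URL shapes are taken from the examples above.

```python
from urllib.parse import urlencode

API_BASE = "http://api.evri.com/v1"


def relations_url(entity_path, verb, related_path=None, app_id="evri.com/blog"):
    """Build a 'relations about an entity' URL shaped like the examples above.

    entity_path  -- e.g. "organism/alligator-0x397510"
    verb         -- e.g. "attack"
    related_path -- optional second entity, e.g. "location/florida-0x3154d"
    """
    path = "/".join([API_BASE, entity_path.strip("/"), "relations", "verb", verb])
    if related_path:
        path += "/" + related_path.strip("/")
    else:
        path += "/"  # the verb-only example ends with a trailing slash
    params = [
        ("media", "article"),
        ("sort", "date"),
        ("includeMatchedLocations", "true"),
        ("appId", app_id),
    ]
    # safe="/" keeps the slash in the appId value unescaped, as in the examples.
    return path + "?" + urlencode(params, safe="/")


all_attacks = relations_url("organism/alligator-0x397510", "attack")
florida_attacks = relations_url(
    "organism/alligator-0x397510", "attack", "location/florida-0x3154d"
)
print(all_attacks)
print(florida_attacks)
```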

Finally, The Attack Machine leverages the Evri API to generate unique natural language content, which benefits the reader and significantly helps page SEO. For example, the first and third paragraphs in the first column of the screenshot above are formulated automatically from API output. In addition, the Evri knowledge base is leveraged to minimize human editorial contribution. For example, the second paragraph above is written by an editor and linked to one of three kinds of handle: a specific entity; a narrow category for an entity, such as animal or politician; or a higher level category, such as person, organism, or location. Simple logic is then applied at page generation time: select the natural language content from the entity handle if it exists; if not, from the narrow category handle (we have thousands of these, so one may not exist); and if not, from the higher level category handle (there are only a handful of these).
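The handle fallback described above amounts to a most-specific-first lookup. Here is a minimal sketch; the dictionary of editorial text, the handle strings for the categories, and the function name are all hypothetical and only illustrate the selection order.

```python
def select_editorial_text(entity, narrow_category, broad_category, editorial):
    """Return editor-written text for the most specific handle that has any.

    Lookup order: entity handle, then narrow category, then higher level
    category. `editorial` maps a handle to its text (hypothetical storage).
    """
    for handle in (entity, narrow_category, broad_category):
        text = editorial.get(handle)
        if text:
            return text
    return None  # nothing written at any level


# Hypothetical editorial content keyed by handle.
editorial = {
    "/organism/alligator-0x397510": "Alligators are ambush predators.",
    "animal": "Animal attacks are rare but make headlines.",
    "organism": "Living things sometimes attack one another.",
}

# An entity with its own text uses it.
alligator_text = select_editorial_text(
    "/organism/alligator-0x397510", "animal", "organism", editorial
)
# An entity without its own text falls back to its narrow category.
other_animal_text = select_editorial_text(
    "/organism/some-other-animal", "animal", "organism", editorial
)
# With no entity or narrow category text, the higher level category is used.
broad_text = select_editorial_text(
    "/organism/some-plant", "plant", "organism", editorial
)
```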

That's all for now. If you have any questions on how to use the Evri API for similar applications, or any other feedback, please let us know on our API forum.


Saturday, October 17, 2009

Kooky Coconuuts

Stumbled across this kooky yet catchy coconut video while combing for kids' Spanish content. Get your dancing shoes ready.


Thursday, August 13, 2009

Sentiment API Exposes Web’s Feelings

I recently wrote this article for Evri's blog. I'm reproducing it here in its entirety for the reading pleasure of all you ChaloBolo-ites.

--

Every minute of every day, people are expressing their sentiments and writing them down in news articles, blog posts, and other web content. Many people are too famous to write down their sentiments, but journalists, bloggers and other content creators are more than willing to document their feelings. Oftentimes a famous radio commentator will bash a politician, or a politician will thrash a Hollywood actress. And on occasion, a true act of heroism will be recognized, and all sorts of famous folk will follow up with praise. Whether depressing or uplifting, disturbing or unnerving, tapping into the sentiments of key actors on the world stage can be highly informative and engaging.

I'm excited to announce the release of our new sentiment web API which lets you build applications around the sentiments of specific entities (i.e. people, places, or things) as well as categories, or facets. Every minute of every day, Evri's systems are busy scouring the web, reading news content, blog posts and more so you don't have to. Now, Evri's system is also understanding the sentiments, or positive and negative expressions by and about entities. Many types of applications can be built using the sentiment API in areas including, but not limited to: market intelligence, market research, sports and entertainment, brand management, product reviews and more. Specifically, Evri's new sentiment API lets you:
  • Find the percentage of positive and negative expressions of sentiment made by an entity, or about an entity. For example, find out what percentage of things being written about the iPhone are positive and what percentage are negative.
  • Discover who is criticizing and who is praising a particular person, place or thing. For example, see who is criticizing and praising Microsoft right now.
  • Read what praisers and critics are saying about an entity. For example, see what the GOP are saying about the Democrats.
  • Discover who or what your favorite entity is bashing and why. For example, see who Lance Armstrong is complaining about.
  • Discover who or what your favorite entity is praising and why. For example, see who the World Health Organization is commending and why.
Now, as an exploratory exercise, or tutorial, on how to use the API, I will walk through the calls needed to make a widget called the Vibology Meter. So, imagine the widget below is externally configured to be about the entity Barack Obama corresponding to the Evri URI: /person/barack-obama-0x16f69. Upon first load, you see something that looks like this:

From the above screenshot, we can see that the percentage of positive sentiment and negative sentiment expressed by Barack Obama are displayed. We can also see the specific top entities being praised by Barack Obama in the left column, and the specific entities being criticized in the right column. For example, from the above screenshot, we see that Barack Obama is criticizing the GOP, Rush Limbaugh, the ACLU, Al Zawahiri, and Israel. In order to render the screenshot above, this sentiment summary information is returned by the following REST API resource call:

/v1/sentiment/summary?sentimentSource=/person/barack-obama-0x16f69&includeSummaryDetails=true&sort=date

Now, consider the use case outlined in the screenshot below, where the user clicks on [Anything] under the positive vibes sentiment. In order to get the results outlined to the right of the positive and negative sentiment columns, we execute a resource request like:

/v1/sentiment/about?sentimentSource=/person/barack-obama-0x16f69&sentimentType=positive&sort=date

From this request URI, we see that the sentimentSource references Barack Obama, meaning we are interested in vibes or sentiment expressed by Obama, as opposed to about him. Next we see the sentimentType is set to positive, meaning we are interested in positive sentiment expressions. Finally, we see sort=date meaning we are interested in the latest results.
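As a rough illustration, both request URIs shown so far can be assembled from their parameters. The helper name sentiment_path is an assumption of mine; the resource names and parameters are the ones used in the requests above.

```python
from urllib.parse import urlencode


def sentiment_path(resource, **params):
    """Build a sentiment request path such as /v1/sentiment/summary?...

    Keyword order is preserved, so parameters appear as they were passed.
    """
    # safe="/" leaves the slashes inside Evri URIs unescaped.
    return f"/v1/sentiment/{resource}?{urlencode(params, safe='/')}"


OBAMA = "/person/barack-obama-0x16f69"

# Summary of sentiment expressed by Obama (first screenshot).
summary_request = sentiment_path(
    "summary", sentimentSource=OBAMA, includeSummaryDetails="true", sort="date"
)

# Latest positive sentiment expressed by Obama about anything.
positive_request = sentiment_path(
    "about", sentimentSource=OBAMA, sentimentType="positive", sort="date"
)

print(summary_request)
print(positive_request)
```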

Also from the screenshot below, we see the results of this resource request, namely, the specific snippet from the article, as well as a time stamp, the article title, and a link off to the source article. From the snippet, we see the sentence stating that "the president commended..." -- the Evri system recognizes "the president" to be the source of the vibes, or sentiment, and commendation to be the prime justification for his positive sentiment expression.

And finally, we consider the case illustrated below, where the "Receiving vibes" tab is selected and the user chooses Rush Limbaugh as the particular source of negative sentiment. In this case, we execute this resource request:

/v1/sentiment/about?entityURI=/person/barack-obama-0x16f69&sentimentType=negative&sentimentSource=/person/rush-limbaugh-0x1ebf5&sort=date

From this request URI, we see that the entityURI references Barack Obama, meaning the returned sentiment is about Barack Obama. We can also see that the sentimentType is set to negative, meaning returned sentiment expressions will be negative in nature. We also see that the sentimentSource references Rush Limbaugh. The URI referencing Limbaugh was obtained from the sentiment summary results of the request shown above in reference to the first screenshot.
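The reading of the request URI above can also be done mechanically. This sketch parses the query string and reports who the sentiment is about, who expressed it, and its polarity; the function name and the keys of the returned dictionary are my own labels, not part of the API.

```python
from urllib.parse import parse_qs, urlsplit


def describe_sentiment_request(request_uri):
    """Pull the direction and polarity out of a sentiment request URI."""
    query = parse_qs(urlsplit(request_uri).query)

    def first(name):
        # parse_qs returns a list per parameter; take the first value if any.
        return query.get(name, [None])[0]

    return {
        "about": first("entityURI"),         # entity receiving the sentiment
        "source": first("sentimentSource"),  # entity expressing the sentiment
        "type": first("sentimentType"),      # positive or negative
        "sort": first("sort"),
    }


receiving_vibes = describe_sentiment_request(
    "/v1/sentiment/about?entityURI=/person/barack-obama-0x16f69"
    "&sentimentType=negative"
    "&sentimentSource=/person/rush-limbaugh-0x1ebf5&sort=date"
)
print(receiving_vibes)
```

When entityURI is absent, as in the earlier "expressed by" requests, the "about" entry is simply None.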

That pretty much sums up our walkthrough of the sentiment API. For complete documentation on this new API resource, see the Get sentiment information section of our REST API Specification. And finally, if you have any comments on how we can make this API better, questions on how to get things to work, or examples of bugs, please let us know on our developer forum.


Wednesday, May 13, 2009

Swords, Daggers and Punjabi Politics

A friend of mine, a Seattle resident who recently relocated to India, is involved in the national elections there. While the western press typically presents Indian elections as a relatively pastoral exercise in democratic self-governance, here are some snippets from our recent back-and-forth via email that show otherwise.

Friend:

Hope you are well. It's been a while since we chatted. I just came back from a short campaigning trip to Chandigarh last week. The Punjab elections are still going on (voting is the day after tomorrow).

Me:

Wow, campaigning in Punjab sounds exciting or scary and probably hot.

Friend:

Scary is an understatement. One of my friends, a really good guy, a professional, educated lawyer in the Supreme Court, who is contesting from one of the parliamentary constituencies there got attacked, his mother got beaten, his local president of the Congress party and his colleagues got stabbed with daggers and swords and are admitted in hospital. All by the Akali Dal guys ... they are out of control! We're asking for central police force for additional protection on polling day.

Me:

Unbelievable. Sorry to hear about your friends. Yeah, politics in Punjab can definitely be crazy. I think people haven't forgiven the Congress party for the 84 riots and subsequent total lack of punishment. Reasonable people mostly don't get involved in politics there, and when they do, they can wind up in the hospital or worse.

Friend:

Actually it's not at all because of the 84 riots. Initially I used to think the same. It's amazing how some politicians use random issues to create rifts between common people, most of who are regular people who want peace and jobs and agriculture.

More on the incident at Punjab Newsline in this article titled Prime Minister concerned about violence in Ludhiana.


Thursday, May 07, 2009

Semantic Web Meetup @ Evri HQ a Hoot

Thanks all for a terrific first Seattle Semantically Webbed meetup. Great food, drinks, and a chance to find out more about all things semantic. Special thanks to Alex Iskold and the rest of the AdaptiveBlue gang for coming out and giving a great presentation on GetGlue.


It's sometimes just amazing to me to experience the evolution of socialization and interactivity on the web. GetGlue reminds me of a project I started back in 1995, when I was a researcher at GTE Laboratories. We built a system that fostered serendipitous communication between users based on the websites they were visiting (all in a browser-based, SGI-built 3D VRML world). Well, VRML and SGI are basically dead, the web never did go totally 3D as we dreamed, and our project died, but it turns out we were onto something: the idea of socialization and serendipitous encounters based on browsing context lives on. GetGlue takes it to a new level, adding a prime use case around product reviews that sort of inverts the closed Facebook social network phenomenon. Check it out and semantically webify yourself if you haven't already.


Tuesday, May 05, 2009

Stop Fuzzy Math in Seattle

The Seattle School Board is voting on Wednesday to introduce a completely discredited and mandatory fuzzy math curriculum called the "Discovery Series" citywide. Please act now to stop them. Send an email to each board member listed below and let them know how you feel. If you don't know what to write, simply state: "No fuzzy math" or something similar in the subject, and "Please support our kids' right to compete on the world technology stage by choosing real math and not fuzzy math. Please choose the Prentice Hall textbooks." or something similar in the body, along with which school your child is in if you have a child.

Board members who supported real math recently but are under tremendous pressure to reverse are listed below. Let them know you are with them:
  • Michael DeBell: michael.debell@seattleschools.org
  • Harium Martin-Morris: harium.martin-morris@seattleschools.org
  • Mary Bass: mary.bass@seattleschools.org
Board members who let our kids down and voted for fuzzy math and the onslaught of mediocrity are shown below. Let them know you disapprove:
  • Sherry Carr: sherry.carr@seattleschools.org
  • Peter Maier: peter.maier@seattleschools.org
  • Steve Sundquist: steve.sundquist@seattleschools.org
Absent from the last vote, but said to be in favor of the fuzzy “Discovery” books is:
  • Cheryl Chow: cheryl.chow@seattleschools.org
Please spread the word to everyone you know in Seattle! Feel free to email this message, copy it, etc.

Here's a nice video of board member DeBell discussing in detail why the fuzzy math text is a bad idea.



And HERE the Seattle Times discusses why the fuzzy texts are a bad idea.

Finally, some additional talking points to include in your emails or use as a launch point into your own investigations. Please add in your own experience with district math programs at any grade level.
  • Prentice-Hall books are solid, well-organized, and mathematically sound. Computational algorithms and formulas are clearly stated and well motivated by examples and hands-on activities. These materials are family and student friendly.
  • Discovering Algebra and Discovering Advanced Algebra (Key Curriculum Press) have too much verbiage and too little in the way of clearly stated mathematical principles. Definitions, computational algorithms, and formulas are vaguely stated, if they are stated at all. The program does not include enough practice for mastery.
  • Local and national mathematicians have expressed their written concerns about the soundness of these programs.
  • Our kids should not be subject to this ongoing failed experiment.

