Wednesday, November 18, 2009

Discovering Content by Mining the Entity Web

I had a blast last night presenting to CS students at the University of Washington. For those who missed the talk, the video is embedded below.

Abstract: Unstructured natural language text found in blogs, news and other web content is rich with semantic relations linking entities (people, places and things). At Evri, we are building a system which automatically reads web content similar to the way humans do. The system can be thought of as an army of 7th grade grammar students armed with a really large dictionary. The dictionary, or knowledge base, consists of relatively static information mined from structured and semi-structured publicly available information repositories like Wikipedia, Crunchbase, and Amazon. This large knowledge base is in turn used by a highly distributed search and indexing infrastructure to perform a deep linguistic analysis of many millions of documents ultimately culminating in a large set of semantic relationships expressing grammatical SVO style clause level relationships. This highly expressive, exacting, and scalable index makes possible a new generation of content discovery applications.

The talk slides used in the video are available HERE. It is best to download the slides and follow along as they are difficult to see in the video. In addition, the talk is broken up into multiple parts. Part 1, is shown below. Links to all parts are as follows: Part 1, Part 2, Part 3, Part 4, Part 5, and Part 6.

No comments:

Related Posts with Thumbnails

Liked what you read? Tell your friends

More info about content in my post