Friday, July 18, 2008

The Flaming Conch Shell

My good friend Matt, now known as Mateo, dropped his life as a Seattle software geek, packed up his trailer, and drove down to the lovely beaches of Zipolite, Oaxaca. He has set up a new life down there. Occasionally I post some of his letters, with his permission of course, to remind us all that there is a world out there away from mortgages and 9 to 5 gigs. Here are some snippets from the latest in the Continued Adventures of Mateo series:

I'm swamped with work doing web design stuff. Everybody here suddenly wants websites and I'm the only person who does them. You can see my latest one at www.septimosol.com, not bad eh? The hardest part is working with the people here who have lived in a fishing village their whole lives and have, shall we say, a different aesthetic that needs to be pleased. And they want me to use the logos their best friend since fourth grade made for them, like the flaming conch shell in the Septimo Sol site which I could not persuade them to let go of. Still, it's fun working with people and making a difference in their lives and work.

Also I was thinking today this is ending up being a really good professional experience, as both a designer and programmer, and general systems administrator (because people are now starting to ask me to do all sorts of IT stuff), in kind of a roundabout way... I have been used to doing my thing and getting help in areas that I never had any experience in from other friends or co-workers. Here I'm completely on my own, so I have been forced to learn a ton of stuff that I never would have otherwise and become good at stuff I have always sucked at. Like at the moment I'm having to do a lot of research on how different search engines work so that I can get my clients' rankings improved. It looks like I'm going to end up being kind of a jack-of-all-trades computer guy down here. Tell that to Carsten when you feel like seeing a good belly laugh! Make sure he's not in the middle of drinking a beer or something.

Well, I've been thinking of you because today was another deep-should-have-been-here day. It has been raining for 3 days straight and everything is a mess. My house is full of piles of wet clothes and there is not much I can do about it. My road is washed out and I can't leave in my truck; everyone else has been hunkered down as well. Today I had to clean a nasty virus off both of my computers, and the online sync and backup service I'm using... a version of the Brontok virus. It's Indonesian but it might be hitting mostly Spanish sites at the moment because a lot of the web documentation is in Spanish. So anyway I got mostly through it all and everything backed up, and my registry cleaned up, and I decided to walk down to town to grab a piece of zucchini pie at Franco's. Jody was there, my Australian friend who traveled through here last year, never left, stayed long past her visa, lost her passport, and most recently became impregnated by an entertaining Mexican hippy called Rasta who hangs out at a local hotel and busses tables in exchange for free room, board, and [mystery substance X]. When either of them needs any cash they have to go to the owner and ask for some like he's their father or something. However he is like the coolest guy in Zipol and the place is exactly what they want, although it doesn't seem to me to be very conducive to family upbringing. Anyway, despite what it sounds like, she has a very solid head on her shoulders and is in a great space so she is great fun to talk to. It reminds you that intelligent people choose all different paths. So I spent about an hour there talking to her and my friend Paco, Franco's son, who is training to be a malabarista, which is sort of an all purpose juggler type dude. Currently he is practicing all the time with twirly gyro things that seem to stay stuck on string connecting two wands. If you have seen it you will know exactly what I'm talking about, and if not I'll never be able to explain it.
We are both supposed to be featured in a Mexican beer commercial being shot here at the end of the month; the rep showed up yesterday and took pictures of both of us without our shirts on and then scribbled down our phone numbers and raced off. Then Andrew the ex-programmer nudist, also from Australia, wandered by, and we went off to go see the lagoon, which broke today. We say the lagoon breaks when the sand barrier between the sea and the seasonal river breaks and the water in the lagoon starts rushing out into the ocean. It's very dramatic because it starts with a tiny trickle and then as it excavates sand away it turns into a raging current and then carves out a big wide (20 meter) opening, and then the water becomes kind of a lake, but an estuary really. It's fun to watch. Plus we almost got to see Armando's little beach bar completely get taken out, but alas, it only took part of the roof and a palapa hut, so we'll have to wait for the big rains in October to make the lagoon break again. Ha ha, I'm becoming Mexican, laughing at the misfortune of others.

Then I went to go buy some roast chicken to take home for dinner, and along the way I lost Paca, so I went to Jody's beach bar where Paca always goes when she runs away, because she likes hanging out with Jody. She wasn't there, but Daniele was... a thoroughly delectable Brazilian girl with whom I have struck up a purposeful acquaintance. Try to guess what the purpose is. I had to think quick on my feet and make up a plan to invite her on the following day. So tomorrow a bunch of us are going to Estacahuite to go swimming in the rain and eat Shrimp al Diablo at my favorite Shrimp al Diablo restaurant. Actually every Mexican restaurant is a Shrimp al Diablo restaurant, since that is one of the three dishes that by Zipolite law must be included on every menu, along with Filete al mojo de ajo and Filete Empanado. Anyway hopefully I will be able to snuggle up with her a bit tomorrow.

So I watched the boys play soccer on the beach for a while, and talked to my friend Pedro from Oaxaca city who was trying to steal a little time away from his in-laws on the beach, then I headed home, but I got sidetracked at Georgione's new little cafe, where my Oaxacan friend Alexis who builds cob homes was having a nice toasted focaccia sandwich. I had a bite and it was so good I had to order a couple of focaccias for myself to take home, and while you are at it, Georgione my friend, why don't you stuff one of those nice Italian salamis in the bag; say, Alexis, why don't you come over for a little glass of wine before heading off on your motorcycle? Alexis and I split most of a bottle, and he invited me to his birthday party on Friday; we are going to have sushi. I told him I wanted to bring the Brazilian, which was fine, and suggested he also invite our friend English Sarah from Mazunte, the former rock star who made her real fortune in North Carolina selling little ice creams to construction workers in tight shorts. He said, sure she's coming, but the problem is, so is her boyfriend, and her boyfriend's wife has also suddenly turned up from wherever she went and is also coming, so he's expecting fireworks. Just another day in Zipolite...


Enjoy the summer in Seattle!

Saturday, July 12, 2008

The Grammar Student's Guide to Radiohead

Below is an article that I wrote and originally published on the Evri blog. I've included it here in its entirety.

--

Here at Evri, we talk a lot about searching less. When we say searching less, we are talking about you, our users with precious time -- we want you to search less -- we aren't talking about our machines, because they do an awful lot of searching so you don't have to. So how are they, our racks and racks of computers, searching so you can understand more?

Well, it comes down to teaching our machines to read documents more like the way humans do: to understand more of the meaning of the documents they index. This is very different from what traditional keyword-based search technology does. Typical search engines, when they encounter a document, treat the document like a bag of words -- the associations between the words, and how they interconnect to form actual meaning, are lost. Consider the following text snippet from a Starpulse article:

Howard insists they won't be copying Radiohead's idea and making their disc only available on the internet. [...] He tells BBC Radio 1, "We won't be doing the same thing as Radiohead, no." [...] Last year, Radiohead released In Rainbows as an Internet download and allowed fans to name their own price for the album.

Now from this snippet of text, your favorite search engine will store this data something like:

Radiohead - 3
Howard - 1
Rainbows - 1
released - 1
Internet - 1

and so on. I'm simplifying things a lot for the sake of discussion, but basically, your favorite search engine is maintaining a list of words, and keeping track of how many times those words appear in a given document. This approach works quite well for finding websites, but not very well for discovering facts, or relationships describing how people, places and things interconnect.
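
To make the bag-of-words idea concrete, here is a minimal sketch in Python. It is not how any real search engine is implemented; the function name `bag_of_words` and the crude letters-only tokenizer are assumptions made purely for illustration. It simply counts how often each word appears in the snippet above.

```python
import re
from collections import Counter

# The Starpulse snippet quoted above, with the elided parts left out.
SNIPPET = (
    "Howard insists they won't be copying Radiohead's idea and making "
    "their disc only available on the internet. He tells BBC Radio 1, "
    "\"We won't be doing the same thing as Radiohead, no.\" Last year, "
    "Radiohead released In Rainbows as an Internet download and allowed "
    "fans to name their own price for the album."
)


def bag_of_words(text):
    """Count word occurrences; word order and grammar are thrown away.

    Hypothetical helper: tokens are runs of letters and are case-sensitive,
    so "Radiohead's" contributes one "Radiohead" (plus a stray "s").
    """
    tokens = re.findall(r"[A-Za-z]+", text)
    return Counter(tokens)


counts = bag_of_words(SNIPPET)
for word in ("Radiohead", "Howard", "Rainbows", "released", "Internet"):
    print(f"{word} - {counts[word]}")
```

Notice that nothing in `counts` records who released what; the counter only knows that the words occurred.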

Now consider how Evri's approach is different. For this same snippet of text, our machines will break the snippet out into multiple sentences. For each sentence, our machines will, in essence, diagram the sentence much as you did back in 7th grade grammar class. So, for every grammatical clause in a sentence, our system creates a data structure like that shown below.

In the last sentence of the snippet above, our system will store a relationship like:

Radiohead > released > In Rainbows
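
As a rough illustration of these two steps, the sketch below splits text into sentences and holds one subject > verb > object record for a clause. Everything here is assumed for the sake of the example: the `Relationship` class, the naive `split_sentences` helper, and the hand-written record standing in for what a real grammatical parser would produce. It is not Evri's actual code.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One clause, reduced to subject > verb > object (hypothetical layout)."""
    subject: str
    verb: str
    obj: str

    def __str__(self):
        return f"{self.subject} > {self.verb} > {self.obj}"


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


snippet = (
    "Howard insists they won't be copying Radiohead's idea and making "
    "their disc only available on the internet. "
    "Last year, Radiohead released In Rainbows as an Internet download "
    "and allowed fans to name their own price for the album."
)

sentences = split_sentences(snippet)

# Stand-in for the parser's output on the last sentence; a real system
# would derive this from the sentence's grammar rather than hard-code it.
relationship = Relationship("Radiohead", "released", "In Rainbows")

print(sentences[-1])
print(relationship)
```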

In addition, our system knows that Radiohead is a band, released is a verb, and In Rainbows is an album. If a sentence said: Radiohead of Oxfordshire may release an album called In Rainbows, our system will store Oxfordshire as the suffix modifier of Radiohead, and will store the verb release as being conditional. Knowing that a verb is conditional or negated is important, as this information can be used to determine where in a list of results this relationship should appear.

In addition, if a subsequent sentence says something like: The band's experiment proved successful, our system will know that The band refers to Radiohead; this is because our system attempts to resolve anaphora similar to the way humans do.

Finally, this triplet-style data structure is searchable at web scale and web speed by searches expressible in a query language. This query language is quite flexible, but basically allows our recommendation and information navigation applications to formulate effective queries in a precise manner. For example, a query like:

[musical_artist] OR [band] > praise > Radiohead

is being used to render the right column in the entity detail page shown in the screenshot below.

When you actually click on a person or organization, like Billy Corgan, the system will execute a more refined query like:

Billy Corgan > praise > Radiohead
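
To give a feel for how queries like the two above could be evaluated, here is a toy matcher in Python. It is only a sketch under stated assumptions: the `Entity` and `Relationship` classes, the `run_query` function, the in-memory list standing in for a web-scale index, and the ranking rule are all hypothetical, and the entry for "Example Band" is made-up sample data. The real query language is far more flexible than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    types: frozenset  # e.g. frozenset({"band"})


@dataclass(frozen=True)
class Relationship:
    subject: Entity
    verb: str
    obj: Entity
    conditional: bool = False  # "may release" rather than "released"
    negated: bool = False


def parse_slot(slot):
    """Turn 'Billy Corgan' or '[musical_artist] OR [band]' into a predicate."""
    alternatives = [part.strip() for part in slot.split(" OR ")]

    def matches(entity):
        for alt in alternatives:
            if alt.startswith("[") and alt.endswith("]"):
                if alt[1:-1] in entity.types:  # bracketed: match by type
                    return True
            elif alt == entity.name:           # bare text: match by name
                return True
        return False

    return matches


def run_query(query, index):
    """Evaluate a 'subject > verb > object' query against a list of records."""
    subject_slot, verb, object_slot = [p.strip() for p in query.split(">")]
    subject_ok, object_ok = parse_slot(subject_slot), parse_slot(object_slot)
    hits = [r for r in index
            if r.verb == verb and subject_ok(r.subject) and object_ok(r.obj)]
    # Assumed ranking: plain statements first, conditional or negated later.
    return sorted(hits, key=lambda r: (r.negated, r.conditional))


radiohead = Entity("Radiohead", frozenset({"band"}))
corgan = Entity("Billy Corgan", frozenset({"musical_artist"}))
example_band = Entity("Example Band", frozenset({"band"}))  # made-up entity
in_rainbows = Entity("In Rainbows", frozenset({"album"}))

index = [
    Relationship(radiohead, "released", in_rainbows),
    Relationship(example_band, "praise", radiohead, conditional=True),
    Relationship(corgan, "praise", radiohead),
]

broad = run_query("[musical_artist] OR [band] > praise > Radiohead", index)
refined = run_query("Billy Corgan > praise > Radiohead", index)
print([r.subject.name for r in broad])
print([r.subject.name for r in refined])
```

The broad query returns every artist or band recorded as praising Radiohead, while the refined one narrows the subject slot to a single named entity.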

One of the challenges our scientists and engineers face is how to formulate these types of queries in clever ways so you, the user, do not have to; I'll save this discussion for another day, however.

Finally, we published a book chapter last year that does a more thorough job of explaining our approach and the additional grammatical treatments our system performs. So if you're interested, see the Natural Language Processing and Text Mining book chapter titled A Case Study in Natural Language Based Web Search.