Category Archives: Search

Google’s Matt Cutts Explains Why Your Web Site Images Should Have Alternative (Alt) Text

Google’s rockstar search engineer and public figure Matt Cutts recently posted a great video that gives an overview of how to add alternative (alt) text to the images in your HTML web page, how it helps Google, and how it improves accessibility for people with disabilities.

Check out the video and then pass it around to your friends.

Google Web History Could Power Its Social Search

I’ve really enjoyed the conversation that has taken place around Robert Scoble’s videos on, “Why Mahalo, TechMeme, and Facebook are going to kick Google’s butt in four years.”

The basic idea is that the Web has gotten too big and Google doesn’t have the power to sift the crap out of its search engine.  Scoble thinks that with things like Mahalo’s human-powered search and Facebook, we’ll be able to better understand what sites we should trust and what sites our friends trust.

While this is interesting, haven’t we forgotten about Google Web History?  Maybe no one else uses this, but it allows Google to keep a history of the sites that you go to, which in turn helps to customize your personal search.

The argument could be made that just the act of surfing the Web shows an implicit endorsement of some pages and not of others.  If Google knows what pages you click and how long you stay there, it can also understand what pages resonate with you.  Google doesn’t need systems which give more explicit endorsements of Web pages, like Yahoo has with Del.icio.us.
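
Here is a minimal sketch of that idea, just to make it concrete. Everything in it is an assumption for illustration: the `visits` log, the `implicit_scores` function, and the `min_dwell` cutoff are hypothetical, and this is not how Google Web History actually works.

```python
from collections import defaultdict

# Hypothetical browsing log: one (url, seconds spent on the page) pair per visit.
visits = [
    ("example.com/baseball", 120),
    ("example.com/baseball", 45),
    ("example.org/spam", 3),
]


def implicit_scores(visits, min_dwell=10):
    """Sum dwell time per URL, ignoring visits shorter than min_dwell seconds.

    The cutoff is an assumed way to treat a quick bounce as "no endorsement".
    """
    scores = defaultdict(int)
    for url, seconds in visits:
        if seconds >= min_dwell:
            scores[url] += seconds
    return dict(scores)


scores = implicit_scores(visits)
```

A page you keep returning to and lingering on ends up with a high score, while a page you bounced off of gets none, all without you ever bookmarking or rating anything.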

What do you think?

Mahalo, Wikipedia, and Human-Powered Search

With the Web, there has been a publishing revolution. The barrier to entry has almost been eliminated. Anyone with a computer and an Internet connection can publish their thoughts or creative works. This creates a lot of crap out there. A lot of smart people are trying to figure out how to separate the wheat from the chaff… the signal from the noise.

Jason Calacanis’ idea is Mahalo. It is a search engine with human-powered result sets. He has hired a group of guides who find reputable sites that correspond to a specific search term. If the guides haven’t put together a result set for that keyword, it defaults to Google.
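
The lookup-then-fallback behavior can be sketched in a few lines. This is only an illustration under stated assumptions: the `curated` table, the `algorithmic_search` stand-in, and the `search` function are hypothetical names, not Mahalo’s actual code.

```python
# Hypothetical hand-built result sets, keyed by normalized search term.
curated = {
    "baseball": ["example.com/baseball-guide", "example.org/baseball-stats"],
}


def algorithmic_search(term):
    """Stand-in for the fallback engine; a real system would query it here."""
    return [f"algorithmic results for {term!r}"]


def search(term):
    """Return (source, results): the guides' set if one exists, else the fallback."""
    key = term.strip().lower()
    if key in curated:
        return "guide", curated[key]
    return "fallback", algorithmic_search(key)
```

Calling `search("Baseball")` hits the curated set, while something obscure falls straight through to the fallback engine.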

While this is clever and interesting, I think this is hardly revolutionary.

One of the biggest things I like about Wikipedia is the External Links section at the very bottom of most topic pages. For example, if you go to the Wikipedia baseball page, at the very bottom there is a large set of baseball links in the External Links section.

A lot of times I will go to a topic’s Wikipedia page just to see what external links are there, because they’re usually really reliable and are the primary sources.

Aren’t the Wikipedia external links just human-powered search result sets? Except here the links are vetted by the world and not just Jason Calacanis’ search guides.

Human-powered search is great if you’re only playing with the fat head of search queries. What about the long tail? Mahalo doesn’t have results when I look for more obscure things.

I don’t think the future is the “wisdom of the masses” or the wisdom of a few Mahalo search guides. The future of search is what I like to call “the wisdom of your friends.” It is my buddy saying, “Hey, go check out this site.”

I know my friends. I know what their areas of expertise are and what their interests are. I know whom I trust in some areas and not in others.

I’m thinking out loud (well in a blog). Does any of this make sense?

WWW2007: Bradley Horowitz on “The Changing Face of Web Search”

Note: These are rough notes from the WWW2007 conference.

Yahoo’s mission is “to connect people to their passions, their communities, and the world’s knowledge.”  There are concentric circles of creation… theirs, ours, and mine.

We often call people “users.”  This is often pejorative.  Users use drugs or abuse people.  Need to eliminate the pejorative uses.

There is a pyramid of creators (starters), synthesizers (contribute once provoked), and consumers (lurkers).  The Web 2.0 phenomenon is to melt down the pyramid.   Anyone with a blank is now a blank.

There is always the dark side of UGC.  How do you differentiate the signal vs. noise?

Within Flickr, they built an algorithm to derive interestingness. It provokes interaction and response from users.  At acquisition, you could only sort photos in Flickr by recently taken.  Now there is interestingness.  It’s determined by the organic behavior around a photo: viewing it, commenting on it, tagging it, and favoriting it.  Instead of asking users to rate photos, they looked at the organic behavior, which is less susceptible to manipulation.
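
To make the idea concrete, here is a rough sketch of scoring photos from organic activity. The talk did not describe the real formula, so the `WEIGHTS` values, the `PhotoActivity` class, and the sample numbers are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PhotoActivity:
    views: int
    comments: int
    tags: int
    favorites: int


# Assumed weights: rarer, more deliberate actions count for more.
WEIGHTS = {"views": 1, "comments": 5, "tags": 3, "favorites": 10}


def interestingness(photo):
    """Weighted sum of the organic signals recorded for one photo."""
    return sum(weight * getattr(photo, name) for name, weight in WEIGHTS.items())


# Hypothetical sample data.
photos = {
    "sunset": PhotoActivity(views=200, comments=4, tags=6, favorites=12),
    "blurry": PhotoActivity(views=50, comments=0, tags=1, favorites=0),
}

# Most interesting first.
ranked = sorted(photos, key=lambda name: interestingness(photos[name]), reverse=True)
```

Nobody is asked to rate anything here; the ranking falls out of what people were already doing with the photos.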

Flickr has turned users into taggers.  Tagging was started by graffiti artists who put up their art quickly.  Folksonomical tagging is good because it’s quick and dirty.  This is interesting because computer vision is hard.  Your users are your computer vision.  They can tell you what’s in the photo.

There are machine tags.  You can tag a photo with the upcoming event or the geolocation.
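
Machine tags follow a `namespace:predicate=value` pattern, so they are easy to pull apart in code. The parser below is a small sketch; the function name, the regular expression, and the sample tag values are my own assumptions, not something shown in the talk.

```python
import re

# namespace:predicate=value, where namespace and predicate are simple identifiers.
MACHINE_TAG = re.compile(r"([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)")


def parse_machine_tag(tag):
    """Return (namespace, predicate, value), or None for an ordinary tag."""
    match = MACHINE_TAG.fullmatch(tag)
    return match.groups() if match else None


# Hypothetical tags on one photo: two machine tags and one plain tag.
tags = ["upcoming:event=12345", "geo:lat=37.77", "sunset"]
parsed = [parse_machine_tag(tag) for tag in tags]
```

Plain tags like `sunset` come back as `None`, so the same tag list can mix free-form and machine-readable tags.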

You can visualize tags based on where they’re being tagged.  Yahoo has TagMap.  You can also restrict it to time.

Tag clusters are a way to give you more precision.

Users can be more than just contributors.  Users can be distributors.  Flickr encourages people to use Flickr photos on non-Flickr pages.  The blogosphere does that a lot.

Users can also be developers.  There is an API.

There is a community of people who rose up that just take photos of single letters.  They have conversations about the shapes of type faces and characters.

You can turn users into neighbors.  Only a very few people actually leave comments.  MyBlogLog allows you to see who are the other people that are within your blog community.

For a while, we’ve been pushing HTML on the users.  Now we have RSS.  How can we bring the different types of data together?  CraigsList knows apartment openings.  Yahoo! Local knows where the parks are.  Yahoo! Pipes is an interactive application which allows you to bring the different systems together.
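
The kind of mashup described here boils down to joining two feeds on a shared field. The sketch below is plain Python, not Yahoo! Pipes itself, and the feed contents, field names, and `join_by_neighborhood` function are invented for illustration.

```python
# Hypothetical items from an apartment-listings feed.
apartments = [
    {"title": "2BR with balcony", "neighborhood": "Northside"},
    {"title": "Studio", "neighborhood": "Harbor"},
]

# Hypothetical items from a local-listings feed of parks.
parks = [
    {"name": "Elm Street Park", "neighborhood": "Northside"},
]


def join_by_neighborhood(apartments, parks):
    """Attach the names of parks in the same neighborhood to each apartment."""
    parks_by_area = {}
    for park in parks:
        parks_by_area.setdefault(park["neighborhood"], []).append(park["name"])
    return [
        {**apartment, "parks": parks_by_area.get(apartment["neighborhood"], [])}
        for apartment in apartments
    ]


merged = join_by_neighborhood(apartments, parks)
```

Each listing now carries the parks near it, which neither feed could tell you on its own.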

People + Algorithms > Algorithms

  • Phase 1 – human editorial
  • Phase 2 – mass automation
  • Phase 3 – topological analysis
  • Phase 4 – social search

We’re moving towards social search.  We’re going to democratize the process of “voting.”

A great example of social search is Yahoo! Answers.  There is a person who has a question.  You have a large body of users who can tell you where to go or what the answer is.

Del.icio.us is the bookmark system in the cloud.  It is a personal memory.  You can create a network of people.  You can search to just see what the people within your network tagged.
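
Restricting a tag search to your own network is a simple filter. This is only a sketch with made-up data: the `bookmarks` list, the `network` set, and `search_network` are hypothetical and do not reflect the real Del.icio.us API.

```python
# Hypothetical bookmarks saved by different people.
bookmarks = [
    {"user": "alice", "url": "example.com/search-paper", "tags": {"search", "research"}},
    {"user": "bob", "url": "example.org/recipes", "tags": {"cooking"}},
    {"user": "carol", "url": "example.net/social-search", "tags": {"search"}},
]

# The people I have added to my network.
network = {"alice", "bob"}


def search_network(bookmarks, network, tag):
    """Return URLs tagged with `tag` by someone in my network."""
    return [
        bookmark["url"]
        for bookmark in bookmarks
        if bookmark["user"] in network and tag in bookmark["tags"]
    ]


results = search_network(bookmarks, network, "search")
```

Carol’s bookmark matches the tag but is left out because she is not in the network, which is the whole point of searching through people you chose.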

It’s all part of the process of moving towards 100% creators, 100% synthesizers, and 100% consumers.

Self-Expression, Search, and Life Context

After reading Danah Boyd’s “Blogging Outloud: Shifts in Public Voice”, I got to thinking. I really wonder why the context of the creator’s life work isn’t taken into consideration more when a user is searching for something on the Web.

Before the Web, a person’s various writings, creations, and expressions were individual separate items. Now with the Web, we live our lives online. Our lives are on display through the user-generated content and the online social networks that we participate in.

Our various self-expressions don’t have to be taken as disparate items; they can be looked at in the context of the creator’s whole body of work.

How much more could a search engine learn about a Web site if it was seen alongside everything else that its author has created?

Would a search engine be able to understand how well thought out an issue is if it understood the other times that an author thought about an issue? It could show the difference between a fleeting thought and something that a user has been researching and musing about for a while.

Could the content of a web page have subliminal meaning that would only be understood if it was judged in relationship to all the other works that went around it?

You don’t ever have individual thoughts. My thoughts build upon other thoughts, which build upon other thoughts. My blog isn’t just a conversation with my readers; it displays an evolution of myself. It is a conversation between me and history.

With technologies like OpenID (a single online identity), we can tie together expressive works across multiple disparate systems. I have written short stories using Ficlets but written blog posts using WordPress. With OpenID, the short stories and blog posts can be tied together to the same author.

NOTE: I dunno… this has just been rolling around in my head. I have probably been drinking too much coffee. If this post doesn’t make any sense, humor me.