Soft Sciences Mentioned But Not Very Present At WWW2007

One thing that was pretty interesting about WWW2007 is that both Sir Tim Berners-Lee and Prabhakar Raghavan mentioned in their keynotes the need for more of the soft sciences to study the World Wide Web.  They talked about how we need to better understand our users and the problems that our engineering solutions solve and create.

The interesting thing is that the soft sciences were very underrepresented in the WWW2007 conference program.  I heard a lot of great talks, but they were all about addressing some technical issue, showcasing a new technique, or demoing a new product.

What about the economics, psychology, and sociology of the Web?

Maybe we’ll see more of the non-technical side of the Web at the WWW2008 conference in Beijing. I gotta start working on the paper that I’ll present.

Little Mention of Second Life at WWW2007

Well, my stay in Banff has come to a close. I really enjoyed WWW2007 and I dearly hope that I can go to WWW2008 in Beijing.

One thing that surprised me a bit was that there was very little mention of Second Life or Virtual Worlds. Is Second Life to be seen as too much of a game or just software?

Second Life is a genuine way for people to experience information and knowledge in a connected, collaborative environment. It is the Web. Second Life is just the Web in a three-dimensional form rather than the Web of documents that we all deal with in our browsers.

I expect that in the future you’ll see more crossover between the Web that we experience in the browser and the Web that we experience in Second Life. You can already get audio, video, and RSS content.

I guess we’ll see what shows up in Beijing in 2008. Maybe I’ll have to submit a paper.

WWW2007: Bradley Horowitz on “The Changing Face of Web Search”

Note: These are rough notes from the WWW2007 conference.

Yahoo’s mission is “to connect people to their passions, their communities, and the world’s knowledge.”  There are concentric circles of creation… theirs, ours, and mine.

We often call people “users.”  This is often pejorative.  Users use drugs or abuse people.  Need to eliminate the pejorative uses.

There is a pyramid of creators (starters), synthesizers (contribute once provoked), and consumers (lurkers).  The Web 2.0 phenomenon is to melt down the pyramid.   Anyone with a blank is now a blank.

There is always the dark side of UGC.  How do you differentiate the signal vs. noise?

Within Flickr, they built an algorithm to derive interestingness.  It provokes interaction and response from users.  At acquisition, you could only sort photos in Flickr by most recently taken.  Now there is interestingness.  It’s determined by the organic behavior around a photo: viewing, commenting, tagging, and favoriting it.  Instead of asking users to rate photos, they looked at the organic behavior, which is less susceptible to gaming.
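
As a rough illustration of ranking by organic behavior instead of explicit ratings, here is a minimal Python sketch. The field names and weights are assumptions made up for this example; the talk did not describe how Flickr actually computes interestingness.

```python
from dataclasses import dataclass


@dataclass
class PhotoActivity:
    """Implicit signals collected for one photo (hypothetical fields)."""
    photo_id: str
    views: int
    comments: int
    tags: int
    favorites: int


# Hypothetical weights: rarer, more deliberate actions count for more.
WEIGHTS = {"views": 1.0, "comments": 5.0, "tags": 3.0, "favorites": 10.0}


def interestingness(p: PhotoActivity) -> float:
    """Weighted sum of implicit signals; an illustrative stand-in only."""
    return (
        WEIGHTS["views"] * p.views
        + WEIGHTS["comments"] * p.comments
        + WEIGHTS["tags"] * p.tags
        + WEIGHTS["favorites"] * p.favorites
    )


def rank_by_interestingness(photos):
    """Return photos sorted from most to least interesting."""
    return sorted(photos, key=interestingness, reverse=True)


photos = [
    PhotoActivity("sunset", views=200, comments=2, tags=4, favorites=1),
    PhotoActivity("street", views=50, comments=12, tags=6, favorites=15),
]
ranked_ids = [p.photo_id for p in rank_by_interestingness(photos)]
```

In this made-up data the less-viewed photo ranks first because it drew more comments and favorites, which is the sense in which behavior, not a star rating, drives the order.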

Flickr has turned users into taggers.  Tagging was started by graffiti artists who put up their art quickly.  Folksonomical tagging is good because it’s quick and dirty.  This is interesting because computer vision is hard.  Your users are your computer vision.  They can tell you what’s in the photo.

There are machine tags.  You can tag a photo with the upcoming event or the geolocation.
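
To make machine tags a little more concrete, here is a small Python sketch that parses one. It assumes the `namespace:predicate=value` syntax that Flickr uses for machine tags; the regular expression is a simplification, and the sample values (the event ID and the latitude) are made up.

```python
import re
from typing import NamedTuple, Optional


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


# Simplified pattern for namespace:predicate=value, e.g. "geo:lat=51.18".
_MACHINE_TAG = re.compile(
    r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$"
)


def parse_machine_tag(raw: str) -> Optional[MachineTag]:
    """Return the parsed tag, or None if the text is an ordinary tag."""
    match = _MACHINE_TAG.match(raw.strip())
    if match is None:
        return None
    return MachineTag(*match.groups())


event_tag = parse_machine_tag("upcoming:event=12345")  # made-up event ID
geo_tag = parse_machine_tag("geo:lat=51.18")           # made-up latitude
plain_tag = parse_machine_tag("sunset")                # ordinary tag
```

Ordinary tags such as "sunset" fall through as `None`, so the same tag list can mix free-form labels with structured ones.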

You can visualize tags based on where they’re being tagged.  Yahoo has TagMap.  You can also restrict it to time.

Tag clusters are a way to give you more precision.

Users can be more than just contributors.  Users can be distributors.  Flickr encourages people to use Flickr photos on non-Flickr pages.  The blogosphere does that a lot.

Users can also be developers.  There is an API.

There is a community of people who rose up that just take photos of single letters.  They have conversations about the shapes of type faces and characters.

You can turn users into neighbors.  Only a very few people actually leave comments.  MyBlogLog allows you to see who are the other people that are within your blog community.

For a while, we’ve been pushing HTML on the users.  Now we have RSS.  How can we bring the different types of data together?  CraigsList knows apartment openings.  Yahoo! Local knows where the parks are.  Yahoo! Pipes is an interactive application which allows you to bring the different systems together.

People + Algorithms > Algorithms

  • Phase 1 – human editorial
  • Phase 2 – mass automation
  • Phase 3 – topological analysis
  • Phase 4 – social search

We’re moving towards social search.  We’re going to democratize the process of “voting.”

A great example of social search is Yahoo! Answers.  There is a person who has a question.  You have a large body of users who can tell you where to go or what the answer is.

Del.icio.us is the bookmark system in the cloud.  It is a personal memory.  You can create a network of people.  You can search to just see what the people within your network tagged.

It’s all the process of moving towards 100% creators, 100% synthesizers, and 100% consumers.

WWW2007: Prabhakar Raghavan on “Web N.0: What sciences will it take?”

Note: These are rough notes from the WWW2007 conference.

What sciences get us to the next stage of the Web?

There is content in various forms: editorial (newspapers), free (stream of consciousness), and commercial.  There is an audience that consumes all the different flavors of content.  They also enhance the content.  They help to filter out the content.  They purchase the content.

In the middle, you have the corporations. All the corporations are built on two pursuits: we build the audience and then we monetize it.

We face the challenge of search.  There are the algorithmic results and there are the paid, monetized results.

The premise is that people don’t want to search.  People just want to get tasks done.  We spend a lot of time searching but computers spend half a second solving your problems.

Search has become task centric.  It’s about understanding your intent.  When you are searching for Papa Johns, are you looking to invest or are you looking for a pizza?

The grand challenge is to build general platforms for task centric needs.  Generality is critical.

Community computing…

For User Generated Content, there are 5-10 Gb created per day.  There is also user generated metadata, like anchor text, tags, page views, and reviews.  We need to start taking advantage of non-anchor-text metadata.

START Meta Data

  • Star – bookmarking
  • Tags – label for retrieval
  • Access – viewing a page
  • Routing
  • Text

Within Flickr, you have a community of photo users.  People tag their photos.  People tag other people’s photos.  This is the wisdom of the crowds helping with search.

How do you use these tags better?  How do you cope with spam?  What’s the rating and reputation system?  What are the incentive mechanisms?

The ESP Game is a great example of incentivizing tagging.  People feel like they’re winning something.  At the same time, there is metadata that can be harvested.

There are a lot of good QnA systems.

What assignment of incentives leads to good user behavior? What’s good user behavior?  Whom do you trust and why?

Online media experiences…

We’re building audiences and monetizing them.  Yahoo! is a media company that is built on technology.  They’re there to serve media to users.

We need a science of online audience engagement.  It’s not just people interacting with their computers.  It’s people interacting with other people.

We need to understand things like:

  • Why do people lurk or participate?
  • Why do people create new online personas?
  • Why are YouTube, MySpace, and Flickr successful?
  • What new genres are emerging and what can we provoke?

There are different dimensions of experience:

  • duration – short to long
  • ephemerality – forgotten to remembered
  • social context – alone to with others

Second Life is long in duration, remembered, and something you do with other people.

Audience engagement is measured by the number of page views.  With Ajax, page views weren’t as good a measure.  People were just staying on the same page.  They weren’t refreshing.

What are other ways to measure user engagement?  There are different levels. We need metrics for each of those levels.

Microeconomics meets CS…

On the right of most search results you’ll see a ranked list of advertisements.  Classically, slots that are higher up will get more clicks.

Three problems

  • Match ads to query/context
  • Order the ads
  • Pricing on a click-through

The first two have to do with IR.  The last two also have to do with economics.  For ordering, GoTo/Overture ordered the ads by bid.  What has become popular is revenue ordering.  What gets clicked on a lot goes to the top and what doesn’t get clicked on a lot falls to the bottom.
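
The difference between the two orderings can be sketched in a few lines of Python. This assumes that "revenue ordering" means ranking by bid times estimated click-through rate, which is a common reading of the term but is not spelled out in these notes; the ads and their numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float  # amount the advertiser pays per click
    ctr: float  # estimated click-through rate, between 0 and 1


def order_by_bid(ads):
    """Highest bid first, regardless of how often the ad gets clicked."""
    return sorted(ads, key=lambda ad: ad.bid, reverse=True)


def order_by_expected_revenue(ads):
    """Highest bid * CTR first (assumed meaning of revenue ordering)."""
    return sorted(ads, key=lambda ad: ad.bid * ad.ctr, reverse=True)


# Invented example data.
ads = [
    Ad("A", bid=2.00, ctr=0.01),
    Ad("B", bid=1.00, ctr=0.05),
    Ad("C", bid=1.50, ctr=0.02),
]

by_bid = [ad.name for ad in order_by_bid(ads)]
by_revenue = [ad.name for ad in order_by_expected_revenue(ads)]
```

With these numbers the top bidder drops to last place under revenue ordering, because a low bid that gets clicked often is expected to earn more per impression.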

Monetization and economic value are an intrinsic part of the system.

WWW2007: Arun Ranganathan on “Enriching the Web Application Model”

Note: These are rough notes from the WWW2007 conference.

There are lots of different widget technologies.  Content is getting more modular.  The W3C is working on the Widget Recommendation.  Right now there is bad fragmentation among widget specs.

In the future, Web applications will be built out of proprietary formats like Microsoft’s Silverlight and Adobe Apollo. Vendors aren’t coming to the table to do this type of standardization.

There is work on Web APIs.  They are codifying XMLHttpRequest.  They are working on a Clipboard API, FileUpload, DOM3 Events, Selectors APIs, and a Network API.

WWW2007: Bert Bos on “CSS, 10 Years After”

Note: These are rough notes from the WWW2007 conference.

CSS Level 2 will become a Candidate Recommendation next week.

The future of the Web page is mobile.  There are 2 mobile profiles.  The new W3C CSS Mobile Profile will be the same as OMA’s.

With Print CSS, the goal is to give users the control over the Web page to do the same things you’d normally do with a print document.

The future of CSS will allow for a grid layout in the design.

CSS is becoming more internationalized.  There are things like vertical text and right-to-left text.

WWW2007: Dave Raggett on “Next steps for HTML Forms”

Note: These are rough notes from the WWW2007 conference.

Now is a good chance to fix mistakes of the Web.

What do people use forms for?  What are the new things?  We need to enable people.

Now to author a web page, you have to be an expert in a lot of things.  What about authoring for normal people?

Today we don’t have an easy way to do required fields, validity tests, simple sums, or dynamic HTML.  Scripting with regular expressions is too difficult.

In 1998, there was the idea of moving towards XML for forms.  This led to XForms.  It used MVC, the data was XML, and it used declarative form logic.  It has been too radical for browser vendors.
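
One way to picture declarative form logic is rules written as data instead of script. The Python sketch below is only an analogy under that assumption: the form description, rule names, and error messages are hypothetical and are not XForms syntax.

```python
import re

# Hypothetical declarative description of a form: rules are data, not script.
ORDER_FORM = {
    "email": {"required": True, "pattern": r"[^@\s]+@[^@\s]+\.[^@\s]+"},
    "quantity": {"required": True, "type": int, "min": 1},
    "unit_price": {"required": True, "type": float, "min": 0},
    "note": {"required": False},
}


def validate(form, values):
    """Return a dict mapping each invalid field name to an error message."""
    errors = {}
    for name, rule in form.items():
        raw = values.get(name, "").strip()
        if not raw:
            if rule.get("required"):
                errors[name] = "required"
            continue
        if "pattern" in rule and not re.fullmatch(rule["pattern"], raw):
            errors[name] = "invalid format"
            continue
        if "type" in rule:
            try:
                number = rule["type"](raw)
            except ValueError:
                errors[name] = "not a number"
                continue
            if "min" in rule and number < rule["min"]:
                errors[name] = "too small"
    return errors


def line_total(values):
    """A computed field in the spirit of 'simple sums': quantity * price."""
    return int(values["quantity"]) * float(values["unit_price"])


good = {"email": "a@b.com", "quantity": "2", "unit_price": "3.5"}
bad = {"email": "nope", "quantity": "0", "unit_price": "x"}
good_errors = validate(ORDER_FORM, good)
bad_errors = validate(ORDER_FORM, bad)
```

The point of the design is that an author only edits the `ORDER_FORM` table; the generic `validate` function never changes.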

There is a new proposal for Web Forms 2.0.  It’s based around graceful degradation.  There are rich controls and many data types.

Most people aren’t Web programmers.  There is the need for high level authoring tools.

HTML is a huge success.  Scripting is powerful but too difficult.  We need incremental improvements.

Join the HTML WG.