Mahalo, Wikipedia, and Human-Powered Search

Screenshot of Mahalo.com

With the Web, there has been a publishing revolution. The barrier to entry has almost been eliminated. Anyone with a computer and an Internet connection can publish their thoughts or creative works. This puts a lot of crap out there. A lot of smart people are trying to figure out how to separate the wheat from the chaff… the signal from the noise.

Jason Calacanis’ idea is Mahalo. It is a search engine with human-powered result sets. He has hired a group of guides who find reputable sites that correspond to a specific search term. If the guides haven’t put together a result set for a keyword, it defaults to Google.

While this is clever and interesting, I think it is hardly revolutionary.

One of the things I like most about Wikipedia is the External Links section at the very bottom of most topic pages. For example, if you go to the Wikipedia baseball page, at the very bottom there is a large set of baseball links in the External Links section.

A lot of times I will go to a topic’s Wikipedia page just to see what external links are there, because they’re usually really reliable and are the primary sources.

Aren’t the Wikipedia external links just human-powered search result sets? Except here the links are vetted by the world and not just by Jason Calacanis’ search guides.

Human-powered search is great if you’re only playing with the fat head of search queries. What about the long tail? Mahalo doesn’t have results when I look for more obscure things.

I don’t think the future is the “wisdom of the masses” or the wisdom of a few Mahalo search guides. The future of search is what I like to call “the wisdom of your friends.” It is my buddy saying, “Hey, go check out this site.”

I know my friends. I know what their areas of expertise are and what their interests are. I know whom to trust in some areas and not in others.

I’m thinking out loud (well, in a blog). Does any of this make sense?

Yahoo! YUI Theater Hosts Web Accessibility Expert Shawn Lawton Henry. Watch The Presentation.

Photo of Shawn Lawton Henry speaking at the @Media conference in London

Yahoo! has been posting so many great videos on Web Accessibility. While in London, Web Accessibility expert and W3C staffer Shawn Lawton Henry stopped by Yahoo to talk about the Web accessibility guidelines that the World Wide Web Consortium (W3C) is working on. It’s a great talk. Check it out.

(Photo of Shawn Lawton Henry by Richard Ishida. Taken at the 2007 @media conference in London, UK.)

W3C eGov: Jeffrey C. Griffith on “Beyond Transparency: New Standards for Legislative Information Systems”

Note: These are rough notes from the W3C Workshop on eGovernment and the Web.  It is being held in Washington DC on June 18th-19th.

Legislative information systems (LIS) are the systems that legislative bodies design to make their information available to the public and to themselves.

Who makes them? This is tough. There is no CEO of Congress or Parliament. Before THOMAS there were seven stovepiped LIS systems.

Everyone uses LIS systems.  People are using the Web to find out more information about politics and their politicians.

One of the challenges is that legislative documents can be very cryptic and complex.  There are complex procedures.  There are readings without readings.  Sometimes yes means no and no means yes.

Gingrich authorized the LC to put up the THOMAS system.   The initial system was there but weak.  The public loved it.  Bills were available to the public at the same time they were open to members.

Staff wanted more. They liked some of the features of the old legacy systems.

There were some additional standards that had to be met.  The documents needed to be accurate.  The systems needed to point to the right documents.  What is timeliness?  Hours? Same Day? Next Day?  Things should be complete.  Everything that is relevant and useful should be linked.  The document should be clear and explained.   There is also context.

A comparative study was done between the US Congress and the European Parliament. There needs to be better integration of related information and documents. There is some integrated data, like CRS summaries or CBO budget estimates.

What’s exciting is when the public sector gets involved. There are things like OpenCongress.org. There is information on the bill but also links to blogs talking about it.

We need to start filling in the gaps.  We need technologists who are familiar with the legislative process who can talk with Congress when they’re aware.

There are too many impediments within the institutions so public sector use of the data will be important.

W3C eGov: Kevin Novak on “Government as a Participant in Social Networks. Adding Authority to the Conversation”

Note: These are rough notes from the W3C Workshop on eGovernment and the Web.  It is being held in Washington DC on June 18th-19th.

The world is changing. Over half of the world’s population is under 17. We need to change to accommodate these new users. More people are connected to the Web and go there often.

We need to be starting a participatory volunteerism. It’s community-based interaction, sharing, assistance, managing, and changing.

It’s about relating what you’re doing to the user. How do you allow the participation when you’re the authoritative source?

We have a very diverse user base: Scholars/Researchers, Teachers, Students, Librarians, Publishers, and Public.

In the Web 2.0 space, online libraries are really content and media companies. We must compete in the dynamic world of the Internet and strive to maintain relevance.

The Library is a 207-year-old institution. We don’t do anything quickly. The technology and the users are changing. We need to compete.

RSS is now available. The LC has launched a Meta Search and Beta Search. LC was a beta tester of the Open Site Maps Project. In the future there will be podcasts, Second Life, Flickr, tag clouds, widgets, and much more.

The blog has become very popular. Traffic has been coming into the site from people who visited the blog.

There are currently 18 RSS feeds available. By the end of the summer there will be one for THOMAS.

We will be putting images on Flickr.  We want to get the photos out there, see how people use our photos,  and see how people tag our photos.   Flickr has been very co-operative.

We want to meet users in their worlds through Second Life.

By the end of the summer there will be widgets.  One will be for Today in History.  There will be others for our thematic portals.  Hopefully it will drive traffic back to the LC site.

W3C eGov: Anil Saldhana on “Secure E-Government Portals- Building a web of trust and convenience for global citizens”

Note: These are rough notes from the W3C Workshop on eGovernment and the Web.  It is being held in Washington DC on June 18th-19th.

Just as a citizen trusts a government representative when talking to one, the citizen should be able to have the same trust when going to a government Web site.

Average Joes want to be able to use the services.  People get scammed.  We have to make sure that we provide Average Joe with secure services.

According to a report, senior citizens are the fastest-growing online audience, which will double by 2010. The US IRS Web site had 13.5 million unique visitors in March 2007.

Portals are a one stop shop for information.  Secure portals are necessary.

The different parties are the users, the browsers (technical clients), government services, software, and the communication medium.   All of these parties have to work at making things secure.

End users can be insecure and error-prone. Delegate as much responsibility as possible to technology. The W3C Security Context Working Group is trying to establish some type of visual trust context to help the user feel reasonably secure. Use SSL or SRP.

Use a federated identity that gives the user a single authentication service and access to multiple heterogeneous services. There is OpenID.

Getting buy-in into a single IT installation from various departments and organizations of a Government is difficult.

YouTube Launches 9 Localized Versions of their Site

It is truly a WORLD WIDE Web. You will get users from across the globe.

Flickr isn’t the only one realizing how much more effectively they can serve their global audience with localized versions of the site for specific areas of the world. YouTube has just launched country-specific sites for Brazil, France, Ireland, Italy, Japan, the Netherlands, Poland, Spain and the UK.

I’d love to see how these localized sites have improved the traffic that both Flickr and YouTube are getting.

W3C eGov: J.L. Needham on “Ensuring government is only one search away: Implementing the Sitemap protocol”

Note: These are rough notes from the W3C Workshop on eGovernment and the Web.  It is being held in Washington DC on June 18th-19th.

Google is working to make sure that they’re working with government and evangelizing Google’s FREE services.

Google’s biggest focus isn’t site search.  It’s Web search.  There was a recent NY Times article on search results quality.  There is another side which is crawling the pages to find all that exists.

One of the biggest barriers is that content is hidden behind a search form. There can also be a robots.txt file which tells the crawler not to crawl.

The Bureau of Alcohol, Tobacco, and Firearms prevents all search engine crawling with a robots.txt file. Google’s index doesn’t recognize the acronym ATF.

US Gov is the largest publisher of the world’s data. The data is put into databases for easy access, but unless it is correctly structured a search engine cannot crawl it.

This is important because of the value that is placed on public sector information. People trust .gov more than .com. It’s unbiased and free.
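To make the robots.txt point concrete, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The robots.txt contents and the `agency.example.gov` URL are made up for illustration; they are not the ATF's actual file. It shows how a two-line file is enough to keep every crawler away from every page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks all crawlers from the whole site.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]

# Hypothetical robots.txt that lets crawlers in (an empty Disallow allows all).
ALLOW_ALL = ["User-agent: *", "Disallow:"]


def crawler_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Made-up page on a made-up government site.
page = "https://agency.example.gov/publications/report.html"

blocked = crawler_allowed(BLOCK_ALL, "Googlebot", page)
open_site = crawler_allowed(ALLOW_ALL, "Googlebot", page)
print("block-all file lets the crawler in:", blocked)
print("allow-all file lets the crawler in:", open_site)
```

A well-behaved crawler runs a check like this before it requests a page, so a site with the first file never makes it into the index.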

Microsoft, Yahoo, Ask, and Google have come together to agree on a common SiteMap standard.   This can make all Web services accessible to search engine crawlers.  It is pretty easy to implement.

There are 4 parameters:

  • location
  • last modification date
  • change frequency
  • priority
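In the Sitemap protocol these four parameters are the `<loc>`, `<lastmod>`, `<changefreq>`, and `<priority>` elements of each `<url>` entry. Below is a rough sketch in Python of how a site might generate such a file with the standard library. The `build_sitemap` helper and the `agency.example.gov` pages are my own inventions for illustration, not anything shown in the talk.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from a list of dicts, one per page.

    Each dict must have "loc"; "lastmod", "changefreq", and
    "priority" are optional, as in the protocol.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")


# Made-up pages for a hypothetical government site.
pages = [
    {"loc": "https://agency.example.gov/", "lastmod": "2007-06-18",
     "changefreq": "weekly", "priority": 1.0},
    {"loc": "https://agency.example.gov/reports/annual", "lastmod": "2007-05-01",
     "changefreq": "yearly", "priority": 0.5},
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

For database-backed content hidden behind a search form, the list of pages could come from a query over the records, which is what lets a crawler find pages it would never reach by following links.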

PlainLanguage.gov successfully implemented the sitemaps protocol in around 8 hours. Now the site is being crawled and added to search results.  When there are changes to the site, the sitemap is updated and uploaded.

There are now partnerships with about four states.

Searching has become the de facto way for people to find public sector information.

W3C eGov: Tom Steinberg of MySociety

Note: These are rough notes from the W3C Workshop on eGovernment and the Web.  It is being held in Washington DC on June 18th-19th.

The thesis is that if you build user-focused services, good information policy follows.

It’s a small UK NGO called MySociety (mysociety.org). It builds democratic Web sites.

There is Hansard, which is the printed record of parliamentary debates. It is now online. You can link to supplementary links and definitions. You can make profiles for the members of parliament that list their voting records, committee memberships, or how active they are.

Data reuse matters.  Information matters.  The best way to get officials to understand is to show them information about themselves which they check obsessively.

FixMyStreet.com will show you what kind of problems on your street have been reported by other people. The problems will be sent to the appropriate local official. This was made with a $10,000 government grant.

There is a good relationship with the government.  You’d think otherwise.   Governments are taught about how to do this.

Tom wrote a report on Information Policy.  There were 15 recommendations.  One was the importance of information sharing and reuse.  There are great tools that are out there to do this.

Most charging regimes for public sector data were created pre-Semantic Web.

MySociety is working on making more people aware of small neighborhood e-mail mailing lists.

W3C eGov: Tim Berners-Lee on “Widescale data integration: opportunities and challenges”

Note: These are rough notes from the W3C Workshop on eGovernment and the Web.  It is being held in Washington DC on June 18th-19th.

The value you get from the Web is re-use.  You put something on the Web because other people will find it useful for reasons that you don’t know.

You’re amazed at all the things that you’ll find on the Web that crazy people put out there. You’d be amazed at all the ways your data gets re-used. Sometimes it’s for a wide look at something or sometimes it’s for a very quick look.

At the Semantic Web level, you can ask questions that go across all the different stovepipes.

When you share data, it’s more reusable. You should make the data available via accessible Web sites but also just as data. Data will be merged with other data from other places. Re-use may well outstrip the primary use.

If we want wider re-use, we have to talk to the wider community.  It takes effort.

Semantic Web is the first technology that understands that there are different ontologies between organizations.  The two ontologies can be wired together and be treated as if they’re the same.
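Here is a toy sketch of what "wiring together" two ontologies means, in plain Python rather than real Semantic Web tooling. Everything in it is invented for illustration: the two agencies, the property names, and the `merge` and `query` helpers. In real RDF data the equivalence would be stated with something like `owl:equivalentProperty` and queried with SPARQL.

```python
# Two agencies record the same kind of fact using different vocabularies.
# Each fact is a (subject, property, value) triple.
agency_a = [("bill:HR1", "a:sponsoredBy", "person:Smith")]
agency_b = [("bill:S22", "b:hasSponsor", "person:Jones")]

# The "wiring": each agency-specific property is declared equivalent
# to one shared property. Neither agency changes its own data.
EQUIVALENT = {
    "a:sponsoredBy": "shared:sponsor",
    "b:hasSponsor": "shared:sponsor",
}


def merge(*datasets):
    """Combine datasets, rewriting properties to their shared equivalent."""
    merged = []
    for triples in datasets:
        for subject, prop, value in triples:
            merged.append((subject, EQUIVALENT.get(prop, prop), value))
    return merged


def query(triples, prop):
    """Return (subject, value) pairs for every triple using prop."""
    return [(s, v) for s, p, v in triples if p == prop]


merged = merge(agency_a, agency_b)
sponsors = query(merged, "shared:sponsor")
print(sponsors)
```

One question ("who sponsors which bill?") now gets answered across both data sets, even though neither agency changed how it manages its data.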

Data Owners Should

  1. Take inventory
  2. Decide Priorities
  3. Look for existing ontologies
  4. Don’t change the way the data is managed
  5. Set up standard (RDF, SPARQL) portals onto existing data
  6. Where necessary, adapt or write new ontology bits.

We should always use open standards, regardless of whether or not it will be public.

If we use linked data, we can reference other people’s data systems and make our systems become more useful.

We also need to track where we got our data from. We need to track what the acceptable uses are and what the licenses are.

Next steps… we need to make our data Semantic Web standards compliant. There is a list on the ESW wiki of linked data. We shouldn’t upset existing systems. Don’t make ontologies unless you have to. Allow for re-use and transparency.

(Tim Berners-Lee’s slides from this talk are available online)

W3C eGov: Carol Tullo on “Unlocking the Power of Public Sector Information”

Note: These are rough notes from the W3C Workshop on eGovernment and the Web.  It is being held in Washington DC on June 18th-19th.

The goal is to give a wider policy context for the topics that will be covered over the next two days.

It’s about “unlocking the potential.”  It’s not just about the content.  It’s about the economic and social value of the information.  The value is beyond its inherent value.  It’s the value of it being used.  We need an approach that recognizes the potential.

In the UK, producing public sector information is about 40% of the GDP. Geographic information underpins a lot of economic activity.

It’s much harder to assess the social and economic value.

Ed Mayo and Tom Steinberg laid out a vision “… that citizens, consumers, and government can create, re-use and distribute information in ways that add maximum value.”

There are many complexities with working in government.  The awareness of all the potentials and values may not be there because we work in silos, departments, and agencies.

There is a lot of risk with transparency.  Be aware that unless we push the boundaries, we’ll never be able to grow. There is also a lack of incentive in government to share information.

With Web 2.0, the Web is becoming a two-way medium. There is a data aspect, communities being formed, and user-generated content. We have to understand that it is constantly evolving and accelerating.

A senior UK government official on his own initiative put some films on YouTube.  These were the first UK government films on YouTube.  They created more interest in what the Office of Public Sector Information was doing than the last 10 years of marketing work.

No matter what country you’re in, we all share the same aims.

We have an evolving landscape. We need to engage in partnerships with the user-led communities. It’s hard to find the right vehicle but that’s what experimentation is for.

How can civil servants best participate in the new media and the new focus?

The Web is allowing for the re-use of information in exciting ways. There is a synergy among lots of partners. There is a confidence and trust that no one is breaching anyone’s rights. Rights expression is a big deal. Semantic Web has huge potential to overcome some of the format problems.

The UK is working hard to not just use new technologies but to set policies in place.

There are departments whose business is to produce data. There are also agencies and departments that have data as a by-product of their activities. People need to understand how important their data is and what it can be used for.

How do we gain traction with these  new policies and strategies?

To make all of this happen, we all have to interact.

Government embracing citizens’ complex needs, encouraging information re-use and exploitation, and enabling easy-to-create, easy-to-find, easy-to-use government, parliamentary, and public sector information.