Archive for the ‘wikipedia’ Category

MIT Handbook of Collective Intelligence opens up

Wednesday, September 5th, 2007

Since last year I have been contributing to an MIT project that attempts to create a comprehensive Handbook of Collective Intelligence. The project was initiated by the newly created MIT Center for Collective Intelligence (CCI). It is quite logical that the managers of this project decided to use a collective intelligence technique to describe collective intelligence itself. Collective intelligence techniques that process natural language (e.g. human-based computation) are ideally suited for this purpose and have been used successfully to describe themselves in the past. In 1998 I used a human-based genetic algorithm to evolve a short description for the website implementing it. Wikipedia has had an evolving page describing itself since at least 2002. Dr. Terence Fogarty used a human-based genetic algorithm to evolve another name for itself (“Automated Concept Evolution”) in 2003. A more recent example is Assignment Zero by Jay Rosen and collaborators, a successful experiment in using collective intelligence and crowdsourcing to report on crowdsourcing itself. The MIT project, which has the same purpose, may be even more ambitious than the previous projects, but it can’t be called a success so far.

I was initially surprised that the MIT Handbook of Collective Intelligence used SocialText wiki software rather than MediaWiki (especially since Jimmy Wales is on the advisory board of the CCI). I found SocialText less convenient to work with than MediaWiki (though I am biased here, as I had been a Wikipedia contributor for several years before I tried SocialText for the first time). Content accumulated in the Handbook rather slowly. Researchers had to request an account by email to contribute to the Handbook. In addition, it is often suggested that researchers are reluctant to contribute content to wikis because the academic system pressures them to submit their writing to traditional peer review and to avoid venues that are not officially recognized as “peer-reviewed.” I contributed the majority of the content to the page on Examples of Collective Intelligence in February. Others have edited this page very little since then. Other pages were not frequently updated either. It seems that the page on Examples of Collective Intelligence was the most visited page of the Handbook, with 10885 views at the time of writing. The only page I could find with a larger number of views is the main page (now missing), with 16619 views.

This summer, the Handbook of Collective Intelligence team moved the Handbook from SocialText to the scripts.mit.edu domain, changed the software from SocialText to MediaWiki and, more importantly, opened the Handbook to public contributions as Wikipedia does, i.e. anyone can now register, or edit the content of the Handbook without registering. Apparently it was the lack of progress that motivated opening up the project for anyone to contribute. People at MIT must have thought that their project would share the success of Wikipedia once it opened itself up to anyone’s contributions. However, the reality so far doesn’t seem to support this. A lot of new content is indeed being contributed, and a lot of progress can be seen in the list of recent changes (at the time of writing it looks like this). My first impression was that the Handbook had gone international. I had a hard time finding anything related to collective intelligence in this list, though many irrelevant pages are created every day in different languages. The majority of the newly created pages seem to be in Simplified Chinese. In a random sample of three pages, all were in Chinese and had no relation to collective intelligence whatsoever.

This brings us back to the topic I discussed in my previous post, Bugs of collective intelligence: why the best ideas aren’t selected? The common failures of collective intelligence clearly suggest that it is not a phenomenon that emerges automatically once someone sets up a shared space like a wiki and brings it to the attention of many people. Making these systems work requires an understanding of their dynamics, and this is especially true of wikis. There is still serious research to be done on the factors that make different collective intelligence methods effective. That is beyond the scope of this post, but here I want to give some hints as to why some wiki-based projects perform poorly.

The main weakness of wikis as a collective intelligence platform is their weak selection mechanism. This may lead to what is known as genetic drift. Selection in current wikis is strongly biased towards the most recently contributed content (“the last edit wins”). For a wiki-based project to work, the community has to have enough people who put effort into overcoming this temporal selection bias built into the software. Those people should be motivated enough to go into the revision history, revert unhelpful changes, and select better versions of the content (the software doesn’t encourage the ordinary user to do this). They also have to check the recent changes to delete obvious spam pages. Deletion is necessary in wikis because there is no other way to focus people’s attention on important pages (such as the importance sampling used in human-based genetic algorithms). Any wiki-based project pretty much depends on its community to overcome the bias present in its software. The MIT CCI project so far hasn’t created a community that effectively performs these functions.

Update: I found it curious that the license under which the content of the Handbook is published prohibits editing it (link). It is the Creative Commons Attribution-NonCommercial-NoDerivs 2.5 license, which allows no derivative works, while any edit creates a derivative work. The license explicitly says “You may not alter, transform, or build upon this work,” and yet the work is provided in the editable form of a wiki. On the other hand, the same license requires attribution, and yet when the content I contributed was copied, presumably by MIT CCI employees, to the mit.edu domain, the attribution information was stripped, so no attribution is given to me or any of the other contributors. Hopefully, whoever is responsible for this project will fix this, because currently there are too many contradictions. Meanwhile, I would recommend Wikipedia as a better organized resource on the topic of collective intelligence and, importantly, a working example of the concept.

Social websites and personality

Saturday, March 10th, 2007

I made a curious observation today that the psychological concept of personality may be useful in characterizing social websites. For example, a website can be introvertive or extravertive. As in psychology, these are not absolute categories, but rather an indication of a bias toward one end or the other.

An introvertive social website draws its users’ attention towards its local content, while an extravertive social website draws their attention outwards, towards content on other sites. Two examples that illustrate these are 3form and StumbleUpon (SU), respectively. Both implement essentially the same technique, human-based evolutionary computation. This technique allows people to contribute items to a database, draw random samples from the population of items, and evaluate the sampled items. The software computes a fitness function from those evaluations and uses it in later sampling. However, 3form and SU use this technique in remarkably different ways. 3form samples the content of its own database and provides an easy way to socially bookmark, evaluate, and comment on it. However, it is less easy to bookmark or comment on an external resource: you have to cut and paste its link into a web form, and not many people bother to do it. This makes the 3form community rather introspective and focused on content found locally rather than on resources found elsewhere. StumbleUpon, by contrast, samples from a database containing primarily external resources. It naturally directs the user’s attention to the world outside of SU. SU makes it very easy to bookmark and evaluate any external resource with a single click. This is not true, however, of the local resources found on SU’s own site. When I started using SU, I initially thought that SU blogs, unlike most blogs, don’t support commenting. Then I found that it is possible to comment on a post, but it is not as easy or intuitive as commenting on external resources. You first need to find the permalink to the post you want to comment on (shown as the date of the post) and click on it to open the post on a page of its own; then you can use the normal SU buttons to evaluate and comment on it. Not many people make the effort to go this way, so most posts on SU blogs remain without comments.
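The contribute, sample, and evaluate loop described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the code of 3form or StumbleUpon: the class name `ItemPool`, the 0 to 5 rating scale, and the smoothed-average fitness are all assumptions made for the example.

```python
import random


class ItemPool:
    """Toy pool of user-contributed items with fitness-weighted sampling.

    Hypothetical illustration only; names and formulas are assumptions.
    """

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._ratings = {}  # item -> list of ratings given by users

    def contribute(self, item):
        """A user adds a new item to the population."""
        self._ratings.setdefault(item, [])

    def evaluate(self, item, rating):
        """A user rates a sampled item on an assumed 0..5 scale."""
        if not 0 <= rating <= 5:
            raise ValueError("rating must be between 0 and 5")
        self._ratings[item].append(rating)

    def fitness(self, item, prior=1.0):
        """Smoothed average rating; unrated items get the prior value,
        so new contributions still have a chance of being shown."""
        votes = self._ratings[item]
        return (prior + sum(votes)) / (1 + len(votes))

    def sample(self, k=1):
        """Draw k items at random, weighted by their current fitness."""
        items = list(self._ratings)
        weights = [self.fitness(item) for item in items]
        return self._rng.choices(items, weights=weights, k=k)
```

In this sketch the only difference between an introvertive and an extravertive site would be what the items are: entries stored locally, or links to pages elsewhere on the web.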

It might be pure coincidence, but it is nevertheless interesting that in this case the personality of a website reflects the personality of its architect. My MBTI profile is INTJ (introvertive) and that of StumbleUpon chief architect and CEO Garrett Camp is ENTP (extravertive).

What about other websites?

Wikipedia has always been mildly introvertive. It has always been easier to link to an internal page than to create an external link. In addition, Wikipedia culture discourages the creation of external links. Recently, Wikipedia has become more clearly introvertive by making you solve a CAPTCHA when you try to contribute a link to an external resource or even fix a broken link. This will undoubtedly decrease the number of external references in Wikipedia.

Del.icio.us and most social bookmarking tools are extravertive: their primary purpose is to direct attention to other sites. I am quite curious whether their creators are extraverts as well.

Digg seems to be pretty balanced in this respect. It requires a lot of effort from any user because of its many CAPTCHAs, but commenting on an internal post and submitting a new story with an external reference involve about the same amount of effort.

Was Wikipedia innovation entirely social?

Thursday, February 8th, 2007

Jimmy Wales, a founder of Wikipedia, suggests in his recent talks that Wikipedia is not a technological innovation but a purely social one:

When Wikipedia was started in 2001, all of its technology and software elements had been around since 1995. Its innovation was entirely social - free licensing of content, neutral point of view, and total openness to participants, especially new ones. The core engine of Wikipedia, as a result, is “a community of thoughtful users, a few hundred volunteers who know each other and work to guarantee the quality and integrity of the work.”

In his view, Wikipedia is not an emergent phenomenon of the wisdom of crowds, in which thousands of independent individuals each contribute a bit of their knowledge, but instead a relatively small, well-connected community, pretty much like any traditional organization, e.g. the one that created Encyclopedia Britannica. Even taking into account that he is a founder of Wikipedia, I am still quite skeptical about this explanation. In my opinion, it is insufficient to explain the phenomenon of Wikipedia. It also disagrees with my own experience as a Wikipedia contributor. I started to contribute in 2003 and registered in 2004, and yet I don’t know other Wikipedians personally and have rarely thought of Wikipedia as a social network, even though it definitely can support one. Reading Aaron Swartz’s post Who writes Wikipedia made me even more skeptical.

I know that it is quite natural for entrepreneurs to focus on organizational aspects, because that is what they deal with most of the time, just as it is common for technologists to focus mainly on technology. I am not arguing that Jimmy Wales’s point of view is wrong, but I am suggesting that it might be incomplete. I believe we don’t need to choose between the emergent-phenomenon and core-community points of view. They are not mutually exclusive, so Wikipedia can be (and, in my opinion, is) an example of both.

Jimmy suggests that Wikipedia’s technology and software had been around since 1995. I didn’t find any support for this. If the technology was there in 1995, why did it take so long for large wiki-based collaborative projects to appear? I did some quick research into the history of wiki technology. It suggests that Wikipedia had no chance of succeeding with the technology that existed in 1995. The elements that enabled large participatory organizations like Wikipedia were added to wiki software six years later, at approximately the same time the Wikipedia project was launched.

Early wikis lacked two important features: revision history and support for concurrent editing. These two features are crucial to the success of any mass collaboration project using a wiki.

I discovered wikis quite late, in the summer of 2002. I quickly grasped the potential of this simple and brilliant collaboration tool by Ward Cunningham: a site with web pages that anyone can edit with very little effort. I saw it as a web extension of CVS, a revision control system that allows programmers to work on the same codebase concurrently. However, as I started to explore the potential advantages of wikis, I found that the implementation I was using had a serious limitation. Indeed, anyone could edit a page, unless it was currently being edited by someone else. If I wanted to edit a page that someone else was editing at that moment, a warning appeared saying that the page was locked. The lock was advisory, meaning I could still go ahead and edit, disregarding the message. However, in that case either my work or the other person’s would be lost. Waiting for the lock to be released quickly becomes annoying as more people start collaborating. My conclusion then was that TWiki software wasn’t ready to support collaboration among large groups of people. I searched for an implementation without this limitation but didn’t find one at that time. I even added a note to my TODO list to write wiki software that used CVS instead of RCS so that it could support concurrent editing (RCS and CVS are both revision control systems, but CVS is newer and allows lock-less concurrent editing). Later, however, I found software that provided a means of concurrent editing. This was MediaWiki, and it was the first wiki I saw that really could support mass collaboration.
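To show what lock-less editing means in practice, here is a minimal Python sketch of one common approach: every save states which revision it was based on, and a save based on a stale revision is rejected instead of silently overwriting someone else’s work. This is a simplified illustration under my own assumptions, not the actual implementation of MediaWiki or CVS (real systems typically also try to merge the two edits); the names `Page` and `EditConflict` are hypothetical.

```python
class EditConflict(Exception):
    """Raised when a save is based on an outdated revision."""


class Page:
    """Toy wiki page with lock-less conflict detection (illustrative only)."""

    def __init__(self, text=""):
        self.text = text
        self.revision = 0

    def open_for_edit(self):
        """Nobody is locked out: any number of users may start editing."""
        return self.revision, self.text

    def save(self, base_revision, new_text):
        """Accept the edit only if the page has not changed since it was opened."""
        if base_revision != self.revision:
            raise EditConflict(
                f"page is at revision {self.revision}, "
                f"edit was based on revision {base_revision}"
            )
        self.text = new_text
        self.revision += 1
        return self.revision
```

With an advisory lock, the second of two simultaneous editors either waits or overwrites the first. Here both can start right away, and the second one to save is told about the conflict and can redo the edit on top of the newer text, so nobody’s work is lost without notice.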

Another feature that was crucial to the success of Wikipedia is revision history, which provides a mechanism for reverting unhelpful changes. It was not present in the original wikis. In fact, according to Landmark changes to the Wiki, it was added in 2002. Prior to this, another mechanism (Edit Copy) was used, providing a single, editable backup copy of every page. Edit Copy was clearly insufficient to protect content from vandalism, as it is too easy for vandals to edit both the working and the backup copy of a page. However, according to the Internet Archive, Wikipedia already had revision history on August 8, 2001 (see View other revisions). At that time Wikipedia used the UseModWiki software written by Clifford Adams. Again according to the archive, UseModWiki got its revision history somewhere between December 9, 2000 and February 1, 2001, which nearly coincides with the launch of the Wikipedia project (January 15, 2001).
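The difference between the two mechanisms is easy to see in a small Python sketch. Both classes below are hypothetical models written for this post under my reading of the descriptions above; they are not the code of the original wiki or of UseModWiki.

```python
class EditCopyPage:
    """Assumed model of Edit Copy: one working copy and one editable backup."""

    def __init__(self, text):
        self.working = text
        self.backup = text

    def edit(self, new_text, target="working"):
        """Either copy can be overwritten by anyone."""
        if target not in ("working", "backup"):
            raise ValueError("target must be 'working' or 'backup'")
        setattr(self, target, new_text)


class VersionedPage:
    """Assumed model of revision history: edits append, nothing is overwritten."""

    def __init__(self, text):
        self.history = [text]

    @property
    def current(self):
        return self.history[-1]

    def edit(self, new_text):
        self.history.append(new_text)

    def revert_to(self, index):
        """Restore an old revision by appending a copy of it, so the
        unhelpful edits stay visible in the record."""
        self.history.append(self.history[index])
```

A vandal needs only two edits to destroy an `EditCopyPage` for good, one for each copy. On a `VersionedPage` the same two edits are undone by a single `revert_to(0)`.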

Jimmy Wales might be right in suggesting that Wikipedia was a social rather than technological innovation, but the technology he refers to was not there in 1995. The features that made Wikipedia possible were added to UseModWiki at approximately the same time Wikipedia was launched and began to use UseModWiki. It might be a lucky coincidence for Wikipedia, or those might be new features of UseModWiki requested by the founders of Wikipedia. Maybe some of them can comment on this post.