Entries tagged with “wikipedia” from O'Reilly Radar
Four short links: 11 August 2009
by Nat Torkington | @gnat | comments: 0
- The Slowing Growth of Wikipedia and More Details of Changing Editor Resistance -- researchers at PARC analysed Wikipedia and found the number of new articles and number of new editors have flattened off, and more edits from first-time contributors are being reverted. This is a writeup in their blog, with the numbers and charts. It's interesting that coverage in New Scientist talked about "quality", but none of the metrics PARC studied are actually quality. Wikipedia launched a strategic review which aims to tackle this and many other issues. (via ACM TechNews)
- The Information Architecture of Social Experience Design: Five Principles, Five Anti-Patterns and 96 Patterns (in Three Buckets) -- teaser for upcoming O'Reilly book with some really good stuff. Balzac once wrote, “The secret of great wealth with no obvious source is some forgotten crime, forgotten because it was done neatly,” and many successful social sites today founded themselves on an original sin, perhaps a spammy viral invitation model or unapproved abuse of new users' address books. Some companies never lived down the taint and others seem to have passed some unspoken statute of limitations. (via BoingBoing)
- Skulpt -- entirely in-browser implementation of Python. (via Andy Baio)
- Why Can't Local Government and Open Source Be Friends? -- the Birmingham example is one of many. Government procurement and tendering processes are often fishing expeditions, which biases responses in favour of commercial software companies making mad margins such that they can respond to RFPs that are really RFIs, etc. It's an issue everywhere in the world because it happens at local, not just central, level.
tags: book related, government, open source, python, research, social software, web, wikipedia
Patrick Collison Puts the Squeeze on Wikipedia
How to Cram the Wikipedia onto an 8GB iPhone
by James Turner | comments: 9
You may also download this file. Running time: 15:13
Think about Wikipedia, what some consider the most complete general survey of human knowledge we have at the moment. Now imagine squeezing it down to fit comfortably on an 8GB iPhone. Sound daunting? Well, that's just what Patrick Collison's Encyclopedia iPhone application does. App Store purchasers of Collison's open source application can browse and search the full text of Wikipedia when stuck in a plane, or trapped in the middle of nowhere (or, as defined by AT&T coverage...). Collison will be presenting a talk on how he did it at OSCON, O'Reilly's Open Source Convention, at the end of July, and he spent some time talking to me about it recently.
James Turner: Why don't you start by talking about your background a bit and how you got involved with working with the Wikipedia?
Patrick Collison: I guess I've always been pretty interested in Wikipedia, and I ran my own MediaWiki installations back when I was in school in Ireland. We had our own personal ones and all of the rest. Then in November of 2007, I went to visit my friend in Japan for a month. And in Japan they have all of this incredibly advanced cellular technology and all of the rest. And so because of that, they had very few wireless networks, and my phone didn't work. As a result, I actually had very little access to the Internet. I sort of realized without Wikipedia how little I really knew. And I had just got an iPhone, so I decided to try basically putting a copy of Wikipedia on the phone, so that I'd have it as I was walking around in Japan. Then basically, I spent a significant fraction of my time there in Japan, again, in 2007 writing those applications, say maybe two or three weeks, just firstly trying to decide if it was possible and putting it all together. And then it was released, I think, January of 2008.
James Turner: Now you've also worked on getting it onto the OLPC I understand. How did that occur?
Patrick Collison: I actually didn't do much of the work for this. It was actually a project led by Chris Ball who works both with FreeBSD and with the OLPC project. But I released the code to this application; it was open source from the very start. So it was pretty easy for them to take it and to port it to the OLPC. I mean there are already some applications that allowed you to put a copy of Wikipedia on your computer or something like that, but none had really been optimized for embedded or low power devices or anything like that, which obviously Wikipedia for the iPhone had to be. I think it took about two or three weeks to take the code that ran on the iPhone and then to bring it to the point where it'd run on the OLPC.
James Turner: There are obvious benefits to having Wikipedia on the OLPC, because connectivity is very important in some of those areas. So you'd want to have it local, but outside of the experience that you were just describing, isn't the point of the iPhone that you can just access Wikipedia? What are kind of the advantages of having it locally?
Patrick Collison: I actually find that you spend, or I certainly spend a surprising amount of my time without access to the internet, even with the iPhone. Say for start if you were abroad, I mean everyone knows the horror stories of the data charges AT&T will issue you with if you're roaming. But also just stuff like personally, I find that on a plane or something you have eight hours to not do much. And so I actually end up doing a lot of my Wikipedia browsing there. But even aside from connectivity issues, it actually turns out to be quite a bit faster to use the built-in, cached Wikipedia application as opposed to the website. I mean you can search in real-time with the applications. You just type a couple of characters and tap into your article, rather than firing up Safari or searching for the article in Google; then zooming in so you can tap in, et cetera, et cetera. I and most of the people I know who use the application actually end up using it even when they have internet connectivity. And maybe 20 percent of the time it's pretty useful because it's the only choice.
James Turner: Now just as a point of interest, is this an App Store app or do you have to have a jail-broken phone for it?
Patrick Collison: It was released back when only the jail-broken SDK existed. It was in that initial sort of surge of early applications. I guess the first jail-broken iPhone app, I think, happened in August, and so this was released just under six months later. And then when Apple announced the SDK, I actually originally did not intend to port it to the App store, just because I was just working on other things at the time and my company had just been bought and so it seemed like a lot of work. But then over the summer, I started getting a huge amount of email from people who had upgraded to the new version of the iPhone OS, and were now missing Wikipedia. And I started getting 20 or so emails from people per day saying they love this application and they were really missing it. Or even people saying they were continuing to use the old version of the OS just for this application. And they really hoped that I would port it so they could eventually upgrade. After receiving these emails for a while, I eventually felt too bad about not porting it. So I spent a couple of days porting it and then released it in the App Store. I wrote it and finished the port in August. And then it took about three months to wade through Apple's approval process. Around the end of October, it was released in the App Store.
tags: interviews, iphone, open source, oscon, wikipedia
Twitter is Not a Conversational Platform
by Mark Drapeau | @cheeky_geeky | comments: 54
Perhaps the most common reason given for joining the microsharing site Twitter is "participating in the conversation" or some version of that. I myself am guilty of using this explanation. But is Twitter truly a conversational platform? Here I argue that the underlying mechanics of Twitter more closely resemble the knowledge co-creation seen in wikis than the dynamics seen with conversational tools like instant messaging and interactions within online social networks.
Wikis are casually thought of as platforms for "collaborative" document creation. But on Wikipedia, while many people share knowledge to co-create pages, the process is not formally collaborative in the sense that contributors are not cooperating with each other in ways that form group identity (to paraphrase Clay Shirky from his book Here Comes Everybody). To the contrary, passionate experts write the majority of text, and a long tail of other contributors offer relatively few, small edits. Many users contribute nothing. Through this process, Wikipedia pages often become valuable repositories of knowledge.
Brian Solis recently posited the dichotomy of whether Twitter is a conversational or broadcast platform. New data bears on this. According to a Harvard Business School study, about 10% of Twitter users contribute roughly 90% of its content. Anecdotally, these 10% are subject-matter experts, passionates, mavens, and thought leaders who break news, write strong opinions, and tell jokes. Like on Wikipedia, most users merely read this information, and a modest number of people in the long tail use the information in the form of re-tweets, comments, corrections, and alternative opinions or links.
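A figure like "10% of users contribute 90% of the content" is a statement about how concentrated contributions are. As a rough sketch of how such a share could be computed from per-user tweet counts, the snippet below uses made-up numbers; the function name `top_share` and the sample data are hypothetical and are not taken from the Harvard Business School study.

```python
def top_share(tweet_counts, top_fraction=0.10):
    """Return the fraction of all tweets written by the most active users."""
    if not tweet_counts:
        raise ValueError("need at least one user")
    ranked = sorted(tweet_counts, reverse=True)
    # Always include at least one user in the "top" group.
    top_n = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0


# Hypothetical sample of 20 users: 2 heavy posters, 8 light posters,
# and 10 accounts that never tweet.
sample_counts = [500, 400] + [40, 10, 10, 10, 10, 10, 10, 10] + [0] * 10
share = top_share(sample_counts)
print(f"Top 10% of users wrote {share:.0%} of tweets")
```

The same function works for any cut-off, so the long tail can be examined by passing a different `top_fraction`.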
So while an individual user may use Twitter primarily as a conversational tool or a broadcast medium, in its totality, Twitter operates a lot like a wiki: as a knowledge-sharing, co-creation platform that produces content and allows its consumption. Conversation is perhaps the most simple and obvious form of collaboration, but would anyone claim that Wikipedia is a conversational platform? Despite the presence of information sharing, co-creation of an end product, and even discussion pages, Wikipedians on the whole aren't having conversations.
According to this argument, Twitter is no more a conversational platform than Wikipedia is. But is it a social networking platform? New HBS data also bear on this to some extent: men have 15% more followers than women and are twice as likely to follow another man as a woman. Authors Bill Heil and Mikolaj Piskorski state: "On a typical online social network, most of the activity is focused around women - men follow content produced by women they do and do not know, and women follow content produced by women they know. Generally, men receive comparatively little attention from other men or from women."
As in the case of the conversational platform, it seems that Twitter is also no more a social network than Wikipedia is. Wikis have user accounts and discussion pages, and it is possible for relationships to form. Twitter has user handles and direct messaging, and relationships can form. But social relationships on Wikipedia and Twitter are not a prerequisite for satisfaction and success (inasmuch as that can be defined). For instance, the popular and useful account @BreakingNews has hundreds of thousands of followers but participates in effectively zero engagement. There are many Twitter users who contribute large amounts of useful information and engage in relatively little conversation. And it is not common for people to describe Wikipedia as a social network.
Andrew McAfee notes that two useful Twitter traits are its asynchronous and asymmetric nature. These two traits are also critical to Wikipedia, but importantly much less so within popular social networking platforms like Facebook and MySpace. Thus, entities that are clearly social networking platforms can be but are not necessarily knowledge co-creation platforms, and entities that are clearly asynchronous knowledge co-creation platforms can be but are not necessarily social networks.
If microsharing tools resemble wikis more than conversational tools and social networks, this has huge implications for how people and organizations approach use of this emerging technology. Solis suggests, I think rightly, that "sometimes it's effective to...maintain a presence simply by reading, listening, and sharing relevant and timely information without having to directly respond to each and every tweet." The strategy of being a "lethally generous" member of a community would seem to be more worthwhile in this context, contrasted with the individual-level customer service approach of (for example) @ComcastCares.
This framework for thinking about microsharing platforms as knowledge co-creation enablers also puts Nielsen's recent data on Twitter's "user retention and loyalty" in a new light. When the average user is a consumer of the content produced by subject-matter experts and passionate mavens, how much does it matter if the majority of use is infrequent spectating (particularly when the information is archived for asynchronous retrieval)? As Shirky recently noted in his talk at the IAC/ACT Management of Change Conference that I attended in Norfolk, VA, such an imbalance of contribution is not a condition of failure for the platform or its users.
Finally, if microsharing is equated with knowledge co-creation, rules for attribution become an important consideration. But while the wiki attribution process has generally been worked out, attribution on Twitter is like the wild west - there are no rules, only conventions that are commonly accepted in some circles but not others. In addition, it is relatively easy to cheat the system, hard to catch someone doing it, and difficult to determine what the consequences of such behavior are. This problem will be a lasting one, requiring careful consideration by not only the user community, but also Twitter itself.
tags: emerging tech, twitter, web 2.0, wikipedia
Wikirank: A Zeitgeist for Wikipedia
by Brady Forrest | @brady | comments: 6
Wikipedia is one of the most significant sites on the web. It produces vast quantities of data and the Wikimedia Foundation tries to make all of it available to the public. Wikipedia's traffic data can offer insight into what's interesting on the web. Wikirank, currently in closed beta, shares that information very cleanly.
On its homepage Wikirank shows which Wikipedia articles are the most read and which pages are gaining in popularity. Additionally, you can find each article's detail page via search. On the detail page you can find an article excerpt, traffic numbers and a (soon-to-be-embeddable) traffic chart that allows you to compare traffic with other topics (up to four).
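To make "gaining in popularity" concrete, here is a minimal sketch of one way to rank pages by relative growth in views between two periods. It is only an illustration: the post does not say how Wikirank computes its rankings, and the function name `biggest_gainers` and the page-view numbers are invented for the example.

```python
def biggest_gainers(previous, current, limit=4):
    """Rank pages by relative growth in views between two periods."""
    gains = []
    for title, now in current.items():
        before = previous.get(title, 0)
        if before == 0:
            # No baseline to compare against, so skip rather than divide by zero.
            continue
        gains.append((title, (now - before) / before))
    gains.sort(key=lambda item: item[1], reverse=True)
    return gains[:limit]


# Hypothetical weekly page-view counts.
last_week = {"Python": 1000, "Ruby": 800, "Tokyo": 500}
this_week = {"Python": 1500, "Ruby": 760, "Tokyo": 1000, "Perl": 50}

ranked = biggest_gainers(last_week, this_week)
for title, growth in ranked:
    print(f"{title}: {growth:+.0%}")
```

Ranking by relative rather than absolute growth keeps very large pages from dominating the list, at the cost of making low-traffic pages look volatile.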
Wikirank (@wikirank) was produced by Small Batch Inc. The design was done by Jeff Veen, most recently of Google Analytics and previously of Measure Map and Adaptive Path.
Update: In a comment Veen said: Second, the UI wasn't designed just by me, but was a group effort that included the rest of Small Batch's cofounders: Bryan Mason, Greg Veen and Ryan Carver. We also were fortunate enough to work with the very talented Dan Cederholm from Simplebits.
In an email, Veen told me that the charts were built without Flash. It's all JavaScript using the HTML Canvas element. The data is being processed in EC2 and stored on S3. Tokyo Cabinet is being used to manage the data store.
With a service like Google Trends available you might wonder why this is significant. Wikipedia only has one page for the Python or Ruby programming languages, whereas there are a lot of other Rubies or Pythons (or George or Paul for that matter) that dirty the data for the same query on Google Trends. As an added bonus Wikirank will report on Google properties (unlike Google Trends).
You can sign up to be notified of their launch. If you don't want to wait for Wikirank to go live you could bide your time with some of these alternatives. Wikirage tracks which Wikipedia pages are being edited the most -- a good way of judging recent news or controversy. Wikichecker will produce a summary of edits for a page such as Tokyo (the page includes an intriguing "Frequent users also edit these articles" which is an unusual path to potentially similar articles). Wikitrends shows the most popular Wikipedia pages in fourteen languages.
Wikirank is a testament to good, clean design and the power of existing web tools. It's the first project from Small Batch, but it won't be the last. I expect that their other projects will also focus on data visualization.
Jeff Veen will be keynoting at the Web 2.0 Expo SF on 4/3 and speaking at Ignite SF on 4/1.
tags: web 2.0, wikipedia, wikirank
Wikipedia and RNA Biology
by Nat Torkington | @gnat | comments: 11
I love the RNA Biology journal's new guidelines for submissions, which state that you must submit a Wikipedia article on your research on RNA families before the journal will publish your scholarly article on it:
This track will primarily publish articles describing either: (1) substantial updates and reviews of existing RNA families or (2) novel RNA families based on computational and/or experimental results for which little evolutionary analysis has been published. These articles must be accompanied by STOCKHOLM formatted alignments, including a consensus secondary structure or structures and a corresponding Wikipedia article. Publication in the track will require a short manuscript, a high quality Stockholm alignment and at least one Wikipedia article; Each centered around the RNA in question.
As my source for this points out, the publishers of RNA Biology already synchronise a database with Wikipedia. Apparently there's a core of scientists who do most of the edits, but also a lot of other scientists who pop in sporadically to fix or add information.
Kudos to the publishers of RNA Biology for doing something imaginative to increase the commons. Journals wield a huge amount of power in the scientific world, and it's wonderful to see them using that power to incentivize good.
tags: nature, publishing, science, wikipedia
Network Effects in Data
by Tim O'Reilly | @timoreilly | comments: 12
Nick Carr's difficulty in understanding my argument that cloud computing is likely to end up a low-margin business unless companies find some way to harness the network effects that are the heart of Web 2.0 made me realize that I use the term "network effects" somewhat differently, and not in the simplistic way many people understand it.
Here's Nick:
Let's stop here, and take a look at the big kahuna on the Net, Google, which O'Reilly lists as the first example of a business that has grown to dominance thanks to the network effect. Is the network effect really the main engine fueling Google's dominance of the search market? I would argue that it certainly is not....
Ah, I say to myself: Nick only sees first order network effects, what you might call endogamous networks, those that require the user to be part of the tribe. Thus, phone networks, and networks like Facebook. But the internet is an exogamous network; its benefits increase by the extent to which it reaches out to new groups, increases cross-breeding, and thus the total robustness and variety of the gene pool. This is why links matter, why web services matter, because they extend the reach of the network. Understanding the benefit of exogamous networks requires a more subtle calculus than Nick is applying. It's not necessarily that you benefit directly from belonging, but the fact that you belong allows others to harvest the benefit of your participation.
The intelligence embedded in a link is equally valuable to Google whether the person who wrote the link is a Google user or not. In his new post, in other words, O'Reilly is confusing "harnessing collective intelligence" with "getting better the more people use them." They are not the same thing. The fact that my neighbor uses Google's search engine, rather than Yahoo's or Microsoft's, does not increase the value of Google's search engine to me, at least not in the way that my neighbor's use of the telephone network or of Facebook would increase the value of those services to me. The network effect underpins and explains the value of the telephone network and Facebook; it does not underpin or explain the value of Google. (Indeed, if everyone other than myself stopped using Google's search engine tomorrow, that would not decrease Google's value to me as a user.)
Consider Google: The underlying network that Google is based on is one that they neither own nor control, the web itself. It has both endogamous and exogamous elements. No one controls it; its richness and diversity depend on that fact. And yet, there is a benefit to belonging. If there weren't, sites would use their robots.txt file to tell Google and other search engines to stop spidering them.
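For readers who have not looked inside a robots.txt file, the opt-out mentioned above is a two-line rule. The sketch below uses Python's standard `urllib.robotparser` module to show how a crawler that honours the file would read an opt-out versus an opt-in; the crawler name `ExampleBot` and the example.com URL are placeholders, not real services.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A site that wants no crawler to spider it.
OPT_OUT = """\
User-agent: *
Disallow: /
"""

# A site that leaves itself open to every crawler (empty Disallow).
OPT_IN = """\
User-agent: *
Disallow:
"""

page = "https://example.com/articles/network-effects"
print("opt-out site crawlable:", allowed(OPT_OUT, "ExampleBot", page))
print("opt-in site crawlable:", allowed(OPT_IN, "ExampleBot", page))
```

That opting out is this cheap, and that so few sites do it, is the point of the paragraph above: being in the index is worth something to the site.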
Yes, you might say: but other search engines have access to that same network. And here, of course, is the first lesson: Google is better at spidering that network than their competitors. They thus benefit more powerfully from the network that we are all collectively building via our web publishing and cross-linking. Nick correctly points out that Google has built superior systems, and that these are the source of their competitive advantage. But that's a diversion. Why did they build those superior systems? To harness the power that was hidden in the network more effectively than their competitors.
Google's second network effect advantage is PageRank. As Robert Scoble so insightfully noted back in 2003, we contribute to Google with every link. Google realized that there was an additional layer of meaning hidden in the network. Far from being a contradiction to my network-effect hypothesis, as Nick claims, this is a validation of it. Advantage came to Google for seeing more deeply into the nature of the network, and building tools to harvest and apply data that was hidden in the network graph.
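The idea that every link is a contribution can be illustrated with the textbook form of PageRank. What follows is a simplified power-iteration sketch over a toy link graph, assuming a damping factor of 0.85; it is not Google's production algorithm, and the page names are invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over a dict of page -> outlinks."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy web: three pages link to "c", and "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In the toy graph the authors of pages a, b and d never use the ranking themselves, yet their links are what push page c to the top, which is the kind of contribution the paragraph above describes.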
Google's third (and most profitable) network effect insight was, of course, the ad auction. And once again, Nick misses the point. He says:
Now it's true that, if you want to define market liquidity as a type of network effect, Google enjoys a strong network effect on the advertising side of its business (which is where it makes its money), but it would be a mistake to say that the advertising-side network effect has anything to do with Google's dominance of the searches of web users.
It isn't that the advertising-side network effect has anything to do with Google's dominance of search, but rather, that Google's dominance of search is central to the design of their ad auction. You see, while Yahoo! (nee Overture) sold keyword advertising to the highest bidder, Google realized that they could mine their users' clickstream activity to predict which ads would be most likely to be clicked on, and by what ratio, and thus sell to the best combination of price and actual click through. Thus: higher revenue, more ability to invest in infrastructure, better results for advertisers and users, thus more users, thus better data, thus better results for both organic search and advertising (both of which do, in fact, matter to users, no matter what Nick thinks).
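The difference between selling to the highest bidder and selling to "the best combination of price and actual click through" can be shown with two hypothetical ads. The sketch below assumes the simplest possible combination, bid multiplied by predicted click-through rate; the real auction is more involved, and the names and numbers here are invented.

```python
def rank_ads(ads):
    """Order ads by expected revenue per impression: bid x predicted CTR."""
    return sorted(
        ads,
        key=lambda ad: ad["bid"] * ad["predicted_ctr"],
        reverse=True,
    )


# Hypothetical ads competing for the same keyword.
ads = [
    {"name": "high_bid_low_ctr", "bid": 2.00, "predicted_ctr": 0.01},
    {"name": "low_bid_high_ctr", "bid": 0.50, "predicted_ctr": 0.08},
]

by_bid = max(ads, key=lambda ad: ad["bid"])   # highest-bidder model
by_expected = rank_ads(ads)[0]                # price-and-click-through model

print("highest bidder picks:", by_bid["name"])
print("bid x CTR picks:", by_expected["name"])
```

The prediction of click-through rate is where the clickstream data enters: more searches yield better estimates, which is the feedback loop the paragraph above traces.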
And of course, from there, you can also see other areas in which Google (and their competitors) are doing just this, from Google Docs and Spreadsheets (which exhibits the obvious kind of network effects that Nick is comfortable with), to mining clickstream data, to machine translation.
In short, Google is the ultimate network effects machine. "Harnessing collective intelligence" isn't a different idea from network effects, as Nick argues. It is in fact the science of network effects - understanding and applying the implications of networks.
I want to emphasize one more point: the heart of my argument about Web 2.0 is that the network effects that matter today are network effects in data. My thought process (outlined in The Open Source Paradigm Shift and then What is Web 2.0?) went something like this:
- The consequence of IBM's design of a personal computer made out of commodity, off-the-shelf parts was to drive attractive margins out of hardware and into software, via Clayton Christensen's "law of conservation of attractive profits." Hardware became a low margin business; software became a very high margin business.
- Open source software and the standardized protocols of the Internet are doing the same thing to software. Margins will go down in software, but per the law of conservation of attractive profits, this means that they will go up somewhere else. Where?
- The next layer of attractive profits will accrue to companies that build data-backed applications in which the data gets better the more people use the system. This is what I've called Web 2.0.
Nick also took exception to my characterization of Wikipedia as a network-effects driven success:
I would also take issue with O'Reilly's suggestion that Wikipedia's success derives mainly from the network effect; Wikipedia doesn't become any more valuable to me if my neighbor starts using it. Wikipedia's success is probably better explained in terms of scale and scope advantages, and perhaps even its nonprofit status, than in terms of the network effect.
How wrong can you be? If there weren't a network effect driving Wikipedia, Knol and Citizendium would be succeeding. Wikipedia got there first, to be sure, but they also built an infrastructure and a workflow and a philosophy that recognized that the collective of all users was smarter than any expert, and that barriers to participation would slow down improvement in the data. There isn't a Facebook-like benefit in "belonging" to Wikipedia, but the application understands something that its competitors don't about harnessing the network and its users to improve its data.
In short, Facebook is the obvious network effect case study. But we learn more by studying what is not obvious: the way internet sites and companies have derived competitive advantage by leveraging different kinds of network effects, most particularly (but not exclusively) to improve the data on which their services are built.
I'm making no claim, as Nick seems to think, that there are no other levers of competitive advantage in the internet era. Nor am I claiming that every network-effect business will be more successful than those that are not, precisely because there are other levers of competitive advantage, but also because some markets are more monetizable than others. But I am claiming that there will be significant differences in profitability between companies that find a network-effect sweet spot in a lucrative market, and those who embrace the commodity end of the business. And sorry, Nick, but I consider the cloud infrastructure business to be the commodity end of the business. It will look like the web hosting business, say, with a bunch of large, capital intensive providers, and not like the hugely profitable company extracting monopoly rents that Hugh Macleod (whose post The Cloud's Best-Kept Secret triggered my own) envisions. Monopoly rents, if they occur, will be at higher levels in the cloud stack.
tags: cloud computing, endogamous networks, exogamous networks, google, network effects, nick carr, nuance, web 2.0, wikipedia
Yochai Benkler, others at Harvard map current and future Internet
by Andy Oram | @praxagora | comments: 0
Harvard's world-renowned Berkman Center for Internet & Society is celebrating its tenth anniversary with a conference called Berkman@10. I'll report here on today's sessions, which were organized as a fairly conventional symposium (although as loosely as one could run it with 450 attendees). Tomorrow will be set up as an unconference, where the audience defines most of the topics and self-organizes into small-group discussions.
tags: economics, free software, internet policy, law, open source, wikipedia






