Entries tagged with “Semantic Web” from O'Reilly Radar
RSS never blocks you or goes down: why social networks need to be decentralized
by Andy Oram | @praxagora | comments: 26
Recurring outages on major networking sites such as Twitter and LinkedIn, along with incidents where Twitter members were mysteriously dropped for days at a time, have led many people to challenge the centralized control exerted by companies running social networks. Whether you're a street demonstrator or a business analyst, you may well have come to depend on Twitter. We may have been willing to build our virtual houses on shaky foundations when they were temporary beach huts; but now we need to examine the ground on which many are proposing to build our virtual shopping malls and even our virtual federal offices.
Instead of the constant churning among the commercial sites du jour (Friendster, MySpace, Facebook, Twitter), the next generation of social networking increasingly appears to require a decentralized, peer-to-peer infrastructure. This article looks at available efforts in that space and suggests some principles to guide its development.
Update: a few days ago, OpenID expert Chris Messina and microblog developer Jyri Engeström published an article with conclusions similar to mine; clearly this is a felt need that's spreading across the Net. Interestingly, they approach the questions from a list of what information needs to be shared and how it needs to be transmitted; I come from the angle of what people want from each other and how their needs can be met. The two approaches converge, though. See the comments for other interesting related blogs.
tags: Gnutella, Jabber, Napster, P2P, peer-to-peer, RSS, rssCloud, Semantic Web, social networking, standards, Twitter, XMPP
Four short links: 13 August 2009
by Nat Torkington | @gnat | comments: 1
- Under the Hood of App Inventor for Android -- regular readers know I'm a big fan of visual programming language Scratch, and apparently Google are too. They've got twelve university classes testing App Inventor for Android, a visual connect-the-bits programming environment for Android. University classes probably because one of the co-creators is Hal Abelson, coauthor of the definitive programming textbook. Also found online: the PR-type announcement, a Professor using it, and @AppInv (nothing juicy on Twitter--it looks like it might be a channel for tech support for the students). (via Hacker News)
- Google Web Optimizer Case Study (Four Hour Work Week) -- GWO manages A/B tests for you, with a lot of statistical analysis. It's a fascinating read to see how these should be done. Every equation may halve the readership of a book, but every table of numbers and relevancy analysis doubles the value of a post like this. (via Hacker News)
- Opening Up The BBC's Natural History Archive -- the BBC are releasing programme segments and a whole lot of metadata around their programming. Audio and video segmented, tagged with DBpedia terms, and aggregated into a URI structure based on natural history concepts: species, habitats, adaptations, etc. Gorgeous!
- Yahoo! Term Extraction API to Close -- Internally, both services share a backend data source that is closing down, so the publicly-facing YDN services will be closing as well. I think it's the most significant casualty of Y! outsourcing search to MSFT, as this API was used by a lot of projects. (via Simon Willison)
tags: android, apis, bbc, data, google, history, programming, semantic web, statistics, web, yahoo
Four short links: 3 August 2009
Mathematics Collaboration, Risk, Visualisation, and SemWeb
by Nat Torkington | @gnat | comments: 0
- Enabling Massively Parallel Mathematics Collaboration -- Jon Udell writes about Mike Adams whose WordPress plugin to grok LaTeX formatting of math has enabled a new scale of mathematics collaboration.
- 2845 Ways to Spin The Risk -- introduction to the ways in which our perception of risk (and numbers in general) can be distorted by how it is presented. (via titine on Twitter)
- Logstalgia -- OpenGL app to visualize Apache log files.
- 4Store -- "scalable RDF storage". 4store was designed by Steve Harris and developed at Garlik to underpin their Semantic Web applications. It has been providing the base platform for around 3 years. At times holding and running queries over databases of 15GT, supporting a Web application used by thousands of people. (via joshua on Delicious)
tags: brain, collaboration, crowdsourcing, database, math, publishing, semantic web, visualization
Four short links: 21 July 2009
Semweb, Comedy Java, Mobile Spyware, Crypto
by Nat Torkington | @gnat | comments: 0
- On Data Reconciliation Strategies and Their Impact on the Web of Data -- For years, I’ve been a fairly vocal advocate for the elegance and scalability of a-posteriori reconciliation via equivalence mappings as a superior mechanism (scale-wise) to a-priori reconciliation efforts but this started to change very rapidly once I started working for Metaweb and saw first hand how much more effective a-priori reconciliation can be, even if drastically more expensive and limiting in the data acquisition front. (via straup on Delicious)
- Java Spring's Biggus Dickus Effect -- Nonstop administrative debris as dadaist poetry. Écriture automatique of the programming office manager or his parrot. (via mattb on Delicious)
- Arabic Blackberry Spyware -- update pushed out to Arabic Blackberries CC:ed all email to the authorities. A powerful case for multi-distro platforms, which reduce the size of the market captured when one distro is pwned like this.
- NaCl - Networking and Cryptography Library -- open source high-level crypto library. NaCl (pronounced "salt") is a new easy-to-use high-speed software library for network communication, encryption, decryption, signatures, etc. NaCl's goal is to provide all of the core operations needed to build higher-level cryptographic tools. Of course, other libraries already exist for these core operations. NaCl advances the state of the art by improving security, by improving usability, and by improving speed. Creator of qmail is one of the developers. (via Simon Willison)
tags: big data, cryptography, mobile, opensource, security, semantic web
Four short links: 18 June 2009
Weaker Copyright Good, YQL.gov, GeoSPARQL, Happiness
by Nat Torkington | @gnat | comments: 3
- Harvard Study Finds Weaker Copyright Protection Has Benefited Society (Michael Geist) -- Given the increase in artistic production along with the greater public access, the authors conclude that "weaker copyright protection, it seems, has benefited society." This is consistent with the authors' view that weaker copyright is "unambiguously desirable if it does not lessen the incentives of artists and entertainment companies to produce new works." (read the original paper)
- Using Public Data for Good With the Power of YQL -- The first part is a new batch of YQL tables providing data on the U.S. government, earthquake data, and the non-profit micro-lender Kiva. The second part is an incredibly easy way to render YQL queries on websites. After all, what good is data that no one can see?
- GeoSPARQL -- RDF meets geo goodness. SELECT ?s ?p ?o WHERE { ?s gn:name "Dallas" . ?s ?p ?o } (via the geowanking mailing list)
- How To Be Happy in Business -- this Venn diagram makes me happy. (via Ned Batchelder)
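The GeoSPARQL query quoted in the list above joins two triple patterns on the shared variable ?s: first find every subject named "Dallas", then return everything known about those subjects. Here is a minimal sketch of that join in plain Python. The triples, the `gn:1`-style identifiers, and the `describe_by_name` helper are made up for illustration; a real deployment would hand the query to a SPARQL engine rather than scan a list.

```python
# Hypothetical (subject, predicate, object) triples; not real GeoNames data.
TRIPLES = [
    ("gn:1", "gn:name", "Dallas"),
    ("gn:1", "gn:countryCode", "US"),
    ("gn:1", "ex:note", "example value"),
    ("gn:2", "gn:name", "Austin"),
    ("gn:2", "gn:countryCode", "US"),
]


def describe_by_name(triples, name):
    """Mimic: SELECT ?s ?p ?o WHERE { ?s gn:name <name> . ?s ?p ?o }"""
    # First pattern: bind ?s to every subject whose gn:name equals `name`.
    subjects = {s for s, p, o in triples if p == "gn:name" and o == name}
    # Second pattern: return every triple whose subject is one of those ?s.
    return [(s, p, o) for s, p, o in triples if s in subjects]


if __name__ == "__main__":
    for row in describe_by_name(TRIPLES, "Dallas"):
        print(row)
```

Note that the result includes the `gn:name` triple itself, because the second pattern matches every predicate of the bound subject.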
tags: copyright, geodata, gov2.0, lifehacks, location, open data, search, semantic web, yahoo
Google's Rich Snippets and the Semantic Web
by Tim O'Reilly | @timoreilly | comments: 18
There's a long-time debate between those who advocate for semantic markup, and those who believe that machine learning will eventually get us to the holy grail of a Semantic Web, one in which computer programs actually understand the meaning of what they see and read. Google has of course been the great proof point of the power of machine learning algorithms.
Earlier this week, Google made a nod to the other side of the debate, introducing a feature that they call "Rich Snippets." Basically, if you mark up pages with certain microformats (and soon, with RDFa), Google will take this data into account, and will provide enhanced snippets in the search results. Supported microformats in the first release include those for people and for reviews.
So, for example, consider the snippet for the Yelp review page on the Slanted Door restaurant in San Francisco:
The snippet is enhanced to show the number of reviews and the average star rating, with a snippet actually taken from one of the reviews. By contrast, the Citysearch results for the same restaurant are much less compelling:
(Yelp is one of Google's partners in the rollout of Rich Snippets; Google hopes that others will follow their lead in using enhanced markup, enabling this feature.)
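To make the idea of review markup concrete, here is a minimal sketch of how a program might pull a name, rating, and review count out of class-annotated HTML. The page fragment, the `ClassTextExtractor` class, and the `extract_fields` helper are hypothetical, and the class names are only loosely modeled on review microformats; this is not Google's parser, and the microformat specifications define the real vocabulary.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the class names are illustrative assumptions.
SAMPLE = """
<div class="hreview-aggregate">
  <span class="fn">Example Restaurant</span>
  <span class="rating">4.5</span> stars from
  <span class="count">120</span> reviews
</div>
"""


class ClassTextExtractor(HTMLParser):
    """Collect the text of elements whose class attribute names a wanted field."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None  # field currently being read, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.wanted:
                self.current = name
                self.fields[name] = ""

    def handle_data(self, data):
        if self.current is not None:
            self.fields[self.current] += data

    def handle_endtag(self, tag):
        # Simplification: assumes wanted elements contain no nested tags.
        self.current = None


def extract_fields(html, wanted=("fn", "rating", "count")):
    parser = ClassTextExtractor(wanted)
    parser.feed(html)
    return {key: value.strip() for key, value in parser.fields.items()}


if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

The point of the sketch is that once the publisher labels the fields, extraction needs no guessing about which number on the page is the rating.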
Rich snippets could be a turning point for the Semantic Web, since, for the first time, they create a powerful economic motivation for semantic markup. Google has told us that rich snippets significantly enhance click-through rates. That means that anyone who has been doing SEO is now going to have to add microformats and RDFa to their toolkit.
Historically, the biggest block to the Semantic Web has been the lack of a killer app that would drive widespread adoption. There was always a bit of a chicken-and-egg problem, in which users would need to do a lot of work to mark up the data for the benefit of others before getting much of a payoff themselves. But as Dan Bricklin remarked so insightfully in his 2000 paper on Napster, The Cornucopia of the Commons, the most powerful online dynamics are released not by appeals to volunteerism, but by self-interest:
What we see here is that increasing the value of the database by adding more information is a natural by-product of using the tool for your own benefit. No altruistic sharing motives need be present...
(Aside: @akumar, this is the answer to your question on Twitter about why in writing up this announcement we didn't make more of Yahoo!'s prior support for microformats in SearchMonkey. You guys did pioneering work, but Google has the market power to actually get people to pay attention.)
What I also find interesting about the announcement is the blurring line between machine learning and semantic markup.
Machine learning isn't just brute force analysis of unstructured data. In fact, while Google is famous as a machine-learning company, their initial breakthrough with PageRank was based on the realization that there was hidden metadata in the link structure of the web that could be used to improve search results. It was precisely their departure from previous brute force methods that gave them some of their initial success. Since then, they have been diligent in developing countless other algorithms based on regular features of the data, and in particular regular associations between data sets that routinely appear together - implied metadata, so to speak.
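The idea that link structure carries hidden metadata can be illustrated with a simplified, textbook-style version of the ranking calculation. The sketch below is an assumption-laden toy, not Google's production algorithm: the four-page graph, the damping factor of 0.85, and the fixed iteration count are all chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", and "c" links to "d".
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"], "d": ["c"]}

if __name__ == "__main__":
    for page, score in sorted(pagerank(LINKS).items()):
        print(page, round(score, 3))
```

No page here declares its own importance; "c" ranks highest only because other pages point to it, which is the sense in which the metadata is implied rather than stated.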
So, for example, people are associated with addresses, with dates, with companies, with other people, with documents, with pictures and videos. Those associations may be made explicitly, via tags or true structured markup, but given a large enough data set, they can be extracted automatically. Jeff Jonas calls this process "context accumulation." It's the way that our own brains operate: over time, we make associations between parallel data streams, each of which informs us about the other. Semantic labeling (via language) is only one of many of those data streams. We may see someone and not remember their name; we may remember the name but not the face that goes with it. We might connect the two given the additional information that we met at such and such conference three years ago.
Google is in the business of making these associations, finding pages that are about the same thing, and they use every available handle to help them do it. Seen in this way, SEO is already a kind of semantic markup, in which self-interested humans try to add information to pages to enhance their discoverability and ranking by Google. What the Rich Snippets announcement does is tell webmasters and SEO professionals a new way to add structure to their markup.
The problem with explicit metadata like this is that it's liable to gaming. But more dangerously, it generally only captures what we already know. By contrast, implicit metadata can surprise us, giving us new insight into the world. Consider Flickr's maps created by geotagged photos, which show the real boundaries of where people go in cities and what they do there. Here, the metadata may be added explicitly by humans, but it is increasingly added automatically by the camera itself. (The most powerful architecture of participation is one in which data is provided by default, without the user even knowing he or she is doing it.)
Google's Flu Trends is another great example. By mining its search database (what John Battelle calls "the database of intentions") for searches about flu symptoms, Google is able to generate maps of likely clusters of infection. Or look at Jer Thorp's fascinating project announced just the other day, Just Landed: Processing, Twitter, MetaCarta & Hidden Data. Jer simulated the possible spread of swine flu by extracting the string "Just landed in..." from Twitter. Since Twitter profiles include a location, and the object of the phrase above is also likely to be a location, he was able to create the following visualization of travel patterns:
Just Landed - Test Render (4 hrs) from blprnt on Vimeo.
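The extraction step behind a project like this can be sketched in a few lines. What follows is a guess at the general shape, not Jer Thorp's actual code: the sample tweets are made up, the `extract_trips` helper is hypothetical, and a real pipeline would still need to geocode the place names (the project title credits MetaCarta for that part).

```python
import re

# Matches "Just landed in <place>", stopping at punctuation or end of line.
JUST_LANDED = re.compile(r"just landed in\s+([^.!?,;:#@\n]+)", re.IGNORECASE)


def extract_trips(tweets):
    """Return (origin, destination) pairs from (profile_location, text) tuples."""
    trips = []
    for profile_location, text in tweets:
        match = JUST_LANDED.search(text)
        # Skip tweets without the phrase and profiles without a location.
        if match and profile_location:
            trips.append((profile_location.strip(), match.group(1).strip()))
    return trips


# Made-up sample data: (profile location, tweet text).
TWEETS = [
    ("Toronto", "Just landed in San Francisco. Time for coffee"),
    ("London", "just landed in Hong Kong!"),
    ("Berlin", "Thinking about lunch"),
    ("", "Just landed in Tokyo"),
]

if __name__ == "__main__":
    for origin, destination in extract_trips(TWEETS):
        print(origin, "->", destination)
```

Neither field was written as travel data; the trip only appears when the profile location and the phrase are read together.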
This is where the rubber meets the road of collective intelligence. I'm a big fan of structured markup, but I remain convinced that even more important is to discover new metadata that is produced, as Wallace Stevens so memorably said, "merely in living as and where we live."
P.S. There's some small irony that in its first steps towards requesting explicit structured data from webmasters, Google is specifying the vocabularies that can be used for its Rich Snippets rather than mining the structured data formats that already exist on the web. It would be more "googlish" (in the machine learning sense I've outlined above) to recognize and use them all, rather than asking webmasters to adopt a new format developed by Google. There's an interesting debate about this irony over on Ian Davis' blog. I expect there to be a lot more debate in the weeks to come.
tags: google, microformats, semantic web