Entries tagged with “microformats” from O'Reilly Radar
Four short links: 17 July 2009
by Nat Torkington | @gnat | comments: 0
- NodeXL: Network Overview, Discovery and Exploration in Excel -- Excel plugin for analysing graph data within Excel. Visualization and data wizardry come to the corporates who live in Excel.
- Managing the Environmental Crisis -- a comment by Edwin Winge: "Public involvement does offer long-range benefits, the most pragmatic of which is that it results in better decisions. Park Service managers have discovered through experience that when they are willing to modify their professional judgements by considering ideas and opinions (values) of concerned citizens, the final decision that results is not only more acceptable to the public, it is also more satisfying to the Service." A banner quote for Gov 2.0, from the father of O'Reilly's Sara Winge. (via timoreilly on Twitter)
- Dopplr Social Atlas for iPhone -- an iPhone app that gives you the recommendations by Dopplr users for places to eat, things to do, places to stay around the world.
- Microformats Dev Camp -- July 25-6 (weekend following OSCON), in San Francisco at the Automattic offices. (via Tantek)
tags: data, dopplr, events, gov2.0, iphone app, microformats, visualization
Google's Rich Snippets and the Semantic Web
by Tim O'Reilly | @timoreilly | comments: 18
There's a long-time debate between those who advocate for semantic markup, and those who believe that machine learning will eventually get us to the holy grail of a Semantic Web, one in which computer programs actually understand the meaning of what they see and read. Google has of course been the great proof point of the power of machine learning algorithms.
Earlier this week, Google made a nod to the other side of the debate, introducing a feature that they call "Rich Snippets." Basically, if you mark up pages with certain microformats (and soon, with RDFa), Google will take this data into account, and will provide enhanced snippets in the search results. Supported microformats in the first release include those for people and for reviews.
So, for example, consider the snippet for the Yelp review page on the Slanted Door restaurant in San Francisco:
The snippet is enhanced to show the number of reviews and the average star rating, with a snippet actually taken from one of the reviews. By contrast, the Citysearch results for the same restaurant are much less compelling:
(Yelp is one of Google's partners in the rollout of Rich Snippets; Google hopes that others will follow their lead in using enhanced markup, enabling this feature.)
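To make the idea of review markup concrete, here is a minimal sketch of how a program might read aggregate review data out of a page. The class names (hreview-aggregate, item, fn, rating, average, count) come from the hReview-aggregate microformat draft; the sample HTML, the restaurant name, and the function names are made up for illustration, and this is not how Google's own parser works.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; only the class names come from the microformat draft.
SAMPLE = """
<div class="hreview-aggregate">
  <span class="item"><span class="fn">Example Bistro</span></span>
  <span class="rating"><span class="average">4.5</span> stars</span>
  from <span class="count">120</span> reviews
</div>
"""

WANTED = {"fn", "average", "count"}


class AggregateParser(HTMLParser):
    """Collects the text of elements whose class is one of WANTED."""

    def __init__(self):
        super().__init__()
        # One entry per open element: the wanted field it carries, or None.
        # Simplification: assumes every start tag has a matching end tag.
        self.stack = []
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(next((c for c in classes if c in WANTED), None))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Attribute text only to the innermost open element.
        if self.stack and self.stack[-1] and data.strip():
            field = self.stack[-1]
            self.fields[field] = self.fields.get(field, "") + data.strip()


def extract_aggregate(html):
    """Return name, average rating, and review count found in the markup."""
    parser = AggregateParser()
    parser.feed(html)
    parser.close()
    f = parser.fields
    return {
        "name": f.get("fn"),
        "average": float(f["average"]) if "average" in f else None,
        "count": int(f["count"]) if "count" in f else None,
    }


if __name__ == "__main__":
    print(extract_aggregate(SAMPLE))
```

The point of the markup is that the extraction code never has to guess which number on the page is the rating and which is the review count.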
Rich snippets could be a turning point for the Semantic Web, since, for the first time, they create a powerful economic motivation for semantic markup. Google has told us that rich snippets significantly enhance click-through rates. That means that anyone who has been doing SEO is now going to have to add microformats and RDFa to their toolkit.
Historically, the biggest block to the Semantic Web has been the lack of a killer app that would drive widespread adoption. There was always a bit of a chicken-and-egg problem, in which users would need to do a lot of work to mark up the data for the benefit of others before getting much of a payoff themselves. But as Dan Bricklin remarked so insightfully in his 2000 paper on Napster, The Cornucopia of the Commons, the most powerful online dynamics are released not by appeals to volunteerism, but by self-interest:
What we see here is that increasing the value of the database by adding more information is a natural by-product of using the tool for your own benefit. No altruistic sharing motives need be present...
(Aside: @akumar, this is the answer to your question on Twitter about why in writing up this announcement we didn't make more of Yahoo!'s prior support for microformats in SearchMonkey. You guys did pioneering work, but Google has the market power to actually get people to pay attention.)
What I also find interesting about the announcement is the blurring line between machine learning and semantic markup.
Machine learning isn't just brute force analysis of unstructured data. In fact, while Google is famous as a machine-learning company, their initial breakthrough with PageRank was based on the realization that there was hidden metadata in the link structure of the web that could be used to improve search results. It was precisely their departure from previous brute force methods that gave them some of their initial success. Since then, they have been diligent in developing countless other algorithms based on regular features of the data, and in particular regular associations between data sets that routinely appear together: implied metadata, so to speak.
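To illustrate what "hidden metadata in the link structure" means, here is a small sketch of the textbook PageRank iteration. It is the simplified published form of the idea, not Google's production ranking; the damping value of 0.85, the iteration count, and the three-page toy graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power iteration over a dict of page -> list of linked pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share, then shares from its inlinks.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outlinks spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new[p] += share
        rank = new
    return rank


if __name__ == "__main__":
    # Toy graph: a and b both link to c, and c links back to a.
    toy = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(toy).items()):
        print(page, round(score, 3))
```

Nobody labeled page c as important; its score falls out of who links to whom, which is exactly the kind of implied metadata described above.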
So, for example, people are associated with addresses, with dates, with companies, with other people, with documents, with pictures and videos. Those associations may be made explicitly, via tags or true structured markup, but given a large enough data set, they can be extracted automatically. Jeff Jonas calls this process "context accumulation." It's the way that our own brains operate: over time, we make associations between parallel data streams, each of which informs us about the other. Semantic labeling (via language) is only one of many of those data streams. We may see someone and not remember their name; we may remember the name but not the face that goes with it. We might connect the two given the additional information that we met at such and such conference three years ago.
Google is in the business of making these associations, finding pages that are about the same thing, and they use every available handle to help them do it. Seen in this way, SEO is already a kind of semantic markup, in which self-interested humans try to add information to pages to enhance their discoverability and ranking by Google. What the Rich Snippets announcement does is tell webmasters and SEO professionals a new way to add structure to their markup.
The problem with explicit metadata like this is that it's liable to gaming. But more dangerously, it generally only captures what we already know. By contrast, implicit metadata can surprise us, giving us new insight into the world. Consider Flickr's maps created by geotagged photos, which show the real boundaries of where people go in cities and what they do there. Here, the metadata may be added explicitly by humans, but it is increasingly added automatically by the camera itself. (The most powerful architecture of participation is one in which data is provided by default, without the user even knowing he or she is doing it.)
Google's Flu Trends is another great example. By mining its search database (what John Battelle calls "the database of intentions") for searches about flu symptoms, Google is able to generate maps of likely clusters of infection. Or look at Jer Thorp's fascinating project announced just the other day, Just Landed: Processing, Twitter, MetaCarta & Hidden Data. Jer built a simulation of the possible spread of swine flu by extracting the string "Just landed in..." from Twitter. Since Twitter profiles include a location, and the object of the phrase above is also likely to be a location, he was able to create the following visualization of travel patterns:
Just Landed - Test Render (4 hrs) from blprnt on Vimeo.
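The extraction step behind a project like this can be sketched in a few lines. This is a rough illustration, not Jer Thorp's code: the tweet dictionaries and their field names (text, profile_location) are hypothetical, the regular expression is a guess at a reasonable pattern, and the geocoding step that the project delegated to MetaCarta is left out entirely.

```python
import re

# Capture whatever follows "just landed in" up to the next punctuation mark.
LANDED = re.compile(r"\bjust landed in\s+([^.!?,;#@\n]+)", re.IGNORECASE)


def extract_trips(tweets):
    """Return (home, destination) pairs for tweets that announce a landing."""
    trips = []
    for tweet in tweets:
        match = LANDED.search(tweet["text"])
        home = (tweet.get("profile_location") or "").strip()
        if match and home:
            destination = match.group(1).strip()
            if destination:
                trips.append((home, destination))
    return trips


if __name__ == "__main__":
    sample = [
        {"text": "Just landed in Hong Kong! So tired.", "profile_location": "Toronto"},
        {"text": "just landed in San Francisco, heading to the hotel", "profile_location": "London, UK"},
        {"text": "Lunch was great", "profile_location": "Paris"},
    ]
    for home, destination in extract_trips(sample):
        print(home, "->", destination)
```

Both ends of each pair are still free text at this point; turning them into coordinates for a map is the job of a geocoder.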
This is where the rubber meets the road of collective intelligence. I'm a big fan of structured markup, but I remain convinced that even more important is to discover new metadata that is produced, as Wallace Stevens so memorably said, "merely in living as and where we live."
P.S. There's some small irony that in its first steps towards requesting explicit structured data from webmasters, Google is specifying the vocabularies that can be used for its Rich Snippets rather than mining the structured data formats that already exist on the web. It would be more "googlish" (in the machine learning sense I've outlined above) to recognize and use them all, rather than asking webmasters to adopt a new format developed by Google. There's an interesting debate about this irony over on Ian Davis' blog. I expect there to be a lot more debate in the weeks to come.
tags: google, microformats, semantic web
Google Engineering Explains Microformat Support in Searches
by James Turner | comments: 8
You may also download this file. Running time: 18:24
Today, Google is releasing support for parsing and display of microformat data in their search results. While the initial launch will be limited to a specific set of partners (including LinkedIn, Yelp and CNet reviews), the intent is that very quickly, anyone who marks their pages up with the appropriate microformat data will be able to make their information understandable by Google. This technology would allow you to explicitly search, for example, for only printers that had an average customer review of 3 stars or higher. Initial support will include things such as:
- Review Ratings
- Product Prices
- Personal Details
We talked this morning with Othar Hansson and RV Guha, two of the Google engineers responsible for the new functionality, and you can listen to them discuss it in this exclusive O'Reilly interview.
JAMES TURNER: Why don't you guys start by introducing yourselves?
OTHAR HANSSON: Sure. I'm Othar Hansson and I'm a tech lead on this project. And I'm in Google's Search UI Group.
RV GUHA: My name is Guha. I'm an engineer at Google and I do stuff across the board.
JT: So can you describe briefly, to start off, exactly what it is you're releasing today?
RVG: Okay. We are asking webmasters who have pieces of data like reviews or people profiles, and in an experimental form, things like information about organizations and products, to put the structured data representing the content on the webpage in a machine-understandable form on the webpage. Typically, what happens is that if you take a website, and having created Epinions, I can talk about the context of Epinions. You would typically have a database in the back-end which has lots of information about products. People write reviews about them. And you get information such as the number of reviews, the average rating of the reviews, the price of the product, who sells it, et cetera, et cetera, et cetera. It's stored in a structured database in your back-end. You then use some scripts to format it into HTML as per the site's design. Now going from the structured data to the HTML is quite straightforward. But going from the HTML back to the structured data in a fashion which works across sites is very, very, very hard. Now our search engine doesn't -- it's very difficult for a search engine to understand -- to sort of get back the structured data for all of the sites. Now if it were to understand that, if it were to understand that this is a review site where the product being reviewed is such and such and it has 30 reviews with an average rating of 3.2 and so on and so forth, we could do a better job of the search. In particular, we could do a better job of presenting the two or three lines of text that appeared as part of the search result so that the user has a better idea of what to expect on that page. And from our experiments, it seemed that giving the user a better idea of what to expect on the page increases the click-through rate on the search results. So if the webmasters do this, it's really good for them. They get more traffic. It's good for users because they have a better idea of what to expect on the page. And, overall, it's good for the web.
JT: So in some ways, that's in the same way that right now for certain sites, you'll give the internal structure of the site as part of the search result or for shopping results, you'll give price ranges and things like this. This is just, again, enriching and providing more structured -- more than just a snippet, giving more of a structured display of the information on that page?
RVG: Yes. If we have structured data, we can do lots of things. We're starting off by improving the snippets. It's an absolute no-brainer. It seems to be helping everybody. And, as you know us, we keep playing around with different ideas and different things. As structured data becomes more prevalent, there's a ton of ideas, both inside Google and outside Google, on how you might improve search.
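Guha's point about the round trip from database to HTML and back can be sketched as follows. This is an illustration under stated assumptions, not Google's pipeline: the product records are invented, the class names are borrowed from the hReview-aggregate draft, and the recover function uses a regular expression that only works on the exact output of this render function. The last helper shows the kind of "3 stars or higher" filter mentioned in the introduction, which becomes trivial once the data is structured again.

```python
import re
from html import escape, unescape

# Hypothetical back-end records.
PRODUCTS = [
    {"name": "Printer A", "average": 3.2, "count": 30},
    {"name": "Printer B", "average": 2.4, "count": 12},
    {"name": "Printer C", "average": 4.0, "count": 57},
]


def render(product):
    """Structured data to HTML: the easy direction."""
    return (
        '<div class="hreview-aggregate">'
        f'<span class="item"><span class="fn">{escape(product["name"])}</span></span> '
        f'<span class="rating"><span class="average">{product["average"]:.1f}</span></span> '
        f'(<span class="count">{product["count"]}</span> reviews)'
        "</div>"
    )


# Because the class names are agreed on in advance, the way back is mechanical.
FIELD = re.compile(r'<span class="(fn|average|count)">([^<]*)</span>')


def recover(html):
    """HTML back to fields; relies on the class names written by render."""
    return {name: unescape(value) for name, value in FIELD.findall(html)}


def at_least(products, stars):
    """Names of products whose average rating is at least `stars`."""
    return [p["name"] for p in products if p["average"] >= stars]


if __name__ == "__main__":
    html = render(PRODUCTS[0])
    print(html)
    print(recover(html))
    print(at_least(PRODUCTS, 3))
```

Without the shared class names, recover would need a different hand-written rule for every site design, which is the "very, very, very hard" direction described above.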
tags: google, interviews, microformats, search, seo
Portable Contacts API Starts to Get Real
by David Recordon | @daveman692 | comments: 13

This evening Joseph and John of Plaxo and I have been hosting a hackathon at Six Apart for the Portable Contacts API (video about PorC). The Portable Contacts API is designed "to make it easier for developers to give their users a secure way to access the address books and friends lists they have built up all over the web."
We originally expected a handful of people to show up and hack on implementing bits of the specification, but so far we have been blown away by the progress made and by the roughly twenty people who came. Tomorrow is a summit-style meeting hosted by MySpace, also in San Francisco, to try to finalize the specification among a wide range of providers and consumers. I'm expecting a handful of interesting demos, but wanted to share two that have already come together tonight.
Joseph Smarr and Kevin Marks of Google hacked together a web transformer that integrates Microformats, vCard, and the Portable Contacts API. Given Kevin's homepage, which is full of Microformats, they've built an API that extracts his profile information from hCard, uses a public API from Technorati to transform it to vCard, and then exposes it as a Portable Contacts API endpoint. This works not only on Kevin's own page but also on his Twitter profile, which contains basic profile information such as name, homepage, and a short bio.
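For readers who haven't seen hCard, the first half of that pipeline looks roughly like the sketch below. It is not the code from the hackathon, which called Technorati's service for the conversion; this version does a simplified conversion locally. The class names vcard, fn, url, and note are from the hCard microformat, the sample profile is invented, and the output omits properties a full vCard 3.0 would normally carry (such as N).

```python
from html.parser import HTMLParser

# Hypothetical profile markup using hCard class names.
SAMPLE = """
<div class="vcard">
  <a class="fn url" href="http://example.com/">Example Person</a>
  <p class="note">Writes about open web standards.</p>
</div>
"""


class HCardParser(HTMLParser):
    """Collects fn and note text plus the first url href from hCard markup."""

    TEXT_FIELDS = {"fn", "note"}

    def __init__(self):
        super().__init__()
        # One entry per open element; assumes start and end tags are balanced.
        self.stack = []
        self.card = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "url" in classes and attrs.get("href"):
            self.card.setdefault("url", attrs["href"])
        self.stack.append(next((c for c in classes if c in self.TEXT_FIELDS), None))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1] and data.strip():
            key = self.stack[-1]
            self.card[key] = (self.card.get(key, "") + " " + data.strip()).strip()


def escape_value(value):
    """Escape backslash, comma, semicolon, and newline for a vCard text value."""
    return (
        value.replace("\\", "\\\\")
        .replace(",", "\\,")
        .replace(";", "\\;")
        .replace("\n", "\\n")
    )


def hcard_to_vcard(html):
    """Convert the first hCard-style fields found in html to vCard text."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    card = parser.card
    lines = ["BEGIN:VCARD", "VERSION:3.0"]
    if "fn" in card:
        lines.append("FN:" + escape_value(card["fn"]))
    if "url" in card:
        lines.append("URL:" + card["url"])
    if "note" in card:
        lines.append("NOTE:" + escape_value(card["note"]))
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(hcard_to_vcard(SAMPLE))
```

Exposing the result as a Portable Contacts endpoint would then be a matter of serving the same fields in the response format the specification defines.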
Brian Ellin of JanRain has successfully combined OpenID, XRDS-Simple, OAuth, and the Portable Contacts API to start showing how each of these building blocks should come together. Upon visiting his demo site he logs in using his OpenID. From there, the site discovers that Plaxo hosts his address book and requests access to it via OAuth. Finishing the flow, his demo site uses the Portable Contacts API to access information about his contacts directly from Plaxo. End to end, you log in with an OpenID and finish by giving the site access to your address book without having to fork over your password.
While the individual building blocks are fairly geeky themselves, pulling them together as has been happening tonight shows that we're only at the beginning of building the next generation of social networks. When the pieces work together, people won't have to know what's going on under the hood; it will just work, almost like magic. John has more photos up on his blog.
tags: apis, buzzwords, microformats, oauth, openid, portable contacts api, social networking, the social network, web 2.0