Entries tagged with “tags” from Tools of Change for Publishing

Taxonomies and Starting With XML

This is an excerpt from a blog post I wrote last week on taxonomies and chunking.

Last October, the StartWithXML team wrote a post called "To Chunk or Not To Chunk," where we discussed tagging and infrastructure issues, and a discussion ensued about what happens when you don't know what you'll be using chunks for. How do you tag those?

Later, at our StartWithXML One-Day Forum, we included a presentation on tagging and chunking best practices, where it was pointed out that no taxonomy for chunk-level content currently exists.

We have taxonomies for book-level content. These include formalized code sets such as the Library of Congress subject codes, the BISAC codes, and the Dewey Decimal System, among others. There are also informal code sets, like the tag sets on Shelfari or LibraryThing. There are proprietary taxonomies at Amazon and B&N.com that enable effective browsing.

But nothing like this exists for sub-book-level content. It's never been traded before. We've never really needed a taxonomy for it before.

Other industries that traditionally distribute "chunks" have their own taxonomies that might prove useful in building a book-chunk schema. These include the IPTC news codes, which identify the content of a particular news story -- that's the closest analogy I can find for small gobbets of content that require organization.

Industries have proprietary taxonomies to identify certain concepts -- culinary arts, music, agriculture, engineering, the sciences, literature and criticism, education, and on and on and on. But these do not necessarily identify concepts within a book.

Some might argue that we don't necessarily need taxonomies -- why can't we use natural-language search and the semantic Web to "bubble up" the "right" concepts? I'd argue that words don't always mean what we think they mean. A classic example from my library days is the term "mercury." That could mean the planet, the car or the element. Proponents of semantic search would say that the context in which "mercury" is mentioned should take care of defining that term. I'd say that's true in about 50 percent of all cases, but not reliably true in anything like 75 to 100 percent of them.
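To make the "mercury" problem concrete, here is a minimal Python sketch that compares a bare keyword match with a lookup against taxonomy-coded chunks. The chunk texts, the `concept` labels and both function names are invented for illustration; they don't come from any existing taxonomy or search product.

```python
# Hypothetical chunks; the "concept" labels stand in for taxonomy codes.
CHUNKS = [
    {"id": "c1", "text": "Mercury is the planet closest to the Sun.",
     "concept": "mercury (planet)"},
    {"id": "c2", "text": "She drove her old Mercury to the coast.",
     "concept": "mercury (car)"},
    {"id": "c3", "text": "Mercury is a liquid metal at room temperature.",
     "concept": "mercury (element)"},
]


def keyword_search(chunks, word):
    """Return ids of chunks whose text contains the word, ignoring case."""
    return [c["id"] for c in chunks if word.lower() in c["text"].lower()]


def concept_search(chunks, concept):
    """Return ids of chunks tagged with exactly this taxonomy concept."""
    return [c["id"] for c in chunks if c["concept"] == concept]


# The keyword match returns every sense of the word.
all_hits = keyword_search(CHUNKS, "mercury")

# The taxonomy lookup returns only the sense that was asked for.
planet_hits = concept_search(CHUNKS, "mercury (planet)")
```

The keyword search can't tell the three senses apart, while the concept lookup only works because someone applied the labels consistently in the first place.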

My original post gets into more detail about why taxonomies are important search tools, and how the digitization of books requires a good taxonomy ... and who should do it.

Tagging the Real World through Barcode Apps

Earlier this week, Peter Brantley noted an interesting barcode application for Android phones that connects the ISBN data on a physical book with Google Book Search listings. This merging of the physical and digital worlds isn't novel -- other companies offer similar applications -- but the discussion surrounding these apps tends to focus on retail threats and opportunities rather than broader uses.

Speaking as an unabashed content geek, I find the information curation possibilities from this digital-physical merge particularly interesting. The Web has provided an assortment of organization tools -- RSS feeds, readers, tags, categories, etc. -- that help me find and synthesize a vast amount of information. But the same can't be said for the real world. If something pops onto my radar while I'm sitting in front of the TV or shopping at a store, I need to open a browser (assuming I have a computer or phone), punch in the information and save it for later retrieval. This isn't an arduous task, but it lacks the elegance of scanning and tagging Web-based data.

My online efficiency increased exponentially a few years ago when I incorporated RSS feeds and readers into my daily routine. Instead of tediously visiting particular sites or running open-ended search queries, I could now gather useful sources in one application and sort that data into segments geared toward my own needs. Not to get too syrupy here, but it was an eye-opening experience that revealed a new depth to the Web. These barcode apps offer similar possibilities for seamlessly accessing the physical world's stored information. Armed with a cell phone and a data plan, those of us who are curation-minded can expand the boundaries of discoverability into an untapped region.

Standardizing Tags in the Metadata Minefield

One issue we haven't discussed much is that of metadata. XML documents are by definition rife with metadata. At what point does metadata cross the line from useful to pollution?

When it's not standardized.

The kind of XML tagging we're primarily talking about can be sectioned into three buckets: rights data ("this picture is good for print products but not electronic ones," "we can use this graphic anywhere," "these animations are exclusively for the workbook"), formatting data ("this is a chapter," "this is a footnote"), and context data ("Paris," "1955," "General Robert E. Lee," "noodles").
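As a rough illustration of those three buckets, the Python sketch below reads a small XML chunk and sorts its metadata into rights, formatting and context. The element and attribute names (`chunk`, `rights`, `context`, `tag`, `type`) are made up for this example and are not drawn from any publishing standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are illustrative only.
SAMPLE = """<chunk type="chapter">
  <rights print="yes" electronic="no"/>
  <context>
    <tag>Paris</tag>
    <tag>1955</tag>
    <tag>noodles</tag>
  </context>
  <body>Chapter text goes here.</body>
</chunk>"""


def read_metadata(xml_text):
    """Split a chunk's metadata into rights, formatting and context buckets."""
    root = ET.fromstring(xml_text)
    rights_el = root.find("rights")
    return {
        # Rights data: where the content may be used.
        "rights": dict(rights_el.attrib) if rights_el is not None else {},
        # Formatting data: what structural role the chunk plays.
        "formatting": root.get("type"),
        # Context data: what the chunk is about.
        "context": [t.text for t in root.findall("./context/tag")],
    }


metadata = read_metadata(SAMPLE)
```

Keeping the buckets separate in the markup means each one can be standardized, and validated, on its own schedule.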

This is a perfect recipe for complete chaos. Obviously standards are crucial to the success of using XML in publishing. Even standards within a department -- using tags the same way from one project to the next, from one PERSON to the next -- are crucial.

There's been some talk about the role of the Book Industry Study Group in developing tagging standards, in the same way they've developed BISAC code standards. And this makes a great deal of sense. The rights and formatting tag standards should be relatively easy to establish -- publishing houses, whether big or small, tend to use this data fairly consistently. It's the context tags that pose the more serious challenges.

The Library of Congress has done this sort of thing with its subject headings. But, like the BISAC codes, these refer to the subject of an entire book. Many books, however, cover more than one topic, and so do many chapters. That level of granularity has never been taxonomized before.

Still, it's important to do so in a standardized way, to avoid a cacophony that drowns out meaning. (Is it "pasta" or "noodles"? When you say "diamond," are you talking about baseball or gemstones or Neil? Why is a chapter published by Mosby about dentistry coming up in search results with the chapters on collecting Limoges china published by Antique Trader? Hint: "porcelain.")
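One way to picture a standardized context vocabulary is the Python sketch below, which maps free-text tags to a preferred term and refuses to guess when a tag has several senses. The vocabulary entries, the qualifier convention and the `normalize_tag` function are assumptions made for this example, not an existing standard.

```python
# Hypothetical controlled vocabulary: variant tag -> preferred term.
PREFERRED = {
    "noodles": "pasta",
    "pasta": "pasta",
}

# Hypothetical terms with several senses; each one needs a qualifier.
AMBIGUOUS = {
    "diamond": ["diamond (baseball)", "diamond (gemstone)", "Diamond, Neil"],
    "porcelain": ["porcelain (dentistry)", "porcelain (ceramics)"],
}


def normalize_tag(raw, qualifier=None):
    """Map a free-text tag to a standard term, or raise if it can't be resolved."""
    key = raw.strip().lower()
    if key in AMBIGUOUS:
        senses = AMBIGUOUS[key]
        # Keep only the senses that mention the qualifier, if one was given.
        matches = [s for s in senses
                   if qualifier and qualifier.lower() in s.lower()]
        if len(matches) != 1:
            raise ValueError(f"'{raw}' is ambiguous; choose one of {senses}")
        return matches[0]
    if key in PREFERRED:
        return PREFERRED[key]
    raise KeyError(f"'{raw}' is not in the controlled vocabulary")


pasta_term = normalize_tag("Noodles")
dental_term = normalize_tag("porcelain", qualifier="dentistry")
```

Calling `normalize_tag("diamond")` without a qualifier raises an error instead of picking a sense, which is the point: the tagger has to decide, so the search engine doesn't have to guess.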

If you've ever seen a tag cloud on a website, you'll know what I mean. You never know what you're going to get when you click on it. Standardizing context tags is probably the most thankless, boring job publishers will ever engage in. But it's also the one that's going to ensure that books are actually discoverable the way they're meant to be discovered.
