Entries tagged with “taxonomy” from Tools of Change for Publishing
Taxonomies and Starting With XML
This is an excerpt from a blog post I wrote last week on taxonomies and chunking.
Last October, the StartWithXML team wrote a post called "To Chunk or Not To Chunk," where we discussed tagging and infrastructure issues, and a discussion ensued about what happens when you don't know what you'll be using chunks for. How do you tag those?
Later, in our StartWithXML One-Day Forum, we included a presentation on tagging and chunking best practices, where it was pointed out that no taxonomy for chunk-level content currently exists.
We have taxonomies for book-level content. These include formalized code sets such as the Library of Congress subject codes, the BISAC codes and the Dewey Decimal System. There are also informal code sets, like the tag sets on Shelfari or LibraryThing. There are proprietary taxonomies at Amazon and B&N.com that enable effective browsing.
But nothing like this exists for sub-book-level content. It's never been traded before. We've never really needed a taxonomy for it before.
Other industries that traditionally distribute "chunks" have their own taxonomies that might prove useful in building a book-chunk schema. These include the IPTC news codes, which identify the content of a particular news story -- that's the closest analogy I can find for small gobbets of content that require organization.
Industries have proprietary taxonomies to identify certain concepts -- culinary arts, music, agriculture, engineering, the sciences, literature and criticism, education, and on and on and on. But these do not necessarily identify concepts within a book.
Some might argue that we don't necessarily need taxonomies -- why can't we use natural-language search and the semantic Web to "bubble up" the "right" concepts? I'd argue that words don't always mean what we think they mean. A classic example from my library days is the term "mercury." That could mean the planet, the car or the element. Proponents of semantic search would say that the context in which "mercury" is mentioned should take care of defining that term. I'd say that's true in about 50 percent of all cases, but context alone is not reliable enough to get it right 75 to 100 percent of the time.
My original post gets into more detail about why taxonomies are important search tools, and how the digitization of books requires a good taxonomy ... and who should do it.
A Correction!
Frank Grazioli, of Wiley, writes in to correct my last post about taxonomies:
Wiley has been exploring taxonomies for its travel content business; the cooking/psych/accounting spaces might be our next logical opportunities because the disciplines are well developed, specific, etc., that content is authored or edited in fairly controlled templates that map to our own XML content models and our belief in content models and XML has evolved that "lighter" and "more agile" are better than taggy and dense. As you so aptly point to the contextuality and "rigor" of taxonomies, these tools would allow our XML to "slip on the right jacket" for the occasion. I apologize if we led you to believe that we already have firm taxonomies in place for the three areas you specify--I wouldn't want readers/event guests to get that impression anyway.
Beyond the Tag Cloud
This is an excerpt from our research paper, which will publish in concert with the StartWithXML Forum on January 13th at the McGraw-Hill Auditorium in New York. Early bird discounting for BISG members is ending soon!
A good taxonomy is the backbone of your business -- it's how you sort your content. It allows for effective merchandising, effective marketing -- you can aim your content with the precision of a pool cue. It allows for inventorying your content -- so you know what you have ... and what you need. With your content tagged and organized, you know where everything is and how to deploy it.
Taxonomies are contextually sensitive and rigorous -- and in establishing your own, it helps to look at what other industries are doing. Wiley has adopted accounting and cooking and psychology taxonomies from those industries to organize information in its professional development titles. Educational publishers are increasingly arranging their textbooks around "learning objects" -- taxonomized pedagogical goals developed by educators themselves. Even the BISAC codes -- which are part of the ONIX system of organizing book information and therefore an XML-based taxonomy -- are developed very carefully and consensually among book industry professionals in monthly meetings.
An important aspect of taxonomy development is scope notes. Terms need definition and clarity around how they're going to be used. Documenting your taxonomy -- what you mean when you say "porcelain" (collectible china, dental work, household fixtures?), parent-child relationships between categories, and why you choose certain terms over others -- is important for the long term. Future editors and authors will need to know why your taxonomy has developed as it has.
Consistency in application is also crucial. Drop-down menus (as opposed to free-text fields) enforce structure and ensure that users don't come up with their own terms that pollute your taxonomy with duplicates or irrelevancies (or misspellings).
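To make the last two points concrete, here is a minimal sketch of a controlled vocabulary with scope notes, parent-child links and a check that refuses free-text terms. Everything in it is an assumption for illustration: the term IDs, the labels, the scope notes and the function names are invented and do not come from BISAC or any other real code set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    term_id: str            # hypothetical identifier, not from any real code set
    label: str
    scope_note: str         # what the term means in this taxonomy
    parent_id: Optional[str] = None


# A toy controlled vocabulary; every entry here is invented for illustration.
TAXONOMY = {
    t.term_id: t
    for t in [
        Term("home", "Home", "Household goods and fixtures."),
        Term("home.porcelain", "Porcelain (fixtures)",
             "Sinks, tubs and other household fixtures.", "home"),
        Term("collect", "Collectibles", "Items bought for collecting."),
        Term("collect.porcelain", "Porcelain (china)",
             "Collectible china and figurines.", "collect"),
    ]
}


def validate_tags(tags):
    """Return the tags as a list, or raise if any is not in the vocabulary."""
    unknown = [tag for tag in tags if tag not in TAXONOMY]
    if unknown:
        raise ValueError(f"Not in controlled vocabulary: {unknown}")
    return list(tags)


def ancestors(term_id):
    """Walk parent links from a term up to its top-level category."""
    path = []
    current = TAXONOMY[term_id].parent_id
    while current is not None:
        path.append(current)
        current = TAXONOMY[current].parent_id
    return path
```

The validation step plays the role of the drop-down menu: a misspelled tag such as "porcelin" is rejected rather than quietly added as a new term, and the scope note travels with each term so a later editor can see which "porcelain" was meant.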
An advantage to using XML is that you don't have to accomplish everything at once, perfectly, from the outset. You will not be able to tag your documents thoroughly right off the bat -- who can know everything in advance? The act of tagging is recursive, and depends on market and company needs. XML allows for this flexibility. Depending on how you envision chunking and re-use, you'll tag your documents differently with each iteration. Unlike the "fire and forget" model, iterative tagging means that your books are living documents.
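Iterative tagging can be sketched in a few lines of Python using the standard library's ElementTree module. This is only an illustration under stated assumptions: the `<book>`, `<chunk>` and `<tag>` element names, the chunk IDs and the tag values are all hypothetical, and a real content model would look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical chunk markup; element and attribute names are invented.
SOURCE = """<book>
  <chunk id="c1"><title>Roasting a Chicken</title></chunk>
  <chunk id="c2"><title>Knife Skills</title></chunk>
</book>"""


def add_tags(root, tags_by_chunk):
    """Add <tag> elements to chunks, skipping tags already present."""
    for chunk in root.iter("chunk"):
        existing = {t.text for t in chunk.findall("tag")}
        for tag in tags_by_chunk.get(chunk.get("id"), []):
            if tag not in existing:
                ET.SubElement(chunk, "tag").text = tag
                existing.add(tag)
    return root


root = ET.fromstring(SOURCE)

# First pass: coarse subject tags.
add_tags(root, {"c1": ["cooking"], "c2": ["cooking"]})

# Later pass: finer tags once a new use for the chunks is known.
add_tags(root, {"c1": ["cooking", "poultry"], "c2": ["technique"]})

tags_c1 = [t.text for t in root.find("chunk[@id='c1']").findall("tag")]
```

Because each pass only adds what is missing, the same document can be re-tagged as often as needed without duplicating earlier work.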