Entries tagged with “new york times” from Tools of Change for Publishing

Photos from New York Times R&D Lab

Nick Bilton was a hit yesterday at the TOC Conference, and during his keynote he talked about what they're working on with content at the NYT R&D Lab. Nick was kind enough to give a few of us a private tour earlier this week, and here are some photos from the trip:

IMG_0277.JPG

IMG_0278.JPG

IMG_0280.JPG

IMG_0282.JPG

IMG_0283.JPG

New York Times Opens "Best Sellers API"

The New York Times on Tuesday opened up its "Best Sellers API," offering programmatic access to best-seller data (going back to 1930!) from the Times:

The Times Best Sellers API gives you quick access to current and past best-seller lists in 11 different categories, such as Hardcover Nonfiction and Paperback Mass-Market Fiction. The initial launch offers every weekly list since June 2008, and in the coming months, we plan to add data going back to 1930 (thanks to the hard work of our Books staff). The API also offers details about specific best sellers, including historical rank information and links to New York Times reviews and excerpts. And these aren't just canned responses; they're searchable and sortable, with even more robust options coming in the next release.

I'm a huge fan of what the Times has done to embrace open architecture and data formats (and Nick Bilton, from the Times' R&D Lab, will be a keynote speaker at next month's TOC Conference), and this is a great example of what content creators and curators (i.e., publishers) can do to give customers the opportunity to create new value on top of that content. We've offered an API for our Safari Books Online product for several years now, and have some very interesting internal projects percolating to take things a step further.

New York Times Settles Linking Suit

In what many of us thought was a slightly bizarre case, the New York Times Co. has settled with GateHouse Media in a suit that sought to stop the automated aggregation of GateHouse content on Boston.com's affiliated properties (Boston.com is owned by the Times Co.). It is not clear why the Times settled, since precedent was on the side of the Times' operation.

Mathew Ingram examines the settlement at the Nieman Journalism Lab:

Because while the settlement is not a legally-binding precedent -- the one piece of what might be called good news -- it still involves the New York Times voluntarily refraining from what many would argue is perfectly defensible behaviour. As Joshua Benton notes in his post at the Nieman Journalism Lab, that could well embolden other publications to launch similar cases, on the assumption that if the NYT caved then someone else might too. [Links included in original post.]

Report: Wall Street Journal Grabbing High-End Ads from New York Times

Silicon Alley Insider and others are reporting on Bloomberg's notice that the Wall Street Journal is grabbing high-end luxury advertising revenue from the New York Times:

As if the New York Times wasn't having enough trouble keeping up with an ad recession and the Internet crushing its print business. Now the newspaper is facing increasing competition for print ad s... from Murdoch's Wall Street Journal ...

... And then there's the stats: The WSJ has a paid circulation of 1.4 million, up 2.4% y/y. The NYT: 859,000, down 5.5%. With more readers, the WSJ can charge more for ads, $264,426 for full page color vs. $193,800 at the NYT.

New York Times Movie Reviews Released as API

The New York Times has released an application programming interface (API) for its movie reviews, a significant step. From the Times' Open blog:

Finally -- and this is the key -- we're giving you access to our Movies search feature, containing all 22,000 reviews indexed by title, reviewer's name, director's name, names of the top five actors, and plot keywords. So, if you'd like to build a list of what The New York Times thinks of Pedro Almodóvar or Lindsay Lohan, we've got you covered. And this is only the beginning: in the next few weeks we'll be rolling out better lookup and search features that will let you call up reviews based on publication date or the movie's release date, just to name two.

The Times also released campaign finance and metadata APIs earlier this month.

Sulzberger: "Be of the Internet, Not on the Internet"

Arthur Sulzberger Jr. indicates he is willing to consider radical change to keep the New York Times relevant in the digital age. From News.com:

Sulzberger would brand this not as a crisis, but rather as change that requires adaptation. "It's important for traditional companies to adopt strategies that enable us to be of the Internet, not on the Internet," he said. "There must be an institutional commitment to engage in reinvention, especially as the information revolution picks up steam."

That's why, he said, the Times has undergone some digital initiatives unusual for the print media business. It launched bookmarking and sharing service TimesPeople earlier this year. Soon, it will launch TimesExtra, which integrates acquisition Blogrunner onto the publication's home page to provide related links from across the Web. And it has also announced an API for developers to work with one of its most popular online features, the "Most Emailed" list.

Processing the Deep Backlist at the New York Times

At the O'Reilly Open Source Convention (OSCON), Derek Gottfrid of the New York Times led a fascinating session on how the Times used Amazon's cloud computing services to get its huge historical archive online quickly and cheaply, and make it freely viewable to the public.

How big is the archive? Eleven million individual articles from 1851 to 1980, or 4 terabytes of data (over 4,000 gigabytes). The Times got it ready for distribution in 24 hours, for a total cost of $240 in computing fees and $650 in storage fees.

As part of its original TimesSelect subscription service, the paper had scanned its entire print archive. Each full-page scan was cut into individual articles. As is typical of newspaper layout, the articles often spanned column or page boundaries, which meant that many articles were composed of several scans. In the original subscription-based program, whenever a reader requested one of these historical articles, the Times' servers had to stitch together all of the scans for that article before presenting it.

This on-demand process used significant computing resources, but because TimesSelect was subscription-based there was never much traffic. Once this archive was open to the public it was expected to generate greater usage, and the safest approach in those cases is to serve pre-generated versions of all 11 million articles. Using traditional software development practices -- with a single computer churning through one article at a time -- the processing could potentially take weeks and tie up Times servers that were needed for other tasks.

Gottfrid turned to Amazon Web Services (AWS) and its two main products:

Amazon Elastic Compute Cloud (EC2) is a form of "virtualization" where one very large computer is divided up into many virtual computers that can be individually leased out for use. Traditional hosting costs money whether the server is working or idle; in EC2 you pay only as long as the virtual computer is running. When it's no longer needed, it's shut down. This makes the service ideal for one-off processing jobs.

In addition, Amazon doesn't care whether you run one EC2 "instance" for 100 hours or 100 instances for one hour -- the cost is the same. The difference comes when you can usefully divide a job into 100 concurrent tasks, because then it takes 1/100th of the total time.
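The pricing point above can be sketched in a few lines of Python. This is only an illustration: it assumes billing is strictly proportional to instance-hours, that the job divides perfectly across instances, and it uses a hypothetical rate of $0.10 per instance-hour. The function name `run_plan` is made up for this example.

```python
def run_plan(total_instance_hours, instances, hourly_rate):
    """Return (cost, wall_clock_hours) for a perfectly divisible job.

    Assumption: cost is proportional to instance-hours, with no
    rounding up to whole billing hours and no coordination overhead.
    """
    wall_clock_hours = total_instance_hours / instances
    cost = instances * wall_clock_hours * hourly_rate
    return cost, wall_clock_hours


HYPOTHETICAL_RATE = 0.10  # dollars per instance-hour, for illustration only

# The same 100 instance-hours of work, run two different ways.
serial_cost, serial_hours = run_plan(100, 1, HYPOTHETICAL_RATE)
parallel_cost, parallel_hours = run_plan(100, 100, HYPOTHETICAL_RATE)

print(f"1 instance:    ${serial_cost:.2f} over {serial_hours:.0f} hours")
print(f"100 instances: ${parallel_cost:.2f} over {parallel_hours:.0f} hours")
```

Under these assumptions the bill is identical in both cases, and only the elapsed time changes.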

Amazon's other major AWS offering is the Simple Storage Service (S3), for large-scale file hosting. Like EC2, it is a leased model -- you pay only for the space that you use in a given time period.

Gottfrid leveraged these technologies in combination with a relatively new software library called Hadoop. Hadoop is written in the Java language and is based on work done at Google. It allows programmers to very easily write programs that can be run simultaneously on multiple computers.
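To give a feel for the style of program this enables, here is a minimal Python sketch of the group-then-process pattern. It is not Hadoop and not the Times' code: it uses threads on one machine rather than a cluster, short strings stand in for image scans, and the fragment data and function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scan fragments: (article_id, position_in_article, content).
FRAGMENTS = [
    ("a1", 2, "continued on page 3."),
    ("a2", 1, "Single-fragment article."),
    ("a1", 1, "Story starts here, "),
]


def group_by_article(fragments):
    """'Map' step: bucket each fragment under its article id."""
    groups = defaultdict(list)
    for article_id, position, content in fragments:
        groups[article_id].append((position, content))
    return groups


def stitch(item):
    """'Reduce' step: put one article's fragments in order and join them."""
    article_id, parts = item
    return article_id, "".join(content for _, content in sorted(parts))


def stitch_all(fragments, workers=4):
    """Stitch every article; each article is an independent task."""
    groups = group_by_article(fragments)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(stitch, groups.items()))


articles = stitch_all(FRAGMENTS)
for article_id, text in sorted(articles.items()):
    print(article_id, "->", text)
```

Because no article depends on any other, the work can be spread over as many workers as are available, which is what makes this kind of job a good fit for running on many machines at once.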

Combining Hadoop concurrency with EC2 and S3, the Times was able to run a job that might have taken weeks of processing time and complete it in 24 hours, using 100 EC2 instances. They were pleased enough with S3 that it became their permanent hosting platform for the scans. Hosting with Amazon or other cloud computing services is usually cheaper and has much better bandwidth than the average provider, although downtime can and does occur.
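As a rough sanity check on the figures reported in the talk (11 million articles, $240 in computing fees, 100 instances, 24 hours), the arithmetic can be worked through as follows. The one assumption added here is that all 100 instances ran for the full 24 hours, which the talk summary does not state explicitly.

```python
# Figures reported in the talk.
ARTICLES = 11_000_000
COMPUTE_COST_USD = 240.0
INSTANCES = 100
WALL_CLOCK_HOURS = 24

# Assumption: every instance ran for the whole 24 hours.
instance_hours = INSTANCES * WALL_CLOCK_HOURS

implied_rate = COMPUTE_COST_USD / instance_hours       # dollars per instance-hour
articles_per_instance_hour = ARTICLES / instance_hours
cost_per_article = COMPUTE_COST_USD / ARTICLES

# If one machine were equivalent to one instance, the same work done
# serially would take this many days.
serial_days = instance_hours / 24

print(f"Instance-hours used:        {instance_hours}")
print(f"Implied hourly rate:        ${implied_rate:.2f}")
print(f"Articles per instance-hour: {articles_per_instance_hour:.0f}")
print(f"Compute cost per article:   ${cost_per_article:.7f}")
print(f"Serial equivalent:          {serial_days:.0f} days")
```

On that assumption the job used 2,400 instance-hours, which works out to $0.10 per instance-hour, and a single equivalent machine would have needed about 100 days, consistent with the "weeks" estimate for the traditional approach.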

At last year's OSCON, the Times announced the formation of its developer blog, Open. You can read more about the original AWS project as well as TimesMachine, a project that became economically feasible due to the low cost of AWS.
