Entries tagged with “nitty gritty tech” from O'Reilly Radar
Random OSCON Tidbits
by Nat Torkington | @gnat | comments: 3
Some things I learned about at the Django/Python meetup in downtown Portland during OSCON:
- JS Bridge: a Python to Javascript bridge for all Mozilla applications, still under very active development (i.e., changing daily).
- 960.gs: a grid framework for CSS (replacing Blueprint CSS) with a naming scheme that makes prototyping designs a lot less painful.
- Dojo has Django Templates: I take my eye off Dojo for a year and it suddenly grows the ability to have full Django templates in the browser. Holy CRAP.
tags: nitty gritty tech, open source, oscon, web 2.0
Special Purpose Computing Focuses on Energy Efficiency
by Jim Stogdill | @jstogdill | comments: 0
To improve the climate models that predict global warming, climatologists are seeking model resolutions on the order of 1 km. Unfortunately, building the required 200 petaflop machine with today's commodity-hardware approach would cost $1B and would result in a staggering 40 megawatts of power consumption.
A group of researchers at Lawrence Berkeley National Laboratory, who must be aware of the irony inherent in using 40 continuous megawatts to better predict global warming, may be returning supercomputing to its specialized roots but along a new vector (yes, weak pun intended). In addition to the Cray-era focus on raw power, they are emphasizing energy-efficient computation, where floating point operations per watt is the key metric.
Their approach has already been described at EcoTech Daily and the lab's Research News so I'm just going to summarize it here. They are working on specialized hardware consisting of 20 million very low power embedded processors (of the sort used in iPods and cell phones) wired together with the specific climate calculations in mind. By trading flexibility for efficiency, the design should achieve a ten-fold improvement in the floating point operations per watt metric, and the resulting 200 petaflop machine is predicted to require only 4 MW of power and cost $75M to build.
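To make the ten-fold claim concrete, here is a quick back-of-the-envelope check in Python. It uses only the figures quoted above (200 petaflops, 40 MW versus 4 MW, 20 million processors). The function and variable names are mine, and the per-processor numbers are simple division, not anything the lab has published.

```python
# Figures quoted in the post; everything derived from them is plain arithmetic.
TARGET_FLOPS = 200e15        # 200 petaflops
COMMODITY_POWER_W = 40e6     # 40 MW with the commodity-hardware approach
SPECIALIZED_POWER_W = 4e6    # 4 MW predicted for the specialized design
PROCESSOR_COUNT = 20e6       # 20 million embedded processors


def gflops_per_watt(flops: float, watts: float) -> float:
    """Return floating point operations per second per watt, in GFLOPS/W."""
    return flops / watts / 1e9


commodity = gflops_per_watt(TARGET_FLOPS, COMMODITY_POWER_W)
specialized = gflops_per_watt(TARGET_FLOPS, SPECIALIZED_POWER_W)
improvement = specialized / commodity

# Per-processor budget implied by the specialized design (derived, not quoted).
gflops_per_processor = TARGET_FLOPS / PROCESSOR_COUNT / 1e9
watts_per_processor = SPECIALIZED_POWER_W / PROCESSOR_COUNT

print(f"commodity:   {commodity:.1f} GFLOPS/W")
print(f"specialized: {specialized:.1f} GFLOPS/W")
print(f"improvement: {improvement:.1f}x")
print(f"per processor: {gflops_per_processor:.1f} GFLOPS at {watts_per_processor:.2f} W")
```

The interesting number is the last one: each embedded processor only has to deliver a modest share of the total, but it has to do so on a fraction of a watt.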
Their motivations in their own words:
"What we have demonstrated is that in the exascale computing regime, it makes more sense to target machine design for specific applications," Wehner said. "It will be impractical from a cost and power perspective to build general-purpose machines like today's supercomputers."
Specialized problems are amenable to specialized solutions and scientific computation seems particularly suitable to this kind of approach. However, on the web and in corporate IT, where computing is both more general and inefficiently deployed, the first wave of energy efficiency improvements is coming primarily through a combination of virtualization and incremental improvements in commodity chip design.
I don't think software carpooling will be the only game in town for long though. While virtualization and dynamic provisioning are facilitating better utilization of existing hardware, virtualization comes with a performance cost of its own and can be no better than the hardware it is running on. Once you get four passengers in a V-8 powered SUV, further improvements have to come from changing driving habits and modifying the vehicle.
As virtualization initiatives pick the low hanging fruit, further gains will come from fundamental hardware improvements (which may include analogous specialization) in concert with "best efficiency" dispatching that targets optimal server utilization in a dynamic server pool. An interesting example of this kind of approach is described here (pdf).
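To give a feel for what "best efficiency" dispatching could look like, here is a minimal Python sketch. It is not taken from the linked paper. The linear power model, the assumption that an unloaded server can be powered down, the greedy ranking and the example pool are all my own illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    capacity: float    # work units per second at 100% utilization
    idle_watts: float  # draw when powered on but doing nothing
    peak_watts: float  # draw at 100% utilization

    def power(self, load: float) -> float:
        """Assumed linear power model; an unloaded server is assumed powered down."""
        if load <= 0:
            return 0.0
        utilization = load / self.capacity
        return self.idle_watts + (self.peak_watts - self.idle_watts) * utilization


def dispatch(servers, demand):
    """Greedily fill the servers with the best work-per-watt at full load first."""
    ranked = sorted(servers, key=lambda s: s.capacity / s.peak_watts, reverse=True)
    plan = {}
    remaining = demand
    for server in ranked:
        load = min(server.capacity, remaining)
        plan[server.name] = load
        remaining -= load
    if remaining > 1e-9:
        raise ValueError("demand exceeds pool capacity")
    return plan


def spread_evenly(servers, demand):
    """Baseline: run every server at the same utilization."""
    utilization = demand / sum(s.capacity for s in servers)
    return {s.name: s.capacity * utilization for s in servers}


def total_power(servers, plan):
    return sum(s.power(plan[s.name]) for s in servers)


# Hypothetical mixed pool: two commodity boxes and one small efficient one.
pool = [
    Server("A", capacity=100.0, idle_watts=100.0, peak_watts=200.0),
    Server("B", capacity=100.0, idle_watts=150.0, peak_watts=300.0),
    Server("C", capacity=50.0, idle_watts=40.0, peak_watts=80.0),
]
plan = dispatch(pool, demand=120)
print(plan, total_power(pool, plan))
print(total_power(pool, spread_evenly(pool, demand=120)))
```

With these made-up numbers, consolidating the load onto the two most efficient machines draws noticeably less power than running all three at partial load, which is the whole argument for dispatching on work per watt rather than on availability.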
As I've touched on before, beyond that, a "systems view" to optimize the whole data center as it operates under changing conditions and with heterogeneous hardware might come next. Returning one last time to the carpooling analogy, this would be like a smart traffic routing system that keeps each car-pooling hybrid moving at its most efficient speed. The end result might be an optimally-sized mixed pool of specialized and commodity hardware each dispatched to operate the data center holistically at its best unit of work per watt.
tags: datacenter, energy, nitty gritty tech, specialized services, supercomputing
Review Board is good software
by Marc Hedlund | comments: 7
After having tried and failed to have useful code reviews at several different companies, and after feeling deep envy for Mondrian, Google's web-based code review tool, I'd been looking for some tool that would help make code reviews more painless. I think I've found what I was looking for in Review Board.
Code reviews usually amount to infrequent lunchtime sessions where some poor engineer's code gets put on a projector and strip-searched by the whole engineering group at once. It's not a fun experience for anyone, and post-traumatic stress or severe empathy often result in the next session mysteriously taking several weeks or months to make it onto the calendar. Public humiliation has its place, perhaps, but as a last resort -- attempts to make it regular, good sport usually fail on the launchpad.
Yet good code reviews -- as hard as they are to find -- can produce great effects. There's no better way to learn how to improve your code than to have someone look over it carefully and make suggestions line by line. I've been impressed by the results that teams using careful code review report: that making changes and fixing bugs in the code is relatively easy, since everything is fairly clean and accessible. Great code review makes bugs more shallow.
There are a few web-based tools for code reviews. Mondrian isn't available outside of Google, but Codestriker (Perl-based) has been around for a while, and Crucible from Atlassian has a nice UI and good features -- but a US$2,400.00 starting price point, including the required FishEye server.
I spent a ton of time getting Crucible set up, but before taking the plunge I decided to take one more look for alternatives, and stumbled on Review Board. It's a Django/Python-based open source project, and it seems to have an active and responsive community. The documentation for getting it set up is a little thin, but it still took far less time than Crucible to get going. The UI isn't quite as nice, but it's serviceable, and the iPhone/JSON API/Git & Mercurial & SVN & Perforce & CVS support all turned my eye. Also, I like that Review Board allows pre-commit reviews, which Crucible as yet does not.
You can immediately see why Review Board is going to be a great open source project when you submit a patch. All patches are, of course, code reviewed using Review Board, and nobody working on the project is going to let a minor glitch go by. My first patch got an immediate "no way"; later patches (such as this one) were up to snuff. I've learned a couple of tricks already from the review comments, and I definitely am spending more time getting things right before submitting.
Take a look through the project launch post and you'll see what the authors are going for. I have Review Board set up at our office, and I'm psyched to give it a try and see how it goes. It's great to see such a healthy project in this area, and I hope it continues to grow and go well.
tags: nitty gritty tech
NYT and Sun on Concurrency
by Nat Torkington | @gnat | comments: 1
Two interesting stories on concurrency came past my browser this morning: NYTimes on Microsoft's concurrency efforts and Allan Packer from Sun on open source databases. The NYT piece is about Microsoft's efforts to produce multicore programming tools, which include hiring a bunch of supercomputing veterans.
“Industry has basically thrown a Hail Mary,” said David Patterson, a pioneering computer scientist at the University of California, Berkeley, referring to the hardware shift during a recent lecture. “The whole industry is betting on parallel computing. They’ve thrown it, but the big problem is catching it.”
The chip industry has known about the hurdles involved in moving to parallel computing for four decades. One problem is that not all computing tasks can be split among processors.
It's not a particularly deep piece, as befits the NYT's mass readership, but it's nice to see the issue getting some attention outside us early adopters. It's going to be interesting watching programs like web browsers adapt to a multicore world: can layout be distributed? Javascript? (Brendan has already said JS3's concurrency won't be threading, it'll be something more implicit.)
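The NYT's point that not all computing tasks can be split among processors is usually quantified with Amdahl's law, which the article doesn't name: if only a fraction p of a program can run in parallel, n cores give a speedup of at most 1 / ((1 - p) + p / n). A small Python sketch, where the 90% parallel fraction is just an example value I picked:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of a program can be parallelized."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Example: a program that is 90% parallelizable (a made-up figure).
for cores in (2, 4, 16, 64):
    print(f"{cores:>3} cores: {amdahl_speedup(0.9, cores):.2f}x")
```

However many cores you add, the 10% that stays serial caps this example at a 10x speedup, which is why the "catching it" part of Patterson's Hail Mary is the hard bit.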
The Sun blog post is an article posing the question Are Proprietary Databases Doomed?. The interesting part for me was the table showing how Oracle seat license costs went nonlinear with cores. The author posits this is a big push in the favour of open source software and anything else free of per-core licensing business models.
Anecdotally, some companies are finding that database licenses have become their single biggest IT cost. The impact is probably greater on small and medium-sized companies that don't have the same ability to command the hefty discounts that larger companies typically enjoy from database vendors. A colleague related a story that illustrates the issue. His brother worked for a 200-person company that decided it was time to upgrade their database applications. They set out to deploy a well-known proprietary database until they discovered that the database license fee was going to exceed their entire current annual IT budget! They ended up deploying an open source database instead.
Thanks to Simon Phipps for the pointer to Sun blog.
tags: nitty gritty tech
Hey, AT&T, What's the Value of a Closed Network Again?
by Jimmy Guterman | comments: 5
Closed networks, their proponents maintain, offer a trade-off. Individuals or outside developers can't make any changes or improvements to them. But since the network and its applications are controlled at a single source, individuals are supposed to get an easier experience in which they don't have to think about the network, just what they're doing on it. Trust the network.
I was thinking about that earlier today when I tried to check my AT&T balance on my locked-into-AT&T iPhone. The built-in "AT&T MyAccount" link deposited me on a web page that could not recognize my phone or my account. That's right: I was using an AT&T software service on an AT&T network and it couldn't identify my device. Even when my phone's entire interaction was on an AT&T network and software, the company was unable to perform the most basic customer service: recognize me. Tell me again what the benefits of a closed network are?
Even our-closed-is-beautiful poster child Apple is starting to move (albeit kicking and screaming) toward a more open iPhone software platform, in the hope that the ensuing innovation will grow its market. Are the mobile networks such as AT&T capable of doing the same?
tags: nitty gritty tech
Parrot and Multi-threading
by Tim O'Reilly | @timoreilly | comments: 5
Over on the O'Reilly Network, Kevin Farnham has an interesting blog post about the connection between Parrot and the arrival of multi-core processors (Radar posts.) Kevin makes a number of good points, especially how changing fundamentals can help a technology to "arrive." (Think of how Ruby on Rails made Ruby suddenly the language of choice after ten years as a second-stringer, or how multicore is renewing interest in languages like Erlang and Haskell.) Kevin is also absolutely right about how multi-core will affect even home computing.
Sometimes a technology is invented, and the time simply isn’t right, the need at the moment for solutions that apply that technology is nearly non-existent, though many people readily admit it’s a “wonderful” technology. I wonder if this might apply to a certain extent to Parrot prior to the age of many-core computing?
In a few years, inexpensive PCs will have 8, 16, or more processing cores. Some people doubt that the average home or office user is going to have any use for all these cores. I think that’s like saying “no one will ever need more than 640K of RAM.” Once it’s possible for the average home or office user to apply algorithms and image analysis and video processing and stock market simulators that were previously available only on high-end workstations in data centers, you cannot tell me they won’t want to do this.
...
I doubt that applying conventional low-level threads is going to be an efficient way to accomplish this in terms of programming time (I’ve worked at this level for a long time). But on the other side: no one is going to want to convert the mass of existing software platforms/applications that could potentially apply these computation libraries, into C++ or C. A convenient means to enable a broad spectrum of languages to call multithreaded C++, C, and Fortran libraries is going to be needed. Otherwise, again we face enormous software development inefficiency, as a separate interface has to be constructed for each library for each calling language. That’s not a solution that is going to fly, in my opinion.
It seems to me that Parrot is an excellent candidate for addressing this problem. If this is the case, the Parrot team may soon find itself lent increasing support from independent developers, and possibly from companies who recognize the need for this capability with respect to their own applications.
I don’t think this need was really there when PC performance could be improved simply through ever-increasing clock speeds. Single-threaded software that did a few simple calculations was fine then. Multicore, however, changes everything. As highly-scalable multithreaded computation / simulation libraries become available, and people realize they want them, and developers realize they need to be able to call these libraries from every language platform, Parrot’s time may arrive.
Kevin's suggestion that Parrot may solve a big problem for multi-core programming also suggests that the wait for Perl 6 may not have been in vain. (I've always hesitated to count out Larry Wall, one of the true geniuses of programming, and he's backed up by some other great programmers.)
Disclosure: a couple of people on the Parrot team, including Allison Randal and chromatic, work for O'Reilly.
tags: nitty gritty tech
Shared nothing parallel programming
by Artur Bergman | comments: 6
I agree strongly with Tim and Nathan's belief in the importance of parallel computing. I've been following this space since 2000, when I took Gurusamy Sarathy's initial work on making perl multi-threaded and finished it for the 5.8 release.
The initial perl threading released in 5.5 had a traditional architecture: all data was shared between all threads. The problem with this approach was that the need for continuous synchronization between threads would slow the whole machine down. For 5.8 we revised the plan, and settled on the default of a completely non-shared environment. Each thread had its own context, with its own data space. Only explicitly shared variables were accessible between threads. This let most of the code run at full speed, only paying the synchronization cost when a shared variable was accessed.
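The data model is easier to see in code. This is a Python analogy, not the perl implementation: Python threads share everything by default, so the sketch uses threading.local to imitate a per-thread data space and a small SharedCounter class (my own name) to stand in for an explicitly shared variable. It illustrates where the locking goes, not how fast it runs.

```python
import threading

# Per-thread context: each thread sees its own independent attributes.
context = threading.local()


class SharedCounter:
    """The one explicitly shared variable; the only place a lock is taken."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def add(self, amount):
        with self._lock:
            self.value += amount


def worker(items, shared):
    context.subtotal = 0          # private to this thread, no locking needed
    for item in items:
        context.subtotal += item  # the bulk of the work touches only private data
    shared.add(context.subtotal)  # pay the synchronization cost once


def run(chunks):
    shared = SharedCounter()
    threads = [threading.Thread(target=worker, args=(chunk, shared)) for chunk in chunks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared.value


print(run([[1, 2, 3], [4, 5], [6]]))
```

Each worker touches the lock exactly once, no matter how many items it processes, which is the property the 5.8 design was after.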
I am a firm believer in the shared nothing architecture. Multithreading is hard, with the standard way to solve concurrency problems being to add mutex protection around the non-thread-safe code. Those mutexes allow only one thread to access a particular resource at one time. So imagine your 32-core machine, running an application with 32 threads that uses a mutex to control access to a vital part of the application. All threads need to continuously acquire this mutex, thus creating a bottleneck that allows only a few threads to execute. So your 32 threads, on your 32 core machine, are mostly sitting around waiting for their turn.
With a shared nothing architecture, you can avoid this. If your thread never has to acquire a mutex, it can run at full speed on its assigned CPU. A recent visit to IBM Almaden again underscored the importance of this to me. They showed us a Blue Gene, an awesome beast with 2048 CPUs per rack. Each CPU is a little computer on a chip, with ethernet networking, local interconnects and 512 MB of RAM. They have two of these racks together, and to make it even cooler, you can put 64 of these together for a total of 65536 CPUs. All of these CPUs share no memory, so to implement software on them, you have to use a shared nothing architecture.
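Here is a minimal Python sketch of that style, with the caveat that it only imitates the model: the "nodes" are ordinary threads on one machine, and the node function, the round-robin partitioning and the sum-of-squares workload are all made up for illustration. On a machine like Blue Gene the nodes would be separate processors exchanging messages over the interconnect. The point is that each node owns its partition outright and the only thing that crosses between nodes is a message.

```python
import queue
import threading


def node(node_id, partition, outbox):
    """A 'node' works only on data it owns and reports back with one message."""
    local_sum = sum(x * x for x in partition)
    outbox.put((node_id, local_sum))


def sum_of_squares(data, node_count):
    outbox = queue.Queue()
    # Give every node a private copy of its slice; nothing is shared afterwards.
    partitions = [list(data[i::node_count]) for i in range(node_count)]
    threads = [
        threading.Thread(target=node, args=(i, part, outbox))
        for i, part in enumerate(partitions)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # The coordinator merges one message per node.
    results = dict(outbox.get() for _ in threads)
    return sum(results.values())


print(sum_of_squares(list(range(10)), node_count=4))
```

Nothing in node() ever acquires a mutex of its own, so adding nodes adds throughput until the merge step becomes the limit.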
The important challenge is not to allow star developers to write multithreaded code; it is to allow the large army of enterprise developers out there to scale their applications to large numbers of cores. Perhaps tools like PeakStream (purchased by Google) or its remaining competitor, RapidMind, can help, but I remain doubtful. I spent a summer reading a printout of all 16,000 lines of perl regular expression code, with a marker pen to find problematic spots. I am unconvinced a tool could have done that for me.
Radar friend Jeff Jonas made me think about this when he posted about performance on his blog. I believe this is the direction parallel computing has to go.
Our small database footprint project had the goal of externalizing as much computation off the database engine - pushing this processing into share nothing parallelizable pipelines. So we also did such things as externalized serialization (no more using the database engine to dole out unique record ID’s) and eliminated virtually all stored procedure and triggers - placed more computational weight on these "n" wide pipeline processes instead.
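Jeff doesn't say how they hand out record IDs without the database, so the following is only one common way to do it, sketched in Python under my own assumptions: give each of the n pipelines its own interleaved sequence, so no two pipelines can ever collide and none of them has to ask a central counter. The class and parameter names are hypothetical.

```python
import itertools


class PipelineIdGenerator:
    """Hands out IDs that are unique across pipelines without a central counter."""

    def __init__(self, pipeline_id: int, pipeline_count: int):
        if not 0 <= pipeline_id < pipeline_count:
            raise ValueError("pipeline_id must be in range(pipeline_count)")
        # Pipeline k of n issues k, k + n, k + 2n, ... so sequences never overlap.
        self._ids = itertools.count(start=pipeline_id, step=pipeline_count)

    def next_id(self) -> int:
        return next(self._ids)


# Three hypothetical pipelines, each drawing IDs independently.
generators = [PipelineIdGenerator(k, 3) for k in range(3)]
for k, generator in enumerate(generators):
    print(k, [generator.next_id() for _ in range(4)])
```

The trade-off is that IDs no longer reflect global insertion order, which is usually an acceptable price for taking the database engine out of the hot path.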
tags: nitty gritty tech, worries
Saying only new (-ish) things about the iPhone
by Marc Hedlund | comments: 45
We've all read about how cool flicking is and how lame EDGE is. Enough on that. Below are some things I haven't already read a thousand times about the iPhone. Full disclosure: I've owned one Newton, two Blackberrys, three Palms, and three Treos (geeeeeeek!), and I'm switching from a Treo 650 to an 8GB iPhone.
- The iPhone kicks the Treo's butt in a lot of ways, but between the two devices, the most useful win on the iPhone, believe it or not, is threads. PalmOS is single-threaded, so if you set your mail application to check mail every 15 minutes, at some point every day you'll take your phone out of your pocket and either have to wait for mail download to complete before you can make a call, or cancel the download (which takes forever). As a result I always left my Treo on "manual" download, which means you have to press a button to initiate getting mail, then wait for it to happen. The iPhone either hides the download process a lot better, or is actually doing two network operations at the same time -- I suspect the former, but I don't care. I don't have to worry about it. I set the iPhone to automatically download mail every 15 minutes and haven't caught it in the act once. The mail is just there when I turn on the phone. Such a huge improvement.
- The iPhone keyboard blows. Let's not mince words, here: text input was better on a Newton. The keys are way too close together, full stop. The auto-suggestion works okay if you're typing dictionary words (and not, say, street names, as in the Google Maps app) and if you're in a context where typing space to accept is useful (in URLs, for instance, there is no space bar). The amazing thing to watch is everyone blogging about how they "need to get better at typing" -- that's the drugs talking. The iPhone needs to get better at typing, not you. Jason Santa Maria nailed this one in January in his post, A Plea for the Fat-Fingered, in which he argued that the keyboard should be available in landscape orientation, not just portrait. Fortunately Apple took his advice -- see picture here -- but unfortunately, landscape keyboarding only works in Safari. The first iPhone software update really, really, really needs to enable landscape keyboarding for all apps. That one, over-the-wire, software-only update would by itself vastly improve the experience overall.
- Taking those two notes together, the Treo still wins over the iPhone in placing calls (you know, the "phone" in "iPhone"). The Treo phone application lets you just start typing the first or last name of the person you're calling, and it shows you a list of matches to what you've typed. So fast, so easy. It doesn't matter if I know you as "Mr. Jones" or "Tom" -- either way, I can get your number and make a call very quickly. The iPhone's flicking interface is great and all, but scrolling through over a hundred contacts in the same letter of the alphabet is tiring.
- This weekend's sales numbers -- bigger than Razr's first month, maybe 700,000 units sold, etc. -- are very impressive. But, I think it will be getting a lot bigger from here (assuming Apple can make the things fast enough). The reason? What our dorky industry likes to call "viral growth." In the iPhone's case, media carpet-bombing of the topic means that everyone knows what the thing is, and then any experience of the phone is such a rich, immediate, visual attraction that five minutes playing with one will sell you on it. I've seen stock brokers demoing to baristas, bike messengers showing it off to grandmas. Everyone wants to see, and when they see, everyone wants to buy. I demo it by giving it to people, off, and letting them figure out what to do with no help from me at all. That they can, and that the experience is so good, can only mean a lot more good news coming for Apple.
- Apple may be letting AT&T be AT&T, but I've gotta say, they sure seem on it to me. I transferred a phone number over from Sprint (apparently the biggest loser in this weekend's deluge), whose web site was down all day Monday, and which had just converted everyone's account numbers (interesting timing!), so my number transfer failed. I called AT&T, and while the whole process involved more than an hour on the phone, they called Sprint for me, bitched them out for me while I was on the line, basically hung right up on them once the transfer was complete (which is what I would have done, so thanks), and then called me back later to say, "Oh, we found out that your data plan wasn't set up correctly, so we fixed it for you and you'll see a refund tomorrow." (Attention AT&T: Colleen Boyd in Halifax Center deserves a raise.) When I finally got off the phone, I found two messages waiting from someone else at AT&T who had called me as soon as the transfer failed. Um, are you guys really a phone company? You seem, uh, competent.
- Calling it the best iPod ever is an insult to the experience. I would never want just an iPod again -- the video playback and the browsing interface both feel entirely new, not like a revision of the iPod. Likewise, the iPhone's Safari is unbelievably good. I can't really imagine wanting to use anyone else's browser again. I'd go so far as to call it the first real web browser on a phone.
- I come back again to third-party apps (not a new topic). Apple has released some helpful materials for web developers working on iPhone-specific sites, and all the promised capabilities are there. But, this UI deserves more of a workout than Stocks and Weather. Were it not for the quality of the browser, I'd already feel like I'd hit all four walls of the iPhone experience. The development community around the iPhone is already amazing. Let them have an icon on the home screen for their work and let them at the full capabilities of the device, and I don't see any limits on the utility they'll create for you. Let them go to work.
It's got some warts and some rough edges, but I think it's only fair to call this the best 1.0 evar. Congratulations to everyone at Apple, and thanks. (Ratatouille was great, too!)
Update: After a week, suddenly everyone linked to this post. In the time since, Artur has pointed out to me that the standard Treo mail app does some weird CPU scheduling to simulate threading -- SnapperMail, the Treo mail client I used, doesn't. Visual voicemail is now a serious contender for my favorite novel feature on the iPhone (beating out threads); again, another feature that I can't really imagine leaving behind now that I have it. The 3rd party app scene is cheering up, thanks primarily to the heroic efforts of Joe Hewitt to make a great app framework. The keyboard still blows.
tags: nitty gritty tech
Google's Acquisition of Peakstream
by Tim O'Reilly | @timoreilly | comments: 3
Google's acquisition of Peakstream is obviously relevant to the theme we've been sounding here recently about the importance of concurrent programming. Nat wrote about this in backchannel email after my previous blog post, but I thought his comment was appropriate to share more widely.
Nat picked up on one of Adam Beberg's comments about Folding@home's use of GPUs from the IP thread I pointed to earlier. I didn't reproduce this quote in the original post, but it was in the full email thread I'd sent around internally:
We use BrookGPU (also out of Stanford) to fold on the GPU's and manage the insane hardware under the hood, and actually manage to be computationally limited not bandwidth limited. It's a very simple way of hiding all the complexity and nothing a run of the mill programmer can't use - our GPU code is by a chemist. Yes he knows what's under the hood, but it's all hidden by Brook so it's mostly trial and error until you hit the hardware limits.
Nat then pointed out:
Google just bought a company that makes programming tools for general purpose GPU programming. GPUs are a new source of parallel computation. They used to be used to just handle graphics tasks but now companies like nVidia are making general purpose GPUs for arbitrary computation.
tags: nitty gritty tech
Google's Folding@Home on the "Multi-Core Crisis"
by Tim O'Reilly | @timoreilly | comments: 18
There's been a fascinating exchange about programming for multi-core computers on Dave Farber's Interesting People List. It started when Andrew Donoho wrote in a thread about the need for US colleges to retool their programming classes:
It is very clear that the software industry is going to hit a programming wall some time in the next 6 years (4 Moore's Law generations). Microprocessors are going to progress from 4 to 64 processor cores. Most algorithms, other than the embarrassingly parallel ones, will hit this wall. This wall is not going to be surmounted by 'code monkeys'. To truly exploit this coming surfeit of processors, we will need both architectural and software invention. This problem will need 'classically trained' disciplined thinkers. While I was not trained as a computer scientist (experimental physics was my field), the problem is going to need folks of the caliber that originally defined our core architectures. (The next Johnny Von Neumann, I'm looking for you.) In other words, we are entering an era of unprecedented architectural research and invention opportunity.
Adam Beberg replied:
Sorry Dave, I just can't let the multi-core "crisis" comment stand without a reply. That false meme needs to die.
I work on Folding@home, which is a 250K-node distributed system, where each node is nowadays a dual-core at minimum, an 8-core Cell on avg, or at maximum an on chip cluster of SMP elements with 1000's of threads in a NUMA setup of vector processors in the new GPUs. Nominally we have a 1M+core system operating as a coordinated whole.
The idea that 64 cores is a problem is patently absurd. Complex on many levels? Sure. CS not available by the 80's? No - the 64-node Caltech Cosmic Cube was in 1981. Wall in sight for multi-core? Nope maybe at 1B cores, I'll keep you posted. GPU compilers ready for mainstream? Give them another year. Courses to learn this stuff? Already out there.
Andrew then replied:
With all due respect to your excellent work on Folding@Home, your algorithms fall into the "embarrassingly parallel" category.
Adam didn't agree at all:
While on the surface Folding@home appears in the job queue category of SETI/BOINC projects, it's not. Under the hood it's a rather crazy feedback process with Markov models and each new work unit depending on the previous ones. If anything this is where the fun CS problems are, and where the active research is. 10 years ago no one even thought you could distribute these problems and had to use supercomputers.
The argument continued into all kinds of fabulous detail about the kinds of problems that are or are not amenable to parallel computing. In summing up, Andrew wrote:
the industrial world, which I am from, is hitting the wall of many core programming. Many of our economically important algorithms (value > $1 Billion) do not scale above 16-20 cores.
Adam replied:
Yes we will solve them, but we have to change our algorithms from what most people are used to and this will take time. The same methods we use for distributed folding also seem to translate to a wide variety of other domains, so I see no hard walls on the horizon. I really do hope we find a wall soon so I can climb it and I'm crossing my fingers for a surprise at a billion.
The current generation of programmers is learning in a world of multi-core, and from what I have seen they have zero if any trouble dealing with it. Once they get some experience, we'll wonder why this was ever considered hard.
tags: nitty gritty tech
Pycon Wrapup on onlamp.com
by Tim O'Reilly | @timoreilly | comments: 0

There seemed to be quite a lot of interest in my previous post, Pycon a "hiring fest", so readers may be interested in the ONlamp.com Pycon Wrapup. Jeremy Jones gives an overview of some of the Python projects to watch, and a summary of the new features in Python 3000, with Guido's top five migration suggestions, based on an interview with Guido van Rossum, the father of Python.
"Python 3.0a1 is scheduled to be released in June 2007. 2.6a1 is scheduled for December 2007 with 2.6 final coming April 2008. 3.0 final should arrive June 2008. I thought it was interesting that 3.0 alpha will be available prior to 2.6 alpha. It is probably helpful to have a working implementation of 3.0 before 2.6 so they can put in better warnings about things that are potentially broken for a user. It looks like "real soon now" is actually going to happen real soon now."
tags: nitty gritty tech
ETech Preview: Super Ninja Privacy Techniques for Web App Developers
by Marc Hedlund | comments: 1
I'll be speaking at ETech this year on "Super Ninja Privacy Techniques for Web App Developers," about the various techniques we use at Wesabe to keep people's data private. Since we deal with very sensitive information (your money, how you spend it, and how you can get more out of it), we've tried to come up with as many new ideas for protecting users in the Web 2.0 world as we can.
My co-worker Brad Greenlee created a simple and very powerful privacy architecture, which we call the "Privacy Wall," for separating sensitive data from personally-identifiable information like your email address, username, and public community posts on our site. Brad wrote up a description of the Privacy Wall and how users can understand it on our blog, and then a longer, more technical explanation of it for other web application designers on his own blog. I'd encourage anyone working on or interested in privacy-sensitive web apps to check it out.
We haven't done anything to secure "intellectual property" rights over this idea. Instead, we're disclosing it in detail, and we're planning to release an open source Rails plugin to allow other developers to use the same approach very easily. We believe that Wesabe, and other Web 2.0/"harnessing collective intelligence" applications, will be more secure and more useful to their users if we have a very public discussion about how users' privacy can be protected when their data lives on servers. With all the news about similar applications taking hold, I think the topic is timely and important.
This is one of six major techniques I'll be talking about in the ETech presentation. If you're interested in this topic, I hope to see you there.
tags: nitty gritty tech
Spamonomics 101
by Allison Randal | comments: 17
The biggest thing I've wondered about spam is: Why do the spammers even bother? They spend an enormous amount of effort, time, and (I expect) money to deliver huge quantities of mail to my inbox, which I then spend an enormous amount of effort, time, and (for some people) money to delete unseen and unread. How is this profitable for the spammers?
Last week I talked to Ken Simpson and Stas Bekman of MailChannels, a spam-fighting solution provider. The answer to my question is that the business of spamming is profitable, sometimes enormously so, but it's a volume business and the profit margin on that volume is quite small. Spammers are the door-to-door salesmen who knock on every door in the neighborhood to get one sale. Except the neighborhood is the entire planet, and the number of doors they can knock on simultaneously is limited only by the cost of computing power. That cost is the key point in the economics of spam: spammers have to get out a high enough volume of spam that the small sliver of profit is greater than the cost of sending it.
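To make the arithmetic concrete, here is a back-of-the-envelope sketch of that break-even point. The model (a fixed setup cost plus a per-message sending cost) and every number in the usage note are my own illustrative assumptions, not figures from MailChannels.

```python
import math


def campaign_profit(messages_sent, response_rate, revenue_per_response,
                    cost_per_message, fixed_cost=0.0):
    """Profit of a campaign: per-message margin times volume, minus setup cost."""
    margin_per_message = response_rate * revenue_per_response - cost_per_message
    return messages_sent * margin_per_message - fixed_cost


def break_even_volume(response_rate, revenue_per_response,
                      cost_per_message, fixed_cost):
    """Number of messages needed before the campaign stops losing money.

    Returns infinity when each message costs more to send than it earns,
    because no volume can rescue a negative per-message margin.
    """
    margin_per_message = response_rate * revenue_per_response - cost_per_message
    if margin_per_message <= 0:
        return math.inf
    return fixed_cost / margin_per_message
```

With made-up inputs of one response per 100,000 messages, $50 per response, a hundredth of a cent to send each message, and $2,000 in setup costs, `break_even_volume(0.00001, 50.0, 0.0001, 2000.0)` comes out at roughly five million messages. Raise the per-message cost past the expected revenue per message and the function returns infinity, which is the lever that slowing spammers down tries to pull.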
These economics drive the patterns of spam we receive. Traditionally porn advertisements have the highest click-through rate, followed by pharmaceutical advertisements, though penny stock spam is gaining popularity. And the spam messages that aren't advertisements, scams, or virus attacks, but just random strings of text? Ken Simpson comments, "Those messages are sent by spammers to poison the spam filters. When someone receives a message full of gibberish and reports it as spam, the spam filters tune themselves to recognize gibberish as spam—which reduces their overall accuracy."
MailChannels has an interesting approach to the problem of spam. They use email traffic-shaping to identify the high-volume traffic patterns of spammers and then slow suspicious packets from those servers down to a crawl. In the short term this affects the spam influx only on a local level: many spambots simply drop the connection to a slow mail server and move on to higher-volume, and so more profitable, targets. (Like an animal taking a big bite out of a tasty-looking thistle, and then deciding it isn't worth the effort.) In the long term, though, if enough mail servers employ similar tactics, the strategy has the potential to gradually disrupt the economics of spamming, making spam less profitable, or perhaps even unprofitable.
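The general idea of slowing down high-volume senders can be sketched in a few lines. This is not MailChannels' algorithm, which the post doesn't describe in detail. It is a simple sliding-window counter, and the class name, thresholds, and delays are all hypothetical.

```python
from collections import defaultdict, deque


class SenderThrottle:
    """Stall senders whose recent connection count exceeds a free allowance."""

    def __init__(self, window_seconds=60.0, free_messages=10,
                 delay_step=2.0, max_delay=300.0):
        self.window_seconds = window_seconds
        self.free_messages = free_messages
        self.delay_step = delay_step
        self.max_delay = max_delay
        self._seen = defaultdict(deque)  # sender IP -> recent connection times

    def delay_for(self, sender_ip, now):
        """Record a connection at time `now` and return seconds to stall it."""
        times = self._seen[sender_ip]
        times.append(now)
        # Forget connections that have fallen out of the sliding window.
        while now - times[0] > self.window_seconds:
            times.popleft()
        excess = len(times) - self.free_messages
        if excess <= 0:
            return 0.0
        # Each connection beyond the allowance adds more delay, up to a cap.
        return min(self.max_delay, excess * self.delay_step)
```

A sender with an ordinary trickle of mail never leaves the free allowance and sees no delay, while a burst from one address gets slower with every connection until the sender gives up or the window empties.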
tags: nitty gritty tech
Java SE 6 Released
by Allison Randal | comments: 4
Today Sun announced the release of Java SE version 6. Last week I spoke with Danny Coward (Java SE Platform Lead) about the features and significance of the new release.
From my perspective, the most interesting feature is the built-in support for what they're calling "script engines", that is, pluggable alternate syntaxes that run in an interpreted environment on top of the JVM. They're shipping one script engine with Java SE: JavaScript. The others (Ruby, Python, etc.) will be available for download. Sun promotes script engines as a feature for rapid development, and for parts of a system that change frequently, while still encouraging Java syntax for most code.
tags: nitty gritty tech
Completely human free shopping
by Nikolaj Nyholm | comments: 4
The "Get & Go Express" convenience stores aren't stores so much as giant vending machines - only done smarter with new technology. All shopping takes place on the outer surface of the box-like stores (roughly the footprint of two 40ft shipping containers) from an array of smaller vending machines. The store takes up less space because it doesn't have to make room for people between the shelves. It also cuts out salaries, which, according to the article above, account for 40% of the cost of regular convenience stores.
Was this just inevitable or is it physical retail reaching out for the same economies of scale that online retail gets naturally?
tags: nitty gritty tech
Question about GMail referers
by Marc Hedlund | comments: 5
When I get referers from GMail messages on my new blog, they often contain a query string parameter labeled 'cat' with a cleartext, meaningful value in it. I've often been able to determine, from the 'cat' value, exactly who is talking about my site in email, and in one case, exactly what they thought of what we're doing! (Fortunately, the news was good.) In other cases, the information has been more general, but still meaningful (for instance, the name of a mailing list to which I sent a launch announcement).
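For anyone who wants to check their own logs, pulling the parameter out of a referer takes only the standard library. This is a minimal sketch: the helper name `referer_param` is hypothetical, and the URL in the usage note is invented for illustration rather than copied from a real log.

```python
from urllib.parse import parse_qs, urlsplit


def referer_param(referer, name="cat"):
    """Return the first value of a query-string parameter in a referer URL.

    Returns None when the parameter is absent.
    """
    query = urlsplit(referer).query
    values = parse_qs(query).get(name)
    return values[0] if values else None
```

Calling `referer_param("http://mail.google.com/mail/?search=cat&cat=Launch+announcements")` on that invented URL returns the decoded string `Launch announcements`, which is the sort of cleartext value I've been seeing.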
I don't use GMail, so I'm not sure exactly what 'cat' is. Labels? Search terms? Any ideas from the GMail crowd? I also don't understand, at all, why I would be getting this information. I should not be seeing any information people are using to organize or search for their mail. (Yahoo Mail and Hotmail both have meaningless, to me, URLs.) Anyone know why this would happen? The Google GMail privacy faq says:
Google also takes several steps to guard the confidentiality of users' information by offering a number of industry-leading protections. Among other things, Gmail users benefit from: [...] Minimized "referrer" header information. When you click on links in messages, the web browser that loads contains a referrer header. When you click on links in Gmail, Google takes steps to eliminate this referrer header, preventing others from knowing that you clicked on a link from an email.
Hmmm....
Update: I can't believe I missed the opportunity to title this post, "The cat's out of the bag."
tags: nitty gritty tech
Treemap on Rails
by Tim O'Reilly | @timoreilly | comments: 7
Andy Bruno, who developed the treemap code that we use for our Bookscan data visualizations, has created a new Rails implementation called acts_as_treemap, according to a report by Rob Orsini, author of The Rails Cookbook, who blogs on Rails-related topics at tupleshop.com.
If you're a fan of data visualization, as I am, you'll be excited both about getting your hands on Andy's Ruby treemap code and about Rob's clear description of how it works. And heck, Andy even applied it to an example data set that is fascinating in and of itself: SourceForge projects.
Here's the resulting treemap view:
This visualization uses the SourceForge project name for labeling each region of the treemap; the size of each region is based on the number of downloads for the current month, and the color of each region conveys information about the rate of change in the number of downloads for each project. While the color scheme is a bit different from the one we use for the book visualizations, green means up, red means down, and the paler colors are in-between. There's a bit of overlaid information about categories, but the treemap doesn't really organize the downloads by category like we do with book sales. But it's still pretty interesting to pick out the biggest downloads.
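The two encodings described here (area for downloads, color for rate of change) can be sketched without any library. This is not Andy's acts_as_treemap code, and real treemaps typically use a fancier layout. It is a single-level "slice" layout with a made-up color ramp, and the function names are hypothetical.

```python
def slice_layout(items, x, y, width, height):
    """Split a rectangle into strips whose areas are proportional to size.

    `items` is a list of (name, size) pairs. Returns a list of
    (name, x, y, width, height) tuples, cut along the longer side.
    """
    total = sum(size for _, size in items)
    if total <= 0:
        return []
    rects = []
    offset = 0.0
    for name, size in items:
        share = size / total
        if width >= height:
            rects.append((name, x + offset, y, width * share, height))
            offset += width * share
        else:
            rects.append((name, x, y + offset, width, height * share))
            offset += height * share
    return rects


def change_color(rate):
    """Map a rate of change in [-1, 1] to an (R, G, B) tuple.

    Positive rates shade toward green, negative toward red, and values
    near zero stay pale, mirroring the scheme described above.
    """
    intensity = min(abs(rate), 1.0)
    pale = int(255 * (1.0 - intensity))
    if rate >= 0:
        return (pale, 255, pale)
    return (255, pale, pale)
```

Feeding it two projects with 300 and 100 downloads and a 100-by-50 canvas gives the first project three quarters of the width, and a project whose downloads doubled would be drawn in full green.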
And what's more, if you like this kind of visualization, Andy and Rob have now made it easy for you to apply it to your own data sets.
tags: nitty gritty tech
Metaprogramming in Ruby and Java
by Nat Torkington | @gnat | comments: 4
Metaprogramming is modifying your programming language to make it fit your problem domain. Lisp started it, Perl's source filters did something along those lines, but Ruby's got it in spades (caution: Why The Lucky Stiff content behind that last link). In the last few weeks I've been pounding through Ruby like Rush Limbaugh through an Everest of Viagra, and I wish I'd seen Glenn Vanderburg's session (PDF!) at OSCON last year about metaprogramming in Ruby. If you're a Java programmer, you should check out Howard Lewis Ship's session at OSCON on metaprogramming in Java.
Howard uses HiveMind's inversion of control container (Howard's the creator of HiveMind) and the Javassist bytecode library. Javassist is, frankly, terrifyingly powerful. I don't know what to be more amazed by: that Javassist exists at all, or that it's a subproject of the relentlessly Enterprisey (and therefore sober and reliable and not at all meant to be terrifying) JBoss. That's like the Betty Ford Clinic funding research into more efficient methods of producing crystal meth. Javassist reminds me of the alt.folklore.computers warstories about programmers who know too much about their execution environment.
If metaprogramming is too much like performing open brain surgery on yourself, you might be more comfortable with Howard's sane contribution to OSCON: Building Java Web Applications with Tapestry (Howard also created Tapestry). No meth and no autocortical cauterization required. Viagra optional.
tags: nitty gritty tech
Vacuum Lines, Belt Tensioners, and Electric Motors
by Marc Hedlund | comments: 4
My friend Adam sent me this interesting note today:
I remember my dad talking about how the windshield wipers in his first car were driven by vacuum lines. The speed of the wipers varied with the load on the engine -- the wipers would come to a complete stop if you were climbing a steep enough hill. Apparently this is how many items, like automatic antennas, were powered in cars through the 50's. Eventually alternators were introduced and the electric systems in cars became powerful and reliable enough that electric motors replaced vacuum as the primary power for accessories.
Looking inside a Toyota Prius, one might notice a similar change is taking place now that the electrical system has undergone another quantum leap.
tags: nitty gritty tech
Intel OS X Boxes to Dual Boot Windows XP
by Nat Torkington | @gnat | comments: 1
Apple yesterday announced "Boot Camp", a system that lets you dual boot Windows XP and OS X on the Intel-based Macs. It's software to make the partitioning and installation easy. "Dual booting" is, for those of you who haven't struggled with your own Linux boxes, when you install both operating systems on a single hard disk and decide each time you restart the machine which OS to run. It's not the picture-in-picture of emulation (where I could run OS X but have a window that contained a Windows XP desktop and apps), but it's still better than nothing. Aah, happy Nat.
tags: nitty gritty tech