Entries tagged with “infrastructure” from O'Reilly Radar
Four short links: 22 October 2009
Cognitive Surplus, Scaling, Chinese Blogs, CS Education for Growth
by Nat Torkington | @gnat | comments: 1
- Eight Billion Minutes Spent on Facebook Daily -- you weren't using that cognitive surplus, were you?
- How We Made Github Fast -- high-level summary is that the new "fast, good, cheap--pick any two" is "fast, new, easy--pick any two". (via Simon Willison)
- Isaac Mao, China, 40M Blogs and Counting -- Today, there are 40 million bloggers in China and around 200 million blogs, according to Mao. Some blogs survive only a few days before being shut down by authorities. More than 80% of people in China don’t know that the internet is censored in their country. When riots broke out in Xinjiang province this year, the authorities shut down internet access for the whole region. No one could get online.
- Congress Endorses CS Education as Driver of Economic Growth -- compare to Economist's Optimism that tech firms will help kick-start economic recovery is overdone.
tags: blogging, china, economy, education, facebook, infrastructure, scale
John Adams on Fixing Twitter: Improving the Performance and Scalability of the World's Most Popular Micro-blogging Site
by Jesse Robbins | @jesserobbins | comments: 2
Twitter is suffering outages today as they fend off a Denial of Service attack, and so I thought it would be helpful to post John Adams’ exceptional Velocity session about Operations at Twitter.
Good luck today John & team… I know it’s going to be a long day!
Update: Apparently Facebook & Livejournal have had similar attacks today. Rich Miller from Data Center Knowledge reminds us that this is just the latest in a series of major attacks.
tags: attacks, critical infrastructure, infrastructure, operations, performance, security, twitter, velocity, velocity09, velocityconf, video, web2.0, webops
Four short links: 9 July 2009
by Nat Torkington | @gnat | comments: 1
- Ten Rules That Govern Groups -- valuable lessons for all who would create or use social software, each backed up with pointers to the social science study about that lesson. Groups breed competition: While co-operation within group members is generally not so much of a problem, co-operation between groups can be hellish. People may be individually co-operative, but once put in a 'them-and-us' situation, rapidly become remarkably adversarial. (via Mind Hacks)
- Yahoo! TrafficServer Proposal -- Yahoo! want to open source their TrafficServer product, an HTTP/1.1 caching proxy server. Alpha geeks who worked with it are excited at the prospect. It has a plugin architecture that means it can cache NNTP, RTSP, and other non-HTTP protocols.
- App Engine Conclusions -- I've reluctantly concluded that I don't like it. I want to like it, since it's a great poster child for Python. And there are some bright spots, like the dirt-simple integration with google accounts. But it's so very very primitive in so many ways. Not just the missing features, or the "you can use any web framework you like, as long as it's django" attitude, but primarily a lot of the existing API is just so very primitive.
- Microsoft Hohm -- Sign up with Hohm and we'll provide you with a home energy report and energy-saving recommendations tailored to your home. Wesabe for power at the moment, with interesting possibilities ahead should Microsoft partner with smartmetering utility companies the way Google Powermeter does. This is notable because this is a web app launched by Microsoft, with no connection to Windows or other Microsoft properties beyond requiring a "Live ID" to login. For commentary, see Microsoft Hohm Gets Green Light for Launch and PC Mag. (via Freaklabs)
tags: energy, google app engine, infrastructure, microsoft, opensource, powermeter, psychology, scalability, social software, yahoo
Announcing: Spike Night at Velocity
by Scott Ruthfield | @scottru | comments: 5
Guest blogger Scott Ruthfield is a Program Committee member of the O'Reilly Velocity: Web Performance & Operations Conference.
- Chris Bissell, Chief Software Architect at MySpace, and members of the MySpace team will demonstrate a massive, real increase in traffic, and will manage it on-stage. MySpace already deals with tens of thousands of hits each second - we can't throw enough traffic at them to cause any harm - so they'll cause their own harm and then show how they work through it.
- Ryan Nelson, Operations Director for MLB Advanced Media and MLB.com, will walk us through a combination of war stories and live traffic management to show what happens when millions of baseball fans all want to see what's happened after the commercial break at the exact same time. Between their very popular desktop apps and their newly-announced iPhone game streaming, the MLB is a true leader in technology innovation with a rabid fan base that goes well beyond the Web 2.0 echo chamber.
tags: cloud, infrastructure, operations, performance, scalability, scale, spikenight, velocity, velocity09, velocityconf, web2.0, webops
Velocity 2009 - Big Ideas (early registration deadline)
by Jesse Robbins | @jesserobbins | comments: 7
My favorite interview question to ask candidates is: "What happens when you type www.(amazon|google|yahoo).com in your browser and press return?"
While the actual process of serving and rendering a page takes seconds to complete, describing it in real detail can take an hour. A good answer spans every part of the Internet from the client browser & operating system, DNS, through the network, to load balancers, servers, services, storage, down to the operating system & hardware, and all the way back again to the browser. It requires an understanding of TCP/IP, HTTP, & SSL deep enough to describe how connections are managed, how load-balancers work, and how certificates are exchanged and validated... and that's just the first request!
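To make the layers in that answer concrete, here is a minimal Python sketch of the first request only: DNS lookup, TCP connect, TLS handshake with certificate validation, then one HTTP request. The function names `build_request` and `fetch_status_line` are invented for this illustration, and the sketch ignores load balancers, redirects, caching, and rendering entirely.

```python
import socket
import ssl
from urllib.parse import urlsplit


def build_request(host: str, path: str = "/") -> bytes:
    """Return a minimal HTTP/1.1 GET request for the given host and path."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_status_line(url: str, timeout: float = 5.0) -> str:
    """Walk the visible steps of one HTTPS request and return the status line."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 443
    path = parts.path or "/"

    # 1. DNS: resolve the hostname to one or more socket addresses.
    addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    ip_address = addresses[0][4][0]

    # 2. TCP: open a connection to the first resolved address.
    with socket.create_connection((ip_address, port), timeout=timeout) as tcp:
        # 3. TLS: negotiate a session and validate the server certificate
        #    against the hostname that was typed.
        context = ssl.create_default_context()
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            # 4. HTTP: send the request and read until the server closes.
            tls.sendall(build_request(host, path))
            response = b""
            while chunk := tls.recv(4096):
                response += chunk

    # The first line of the response carries the protocol version and status.
    return response.split(b"\r\n", 1)[0].decode("latin-1")
```

Calling `fetch_status_line("https://www.example.com/")` on a machine with network access exercises all four steps; everything the interview answer covers beyond them (load balancing, backend services, storage, rendering) sits outside this sketch.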
Web Performance & Operations is an emerging discipline which requires incredible breadth, focusing less on specific technologies and more on how the entire system works together. While people often specialize on particular components, great engineers always think of that component in relation to the whole. The best engineers are able to fly to the 50,000 foot view and see the entire system in motion and then zoom in to microscopic levels and examine the tiny movements of an individual part.
John Allspaw recently described this interconnectedness on his blog:
With websites, the introduction of change (for example, a bad database query) can affect (in a bad way) the entire system, not just the component(s) that saw the change. Adding handfuls of milliseconds to a query that’s made often, and you’re now holding page requests up longer. The same thing applies to optimizations as well. Break that [bad] query into two small fast ones, and watch how usage can change all over the system pretty quickly. Databases respond a bit faster, pages get built quicker, which means users click on more links, etc. This second-order effect of optimization is probably pretty familiar to those of us running sites of decent scale.
Working with these systems requires an understanding not only of the way technology interacts, but the way that people do as well. The structure, operation, and development of a website mirrors the organization that creates it, which is why so many people in WebOps focus on understanding and improving management culture & process.
Organizing a conference like Velocity is a wonderful challenge because it requires the same sort of thinking. We focus on the big concepts that everyone needs to know and then go deep into the technologies that change our understanding of the system. We find ways to share the unique experience that can only be gained by operating at scale. We make it safe to share as much of the "Secret Sauce" as we can.
Please join us at Velocity this year; we have an amazing lineup of speakers & participants. Early registration ends on Monday, May 11th at 11:59 PM Pacific. (Radar readers can use "vel09cmb" for an additional 15% discount.)
tags: cloud, data, infrastructure, operations, scale, velocity, velocity09, velocityconf, web, web2.0
Four short links: 13 Apr 2009
by Nat Torkington | @gnat | comments: 1
Worms, sorting, languages, and infrastructure:
- Twitter XSS Attacks (Lynne Pope) -- several incarnations of a worm spread quickly across Twitter this weekend. Twitter profiles are generated by themes, whose parameters users can change. The user-supplied value for the colour was used directly in the CSS color field without filtering, which the original worm strain used to end the CSS and begin Javascript to put the worm into the profile of any Twitter user who viewed the infected profile. Infected users were made to tweet about the worm, with links that would infect anyone who viewed. The worm spread quickly through RTing one of the worm's messages, which claimed to link to instructions on fighting the worm. Later variants use background-color and background parameters. Initial variations downloaded Javascript from mikeyylolz.uuuq.com, since closed down by its hosting company. Later variants download the code from stalkdaily.com, the site that the initial variation spammed about. I wonder whether the 17-year old author of the variants will be able to pay his inevitable legal bills through Google click dollars? (also interesting: Sophos and bdonews)
- Visualising Sorting -- some beautiful and informative illustrations of how sorting algorithms work. (via @ajtowns)
- Art and Code: Obscure or Beautiful? -- In the presentation called “50 in 50” you can see Guy Steele rap about APL and later in the video about spelling keywords backwards. The song about God wrote in Lisp code is also a part of the presentation. Among the languages mentioned are APL, Cobol, AP/I, Scheme, IPL-V, AED, Madcap, Piet, SNOBOL, ADA, Algol60, Intercal, Logo, Perligata, Shakespeare, Lucid, Occam, HQ9+, MUMBLE, Rake, Perl and of course Lisp. It kicks in at about 3m20s and is rather a post-modern presentation. (via
- Experiences Deploying Large-Scale Infrastructure in Amazon EC2 -- As an aside, I've been very impressed with the reliability of EC2. Like many other people, I didn't know what to expect, but I've been pleasantly surprised. Very rarely does an EC2 instance fail. In fact I haven't yet seen a total failure, only some instances that were marked as 'deteriorated'. When this happens, you usually get a heads-up via email, and you have a few days to migrate your instance, or launch a similar one and terminate the defective one. (via Simon Willison)
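The Twitter worm item above turns on one missing step: a user-supplied colour value went into CSS unfiltered. A minimal Python sketch of the kind of allow-list check that closes that hole follows. The function names, the default colour, and the choice to accept only hex colours are assumptions made for illustration, not a description of what Twitter actually deployed.

```python
import re

# Accept only 3- or 6-digit hex colours, with or without a leading "#".
HEX_COLOUR = re.compile(r"#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})")

# Hypothetical fallback used whenever the supplied value is rejected.
DEFAULT_COLOUR = "#000000"


def safe_css_colour(user_value: str, default: str = DEFAULT_COLOUR) -> str:
    """Return a normalised hex colour, or the default if the value is not one."""
    match = HEX_COLOUR.fullmatch(user_value.strip())
    if match is None:
        return default
    return "#" + match.group(1).lower()


def render_link_style(user_value: str) -> str:
    """Build a CSS rule from a profile colour without trusting the raw input."""
    return f"a {{ color: {safe_css_colour(user_value)}; }}"
```

A value such as `fff;} </style><script ...>` fails the full-match test and falls back to the default, so nothing the user typed can close the CSS block. An allow-list is used here rather than escaping because the set of legitimate values is tiny and easy to enumerate.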
tags: amazon, cloud, infrastructure, security, twitter
AT&T Fiber cuts remind us: Location is a Basket too!
by Jesse Robbins | @jesserobbins | comments: 3
The fiber cuts affecting much of the San Francisco Bay Area this week are similar to the outages in the Middle East last year (radar post), although far more limited in scope and impact. What I said last year still holds true and is repeated below:
From an operations perspective these kinds of outages are nothing new, and underscore why having "many eggs in few baskets" is such a problem. I believe we will see similar incidents when we have the first multi-datacenter failures where multiple providers lose significant parts of their infrastructure in a single geographic area.
Remember: Don't put all your eggs in one basket... and Location is a basket too!
It's also worth mentioning the outages to multiple service providers hosted in a single colocation facility when the FBI seized all the equipment in the facility, the big outage at 365 Main from two years ago, and many others (see: Radar posts & comprehensive coverage at Data Center Knowledge).
To really understand the issue, I recommend Neal Stephenson's incredible (and lengthy) Wired article from 1996 entitled "Mother Earth Mother Board":
[...] It sometimes seems as though every force of nature, every flaw in the human character, and every biological organism on the planet is engaged in a competition to see which can sever the most cables. The Museum of Submarine Telegraphy in Porthcurno, England, has a display of wrecked cables bracketed to a slab of wood. Each is labeled with its cause of failure, some of which sound dramatic, some cryptic, some both: trawler maul, spewed core, intermittent disconnection, strained core, teredo worms, crab's nest, perished core, fish bite, even "spliced by Italians." The teredo worm is like a science fiction creature, a bivalve with a rasp-edged shell that it uses like a buzz saw to cut through wood - or through submarine cables. Cable companies learned the hard way, early on, that it likes to eat gutta-percha, and subsequent cables received a helical wrapping of copper tape to stop it.
[...] There is also the obvious threat of sabotage by a hostile government, but, surprisingly, this almost never happens. When cypherpunk Doug Barnes was researching his Caribbean project, he spent some time looking into this, because it was exactly the kind of threat he was worried about in the case of a data haven. Somewhat to his own surprise and relief, he concluded that it simply wasn't going to happen. "Cutting a submarine cable," Barnes says, "is like starting a nuclear war. It's easy to do, the results are devastating, and as soon as one country does it, all of the others will retaliate."
As the capacity of optical fibers climbs, so does the economic damage caused when the cable is severed. FLAG makes its money by selling capacity to long-distance carriers, who turn around and resell it to end users at rates that are increasingly determined by what the market will bear. If FLAG gets chopped, no calls get through. The carriers' phone calls get routed to FLAG's competitors (other cables or satellites), and FLAG loses the revenue represented by those calls until the cable is repaired. The amount of revenue it loses is a function of how many calls the cable is physically capable of carrying, how close to capacity the cable is running, and what prices the market will bear for calls on the broken cable segment. In other words, a break between Dubai and Bombay might cost FLAG more in revenue loss than a break between Korea and Japan if calls between Dubai and Bombay cost more.
The rule of thumb for calculating revenue loss works like this: for every penny per minute that the long distance market will bear on a particular route, the loss of revenue, should FLAG be severed on that route, is about $3,000 a minute. So if calls on that route are a dime a minute, the damage is $30,000 a minute, and if calls are a dollar a minute, the damage is almost a third of a million dollars for every minute the cable is down. Upcoming advances in fiber bandwidth may push this figure, for some cables, past the million-dollar-a-minute mark. [Link]
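Stephenson's rule of thumb is easy to turn into a small calculator. The Python sketch below encodes only the figure quoted above ($3,000 of lost revenue per minute for every cent per minute the route's market will bear); the function names and the one-hour outage example are hypothetical.

```python
# From the article: each cent per minute of call price on a route corresponds
# to roughly $3,000 of lost revenue per minute while the cable is severed.
LOSS_DOLLARS_PER_MINUTE_PER_CENT = 3_000


def revenue_loss_per_minute(price_cents_per_minute: float) -> float:
    """Estimated dollars lost per minute of outage on a route."""
    return LOSS_DOLLARS_PER_MINUTE_PER_CENT * price_cents_per_minute


def outage_cost(price_cents_per_minute: float, outage_minutes: float) -> float:
    """Estimated total dollars lost over an outage of the given length."""
    return revenue_loss_per_minute(price_cents_per_minute) * outage_minutes
```

With these definitions, `revenue_loss_per_minute(10)` gives the article's $30,000 a minute for dime-a-minute calls, and `outage_cost(10, 60)` puts a one-hour break on that route at $1.8 million.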
tags: at&t, cloud, failure, failure happens, fiber, infrastructure, operations, outages, velocity, velocity09, web infrastructure, web operations, web2.0, webops, worries
It's Really Just a Series of Tubes
by Jesse Robbins | @jesserobbins | comments: 12
Molly Wright Steenson hit the Ignite jackpot at Etech this year with her explanation of the steam powered network of pneumatic tubes of the 1800s. If you're someone that, like me, has a somewhat obsessive relationship with Internet Infrastructure, you must watch this talk.
tags: etech, ignite, ignite show, infrastructure, internet, steam, steampunk, tubes, velocity, velocity09, velocityconf, web2.0
Four short links: 30 Jan 2009
by Nat Torkington | @gnat | comments: 1
Two serious links and two fun today, thanks to Waxy and BoingBoing:
- EveryBlock Business Model Brainstorming -- Adrian Holovaty's project was funded by a Knight Foundation grant that's about to run out. The software will be open sourced but he's inviting suggestions of business models that would enable the project team to continue working on it full-time. Having used and created open source to show newspaper companies how to do journalism online, will he now work on an open source way for them to make money?
- Infrastructure for Modern Web Sites -- Leonard Lin lays out what's required in systems and platforms for modern web sites. Perl succeeded in part because its data types were the things you had to deal with (files, text, sockets). Will the next gen of tools (the 'Rails killer' if you will) offer users, taggable objects, social objects, etc. as primitives?
- Academic Earth -- takes open courseware from different universities and integrates them into a coherent UI. Transcripts. Slurp.
- Love2D -- a Lua-based 2D game engine. I'm looking at it to see whether it works for me as the next step for 9-year-old kids interested in programming games in my computer club.
tags: adrianholovaty, education, games, infrastructure, journalism, lua, open source, programming, velocity
Service Monitoring Dashboards are mandatory for production services!
by Jesse Robbins | @jesserobbins | comments: 6
Google App Engine went down earlier today. GAE is still a developer preview release, and currently lacks a public monitoring dashboard. Unfortunately this means that many people either found out from their app and/or admin consoles being unavailable or from Mike Arrington's post on TechCrunch.
Google has a strong Web Operations culture, and there are numerous internal monitoring tools in use across the company, along with a smaller set available to customers. It's surprising that Google launched a developer platform without providing something beyond an email group, although they are by no means the first to do so.

Service Monitoring Dashboards are mandatory for production services and platforms!
- If you launch a platform that people pay you money for, you need to have a real time service dashboard. Ideally this should be decoupled from the rest of your infrastructure.
- Don't rely on platforms that lack service monitoring dashboards for production.
Many companies are initially reluctant to provide this kind of monitoring to the public, and only do so in reaction to an outage. However, it seems that every company that offers such a dashboard uses it as a source of competitive advantage.
The best example of this is trust.salesforce.com, which they launched after a series of outages in 2006. Amazon (eventually) launched a status dashboard for AWS, and added RSS feeds for specific services, which I think is pretty cool.
Javier Soltero at Hyperic points out:
1. The reports of service outages arrive long after anyone who depends on the services can possibly do anything to mitigate their effect.
2. The services themselves seem incapable of providing any visibility into the circumstances that might lead to future outages.
[...] Even TechCrunch points out that the Google Apps blog doesn’t even mention the outage. Other clouds rely on blogs such as this one, this one, or maybe even this one (from our good friends at Mosso). These are all places where outages can be discussed, but not the right means for people to find out whether it is their application that crashed, or the cloud that it depends on.
(Updated: Niall Kennedy pointed out that GAE is still a preview release, and I agree that my original wording was wrong. My intent is to emphasize the importance of providing a public service dashboard and so I've edited accordingly.)
tags: failure happens, google app engine, infrastructure, internet policy, monitoring, operations, outages, platform plays, platforms, saas, velocity, web 2.0, web services, webops
Automated Infrastructure Podcast on IT Conversations
by Jesse Robbins | @jesserobbins | comments: 0
Adam Jacob and I did an IT Conversations podcast with Phil Windley last week, which I really enjoyed. We started with a summary of Adam's excellent Web2.0 Expo session, covered the phases of startup growth using virtual infrastructures like EC2 and 3tera, and discussed how Puppet shifts us to "Infrastructure as Code". We even got into the challenges and opportunities of Sarbanes-Oxley compliance for startups.
Adam also talked a lot about iClassify, his open source systems management tool. He announced iClassify at the Web2.0 Expo, and will be discussing it in-depth at Velocity next month.
You can download the podcast here.
tags: 3tera, ec2, infrastructure, operations, s3, sarbanes-oxley, sarbox, sox, startups, velocity, velocity08, web 2.0, web 2.0 expo, webops
Structure and Velocity
by Jesse Robbins | @jesserobbins | comments: 4
Several people have asked me about the differences between Om Malik's Structure conference and our Velocity Web Performance & Operations conference. Velocity is on June 23 & 24th at the SFO Marriott, and Structure follows on June 25th in San Francisco.
The conferences are complementary: Structure discusses what is changing in internet infrastructure, and Velocity teaches how to make that change happen.
I've been recommending that anyone considering Structure make sure their engineering teams are going to Velocity. For many technical leaders I think there is value in attending both, and I definitely plan on doing so.
The knowledge and skills learned at Velocity can be put to immediate use and will have significant impact on your business. The reason for this is simple:
Faster, scalable, and highly available websites serve more pages to more customers in the same amount of time.
That's why we've worked hard to make Velocity the best resource for engineers to learn how to build and operate at web scale. Here are a few examples:
Adam Jacob will give a step-by-step overview of Building an Automated Infrastructure, and then Luke Kanies will follow up with an in-depth session on Puppet. This is the exact combination I used to explain how effective operations is a huge competitive advantage:
Luiz Barroso will describe Google's approach to energy-efficient datacenter design and management. Applying these lessons can ultimately save millions of dollars, increase your operational agility, and decrease your environmental footprint.
Mandi Walls will teach how actionable logging can mean the difference between a 20-minute outage and a 2-hour outage while esoteric error codes are deciphered or developers are contacted to investigate.
Eric Lawrence, Program Manager for Internet Explorer, and Mike Connor, lead developer for Mozilla Firefox will explain how to optimize page performance for their respective browsers. We'll also have demos of leading performance testing tools: HTTPwatch, Fiddler, AOL PageTest, and Firebug.
John Allspaw from Flickr will be giving a talk about Capacity Management. John's way of explaining both the problem and the opportunity is wonderfully straightforward:

You can check out the rest of the program and register on the Velocity site. (Hint: You can use the code "vel08js" for a 20% discount.) I'll be posting frequently as we add speakers and events. I hope to see you at Velocity!
tags: conferences, gigaom, infrastructure, om, operations, platform plays, structure, structure08, velocity, velocity08, web 2.0, webops
What is Web Operations?
by Jesse Robbins | @jesserobbins | comments: 0
Theo Schlossnagle wrote a brilliant summary of one of the biggest challenges we discussed at the Velocity Summit in January:

What is this Velocity Summit thing? It was a bunch of web architects from highly trafficked sites sitting around talkin' smack. It was operated in Foo style. However, one thing that made me really appreciate this meet-up was the lack of self-importance displayed by attendees. Everyone was just there to talk -- not to make people understand how much they knew. We were talking about The O'Reilly Velocity Web Performance and Operations Conference: what it should be and why.
Two things that I walked away with were (1) a realization of the lack of a career path for people who do what we do (no standard titles, no standard roles and responsibilities and certainly a lack of sex appeal) and (2) a clear lack of terminology for the technology requirements that are so common in these environments. Terminology is easy, in my opinion -- you just argue until someone wins. Of course, arguing is a hobby of mine, so I have bias. On the other hand, defining a career path that is an industry accepted path is hard.
The term Web Operations was used a lot during this event. While it isn't awful, I really don't like this term. The hard part is that the captains, superstars, or heroes in these roles are multidisciplinary experts. They have a deep understanding of networks, routing, switching, firewalls, load-balancing, high availability, disaster recovery, TCP & UDP services, NOC management, hardware specifications, several different flavors of UNIX, several web server technologies, caching technologies, several databases, storage infrastructure, cryptography, algorithms, trending and capacity planning. The issue: how can we expect to find good candidates that have fluency in all of those technologies? In the traditional enterprise, you have architects which are broad and shallow and their team of experts which are focused and deep. However, in the expectation is that your "web operations" engineer be both broad and deep: fix your gigabit switch, optimize your MySQL database and guide the overall architecture design to meet scalability requirements.
I struggle with this. Not everyone can be a superstar. More importantly, no one can really start as a superstar. If we use an apprentice model (which is common in industries without institutional support) we limit the total number of able workers in this field. So, how do we (re)define the requirements for a junior web operations person? [read more]
One of the reasons I'm excited about Velocity is that we're increasing the pool of great operations people. We're getting inquiries from companies interested in sending groups of 30-40 people, and I expect more as we confirm speakers and sessions. You can secure a spot now and get a $350 early registration discount.
tags: foo camp, hiring, infrastructure, omniti, operations, platform plays, startups, velocity, velocity08, web 2.0, webops, webperformance
Amazon improves EC2 (by embracing failure)
by Jesse Robbins | @jesserobbins | comments: 5
Amazon just announced two big improvements to EC2:
- Multiple Locations
Amazon EC2 now provides the ability to place instances in multiple locations. Amazon EC2 locations are composed of regions and Availability Zones. Regions are geographically dispersed and will be in separate geographic areas or countries. Currently, Amazon EC2 exposes only a single region. Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same region. Regions consist of one or more Availability Zones. By launching instances in separate Availability Zones, you can protect your applications from failure of a single location.
- Elastic IP Addresses
Elastic IP addresses are static IP addresses designed for dynamic cloud computing. An Elastic IP address is associated with your account not a particular instance, and you control that address until you choose to explicitly release it. Unlike traditional static IP addresses, however, Elastic IP addresses allow you to mask instance or Availability Zone failures by programmatically remapping your public IP addresses to any instance in your account. Rather than waiting on a data technician to reconfigure or replace your host, or waiting for DNS to propagate to all of your customers, Amazon EC2 enables you to engineer around problems with your instance or software by quickly remapping your Elastic IP address to a replacement instance.
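The two features fit together: spread instances across Availability Zones, then remap the public address when a zone goes away. The Python sketch below is a toy in-memory model of that idea and deliberately does not call the real EC2 API. The class names, zone names, instance IDs, and the address (taken from a documentation-only range) are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Dict, List


@dataclass
class Instance:
    instance_id: str
    zone: str
    healthy: bool = True


@dataclass
class ToyCloud:
    """In-memory stand-in for a provider with zones and remappable addresses."""
    zones: List[str]
    instances: Dict[str, Instance] = field(default_factory=dict)
    elastic_ips: Dict[str, str] = field(default_factory=dict)  # address -> instance_id

    def launch_spread(self, count: int) -> List[Instance]:
        """Launch instances round-robin across zones so no zone holds them all."""
        launched = []
        for _, zone in zip(range(count), cycle(self.zones)):
            instance = Instance(f"i-{len(self.instances):04d}", zone)
            self.instances[instance.instance_id] = instance
            launched.append(instance)
        return launched

    def fail_zone(self, zone: str) -> None:
        """Simulate the loss of an entire zone."""
        for instance in self.instances.values():
            if instance.zone == zone:
                instance.healthy = False

    def remap_unhealthy(self) -> Dict[str, str]:
        """Point each address on an unhealthy instance at a healthy one."""
        healthy = [i for i in self.instances.values() if i.healthy]
        if not healthy:
            raise RuntimeError("no healthy instance left to remap to")
        moved = {}
        for address, instance_id in self.elastic_ips.items():
            if not self.instances[instance_id].healthy:
                # Only the value changes, so mutating during iteration is safe.
                self.elastic_ips[address] = healthy[0].instance_id
                moved[address] = healthy[0].instance_id
        return moved
```

A short run shows the failover path:

```python
cloud = ToyCloud(zones=["zone-a", "zone-b"])
primary, standby = cloud.launch_spread(2)
cloud.elastic_ips["203.0.113.10"] = primary.instance_id
cloud.fail_zone("zone-a")
cloud.remap_unhealthy()  # the address now points at the standby in zone-b
```

Customers keep using the same address throughout, which is the point of the feature: no DNS change has to propagate before traffic reaches the replacement.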
Datacenters and geographic regions are Single Points of Failure (SPOF) too. Failure Happens, and it's far better (and cheaper) to build services that are resilient to failure than to try to prevent them from happening. This is a big step in the right direction.
Update: RightScale posted an excellent overview of how this works.
tags: amazon, aws, ec2, failure happens, infrastructure, internet policy, mysql conference, operations, platform plays, velocity08
Steve Souders asks: "How green is your web page?"
by Jesse Robbins | @jesserobbins | comments: 4
Steve Souders, my Velocity conference Co-Chair and author of High Performance Websites, gave me permission to repost this great analysis:
How green is your web page?
Writing faster web pages is great for your users, which in turn is great for you and your company. But it’s better for everyone else on the planet, too.
Intrigued by an article on Radar about co2stats.com, I looked at my web performance best practices from the perspective of power consumption and CO2 emissions. YSlow grades web pages according to how well they follow these best practices. What if it could convert those grades into kilowatt-hours and pounds of CO2?
Let’s look at one performance rule on one site. Wikipedia is one of the top ten sites in the world (#9 according to Alexa). I love Wikipedia. I use it almost every day. Unfortunately, it has thirteen images in the front page that don’t have a far future Expires header (Rule 3). Every time someone revisits this page the browser has to make thirteen HTTP requests to the Wikipedia server to check if these images are still usable, even though these images haven’t changed in over seven months on average. A better way to handle this would be for Wikipedia to put a version number in the image’s URL and change the version number whenever the image changes. Doing this would allow them to tell the browser to cache the image for a year or more (using a far future Expires or Cache-Control header). Not only would this make the page load faster, it would also help the environment. Let’s try to estimate how much.
- Let’s assume Wikipedia does 100 million page views/day. (I’ve seen estimates that are over 200 million/day.)
- Assume 80% of those page views are done with a primed cache (based on Yahoo!’s browser cache statistics). We’re down to 80M page views/day.
- Assume 10%, no, 5% of those are for the home page. We’re down to 4M page views/day for the home page with a primed cache. Each of those contains 13 HTTP requests to validate the images, for a total of 52M image validation requests/day.
- Assume one web server can handle 100 of these requests/second, or 8.6M requests/day. That’s six web servers running full tilt year-round to handle this traffic.
- Assume a fully loaded server uses 100W. Six servers, year-round, consume 5,000 kilowatt-hours per year or approximately 500-1000 pounds of CO2 emissions.
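The chain of assumptions above can be written out as a short Python sketch, which makes it easy to plug in different numbers. Every default below is one of Steve's stated assumptions; the function and key names are invented for this illustration, and the CO2 conversion is left out because the post gives only a range for it.

```python
SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_YEAR = 24 * 365


def estimate_validation_cost(
    page_views_per_day: float = 100_000_000,
    primed_cache_share: float = 0.80,
    home_page_share: float = 0.05,
    images_per_page: int = 13,
    requests_per_server_per_second: float = 100,
    watts_per_server: float = 100,
) -> dict:
    """Estimate servers and energy spent revalidating unchanged images."""
    primed_views = page_views_per_day * primed_cache_share
    home_page_views = primed_views * home_page_share
    validation_requests = home_page_views * images_per_page

    # One server running flat out for a full day.
    requests_per_server_per_day = requests_per_server_per_second * SECONDS_PER_DAY
    servers = validation_requests / requests_per_server_per_day

    # Watts times hours gives watt-hours; divide by 1000 for kilowatt-hours.
    kwh_per_year = servers * watts_per_server * HOURS_PER_YEAR / 1000

    return {
        "validation_requests_per_day": validation_requests,
        "servers": servers,
        "kwh_per_year": kwh_per_year,
    }
```

With the defaults this gives 52 million validation requests a day, about 6.02 servers, and roughly 5,270 kilowatt-hours a year, consistent with the rounded figures of six servers and 5,000 kilowatt-hours. Doubling the page views to the 200 million/day estimate doubles every output.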
I think this is a conservative estimate, but there are a lot of assumptions above. And six servers doesn’t sound like a lot. 5,000 kilowatt-hours is a drop in the bucket if you look at data center power consumption. But this was just one rule on one page on one site. Think about the impact of not gzipping, not minifying JavaScript, wasteful redirects, and bloated images. If we extrapolate this across all the performance rules across all sites the numbers are much bigger.
Make your pages faster. It’s good for your users, good for you, and good for Mother Earth.
-Steve
Steve has a SXSW Bookreading on Saturday @11 AM, and will be at the O'Reilly booth on Sunday from 3:30-4:30. Stop by and say hello!
tags: co2, energy, greentech, hard numbers, infrastructure, operations, performance, stevesouders, velocity, velocity08, web 2.0, webops, webperformance
Operations is a competitive advantage... (Secret Sauce for Startups!)
by Jesse Robbins | @jesserobbins | comments: 13
My lunchtime conversations at the Summit centered around Operations as a competitive advantage (and occasionally a "strategic weapon"). This advantage is the ability to consistently create and deploy reliable software to an unreliable platform that scales horizontally.
Many people think of Operations as "a bunch of boring work... which I'm hoping someone else is doing." It often takes less time to set up a development environment than the tools and infrastructure needed to test, deploy, monitor, and scale new software. The survival of most projects depends on working software, at least initially, and so if there is money or time many people will spend it on development. Unfortunately, people say they will "figure that ops stuff out soon", but what they mean is "when we're totally screwed!!!" It doesn't have to be that way...
The example above is the tale of two Web 2.0 startups scaling to 20 systems during their first three months. The first team starts writing software and installing systems as they go, waiting to deal with the "ops stuff" until they have an "ops person". The second team dedicates someone to infrastructure for the first few weeks and ramps up from there. They won't need to hire an "ops person" for a long time and can focus on building great technology.
In my experience it takes about 80 hours to bootstrap a startup. This generally means installing and configuring an automated infrastructure management system (puppet), version control system (subversion), continuous build and test (frequently cruisecontrol.rb), software deployment (capistrano), monitoring (currently evaluating Hyperic, Zenoss, and Groundwork). Once this is done the "install time" is reduced to nearly zero and requires no specialized knowledge. This is the first ingredient in "Operations Secret Sauce".
This kind of scalability becomes really interesting when you find yourself suddenly popular, as iLike did when it launched its Facebook app and had to scale up fast (Radar):
In our first 20 hours of opening doors we had 50,000 users sign up, and it is only accelerating. (10,000 users joined in the first 12 hrs. 10,000 more users in the next 3 hrs. 30,000 more users in the next 5 hrs!!)
We started the system not knowing what to expect, with only 2 servers, but ready with backup. Facebook's rabid userbase chewed up our 2 servers almost instantly. We doubled our capacity to catch up. And then we doubled it again. And again. And again. Oh crap - we ran out of servers!! Although iLike.com has a very healthy level of Web traffic, and even though about half of all the servers in our datacenter were sitting unused, idle, as backup capacity, we are now completely maxed out.
We just emailed everybody we know across over a dozen Bay Area startups, corporations, and venture firms in a desperate plea to find spare servers so we can triple our capacity for the continued onslaught. Tomorrow we are picking up over 100 servers from different companies to have them installed just to handle the weekend's traffic. (For those who responded to our late night pleas, thank you!)
Not being able to acquire hardware fast enough is by far a better problem than not being able to install it. iLike is something of a poster-child for puppet.
Are any VCs out there including effective operations in their due diligence? Are startups incorporating this in their pitch? (Amazon seems to be pushing this as part of the AWS "Start-Up Project" if you're using S3 and EC2.)
Update: Luke points out Adam Jacob's post about implementing Puppet for iLike. (Disclosure: I'm discussing collaboration with Adam's company, HJK solutions.)
Update #2: John Allspaw of Yahoo/Flickr fame has great commentary on procurement and capacity management challenges for successful startups.
tags: automation, infrastructure, operations, startups, velocity, velocityconf, web 2.0, webops