Entries tagged with “statistics” from O'Reilly Radar

Thu, Nov 5, 2009

Nat Torkington

Four short links: 5 November 2009

Heat Maps in R, EC2 Blackhat Tricks, Snickersome Unicode, and Decoding Statistics

by Nat Torkington (@gnat) | comments: 0

  1. Heat Maps in R -- We used financial data here because it's easier to access than the airline data, but it's actually a pretty interesting way of looking at a financial time series. Weekend and holiday effects are a bit more obvious, and it's a bit like being able to see the daily, weekly, monthly and yearly closes all at once (by scanning your eye over the calendar in different directions). Includes source code. (via migurski on Delicious)
  2. BlackHat and EC2 -- Theft of resources is the red-headed step-child of attack classes and doesn't get much attention, but on cloud platforms where resources are shared amongst many users these attacks can have a very real impact. With this in mind, we wanted to show how EC2 was vulnerable to a number of resource theft attacks and the videos below demonstrate three separate attacks against EC2 that permit an attacker to boot up massive numbers of machines, steal computing time/bandwidth from other users and steal paid-for AMIs. (via straup on Delicious)
  3. Funny Characters in Unicode -- I never get tired of the wacky stuff in Unicode. I love the thought of a Unicode committee somewhere arguing passionately about the number of buttons on the snowman .... (via Hacker News)
  4. Statistics to English Translation -- The terms sensitivity and specificity generally refer to diagnostic or screening procedures, such as HIV or allergy tests. The sensitivity of a test is its true positive rate; the specificity is its true negative rate, although it can be more intuitive to think of specificity as the complement of the false positive rate. This matters. Bandying around numbers with misleading labels, or misinterpreting numbers that have a precise and defined meaning, does not further understanding. (Said 78.4% of statisticians, with a 20% confidence factor probability of false positives)
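
The definitions quoted in item 4 can be made concrete with a small sketch. The Python below is only an illustration, not code from the linked article: the function names and the screening counts are hypothetical, chosen to show that specificity and the false positive rate are complements.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """True positive rate: share of actual positives the test flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """True negative rate: share of actual negatives the test clears."""
    return true_neg / (true_neg + false_pos)


def false_positive_rate(true_neg: int, false_pos: int) -> float:
    """Share of actual negatives the test wrongly flags."""
    return false_pos / (true_neg + false_pos)


# Hypothetical screening results (made-up counts, for illustration only):
# 100 people have the condition, 1000 do not.
tp, fn = 90, 10     # of the 100 with the condition: 90 flagged, 10 missed
tn, fp = 950, 50    # of the 1000 without it: 950 cleared, 50 flagged

print(f"sensitivity         = {sensitivity(tp, fn):.2f}")
print(f"specificity         = {specificity(tn, fp):.2f}")
print(f"false positive rate = {false_positive_rate(tn, fp):.2f}")
```

With these made-up counts, specificity and the false positive rate sum to 1, which is the "complement" relationship the quoted passage describes.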

 

tags: amazon, cloud, ec2, language, R, security, statistics, visualization

 

Fri, Aug 14, 2009

Nat Torkington

Four short links: 14 August 2009

EPub FTW, SQL Horror, Computer Vision Explained, and A Massive Dump of Twitter Stats

by Nat Torkington (@gnat) | comments: 1

  1. Page2Pub -- harvest wiki content and turn it into EPub and PDF. See also Sony dropping its proprietary format and moving to EPub. Open standards rock. (via oreillylabs on Twitter)
  2. SQL Pie Chart -- an ASCII pie chart, drawn by SQL code. Horrifying and yet inspiring. Compare to PostgreSQL code to produce ASCII Mandelbrot set. (via jdub on Twitter and Simon Willison)
  3. How SudokuGrab Works -- the computer vision techniques behind an iPhone app that solves Sudoku puzzles that you take a photo of. Well explained! These CV techniques are an essential part of the sensor web. (via blackbeltjones on Delicious)
  4. Twitter by the Numbers -- massive dump of charts and stats on Twitter. I love that there's a section devoted to social media marketers, the Internet's head lice. (via Kevin Marks on Twitter)

tags: book related, computer vision, ebooks, fun, iphone app, publishing, sql, statistics, twitter

 

Thu, Aug 13, 2009

Nat Torkington

Four short links: 13 August 2009

by Nat Torkington (@gnat) | comments: 1

  1. Under the Hood of App Inventor for Android -- regular readers know I'm a big fan of visual programming language Scratch, and apparently Google are too. They've got twelve university classes testing App Inventor for Android, a visual connect-the-bits programming environment for Android. University classes probably because one of the co-creators is Hal Abelson, coauthor of the definitive programming textbook. Also found online: the PR-type announcement, a Professor using it, and @AppInv (nothing juicy on Twitter--it looks like it might be a channel for tech support for the students). (via Hacker News)
  2. Google Web Optimizer Case Study (Four Hour Work Week) -- GWO manages A/B tests for you, with a lot of statistical analysis. It's a fascinating read to see how these should be done. Every equation may halve the readership of a book, but every table of numbers and relevancy analysis doubles the value of a post like this. (via Hacker News)
  3. Opening Up The BBC's Natural History Archive -- the BBC are releasing programme segments and a whole lot of metadata around their programming. Audio and video segmented, tagged with DBpedia terms, and aggregated into a URI structure based on natural history concepts: species, habitats, adaptations, etc. Gorgeous!
  4. Yahoo! Term Extraction API to Close -- Internally, both services share a backend data source that is closing down, so the publicly-facing YDN services will be closing as well. I think it's the most significant casualty of Y! outsourcing search to MSFT, as this API was used by a lot of projects. (via Simon Willison)

tags: android, apis, bbc, data, google, history, programming, semantic web, statistics, web, yahoo

 

Tue, Jul 14, 2009

James Turner

Making Government Transparent Using R

Danese Cooper thinks it will be an important tool in Open Gov

by James Turner | comments: 7

Running time: 26:58


With Open Source now considered an accepted part of the software industry, some people are starting to wonder if we can't bring the same degree of openness and innovation into government. Danese Cooper, who is actively involved in the open source community through her work with the Open Source Initiative and Apache, as well as working as an R wonk for Revolution Computing, would love to see the government become more open. Part of that openness is being able to access and interpret the mass of data that the government collects, something Cooper thinks R would be a great tool for. She'll be talking about R and Open Government at OSCON, the O'Reilly Open Source Convention.

James Turner: Why don't you start by describing where you came from, what you're involved in, and what your interests are?

Danese Cooper: Okay. I'm Danese Cooper. I serve on the board of the Open Source Initiative. I have been serving for the last eight years. And I'm also currently employed by Revolution Computing, which is a start-up focusing on an open source language called R, as in the letter R, that is very useful for analytics and statistical analysis. I'm also an Apache member. And I also serve on an advisory board for Mozilla.

James Turner: One of the two panels you're going to be speaking on at OSCON is on open source and open government. If you could talk a little bit about what interests you about open government and also what open government means to you.

Danese Cooper: Sure. Well, along with a lot of open source people, I got interested in the Obama campaign and in helping President Obama get elected. And part of why he was so compelling was that the vision of how Washington needed to change is pretty close to the way that we think about working collaboratively in open source. The night that he was elected, there was a great little clip on CNET of a Republican commentator actually explaining open source as exactly what I just said. It was a really brilliant little two-minute clip. He pointed at The Cathedral and the Bazaar, that canonical document about how open source works. And he said, "Microsoft is the cathedral. It's their way or the highway. And the bazaar is a bunch of people working together grassroots to collaboratively build the things that they need. And so Obama's basically asking for the government to become open source, and the problem is Washington isn't really like that right now."

So anyway, that's the transformation that has to happen in order for government to really be transparent. To me, open source government is transparent government. There's been an awful lot of shenanigans in recent political history, like the last decade has been pretty crazy in terms of things happening that couldn't be traced back to any source. Even just the way we vote and the way that voting is managed, and the fact that the software that runs the machines that we vote on is not open source so it can't be inspected. And nobody knows quite what it does. There are all of these stories of weird updates to the software that happened right before major elections in states where there are strange results. Transparency, in the same way that it helped the software industry transform, could really help the government transform. So that's what I'm talking about. There's a bunch of other people on that panel. My good friend, Brian Behlendorf, and I co-proposed it. And he's actually taken the next step. He helped found Apache. And he's run off to Washington to work on projects that are interesting to the Obama government to try to figure out how to help them move to more open source solutions. And he'll be talking about his progress on that panel. So I think it's a pretty exciting panel.


tags: interviews, open government, open source, oscon, r, statistics

 

Tue, Jul 7, 2009

Nat Torkington

Four short links: 7 July 2009

Motivation, R, Games, and Open Source Medicine

by Nat Torkington (@gnat) | comments: 1

  1. Announcing your plans makes you less motivated to accomplish them -- Tests done since 1933 show that people who talk about their intentions are less likely to make them happen. Announcing your plans to others satisfies your self-identity just enough that you're less motivated to do the hard work needed. I have noticed this myself. It must be balanced against the other finding that public commitment increases the probability of follow-through, which might work in sales but seems to fail miserably in getting me to do anything productive. (via migurski on Delicious)
  2. Rseek -- search engine for info on R. Necessary because of the non-unique project name. (via Benjamin Mako Hill)
  3. Treasure World (Offworld) -- Nintendo DS game that turns wifi spots into collectible treasure. You have to explore the real world as you play the game, another of these games that mix the online and offline worlds. (via waxy)
  4. 50 Successful Open Source Projects That Are Changing Medicine -- notice the large number of electronic health record (EHR) suites. What are the chances of any of them getting a slice of Obama's EHR money that the ex-RedHatters behind The Axial Project are going for? (via timoreilly on Twitter)

tags: brain, games, gaming, healthcare, medicine, open source, psychology, r, statistics

 

Thu, May 28, 2009

Nat Torkington

Four short links: 28 May 2009

Mobile Viruses, Open Data, Twitter Bookmarks, Sexy Geek Skills

by Nat Torkington (@gnat) | comments: 0

  1. Viral Epidemics Poised to go Mobile -- Albert-Laszlo Barabasi (author of Linked: How Everything Is Connected To Everything Else) modelled mobile phone virus epidemiology for NSF and concluded that (in accordance with experience) no single OS has the critical mass for viruses to break out. I wonder: will Android or iPhone reach that point first? (via ACM TechNews)
  2. Socrata -- formerly "Blist", the first of what will undoubtedly be many startups "refocusing" to profit from the new US administration's fondness for Web 2.0. The business model, however, is "we'll offer your data to citizens in a useful form" and it seems to me that this is a responsibility that Government should embrace rather than outsource. (via Jesse)
  3. Tag This -- tweet @tagthis with a link and keywords to post the link as a bookmark in your Delicious/Magnolia account.
  4. Three Sexy Skills of Geeks -- statistics, data munging, and visualization. I'm reading Visualizing Data right now and expect the universe to bury me in bootie before the day is out. "Processing: it's cheaper than couple's therapy and you can post pictures of it on the Internet without being fired." (via mattb on Twitter)

tags: delicious, gov2.0, government, mobile, open data, security, statistics, twitter, visualization

 

Mon, May 4, 2009

Ben Lorica

Big Data: SSD's, R, and Linked Data Streams

by Ben Lorica (@dliman) | comments: 4

The Solid State Storage Revolution: If you haven't seen it, I recommend you watch Andy Bechtolsheim's keynote at the recent MySQL Conference. We covered SSD's in our just-published report on Big Data management technologies. Since then, we've gotten additional signals from our network of alpha geeks and our interest in them remains high.

R and Linked Data Streams: I had a chance to visit with Dataspora founder and blogger Mike Driscoll, an enthusiastic advocate for the use of the open source statistical computing language, R. After founding and leading online retailer CustomInk.com, Mike went back to grad school and earned a doctorate in Bioinformatics. He has applied data analysis and programming in a variety of domains including retail, biotech, academia, and government projects.

Having been an avid user of S/S-Plus in the 1990s, I seamlessly switched over to R in the early 2000s. To this day, I consider the S/S-Plus user manuals to be the best reference and introductory books on the R programming language. (Mike wholeheartedly agrees.) R has been popular in the statistics community for many years, but I've been noticing that its visualization and analytic capabilities are attracting interest from developers. Moreover, recent efforts by the R community to improve its ability to scale to large data sets (see brief update from Jay Emerson) will strengthen R's place in the Big Data stack.


tags: analytics, big data, r, ssd, statistics, video