Entries tagged with “sql” from O'Reilly Radar
Counting Unique Users in Real-time with Streaming Databases
by Ben Lorica | @dliman | comments: 6
As the web increasingly becomes real-time, marketers and publishers need analytic tools that can produce real-time reports. As an example, the basic task of calculating the number of unique users is typically done in batch mode (e.g. daily) and in many cases using a random sample from relevant log files. If unique user counts can be accurately computed in real-time, publishers and marketers can mount A/B tests or referral analysis to dynamically adjust their campaigns.
In a previous post I described SQL databases designed to handle data streams. In their latest release, Truviso announced technology that allows companies to track unique users in real-time. Truviso uses the same basic idea I described in my earlier post:
Recognizing that "data is moving until it gets stored", the idea behind many real-time analytic engines is to start applying the same analytic techniques to moving (streams) and static (stored) data.
Truviso uses (compressed) bitmaps and set theory to compute the number of unique customers in real-time. In the process they are able to handle the standard SQL queries associated with these types of problems: counting the number of distinct users, for any given set of demographic filters. Bitmaps are built as data streams into the system and use the same underlying technology that allows Truviso to handle massive data sets from high-traffic web sites.
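To make the bitmap idea concrete, here is a minimal Python sketch of distinct-user counting with one bitmap per demographic value. It is an illustration only, not Truviso's implementation: the class name `UniqueUserCounter`, its methods, and the use of plain (uncompressed) Python integers as bitmaps are all assumptions made for this example.

```python
from collections import defaultdict


class UniqueUserCounter:
    """Hypothetical sketch: streaming distinct-user counts, one bitmap per attribute value."""

    def __init__(self):
        self._index = {}                   # user_id -> bit position
        self._bitmaps = defaultdict(int)   # (attribute, value) -> bitmap stored as an int
        self._all = 0                      # bitmap of every user seen so far

    def observe(self, user_id, **attributes):
        """Update the bitmaps as one event arrives on the stream."""
        # New users get the next free bit; repeat visitors reuse their bit.
        pos = self._index.setdefault(user_id, len(self._index))
        bit = 1 << pos
        self._all |= bit
        for key, value in attributes.items():
            self._bitmaps[(key, value)] |= bit

    def count(self, **filters):
        """Count distinct users matching every filter (set intersection via AND)."""
        result = self._all
        for key, value in filters.items():
            result &= self._bitmaps.get((key, value), 0)
        return bin(result).count("1")


counter = UniqueUserCounter()
events = [
    ("u1", "US", "mobile"),
    ("u2", "US", "desktop"),
    ("u1", "US", "mobile"),   # repeat visit, sets no new bit
    ("u3", "DE", "mobile"),
]
for user, country, device in events:
    counter.observe(user, country=country, device=device)

print(counter.count())                               # 3
print(counter.count(country="US"))                   # 2
print(counter.count(country="US", device="mobile"))  # 1
```

Each `count` call plays the role of a `COUNT(DISTINCT user)` query with a `WHERE` clause on the chosen filters. Because the bitmaps are updated as each event arrives, the answer is available without rescanning stored logs.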
Once companies can do simple counts and averages in real-time, the next step is to use real-time information for more sophisticated analyses. Truviso has customers using their system for "on-the-fly predictive modeling".
The other main enhancement in this release is Truviso's move towards parallel processing. Their new execution engine processes runs or blocks of data in parallel in multi-core systems or multi-node environments. Using Truviso's parallel execution engine is straightforward on a single multi-core server, but on a multi-node cluster it may require considerable attention to configuration.
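One reason bitmaps suit this kind of block-parallel execution is that partial results merge with a simple OR. The sketch below is an assumption-laden illustration, not a description of Truviso's engine: the function names are hypothetical, user IDs are assumed to be small non-negative integers already, and Python threads are used only to show the split-and-merge structure (they will not speed up this CPU-bound loop in CPython).

```python
from concurrent.futures import ThreadPoolExecutor


def block_bitmap(block):
    """Build a bitmap (as an int) of the user IDs seen in one block of events."""
    bitmap = 0
    for user_id in block:
        bitmap |= 1 << user_id
    return bitmap


def parallel_unique_count(blocks, workers=4):
    """Process blocks independently, then OR the partial bitmaps together."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(block_bitmap, blocks))
    merged = 0
    for partial in partials:
        merged |= partial   # set union; users seen in several blocks count once
    return bin(merged).count("1")


blocks = [[1, 2, 3], [3, 4], [1, 5]]
print(parallel_unique_count(blocks))  # 5
```

Since OR is associative and commutative, the partial bitmaps can be merged in any order, whether they come from cores on one machine or from separate nodes.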
[For my previous posts on real-time analytic tools see here and here.]
tags: a/b testing, analytics, big data, real-time, sensors, sql, streams
Four short links: 14 August 2009
EPub FTW, SQL Horror, Computer Vision Explained, and A Massive Dump of Twitter Stats
by Nat Torkington | @gnat | comments: 1
- Page2Pub -- harvest wiki content and turn it into EPub and PDF. See also Sony dropping its proprietary format and moving to EPub. Open standards rock. (via oreillylabs on Twitter)
- SQL Pie Chart -- an ASCII pie chart, drawn by SQL code. Horrifying and yet inspiring. Compare to PostgreSQL code to produce ASCII Mandelbrot set. (via jdub on Twitter and Simon Willison)
- How SudokuGrab Works -- the computer vision techniques behind an iPhone app that solves Sudoku puzzles from a photo you take. Well explained! These CV techniques are an essential part of the sensor web. (via blackbeltjones on Delicious)
- Twitter by the Numbers -- massive dump of charts and stats on Twitter. I love that there's a section devoted to social media marketers, the Internet's head lice. (via Kevin Marks on Twitter)
tags: book related, computer vision, ebooks, fun, iphone app, publishing, sql, statistics, twitter
Big Data: Technologies and Techniques for Large-Scale Data
by Ben Lorica | @dliman | comments: 3
Our belief that proficiency in managing and analyzing large amounts of data distinguishes market-leading companies led to a recent report designed to help users understand the different large-scale data management techniques. Our report on Big Data Technologies was the result of interviews with over thirty experts, including research scientists, (open-source) hackers, vendors, data analysts, and entrepreneurs. Rather than endorse specific vendors and technologies, we provide a framework to help readers navigate the wide variety of options available. (NOTE: If you're interested in purchasing the report as a single issue of Release 2.0, we can provide you with a DISCOUNT CODE. Contact information is at the end of the video clip below.)
I recently sat down with my co-author, Roger Magoulas (Director of Research at O'Reilly), who agreed to talk about our report and Big Data in general. Roger begins by speaking passionately about the importance of data management and analysis. He proceeds to highlight what we believe to be the key technology dimensions for evaluating data management solutions. The video ends with a glimpse into future technologies and general advice to organizations interested in improving their proficiency in handling data.
The full program is available in four extended clips:
[ Head over to O'Reilly Media's YouTube channel for other interesting videos. ]


