Entries tagged with “data mining” from O'Reilly Radar

Fri

Nov 13
2009

Nat Torkington

Four short links: 13 November 2009

Open Source Design, Interesting NoSQL Use, Copyright Documentary, Location Intelligence

by Nat Torkington (@gnat), comments: 1

  1. Open Source Enters The World of Atoms -- an academic statistical analysis of open design. “We indicated that, in open design communities, tangible objects can be developed in very similar fashion to software; one could even say that people treat a design as source code to a physical object and change the object via changing the source.”
  2. Why I Like Redis (Simon Willison) -- coherent explanation of why Simon likes and uses a particular NoSQL system. “I can run a long running batch job in one Python interpreter (say loading a few million lines of CSV in to a Redis key/value lookup table) and run another interpreter to play with the data that’s already been collected, even as the first process is streaming data in. I can quit and restart my interpreters without losing any data. And because Redis semantics map closely to Python native data types, I don’t have to think for more than a few seconds about how I’m going to represent my data.”
  3. © kiwiright (Vimeo) -- short documentary about copyright, made to raise awareness of the issues in New Zealand. (just as applicable to the rest of the world)
  4. Your Movements Speak For Themselves (Jeff Jonas) -- “Mobile devices in America are generating something like 600 billion geo-spatially tagged transactions per day. Every call, text message, email and data transfer handled by your mobile device creates a transaction with your space-time coordinate (to roughly 60 meters accuracy if there are three cell towers in range), whether you have GPS or not. Got a Blackberry? Every few minutes, it sends a heartbeat, creating a transaction whether you are using the phone or not. If the device is GPS-enabled and you’re using a location-based service your location is accurate to somewhere between 10 and 30 meters. Using Wi-Fi? It is accurate below 10 meters.” A thought-provoking roundup of the information leakage with modern locative systems. (via TomC on Twitter)
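
The workflow Simon describes in item 2 (one process streaming CSV rows into a key/value lookup table while another reads what has already arrived) can be sketched roughly as follows. This is a minimal illustration, not Simon's code: `DictStore`, `load_csv`, and the sample data are hypothetical names invented here, and the in-memory stand-in is used so the sketch runs without a server. It assumes that a redis-py client (`redis.Redis()`), which offers `set` and `get` methods, could be passed in place of the stand-in.

```python
import csv
import io


class DictStore:
    """In-memory stand-in exposing only the set/get subset of a key/value client."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def load_csv(lines, store, key_field, value_field):
    """Stream CSV rows into the store one at a time; return the number of rows."""
    count = 0
    for row in csv.DictReader(lines):
        # Each row is written as soon as it is read, so a second process
        # pointed at a shared store could query it while loading continues.
        store.set(row[key_field], row[value_field])
        count += 1
    return count


# Hypothetical sample data; a real batch job would pass an open file instead.
sample = io.StringIO("city,country\nWellington,NZ\nSebastopol,US\n")
store = DictStore()  # assumption: swap in redis.Redis() when a server is running
loaded = load_csv(sample, store, "city", "country")
```

With a real Redis server the data would outlive the interpreter, which is the property the quote highlights; note that redis-py returns values as bytes unless the client is created with `decode_responses=True`.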

tags: collective intelligence, copyright, data mining, design, geo, location, nosql, open source
comments: 1

 

Thu

Nov 12
2009

Nat Torkington

Four short links: 12 November 2009

CRM on Rails, Data Mining on Hadoop, Disappointing Keynotes, The Teapot Effect

by Nat Torkington (@gnat), comments: 1

  1. Fat Free CRM -- open source (Affero GPL) Ruby on Rails CRM system.
  2. Bixo -- open source data mining toolkit that runs as a series of pipes on top of Hadoop. Built on the Cascading workflow system for Hadoop, which hides MapReduce. (via kdnuggets)
  3. Andy Kessler's Keynote at Defrag Stank (Pete Warden) -- I'm sorry to hear it, because I loved Andy's book How We Got Here about the intersecting histories of economics, finance, and technology. Read the book instead of reading about the disappointing keynote.
  4. The Teapot Effect -- the thing I love about geeks is how their passion causes them to explore, ruthlessly and quantitatively, the everyday phenomena that the rest of us take for granted. Such as dribbling teapots: “Previous studies have shown that dribbling is the result of flow separation where the layer of fluid closest to the boundary becomes detached from it. When that happens, the fluid flows smoothly over the lip. But as the flow rate decreases, the boundary layer re-attaches to the surface causing dribbling.” Read the post and the research it talks about to learn how to prevent Dribbling Teapot Syndrome ....

tags: CRM, data mining, economics, finance, hadoop, history, open source, rails, research, science
comments: 1

 

Mon

Oct 26
2009

Nat Torkington

Four short links: 26 October 2009

Data Exploration, Evidence-Based Coding, API to the English Language, Dual Licensing

by Nat Torkington (@gnat), comments: 4

  1. Toiling in the Data Mines -- Tom Armitage describes the process that BERG calls "material exploration". “Programmers very rarely talk about what their work feels like to do, and that's a shame. Material explorations are something I've really only done since I've joined BERG, and both times have felt very similar - in that they were very, very different to writing production code for an understood product. They demand code to be used as a sculpting tool, rather than as an engineering material, and I wanted to explain the knock-on effects of that: not just in terms of what I do, and the kind of code that's appropriate for that, but also in terms of how I feel as I work on these explorations. Even if the section on the code itself feels foreign, I hope that the explanation of what it feels like is understandable.”
  2. Bits of Evidence -- Slides for a talk, "What we actually know about software development and why we believe it is true". (via Simon Willison)
  3. Wordnik API -- definitions, frequencies, examples APIs. See the announcement from the Web 2.0 Summit.
  4. The Peculiar Institution of Dual Licensing -- Brian Aker eloquently describes why he feels that dual licensing is anti-open source. Brian obviously has considerable experience informing this opinion--his years as Director of Technology for MySQL.

tags: apis, business, data mining, language, mysql, open source, programming, science
comments: 4

 

Tue

Jun 23
2009

Nat Torkington

Four short links: 23 June 2009

by Nat Torkington (@gnat), comments: 2

  1. Easter Eggs for Real Life (Neil Gaiman) -- ok, I know easter eggs are already part of real life, but this is still cool. Gaiman recommends a restaurant run by a friend, and the friend has set up a special phrase to mention to the server, at which point something good and special will happen for them to eat or drink. Think of it as a restaurant Easter Egg. I love language, I love Gaiman's books, I love surprises, and I love that here Gaiman's using the digital sense of Easter Egg (surprise hidden in a program) rather than the analog sense (because there's no searching involved).
  2. ASCAP Wants To Be Paid When Your Phone Rings (EFF) -- what the title suggests. You are lost in a twisty maze of rights, all policed by vampires. From ASCAP's point of view, this is a legitimate claim. From anyone else's point of view, it's ridiculous.
  3. Tooling Up The Body (Mind Hacks) -- that using tools has lots of interesting effects on our perception is the general gist of an intriguing study that provides further evidence for the theory that the brain treats tools as temporary body parts. We talk about using the Internet as our "offsite brain", so it tickles me to learn that the brain treats tools as offsite body parts.
  4. Email Patterns Can Predict Impending Doom (New Scientist) -- when Enron was about to collapse, email patterns changed: the number of active email cliques, defined as groups in which every member has had direct email contact with every other member, jumped from 100 to almost 800 around a month before the December 2001 collapse. Messages were also increasingly exchanged within these groups and not shared with other employees. Menezes thinks he and Collingsworth may have identified a characteristic change that occurs as stress builds within a company: employees start talking directly to people they feel comfortable with, and stop sharing information more widely. (via BoingBoing)
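
The definition of an email clique in item 4 (a group in which every member has had direct contact with every other member) maps onto cliques in an undirected graph. The sketch below enumerates maximal cliques with a basic Bron-Kerbosch search. It is only an illustration of the definition: the article does not say which algorithm or counting rules the researchers used, and `build_graph`, `maximal_cliques`, and the sample contacts are hypothetical names and data invented here.

```python
from collections import defaultdict


def build_graph(contacts):
    """Build an undirected adjacency map from (person, person) contact pairs."""
    graph = defaultdict(set)
    for a, b in contacts:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Yield each maximal clique as a frozenset (Bron-Kerbosch without pivoting)."""

    def expand(clique, candidates, excluded):
        # A clique is maximal when nobody is left who could extend it.
        if not candidates and not excluded:
            yield frozenset(clique)
            return
        for person in list(candidates):
            neighbours = graph[person]
            yield from expand(
                clique | {person},
                candidates & neighbours,
                excluded & neighbours,
            )
            # Mark this person as handled so the same clique is not reported twice.
            candidates = candidates - {person}
            excluded = excluded | {person}

    yield from expand(set(), set(graph), set())


# Hypothetical contact pairs: ann, bob and cat all email each other; dan only emails cat.
contacts = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]
found = set(maximal_cliques(build_graph(contacts)))
```

Counting the cliques found in each month's mail would give a time series of the kind the article describes. Enumerating maximal cliques can be slow on large graphs, so a real analysis would likely need a pivoting variant or a graph library.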

tags: brain, collective intelligence, copyright, data mining, email
comments: 2

 

Thu

May 14
2009

Andy Oram

Credit card company data mining makes us all instances of a type

by Andy Oram (@praxagora), comments: 19

The New York Times has recently published one of their in-depth, riveting descriptions of how credit card companies use everything they can learn about us. Any detail can be meaningful: what time of day you buy things, or the quality of the objects you choose.

The way credit collectors use psychology reminds me of CIA interrogators (without the physical aspects of pressure). In fact, they're probably more effective than CIA interrogators because they stick to the basic insight that kindness elicits more cooperation than threats.

So who gave them permission to use our purchase information against us? What law could possibly address this kind of power play?

There's another disturbing aspect to the data mining: it treats us all as examples of a pattern rather than as individuals. Almost eleven years ago I wrote an article criticizing this trend. The New York Times article shows how much we've lost from what we consider essential to our identity--our individuality.

Update

This article drew six comments in a few hours--thoughtful and valid comments, which have made me put my attitudes into words. Now we can put the attitudes under a light and see what makes sense, or doesn't, to readers.

The article contained two levels of criticism: a criticism of data mining to build up composite pictures of individuals, and a criticism of the use of data accumulated from routine transactions to manipulate those individuals.

Building up a composite picture

Of course, a company that reaches out and does any marketing has to target people. Someone who bought the O'Reilly book Even Faster Web Sites (sorry about the plug) might appreciate a notification about our upcoming Velocity conference, which was founded by the book's author and covers the same topics. Someone who bought a book on a totally different subject wouldn't want or respond to the notification. O'Reilly does this kind of targeting, like most companies, and until everybody participates in truly frictionless information exchanges, companies will have to continue doing it.

Aggregated information is useful too. Organizations that mine public data for evidence of health epidemics can identify likely sites and investigate them further. The data mining is understood to provide an approximation of the truth.

Where I see a problem is when the increasing quantity of constant information refinement shades over into a qualitative change. There's a difference between a campaign targeted to 500 likely customers and a campaign targeted to one.

At some point the composite portrait starts to look so much like a person that corporate decision makers can begin to believe it is the person. The portrait becomes like a replicant, or like the statues that came to life in myths from Pygmalion to Pinocchio.

Joseph Weizenbaum, creator of the classic Eliza program, was shocked to see that people treated his "doctor" program like a human interviewer. There were plenty of computer programs that prompted the user with questions and gave varied responses based on the answers, but none had imitated a person so realistically.

Nowadays, nobody would be drawn in by Eliza. And perhaps companies and customers alike will get used to composite portraits. Perhaps the companies will send their composite to each of us and we can update it to make it more accurate. That will be a very different world, though.

Now we can turn to the next level, manipulation.

Manipulation

I've read numerous accounts in biographies and articles about interrogations, and talked to a couple people who have undergone interrogations. I haven't been on either side of an interrogation, but I've been deposed for a court case. All these situations remind me vividly of the exchanges reported in the New York Times article.

In these exchanges, a well-armed caller is laying, like a silkscreen, a composite over the real person and trying to manipulate the result. It's not exactly a case of asymmetric knowledge (because at least in theory, a customer could also learn a lot about a company and use that knowledge to manipulate it). It's more insidious: an employee carrying out a precise initiative on behalf of a company--a machine in the service of a goal--approaching the targeted customer in an informal manner that brings out a natural, human, empathetic reaction in the customer.

Interrogation always takes place in the context of an open or implied threat--there would be no reason for making the contact otherwise--but as I mentioned in the article, the interrogation goes best when the threat is raised only rarely and strategically. A feigned sympathy and heart-to-heart engagement is the path to the most desired outcome.

In a sense, now, the employee has become the replicant. He or she is using a careful counterfeit of human responses to induce the behavior he or she is paid to induce. This is ethical when dealing with a criminal, although even then US law limits (based on the Fourth Amendment) the gathering of relevant information by the interrogator beforehand. I question how ethical it is in a business situation, especially when exploiting information given by the customer for entirely different purposes.

tags: bill collectors, credit cards, data mining, data retention, mining, privacy
comments: 19