Entries tagged with “mechanical turk” from O'Reilly Radar

Tue, Oct 13, 2009

Mechanical Turk app on the iPhone Provides Work for Refugees

by Ben Lorica (@dliman), comments: 7

Mechanical Turk service provider CrowdFlower and microwork non-profit Samasource have teamed up to make their services available to iPhone users. Users of CrowdFlower's mechanical turk platform can now opt to send their tasks to iPhone users. Previously, CrowdFlower users could choose between Amazon mechanical turks or CrowdFlower's stable of turks.

The Give Work iPhone app takes tasks (created by real companies) and sends them to iPhone users who volunteer to complete them. Meanwhile, workers in a Kenyan refugee camp perform the same tasks using CrowdFlower's regular web interface. In essence, Kenyan refugees work to increase the accuracy of the results provided by the army of volunteer iPhone mechanical turks. In a previous post on Mechanical Turk Best Practices, I highlighted recent research suggesting that, for a large set of tasks, the aggregate work of 4-6 turks compares favorably with that of a single (domain) expert.

The payment for tasks sent to CrowdFlower's iPhone app goes entirely to the workers in the Kenyan refugee camp. In addition, Samasource has negotiated with money transfer services, so the payment goes through with zero transaction costs.

The turks in the refugee camps are recent graduates of Samasource's computer training program. Rather than sitting idly while they wait to be employed, they earn money performing simple computer tasks for real companies. For their part, Give Work app users volunteer to perform simple tasks on their iPhones knowing that refugees in Africa are benefiting. CrowdFlower founder Lukas Biewald notes that their work with Samasource opens up their platform to companies that want to tap into and help micro-workers in developing countries.

There are other mechanical turk services that employ workers in developing countries (see, for example, txteagle). What distinguishes CrowdFlower is an innovative web interface that lets companies easily upload and define their projects and choose the set of turks they want to use: Amazon, CrowdFlower, and now iPhone users plus Kenyan refugees. CrowdFlower has many other features worth noting, including analytics and reporting, tools to increase accuracy, and a services team that works with companies interested in custom solutions.

When I talk to companies about using mechanical turks, many are still unaware†† of what they even are, and most don't quite know how to use them. In our work, we routinely use turks to build machine-learning training sets, and for tasks that require levels of accuracy that algorithms are unable to deliver. Thanks to companies like CrowdFlower, it's now really easy for companies to dip their toes in and experiment with integrating mechanical turks. And with the launch of the Give Work iPhone app, companies can simultaneously opt to provide income to workers in developing countries.

(†) We are users of CrowdFlower's mechanical turk platform.
(††) Actually nervous laughter is a common response!

tags: africa, developing world, iphone_app, mechanical turk

Thu, Jun 11, 2009

Mechanical Turk Best Practices

by Ben Lorica (@dliman), comments: 8

Last night, Dolores Labs hosted what was billed as the first-ever Mechanical Turk meetup, and I was fortunate enough to have been able to squeeze into what turned out to be a great series of presentations. While Amazon was the pioneer and remains the largest provider in the space, other services like Dolores Labs and Nathan Eagle's txteagle have emerged to expand the pool of users and turks.

In the past, we've turned to Dolores Labs when we needed (machine-learning) training sets and were unable to quickly find reliable ones. To increase the quality of the output we receive from turks, we try to get multiple turks to perform an individual task and aggregate their work into a single answer. (We jokingly refer to this as the wisdom of micro-crowds.) Working on problems quite different from the ones we tackle, the first set of speakers presented research results confirming that this form of aggregation actually works. Rion Snow of Stanford's AI Lab presented results suggesting that, for a large set of tasks, the aggregate work of 4-6 turks compares favorably to the work of a single (domain) expert. Working primarily in the area of NLP and computational linguistics, Bob Carpenter of alias-i presented similar results when evaluating turk-generated training sets against gold-standard ones. (It's hard enough when turks disagree, but as Bob Carpenter highlighted, disagreements among experts make it difficult to arrive at a gold standard.) Bob has found that in certain situations an iterative approach works best ("code-a-little", "learn-a-little") and that tools that allow you to start suggesting "answers" to a new set of turks would help immensely. Coincidentally, one of the speakers presented a toolkit that allows users to do just that: Greg Little's TurKit is a JavaScript API for running iterative tasks in mechanical turk.
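To make the aggregation idea concrete, here is a minimal Python sketch of one simple way to combine several turks' answers for the same task: a majority vote. This is an illustration only, not the method used by Dolores Labs or by any of the speakers; the function name, item IDs, and labels are all hypothetical.

```python
from collections import Counter

def aggregate_labels(labels):
    """Majority vote: return the most common label and the share of workers who chose it.

    Ties go to the label that appears first in the input (an arbitrary choice
    for this sketch, not something the post specifies).
    """
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(labels)

# Hypothetical data: five workers label each item.
judgments = {
    "item-1": ["spam", "spam", "not spam", "spam", "spam"],
    "item-2": ["not spam", "spam", "not spam", "not spam", "not spam"],
}

for item, labels in judgments.items():
    label, agreement = aggregate_labels(labels)
    print(item, label, agreement)
```

The agreement share is worth keeping alongside the winning label: items where workers split closely are natural candidates for another round of judgments.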

Another set of speakers talked about the emergence of mechanical turks as a research tool. Social scientists Aaron Shaw and John Horton spoke favorably of their experience using turks for research experiments in economics and paired surveys. Among other things, they've conducted studies on the turk labor market by testing demand for tasks of varying difficulty (something Bob Carpenter also talked about), and by evaluating demand for follow-on tasks at lower wages. Alexander Sorokin of UIUC presented work on using turks to annotate training sets for computer vision and robotics. For those interested in using turks to annotate images, Alex has a toolkit ready to go.

For most users of mechanical turk (us included), it has become an API call that fits smoothly within their workflow. (Or as someone at the meetup wryly suggested, turk is a Remote Person Call.) The last pair of speakers, Lilly Irani and Six Silberman, reminded us that behind mechanical turk lie thousands of workers ("the crowd in the cloud") working without (health care) benefits, oftentimes at extremely low hourly wages. Irani and Silberman suggested that rather than abstracting mechanical turk services as mere API calls, users should start thinking of the plight of the turks ("Mechanical Turk Bill of Rights") behind the service. As a first step, they have released a Firefox plugin that aims to narrow the information asymmetry between turks (those performing tasks) and requesters (those posting tasks). While requesters can see ratings for turks, requesters aren't rated: Turkopticon lets turks rate requesters. They need more turks to download and start using Turkopticon, so if you know any mechanical turks, please encourage them to do so.

(†) According to Amazon representatives in the audience, a majority of turks are in the U.S. That may change in the future, once Amazon is able to get approval for other payment systems. Because of the possibility of money-laundering, services like AMT are subject to strict KYC controls.

tags: big data, machine learning, mechanical turk, meetup

Sat, Dec 27, 2008

Google, WalMart, and MyBarackObama.com: The Power of the Real Time Enterprise

by Tim O'Reilly (@timoreilly), comments: 22

What do Google, WalMart, and MyBarackObama.com have in common, besides their extraordinary success? They are organizations that are infused with IT in such a way that it leads to a qualitative change in their entire business.

I get frustrated when I see people highlighting use of social media--blogging, wikis, twitter, customer feedback systems like Dell IdeaStorm or MyStarbucksIdea--as if they were exemplars of what has come to be called "Enterprise 2.0."

As I said in my keynote at the Web 2.0 Expo NY (and in a follow-up Radar post), WalMart is a better example of Enterprise 2.0 than any of these more trendy examples of user contribution systems. If Google's key innovation with PageRank was to recognize that a link was a vote, which could be counted and measured to get better search results, so too, WalMart recognized early on that a purchase was a vote. Each company built real-time information systems to capture and respond to that vote. WalMart built a supply chain in which goods are automatically re-ordered as they go out the door, with algorithms based on rate of sale controlling the reorders. Google built a better search engine, in which pages that were "better linked" were given priority over the ones produced by pure keyword matches. They went on to build real-time systems to measure what John Battelle called the database of intentions, as expressed by people's queries and subsequent clickstream data, as well as an ad auction system that prices ads in real time based on the predicted likelihood of the ad being clicked on.

I came to see just how closely MyBarackObama.com emulated these ideas of the real-time enterprise in accounts of the Houdini project, a bold program in which poll watchers removed the names of voters who had already made it to the polling station from the "get out the vote" call lists:

While the hot line was too overwhelmed to be of much use, the source said the program itself still proved a smashing success....the campaign was able to clean 1.6 million voters from the call lists they distributed to canvassers that afternoon, making those lists 25 percent shorter on average.

While the infrastructure for data reporting broke down under the pressure of the election, the general trend is clear here: competitive advantage comes from capturing data more quickly, and building systems to respond automatically to that data.

Consider MyBarackObama.com as a kind of vast machine, with humans as extensions of the programmatic brain: volunteers log in to get their get-out-the-vote call lists. They place their calls, then use the web to report back their results. Those results modify the call lists for the next volunteer. At the other end, the Houdini volunteers are taking note of who is actually coming out to vote, allowing the system to dispatch additional attention to hot spots, for example where there is an undervote compared to the campaign's projections. Meanwhile, the pruned call lists make the volunteers more effective. Inside the machine, programmers are tuning the algorithms, while top campaign staffers are making key decisions to adjust the resource mix.
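The core of that feedback loop, removing people who have already voted from the next volunteer's call list, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of the campaign's actual software: the function name and the voter names are hypothetical.

```python
def prune_call_list(call_list, already_voted):
    """Return the call list without anyone reported as having voted, keeping the original order."""
    voted = set(already_voted)  # set membership makes each lookup cheap
    return [name for name in call_list if name not in voted]

# Hypothetical data: one of four people on the list has been seen at the polls.
call_list = ["Avery", "Blake", "Casey", "Devon"]
reported_at_polls = ["Casey"]

remaining = prune_call_list(call_list, reported_at_polls)
shrink = 1 - len(remaining) / len(call_list)
print(remaining)
print(f"{shrink:.0%} shorter")
```

Each report from a polling station shrinks the list handed to the next volunteer, which is the sense in which the human reports and the software form a single loop.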

Now put these three examples, Google, WalMart, and MyBarackObama together, and ask yourself what they tell you about the future of business, military operations, or any large organization.

Sensing, processing, and responding (based on pre-built models of what matters, "the database of expectations," so to speak) is arguably the hallmark of living things. We're now starting to build computers that work the same way. And we're building enterprises around this new kind of sense-and-respond computing infrastructure. In this sense, you can argue that Microsoft's term "Live Software" is the best name yet for the kind of software-infused enterprise we're building.

It's essential to recognize that each of these systems is a hybrid human-machine system, in which human actions are part of the computational loop. Back in 1998, when I was trying to understand just how people were using Perl and other scripting languages on the web, I came to recognize that web applications, unlike desktop applications, still have the programmers inside them. Perl was called "the duct tape of the internet" precisely because it was used for programming that was only expected to last a short time; the programmers were still there, constantly tweaking the application. (I first began using the image of "the Mechanical Turk" in my talks about this aspect of web applications in 2003.)

What became clear in the ensuing decade is that humans are not just part of the programming, but also sensors and actuators for computers. Our aggregate behavior is measured, monitored, and becomes feedback that improves the overall intelligence of the system. That is why I've said that the defining characteristic of Web 2.0 applications is that they "harness collective intelligence."

Aside: I seem to have lost the battle to define Web 2.0 as the use of the network as platform to build systems that get better the more people use them. Perhaps it's the lure of the obvious: companies and products that harness explicit user contribution are easier to recognize than those that pursue the more subtle and difficult task of harnessing implicit contribution. Or perhaps it's the persistent gravitational tug of the idea that the heart of Web 2.0 is ad-supported business models; therefore, enterprise features that look like those of well-known companies featuring user contribution and ad-supported business models must by definition also be "2.0." For me, the far more profound and powerful systems come from harnessing both explicit and implicit human contribution.

Again, consider MyBarackObama.com. It definitely harnessed explicit contribution, providing a platform for volunteers to organize and host local calling parties, to blog, or perform other campaign activities. But ultimately, Obama's ground game--old-fashioned precinct-level organizing, amped up to a new level by an army of distributed volunteers armed with mobile phones and coordinated via a web application--was the key to his victory. The "explicit" social media elements of MyBarackObama.com paled in impact compared to the development of a next-generation electronic nervous system, in which volunteers were trained, deployed, and managed by a web application that used them, in John Sean McMullen's memorable phrase, as "souls in the great machine."

tags: amazon, barack obama, google, mechanical turk, walmart, web 2.0