SPECIAL FEATURE
Computing Research that Changed the World: Reflections and Perspectives
March 25, 2009 | 8:45 am - 5:00 pm | Members' Room, Thomas Jefferson Building, Library of Congress
Human Computation
LUIS VON AHN - CMU
At the height of its construction, 44,733 people worked on the Panama Canal. The Great Pyramid of Giza required 50,000 workers and the Apollo Project 400,000. No matter what you put on this list, humanity's largest achievements have been accomplished with no more than a few hundred thousand workers, because it has been impossible to assemble (let alone pay!) more people to work together - until now. With the Internet, we can coordinate the efforts of billions of humans. If 400,000 people put a man on the moon, what can we do with 100 million? Luis von Ahn's research aims to develop a theory of how to assemble myriads of people and computers to work for the benefit of humanity, as well as to build computer systems that enable such massive collaboration. He is pioneering a new area of computing called human computation, which harnesses the combined power of humans and computers to solve problems that neither could solve alone.
An example is his new reCAPTCHA project, which has enlisted hundreds of millions of people to help digitize books by solving CAPTCHAs on the Internet. CAPTCHAs are widespread security measures that von Ahn helped invent almost ten years ago. You've seen them: images of squiggly characters on the Web that you must type to obtain free email accounts and access to other sites. By asking humans to do a task that computers cannot, CAPTCHAs prevent automated programs from abusing online services. It is estimated that over 200 million CAPTCHAs are typed every day, each taking roughly ten seconds of human effort - that's more than 500,000 hours a day.
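The daily-effort figure follows directly from the two estimates quoted above. As a quick check, here is the arithmetic in Python; the constant and function names are illustrative, and the inputs are the text's estimates, not measurements.

```python
# Estimates quoted in the text (not measured values).
CAPTCHAS_PER_DAY = 200_000_000
SECONDS_PER_CAPTCHA = 10


def daily_human_hours(captchas_per_day: int, seconds_each: float) -> float:
    """Total human effort spent per day, converted from seconds to hours."""
    return captchas_per_day * seconds_each / 3600


hours = daily_human_hours(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{hours:,.0f} hours per day")  # prints "555,556 hours per day"
```

The result is a little over the half-million hours cited, which is the effort reCAPTCHA sets out to reuse.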
Von Ahn's new reCAPTCHA project channels this effort into a second purpose: transcribing books. Physical books and other texts written before the computer age are currently being digitized en masse (e.g., by Google Books and the Internet Archive) to preserve human knowledge and make information more accessible. The pages are photographically scanned and the resulting bitmap images are transformed into text files using optical character recognition (OCR) software. This transformation into text enables the books to be indexed and searched. Unfortunately, OCR is not perfect. In older prints where the ink has faded, OCR cannot recognize about 30% of the words.
On the other hand, humans are far more accurate at transcribing such print. ReCAPTCHA takes advantage of this, making it possible for old print material to be transcribed, one word at a time, by people typing CAPTCHAs on the Internet. Whereas the original CAPTCHAs displayed images of random characters rendered by a computer, reCAPTCHA displays words taken from scanned texts that OCR could not decipher. The solutions entered by humans are then used to improve the digitization process.
To meet the goal of a CAPTCHA (differentiating humans from computers), the system must be able to verify the user's answer. To do that, reCAPTCHA gives the user two words: one for which the answer is not known, and a second "control" word for which the answer is known. If the user correctly types the control word, the system assumes the user is human and gains confidence that the user also typed the other word correctly. To date, over 400 million people - 6% of humanity! - have helped digitize at least one word through reCAPTCHA, making it perhaps the largest example of massive collaboration.
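The two-word check described above can be sketched in a few lines of Python. This is a minimal illustration, not reCAPTCHA's actual implementation: the names (`UnknownWord`, `submit`, `accepted_transcription`), the case-insensitive comparison, and the three-vote agreement threshold are all assumptions made for the example. The text says only that a correct control word makes the system more confident in the other answer.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UnknownWord:
    """A scanned word OCR could not read, plus the human guesses so far."""
    image_id: str  # hypothetical identifier for the word image
    votes: Counter = field(default_factory=Counter)


def normalize(text: str) -> str:
    # Assumption: answers are compared ignoring case and outer whitespace.
    return text.strip().lower()


def submit(control_answer: str, control_typed: str,
           unknown: UnknownWord, unknown_typed: str) -> bool:
    """Return True if the user passes the CAPTCHA.

    Only the control word is verified. When it matches, the user's answer
    for the unknown word is recorded as one vote.
    """
    if normalize(control_typed) != normalize(control_answer):
        return False  # failed the known word: treat as a bot, discard guess
    unknown.votes[normalize(unknown_typed)] += 1
    return True


def accepted_transcription(unknown: UnknownWord,
                           min_votes: int = 3) -> Optional[str]:
    """Return the leading guess once enough users agree (assumed threshold)."""
    if not unknown.votes:
        return None
    guess, count = unknown.votes.most_common(1)[0]
    return guess if count >= min_votes else None


word = UnknownWord("img-1")
submit("morning", "Morning ", word, "overlooks")  # passes, records a vote
submit("morning", "xyz", word, "bot-guess")       # fails, nothing recorded
```

In this sketch a single answer is never trusted on its own: the unknown word is accepted only after several users who passed the control check agree on it.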
There are other examples of human computation in action. At the University of Washington, biochemist David Baker and computer scientist Zoran Popovic have turned a particularly challenging aspect of protein structure calculation - a key to tackling many medical mysteries - into a web-based videogame. Humans and computers work together to solve this problem. The game is heavily instrumented, in the hope that human strategies can be incorporated into the computer algorithms. This sort of symbiosis - humans and computers working together to solve problems - can be expected to become widespread in the future.