Images on the Web present a major accessibility issue for the visually impaired, mainly because the majority of them do not have proper captions. This paper addresses the problem of attaching proper explanatory text descriptions to arbitrary images on the Web. To this end, we introduce Phetch, an enjoyable computer game that collects explanatory descriptions of images. People play the game because it is fun, and as a side effect of game play we collect valuable information. Given any image from the World Wide Web, Phetch can output a correct annotation for it. The collected data can be applied towards significantly improving Web accessibility. In addition to improving accessibility, Phetch is an example of a new class of games that provide entertainment in exchange for human processing power. In essence, we solve a typical computer vision problem with HCI tools alone.

The Web is not built for the blind. Only a small fraction of major corporate websites are fully accessible to the disabled, let alone those of smaller organizations or individuals. However, millions of blind people surf the Web every day, and Internet use by those with disabilities grows at twice the rate of the non-disabled.

One of the major accessibility problems is the lack of descriptive captions for images. Visually impaired individuals commonly surf the Web using "screen readers," programs that convert the text of a webpage into synthesized speech. Although screen readers are helpful, they cannot determine the contents of images on the Web that do not have descriptive captions. Unfortunately, the vast majority of images are not accompanied by proper captions and are therefore inaccessible to the blind (as we show below, fewer than 25% of the images on the Web have an HTML ALT caption). Today, it is the responsibility of Web designers to caption images. We want to take this responsibility off their hands. We set our goal to assign proper descriptions to arbitrary images. A description is proper if it is both correct and sufficient: correct if it makes sense with respect to the image, and sufficient if it gives enough information about the image's contents. Rather than designing a computer vision algorithm that generates natural language descriptions for arbitrary images (a feat still far from attainable), we opt for harnessing humans. It is common knowledge that humans have little difficulty describing the contents of images, although they typically do not find this task particularly engaging. On the other hand, many people would spend a considerable amount of time on an activity they consider "fun." Thus, like the ESP Game, we achieve our goal by working around the problem and creating a fun game that produces the data we aim to collect.

We therefore introduce Phetch, a game which, as a side effect, generates explanatory sentences for randomly chosen images. As with the ESP Game, we show that if our game is played as much as other popular online games, we can assign captions to all images on the Web in a matter of months. Using the output of the game, we describe how to build a system to improve the accessibility of the Web.

Design of a Useful Game

A traditional algorithm is a series of steps that may be taken to solve a problem. We consider Phetch a kind of algorithm. Like one, Phetch has a well-defined input and output: an arbitrary image from the Web and a proper description of it, respectively. Because it is designed as a game, Phetch needs to be shown to be enjoyable. We do so with usage statistics from a one-week trial period. Because it is designed to collect a specific kind of data, Phetch's output needs to be shown to be both correct and sufficient. We verify this through a specifically designed experiment.
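Phetch can be viewed as an algorithm that maps an image to a proper description. One way its output could serve accessibility is sketched below in Python. The sketch makes several assumptions that are not part of the paper's system:

- The collected descriptions are stored in a simple lookup table keyed by image URL. `PHETCH_DESCRIPTIONS`, its entries, `ImgCollector`, and `audit_page` are all hypothetical names.
- An image counts as captioned only if its `alt` attribute is non-empty. Note that an empty `alt=""` is sometimes used deliberately for decorative images.

Under those assumptions, the sketch scans a page's `<img>` tags with the standard-library `html.parser`, reports the fraction that carry an ALT caption, and suggests game-collected text for the uncaptioned ones.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for Phetch's output: image URL -> collected description.
# The URL and text below are invented for illustration only.
PHETCH_DESCRIPTIONS = {
    "/img/dog.jpg": "A brown dog catching a red frisbee in a park.",
}


class ImgCollector(HTMLParser):
    """Collect (src, alt) pairs for every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.images = []  # list of (src, alt); alt is None when the attribute is absent

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags like <img/> through here.
        if tag == "img":
            a = dict(attrs)
            self.images.append((a.get("src"), a.get("alt")))


def has_caption(alt):
    # Assumption: a missing or whitespace-only alt attribute means "no caption".
    return bool(alt and alt.strip())


def audit_page(html, descriptions):
    """Return (ALT coverage of the page, suggested captions for uncaptioned images)."""
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    images = parser.images
    captioned = sum(1 for _, alt in images if has_caption(alt))
    coverage = captioned / len(images) if images else 1.0
    suggestions = {
        src: descriptions[src]
        for src, alt in images
        if not has_caption(alt) and src in descriptions
    }
    return coverage, suggestions


if __name__ == "__main__":
    page = (
        '<img src="/img/dog.jpg">'
        '<img src="/img/logo.png" alt="Company logo">'
        '<img src="/img/cat.jpg" alt=""/>'
    )
    coverage, suggestions = audit_page(page, PHETCH_DESCRIPTIONS)
    print(f"ALT coverage: {coverage:.2f}")  # one of three images is captioned
    for src, text in suggestions.items():
        print(f"{src}: {text}")
```

The sketch only reports suggestions and does not rewrite the page. A deployed system might instead inject the text into the page a screen reader receives. That design choice is outside what this sketch shows.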