The Story

Data collection is central to any project concerned with human rights monitoring, epidemiology, population censuses, opinion polls, economic studies and many other fields. No project can be declared successful if it lacks reliable statistics to justify its existence and to demonstrate its eventual consequences. Project managers will therefore often have to plan multiple information-gathering tasks. Examples include:

Until recently, each of those tasks was carried out using pen and paper. Survey reports were sent back for collation, a process that took time and was often unreliable. With the advent of information and communication technologies (ICTs), things changed drastically: reports could be sent over the wire or over the air, making data collection much more efficient and increasing the likelihood that a project would be relevant and have a positive impact.

These days, front line workers can be seen using various devices like laptops, PDAs, notebooks (the digital kind), GPS units or cameras to carry out their surveying tasks. The data can be sent back to the project’s central system in many ways: from bringing the device back to base and downloading its contents using a cable, to wirelessly sending the data through high-speed internet links. Program managers can design and conduct votes or polls among peers or interested parties using specific software or web-based services. For virtually any data collection task one can think of within the context of a given project, a tool can be found to facilitate the job, taking advantage of whatever technology is available to the project.

But a multiplicity of tools can quickly become a logistical nightmare. Collating data collected from different sources or different devices can be so complex that the old pen-and-paper method may well be considered again. The problem is that, as great as the existing tools are, each is designed to work in its own way. A web-based polling tool will often only work on desktop browsers, and a crisis mapping application may only accept data sent by SMS. This forces surveyors to use multiple tools in order to cover all available channels (Web, SMS, voice, etc.) and leads to multiple data streams that have to be painstakingly merged. Furthermore, no one knows how much money foundations, governments, NGOs and corporations invest in re-inventing the survey tool wheel.

However, with the advent of the multimodal Web, a unified data collection tool can be envisaged. Because voice applications, SMS applications, mobile websites and mobile applications are all designed to use Web technology, it is conceivable that a single data collection framework can be designed to work across all of them. For instance, a health questionnaire to be filled out by a widespread population could be made available via a Web site (accessible from the local internet café), via an SMS application (accessible from any in-range mobile phone), as well as via a voice application (accessible to illiterate people). The results would automatically be collected across all access methods and be made available quickly and painlessly. Because the Web is the technology available to the largest number of people worldwide, through its multiple access methods, its multiple language support and its accessibility features, it is the best platform on which to build a data collection tool.
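To make the idea concrete, here is a minimal sketch of one questionnaire definition feeding several channels and a single result store. It is an illustration only, not the Foundation's design: the names `Question`, `render_web`, `render_sms`, `parse_sms` and `ResponseStore`, as well as the sample bed net question, are all hypothetical, and a real voice channel would need a speech platform that is not shown here.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Question:
    """Channel-independent definition of one multiple-choice question."""
    key: str
    prompt: str
    options: tuple


def render_web(q):
    """Render the question as an HTML form fragment (assumed markup)."""
    opts = "".join(f"<option>{opt}</option>" for opt in q.options)
    return f'<label>{q.prompt} <select name="{q.key}">{opts}</select></label>'


def render_sms(q):
    """Render the same question as a text message with numbered replies."""
    choices = ", ".join(f"{i} for {opt}" for i, opt in enumerate(q.options, start=1))
    return f"{q.prompt} Reply {choices}."


def parse_sms(q, reply):
    """Map a numeric SMS reply back to one of the defined options."""
    index = int(reply.strip()) - 1
    if not 0 <= index < len(q.options):
        raise ValueError(f"reply {reply!r} is not a valid choice")
    return q.options[index]


@dataclass
class ResponseStore:
    """Single store that receives answers from every channel."""
    question: Question
    rows: list = field(default_factory=list)

    def record(self, channel, answer):
        # Reject anything that is not part of the shared definition.
        if answer not in self.question.options:
            raise ValueError(f"{answer!r} is not an option for {self.question.key}")
        self.rows.append((channel, answer))

    def tally(self):
        # Count answers regardless of the channel they arrived through.
        return Counter(answer for _, answer in self.rows)


# Hypothetical health question, answered through three channels.
question = Question("bednet", "Do you sleep under a bed net?", ("yes", "no"))
store = ResponseStore(question)
store.record("web", "yes")
store.record("sms", parse_sms(question, "2"))
store.record("voice", "yes")
print(render_sms(question))
print(store.tally())
```

Because every channel reads from the same `Question` and writes to the same `ResponseStore`, there is only one data stream to analyse and nothing to merge afterwards.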

The Web Foundation is currently carrying out an investigation of the topic, studying the many existing tools and their shortcomings, gathering a community of specialists in data collection (users as well as implementers), and collecting requirements for a better and more unified approach to the problem. We believe that gathering the community around common principles and standards will lead to the creation of modular tools that solve the current shortcomings, mainly by being accessible through a wide range of modalities including speech applications, mobile phone messaging (SMS, USSD, MMS), sites accessible across browsers, and software applications running online or offline.

Data collection has three main components: the definition of the data to be collected (e.g., temperatures, voting options or locations), the collection itself, and the analysis of the data. The scope of the tool will specifically be the first two. The analysis, which generally depends on the context of the project itself, will be left to more specific tools, except for the most common types of analysis, which we will support.

Work Plan

The Foundation will drive the implementation of this tool, making it available as free and open-source software. We expect it to become the dominant data collection application, as it will have been designed by a comprehensive group of experts in the field. To achieve that goal, we plan to:

The Hewlett Foundation offered to support the first three steps in this plan, which ran for eight months until July 2011. Following their successful completion, we are now seeking further resources to carry out the remaining tasks.