Using Data-Driven Algorithms to Detect False Data Entries
Mustafa Naseem of the University of Michigan in the U.S. will apply machine-learning algorithms to identify potentially falsified digital vaccination records in Pakistan. Pakistan is one of only three remaining countries where polio is still endemic. Rural healthcare facilities in particular struggle to deliver enough vaccinations because the provinces are highly populous and resources and staff are scarce, and there is a risk that records are falsified to save time or to bias the results. The team will first perform fieldwork to identify records that appear to have been recently falsified by auditing 2,000 recorded vaccination events across 200 randomly selected villages. The audited data will be used to build an algorithm that uses features such as record patterns to predict whether a data point is likely to be true or false. They will test the approach by auditing another 1,000 vaccination events that the algorithm predicts were falsified and comparing them with 1,000 randomly selected vaccination events.
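The final comparison described above, a targeted audit of flagged records against a random audit, can be illustrated with a small sketch. This is not the project's own analysis: the function name `audit_lift`, the choice of a pooled two-proportion z-statistic, and the counts in the usage example are all assumptions made purely for illustration.

```python
import math


def audit_lift(flagged_false, flagged_total, random_false, random_total):
    """Compare falsification rates found by a targeted audit and a random audit.

    Hypothetical helper: the names and the choice of statistic are assumptions,
    not taken from the project description.
    """
    p_flagged = flagged_false / flagged_total
    p_random = random_false / random_total

    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (flagged_false + random_false) / (flagged_total + random_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / flagged_total + 1 / random_total))

    # Two-proportion z-statistic; 0.0 if the standard error is degenerate.
    z = (p_flagged - p_random) / se if se > 0 else 0.0

    # Lift: how many times more often the flagged sample contains falsified records.
    lift = p_flagged / p_random if p_random > 0 else float("inf")

    return {"p_flagged": p_flagged, "p_random": p_random, "lift": lift, "z": z}


# Illustrative counts only; these are NOT results from the project.
result = audit_lift(flagged_false=300, flagged_total=1000,
                    random_false=100, random_total=1000)
print(result)
```

If the algorithm carries real signal, the falsification rate in the flagged sample should be clearly higher than in the random sample; a lift near 1 would suggest the flags are no better than chance.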