UPDATE: The Vancouver Police Department issued a news release yesterday advising that there were discrepancies in how sexual assault data were collected and reported in its recently released Report on Sexual Assault Incidents, 2008-2010, resulting in an unusually high reported increase in these crimes from 2009 to 2010 (18%). The data were tallied by the date the incidents were reported rather than the date they actually occurred. When revised (including the removal of unfounded incidents), the increase in sexual assaults over the year was 9.6%. This is still a concern and I hope that the increase doesn’t get lost in the media hype.
While this reporting mishap may be mildly embarrassing for the VPD, I think this is a perfect example of how easy it is to accidentally misrepresent statistics, and I applaud the VPD for their quick reaction in going public with a simple and clear explanation. That they issued a news release illustrates that the VPD practices what it preaches: Integrity and Transparency.
With respect to how the data were collected for the analysis, I have to admit as an analyst that it is easy to forget the meaning behind the numbers. When your days are spent combining, cajoling and dragging data through numerous systems and processes (transformation, cleaning, addressing missing data) before you even begin the analysis piece, you stop looking at the real-world meaning of the numbers and focus only on their accuracy.
As for the VPD’s report, I suspect that a lot of people worked on pieces of the retrieval and analysis, reviewed or edited the report, and unfortunately no one noticed the meaning of the numbers for the public until it was too late. It does happen. Had this report been created as a workload measurement report it would have been fine. The number of crimes reported in a given time period could help them manage officer deployment.
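To make the difference between the two counting rules concrete, here is a minimal sketch in Python. The records are entirely made up for illustration and have nothing to do with the VPD's actual data; the field layout and the function names `yearly_counts` and `pct_change` are my own assumptions, not anything taken from the VPD report.

```python
from collections import Counter
from datetime import date

# Made-up toy records, NOT real VPD data:
# (date the incident occurred, date it was reported, flagged as unfounded)
incidents = [
    (date(2008, 11, 3), date(2009, 1, 15), False),
    (date(2009, 3, 10), date(2009, 3, 11), False),
    (date(2009, 6, 21), date(2009, 6, 21), False),
    (date(2009, 12, 28), date(2010, 1, 4), False),
    (date(2010, 2, 14), date(2010, 2, 14), False),
    (date(2010, 5, 1), date(2010, 5, 2), True),
    (date(2010, 8, 19), date(2010, 8, 20), False),
    (date(2010, 10, 30), date(2010, 11, 2), False),
]


def yearly_counts(records, by="occurred", drop_unfounded=False):
    """Count incidents per year, keyed on occurrence date or report date."""
    if by not in ("occurred", "reported"):
        raise ValueError("by must be 'occurred' or 'reported'")
    date_index = 0 if by == "occurred" else 1
    counts = Counter()
    for record in records:
        if drop_unfounded and record[2]:
            continue  # skip incidents later judged unfounded
        counts[record[date_index].year] += 1
    return counts


def pct_change(old, new):
    """Percent change from old to new."""
    return (new - old) / old * 100.0


# Workload view: everything that landed on the desk in a given year.
by_report = yearly_counts(incidents, by="reported")

# Crime-trend view: founded incidents, by the year they happened.
by_occurrence = yearly_counts(incidents, by="occurred", drop_unfounded=True)

print("By report date:    ", dict(by_report))
print("By occurrence date:", dict(by_occurrence))
print("Change 2009 to 2010 (report date):     %.1f%%"
      % pct_change(by_report[2009], by_report[2010]))
print("Change 2009 to 2010 (occurrence date): %.1f%%"
      % pct_change(by_occurrence[2009], by_occurrence[2010]))
```

With these toy records the same incidents show a large year-over-year jump when counted by report date and no change at all when counted by occurrence date with unfounded incidents removed. The gap is exaggerated by the tiny sample, but it is the same mechanism: an incident reported weeks after it happened shifts into the following year, and unfounded reports inflate the later count.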
This misstep illustrates the need for a comprehensive plan prior to the collection of data, beginning with a detailed operationalization of the problem to be addressed and the indicators required to measure it. Even a report such as this, which simply presents descriptive results and does not attempt to draw any statistical or causal inferences, could be helped along by some more advanced analysis.
I know that some poor analyst has had many a sleepless night over this and I hope that he or she knows that we've all been there and can sympathize. Mistakes get made, and sometimes many eyes don’t help. So, buck up. Move on. Don't beat yourself up too badly.
BTW: Sexual assaults are included in “Assaults” in the VPD data I’ve used in previous posts. I believe those incidents are counted by the date officers filed the report, not the date the assault occurred. See VPD Notes.