Felix Ritchie
- 7 August 2017
- STATISTICS PAPER SERIES - No. 24
- Abstract
- Social scientists increasingly expect to have access to detailed data for research purposes. As the level of detail increases, data providers worry about “spontaneous recognition”, the likelihood that a microdata user believes that he or she has accidentally identified one of the data subjects in the dataset, and may share that information. This concern, particularly in respect of microdata on businesses, leads to excessive restrictions on data use. We argue that spontaneous recognition presents no meaningful risk to confidentiality. The standard models of deliberate attack on the data cover re-identification risk to an acceptable standard under most current legislation. If spontaneous recognition did occur, the user is very unlikely to be in breach of any law or condition of access. Any breach would only occur as a result of further actions by the user to confirm or assert identity, and these should be seen as a managerial problem. Nevertheless, a consideration of spontaneous recognition does highlight some of the implicit assumptions made in data access decisions. It also shows the importance of the data provider’s culture and attitude. For data providers focused on users, spontaneous recognition is a useful check on whether all relevant risks have been addressed. For data providers primarily concerned with the risks of release, it provides a way to place insurmountable barriers in front of those wanting to increase data access. We present a case study on a business dataset to show how rejecting the concept of spontaneous recognition led to a substantial change in research outcomes.
- JEL Code
- C19 : Mathematical and Quantitative Methods→Econometric and Statistical Methods and Methodology: General→Other
- C81 : Mathematical and Quantitative Methods→Data Collection and Data Estimation Methodology, Computer Programs→Methodology for Collecting, Estimating, and Organizing Microeconomic Data, Data Access