The field of Statistical Disclosure Control aims at reducing the risk of re-identification of an individual when disseminating data, and it is one of the main concerns of national statistical agencies. Operations Research (OR) techniques have been widely used in the past for the protection of tabular data, but not for microdata (i.e., files of individuals and attributes). This work presents (as far as we know, for the first time) an application of OR techniques to the microaggregation problem, which is considered one of the best methods for microdata protection and is known to be NP-hard. The new heuristic approach is based on a column generation scheme and, unlike previous (primal) heuristics for microaggregation, it also provides a lower bound on the optimal microaggregation. Computational results on real data typically used in the literature show that solutions with small gaps are often achieved and that dramatic improvements are obtained with respect to the most popular heuristics in the literature.
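To make the terms "microaggregation" and "gap" concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the standard formulation in which records are partitioned into groups of at least k records, each record is replaced by its group centroid, and the objective is the within-group sum of squared errors (SSE). The function names and the relative-gap formula (upper bound minus lower bound, divided by the upper bound) are illustrative assumptions; the paper may define the gap differently.

```python
from typing import List, Sequence

Record = Sequence[float]


def centroid(group: List[Record]) -> List[float]:
    """Component-wise mean of the records in one group."""
    n = len(group)
    dim = len(group[0])
    return [sum(rec[j] for rec in group) / n for j in range(dim)]


def sse(groups: List[List[Record]], k: int) -> float:
    """Within-group sum of squared errors of a candidate microaggregation.

    Assumption: a feasible partition has at least k records in every group.
    """
    total = 0.0
    for group in groups:
        if len(group) < k:
            raise ValueError("every group needs at least k records")
        c = centroid(group)
        total += sum((x - cj) ** 2 for rec in group for x, cj in zip(rec, c))
    return total


def relative_gap(upper: float, lower: float) -> float:
    """Hypothetical gap measure: how far a heuristic value is from a lower bound."""
    return (upper - lower) / upper


# Toy one-attribute data set, grouped with k = 3.
partition = [
    [(1.0,), (2.0,), (3.0,)],
    [(10.0,), (11.0,), (12.0,)],
]
heuristic_value = sse(partition, k=3)
```

A heuristic alone yields only a value such as `heuristic_value`; a lower bound, like the one the column generation scheme is reported to provide, is what allows a call such as `relative_gap(heuristic_value, lower)` to certify how close that value is to optimal.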
Citation
Gentile, C.; Spagnolo-Arrizabalaga, E.; Castro, J. An algorithm for the microaggregation problem using column generation. 2020.