Data confidentiality is a relatively recent field of application of mathematical programming, and in previous years provided real-world, large and challenging optimization problems (both continuous and discrete). The research group is internationally recognized by the application of optimization methods to tabular data protection. International conferences such as EURO-XXV (Lithuania, July 2012) include a stream Statistical Data Confidentiality, coorganized by the research group. The research group recently suggested and worked on a perturbative approach named “controlled tabular adjustment” (CTA). This method formulates a linear or quadratic optimization problem (depending on the distance used), with binary variables, and often structured depending on the format of the tables. This approach has been implemented by the group for Eurostat using commercial state-of-the-art optimization solvers. Heuristic approximations developed by the group provided suboptimal solutions within a reasonable time for large mixed integer linear problems (of one million continuous and 50000 binary variables); some of the best commercial solvers could not even provide a feasible solution with these instances (some solvers, such as CPLEX, asked for some of these instances for benchmarking). Decent results have also been obtained solving CTA with Euclidean distances, which results in a mixed integer quadratic optimization problem. Up to now, perturbative data protection methods were applied to a predefined set of “static” tables. This allowed the (suboptimal) solution of the mixed integer problem within a time limit. In the recently started project “DwB. Data without Boundaries” (INFRA-2010-262608 VII Mark Program of the EU), we being one of the partners, a new paradigm is considered: the protection of tables obtained through on-line servers. Protection has to be performed in real-time in this new context. This means that binary variables have to be a priori fixed, and a fast solution to the resulting convex continuous optimization problem is required (guaranteeing feasibility). Interior-point methods are a good choice in this case because (i) they showed to be more efficient than simplex-like methods for these problems; (ii) the primal-dual path following algorithm, the most efficient approach in practice, may provide a fast suboptimal feasible solution with information of the optimality gap. Moreover, iterative methods for the Newton’s equation at each interiorpoint iteration can be used for a fast solution of very large instances, both for problems with and without structure. Good results have been obtained for three-dimensional tables solving problems of up to 40 million variables in reasonable times. These results should be extended to more general tables. The main objectives of the project are thus: 1) Development of interior-point methods for large scale problems, with and without structure. 2) Development and application of iterative methods (preconditioners, regularizations) in interior-point methods. 3) Application of interior-point methods to the continuous CTA problem in on-line servers. 4) Development of approaches for the (likely suboptimal) solution of large scale mixed integer linear problems. 5) Application to the mixed CTA problem in static environments.
Plan Nacional de Investigación Científica, Desarrollo e Innovación Tecnológica 2008-2011
Proyectos de investigación y acciones complementarias-MTM
Gobierno De España. Ministerio De Economía Y Competitividad, Mineco
Baena, D; Castro, J.; Gonzalez, J. Congreso Nacional de Estadística e Investigación Operativa y de las Jornadas de Estadística Pública p. 104 Presentation's date: 2015-05-28 Presentation of work at congresses