
PRIN PNRR Grant P20229PBZR (2023-2025) “When deep learning is not enough: creating hard benchmarks and physics-inspired algorithms for combinatorial optimization problems”

ABSTRACT

Combinatorial optimization problems (COPs) such as the traveling salesman problem and the Boolean satisfiability problem are of
fundamental importance in theoretical computer science and in countless industrial applications.
Recently, machine learning (ML) tools such as transformers and graph neural networks have been applied to COPs in an effort to
overcome the limitations of classical solvers. Despite the outstanding results obtained in computer vision and natural language processing, ML models are not yet competitive with classical algorithms on COPs.
Part of the success of modern ML is due to the creation of large training datasets and to a benchmark-driven culture where each new
model is compared to competing ones on a variety of datasets. We argue that current datasets and benchmarks for COPs could be
substantially improved; moreover, claims of effectiveness against classical solvers are often overstated. In some scenarios, simple greedy
algorithms return solutions of the same quality as ML models while being several orders of magnitude faster.
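To make this concrete, the following is a minimal Python sketch of such a greedy baseline: the classic nearest-neighbor heuristic for the Euclidean TSP, which needs no training and runs in quadratic time. All names are illustrative; this is not code from the project.

    import math
    import random

    def nearest_neighbor_tour(points):
        """Greedy TSP heuristic: repeatedly move to the closest unvisited city."""
        unvisited = set(range(1, len(points)))
        tour = [0]
        while unvisited:
            last = points[tour[-1]]
            nearest = min(unvisited, key=lambda j: math.dist(last, points[j]))
            unvisited.remove(nearest)
            tour.append(nearest)
        return tour

    def tour_length(points, tour):
        """Total length of the closed tour (last city connects back to the first)."""
        return sum(math.dist(points[tour[i - 1]], points[tour[i]])
                   for i in range(len(tour)))

    rng = random.Random(0)
    cities = [(rng.random(), rng.random()) for _ in range(200)]
    print(tour_length(cities, nearest_neighbor_tour(cities)))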
The statistical physics community has analyzed COPs for several decades, developing powerful and generic solvers such as
simulated annealing and parallel tempering, and identifying ensembles of synthetic problems whose hardness can be tuned by a
single parameter in a controlled way.
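Both ingredients can be illustrated in a few lines. The sketch below (illustrative only, not the project's code) samples from the canonical tunable ensemble, random k-SAT, whose single control parameter is the clause-to-variable ratio alpha = M/N (for k = 3, the empirically hardest instances cluster near the satisfiability threshold alpha ≈ 4.267), and runs a bare-bones simulated annealing loop minimizing the number of unsatisfied clauses.

    import math
    import random

    def random_ksat(n_vars, alpha, k=3, rng=random):
        """Random k-SAT formula with M = round(alpha * N) clauses.
        Literals are encoded as +/-(variable index), as in DIMACS CNF."""
        formula = []
        for _ in range(int(round(alpha * n_vars))):
            variables = rng.sample(range(1, n_vars + 1), k)
            formula.append([v if rng.random() < 0.5 else -v for v in variables])
        return formula

    def energy(formula, assign):
        """Number of unsatisfied clauses under assign: variable -> bool."""
        return sum(not any((lit > 0) == assign[abs(lit)] for lit in clause)
                   for clause in formula)

    def simulated_annealing(formula, n_vars, steps=100_000, t0=2.0, t1=0.05):
        assign = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
        e = energy(formula, assign)
        for step in range(steps):
            t = t0 * (t1 / t0) ** (step / steps)   # geometric cooling schedule
            v = random.randint(1, n_vars)          # propose a single variable flip
            assign[v] = not assign[v]
            e_new = energy(formula, assign)        # (a real solver would update only
            # the clauses touching v instead of rescanning the whole formula)
            if e_new <= e or random.random() < math.exp((e - e_new) / t):
                e = e_new                          # Metropolis accept
            else:
                assign[v] = not assign[v]          # reject: undo the flip
            if e == 0:
                break                              # all clauses satisfied
        return assign, e

    formula = random_ksat(n_vars=200, alpha=3.5, rng=random.Random(0))
    assignment, unsat = simulated_annealing(formula, n_vars=200)
    print("unsatisfied clauses:", unsat)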
In this proposal we will look at COPs through the lens of statistical physics, with the goals of i) creating synthetic datasets and hard
benchmarks for testing classical and learned algorithms, and ii) using physical knowledge to understand what hardness means for ML
algorithms.
We will highlight the hardest problems for existing classical algorithms and outline the current and future challenges for ML models. We
will provide datasets of various degrees of hardness, suitable for both supervised and unsupervised learning, that can be used for
training and testing models. We will open-source the datasets along with libraries containing the best algorithms designed
by different communities, in order to bootstrap future research and provide high-quality baselines.
In this proposal, we will also consider the problem of unbiased sampling of solutions to constraint satisfaction problems. We will build
on recent advances in deep learning, mathematics, and statistical physics to develop two efficient and scalable sampling algorithms, a
classical one and a learned one. Their common underpinning is that both will be expressed as stochastic diffusive
processes, namely stochastic localization and denoising diffusion probabilistic models (the approach behind recent text-to-image models). We will
test the algorithms on the paradigmatic benchmarks developed during the project. Moreover, we will offer a characterization of their
strengths and limitations in terms of a geometrical description of the solution space, using tools from spin glass theory. The resulting
theoretical control of ML systems will improve their safety and explainability.
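For concreteness, the following numpy sketch shows the shared diffusive structure in its best-known form: the DDPM ancestral-sampling loop that both methods mirror. The denoiser below is a placeholder (a trained network would replace it for the learned sampler, while stochastic localization would use an analytically derived drift); all names and parameter values are illustrative.

    import numpy as np

    T = 1000
    betas = np.linspace(1e-4, 0.02, T)       # standard DDPM noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    def eps_model(x, t):
        """Placeholder denoiser predicting the noise injected at step t;
        in practice this would be a trained neural network."""
        return np.zeros_like(x)

    def ddpm_sample(shape, rng):
        """Ancestral sampling: start from pure noise, denoise for T steps."""
        x = rng.standard_normal(shape)
        for t in range(T - 1, -1, -1):
            eps = eps_model(x, t)
            mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) \
                   / np.sqrt(alphas[t])
            noise = rng.standard_normal(shape) if t > 0 else 0.0
            x = mean + np.sqrt(betas[t]) * noise   # sigma_t^2 = beta_t variant
        return x

    sample = ddpm_sample((16,), np.random.default_rng(0))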