# Benchmark problems

To help choose between the different minimizers, we have made some curated problems available to use with FitBenchmarking. It is also straightforward to add custom data sets to the benchmark, if that is more appropriate; see Problem Definition Files for specifics of how to add additional problems in a supported file format.

We supply some standard nonlinear least-squares test problems in the form of the NIST nonlinear regression set and the relevant problems from the CUTEst problem set, together with some real-world data sets that have been extracted from Mantid and SASView usage examples and system tests. We’ve made it possible to extend this list by following the steps in Adding Fitting Problem Definition Types.

Each test problem contains:

- a data set consisting of points \((x_i, y_i)\) (with optional errors on \(y_i\), \(\sigma_i\));
- a definition of the fitting function, \(f({\boldsymbol{\beta}};x)\); and
- (at least) one set of initial values for the function parameters \({\boldsymbol{\beta}}_0\).
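Assuming the standard weighted nonlinear least-squares formulation (the objective is not written out explicitly here, so treat this as a sketch), these three ingredients combine into the following problem, which a minimizer attempts to solve starting from \({\boldsymbol{\beta}}_0\):

\[
\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \left( \frac{y_i - f({\boldsymbol{\beta}}; x_i)}{\sigma_i} \right)^2
\]

Here \(n\) denotes the number of data points, a symbol introduced only for this sketch.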

If a problem doesn’t have observational errors (e.g., the NIST problem set), then FitBenchmarking can approximate errors by taking \(\sigma_i = \sqrt{y_i}\). Alternatively, there is an option to disregard errors and solve the unweighted nonlinear least-squares problem, setting \(\sigma_i = 1.0\) irrespective of what has been passed in with the problem data.
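The two ways of handling missing errors described above can be illustrated with a minimal sketch. This is not FitBenchmarking's own code: the function names `approximate_sigma` and `weighted_residuals`, the `mode` argument, and the decision to raise an error for non-positive \(y_i\) are all assumptions made for illustration.

```python
import math


def approximate_sigma(y, mode="sqrt"):
    """Return one error estimate sigma_i per data value y_i.

    mode="sqrt": sigma_i = sqrt(y_i), as for count-like data.
    mode="unit": sigma_i = 1.0, i.e. the unweighted problem.
    Both mode names are hypothetical, chosen for this sketch.
    """
    if mode == "sqrt":
        # Assumption: reject non-positive values rather than guess a fallback.
        if any(value <= 0 for value in y):
            raise ValueError("sqrt weighting needs strictly positive y values")
        return [math.sqrt(value) for value in y]
    if mode == "unit":
        return [1.0 for _ in y]
    raise ValueError(f"unknown mode: {mode!r}")


def weighted_residuals(f, beta, x, y, sigma):
    """Return (y_i - f(beta, x_i)) / sigma_i for every data point."""
    return [(yi - f(beta, xi)) / si for xi, yi, si in zip(x, y, sigma)]


def line(beta, x):
    """Toy fitting function f(beta; x) = beta[0] * x."""
    return beta[0] * x


x_data = [1.0, 2.0]
y_data = [4.0, 9.0]

sigma_sqrt = approximate_sigma(y_data, mode="sqrt")
sigma_unit = approximate_sigma(y_data, mode="unit")

print(weighted_residuals(line, [2.0], x_data, y_data, sigma_sqrt))
print(weighted_residuals(line, [2.0], x_data, y_data, sigma_unit))
```

With `mode="unit"` every point contributes equally to the sum of squares, whereas `mode="sqrt"` down-weights points with large \(y_i\).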

As we work with scientists in other areas, we will extend the problem suite to encompass new categories. The FitBenchmarking framework has been designed to make it easy to integrate new problem sets, and any additional data added to the framework can be tested with any and all of the available fitting methods.

Currently FitBenchmarking ships with data from the following sources:

## CrystalField Data (Mantid)

This folder (also found in examples/benchmark_problems/CrystalField) contains a test set for inelastic neutron scattering measurements of transitions between crystal field energy levels.

This problem has 8 parameters, and fits around 200 data points.

Warning

The external package Mantid must be installed to run this data set. See Installing External Software for details.

## CUTEst (NIST files)

This folder (also found in examples/benchmark_problems/CUTEst) contains several problems from the CUTEst continuous optimization testing environment which have been converted to the NIST format.

These problems all have 8 unknown parameters, and fit around 15 data points, with the exception of `VESUVIOLS`, which fits around 1000.

## Data Assimilation

This folder (also found in examples/benchmark_problems/Data_Assimilation) contains two examples using the data assimilation problem definition in FitBenchmarking. These examples follow the method set out in this paper.

These data files are synthetic and have been generated as an initial test of the minimizers. We plan to extend this with time series data which is more representative of the expectations for data assimilation in future updates.

These problems have either 2 or 3 unknown parameters, and fit either 100 or 1000 data points for the `Simplified ANAC` and `Lorentz` problems respectively.

## Powder Diffraction Data (SIF files)

These problems (also found in the folder examples/benchmark_problems/DIAMOND_SIF) contain data from powder diffraction experiments. The data supplied comes from the I14 Hard X-Ray Nanoprobe beamline at the Diamond Light Source, and has been supplied in the SIF format used by CUTEst.

These problems have either 66 or 99 unknown parameters, and fit around 5,000 data points.

Warning

The external packages CUTEst and pycutest must be installed to run this data set. See Installing External Software for details.

## MultiFit Data (Mantid)

These problems (also found in the folder examples/benchmark_problems/MultiFit) contain data for testing the MultiFit functionality of Mantid. The folder contains a simple data set, on which two fits are done, and a calibration data set from the MuSR spectrometer at ISIS, on which there are four fits available. See the MultiFit documentation for more details.

Basic Multifit has 3 unknown parameters, and fits 40 data points. MUSR62260 has 18 unknown parameters, and fits around 8000 data points.

Warning

The external package Mantid must be installed to run this data set. See Installing External Software for details.

This will also only work using the Mantid Minimizers.

## Muon Data (Mantid)

These problems (also found in the folder examples/benchmark_problems/Muon) contain data from muon spectrometers. The data supplied comes from the HiFi and EMU instruments at STFC’s ISIS Neutron and Muon Source, and has been supplied in the format that Mantid uses to process the data.

These problems have between 5 and 13 unknown parameters, and fit around 1,000 data points.

Warning

The external package Mantid must be installed to run this data set. See Installing External Software for details.

## Neutron Data (Mantid)

These problems (also found in the folder examples/benchmark_problems/Neutron) contain data from neutron scattering experiments. The data supplied comes from the Engin-X, GEM, eVS, and WISH instruments at STFC’s ISIS Neutron and Muon Source, and has been supplied in the format that Mantid uses to process the data.

The sizes of these problems differ massively. The Engin-X calibration problems find 7 unknown parameters, and fit to between 56 and 67 data points. The Engin-X vanadium problems find 4 unknown parameters, and fit to around 14,168 data points. The eVS problems find 8 unknown parameters, and fit to 1,025 data points. The GEM problem finds 105 unknown parameters, and fits to 1,314 data points. The WISH problems find 5 unknown parameters, and fit to 512 data points.

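Warning

The external package Mantid must be installed to run this data set. See Installing External Software for details.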

## NIST

These problems (also found in the folder examples/benchmark_problems/NIST) contain data from the NIST Nonlinear Regression test set.

These problems are split into low, average and high difficulty. They have between 2 and 9 unknown parameters, and fit between 6 and 250 data points.

## Poisson Data

These problems (also found in the folder examples/benchmark_problems/Poisson) contain both simulated and real data measuring particle counts. The real data is ISIS muon data, and the simulated datasets have been made to represent counts using models provided by both Mantid and Bumps.

These problems have between 4 and 6 unknown parameters, and around 350, 800, and 2000 data points for simulated bumps, HIFI_160973, and simulated mantid respectively.

Warning

## Small Angle Scattering (SASView)

These problems (also found in the folder examples/benchmark_problems/SAS_modelling/1D) are two data sets from small angle scattering experiments. These are from fitting data to a cylinder, and have been supplied in the format that SASView uses to process the data.

These have 6 unknown parameters, and fit to either 20 or 54 data points.

Warning

The external package `sasmodels` must be installed to run this data set. See Installing External Software for details.

## CUTEst (SIF files)

This directory (also found in the folder examples/benchmark_problems/SIF) contains SIF files encoding least squares problems from the CUTEst continuous optimization testing environment.

These are from a wide range of applications. They have between 2 and 9 unknown parameters, and for the most part fit between 6 and 250 data points, although the VESUVIO examples (from the VESUVIO instrument at ISIS) have 1,025 data points (with 8 unknown parameters).

Warning

The external packages CUTEst and pycutest must be installed to run this data set. See Installing External Software for details.

## SIF_GO

This directory (also found in the folder examples/benchmark_problems/SIF_GO) contains SIF files encoding least squares problems from the CUTEst continuous optimization testing environment.

All of these problems have been modified, with finite bounds added for all parameters, making the problems appropriate for testing global optimization solvers. The bounds that have been added to each problem are the same as those used in SciPy’s global optimization benchmark functions.
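Finite bounds matter for global solvers because they define a box that can be sampled. As a rough illustration, the sketch below runs a uniform random search over such a box for a toy least-squares problem. It is not one of the solvers or problems shipped with FitBenchmarking: the function names, the toy model, the data, and the bounds are all hypothetical.

```python
import random


def sum_of_squares(f, beta, x, y):
    """Unweighted least-squares cost for parameters beta."""
    return sum((yi - f(beta, xi)) ** 2 for xi, yi in zip(x, y))


def random_search(f, x, y, bounds, n_samples=2000, seed=0):
    """Sample parameters uniformly inside the bounds and keep the best.

    bounds is a list of (lower, upper) pairs, one per parameter.
    Every pair must be finite, otherwise uniform sampling is undefined.
    """
    rng = random.Random(seed)
    best_beta, best_cost = None, float("inf")
    for _ in range(n_samples):
        beta = [rng.uniform(lower, upper) for lower, upper in bounds]
        cost = sum_of_squares(f, beta, x, y)
        if cost < best_cost:
            best_beta, best_cost = beta, cost
    return best_beta, best_cost


def straight_line(beta, x):
    """Toy model f(beta; x) = beta[0] * x + beta[1]."""
    return beta[0] * x + beta[1]


# Synthetic data generated from beta = [2, 1]; illustrative only.
x_data = [0.0, 1.0, 2.0, 3.0, 4.0]
y_data = [1.0, 3.0, 5.0, 7.0, 9.0]
toy_bounds = [(0.0, 4.0), (-2.0, 2.0)]

best_beta, best_cost = random_search(straight_line, x_data, y_data, toy_bounds)
print(best_beta, best_cost)
```

Practical global optimizers are far more sophisticated than this, but most share the same requirement: without finite bounds on every parameter there is no well-defined region to explore.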

These problems have between 3 and 7 unknown parameters, and fit between 9 and 37 data points.

Warning

The external packages CUTEst and pycutest must be installed to run this data set. See Installing External Software for details.

## HOGBEN Samples

These problems (also found in the folder examples/benchmark_problems/HOGBEN_samples) contain simulated reflectometry data. The data supplied has been generated using the [HOGBEN sample suite](https://github.com/jfkcooper/HOGBEN/blob/main/hogben/models/samples.py).

These problems have between 4 and 10 unknown parameters, and fit around 180 data points.

## Simple tests

This folder (also found in examples/benchmark_problems/simple_tests) contains a number of simple tests with known, and easy to obtain, answers. We recommend that this is used to test any new minimizers that are added, and also that any new parsers reimplement these data sets and models (if possible).

These problems have 3 or 4 unknown parameters, and around 100 data points.