Benchmark problems

To help choose between the different minimizers, we have made some curated problems available to use with FitBenchmarking. It is also straightforward to add custom data sets to the benchmark, if that is more appropriate; see Problem Definition Files for specifics of how to add additional problems in a supported file format.

We supply some standard nonlinear least-squares test problems in the form of the NIST nonlinear regression set and the relevant problems from the CUTEst problem set, together with some real-world data sets that have been extracted from Mantid and SASView usage examples and system tests. We’ve made it possible to extend this list by following the steps in Adding Fitting Problem Definition Types.

Each of the test problems contains:

a data set consisting of points \((x_i, y_i)\) (with optional errors on \(y_i\), \(\sigma_i\));
a definition of the fitting function, \(f({\boldsymbol{\beta}};x)\); and
(at least) one set of initial values for the function parameters \({\boldsymbol{\beta}}_0\).
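As an illustration only, the three ingredients above can be gathered into a small Python container. The class name `FittingProblem`, its fields and the example decay model are hypothetical and are not FitBenchmarking's internal API. The sketch only shows how data, model and one or more starting points fit together.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

import numpy as np


@dataclass
class FittingProblem:
    """Hypothetical container for one benchmark problem (illustration only)."""

    x: np.ndarray                         # independent variable values x_i
    y: np.ndarray                         # observed values y_i
    function: Callable[..., np.ndarray]   # f(beta; x), called as function(x, *beta)
    starting_values: List[np.ndarray]     # one or more initial guesses beta_0
    e: Optional[np.ndarray] = None        # optional errors sigma_i on y_i

    def __post_init__(self):
        self.x = np.asarray(self.x, dtype=float)
        self.y = np.asarray(self.y, dtype=float)
        if self.x.shape != self.y.shape:
            raise ValueError("x and y must have the same shape")
        if len(self.starting_values) == 0:
            raise ValueError("at least one set of initial values is required")

    def eval_model(self, beta):
        """Evaluate f(beta; x_i) at every data point."""
        return self.function(self.x, *beta)


# Example: an exponential decay with two alternative starting points.
x_data = np.linspace(0.0, 4.0, 9)
problem = FittingProblem(
    x=x_data,
    y=3.0 * np.exp(-0.7 * x_data),
    function=lambda x, a, b: a * np.exp(-b * x),
    starting_values=[np.array([1.0, 1.0]), np.array([5.0, 0.1])],
)
```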

If a problem doesn’t have observational errors (e.g., the NIST problem set), then FitBenchmarking can approximate errors by taking \(\sigma_i = \sqrt{y_i}\). Alternatively, there is an option to disregard errors and solve the unweighted nonlinear least-squares problem, setting \(\sigma_i = 1.0\) irrespective of what has been passed in with the problem data.
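The two error options can be sketched as a weighted residual \((y_i - f({\boldsymbol{\beta}};x_i))/\sigma_i\). The function names and the `error_handling` values below are hypothetical labels, not FitBenchmarking's option names. Note that \(\sigma_i = \sqrt{y_i}\) is only well defined for strictly positive data, such as counts.

```python
import numpy as np


def weighted_residuals(y, model, e=None, error_handling="sqrt"):
    """Return (y_i - f_i) / sigma_i under a hypothetical error-handling option.

    error_handling:
      "sqrt"   - use the supplied errors if present, otherwise sigma_i = sqrt(y_i)
      "ignore" - set sigma_i = 1.0 regardless of any supplied errors
    """
    y = np.asarray(y, dtype=float)
    model = np.asarray(model, dtype=float)
    if error_handling == "ignore":
        sigma = np.ones_like(y)
    elif error_handling != "sqrt":
        raise ValueError(f"unknown error_handling option: {error_handling!r}")
    elif e is not None:
        sigma = np.asarray(e, dtype=float)
    else:
        # Approximate errors; only valid when every y_i > 0.
        sigma = np.sqrt(y)
    return (y - model) / sigma


def nlls_cost(y, model, **kwargs):
    """Weighted nonlinear least-squares cost: the sum of squared residuals."""
    r = weighted_residuals(y, model, **kwargs)
    return float(np.dot(r, r))
```

For example, with \(y = (4, 9, 16)\) and model values \((2, 6, 12)\), the approximated errors are \((2, 3, 4)\), so every weighted residual is 1 and the cost is 3; ignoring errors gives a cost of \(4 + 9 + 16 = 29\).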

As we work with scientists in other areas, we will extend the problem suite to encompass new categories. The FitBenchmarking framework has been designed to make it easy to integrate new problem sets, and any additional data added to the framework can be tested with any and all of the available fitting methods.

Currently FitBenchmarking ships with data from the following sources:

CrystalField Data (Mantid)

Download .zip or .tar.gz

This folder (also found in examples/benchmark_problems/CrystalField) contains a test set for inelastic neutron scattering measurements of transitions between crystal field energy levels.

This problem has 8 parameters, and fits around 200 data points.

Warning

The external package Mantid must be installed to run this data set. See Installing External Software for details.

CUTEst (NIST files)

Download .zip or .tar.gz

This folder (also found in examples/benchmark_problems/CUTEst) contains several problems from the CUTEst continuous optimization testing environment which have been converted to the NIST format.

These problems all have 8 unknown parameters, and fit around 15 data points, with the exception of VESUVIOLS, which fits around 1,000.

Data Assimilation

Download .zip or .tar.gz

This folder (also found in examples/benchmark_problems/Data_Assimilation) contains two examples using the data assimilation problem definition in FitBenchmarking. These examples follow the method set out in this paper.

These data files are synthetic and have been generated as an initial test of the minimizers. In future updates, we plan to extend this with time series data that is more representative of typical data assimilation applications.

These problems have either 2 or 3 unknown parameters, and fit either 100 or 1,000 data points, for the Simplified ANAC and Lorentz problems respectively.

Powder Diffraction Data (SIF files)

Download .zip or .tar.gz

These problems (also found in the folder examples/benchmark_problems/DIAMOND_SIF) contain data from powder diffraction experiments. The data supplied comes from the I14 Hard X-Ray Nanoprobe beamline at the Diamond Light Source, and has been supplied in the SIF format used by CUTEst.

These problems have either 66 or 99 unknown parameters, and fit around 5,000 data points.

Warning

The external packages CUTEst and pycutest must be installed to run this data set. See Installing External Software for details.

MultiFit Data (Mantid)

Download .zip or .tar.gz

These problems (also found in the folder examples/benchmark_problems/MultiFit) contain data for testing the MultiFit functionality of Mantid. The set consists of a simple data set, on which two fits are performed, and a calibration data set from the MuSR spectrometer at ISIS, for which four fits are available. See the MultiFit documentation for more details.

Basic Multifit has 3 unknown parameters, and fits 40 data points. MUSR62260 has 18 unknown parameters, and fits around 8000 data points.
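MultiFit-style problems fit several data sets at once, with some parameters shared between them. A minimal sketch of the idea, assuming SciPy is available and using a hypothetical exponential-decay model with one shared rate and a separate amplitude per data set (this is not Mantid's MultiFit interface), stacks the residuals of every data set into a single vector:

```python
import numpy as np
from scipy.optimize import least_squares


def decay(x, amplitude, rate):
    return amplitude * np.exp(-rate * x)


def joint_residuals(params, datasets):
    """Stack residuals from all data sets into one vector.

    params = [rate, amp_0, amp_1, ...]: the decay rate is shared (tied)
    across data sets, while each data set keeps its own amplitude.
    """
    rate, amplitudes = params[0], params[1:]
    return np.concatenate(
        [decay(x, amp, rate) - y for (x, y), amp in zip(datasets, amplitudes)]
    )


# Two noise-free synthetic data sets that share the same decay rate.
x1 = np.linspace(0.0, 5.0, 20)
x2 = np.linspace(0.0, 5.0, 20)
datasets = [(x1, decay(x1, 2.0, 0.8)), (x2, decay(x2, 5.0, 0.8))]

result = least_squares(joint_residuals, x0=[0.5, 1.0, 1.0], args=(datasets,))
rate_fit, amp1_fit, amp2_fit = result.x
```

Because the shared rate appears in every block of residuals, each data set constrains it, which is the main benefit of fitting the data sets simultaneously rather than one at a time.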

Warning

The external package Mantid must be installed to run this data set. See Installing External Software for details.

These problems can also only be run using the Mantid (mantid) minimizers.

Muon Data (Mantid)

Download .zip or .tar.gz

These problems (also found in the folder examples/benchmark_problems/Muon) contain data from muon spectrometers. The data supplied comes from the HiFi and EMU instruments at STFC’s ISIS Neutron and Muon source, and has been supplied in the format that Mantid uses to process the data.

These problems have between 5 and 13 unknown parameters, and fit around 1,000 data points.

Warning

The external package Mantid must be installed to run this data set. See Installing External Software for details.

Neutron Data (Mantid)

Download .zip or .tar.gz

These problems (also found in the folder examples/benchmark_problems/Neutron) contain data from neutron scattering experiments. The data supplied comes from the Engin-X, GEM, eVS, and WISH instruments at STFC’s ISIS Neutron and Muon source, and has been supplied in the format that Mantid uses to process the data.

The sizes of these problems vary widely. The Engin-X calibration problems have 7 unknown parameters and fit 56-67 data points. The Engin-X vanadium problems have 4 unknown parameters and fit around 14,168 data points. The eVS problems have 8 unknown parameters and fit 1,025 data points. The GEM problem has 105 unknown parameters and fits 1,314 data points. The WISH problems have 5 unknown parameters and fit 512 data points.

Warning

The external package Mantid must be installed to run this data set. See Installing External Software for details.

NIST

Download .zip or .tar.gz

These problems (also found in the folder examples/benchmark_problems/NIST) contain data from the NIST Nonlinear Regression test set.

These problems are split into low, average and high difficulty. They have between 2 and 9 unknown parameters, and fit between 6 and 250 data points.

Poisson Data

Download .zip or .tar.gz

These problems (also found in the folder examples/benchmark_problems/Poisson) contain both simulated and real data measuring particle counts. The real data is ISIS muon data, and the simulated datasets have been made to represent counts using models provided by both Mantid and Bumps.

These problems have between 4 and 6 unknown parameters, and fit around 350, 800 and 2,000 data points for the simulated Bumps, HIFI_160973 and simulated Mantid problems, respectively.
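Count data like this is often fitted by minimizing a Poisson likelihood rather than a least-squares cost. One common choice, sketched here under the assumption that all model predictions are strictly positive, is the Poisson deviance \(2\sum_i \left[y_i \log(y_i/f_i) - (y_i - f_i)\right]\). The function name is hypothetical, and this is not necessarily the exact form FitBenchmarking implements.

```python
import numpy as np
from scipy.special import xlogy


def poisson_deviance(counts, model):
    """Poisson deviance 2 * sum(y*log(y/f) - (y - f)).

    xlogy(y, y/f) evaluates to 0 when y == 0, so empty bins are handled;
    the model predictions f must be strictly positive.
    """
    y = np.asarray(counts, dtype=float)
    f = np.asarray(model, dtype=float)
    if np.any(f <= 0):
        raise ValueError("model predictions must be strictly positive")
    return 2.0 * float(np.sum(xlogy(y, y / f) - (y - f)))
```

The deviance is zero when the model matches the counts exactly and positive otherwise; for an empty bin with prediction \(f_i\), it contributes \(2 f_i\).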

Warning

The external package Mantid must be installed to run this data set. See Installing External Software for details.

Small Angle Scattering (SASView)

Download .zip or .tar.gz

These problems (also found in the folder examples/benchmark_problems/SAS_modelling/1D) are two data sets from small angle scattering experiments. Both involve fitting data to a cylinder model, and have been supplied in the format that SASView uses to process the data.

These have 6 unknown parameters, and fit to either 20 or 54 data points.

Warning

The external package sasmodels must be installed to run this data set. See Installing External Software for details.

CUTEst (SIF files)

Download .zip or .tar.gz

This directory (also found in the folder examples/benchmark_problems/SIF) contains SIF files encoding least squares problems from the CUTEst continuous optimization testing environment.

These are from a wide range of applications. They have between 2 and 9 unknown parameters, and for the most part fit between 6 and 250 data points, although the VESUVIO examples (from the VESUVIO instrument at ISIS) have 1,025 data points (with 8 unknown parameters).

Warning

The external packages CUTEst and pycutest must be installed to run this data set. See Installing External Software for details.

SIF_GO

Download .zip or .tar.gz

This directory (also found in the folder examples/benchmark_problems/SIF_GO) contains SIF files encoding least squares problems from the CUTEst continuous optimization testing environment.

All of these problems have been modified, with finite bounds added for all parameters, making the problems appropriate for testing global optimization solvers. The bounds that have been added to each problem are the same as those used in SciPy’s global optimization benchmark functions.

These problems have between 3 and 7 unknown parameters, and fit between 9 and 37 data points.
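The added bounds matter because global solvers search the whole bounded box rather than starting from a single \({\boldsymbol{\beta}}_0\). A sketch using SciPy's `differential_evolution` on a hypothetical two-parameter sine model (not one of the SIF_GO problems) shows finite bounds being supplied for every parameter:

```python
import numpy as np
from scipy.optimize import differential_evolution

x = np.linspace(0.0, 3.0, 15)
y_obs = 2.5 * np.sin(1.3 * x)   # noise-free synthetic data (illustrative)


def sum_of_squares(beta):
    """Unweighted least-squares cost for the model a * sin(b * x)."""
    a, b = beta
    r = y_obs - a * np.sin(b * x)
    return float(np.dot(r, r))


# One finite (lower, upper) pair per parameter defines the search box.
bounds = [(-5.0, 5.0), (0.0, 5.0)]
result = differential_evolution(sum_of_squares, bounds, seed=0)
```

No starting point is passed: the solver samples its initial population from within the bounds, which is why unbounded problems are not directly usable with this kind of method.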

Warning

The external packages CUTEst and pycutest must be installed to run this data set. See Installing External Software for details.

HOGBEN Samples

Download .zip or .tar.gz

These problems (also found in the folder examples/benchmark_problems/HOGBEN_samples) contain simulated reflectometry data. The data supplied has been generated using the HOGBEN sample suite.

These problems have between 4 and 10 unknown parameters, and fit around 180 data points.

Bundle Adjustment in the Large (BAL)

Download .zip or .tar.gz

These problems (also found in the folder examples/benchmark_problems/Bundle_Adjustment) contain image data, either captured at a regular rate using a Ladybug camera, or downloaded from Flickr.com. Please see the GRAIL webpage for more information on these datasets.

These problems have between ~20,000 and ~190,000 unknown parameters, and fit between ~60,000 and ~170,000 data points.

Note

These problems can currently only be run using the scipy_ls software, which supports sparse Jacobians. When running these problems with the nlls cost function, we suggest adding the options ftol=1e-4 and x_scale='jac' to the call to scipy.optimize.least_squares.
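A sketch of such a call, using a hypothetical toy problem with 1,000 parameters rather than a BAL data set, passes the Jacobian sparsity pattern to `scipy.optimize.least_squares` together with the suggested `ftol` and `x_scale` options. The residual function and its sparsity structure are assumptions chosen only to keep the example small.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.sparse import lil_matrix

n = 1000                         # number of parameters in this toy problem
t = np.linspace(0.0, 1.0, n)
target = np.sin(2 * np.pi * t)


def residuals(p):
    # n data-misfit terms (each depends on one parameter) followed by n - 1
    # smoothness terms (each couples two neighbouring parameters).
    return np.concatenate([p - target, 0.1 * np.diff(p)])


# Sparsity pattern of the (2n - 1) x n Jacobian: 1 where a residual
# depends on a parameter, 0 elsewhere.
sparsity = lil_matrix((2 * n - 1, n), dtype=int)
sparsity[np.arange(n), np.arange(n)] = 1
rows = np.arange(n - 1) + n
sparsity[rows, np.arange(n - 1)] = 1
sparsity[rows, np.arange(1, n)] = 1

result = least_squares(
    residuals,
    x0=np.zeros(n),
    jac_sparsity=sparsity,   # finite differences exploit the sparsity
    method="trf",            # method="lm" does not support sparse Jacobians
    tr_solver="lsmr",        # iterative solver suited to sparse problems
    x_scale="jac",
    ftol=1e-4,
)
```

Supplying `jac_sparsity` lets SciPy estimate the Jacobian with far fewer function evaluations than a dense finite-difference scheme, which is what makes problems with tens of thousands of parameters tractable.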

Simple tests

Download .zip or .tar.gz

This folder (also found in examples/benchmark_problems/simple_tests) contains a number of simple tests with known, easily obtained answers. We recommend using these to test any new minimizers that are added, and that any new parsers reimplement these data sets and models (if possible).

These problems have 3 or 4 unknown parameters, and around 100 data points.

Mantid System Test Data

Download .zip or .tar.gz

This folder (found in examples/benchmark_problems/Mantid_System_Test_Data) contains data from the Mantid System Tests. The data was taken from the OSIRISIqtAndIqtFit test. The spectra come from osi97935_graphite002_red.nxs, which is used in ISIS indirect inelastic calibration tests.

Plots of the 42 spectra from osi97935_graphite002_red.nxs.

Synthetic Datasets

Download .zip or .tar.gz

This folder (found in examples/benchmark_problems/synthetic_data) contains synthetic data to test the minimizers of Mantid. The data was generated specifically to test the BackToBackExponential and Gaussian fitting functions.

The data for testing the BackToBackExponential fitting can be found within the backtobackexp subfolder. It contains a dataset with 15 different starting conditions for the parameters.
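For reference, a back-to-back exponential peak can be evaluated as in the sketch below. It assumes the form given in Mantid's documentation for BackToBackExponential (parameters I, A, B, X0, S); check that documentation before relying on it. The parameter values are arbitrary, and the last line computes the area under the peak, which should be close to I because the function is normalized.

```python
import numpy as np
from scipy.special import erfc


def back_to_back_exponential(x, I, A, B, X0, S):
    """Rising (rate A) and decaying (rate B) exponentials convolved with a
    Gaussian of width S, centred on X0, with integrated intensity I.

    Assumes the Mantid BackToBackExponential form. Not numerically hardened:
    far from X0 the exp(...) * erfc(...) products can become inf * 0 = nan.
    """
    dx = x - X0
    norm = I * A * B / (2.0 * (A + B))
    rise = np.exp(A / 2.0 * (A * S**2 + 2.0 * dx)) * erfc(
        (A * S**2 + dx) / (np.sqrt(2.0) * S)
    )
    fall = np.exp(B / 2.0 * (B * S**2 - 2.0 * dx)) * erfc(
        (B * S**2 - dx) / (np.sqrt(2.0) * S)
    )
    return norm * (rise + fall)


x = np.linspace(-30.0, 30.0, 6001)
peak = back_to_back_exponential(x, I=10.0, A=1.0, B=0.5, X0=0.0, S=1.0)
area = float(np.sum(peak) * (x[1] - x[0]))   # simple Riemann-sum integral
```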

The data for testing the Gaussian fitting can be found within the gaussian subfolder. It contains a dataset with 16 different starting conditions for the parameters.
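Providing many starting conditions tests how robust a minimizer is to a poor initial guess. A sketch of the idea, assuming SciPy's `curve_fit` and a hypothetical noise-free Gaussian with three arbitrary starting points, records whether each start recovers the known answer:

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, height, centre, sigma):
    return height * np.exp(-0.5 * ((x - centre) / sigma) ** 2)


x = np.linspace(-5.0, 5.0, 101)
true_params = np.array([4.0, 0.5, 0.8])
y = gaussian(x, *true_params)          # noise-free synthetic data (illustrative)

# Hypothetical starting points, from close to far away from the answer.
starts = [(4.0, 0.4, 0.9), (2.0, -1.0, 2.0), (10.0, 3.0, 0.3)]


def recovered(p0, tol=1e-4):
    """Fit from one starting point; report whether the known answer is found."""
    try:
        popt, _ = curve_fit(gaussian, x, y, p0=p0, maxfev=5000)
    except (RuntimeError, ValueError):  # fit failed to converge
        return False
    popt[2] = abs(popt[2])              # sigma and -sigma give the same curve
    return bool(np.allclose(popt, true_params, atol=tol))


outcomes = {p0: recovered(p0) for p0 in starts}
```

Counting how many starting points lead to the known answer gives a simple measure of a minimizer's robustness, independent of how quickly it converges.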

SpinW 2D Powder Data

1D cuts example

Download .zip or .tar.gz

This problem (also found in the folder examples/benchmark_problems/SpinW_powder_data) contains 2D powder data simulated using SpinW, following the approach outlined in this tutorial. In this case, 1D cuts of the data are taken at user-specified Q values, which are then fitted simultaneously. It is assumed that the data has been cropped before being parsed into FitBenchmarking.

2D data

Download .zip or .tar.gz

This set of problems (also found in the folder examples/benchmark_problems/SpinW_powder_data_2d) contains 2D powder data: one dataset simulated using SpinW (using the approach outlined in this tutorial), and three versions of the \(CrCl_2(pym)\) dataset, which was collected at ISIS (one is temperature subtracted, one has high energy values cropped out, and one has no cropping). It is assumed that any required cropping has been carried out before the data is parsed into FitBenchmarking.

These problems have 8 unknown parameters and between ~200 and ~9500 data points.

In this case, as the fitting is done on the whole 2D dataset, the problem summary page (see Problem Summary Page) presents 2D plots showing the fit of the best minimizer for each cost function.
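Fitting the whole 2D data set amounts to comparing the model with every point on the grid. A minimal sketch, assuming SciPy and a hypothetical toy model on a (Q, E) grid (not the SpinW model), flattens the 2D residuals into a single vector for `least_squares`:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical 2D grid, e.g. momentum transfer Q by energy transfer E.
q = np.linspace(0.5, 3.0, 30)
energy = np.linspace(-2.0, 10.0, 40)
Q, E = np.meshgrid(q, energy, indexing="ij")   # both arrays have shape (30, 40)


def model_2d(params, Q, E):
    """Toy 2D model: a Gaussian ridge whose centre moves linearly with Q."""
    amplitude, slope, width = params
    return amplitude * np.exp(-0.5 * ((E - slope * Q) / width) ** 2)


data = model_2d([3.0, 2.0, 0.7], Q, E)          # noise-free synthetic "measurement"


def residuals(params):
    # Fit the whole 2D data set: flatten the grid into one residual vector.
    return (model_2d(params, Q, E) - data).ravel()


result = least_squares(residuals, x0=[2.0, 1.8, 1.0])
```

The minimizer itself sees an ordinary 1D residual vector of length 30 × 40; the 2D structure only matters when building the model and when plotting the result.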