Chi-squared estimator

One of the Estimators in the estimator module is ChiSquared. This estimator is useful if you have a set of \(n\) measured values \(\mathbf{y}=\left\{y_1, y_2, \dots, y_n\right\}\) for \(m\)-dimensional data points \(\mathbf{x}=\left\{x_{j,1}, x_{j,2}, \dots, x_{j,n}\right\}\), with \(j\in\left\{1, \dots, m\right\}\).
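For a model \(f(x; \mathbf{p})\) with parameters \(\mathbf{p}\), the quantity that a chi-squared estimator minimizes can, in its simplest, unweighted form, be sketched as (the actual ChiSquared implementation may additionally support weighting):

\[\chi^2\left(\mathbf{p}\right) = \sum_{i=1}^{n} \left(y_i - f(x_i; \mathbf{p})\right)^2\]

The optimizer then varies \(\mathbf{p}\) until this value is minimal.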

To illustrate how to fit an expression to some data sample \(\mathbf{x},\mathbf{y}\), we’ll generate ‘observed’ values \(\mathbf{y}\) for the following 1-dimensional polynomial.

import sympy as sp

a, b, c, x = sp.symbols("a b c x")
expression = a + b * x + c * x**2
expression
\[\displaystyle a + b x + c x^{2}\]

From this expression, we create a ParametrizedFunction, where the symbols \(a,b,c\) are interpreted as its parameters. The values here are chosen arbitrarily to generate the \(\mathbf{y}\)-values in the next step.

from tensorwaves.function.sympy import create_parametrized_function

function = create_parametrized_function(
    expression,
    parameters={a: 17, b: -2, c: -0.8},
    backend="jax",
)
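As a quick sanity check, the same polynomial can be evaluated directly with sympy by substituting the parameter defaults and a sample point. This is just a sketch that does not depend on the tensorwaves backend:

```python
import sympy as sp

a, b, c, x = sp.symbols("a b c x")
expression = a + b * x + c * x**2

# Substitute the parameter defaults and the sample point x = 2:
# 17 + (-2)(2) + (-0.8)(2**2) = 17 - 4 - 3.2 = 9.8
value = float(expression.subs({a: 17, b: -2, c: -0.8, x: 2}))
print(value)
```

The parametrized function created above should return the same value when called with `{"x": 2}`.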

Next, we generate data points \(\mathbf{x}\) uniformly over a line segment. The corresponding values for \(\mathbf{y}\) are computed with the polynomial function above and smeared with a small, normally distributed offset.

import numpy as np


def smear_gaussian(array, sigma, rng):
    return array + rng.normal(scale=sigma, size=len(array))


rng = np.random.default_rng(seed=0)
sample_size = 500
x_min, x_max = -5, +5
x_values = rng.uniform(x_min, x_max, sample_size)
data = {"x": x_values}
observed_y = function(data)
observed_y = smear_gaussian(observed_y, sigma=5, rng=rng)
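Since the smearing only adds zero-mean Gaussian noise, the offsets average out over many samples. A small standalone sketch (independent of the data generated above) that illustrates this:

```python
import numpy as np


def smear_gaussian(array, sigma, rng):
    return array + rng.normal(scale=sigma, size=len(array))


# Smearing an array of zeros isolates the noise itself: its sample mean
# should be close to 0 and its standard deviation close to sigma.
rng = np.random.default_rng(seed=0)
noise = smear_gaussian(np.zeros(10_000), sigma=5, rng=rng)
print(noise.mean(), noise.std())
```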

To make the fit a bit more interesting, we give the parameters different values from the ones used to generate the \(\mathbf{y}\)-values.

original_parameters = function.parameters
initial_parameters = {"a": -25, "b": 1.5, "c": 2.6}
function.update_parameters(initial_parameters)
%config InlineBackend.figure_formats = ['svg']

import matplotlib.pyplot as plt


def compare_model(function, x_values, observed_y):
    _, ax = plt.subplots(figsize=(8, 4))
    linear_domain = {"x": np.linspace(x_min, x_max, 100)}
    ax.plot(
        linear_domain["x"],
        function(linear_domain),
        c="red",
        linewidth=3,
        label="fit model",
    )
    ax.scatter(x_values, observed_y, s=2, label="generated data")
    ax.set_xlabel("$x$")
    ax.set_ylabel("$y$")
    ax.set_ylim([-30, 50])
    ax.legend(loc="upper left")
    plt.show()


compare_model(function, x_values, observed_y)
(Figure: the fit model with its initial parameters, plotted over the generated data)

Finally, we construct a ChiSquared estimator and use it to optimize the ParametrizedFunction with regard to the \(\mathbf{x},\mathbf{y}\) data points.

from tensorwaves.estimator import ChiSquared
from tensorwaves.optimizer import Minuit2

estimator = ChiSquared(function, data, observed_y, backend="jax")
optimizer = Minuit2()
fit_result = optimizer.optimize(estimator, initial_parameters)
fit_result
FitResult(
 minimum_valid=True,
 execution_time=0.27720046043395996,
 function_calls=57,
 estimator_value=10927.456819966694,
 parameter_values={
  'a': 16.397928086915478,
  'b': -1.9790943578671951,
  'c': -0.7697584942433522,
 },
 parameter_errors={
  'a': 0.04722220079747241,
  'b': 0.011006856699333373,
  'c': 0.004187273652065098,
 },
)

The optimized parameters in the FitResult are indeed comparable to the original parameter values with which the data was generated:

original_parameters
{'a': 17, 'b': -2, 'c': -0.8}
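To see what the estimator_value reported in the FitResult represents, we can recompute an unweighted sum of squared residuals over the same generated sample with plain numpy. This sketch assumes the estimator is unweighted here; with \(\sigma=5\) and 500 data points the residual sum is of the order \(n\sigma^2 = 12\,500\), comparable to the reported minimum:

```python
import numpy as np

# Regenerate the same sample (same seed and same call order as above), then
# compute the sum of squared residuals with respect to the true polynomial.
rng = np.random.default_rng(seed=0)
x_values = rng.uniform(-5, +5, 500)
model_y = 17 - 2 * x_values - 0.8 * x_values**2
observed_y = model_y + rng.normal(scale=5, size=500)

chi2 = np.sum((observed_y - model_y) ** 2)
print(chi2)  # of the order of 500 * 5**2 = 12_500
```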
compare_model(function, x_values, observed_y)
(Figure: the fit model with its optimized parameters, plotted over the generated data)