Chi-squared estimator
One of the Estimators in the estimator module is ChiSquared. This estimator is useful if you have a set of \(n\) measured values \(\mathbf{y}=\left\{y_1, y_2, \dots, y_n\right\}\) for \(m\)-dimensional data points \(\mathbf{x}=\left\{x_{j,1}, x_{j,2}, \dots, x_{j,n}\right\}\), with \(j\in\left\{1, \dots, m\right\}\).
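The ChiSquared estimator minimizes the sum of squared residuals between the model and the measured values. In its simplest, unweighted form (assumed here for illustration), this reads

\[
\chi^2 = \sum_{i=1}^{n}\left(f(\mathbf{x}_i) - y_i\right)^2,
\]

where \(f\) is the model function that is to be fitted to the data.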
To illustrate how to fit an expression to some data sample \(\mathbf{x},\mathbf{y}\), we’ll generate ‘observed’ values \(\mathbf{y}\) for the following 1-dimensional polynomial.
import sympy as sp
a, b, c, x = sp.symbols("a b c x")
expression = a + b * x + c * x**2
expression
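As a quick sanity check of the expression, SymPy can substitute concrete numbers for the symbols. The values below are the ones used later in this tutorial, evaluated at an arbitrarily chosen point \(x=2\):

```python
import sympy as sp

a, b, c, x = sp.symbols("a b c x")
expression = a + b * x + c * x**2

# Substitute the parameter values used below and evaluate at x = 2:
# 17 + (-2)(2) + (-0.8)(4) = 9.8
value = expression.subs({a: 17, b: -2, c: -0.8, x: 2})
print(value)
```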
From this expression, we create a ParametrizedFunction, where the symbols \(a,b,c\) are interpreted as its parameters. The values here are chosen arbitrarily to generate the \(\mathbf{y}\)-values in the next step.
from tensorwaves.function.sympy import create_parametrized_function

function = create_parametrized_function(
    expression,
    parameters={a: 17, b: -2, c: -0.8},
    backend="jax",
)
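Conceptually, such a parametrized function wraps the lambdified expression together with a mutable mapping of parameter values. The following is a simplified stand-in (not tensorwaves' actual implementation) that illustrates the idea:

```python
import numpy as np
import sympy as sp


class SimpleParametrizedFunction:
    """Simplified stand-in for a parametrized function (illustration only)."""

    def __init__(self, expression, parameters):
        self.parameters = dict(parameters)  # parameter name -> value
        self._symbols = sorted(expression.free_symbols, key=lambda s: s.name)
        self._func = sp.lambdify(self._symbols, expression, "numpy")

    def update_parameters(self, new_parameters):
        self.parameters.update(new_parameters)

    def __call__(self, data):
        # Parameters are taken from the internal mapping; everything else
        # (here: x) is looked up in the data sample
        kwargs = {
            s.name: self.parameters.get(s.name, data.get(s.name))
            for s in self._symbols
        }
        return self._func(**kwargs)


a, b, c, x = sp.symbols("a b c x")
expression = a + b * x + c * x**2
func = SimpleParametrizedFunction(expression, {"a": 17, "b": -2, "c": -0.8})
print(func({"x": np.array([0.0, 1.0])}))  # → [17.  14.2]
```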
Next, we uniformly generate data points \(\mathbf{x}\) over a line segment. The corresponding values for \(\mathbf{y}\) are computed with the above polynomial function and smeared with a small, normally distributed offset.
import numpy as np


def smear_gaussian(array, sigma, rng):
    return array + rng.normal(scale=sigma, size=len(array))


rng = np.random.default_rng(seed=0)
sample_size = 500
x_min, x_max = -5, +5
x_values = rng.uniform(x_min, x_max, sample_size)
data = {"x": x_values}
observed_y = function(data)
observed_y = smear_gaussian(observed_y, sigma=5, rng=rng)
To make the fit a bit more interesting, we give the parameters different values than the ones we used to generate the \(\mathbf{y}\)-values.
original_parameters = function.parameters
initial_parameters = {"a": -25, "b": 1.5, "c": 2.6}
function.update_parameters(initial_parameters)
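To see what the optimizer will be working with, here is a manual \(\chi^2\) computation under the same data recipe as above (assuming the unweighted sum-of-squares form). The initial guess should give a much larger value than the parameters the data was generated with:

```python
import numpy as np

# Regenerate the data sample with the same seed and recipe as above
rng = np.random.default_rng(seed=0)
x_arr = rng.uniform(-5, +5, 500)
y_true = 17 - 2 * x_arr - 0.8 * x_arr**2
y_obs = y_true + rng.normal(scale=5, size=len(x_arr))


def chi_squared(a, b, c):
    model = a + b * x_arr + c * x_arr**2
    return np.sum((model - y_obs) ** 2)


print(chi_squared(-25, 1.5, 2.6))  # initial guess
print(chi_squared(17, -2, -0.8))   # values used to generate the data
```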
%config InlineBackend.figure_formats = ['svg']
import matplotlib.pyplot as plt


def compare_model(function, x_values, observed_y):
    _, ax = plt.subplots(figsize=(8, 4))
    linear_domain = {"x": np.linspace(x_min, x_max, 100)}
    ax.plot(
        linear_domain["x"],
        function(linear_domain),
        c="red",
        linewidth=3,
        label="initial fit model",
    )
    ax.scatter(x_values, observed_y, s=2, label="generated data")
    ax.set_xlabel("$x$")
    ax.set_ylabel("$y$")
    ax.set_ylim([-30, 50])
    ax.legend(loc="upper left")
    plt.show()
compare_model(function, x_values, observed_y)
Finally, we construct a ChiSquared estimator and use it to optimize the ParametrizedFunction with regard to the \(\mathbf{x},\mathbf{y}\) data points.
from tensorwaves.estimator import ChiSquared
from tensorwaves.optimizer import Minuit2
estimator = ChiSquared(function, data, observed_y, backend="jax")
optimizer = Minuit2()
fit_result = optimizer.optimize(estimator, initial_parameters)
fit_result
FitResult(
    minimum_valid=True,
    execution_time=0.11722111701965332,
    function_calls=57,
    estimator_value=10927.456819966694,
    parameter_values={
        'a': 16.397928086915478,
        'b': -1.9790943578671951,
        'c': -0.7697584942433522,
    },
    parameter_errors={
        'a': 0.04722220079747241,
        'b': 0.011006856699333373,
        'c': 0.004187273652065098,
    },
)
The optimized parameters in the FitResult are indeed comparable to the original parameter values with which the data was generated:
original_parameters
{'a': 17, 'b': -2, 'c': -0.8}
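As an independent cross-check (using NumPy only, with the data regenerated under the same seed and recipe as above), an ordinary least-squares polynomial fit should land on comparable parameter values:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x_arr = rng.uniform(-5, +5, 500)
y_obs = 17 - 2 * x_arr - 0.8 * x_arr**2 + rng.normal(scale=5, size=len(x_arr))

# np.polyfit returns coefficients from the highest degree down: [c, b, a]
c_fit, b_fit, a_fit = np.polyfit(x_arr, y_obs, deg=2)
print(f"a = {a_fit:.2f}, b = {b_fit:.2f}, c = {c_fit:.2f}")
```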
compare_model(function, x_values, observed_y)