Unbinned fit

Imagine we have the following data distribution over \(x,y\):

import numpy as np

sample_size = 50_000
rng = np.random.default_rng(seed=0)
data = {
    "x": rng.rayleigh(size=sample_size),
    "y": rng.normal(size=sample_size),
}

The data distribution was generated with numpy.random.rayleigh() (for \(x\)) and numpy.random.normal() (for \(y\)), so it can be described by the following expression:

import sympy as sp

def rayleigh(x, sigma):
    return x / sigma**2 * sp.exp(-(x**2) / (2 * sigma**2))

def gaussian(x, mu, sigma):
    return sp.exp(-((x - mu) ** 2) / (2 * sigma**2)) / sp.sqrt(2 * sp.pi * sigma**2)

x, y, mu, sigma_x, sigma_y = sp.symbols("x y mu sigma_x sigma_y")
expression = rayleigh(x, sigma_x) * gaussian(y, mu, sigma_y)
expression

\[\displaystyle \frac{\sqrt{2} x e^{- \frac{x^{2}}{2 \sigma_{x}^{2}}} e^{- \frac{\left(- \mu + y\right)^{2}}{2 \sigma_{y}^{2}}}}{2 \sqrt{\pi} \sigma_{x}^{2} \sqrt{\sigma_{y}^{2}}}\]
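As a sanity check, SymPy can confirm that this expression is a normalized probability density. The following is a small sketch; it redeclares the symbols with assumptions (real \(x, y, \mu\) and positive \(\sigma_x, \sigma_y\)) that the symbols above do not carry, because SymPy needs them to evaluate the improper integrals in closed form.

```python
import sympy as sp

# Assumptions (not in the original symbols): real variables, positive widths
x, y, mu = sp.symbols("x y mu", real=True)
sigma_x, sigma_y = sp.symbols("sigma_x sigma_y", positive=True)

rayleigh_pdf = x / sigma_x**2 * sp.exp(-(x**2) / (2 * sigma_x**2))
gaussian_pdf = sp.exp(-((y - mu) ** 2) / (2 * sigma_y**2)) / sp.sqrt(2 * sp.pi * sigma_y**2)

# Each factor should integrate to one over its support, hence so does their product
norm_x = sp.simplify(sp.integrate(rayleigh_pdf, (x, 0, sp.oo)))
norm_y = sp.simplify(sp.integrate(gaussian_pdf, (y, -sp.oo, sp.oo)))
print(norm_x, norm_y)
```

Since the expression is normalized, fitted parameter values can be compared directly to the parameters of the generating distributions.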

We would like to find values for \(\mu, \sigma_x, \sigma_y\) such that the expression describes this distribution as well as possible. For this, we first formulate the expression as a ParametrizedFunction in a specific computational backend, so that we can quickly evaluate it over a large number of data points. We also provide some initial guesses for the parameter values:

from tensorwaves.function.sympy import create_parametrized_function

function = create_parametrized_function(
    expression,
    parameters={mu: -0.3, sigma_x: 0.3, sigma_y: 2.7},
    backend="jax",
)
initial_parameters = function.parameters
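Conceptually, turning the expression into a parametrized function means compiling it once into a vectorized numerical function, while keeping the parameter values in a separate mapping that can be updated between calls. The sketch below illustrates that idea with plain sympy.lambdify() and NumPy instead of JAX; the class SketchParametrizedFunction is hypothetical and is not how tensorwaves implements ParametrizedFunction.

```python
import numpy as np
import sympy as sp


class SketchParametrizedFunction:
    """Hypothetical minimal stand-in for a parametrized function (not tensorwaves' class)."""

    def __init__(self, expression, parameters):
        # Parameter values are stored by symbol name, so they can change without recompiling
        self.parameters = {str(symbol): value for symbol, value in parameters.items()}
        symbols = sorted(expression.free_symbols, key=str)
        self._argument_names = [str(symbol) for symbol in symbols]
        # Compile once into a vectorized NumPy function of all free symbols
        self._function = sp.lambdify(symbols, expression, modules="numpy")

    def update_parameters(self, new_parameters):
        self.parameters.update(new_parameters)

    def __call__(self, data):
        # Data columns and parameter values are both looked up by name
        arguments = {**data, **self.parameters}
        return self._function(*(arguments[name] for name in self._argument_names))


x, y, mu, sigma_x, sigma_y = sp.symbols("x y mu sigma_x sigma_y")
expression = (
    x / sigma_x**2 * sp.exp(-(x**2) / (2 * sigma_x**2))
    * sp.exp(-((y - mu) ** 2) / (2 * sigma_y**2))
    / sp.sqrt(2 * sp.pi * sigma_y**2)
)
function = SketchParametrizedFunction(expression, {mu: -0.3, sigma_x: 0.3, sigma_y: 2.7})
values = function({"x": np.array([0.5, 1.0, 2.0]), "y": np.array([0.0, 0.5, -1.0])})
```

Because the parameters live in a plain dictionary, an optimizer only has to update that dictionary between evaluations; the compiled function is reused.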

The function can be used to visualize the expression with this choice of parameter values over a certain \(xy\)-domain and compare it to the original data distribution.
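Without plotting, the same comparison can be made quantitative by binning the data on an \(xy\)-grid and evaluating the density at the bin centers. The sketch below hand-codes the density in NumPy (an assumption, standing in for the compiled function) and uses an arbitrary 40×60 grid over \([0, 4] \times [-3, 3]\). It only illustrates that the initial guess deviates much more from the data than the parameter values that generated it.

```python
import numpy as np


def density(x, y, mu, sigma_x, sigma_y):
    # Hand-coded NumPy version of the Rayleigh times Gaussian expression (assumption)
    rayleigh = x / sigma_x**2 * np.exp(-(x**2) / (2 * sigma_x**2))
    gaussian = np.exp(-((y - mu) ** 2) / (2 * sigma_y**2)) / np.sqrt(2 * np.pi * sigma_y**2)
    return rayleigh * gaussian


sample_size = 50_000
rng = np.random.default_rng(seed=0)
data = {"x": rng.rayleigh(size=sample_size), "y": rng.normal(size=sample_size)}

# Bin the data; density=True scales the histogram to integrate to one over the binned range
x_edges = np.linspace(0, 4, 41)
y_edges = np.linspace(-3, 3, 61)
histogram, _, _ = np.histogram2d(data["x"], data["y"], bins=[x_edges, y_edges], density=True)

# Evaluate the model at the bin centers (indexing="ij" matches the histogram's [x, y] layout)
x_centers = 0.5 * (x_edges[:-1] + x_edges[1:])
y_centers = 0.5 * (y_edges[:-1] + y_edges[1:])
x_grid, y_grid = np.meshgrid(x_centers, y_centers, indexing="ij")


def mean_absolute_deviation(parameters):
    return np.mean(np.abs(density(x_grid, y_grid, **parameters) - histogram))


initial_guess = {"mu": -0.3, "sigma_x": 0.3, "sigma_y": 2.7}
generating_values = {"mu": 0.0, "sigma_x": 1.0, "sigma_y": 1.0}
print(mean_absolute_deviation(initial_guess), mean_absolute_deviation(generating_values))
```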

Next, we use an UnbinnedNLL estimator to optimize the parameters with respect to the data distribution. Note that an UnbinnedNLL requires a domain over which to integrate the ParametrizedFunction, in order to normalize the likelihood.
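One way to perform such a normalization is a Monte Carlo estimate. With \(f\) the function and \(V\) the volume of the integration domain, \(\int f\,\mathrm{d}V\) is approximated by \(V\) times the mean of \(f\) over uniformly distributed domain points, and the estimator is \(-\sum_i \ln\left(f(x_i, y_i) / \int f\,\mathrm{d}V\right)\). The NumPy sketch below implements that idea; it is not tensorwaves' source code, and unbinned_nll() is a hypothetical helper. A constant volume factor only shifts the NLL by \(N \ln V\), so it does not move the minimum.

```python
import numpy as np


def density(x, y, mu, sigma_x, sigma_y):
    rayleigh = x / sigma_x**2 * np.exp(-(x**2) / (2 * sigma_x**2))
    gaussian = np.exp(-((y - mu) ** 2) / (2 * sigma_y**2)) / np.sqrt(2 * np.pi * sigma_y**2)
    return rayleigh * gaussian


def unbinned_nll(parameters, data, domain, domain_volume):
    """Hypothetical unbinned negative log likelihood with Monte Carlo normalization."""
    # Monte Carlo estimate of the integral of the density over the integration domain
    integral = domain_volume * np.mean(density(domain["x"], domain["y"], **parameters))
    likelihoods = density(data["x"], data["y"], **parameters) / integral
    return -np.sum(np.log(likelihoods))


sample_size = 50_000
rng = np.random.default_rng(seed=0)
data = {"x": rng.rayleigh(size=sample_size), "y": rng.normal(size=sample_size)}
domain = {"x": rng.uniform(0, 4, size=200_000), "y": rng.uniform(-3, +3, size=200_000)}
domain_volume = (4 - 0) * (3 - (-3))  # area of [0, 4] x [-3, 3]

nll_initial = unbinned_nll({"mu": -0.3, "sigma_x": 0.3, "sigma_y": 2.7}, data, domain, domain_volume)
nll_true = unbinned_nll({"mu": 0.0, "sigma_x": 1.0, "sigma_y": 1.0}, data, domain, domain_volume)
print(nll_initial, nll_true)
```

Handing such a function to a general-purpose minimizer (for instance scipy.optimize.minimize) would complete the sketch. With tensorwaves, the estimator is instead constructed from the ParametrizedFunction and passed to the Minuit2 optimizer: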

from tensorwaves.estimator import UnbinnedNLL
from tensorwaves.optimizer import Minuit2

# Uniformly distributed points over [0, 4] x [-3, 3], used to normalize the function
integration_domain = {
    "x": rng.uniform(0, 4, size=200_000),
    "y": rng.uniform(-3, +3, size=200_000),
}
estimator = UnbinnedNLL(function, data, integration_domain, backend="jax")
optimizer = Minuit2()
fit_result = optimizer.optimize(estimator, initial_parameters)
fit_result

FitResult(
 minimum_valid=True,
 execution_time=0.7102274894714355,
 function_calls=254,
 estimator_value=-41104.12400237404,
 parameter_values={
  'mu': -0.003288285721396106,
  'sigma_x': 1.0015447225895224,
  'sigma_y': -1.0094427512940627,
 },
 parameter_errors={
  'mu': 0.0045646126456405445,
  'sigma_x': 0.0022676565526039377,
  'sigma_y': 0.0034326069432988014,
 },
)
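The fit reports a negative value for \(\sigma_y\). This is harmless: \(\sigma_y\) only enters the expression squared (including under the square root), so the likelihood is symmetric under \(\sigma_y \to -\sigma_y\) and the minimizer may converge to either sign. A short SymPy check of that symmetry, rebuilding the same expression:

```python
import sympy as sp

x, y, mu, sigma_x, sigma_y = sp.symbols("x y mu sigma_x sigma_y")
expression = (
    x / sigma_x**2 * sp.exp(-(x**2) / (2 * sigma_x**2))
    * sp.exp(-((y - mu) ** 2) / (2 * sigma_y**2))
    / sp.sqrt(2 * sp.pi * sigma_y**2)
)

# Flipping the sign of either width leaves the expression unchanged,
# because both widths only appear squared
difference_y = sp.simplify(expression.subs(sigma_y, -sigma_y) - expression)
difference_x = sp.simplify(expression.subs(sigma_x, -sigma_x) - expression)
print(difference_y, difference_x)
```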

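The values are indeed close to the default values of numpy.random.rayleigh() (\(\sigma_x=1\)) and numpy.random.normal() (\(\mu=0, \sigma_y=1\)), with which the data distribution was generated. The fitted \(\sigma_y\) is negative, but since only \(\sigma_y^2\) enters the expression, its sign is irrelevant and \(|\sigma_y|\) is the value to compare.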
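For these two distributions, the maximum-likelihood estimates also have closed forms: \(\hat\sigma_x^2 = \frac{1}{2N}\sum_i x_i^2\) for the Rayleigh factor, and the sample mean and (biased) sample standard deviation for the Gaussian factor. That offers an independent cross-check. The sketch below regenerates the same sample with NumPy. Because the UnbinnedNLL above normalizes over the truncated domain \([0, 4] \times [-3, 3]\), its results need not agree exactly with these untruncated estimates.

```python
import numpy as np

# Regenerate the same sample as above (same seed, same call order)
sample_size = 50_000
rng = np.random.default_rng(seed=0)
x = rng.rayleigh(size=sample_size)
y = rng.normal(size=sample_size)

# Closed-form maximum-likelihood estimates for the untruncated distributions
sigma_x_hat = np.sqrt(np.mean(x**2) / 2)  # Rayleigh: sigma^2 = sum(x^2) / (2N)
mu_hat = np.mean(y)  # Gaussian mean
sigma_y_hat = np.std(y)  # Gaussian width, MLE uses ddof=0 (the NumPy default)
print(mu_hat, sigma_x_hat, sigma_y_hat)
```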
