Step 2: Generate data samples

In this section, we will use the HelicityModel that we created with ampform in the previous step to generate a data sample via hit & miss Monte Carlo. We do this with the data module.

First, we load() the HelicityModel that was created in the previous step:

import pickle

with open("helicity_model.pickle", "rb") as model_file:
    model = pickle.load(model_file)
reaction_info = model.adapter.reaction_info
initial_state = next(iter(reaction_info.initial_state.values()))
print("Initial state:")
print(" ", initial_state.name)
print("Final state:")
for i, p in reaction_info.final_state.items():
    print(f"  {i}: {p.name}")
Initial state:
  J/psi(1S)
Final state:
  0: gamma
  1: pi0
  2: pi0

2.1 Generate phase space sample

The ReactionInfo class defines the constraints of the phase space. As such, we have enough information to generate a phase-space sample for this particle reaction. We do this with the generate_phsp() function. By default, this function uses TFPhaseSpaceGenerator as a, well… phase-space generator (using tensorflow and the phasespace package as a back-end) and generates random numbers with TFUniformRealNumberGenerator. You can use other generators with the arguments of generate_phsp().

As opposed to the main Generate data sample of the main usage example page, we will generate a deterministic data sample. This can be done by feeding a UniformRealNumberGenerator with a specific UniformRealNumberGenerator.seed and feeding that generator to the generate_phsp() function:

import numpy as np
import pandas as pd

from tensorwaves.data import TFUniformRealNumberGenerator, generate_phsp

rng = TFUniformRealNumberGenerator(seed=0)
reaction_info = model.adapter.reaction_info
initial_state_mass = reaction_info.initial_state[-1].mass
final_state_masses = {i: p.mass for i, p in reaction_info.final_state.items()}
phsp_sample = generate_phsp(
    size=100_000,
    initial_state_mass=initial_state_mass,
    final_state_masses=final_state_masses,
    random_generator=rng,
)
pd.DataFrame(
    {
        (k, label): np.transpose(v)[i]
        for k, v in phsp_sample.items()
        for i, label in enumerate(["E", "px", "py", "pz"])
    }
)
0 1 2
E px py pz E px py pz E px py pz
0 0.811317 0.787621 0.144327 0.130609 1.092959 -0.019148 -0.531113 -0.945459 1.192624 -0.768473 0.386786 0.814849
1 0.602303 -0.343019 -0.367933 0.331259 1.431153 1.173457 -0.078558 -0.804244 1.063444 -0.830437 0.446491 0.472984
2 1.165704 0.115778 0.666757 -0.949155 1.443427 0.192011 -0.593016 1.294884 0.487770 -0.307789 -0.073741 -0.345729
3 0.593783 0.392870 0.407210 -0.180034 1.246726 0.037716 0.106090 1.234273 1.256391 -0.430586 -0.513300 -1.054238
4 0.864272 -0.668511 0.516082 0.183623 0.754404 -0.716281 -0.135528 0.139574 1.478224 1.384792 -0.380554 -0.323197
... ... ... ... ... ... ... ... ... ... ... ... ...
99995 0.472639 0.250787 0.395665 0.062791 1.126885 0.041148 0.795625 0.785454 1.497377 -0.291935 -1.191290 -0.848245
99996 1.330392 1.185399 0.538998 -0.272492 0.419241 -0.033333 0.139318 0.370167 1.347267 -1.152066 -0.678316 -0.097675
99997 0.864709 0.258491 0.294117 -0.770973 1.444051 -0.804658 0.065531 1.189662 0.788140 0.546167 -0.359648 -0.418689
99998 0.607279 0.563725 0.219317 -0.053868 0.968601 0.899961 0.061514 0.325968 1.521020 -1.463686 -0.280831 -0.272100
99999 1.148659 -0.481566 0.715177 0.758969 1.291588 0.469209 -1.158960 -0.294342 0.656653 0.012357 0.443783 -0.464627

100000 rows × 12 columns

The resulting phase space sample is a dict of final state IDs to an array of four-momenta. In the last step, we converted this sample in such a way that it is rendered as an understandable pandas.DataFrame.

2.2 Generate intensity-based sample

‘Data samples’ are more complicated than phase space samples in that they represent the intensity profile resulting from a reaction. You therefore need a Function object that expresses an intensity distribution as well as a phase space over which to generate that distribution. We call such a data sample an intensity-based sample.

An intensity-based sample is generated with the function generate_data(). Its usage is similar to generate_phsp(), but now you have to provide a Function as well as a DataTransformer that is used to transform the four-momentum phase space sample to a data sample that can be understood by the Function.

Now, recall that in Step 1: Create amplitude model, we used the helicity formalism to mathematically express the reaction in terms of an amplitude model. TensorWaves needs to convert this HelicityModel to an Model object that it can then lambdify() to a Function object.

The HelicityModel was expressed in terms of sympy, so we express the model as a SympyModel and lambdify it to a LambdifiedFunction:

from tensorwaves.model import LambdifiedFunction, SympyModel

sympy_model = SympyModel(
    expression=model.expression.doit(),
    parameters=model.parameter_defaults,
    max_complexity=200,
)
%time intensity = LambdifiedFunction(sympy_model, backend="numpy")
CPU times: user 1.03 s, sys: 12.2 ms, total: 1.04 s
Wall time: 1.03 s

A problem is that LambdifiedFunction takes a DataSample as input, not a set of four-momenta. We therefore need to construct a DataTransformer to transform these four-momenta to function variables. In this case, we work with the helicity formalism, so we construct a HelicityTransformer:

from tensorwaves.data.transform import HelicityTransformer

data_converter = HelicityTransformer(model.adapter)

That’s it, now we have enough info to create an intensity-based data sample. Notice how the structure of the output data is the same as the phase-space sample we generated previously:

from tensorwaves.data import generate_data

data_sample = generate_data(
    size=10_000,
    initial_state_mass=initial_state_mass,
    final_state_masses=final_state_masses,
    data_transformer=data_converter,
    intensity=intensity,
    random_generator=rng,
)
pd.DataFrame(
    {
        (k, label): np.transpose(v)[i]
        for k, v in data_sample.items()
        for i, label in enumerate(["E", "px", "py", "pz"])
    }
)
0 1 2
E px py pz E px py pz E px py pz
0 1.185259 0.830812 -0.168623 0.828345 1.238715 -0.383917 0.626122 -0.988319 0.672927 -0.446895 -0.457499 0.159974
1 1.494125 -0.195880 -1.307833 -0.695422 0.824596 0.072240 0.798756 0.136049 0.778180 0.123640 0.509077 0.559374
2 1.187149 0.751684 -0.359411 -0.845646 0.862093 0.159626 -0.139034 0.824727 1.047659 -0.911310 0.498444 0.020920
3 1.326583 -1.116556 -0.650988 -0.298899 1.416935 0.996462 0.954615 0.291996 0.353382 0.120094 -0.303627 0.006904
4 1.144748 -0.136758 0.110361 1.131178 0.762357 0.483725 -0.565345 -0.096772 1.189795 -0.346967 0.454985 -1.034407
... ... ... ... ... ... ... ... ... ... ... ... ...
9995 1.191925 -0.120250 0.389471 1.120062 0.461045 -0.413191 0.011598 0.153240 1.443929 0.533441 -0.401070 -1.273302
9996 1.493273 1.011846 0.832871 -0.715791 0.353850 -0.265738 0.010368 0.190438 1.249778 -0.746108 -0.843239 0.525353
9997 0.930555 -0.480388 -0.190934 -0.773760 1.092046 -0.578604 0.204632 0.893134 1.074300 1.058992 -0.013698 -0.119374
9998 1.142834 -0.684490 0.729063 0.553183 0.844822 -0.400886 -0.592552 -0.428578 1.109243 1.085375 -0.136511 -0.124606
9999 1.074710 -0.842913 -0.582449 -0.324426 0.752693 -0.024504 0.454963 -0.583726 1.269497 0.867417 0.127486 0.908152

10000 rows × 12 columns

As before, we use a UniformRealNumberGenerator with a specific UniformRealNumberGenerator.seed to ensure we get a deterministic data sample.

2.3 Visualize kinematic variables

We now have a phase space sample and an intensity-based sample. Their data structure isn’t the most informative though: it’s just a collection of four-momentum tuples. But we can again use the HelicityTransformer to convert these four-momenta to (in the case of the helicity formalism) invariant masses and helicity angles:

phsp_set = data_converter.transform(phsp_sample)
data_set = data_converter.transform(data_sample)
list(data_set)
['phi_1+2',
 'theta_1+2',
 'phi_1,1+2',
 'theta_1,1+2',
 'm_012',
 'm_0',
 'm_1',
 'm_2',
 'm_12']

The DataSet is just a mapping of kinematic variables names to a sequence of values. The numbers you see here are final state IDs as defined in the HelicityModel member of the HelicityModel:

for (
    state_id,
    particle,
) in model.adapter.reaction_info.final_state.items():
    print(f"ID {state_id}:", particle.name)
ID 0: gamma
ID 1: pi0
ID 2: pi0

The DataSet can easily be converted it to a pandas.DataFrame:

import pandas as pd

data_frame = pd.DataFrame(data_set)
phsp_frame = pd.DataFrame(data_set)
data_frame
phi_1+2 theta_1+2 phi_1,1+2 theta_1,1+2 m_012 m_0 m_1 m_2 m_12
0 2.941350 2.344617 -0.984419 1.064114 3.0969 0.000000e+00+0.000000e+00j 0.134977 0.134977 1.499845
1 1.422127 1.086667 0.183725 1.535691 3.0969 0.000000e+00+0.000000e+00j 0.134977 0.134977 0.580070
2 2.695585 0.777978 3.063622 1.730394 3.0969 0.000000e+00+0.000000e+00j 0.134977 0.134977 1.495937
3 0.527850 1.343530 1.515685 0.602596 3.0969 0.000000e+00+2.580957e-08j 0.134977 0.134977 1.172263
4 -0.678981 2.987470 -2.951556 1.959462 3.0969 2.107342e-08+0.000000e+00j 0.134977 0.134977 1.581282
... ... ... ... ... ... ... ... ... ...
9995 -1.271331 2.792571 -1.387495 2.565453 3.0969 0.000000e+00+0.000000e+00j 0.134977 0.134977 1.486016
9996 -2.452912 1.070889 -1.957086 2.313677 3.0969 0.000000e+00+0.000000e+00j 0.134977 0.134977 0.584599
9997 0.378314 0.588987 2.711496 1.551541 3.0969 1.490116e-08+0.000000e+00j 0.134977 0.134977 1.956302
9998 -0.816920 2.076068 -1.166315 1.807813 3.0969 0.000000e+00+0.000000e+00j 0.134977 0.134977 1.585024
9999 0.604657 1.264140 0.553347 2.079405 3.0969 0.000000e+00+0.000000e+00j 0.134977 0.134977 1.712966

10000 rows × 9 columns

This also means that we can use all kinds of fancy plotting functionality of for instance matplotlib.pyplot to see what’s going on. Here’s an example:

from matplotlib import cm

reaction_info = model.adapter.reaction_info
intermediate_states = sorted(
    (
        p
        for p in model.particles
        if p not in reaction_info.final_state.values()
        and p not in reaction_info.initial_state.values()
    ),
    key=lambda p: p.mass,
)

evenly_spaced_interval = np.linspace(0, 1, len(intermediate_states))
colors = [cm.rainbow(x) for x in evenly_spaced_interval]
import matplotlib.pyplot as plt

data_frame["m_12"].hist(bins=100, alpha=0.5, density=True, figsize=(9, 4))
plt.xlabel("$m$ [GeV]")
for i, p in enumerate(intermediate_states):
    plt.axvline(x=p.mass, linestyle="dotted", label=p.name, color=colors[i])
plt.legend();
../_images/step2_29_0.svg

2.4 Export data sets

TensorWaves currently has no export functionality for data samples, so we just pickle.dump() these data samples as follows:

import pickle

with open("data_set.pickle", "wb") as stream:
    pickle.dump(data_set, stream)
with open("phsp_set.pickle", "wb") as stream:
    pickle.dump(phsp_set, stream)

In the next step, we illustrate how to optimize() the intensity model to these data samples.