Step 2: Generate data samples

In this section, we will use the HelicityModel that we created with ampform in the previous step to generate a data sample via hit & miss Monte Carlo. We do this with the data module.

First, we load() the HelicityModel that was created in the previous step:

import pickle

with open("helicity_model.pickle", "rb") as model_file:
    model = pickle.load(model_file)
reaction_info = model.adapter.reaction_info
initial_state = next(iter(reaction_info.initial_state.values()))
print("Initial state:")
print(" ", initial_state.name)
print("Final state:")
for i, p in reaction_info.final_state.items():
    print(f"  {i}: {p.name}")
Initial state:
  J/psi(1S)
Final state:
  0: gamma
  1: pi0
  2: pi0

2.1 Generate phase space sample

The ReactionInfo class defines the constraints of the phase space. As such, we have enough information to generate a phase-space sample for this particle reaction. We do this with the generate_phsp() function. By default, this function uses TFPhaseSpaceGenerator as a, well… phase-space generator (using tensorflow and the phasespace package as a back-end) and generates random numbers with TFUniformRealNumberGenerator. You can use other generators with the arguments of generate_phsp().

import pandas as pd
from tensorwaves.data import generate_phsp

phsp_sample = generate_phsp(300_000, model.adapter.reaction_info)
pd.DataFrame(phsp_sample.to_pandas())
0 1 2
E px py pz E px py pz E px py pz
0 0.849148 -0.834824 -0.042127 -0.149487 1.299030 0.967049 0.663470 -0.542109 0.948723 -0.132226 -0.621343 0.691597
1 1.364961 -1.153045 -0.151741 0.714549 0.847233 0.467906 0.579792 -0.380119 0.884706 0.685139 -0.428051 -0.334430
2 1.063658 0.235869 -0.819499 -0.635731 1.218805 -0.549662 1.077911 -0.056979 0.814437 0.313792 -0.258412 0.692710
3 1.349400 0.578408 0.778560 -0.938173 1.305332 -0.740914 -0.393704 0.990816 0.442168 0.162506 -0.384856 -0.052642
4 1.076290 -0.711023 0.740634 -0.322965 1.283460 1.078104 -0.618651 -0.289850 0.737150 -0.367081 -0.121983 0.612816
... ... ... ... ... ... ... ... ... ... ... ... ...
299995 1.298547 -0.189943 -0.356461 -1.234132 1.370417 0.563376 0.197101 1.226207 0.427936 -0.373433 0.159359 0.007926
299996 0.919802 -0.816913 0.416156 0.074186 1.081352 0.500423 0.169936 -0.933704 1.095746 0.316490 -0.586092 0.859517
299997 0.615511 -0.184149 -0.558669 0.181195 1.105553 -1.093144 0.085661 -0.041577 1.375836 1.277292 0.473008 -0.139618
299998 0.455336 0.239086 0.260717 0.286698 1.436444 -1.415904 -0.185080 -0.078193 1.205120 1.176818 -0.075637 -0.208505
299999 0.824300 -0.258526 -0.773821 -0.117626 1.036977 0.387557 0.160549 -0.938684 1.235622 -0.129031 0.613273 1.056310

300000 rows × 12 columns

The resulting phase space sample is a EventCollection of FourMomentumSequences for each particle in the final state. The to_pandas() method can be used to cast the EventCollection to a format that can be understood by pandas.DataFrame.

2.2 Generate intensity-based sample

‘Data samples’ are more complicated than phase space samples in that they represent the intensity profile resulting from a reaction. You therefore need a Function object that expresses an intensity distribution as well as a phase space over which to generate that distribution. We call such a data sample an intensity-based sample.

An intensity-based sample is generated with the function generate_data(). Its usage is similar to generate_phsp(), but now you have to provide a Function as well as a DataTransformer that is used to transform the four-momentum phase space sample to a data sample that can be understood by the Function.

Now, recall that in Step 1: Create amplitude model, we used the helicity formalism to mathematically express the reaction in terms of an amplitude model. TensorWaves needs to convert this HelicityModel to an Model object that it can then lambdify() to a Function object.

The HelicityModel was expressed in terms of sympy, so we express the model as a SympyModel and lambdify it to a LambdifiedFunction:

from tensorwaves.model import LambdifiedFunction, SympyModel

sympy_model = SympyModel(
    expression=model.expression,
    parameters=model.parameter_defaults,
)
intensity = LambdifiedFunction(sympy_model, backend="numpy")

A problem is that LambdifiedFunction takes a DataSample as input, not a set of four-momenta. We therefore need to construct a DataTransformer to transform these four-momenta to function variables. In this case, we work with the helicity formalism, so we construct a HelicityTransformer:

from tensorwaves.data.transform import HelicityTransformer

data_converter = HelicityTransformer(model.adapter)

That’s it, now we have enough info to create an intensity-based data sample. Notice how the structure of the output data is the same as the phase-space sample we generated previously:

from tensorwaves.data import generate_data

data_sample = generate_data(
    size=30_000,
    reaction_info=model.adapter.reaction_info,
    data_transformer=data_converter,
    intensity=intensity,
)
pd.DataFrame(data_sample.to_pandas())
0 1 2
E px py pz E px py pz E px py pz
0 1.027330 -0.207668 0.586297 0.817641 1.307621 0.415051 -1.168383 -0.392770 0.761949 -0.207383 0.582086 -0.424871
1 1.401233 -0.190281 1.339563 -0.364443 1.230524 -0.071035 -1.064803 0.597595 0.465143 0.261316 -0.274760 -0.233152
2 1.177840 -0.369484 -0.966886 0.562068 0.428137 -0.348863 -0.096765 0.184429 1.490923 0.718346 1.063650 -0.746497
3 0.851771 0.129568 0.580719 0.609501 1.421892 0.527849 -0.967737 -0.887929 0.823237 -0.657416 0.387018 0.278428
4 1.175464 1.066550 0.374763 0.322086 0.848054 -0.548519 0.555397 -0.302719 1.073382 -0.518031 -0.930161 -0.019368
... ... ... ... ... ... ... ... ... ... ... ... ...
29995 1.283927 1.041788 -0.109711 0.742368 1.235128 -1.199761 -0.093855 -0.243078 0.577845 0.157973 0.203566 -0.499291
29996 1.194523 -0.426192 -0.633642 -0.918556 0.643587 0.026303 -0.450713 0.438350 1.258790 0.399889 1.084355 0.480206
29997 1.417695 -1.287611 0.393785 0.443678 0.334659 0.193186 -0.032298 0.235401 1.344547 1.094424 -0.361487 -0.679079
29998 1.257184 0.754099 0.987514 -0.191471 0.896784 -0.647124 -0.357910 -0.489012 0.942932 -0.106975 -0.629604 0.680484
29999 1.129248 0.909690 0.657640 -0.123192 0.794472 0.253403 -0.725173 0.151253 1.173180 -1.163092 0.067534 -0.028061

30000 rows × 12 columns

2.3 Visualize kinematic variables

We now have a phase space sample and an intensity-based sample. Their data structure isn’t the most informative though: it’s just a collection of four-momentum tuples. But we can again use the HelicityTransformer to convert these four-momenta to (in the case of the helicity formalism) invariant masses and helicity angles:

phsp_set = data_converter.transform(phsp_sample)
data_set = data_converter.transform(data_sample)
list(data_set)
['phi_1+2',
 'theta_1+2',
 'phi_1,1+2',
 'theta_1,1+2',
 'm_012',
 'm_0',
 'm_1',
 'm_2',
 'm_12']

The DataSet is just a mapping of kinematic variables names to a sequence of values. The numbers you see here are final state IDs as defined in the HelicityModel member of the HelicityModel:

for state_id, particle in model.adapter.reaction_info.final_state.items():
    print(f"ID {state_id}:", particle.name)
ID 0: gamma
ID 1: pi0
ID 2: pi0

Just like EventCollection, the DataSet can easily be converted it to a pandas.DataFrame:

import numpy as np
import pandas as pd

data_frame = pd.DataFrame(data_set.to_pandas())
phsp_frame = pd.DataFrame(data_set.to_pandas())
data_frame
phi_1+2 theta_1+2 phi_1,1+2 theta_1,1+2 m_012 m_0 m_1 m_2 m_12
0 -1.230382 2.491272 3.140075 1.003617 3.0969 0.000000e+00+2.580957e-08j 0.134977 0.134977 1.796583
1 -1.429693 1.307684 -2.517218 0.964961 3.0969 2.580957e-08+0.000000e+00j 0.134977 0.134977 0.954898
2 1.205783 2.068265 1.772305 2.731266 3.0969 0.000000e+00+0.000000e+00j 0.134977 0.134977 1.515086
3 -1.790317 2.368234 1.534110 0.782936 3.0969 0.000000e+00+0.000000e+00j 0.134977 0.134977 2.077279
4 -2.803690 1.848354 -1.294963 1.766839 3.0969 0.000000e+00+2.107342e-08j 0.134977 0.134977 1.519935
... ... ... ... ... ... ... ... ... ...
29995 3.036669 2.187319 2.718148 1.019594 3.0969 0.000000e+00+0.000000e+00j 0.134977 0.134977 1.280000
29996 0.978693 0.693564 -2.685002 2.122102 3.0969 0.000000e+00+0.000000e+00j 0.134977 0.134977 1.480592
29997 -0.296793 1.889102 3.051759 2.413934 3.0969 0.000000e+00+0.000000e+00j 0.134977 0.134977 0.899929
29998 -2.222966 1.417900 -0.468913 1.608278 3.0969 0.000000e+00+0.000000e+00j 0.134977 0.134977 1.343147
29999 -2.515644 1.461487 1.740849 1.917895 3.0969 0.000000e+00+0.000000e+00j 0.134977 0.134977 1.611351

30000 rows × 9 columns

This also means that we can use all kinds of fancy plotting functionality of for instance matplotlib.pyplot to see what’s going on. Here’s an example:

from matplotlib import cm

reaction_info = model.adapter.reaction_info
intermediate_states = sorted(
    (
        p
        for p in model.particles
        if p not in reaction_info.final_state.values()
        and p not in reaction_info.initial_state.values()
    ),
    key=lambda p: p.mass,
)

evenly_spaced_interval = np.linspace(0, 1, len(intermediate_states))
colors = [cm.rainbow(x) for x in evenly_spaced_interval]
import matplotlib.pyplot as plt

data_frame["m_12"].hist(bins=100, alpha=0.5, density=True, figsize=(8, 4))
plt.xlabel("$m$ [GeV]")
for i, p in enumerate(intermediate_states):
    plt.axvline(x=p.mass, linestyle="dotted", label=p.name, color=colors[i])
plt.legend();
../_images/step2_26_0.svg

2.4 Export data sets

TensorWaves currently has no export functionality for data samples, so we just pickle.dump() these data samples as follows:

import pickle

with open("data_set.pickle", "wb") as stream:
    pickle.dump(data_set, stream)
with open("phsp_set.pickle", "wb") as stream:
    pickle.dump(phsp_set, stream)

In the next step, we illustrate how to optimize() the intensity model to these data samples.