Step 2: Generate data samples
Contents
Step 2: Generate data samples#
In this section, we will use the HelicityModel
that we created with ampform
in the previous step to generate a data sample via hit & miss Monte Carlo. We do this with the data
module.
First, we load()
the HelicityModel
that was created in the previous step:
import pickle
from ampform.helicity import HelicityModel
with open("helicity_model.pickle", "rb") as model_file:
model: HelicityModel = pickle.load(model_file)
2.1 Generate phase space sample#
The ReactionInfo
class defines the constraints of the phase space. As such, we have enough information to generate a phase-space sample for this particle reaction. We do this with the generate_phsp()
function. By default, this function uses TFPhaseSpaceGenerator
as a, well… phase-space generator (using tensorflow
and the phasespace
package as a back-end) and generates random numbers with TFUniformRealNumberGenerator
. You can use other generators with the arguments of generate_phsp()
.
import pandas as pd
from tensorwaves.data import generate_phsp
phsp_sample = generate_phsp(100_000, model.adapter.reaction_info)
pd.DataFrame(phsp_sample.to_pandas())
The resulting phase space sample is a EventCollection
of FourMomentumSequence
s for each particle in the final state. The to_pandas()
method can be used to cast the EventCollection
to a format that can be understood by pandas.DataFrame
.
2.2 Generate intensity-based sample#
‘Data samples’ are more complicated than phase space samples in that they represent the intensity profile resulting from a reaction. You therefore need a Function
object that expresses an intensity distribution as well as a phase space over which to generate that distribution. We call such a data sample an intensity-based sample.
An intensity-based sample is generated with the function generate_data()
. Its usage is similar to generate_phsp()
, but now you have to provide a Function
as well as a DataTransformer
that is used to transform the four-momentum phase space sample to a data sample that can be understood by the Function
.
Now, recall that in Step 1: Create amplitude model, we used the helicity formalism to mathematically express the reaction in terms of an amplitude model. TensorWaves needs to convert this HelicityModel
to an Model
object that it can then lambdify()
to a Function
object.
The HelicityModel
was expressed in terms of sympy
, so we express the model as a SympyModel
and lambdify it to a LambdifiedFunction
:
from tensorwaves.model import LambdifiedFunction, SympyModel # noqa: F401
sympy_model = SympyModel(
expression=model.expression,
parameters=model.parameter_defaults,
max_complexity=200,
)
%time intensity = LambdifiedFunction(sympy_model, backend="numpy")
A problem is that LambdifiedFunction
takes a DataSample
as input, not a set of four-momenta. We therefore need to construct a DataTransformer
to transform these four-momenta to function variables. In this case, we work with the helicity formalism, so we construct a HelicityTransformer
:
from tensorwaves.data.transform import HelicityTransformer
data_converter = HelicityTransformer(model.adapter)
That’s it, now we have enough info to create an intensity-based data sample. Notice how the structure of the output data is the same as the phase-space sample we generated previously:
from tensorwaves.data import generate_data
data_sample = generate_data(
size=10_000,
reaction_info=model.adapter.reaction_info,
data_transformer=data_converter,
intensity=intensity,
)
pd.DataFrame(data_sample.to_pandas())
2.3 Visualize kinematic variables#
We now have a phase space sample and an intensity-based sample. Their data structure isn’t the most informative though: it’s just a collection of four-momentum tuples. But we can again use the HelicityTransformer
to convert these four-momenta to (in the case of the helicity formalism) invariant masses and helicity angles:
phsp_set = data_converter.transform(phsp_sample)
data_set = data_converter.transform(data_sample)
list(data_set)
The DataSet
is just a mapping of kinematic variables names to a sequence of values. The numbers you see here are final state IDs as defined in the HelicityModel
member of the HelicityModel
:
for state_id, particle in model.adapter.reaction_info.final_state.items():
print(f"ID {state_id}:", particle.name)
Available kinematic variables
By default, tensorwaves
only generates invariant masses of the Topologies
that are of relevance to the decay problem. In this case, we only have resonances \(f_0 \to \pi^0\pi^0\). If you are interested in more invariant mass combinations, you can do so with the method register_topology()
.
Just like EventCollection
, the DataSet
can easily be converted it to a pandas.DataFrame
:
import numpy as np
import pandas as pd
data_frame = pd.DataFrame(data_set)
phsp_frame = pd.DataFrame(data_set)
data_frame
This also means that we can use all kinds of fancy plotting functionality of for instance matplotlib.pyplot
to see what’s going on. Here’s an example:
from matplotlib import cm
reaction_info = model.adapter.reaction_info
intermediate_states = sorted(
(
p
for p in model.particles
if p not in reaction_info.final_state.values()
and p not in reaction_info.initial_state.values()
),
key=lambda p: p.mass,
)
evenly_spaced_interval = np.linspace(0, 1, len(intermediate_states))
colors = [cm.rainbow(x) for x in evenly_spaced_interval]
import matplotlib.pyplot as plt
data_frame["m_12"].hist(bins=100, alpha=0.5, density=True, figsize=(8, 4))
plt.xlabel("$m$ [GeV]")
for i, p in enumerate(intermediate_states):
plt.axvline(x=p.mass, linestyle="dotted", label=p.name, color=colors[i])
plt.legend();
See also
2.4 Export data sets#
TensorWaves currently has no export functionality for data samples, so we just pickle.dump()
these data samples as follows:
import pickle
with open("data_set.pickle", "wb") as stream:
pickle.dump(data_set, stream)
with open("phsp_set.pickle", "wb") as stream:
pickle.dump(phsp_set, stream)
In the next step, we illustrate how to optimize()
the intensity model to these data samples.