Step 2: Generate data samples
In this section, we will use the HelicityModel that we created with ampform in the previous step to generate a data sample via hit & miss Monte Carlo. We do this with the data module.
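The hit & miss technique itself is worth seeing once in isolation. Below is a minimal, self-contained sketch with numpy: draw a candidate uniformly over the domain, draw a uniform "miss" value below the known maximum of the intensity, and keep the candidate only if the miss value falls under the intensity curve. The toy intensity function and its maximum are illustrative assumptions; the real sampling over a multi-dimensional phase space is handled internally by tensorwaves.

```python
import numpy as np


def intensity(x):
    # Toy 1D intensity with a narrow peak at x = 1 (illustration only)
    return 1.0 / ((x - 1.0) ** 2 + 0.01)


rng = np.random.default_rng(seed=0)
x_min, x_max = 0.0, 2.0
max_intensity = intensity(1.0)  # known maximum of this toy function


def hit_and_miss(size):
    samples = []
    while len(samples) < size:
        x = rng.uniform(x_min, x_max)  # candidate
        y = rng.uniform(0.0, max_intensity)  # "miss" value
        if y < intensity(x):  # "hit": keep the candidate
            samples.append(x)
    return np.array(samples)


sample = hit_and_miss(1_000)
```

The accepted samples follow the intensity distribution, which is why the same idea can turn a flat phase-space sample into an intensity-based one.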
First, we load() the HelicityModel that was created in the previous step:
import pickle
with open("helicity_model.pickle", "rb") as model_file:
model = pickle.load(model_file)
reaction_info = model.adapter.reaction_info
initial_state = next(iter(reaction_info.initial_state.values()))
print("Initial state:")
print(" ", initial_state.name)
print("Final state:")
for i, p in reaction_info.final_state.items():
print(f" {i}: {p.name}")
Initial state:
J/psi(1S)
Final state:
0: gamma
1: pi0
2: pi0
2.1 Generate phase space sample
The ReactionInfo class defines the constraints of the phase space. As such, we have enough information to generate a phase-space sample for this particle reaction. We do this with the generate_phsp() function. By default, this function uses TFPhaseSpaceGenerator as a, well… phase-space generator (using tensorflow and the phasespace package as a back-end) and generates random numbers with TFUniformRealNumberGenerator. You can use other generators through the arguments of generate_phsp().
In contrast to the Generate data sample section of the main usage example page, we will generate a deterministic data sample here. This can be done by constructing a UniformRealNumberGenerator with a specific seed and feeding that generator to the generate_phsp() function:
import numpy as np
import pandas as pd
from tensorwaves.data import TFUniformRealNumberGenerator, generate_phsp
rng = TFUniformRealNumberGenerator(seed=0)
reaction_info = model.adapter.reaction_info
initial_state_mass = reaction_info.initial_state[-1].mass
final_state_masses = {i: p.mass for i, p in reaction_info.final_state.items()}
phsp_sample = generate_phsp(
size=100_000,
initial_state_mass=initial_state_mass,
final_state_masses=final_state_masses,
random_generator=rng,
)
pd.DataFrame(
{
(k, label): np.transpose(v)[i]
for k, v in phsp_sample.items()
for i, label in enumerate(["E", "px", "py", "pz"])
}
)
|  | (0, E) | (0, px) | (0, py) | (0, pz) | (1, E) | (1, px) | (1, py) | (1, pz) | (2, E) | (2, px) | (2, py) | (2, pz) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.811317 | 0.787621 | 0.144327 | 0.130609 | 1.092959 | -0.019148 | -0.531113 | -0.945459 | 1.192624 | -0.768473 | 0.386786 | 0.814849 |
| 1 | 0.602303 | -0.343019 | -0.367933 | 0.331259 | 1.431153 | 1.173457 | -0.078558 | -0.804244 | 1.063444 | -0.830437 | 0.446491 | 0.472984 |
| 2 | 1.165704 | 0.115778 | 0.666757 | -0.949155 | 1.443427 | 0.192011 | -0.593016 | 1.294884 | 0.487770 | -0.307789 | -0.073741 | -0.345729 |
| 3 | 0.593783 | 0.392870 | 0.407210 | -0.180034 | 1.246726 | 0.037716 | 0.106090 | 1.234273 | 1.256391 | -0.430586 | -0.513300 | -1.054238 |
| 4 | 0.864272 | -0.668511 | 0.516082 | 0.183623 | 0.754404 | -0.716281 | -0.135528 | 0.139574 | 1.478224 | 1.384792 | -0.380554 | -0.323197 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 99995 | 0.472639 | 0.250787 | 0.395665 | 0.062791 | 1.126885 | 0.041148 | 0.795625 | 0.785454 | 1.497377 | -0.291935 | -1.191290 | -0.848245 |
| 99996 | 1.330392 | 1.185399 | 0.538998 | -0.272492 | 0.419241 | -0.033333 | 0.139318 | 0.370167 | 1.347267 | -1.152066 | -0.678316 | -0.097675 |
| 99997 | 0.864709 | 0.258491 | 0.294117 | -0.770973 | 1.444051 | -0.804658 | 0.065531 | 1.189662 | 0.788140 | 0.546167 | -0.359648 | -0.418689 |
| 99998 | 0.607279 | 0.563725 | 0.219317 | -0.053868 | 0.968601 | 0.899961 | 0.061514 | 0.325968 | 1.521020 | -1.463686 | -0.280831 | -0.272100 |
| 99999 | 1.148659 | -0.481566 | 0.715177 | 0.758969 | 1.291588 | 0.469209 | -1.158960 | -0.294342 | 0.656653 | 0.012357 | 0.443783 | -0.464627 |

100000 rows × 12 columns
The resulting phase space sample is a dict that maps final state IDs to an array of four-momenta. In the last step, we converted this sample in such a way that it is rendered as an understandable pandas.DataFrame.
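To see what these four-momentum arrays encode, here is a small, self-contained sketch of computing an invariant mass from them with plain numpy. The toy values are made up (two back-to-back photon-like momenta, so the combined invariant mass is 1.0 by construction); the array layout (E, px, py, pz) matches the columns above.

```python
import numpy as np

# Toy four-momentum arrays with shape (n_events, 4), ordered (E, px, py, pz).
# Two massless back-to-back particles: combined invariant mass is exactly 1.0.
p1 = np.array([[0.5, 0.0, 0.0, 0.5]])
p2 = np.array([[0.5, 0.0, 0.0, -0.5]])


def invariant_mass(*four_momenta):
    # Sum the four-momenta component-wise, then apply m^2 = E^2 - |p|^2
    e, px, py, pz = sum(four_momenta).T
    return np.sqrt(e**2 - px**2 - py**2 - pz**2)


m_12 = invariant_mass(p1, p2)
```

This is the same quantity (e.g. m_12) that the DataTransformer computes for us later on, event-wise over the whole sample.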
2.2 Generate intensity-based sample
‘Data samples’ are more complicated than phase space samples in that they represent the intensity profile resulting from a reaction. You therefore need a Function object that expresses an intensity distribution as well as a phase space over which to generate that distribution. We call such a data sample an intensity-based sample.
An intensity-based sample is generated with the function generate_data(). Its usage is similar to generate_phsp(), but now you have to provide a Function as well as a DataTransformer that is used to transform the four-momentum phase space sample to a data sample that can be understood by the Function.
Now, recall that in Step 1: Create amplitude model, we used the helicity formalism to mathematically express the reaction in terms of an amplitude model. TensorWaves needs to convert this HelicityModel to a Model object that it can then lambdify() to a Function object. The HelicityModel was expressed in terms of sympy, so we express the model as a SympyModel and lambdify it to a LambdifiedFunction:
from tensorwaves.model import LambdifiedFunction, SympyModel
sympy_model = SympyModel(
expression=model.expression.doit(),
parameters=model.parameter_defaults,
max_complexity=200,
)
%time intensity = LambdifiedFunction(sympy_model, backend="numpy")
CPU times: user 1.03 s, sys: 12.2 ms, total: 1.04 s
Wall time: 1.03 s
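The lambdification step is easiest to understand on a small scale. Below is a self-contained sketch of what sympy's lambdify() does: it turns a symbolic expression into a fast numerical function backed by numpy. The expression here is a toy stand-in, not the actual HelicityModel expression.

```python
import numpy as np
import sympy as sp

# Toy stand-in for an intensity expression: one variable x, one parameter a
x, a = sp.symbols("x a")
expression = a * sp.sin(x) ** 2

# lambdify() compiles the symbolic expression into a numpy-backed callable
intensity_func = sp.lambdify((x, a), expression, modules="numpy")

# The resulting function evaluates element-wise over arrays
values = intensity_func(np.array([0.0, np.pi / 2]), 2.0)  # → [0.0, 2.0]
```

LambdifiedFunction wraps this mechanism for the full amplitude model, evaluating it over entire data samples at once with the chosen backend.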
A problem is that LambdifiedFunction takes a DataSample as input, not a set of four-momenta. We therefore need to construct a DataTransformer to transform these four-momenta to function variables. In this case, we work with the helicity formalism, so we construct a HelicityTransformer:
from tensorwaves.data.transform import HelicityTransformer
data_converter = HelicityTransformer(model.adapter)
That’s it, now we have enough info to create an intensity-based data sample. Notice how the structure of the output data is the same as the phase-space sample we generated previously:
from tensorwaves.data import generate_data
data_sample = generate_data(
size=10_000,
initial_state_mass=initial_state_mass,
final_state_masses=final_state_masses,
data_transformer=data_converter,
intensity=intensity,
random_generator=rng,
)
pd.DataFrame(
{
(k, label): np.transpose(v)[i]
for k, v in data_sample.items()
for i, label in enumerate(["E", "px", "py", "pz"])
}
)
|  | (0, E) | (0, px) | (0, py) | (0, pz) | (1, E) | (1, px) | (1, py) | (1, pz) | (2, E) | (2, px) | (2, py) | (2, pz) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.185259 | 0.830812 | -0.168623 | 0.828345 | 1.238715 | -0.383917 | 0.626122 | -0.988319 | 0.672927 | -0.446895 | -0.457499 | 0.159974 |
| 1 | 1.494125 | -0.195880 | -1.307833 | -0.695422 | 0.824596 | 0.072240 | 0.798756 | 0.136049 | 0.778180 | 0.123640 | 0.509077 | 0.559374 |
| 2 | 1.187149 | 0.751684 | -0.359411 | -0.845646 | 0.862093 | 0.159626 | -0.139034 | 0.824727 | 1.047659 | -0.911310 | 0.498444 | 0.020920 |
| 3 | 1.326583 | -1.116556 | -0.650988 | -0.298899 | 1.416935 | 0.996462 | 0.954615 | 0.291996 | 0.353382 | 0.120094 | -0.303627 | 0.006904 |
| 4 | 1.144748 | -0.136758 | 0.110361 | 1.131178 | 0.762357 | 0.483725 | -0.565345 | -0.096772 | 1.189795 | -0.346967 | 0.454985 | -1.034407 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 1.191925 | -0.120250 | 0.389471 | 1.120062 | 0.461045 | -0.413191 | 0.011598 | 0.153240 | 1.443929 | 0.533441 | -0.401070 | -1.273302 |
| 9996 | 1.493273 | 1.011846 | 0.832871 | -0.715791 | 0.353850 | -0.265738 | 0.010368 | 0.190438 | 1.249778 | -0.746108 | -0.843239 | 0.525353 |
| 9997 | 0.930555 | -0.480388 | -0.190934 | -0.773760 | 1.092046 | -0.578604 | 0.204632 | 0.893134 | 1.074300 | 1.058992 | -0.013698 | -0.119374 |
| 9998 | 1.142834 | -0.684490 | 0.729063 | 0.553183 | 0.844822 | -0.400886 | -0.592552 | -0.428578 | 1.109243 | 1.085375 | -0.136511 | -0.124606 |
| 9999 | 1.074710 | -0.842913 | -0.582449 | -0.324426 | 0.752693 | -0.024504 | 0.454963 | -0.583726 | 1.269497 | 0.867417 | 0.127486 | 0.908152 |

10000 rows × 12 columns
As before, we use a UniformRealNumberGenerator with a specific seed to ensure we get a deterministic data sample.
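The principle behind seeded determinism can be illustrated independently of tensorwaves. A minimal sketch with numpy's Generator (an assumption for illustration; tensorwaves uses its own TFUniformRealNumberGenerator, but the contract is the same): identical seeds reproduce the identical number stream, while a different seed produces a different one.

```python
import numpy as np


def generate(seed, size=5):
    # A fresh generator per call, so the stream depends only on the seed
    rng = np.random.default_rng(seed)
    return rng.uniform(size=size)


sample_a = generate(seed=0)
sample_b = generate(seed=0)  # identical to sample_a
sample_c = generate(seed=1)  # different stream
```

This is why re-running this notebook with the same seed reproduces the exact same data sample.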
2.3 Visualize kinematic variables
We now have a phase space sample and an intensity-based sample. Their data structure isn’t the most informative though: it’s just a collection of four-momentum tuples. But we can again use the HelicityTransformer to convert these four-momenta to (in the case of the helicity formalism) invariant masses and helicity angles:
phsp_set = data_converter.transform(phsp_sample)
data_set = data_converter.transform(data_sample)
list(data_set)
['phi_1+2',
'theta_1+2',
'phi_1,1+2',
'theta_1,1+2',
'm_012',
'm_0',
'm_1',
'm_2',
'm_12']
The DataSet is just a mapping of kinematic variable names to sequences of values. The numbers you see in these names are final state IDs as defined in the adapter of the HelicityModel:
for (
state_id,
particle,
) in model.adapter.reaction_info.final_state.items():
print(f"ID {state_id}:", particle.name)
ID 0: gamma
ID 1: pi0
ID 2: pi0
Available kinematic variables
By default, tensorwaves only generates invariant masses for the Topologies that are of relevance to the decay problem. In this case, we only have resonances \(f_0 \to \pi^0\pi^0\). If you are interested in more invariant mass combinations, you can register them with the method register_topology().
The DataSet can easily be converted to a pandas.DataFrame:
import pandas as pd
data_frame = pd.DataFrame(data_set)
phsp_frame = pd.DataFrame(phsp_set)
data_frame
data_frame
|  | phi_1+2 | theta_1+2 | phi_1,1+2 | theta_1,1+2 | m_012 | m_0 | m_1 | m_2 | m_12 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.941350 | 2.344617 | -0.984419 | 1.064114 | 3.0969 | 0.000000e+00+0.000000e+00j | 0.134977 | 0.134977 | 1.499845 |
| 1 | 1.422127 | 1.086667 | 0.183725 | 1.535691 | 3.0969 | 0.000000e+00+0.000000e+00j | 0.134977 | 0.134977 | 0.580070 |
| 2 | 2.695585 | 0.777978 | 3.063622 | 1.730394 | 3.0969 | 0.000000e+00+0.000000e+00j | 0.134977 | 0.134977 | 1.495937 |
| 3 | 0.527850 | 1.343530 | 1.515685 | 0.602596 | 3.0969 | 0.000000e+00+2.580957e-08j | 0.134977 | 0.134977 | 1.172263 |
| 4 | -0.678981 | 2.987470 | -2.951556 | 1.959462 | 3.0969 | 2.107342e-08+0.000000e+00j | 0.134977 | 0.134977 | 1.581282 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | -1.271331 | 2.792571 | -1.387495 | 2.565453 | 3.0969 | 0.000000e+00+0.000000e+00j | 0.134977 | 0.134977 | 1.486016 |
| 9996 | -2.452912 | 1.070889 | -1.957086 | 2.313677 | 3.0969 | 0.000000e+00+0.000000e+00j | 0.134977 | 0.134977 | 0.584599 |
| 9997 | 0.378314 | 0.588987 | 2.711496 | 1.551541 | 3.0969 | 1.490116e-08+0.000000e+00j | 0.134977 | 0.134977 | 1.956302 |
| 9998 | -0.816920 | 2.076068 | -1.166315 | 1.807813 | 3.0969 | 0.000000e+00+0.000000e+00j | 0.134977 | 0.134977 | 1.585024 |
| 9999 | 0.604657 | 1.264140 | 0.553347 | 2.079405 | 3.0969 | 0.000000e+00+0.000000e+00j | 0.134977 | 0.134977 | 1.712966 |

10000 rows × 9 columns
This also means that we can use all kinds of fancy plotting functionality of, for instance, matplotlib.pyplot to see what’s going on. Here’s an example:
from matplotlib import cm
reaction_info = model.adapter.reaction_info
intermediate_states = sorted(
(
p
for p in model.particles
if p not in reaction_info.final_state.values()
and p not in reaction_info.initial_state.values()
),
key=lambda p: p.mass,
)
evenly_spaced_interval = np.linspace(0, 1, len(intermediate_states))
colors = [cm.rainbow(x) for x in evenly_spaced_interval]
import matplotlib.pyplot as plt
data_frame["m_12"].hist(bins=100, alpha=0.5, density=True, figsize=(9, 4))
plt.xlabel("$m$ [GeV]")
for i, p in enumerate(intermediate_states):
plt.axvline(x=p.mass, linestyle="dotted", label=p.name, color=colors[i])
plt.legend();
2.4 Export data sets
TensorWaves currently has no export functionality for data samples, so we just pickle.dump() these data samples as follows:
import pickle
with open("data_set.pickle", "wb") as stream:
pickle.dump(data_set, stream)
with open("phsp_set.pickle", "wb") as stream:
pickle.dump(phsp_set, stream)
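Since the data sets are plain mappings of variable names to arrays, a pickle round-trip preserves them exactly. A quick self-contained sanity check (the toy dict below stands in for the real DataSet):

```python
import pickle

import numpy as np

# Stand-in for the real DataSet: a mapping of variable names to arrays
toy_data_set = {"m_12": np.array([1.50, 0.58, 1.49])}

# Serialize to bytes and load back, mimicking a dump/load to file
serialized = pickle.dumps(toy_data_set)
restored = pickle.loads(serialized)
```

The restored object compares equal to the original, so nothing is lost when reloading these files in the next step.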
In the next step, we illustrate how to optimize() the intensity model against these data samples.