In [None]:
%%capture
%config Completer.use_jedi = False
%config InlineBackend.figure_formats = ['svg']

# Install on Google Colab
import subprocess
import sys

install_packages = "google.colab" in str(get_ipython())
if install_packages:
    for package in ["tensorwaves", "graphviz"]:
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", package]
        )

# Step 2: Generate data samples

In this section, we will use the {class}`~ampform.helicity.HelicityModel` that we created with {mod}`ampform` in [the previous step](step1) to generate a data sample via hit & miss Monte Carlo. We do this with the {mod}`.data` module.

First, we {func}`~pickle.load` the {class}`~ampform.helicity.HelicityModel` that was created in the previous step:

In [None]:
import pickle

with open("helicity_model.pickle", "rb") as model_file:
    model = pickle.load(model_file)

In [None]:
reaction_info = model.adapter.reaction_info
initial_state = next(iter(reaction_info.initial_state.values()))
print("Initial state:")
print(" ", initial_state.name)
print("Final state:")
for i, p in reaction_info.final_state.items():
    print(f"  {i}: {p.name}")

## 2.1 Generate phase space sample

The {class}`~ampform.kinematics.ReactionInfo` class defines the constraints of the phase space. As such, we have enough information to generate a **phase-space sample** for this particle reaction. We do this with the {func}`.generate_phsp` function. By default, this function uses {class}`.TFPhaseSpaceGenerator` as phase-space generator (with {obj}`tensorflow <tf.Tensor>` and the [`phasespace`](https://phasespace.readthedocs.io) package as a back-end) and draws random numbers with {class}`.TFUniformRealNumberGenerator`. You can select other generators through the arguments of {func}`.generate_phsp`.

In [None]:
import pandas as pd
from tensorwaves.data import generate_phsp

phsp_sample = generate_phsp(300_000, model.adapter.reaction_info)
pd.DataFrame(phsp_sample.to_pandas())

The resulting phase space sample is a {class}`~ampform.data.EventCollection` of {class}`~ampform.data.FourMomentumSequence`s for each particle in the final state. The {meth}`~ampform.data.EventCollection.to_pandas` method can be used to cast the {class}`~ampform.data.EventCollection` to a format that can be understood by {class}`pandas.DataFrame`.
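
For intuition about what such a sample contains, here is a minimal sketch of a toy two-body phase-space generator in the rest frame of the decaying particle. It uses only the Python standard library and is not how {func}`.generate_phsp` is implemented. The function name `generate_two_body_phsp`, the masses, the plain `dict` keyed by final state ID, and the `(E, px, py, pz)` component order are all assumptions made for illustration.

```python
import math
import random


def generate_two_body_phsp(size, parent_mass, m1, m2, rng):
    """Toy two-body phase space in the parent rest frame (hypothetical helper).

    Returns a dict {final_state_id: [(E, px, py, pz), ...]}.
    """
    # In a two-body decay, the momentum magnitude is fixed by the masses.
    p = math.sqrt(
        (parent_mass**2 - (m1 + m2) ** 2) * (parent_mass**2 - (m1 - m2) ** 2)
    ) / (2 * parent_mass)
    sample = {1: [], 2: []}
    for _ in range(size):
        # Isotropic direction: uniform in cos(theta) and in phi.
        cos_theta = rng.uniform(-1.0, 1.0)
        sin_theta = math.sqrt(1.0 - cos_theta**2)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        px = p * sin_theta * math.cos(phi)
        py = p * sin_theta * math.sin(phi)
        pz = p * cos_theta
        # The two particles fly back to back with equal momentum magnitude.
        sample[1].append((math.sqrt(m1**2 + p**2), px, py, pz))
        sample[2].append((math.sqrt(m2**2 + p**2), -px, -py, -pz))
    return sample


# Illustrative masses in GeV (assumed values, not taken from the model).
toy_phsp = generate_two_body_phsp(
    5, parent_mass=1.0, m1=0.135, m2=0.135, rng=random.Random(0)
)
```

Each entry of `toy_phsp` is a sequence of four-momenta for one final state particle, and every event conserves energy and momentum by construction.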

## 2.2 Generate intensity-based sample

'Data samples' are more complicated than phase space samples in that they represent the intensity profile resulting from a reaction. You therefore need a {class}`.Function` object that expresses an intensity distribution as well as a phase space over which to generate that distribution. We call such a data sample an **intensity-based sample**.

An intensity-based sample is generated with the function {func}`.generate_data`. Its usage is similar to {func}`.generate_phsp`, but now you have to provide a {obj}`.Function` as well as a {obj}`.DataTransformer` that is used to transform the four-momentum phase space sample to a data sample that can be understood by the {obj}`.Function`.
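
The idea behind hit-and-miss generation can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the implementation of {func}`.generate_data`: the "phase space" is a uniform mass range, `transform` is a hypothetical stand-in for a {obj}`.DataTransformer`, and `intensity` is a made-up peak shape standing in for a {obj}`.Function`.

```python
import random


def transform(event):
    """Hypothetical stand-in for a data transformer: event -> function variables."""
    return {"m": event}


def intensity(variables):
    """Toy intensity: a Breit-Wigner-like peak with illustrative parameters."""
    m0, width = 1.0, 0.2
    return 1.0 / ((variables["m"] - m0) ** 2 + (width / 2) ** 2)


def hit_and_miss(size, rng):
    """Draw `size` events distributed according to `intensity`."""
    # The toy intensity peaks at m = 1.0, which gives an upper bound.
    max_intensity = intensity(transform(1.0))
    sample = []
    while len(sample) < size:
        event = rng.uniform(0.0, 2.0)  # one uniform "phase-space" event
        # Accept the event with probability intensity / max_intensity.
        if rng.uniform(0.0, max_intensity) < intensity(transform(event)):
            sample.append(event)
    return sample


toy_sample = hit_and_miss(2_000, random.Random(0))
```

Events near the peak are accepted far more often than events in the tails, so the accepted events follow the intensity profile even though the input events were distributed uniformly.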

Now, recall that in {doc}`step1`, we used the helicity formalism to mathematically express the reaction in terms of an amplitude model. TensorWaves needs to convert this {obj}`~ampform.helicity.HelicityModel` to a {class}`.Model` object that it can then {meth}`~.Model.lambdify` to a {obj}`.Function` object.

The {obj}`~ampform.helicity.HelicityModel` was expressed in terms of {mod}`sympy`, so we express the model as a {class}`.SympyModel` and lambdify it to a {class}`.LambdifiedFunction`:

In [None]:
from tensorwaves.model import LambdifiedFunction, SympyModel

sympy_model = SympyModel(
    expression=model.expression,
    parameters=model.parameter_defaults,
)
intensity = LambdifiedFunction(sympy_model, backend="numpy")

A problem is that {class}`.LambdifiedFunction` takes a {obj}`.DataSample` as input, not a set of four-momenta. We therefore need to construct a {class}`.DataTransformer` to transform these four-momenta to function variables. In this case, we work with the helicity formalism, so we construct a {class}`.HelicityTransformer`:

In [None]:
from tensorwaves.data.transform import HelicityTransformer

data_converter = HelicityTransformer(model.adapter)

That's it: we now have enough information to create an intensity-based data sample. Notice how the structure of the output data is the same as the {ref}`phase-space sample we generated previously <usage/step2:2.1 Generate phase space sample>`:

In [None]:
from tensorwaves.data import generate_data

data_sample = generate_data(
    size=30_000,
    reaction_info=model.adapter.reaction_info,
    data_transformer=data_converter,
    intensity=intensity,
)
pd.DataFrame(data_sample.to_pandas())

## 2.3 Visualize kinematic variables

We now have a phase space sample and an intensity-based sample. Their data structure isn't the most informative though: it's just a collection of four-momentum tuples. But we can again use the {class}`.HelicityTransformer` to convert these four-momenta to (in the case of the helicity formalism) invariant masses and helicity angles:

In [None]:
phsp_set = data_converter.transform(phsp_sample)
data_set = data_converter.transform(data_sample)
list(data_set)

The {obj}`~ampform.data.DataSet` is just a mapping of kinematic variable names to sequences of values. The numbers you see here are final state IDs as defined in the {class}`~ampform.kinematics.ReactionInfo` of the {class}`~ampform.helicity.HelicityModel`:

In [None]:
for state_id, particle in model.adapter.reaction_info.final_state.items():
    print(f"ID {state_id}:", particle.name)

````{admonition} Available kinematic variables
---
class: dropdown
---
By default, {mod}`tensorwaves` only generates invariant masses of the {class}`Topologies <qrules.topology.Topology>` that are of relevance to the decay problem. In this case, we only have resonances $f_0 \to \pi^0\pi^0$. If you are interested in more invariant mass combinations, you can register additional topologies with the method {meth}`~ampform.kinematics.HelicityAdapter.register_topology`.
````
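
As a reminder of what an invariant mass variable such as `m_12` represents, here is a minimal sketch that computes it from the summed four-momenta of two particles, using $m^2 = E^2 - |\vec{p}|^2$ in natural units. The helper `invariant_mass` and the `(E, px, py, pz)` tuple layout are assumptions for illustration and not part of {mod}`tensorwaves` or {mod}`ampform`.

```python
import math


def invariant_mass(*four_momenta):
    """Invariant mass of the summed four-momenta, each given as (E, px, py, pz)."""
    # Sum the four-momenta component by component.
    energy, px, py, pz = (sum(c) for c in zip(*four_momenta))
    mass_squared = energy**2 - (px**2 + py**2 + pz**2)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(mass_squared, 0.0))


# Two back-to-back massless particles with energy 0.5 each (made-up values).
p1 = (0.5, 0.0, 0.0, 0.5)
p2 = (0.5, 0.0, 0.0, -0.5)
m_12 = invariant_mass(p1, p2)
```

Here each particle on its own is massless, while the pair has an invariant mass equal to the total energy, because the momenta cancel.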

Just like {obj}`~ampform.data.EventCollection`, the {obj}`~ampform.data.DataSet` can easily be converted to a {class}`pandas.DataFrame`:

In [None]:
import numpy as np
import pandas as pd

data_frame = pd.DataFrame(data_set.to_pandas())
phsp_frame = pd.DataFrame(phsp_set.to_pandas())
data_frame

This also means that we can use the plotting functionality of, for instance, {mod}`matplotlib.pyplot` to see what's going on. Here's an example:

In [None]:
from matplotlib import cm

reaction_info = model.adapter.reaction_info
intermediate_states = sorted(
    (
        p
        for p in model.particles
        if p not in reaction_info.final_state.values()
        and p not in reaction_info.initial_state.values()
    ),
    key=lambda p: p.mass,
)

evenly_spaced_interval = np.linspace(0, 1, len(intermediate_states))
colors = [cm.rainbow(x) for x in evenly_spaced_interval]

In [None]:
import matplotlib.pyplot as plt

data_frame["m_12"].hist(bins=100, alpha=0.5, density=True, figsize=(8, 4))
plt.xlabel("$m$ [GeV]")
for i, p in enumerate(intermediate_states):
    plt.axvline(x=p.mass, linestyle="dotted", label=p.name, color=colors[i])
plt.legend();

:::{seealso}

{ref}`usage/step3:Intensity components`

:::

## 2.4 Export data sets

TensorWaves currently has no export functionality for data samples, so we just {func}`pickle.dump` these data samples as follows:

In [None]:
import pickle

with open("data_set.pickle", "wb") as stream:
    pickle.dump(data_set, stream)
with open("phsp_set.pickle", "wb") as stream:
    pickle.dump(phsp_set, stream)

In the {doc}`next step <step3>`, we illustrate how to {meth}`~.Minuit2.optimize` the intensity model with respect to these data samples.