## Creating and fitting cells
As set out in Kerr, we create "cells" by binning in time to form light curves. For each cell, we determine its likelihood function, which is then optimized to estimate the relative signal rate for that duration.
Eqn. 2 of Kerr presents the log likelihood as a function of the incremental relative flux $\alpha$ (with $\beta=0$). The log likelihood for a cell is
$$ \displaystyle\log\mathcal{L}(\alpha)\ = \sum_w \log \big( 1 + \alpha\ w \big) - \alpha\ S , \tag{1} $$
where the sum is over the photon weights $w$ for that cell, and $S$ is the expected value of the sum of weights, determined from the total sum and the cell's fraction of the energy-weighted exposure,
$$ S = \frac{f_{\mathrm{cell}}}{ f_{\mathrm{total}}}\ \sum w, $$ where $f$ represents the energy-weighted exposure.
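For concreteness, here is a minimal sketch of Eq. (1); the weight array `w` and the helper name `cell_loglike` are illustrative, not part of the package:

```python
import numpy as np

def cell_loglike(alpha, w, S):
    """Eq. (1): log L(alpha) = sum_w log(1 + alpha*w) - alpha*S."""
    return np.sum(np.log(1 + alpha * w)) - alpha * S

# toy cell: 100 weights; if the cell is at the mean flux, S equals the weight sum
w = np.random.default_rng(0).uniform(0, 1, 100)
S = w.sum()
print(cell_loglike(0.0, w, S))  # 0.0: alpha=0 is the no-deviation point
```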
Fermi data are broken into distinct runs, each corresponding to at most a single orbit. The effective area, or $\frac{df}{dt}$, varies by a factor of 2 or 3 during each 90-min orbit and is typically $3000\ \mathrm{cm}^2$, or $0.3\ \mathrm{m^2}$; it is measured in 30-s intervals. The exposure is the sum, over each interval contained in the cell, of the effective area in the direction of the source times the livetime. Since the effective area depends on energy, the Kerr strategy is to perform a weighted average with an assumed spectrum. This can be improved on; see below.
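A sketch of this exposure calculation, assuming hypothetical per-interval arrays (`aeff`, per energy band, and `livetime`) and a set of normalized spectral weights:

```python
import numpy as np

def cell_exposure(aeff, livetime, spectral_weights):
    """Sum over the cell's 30-s intervals of (spectrum-weighted effective area) x livetime.
    aeff: (n_intervals, n_bands) effective area toward the source, cm^2
    livetime: (n_intervals,) seconds; spectral_weights: (n_bands,), normalized."""
    return float(np.sum(livetime * (aeff @ spectral_weights)))
```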
The value $\hat{\alpha}$ that maximizes the likelihood for the cell is the solution to
$$ \sum_{w} \frac{w}{1+\hat{\alpha}\ w} = S, $$
so that $\hat{\alpha}=0$ corresponds to $\sum w = S$, as expected when the cell's flux equals the average.
The Hessian, or inverse variance, is
$$ -\frac{d^2\ \log(\mathcal{L})}{d\ \alpha^2} = \sum_{w} \frac{w^2}{(1+\hat{\alpha}\ w)^2}$$
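A numerical sketch of this fit, using a root finder on the gradient (the function name and the bracketing interval are assumptions chosen to cover typical light-curve values):

```python
import numpy as np
from scipy.optimize import brentq

def fit_alpha(w, S):
    """Solve sum(w/(1 + alpha*w)) = S for alpha-hat; return (alpha-hat, inverse variance)."""
    grad = lambda a: np.sum(w / (1 + a * w)) - S
    lo = -1 / w.max() + 1e-9          # log L is defined only for alpha > -1/max(w)
    alpha_hat = brentq(grad, lo, 100.0)
    inv_var = np.sum(w**2 / (1 + alpha_hat * w)**2)  # the Hessian above
    return alpha_hat, inv_var
```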
### Approximating the log likelihood when $\alpha\, w$ is small
Kerr uses this for the frequency derivation. Let $W=\sum w$ and $U=\sum w^2$. Then the cell's log likelihood is
$$ \displaystyle\log\mathcal{L}(\alpha)\ \approx \alpha\ \ (W-S) - \tfrac{1}{2} \alpha^2\ U , \tag{2}$$
and the maximum likelihood solution is analytic: $\hat{\alpha} = (W-S)/U$ with the inverse variance simply $U$.
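In this approximation the fit needs only two sums over the cell; a minimal sketch (names are illustrative):

```python
def fit_alpha_quadratic(w, S):
    """Closed-form fit from Eq. (2): alpha-hat = (W - S)/U, inverse variance U."""
    W = w.sum()
    U = (w ** 2).sum()
    return (W - S) / U, U
```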
### Accounting for spectral dependence
The development above does not consider energy. The sum over weights includes photons of all energies, and the corresponding exposure uses a weighted average of the energy-dependent effective area. The Kerr application assumes a power-law spectral shape for this weighting; since we have the actual spectrum available, we can use it instead.
We bin energy into four bands per decade. This modification breaks a cell up into eight sub-cells sharing the same $\alpha$ value.
Also, the development above determines the count flux per cell, that is, it simply counts photons. An improvement measures the energy flux as well, or perhaps a spectral-index correction to the assumed spectrum.
Each photon has a band index, indicating its conversion type, Front or Back, and its energy band, one of 16 from 100 MeV to 1 TeV (although we use only 8 bands, up to 10 GeV). To use this information, we need the corresponding exposure.
Currently the exposure `DataFrame` has, for each 30-s time interval, the exposure as calculated for the assumed spectrum. This is extended to provide the exposure per energy band.
Let $l$ be the energy band index; since the bands are logarithmically spaced, it plays the role of a logarithmic energy variable, so its coefficient acts as a spectral-index correction. Replace $\alpha$ with $\alpha+\zeta l$ in Eq. (2), sum over bands, and set the derivative with respect to $\zeta$ to zero. The solution for $\zeta$ is
$$\zeta = \frac{\sum_l l\,(W_l-S_l-\alpha\, U_l)}{\sum_l l^2\, U_l}. $$
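A sketch of this solution, given per-band arrays $W_l$, $S_l$, $U_l$ and a fitted $\alpha$ (the function name and the band-index convention are assumptions):

```python
import numpy as np

def fit_zeta(W, S, U, alpha):
    """zeta = sum_l l*(W_l - S_l - alpha*U_l) / sum_l l^2 * U_l, from the quadratic model."""
    l = np.arange(len(W))  # band indices 0..7, assumed convention
    return np.sum(l * (W - S - alpha * U)) / np.sum(l**2 * U)
```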
## Implementation
The class `CellData` is created with the list of photons and exposure history for the source, and with a `time_bins` argument to describe the binning; it generates a list, accessible via the property `cells`, with the cell information needed to calculate the likelihood function. Its `view` member function is used to create a new binning, by making a copy of the `CellData` object with the new binning.
The default binning is sequential, with start, stop, and step values. Units for start and stop are MJD, so step is in days. The convention for interpreting the three numbers (0, 0, 7), the default, is all weeks from the start of data taking. This is implemented by the function `time_bin_edges`.
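A minimal sketch of that convention, with hypothetical data limits `tstart` and `tstop` in MJD:

```python
import numpy as np

def time_bin_edges_sketch(tstart, tstop, bins=(0, 0, 7)):
    """Interpret (start, stop, step): zero start/stop means use the data limits;
    step is the bin width in days, so the default (0, 0, 7) gives weekly bins."""
    start, stop, step = bins
    start = start if start > 0 else tstart
    stop = stop if stop > 0 else tstop
    return np.arange(start, stop + step, step)
```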
The function `partition_cells` is used to create a set of unequal-sized cells from a Bayesian Blocks (BB) partition.
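A usage sketch based on the interface described above (the argument conventions here are assumed, not checked against the source):

```python
# weekly cells, then a daily re-binning of the same data via view()
weekly = CellData(source, config=Config(), time_bins=(0, 0, 7))
daily = weekly.view((0, 0, 1))  # copy of the CellData object with the new binning
print(len(weekly.cells), len(daily.cells))
```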
cells = None  # global CellData object used below

@ipynb_doc
def load_cell_data(source_name='Geminga', weeks=(9,12), **kwargs):
    """
    ## Study cell formation
    Set `config.full_exp=True` to get the energy-dependent exposure. The code that does this is in `exposure.sc_data_selection`.
    Load the first 4 weeks.
    ### Load data
    {out1}
    {fig}
    {out2}
    """
    global cells
    with capture_hide(f'setup: load {weeks[1]-weeks[0]+1} weeks of data for source "{source_name}"') as out1:
        source = PointSource(source_name)
        cells = CellData(source, config=Config(**kwargs), week_range=weeks, key=None)
    fig, ax = plt.subplots(figsize=(3,3))
    cells.plot_concatenated(ax=ax, title=source_name)
    with capture_show('Parameters from Poisson fit to full data set') as out2:
        L = cells.full_likelihood()
        pr = PoissonRep(L)
        print(pd.Series(pr.info()))
    return locals()

if Config().valid:
    load_cell_data(use_kerr=False, full_exp=True, verbose=2)
@ipynb_doc
def check_energy_exposure():
    """ ### Exposure vs. energy
    With the {name} data,
    {out1}
    {out2}
    Note the `exp_fract` values, which are the fraction of the total (`exp`) in each band.
    {fig}
    """
    df = cells.exposure
    name = cells.source.name
    with capture_show('Exposure DataFrame from "df.iloc[0]"') as out1:
        print(df.iloc[0])
    with capture_hide('Exposure DataFrame info') as out2:
        print(df.info())
    # generate an array of the 8 per-band exposures for each 30-s interval
    t = df.apply(lambda r: np.array(r.exp * np.array(r.exp_fract), float), axis=1).values
    u = np.vstack(t)
    fig, (ax, ax2) = plt.subplots(1, 2, figsize=(10,4), sharex=True)
    plt.subplots_adjust(top=0.9, wspace=0.3)
    fig.suptitle('Exposure vs. Energy')
    ax.plot(u.sum(axis=0), 'o')
    ax.grid(alpha=0.5)
    ax.set(xlabel='Energy band', ylabel='Total Exposure')
    ax2.violinplot(u, np.arange(8), showmeans=True)
    ax2.grid(alpha=0.5)
    ax2.set(ylabel='Exposure per interval', xlabel='Energy band index')
    return locals()

if Config().valid:
    check_energy_exposure()
@ipynb_doc
def flux_energy():
    """
    ### Weight sum vs. energy
    The sum of weights is proportional to the flux.
    Examine the photon energy dependence for {nickname}.
    {out1}
    {fig1}
    ### Check flux per energy band
    Make the ratio of the sum of weights to exposure, a measure of the flux relative to the spectrum.
    {out2}
    {fig2}
    Looks pretty good.
    """
    df = cells.photons.copy()
    name, nickname = cells.source.name, cells.source.nickname
    with capture_hide('Photon dataframe info') as out1:
        print(df.info())
    df.loc[:,'eindex'] = (df.band.values//2).astype(int)
    # get the moments of the weight per energy band
    gb = df.groupby('eindex').agg(
        wcount=('weight', len),
        wsum=('weight', np.sum),
        wsumsq=('weight', lambda x: np.sum(x**2)),
    )
    gb.loc[:,'unc'] = gb.apply(lambda r: np.sqrt(r.wsumsq)/r.wsum, axis=1)
    fig1, ax1 = plt.subplots(1, 1, figsize=(6,4))
    ax1.plot(gb.wsum.values, 'o', label='weight sum')
    ax1.plot(gb.wcount, 'D', label='counts')
    ax1.set(xlabel='Energy band index', yscale='log')
    ax1.legend(); ax1.grid(alpha=0.5)
    # get exposure vs energy
    t = cells.exposure.apply(lambda r: np.array(r.exp * np.array(r.exp_fract), float), axis=1).values
    ee = np.vstack(t).sum(axis=0) * 1e-6
    with capture_hide('Table of weights and total exposure per band') as out2:
        dfx = pd.DataFrame(dict(wtsum=gb.wsum.round(1), exp=ee.round(1), ratio=(gb.wsum/ee).round(2)))
        print(dfx)
    fig2, ax2 = plt.subplots(1, 1)
    ax2.set(ylim=(0,5), xlabel='Energy band index', ylabel='flux/exposure')
    ax2.grid(alpha=0.5)
    ratio = gb.wsum/ee
    ax2.errorbar(x=range(len(ratio)), y=ratio, yerr=ratio*gb.unc, fmt='o', ms=10)
    return locals()

if Config().valid:
    flux_energy()