Create cells from source data and define the likelihood

Creating and fitting cells

As set out in Kerr, we create "cells" by binning in time to form light curves. For each cell, we determine its likelihood function, which is then optimized to estimate the relative signal rate for that duration.

Eqn. 2 of Kerr presents the log likelihood as a function of the incremental relative flux $\alpha$ (with $\beta=0$). The log likelihood for a cell is

$$ \displaystyle\log\mathcal{L}(\alpha)\ = \sum_w \log \big( 1 + \alpha\ w \big) - \alpha\ S \tag{1} , $$

where the sum is over the photon weights $w$ for that cell, and $S$ is the expected value of the sum of the weights, determined from the total sum and the fraction of the energy-weighted exposure in the cell,

$$ S = \frac{f_{cell}}{ f_{total}}\ \sum w, $$ where $f$ represents the energy-weighted exposure.

Fermi data is broken into distinct runs, each corresponding to at most a single orbit. The effective area, or $\frac{df}{dt}$, varies by a factor of 2 or 3 during each 90-min orbit and is typically $3000\ \mathrm{cm}^2$, or $0.3\ \mathrm{m^2}$; it is measured in 30-s intervals. The exposure is the sum, over the intervals contained in the cell, of the effective area in the direction of the source times the livetime. Since the effective area depends on energy, the Kerr strategy is to perform a weighted average with an assumed spectrum. This can be improved on; see below.
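
As a concrete sketch of the exposure sum for one cell, assuming per-interval arrays aeff (spectrum-weighted effective area toward the source, in cm^2) and livetime (in s) for the 30-s intervals contained in the cell; all names and values here are illustrative only:

import numpy as np

# hypothetical per-interval quantities for the 30-s intervals in one cell
aeff = np.array([2800., 3100., 3300., 2950.])   # cm^2, toward the source
livetime = np.array([30., 30., 28.5, 30.])      # s

# exposure for the cell: effective area times livetime, summed (cm^2 s)
exposure = np.sum(aeff * livetime)
print(f'cell exposure: {exposure:.3g} cm^2 s')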

The value of $\alpha$ that maximizes the likelihood for the cell, $\hat{\alpha}$, is the solution to

$$ \sum_{w} \frac{w}{1+\hat{\alpha}\ w} = S $$

Thus $\hat{\alpha}=0$ corresponds to $\sum w= S$, as expected when the cell's flux equals the average.

The negative second derivative, or inverse variance, is
$$ -\frac{d^2\ \log(\mathcal{L})}{d\ \alpha^2} = \sum_{w} \frac{w^2}{(1+\hat{\alpha}\ w)^2}$$

Approximate the log if $\alpha\, w$ is small

Kerr uses this for the frequency derivation. Expanding $\log(1+\alpha\, w) \approx \alpha\, w - \tfrac{1}{2}\alpha^2 w^2$, and letting $W=\sum w$ and $U=\sum w^2$, the cell's log likelihood is

$$ \displaystyle\log\mathcal{L}(\alpha)\ \approx \alpha\ \ (W-S) - \tfrac{1}{2} \alpha^2\ U , \tag{2}$$

and the maximum likelihood solution is analytic: $\hat{\alpha} = (W-S)/U$ with the inverse variance simply $U$.
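
The following sketch, using made-up weights, compares the exact maximum of Eqn. (1), found by solving the score equation with scipy.optimize.brentq, with the analytic approximation $(W-S)/U$ of Eqn. (2); all names and values are illustrative:

import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(42)
w = rng.uniform(0.1, 0.9, 200)    # made-up photon weights for one cell
S = 1.1 * w.sum()                 # expected weight sum, here 10% above observed

def score(alpha):
    # derivative of Eqn. (1): sum of w/(1 + alpha*w), minus S
    return np.sum(w / (1 + alpha * w)) - S

alpha_hat = brentq(score, -0.99, 10)      # exact solution
W, U = w.sum(), np.sum(w**2)
alpha_approx = (W - S) / U                # Eqn. (2) solution, variance 1/U
print(f'exact {alpha_hat:.4f}, approx {alpha_approx:.4f}, sigma {U**-0.5:.4f}')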

Accounting for spectral dependence

The development above does not consider energy. The sum over weights includes photons of all energies, and the corresponding exposure uses a weighted average of the energy-dependent effective area. The Kerr application uses a power-law spectral shape for this; since we have the actual spectrum available, we can use it instead.

We bin energy into four bands per decade. This modification breaks up a cell into eight sub-cells with the same $\alpha$ value.

Also, the development above determines the count flux per cell, that is, it simply counts photons. An improvement would measure the energy flux as well, or perhaps fit a spectral-index correction factor to the assumed spectrum.

Each photon has a band index, indicating its conversion type, Front or Back, and its energy band, one of 16 from 100 MeV to 1 TeV (although we use only 8 bands, up to 10 GeV). To use this information, we need the corresponding exposure.

Currently the exposure DataFrame has, for each 30-s time interval, the exposure as calculated for the assumed spectrum. This is extended to provide the exposure per energy band.
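
The per-band exposure for each interval is then just the product of the total and the band fractions (the exp_fract column shown in check_energy_exposure below). A minimal sketch with a toy version of the DataFrame:

import numpy as np
import pandas as pd

# toy exposure DataFrame: total exposure per 30-s interval, plus the
# fraction of it falling into each of the 8 energy bands
df = pd.DataFrame(dict(
    exp = [1.1e5, 0.9e5],
    exp_fract = [[0.11, 0.14, 0.24, 0.21, 0.16, 0.09, 0.04, 0.01],
                 [0.10, 0.15, 0.25, 0.20, 0.16, 0.09, 0.04, 0.01]],
))

# per-band exposure: one row per interval, one column per band
per_band = np.vstack(df.apply(lambda r: r.exp * np.array(r.exp_fract), axis=1).values)
print(per_band.sum(axis=0))    # total exposure per band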

Let $l$ be the energy band index; since the bands are logarithmically spaced, a linear term in $l$ represents a logarithmic derivative with respect to energy. Replace $\alpha$ with $\alpha+\zeta\, l$ in Eqn. (2) and set the derivative with respect to $\zeta$ to zero; the solution for $\zeta$ is

$$\zeta = \frac{\sum_l l\,(W_l-S_l-\alpha\, U_l)}{\sum_l l^2\, U_l}, $$

where $W_l$, $S_l$, and $U_l$ are the per-band versions of the sums defined above.
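
A sketch of this estimate, given per-band sums $W_l$, $S_l$, $U_l$ and a count-flux estimate $\alpha$ for a cell; the arrays below are made up for illustration:

import numpy as np

l  = np.arange(8)                 # energy band index
Wl = np.array([460., 580., 1120., 980., 810., 450., 180., 60.])  # per-band weight sums
Sl = 0.95 * Wl                    # per-band expected weight sums
Ul = 0.6 * Wl                     # per-band sums of w^2
alpha = 0.02                      # fitted count-flux deviation for the cell

# stationary point of the quadratic log likelihood with alpha -> alpha + zeta*l
zeta = np.sum(l * (Wl - Sl - alpha * Ul)) / np.sum(l**2 * Ul)
print(f'zeta = {zeta:.4f}')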

Implementation

The class CellData is created with the list of photons and the exposure history for the source. Its time_bins argument describes the binning; construction generates a list, accessible via the property cells, with the cell information needed to calculate the likelihood function.

Its view member function creates a new binning by making a copy of the CellData object with that binning.

The default binning is sequential, with start, stop, and step values. Units for start and stop are MJD, so step is in days. The convention for interpreting the three numbers (0,0,7), the default, is all weeks from the start of data taking. This is implemented by the function time_bin_edges.
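
A simplified stand-alone illustration of this convention (handling only the offset cases, not the MJD or contiguous-bin options documented below); the function and the nominal limits are hypothetical:

import numpy as np

def simple_time_bins(a=0, b=0, d=7, nominal_start=54683., nominal_stop=58000.):
    # toy version of the (start, stop, step) convention, not the real time_bin_edges
    start = nominal_start + a                         # a: offset from exposure start
    stop = start + b if b > 0 else nominal_stop + b   # b>0: increment from start
    return np.arange(start, stop + d, d)              # d: bin size in days

print(simple_time_bins()[:4])   # default (0,0,7): weekly bins from the start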

The function partition_cells is used to create a set of unequal-sized cells from a BB partition.

Special phased bins

To study long-term (many-day) variations in the flux, perhaps due to systematics in the exposure, we implement a folded-binning version, specified by a start, a period, and the number of cells.
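
A sketch of how the folding might work, under the assumption that each time is assigned a phase of (time - start) modulo the period; the function and the period value are hypothetical:

import numpy as np

def phased_bin_index(t, start=54683., period=50., nbins=25):
    # assign times t (MJD) to one of nbins phase bins of the given period
    phase = ((np.asarray(t) - start) / period) % 1.0
    return np.floor(phase * nbins).astype(int)

print(phased_bin_index([54683., 54708., 54732.9]))   # -> [ 0 12 24]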

time_bin_edges[source]

time_bin_edges(config, exposure, tbin=None)

Return an interleaved array of time-bin, or cell, start/stop values.

  • exposure -- the weighted exposure table derived from the spacecraft info and the source. Used only to establish the nominal start/stop.
  • tbin -- an array (a,b,d), default config.time_bins, to specify the binning.

    Interpretation of a, b, d:

    a: if > 50000, interpret as MJD for the start;
       if < 0, offset back from the stop;
       otherwise, offset from the exposure start.

    b: if > 50000, interpret as MJD for the stop;
       if > 0, increment from the start;
       otherwise, offset from the exposure stop.

    d: if positive, the bin size in days;
       if 0, return contiguous bins.

contiguous_bins[source]

contiguous_bins(exposure, min_gap=20, min_duration=600)

Return a start/stop interleaved array for contiguous intervals.

class CellData[source]

CellData(*pars, **kwargs) :: SourceData

Manage a set of cells generated from a data set

Invoke superclass to load photon data and exposure for the source.

  • time_bins -- default config.time_bins. If specified, set week_range and bypass the cache.

The cell entry e is the weighted exposure for the cell in units $cm^2\ Ms$.

CellData.view[source]

CellData.view(*pars, exp_min=None, no_update=False)

Return a "view", a copy of this instance with a perhaps a different set of cells

  • pars -- start, stop, step to define the new binning; or start, step; or just step. start and stop are either MJD values or offsets from the exposure start or stop; step is the cell size in days, or if zero, orbit-based binning.

  • exp_min [None] -- if specified, a different minimum exposure, in cm^2 Ms units, to use for fitting.

  • no_update -- avoid fitting the cells; used when invoked by LightCurve or WtLike.
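
For example (a usage sketch; cells is a CellData instance such as the one constructed in load_cell_data below):

daily = cells.view(0, 0, 1)     # rebin to 1-day cells over the full range
orbits = cells.view(0, 0, 0)    # step=0: orbit-based binning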

concatenate_cells[source]

concatenate_cells(cells)

Combine a group of cells to one

  • cells -- a DataFrame with cells containing n, w, S, B. Optionally, if t is present, generate t and tw.

    Return a dict with summed n, S, B, and concatenated w.

partition_cells[source]

partition_cells(config, cells, edges)

Partition a set of cells

  • cells -- A DataFrame of cells
  • edges -- a list of edge times delimiting boundaries between cells

Returns a DataFrame of combined cells, with times and widths adjusted to account for missing cells

cells = None # global CellData object used below

@ipynb_doc
def load_cell_data(source_name='Geminga', weeks=(9,12), **kwargs):
    """
    ## Study cell formation

    Set `config.full_exp=True` to get the energy-dependent exposure. Code that does this is in `exposure.sc_data_selection`.

    Load first 4 weeks.

    ### Load data
    {out1}
    {fig}
    {out2}
    """
    global cells
    
    with capture_hide(f'setup: load {weeks[1]-weeks[0]+1} weeks of data for source "{source_name}"') as out1:
        source = PointSource(source_name)
        cells = CellData(source, config=Config(**kwargs), week_range=weeks, key=None)
    fig, ax = plt.subplots(figsize=(3,3))
    cells.plot_concatenated(ax=ax, title=source_name);
    with capture_show('Parameters from Poisson fit to full data set') as out2:
        L = cells.full_likelihood()
        pr = PoissonRep(L)
        print(pd.Series(pr.info()))
    return locals()
if Config().valid:
    load_cell_data(use_kerr=False, full_exp=True, verbose=2)

Study cell formation

Set config.full_exp=True to get the energy-dependent exposure. Code that does this is in exposure.sc_data_selection.

Load first 4 weeks.

Load data

setup: load 4 weeks of data for source "Geminga"
Loading source data, week_range=(9, 12), key=None
SourceData: 4FGL J0633.9+1746
LoadData: Loading weeks[9:12:]
Processing 4 week files week_009.pkl - week_012.pkl , using 4 processes

SourceData: Source 4FGL J0633.9+1746 with:
data: 6,148 photons from 2008-08-04 to 2008-08-27
exposure: 14,836 intervals, average effective area 3174 cm^2 for 0.4 Ms
rates: source 3.30e-06/s, background 1.07e-06/s, TS 2290923.9
CellData.rebin: Bin photon data into 3 1-week bins from 54683.0 to 54704.0

Figure 1

Parameters from Poisson fit to full data set
flux                    1.0
ts                  13396.4
errors    (0.0169, -0.0168)
limit                1.0281
dtype: object

@ipynb_doc
def check_energy_exposure():
    """ ### Exposure vs. energy
    
    With the {name} data, 
    {out1}
    {out2}
    Note the `exp_fract` values, which are the fraction of the total (`exp`) in each band.
    {fig}
    """
    df = cells.exposure
    name = cells.source.name
    with capture_show('Exposure Dataframe from "df.iloc[0]"') as out1:
        print(df.iloc[0])
    with capture_hide('Exposure dataframe info') as out2:
        print(df.info())
    
    # generate an array of the 8 per-band exposures for each interval
    t = df.apply(lambda r: np.array(r.exp * np.array(r.exp_fract), float), axis=1).values
    u = np.vstack(t)

    fig, (ax, ax2) = plt.subplots(1, 2, figsize=(10,4), sharex=True)
    plt.subplots_adjust(top=0.9, wspace=0.3)
    fig.suptitle('Exposure vs. Energy')
    ax.plot(u.sum(axis=0), 'o')
    ax.grid(alpha=0.5)
    ax.set(xlabel='Energy band', ylabel='Total Exposure')
    ax2.violinplot(u, np.arange(8), showmeans=True);
    ax2.grid(alpha=0.5)
    ax2.set(ylabel='Exposure per interval', xlabel='Energy band index')


    return locals()
if Config().valid:
    check_energy_exposure()

Exposure vs. energy

With the 4FGL J0633.9+1746 data,

Exposure Dataframe from "df.iloc[0]"
start                                             54682.656038
stop                                              54682.656375
livetime                                             25.041281
cos_theta                                             0.818114
exp                                                108394.3125
exp_fract    [0.1093, 0.135, 0.2428, 0.2141, 0.1603, 0.0918...
Name: 0, dtype: object

Exposure dataframe info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14836 entries, 0 to 14835
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   start      14836 non-null  float64
 1   stop       14836 non-null  float64
 2   livetime   14836 non-null  float32
 3   cos_theta  14836 non-null  float32
 4   exp        14836 non-null  float32
 5   exp_fract  14836 non-null  object
dtypes: float32(3), float64(2), object(1)
memory usage: 521.7+ KB
None
Note the exp_fract values, which are the fraction of the total (exp) in each band.

Figure 1
@ipynb_doc
def flux_energy():
    """
    ### Weight sum vs. energy
    The sum of weights is proportional to the flux.
    Examine the photon energy dependence for {nickname}.
    
    {out1}
    {fig1}
    
    ### Check flux per energy band
    Make the ratio of the sum of weights to exposure, a measure of the flux relative to the spectrum.
    {out2}
    
    {fig2}
    Looks pretty good.
    """
    df = cells.photons.copy()
    name, nickname = cells.source.name, cells.source.nickname
    with capture_hide('Photon dataframe info') as out1:
        print(df.info())
    df.loc[:,'eindex'] = (df.band.values//2).astype(int)
    
    # get the moments of the weight per energy
    gb = df.groupby('eindex').agg(
            wcount = ('weight', len),
            wsum = ('weight', np.sum),
            wsumsq =('weight', lambda x: np.sum(x**2))
        )

    gb.loc[:,'unc'] = gb.apply(lambda r: np.sqrt(r.wsumsq)/r.wsum, axis=1)

    fig1, ax1 = plt.subplots(1,1, figsize=(6,4))

    ax1.plot(gb.wsum.values, 'o', label='weight sum');
    ax1.plot(gb.wcount, 'D', label='counts');
    ax1.set(xlabel='Energy band index', yscale='log');
    ax1.legend(); ax1.grid(alpha=0.5)
    
    # get exposure vs energy
    t = cells.exposure.apply(lambda r: np.array(r.exp * np.array(r.exp_fract), float), axis=1).values
    ee = np.vstack(t).sum(axis=0)*1e-6
    with capture_hide('Table of weights and Total exposure per band') as out2:
        dfx = pd.DataFrame(dict(wtsum=gb.wsum.round(1), exp=ee.round(1), ratio=(gb.wsum/ee).round(2)))
        print(dfx)
   
        
    fig2, ax2 = plt.subplots(1,1)
    ax2.set(ylim=(0,5), xlabel='Energy band index', ylabel='flux/exposure')
    ax2.grid(alpha=0.5)
    ratio = gb.wsum/ee
    ax2.errorbar(x=range(len(ratio)), y=ratio, yerr=ratio*gb.unc, fmt='o', ms=10)
    
    return locals()
if Config().valid:
    flux_energy()

Weight sum vs. energy

The sum of weights is proportional to the flux. Examine the photon energy dependence for PSR J0633+1746.

Photon dataframe info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6148 entries, 0 to 6147
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   band        6148 non-null   uint8
 1   nest_index  6148 non-null   uint32
 2   weight      6148 non-null   float64
 3   time        6148 non-null   float64
memory usage: 126.2 KB
None

Figure 1

Check flux per energy band

Make the ratio of the sum of weights to exposure, a measure of the flux relative to the spectrum.

Table of weights and Total exposure per band
         wtsum    exp  ratio
eindex
0        458.8  143.9   3.19
1        580.7  181.6   3.20
2       1121.4  344.2   3.26
3        980.8  307.7   3.19
4        810.4  231.3   3.50
5        448.9  131.8   3.41
6        182.6   52.9   3.45
7         60.9   14.1   4.33

Figure 2
Looks pretty good.