Tutorial¶

Basic SSINS Construction¶

There are three main classes in the software: SS (sky_subtract), INS (incoherent_noise_spectrum), and MF (match_filter). We will use these classes to navigate the various steps in the SSINS process detailed in the paper (https://arxiv.org/abs/1906.01093).

Generating the sky-subtracted visibilities¶

(a) Initializing an SS object and reading in raw data¶

::

>>> from SSINS import SS

>>> # The SS object is a subclass of a UVData object, and therefore has all of its attributes and methods
>>> # We initialize it identically
>>> # See UVData documentation on https://pyuvdata.readthedocs.io/en/latest/
>>> ss = SS()

>>> # Read data by specifying a filepath as an argument to the read method
>>> filepath = 'SSINS/data/1061313128_99bl_1pol_half_time.uvfits'
>>> # By default, the visibilities are NOT differenced in time on read (see paper). This is for compatibility with multi-file reading.
>>> ss.read(filepath, diff=True)

(b) Passing keyword arguments to SS.read¶

::

>>> import numpy as np

>>> # SS.read is actually a small wrapper around UVData.read; they share keywords
>>> # In particular, select on read and reading only metadata function as usual (see UVData.select documentation)
>>> ss = SS()
>>> ss.read(filepath, read_data=False)

>>> # The following lines make use of the time_array attribute (metadata) to
>>> # read in all but the first and last integrations
>>> times = np.unique(ss.time_array)[1:-1]
>>> ss.read(filepath, read_data=True, times=times, diff=True)

(c) Applying flags¶

::

>>> # SS.data_array is a numpy masked array. To "apply flags" is to change the mask of the data_array.
>>> # The proper way to apply flags to the sky-subtracted data is to use the apply_flags method
>>> # To apply the original flags in the raw data file, make the following call
>>> ss.apply_flags(flag_choice='original')
>>> # Note that the original flags are always stored in the flag_array attribute
>>> # The flag_choice keyword is stored in an attribute
>>> print(ss.flag_choice)
original

>>> # You can apply flags from a custom flag array that is the same shape as the data
>>> custom = np.zeros(ss.data_array.shape, dtype=bool)
>>> # Let us make it so that only the first frequency channel is flagged and nothing else
>>> custom[:, 0, 0, :] = True
>>> # Apply these flags in the following way
>>> ss.apply_flags(flag_choice='custom', custom=custom)
>>> print(ss.flag_choice)
custom

>>> # Unflag the data by setting flag_choice=None (note this is actually the default!!)
>>> ss.apply_flags(flag_choice=None)
>>> # Check if anything is flagged, for demonstration purposes
>>> print(np.any(ss.data_array.mask))
False

(d) Plotting using Catalog_Plot¶

::

>>> from SSINS import Catalog_Plot as cp
>>> import os

>>> # The Catalog_Plot library contains wrappers around plot_lib functions for basic plotting needs
>>> # See the documentation: https://ssins.readthedocs.io/en/latest/Catalog_Plot.html
>>> # Each function in Catalog_Plot requires a class instance and a filename prefix as arguments (a suffix is appended by the wrapper)
>>> # Whatever unique identifying information for the plot should be specified in the prefix
>>> prefix = 'SSINS/data/test_data'

>>> # To make a Histogram of the Visibility Differences (a VDH, figure 1 of paper), and save it as a pdf, do the following
>>> # This also plots a fit estimated from the data
>>> cp.VDH_plot(ss, prefix, file_ext='pdf', post_flag=False)
>>> # Check to see that the file exists
>>> print(os.path.exists('%s_VDH.pdf' % (prefix)))
True

>>> # Let's apply flags and plot the flagged data alongside the unflagged data, without fits
>>> # We also want legend labels and a legend
>>> ss.apply_flags('original')
>>> new_prefix = '%s_flag_unflag_nofits' % prefix
>>> cp.VDH_plot(ss, new_prefix, file_ext='pdf', pre_flag=True,
...             post_flag=True, pre_model=False, post_model=False,
...             post_label='Post-Flag Data', pre_label='Pre-Flag Data',
...             legend=True)
>>> print(os.path.exists('%s_VDH.pdf' % (new_prefix)))
True

Making and writing an incoherent noise spectrum¶

(a) Making an incoherent noise spectrum from sky-subtracted data¶

::

>>> from SSINS import INS

>>> # Making an INS from sky-subtracted data is as simple as passing an SS instance as an argument
>>> ins = INS(ss)
>>> # This averages the amplitudes of the sky-subtracted data over the baselines, taking into account flags that were applied

(b) Making an incoherent noise spectrum out of autocorrelations¶

::

>>> auto_ss = SS()

>>> # Read data by specifying a filepath as an argument to the read method
>>> auto_filepath = 'SSINS/data/1061312640_autos.uvfits'
>>> auto_ss.read(auto_filepath, diff=True)
>>> auto_ins = INS(auto_ss, spectrum_type="auto")

(c) Plotting using Catalog_Plot¶

::

>>> # Plotting INS is similar to plotting a VDH, just with a different function
>>> # This plots all polarizations present in the file separately
>>> # The first column are the baseline-averaged amplitudes, while the second column shows the mean-subtracted data (z-scores)
>>> cp.INS_plot(ins, prefix, file_ext='pdf')
>>> print(os.path.exists('%s_SSINS.pdf' % prefix))
True

>>> # You can specify various plotting nuances with keywords
>>> # Let's set some frequency ticks every 50 channels
>>> xticks = np.arange(0, len(ins.freq_array), 50)
>>> xticklabels = ['%.1f' % (ins.freq_array[tick]* 10 ** (-6)) for tick in xticks]
>>> tick_prefix = '%s_ticks' % prefix
>>> cp.INS_plot(ins, tick_prefix, file_ext='pdf', xticks=xticks, xticklabels=xticklabels)
>>> print(os.path.exists('%s_SSINS.pdf' % tick_prefix))
True

(d) Plotting using the plot_lib library¶

::

>>> import matplotlib.pyplot as plt
>>> from matplotlib import cm
>>> from SSINS import plot_lib
>>> # Let's plot the first polarization data and z-scores
>>> fig, ax = plt.subplots(nrows=2, figsize=(16, 9))
>>> # The averaged amplitudes are stored in the metric_array parameter
>>> plot_lib.image_plot(fig, ax[0], ins.metric_array[:, :, 0],
...                     title='XX Amplitudes', xticks=xticks,
...                     xticklabels=xticklabels)
>>> # The z-scores are stored in the metric_ms parameter.
>>> # Let's choose a diverging colorbar and center it on zero using the cmap and midpoint keywords.
>>> plot_lib.image_plot(fig, ax[1], ins.metric_ms[:, :, 0],
...                     title='XX z-scores', xticks=xticks,
...                     xticklabels=xticklabels, cmap='coolwarm',
...                     midpoint=True)
>>> fig.savefig('%s_plot_lib_SSINS.pdf' % prefix)
>>> print(os.path.exists('%s_plot_lib_SSINS.pdf' % prefix))
True

(e) Saving out and reading in a spectrum¶

::

>>> # The INS.write method saves out h5 files that can be read both by INS objects and UVFlag objects
>>> # By default it saves out the metric_array in the file, z-scores must be saved separately
>>> # Set clobber=True to overwrite files with the same prefix (default is False)
>>> ins.write(prefix, clobber=True)
>>> ins.write(prefix, output_type='z_score', clobber=True)
>>> print(os.path.exists('%s_SSINS_data.h5' % prefix))
True
>>> print(os.path.exists('%s_SSINS_z_score.h5' % prefix))
True

>>> # This file can later be read upon instantiation of a new object
>>> # The z-scores will be recalculated on instantiation, so no need to read in the z-scores
>>> new_ins = INS('%s_SSINS_data.h5' % prefix)
>>> # Check equality
>>> print(np.all(ins.metric_array == new_ins.metric_array))
True

Flagging an INS using a match_filter (MF)¶

(a) Constructing a filter with no additional sub-bands¶

::

>>> from SSINS import MF

>>> # The MF class requires a frequency array and significance threshold as positional arguments
>>> # We will disable searching for broadband streaks and provide no additional sub-bands for the filter
>>> # First we need to define a sig_thresh dictionary for our only shape (narrowband)
>>> sig_thresh = 5
>>> mf = MF(ins.freq_array, sig_thresh, streak=False, narrow=True, shape_dict={})

(b) Constructing a filter for streaks and Western Australian DTV in MWA EoR Highband¶

::

>>> # Use the shape_dict keyword to provide custom sub-bands to search over during flagging
>>> # The input should be a dictionary, where the key is the name of the shape and the value are the lower/upper frequencies in hz
>>> shape_dict = {'TV6': [1.74e8, 1.81e8],
...               'TV7': [1.81e8, 1.88e8],
...               'TV8': [1.88e8, 1.95e8],
...               'TV9': [1.95e8, 2.02e8]}
>>> # We also need to apply significance thresholds for each shape, including 'narrow' and 'streak'
>>> # In principle, these can be different values per shape, see advanced techniques.
>>> sig_thresh = 5
>>> mf = MF(ins.freq_array, sig_thresh, shape_dict=shape_dict, streak=True, narrow=True)

(c) Constructing a filter for streaks and South African DTV in HERA below 200 Mhz¶

::

>>> # Use the shape_dict keyword to provide custom sub-bands to search over during flagging
>>> # The input should be a dictionary, where the key is the name of the shape and the value are the lower/upper frequencies in hz
>>> shape_dict = {'TV4': [1.74e8, 1.82e8],
...               'TV5': [1.82e8, 1.9e8],
...               'TV6': [1.9e8, 1.98e8]}
>>> # Technically 2 Mhz of channel 7 should appear, but we omit that in this example
>>> # We also need to apply significance thresholds for each shape, including 'narrow' and 'streak'
>>> # In principle, these can be different values per shape
>>> sig_thresh = 5
>>> mf = MF(ins.freq_array, sig_thresh, shape_dict=shape_dict, streak=True)

(d) Using the filter to flag the noise spectrum¶

::

>>> # Construct the filter that you want to use.
>>> # For the test data we will use the MWA DTV example above
>>> shape_dict = {'TV6': [1.74e8, 1.81e8],
...               'TV7': [1.81e8, 1.88e8],
...               'TV8': [1.88e8, 1.95e8],
...               'TV9': [1.95e8, 2.02e8]}
>>> sig_thresh = 5
>>> mf = MF(ins.freq_array, sig_thresh, shape_dict=shape_dict, streak=True)

>>> # Use the apply_match_test method to flag the INS (this applies the flags to the mask of the metric array)
>>> mf.apply_match_test(ins) 

(e) Saving the INS mask out to an h5 file¶

::

>>> # Just use the write method as above, with the right output_type
>>> ins.write(prefix, output_type='mask', clobber=True)
>>> print(os.path.exists('%s_SSINS_mask.h5' % prefix))
True

(f) Getting time propagated flags from the INS mask¶

::

>>> from pyuvdata import UVData, UVFlag
>>> # Each integration in the SSINS is a result of a difference of paired integrations
>>> # To get flags for the raw data, we have to propagate flagged INS samples in time to all possible contributing times
>>> # The mask_to_flags method returns an array where we have done this. This is useful for comparing to other UVFlag objects
>>> flags = ins.mask_to_flags()

>>> # We can write these out to an h5 file as well, but we need to make a UVFlag object from the original data
>>> uvd = UVData()
>>> uvd.read(filepath, times=times)
>>> uvf = UVFlag(uvd, waterfall=True, mode='flag')
>>> ins.write(prefix, output_type='flags', clobber=True, uvf=uvf)
>>> print(os.path.exists('%s_SSINS_flags.h5' % prefix))
True

(g) Applying time-propagated flags from INS to a UVData object and write new file¶

::

>>> from pyuvdata import utils as uvutils
>>> uvf = ins.flag_uvf(uvf)
>>> uvutils.apply_uvflag(uvd, uvf) # This is a pyuvdata utility function that safely applies flags to UVData from UVFlag
>>> uvd.write_uvfits('SSINS/data/tutorial_test_writeout.uvfits')

(h) Writing flags to an mwaf file¶

::

>>> # We can add or replace flags from an existing mwaf file
>>> # An mwaf file is a special fits file for storing flags of raw MWA data
>>> # A special keyword option in ins.write() helps write them
>>> # You must supply a list of existing mwaf files from which to gather the header data
>>> # Currently you must flag at the same time/freq resolution as the data in the existing mwaf_files

>>> # For instance if you wanted to flag just the first two coarse bands for an obsid
>>> mwaf_files = ['/path/to/obsid_01.mwaf', '/path/to/obsid/obsid_02.mwaf'] 

>>> # As usual you must supply a prefix for the file.
>>> # You can choose to add flags to the file from SSINS flagging, or totally replace them
>>> prefix_add = '/path/to/obsid_SSINS_add' 
>>> prefix_replace = '/path/to/obsid_SSINS_replace' 
>>> # Can use Ncoarse keyword if input data does not have 24 coarse channels in it (default is 24)
>>> ins.write(prefix_add, output_type='mwaf', mwaf_files=mwaf_files, 
...           mwaf_method='add', Ncoarse=24) 
>>> ins.write(prefix_replace, output_type='mwaf', mwaf_files=mwaf_files, 
...           mwaf_method='replace', Ncoarse=24) 

>>> # Be sure to set clobber=False (default) if using the same prefix
>>> # as the original file and you don't want to overwrite

Getting Version Info¶

SSINS uses setuptools_scm to get the version from git tags.

::

>>> from SSINS import __version__ as SSINS_version
>>> # This string will be recorded in the history string of written outputs
>>> print(SSINS_version) 

Advanced Techniques¶

The techniques below are for users who are already familiar with the basic tutorials above.

Using INS.mean_subtract¶

(a) Basic functionality¶

::

>>> from SSINS import INS
>>> ins = INS('SSINS/data/1061313128_99bl_1pol_half_time_SSINS.h5')

>>> # The mean_subtract method returns the result of mean_subtraction
>>> # It does NOT automatically change the metric_ms attribute
>>> ms_arr = ins.mean_subtract()

>>> # You can do mean subtraction on just a subset of the frequencies to get a smaller output
>>> # This functionality is used to speed up match filtering
>>> # Let's just do the first ten frequency channels
>>> ms_arr = ins.mean_subtract(freq_slice=slice(0, 10))

(b) Subtracting a polynomial fit instead of the mean¶

::

>>> # If the noise levels are expected to change over the course of the obs (due to a refrigeration cycle for instance)
>>> # then may want to subtract a polynomial fit that describes the drift
>>> # The mean_subtract method uses INS.order to determine what degree of polynomial to subtract
>>> # Default order is 0, which just does mean subtraction
>>> ins.order = 1
>>> ms_arr_ord_1 = ins.mean_subtract()
>>> ins.order = 0
>>> ms_arr_ord_0 = ins.mean_subtract()

>>> # Can ask for the fit coefficients on a per-frequency basis
>>> ins.order = 2
>>> ms_arr_ord_2, coeffs_ord_2 = ins.mean_subtract(return_coeffs=True)
>>> # The shape is (INS.order + 1, Nfreqs, Npols) where Nfreqs is the number of frequencies in the slice
>>> # It goes from higher degree coefficients to lower degree

Extra Flagging Bits¶

(a) Flagging all times for highly contaminated channels¶

::

>>> # Suppose you want to flag any channels with less than 40% clean data
>>> # Construct a MF as follows
>>> sig_thresh = {'narrow': 5, 'streak': 5}
>>> mf = MF(ins.freq_array, sig_thresh, tb_aggro=0.4)
>>> mf.apply_match_test(ins, time_broadcast=True) 

(b) Broadcasting flags over subbands¶

::

>>> # Suppose you want to spread flags over certain subbands if RFI is found in those subbands
>>> # For instance: maybe you want to flag a whole TV band if anything is found in it
>>> # Make a broadcast_dict (this is a South Africa example)
>>> broadcast_dict = {'TV4': [174e6, 182e6], 'TV5': [182e6, 190e6], 'TV6': [190e6, 192e6]}
>>> mf = MF(ins.freq_array, 5, broadcast_dict=broadcast_dict)
>>> # Note that intervals in SSINS are INCLUSIVE on both ends

(c) Broadcasting flags over subbands, with guard bands¶

>>> # Depending on the channelization, these subbands may overlap
>>> # This means events found at the very edge of one subbands may induce flags in the other, unless a guard band is thrown in
>>> # An example 100 kHz guard band program might look like this
>>> guard_width = 100e3
>>> broadcast_dict = {}
>>> broadcast_dict['TV4'] = [174e6, 182e6 - guard_width]
>>> broadcast_dict['guard_4_5'] = [182e6 - guard_width, 182e6 + guard_width]
>>> broadcast_dict['TV5'] = [182e6 + guard_width, 190e6 - guard_width]
>>> mf = MF(ins.freq_array, 5, broadcast_dict=broadcast_dict)

(d) Calculating occupancy¶

::

>>> # A dictionary that reports the occupancy of the shapes found by the flagger can be calculated
>>> # See util.calc_occ docs
>>> occ_dict = util.calc_occ(ins, mf, 0) 
>>> # This can then be written to a yaml
>>> with open("SSINS/data/test_occ.yml", "w") as yaml_file: 
...    yaml.safe_dump(occ_dict, yaml_file) 

(e) Setting different significance thresholds per shape¶

::

>>> # One may pass a dictionary of significance thresholds to set different
>>> # thresholds per shape. All desired shapes, including narrow and broad if
>>> # desired, must be included.
>>> sig_thresh = {'narrow': 5, 'streak': 20, 'TV6': 5, 'TV7': 5, 'TV8': 5, 'TV9': 5}
>>> shape_dict = {'TV6': [1.74e8, 1.81e8],
...               'TV7': [1.81e8, 1.88e8],
...               'TV8': [1.88e8, 1.95e8],
...               'TV9': [1.95e8, 2.02e8]}
>>> mf = MF(ins.freq_array, sig_thresh, shape_dict=shape_dict)

(f) Writing out flags to a visibility file from a UVData object¶

::

>>> ss = SS()
>>> ss.read(filepath, times=times, flag_choice='original', diff=True)
>>> ss.write('SSINS/data/tutorial_test_writeout_2.uvfits', 'uvfits', UV=uvd)

Tutorial¶

Basic SSINS Construction¶

Generating the sky-subtracted visibilities¶

(a) Initializing an SS object and reading in raw data¶

(b) Passing keyword arguments to SS.read¶

(c) Applying flags¶

(d) Plotting using Catalog_Plot¶

Making and writing an incoherent noise spectrum¶

(a) Making an incoherent noise spectrum from sky-subtracted data¶

(b) Making an incoherent noise spectrum out of autocorrelations¶

(c) Plotting using Catalog_Plot¶

(d) Plotting using the plot_lib library¶

(e) Saving out and reading in a spectrum¶

Flagging an INS using a match_filter (MF)¶

(a) Constructing a filter with no additional sub-bands¶

(b) Constructing a filter for streaks and Western Australian DTV in MWA EoR Highband¶

(c) Constructing a filter for streaks and South African DTV in HERA below 200 Mhz¶

(d) Using the filter to flag the noise spectrum¶

(e) Saving the INS mask out to an h5 file¶

(f) Getting time propagated flags from the INS mask¶

(g) Applying time-propagated flags from INS to a UVData object and write new file¶

(h) Writing flags to an mwaf file¶

Getting Version Info¶

Advanced Techniques¶

Using INS.mean_subtract¶

(a) Basic functionality¶

(b) Subtracting a polynomial fit instead of the mean¶

Extra Flagging Bits¶

(a) Flagging all times for highly contaminated channels¶

(b) Broadcasting flags over subbands¶

(c) Broadcasting flags over subbands, with guard bands¶

(d) Calculating occupancy¶

(e) Setting different significance thresholds per shape¶

(f) Writing out flags to a visibility file from a UVData object¶

SSINS

Navigation

Related Topics