Tutorial¶
Basic SSINS Construction¶
There are three main classes in the software: SS (sky_subtract), INS (incoherent_noise_spectrum), and MF (match_filter). We will use these classes to navigate the various steps in the SSINS process detailed in the paper (https://arxiv.org/abs/1906.01093).
Generating the sky-subtracted visibilities¶
(a) Initializing an SS object and reading in raw data¶
- ::
>>> from SSINS import SS
>>> # The SS object is a subclass of a UVData object, and therefore has all of its attributes and methods >>> # We initialize it identically >>> # See UVData documentation on https://pyuvdata.readthedocs.io/en/latest/ >>> ss = SS()
>>> # Read data by specifying a filepath as an argument to the read method >>> filepath = 'SSINS/data/1061313128_99bl_1pol_half_time.uvfits' >>> # By default, the visibilities are NOT differenced in time on read (see paper). This is for compatibility with multi-file reading. >>> ss.read(filepath, diff=True)
(b) Passing keyword arguments to SS.read¶
- ::
>>> import numpy as np
>>> # SS.read is actually a small wrapper around UVData.read; they share keywords >>> # In particular, select on read and reading only metadata function as usual (see UVData.select documentation) >>> ss = SS() >>> ss.read(filepath, read_data=False)
>>> # The following lines make use of the time_array attribute (metadata) to >>> # read in all but the first and last integrations >>> times = np.unique(ss.time_array)[1:-1] >>> ss.read(filepath, read_data=True, times=times, diff=True)
(c) Applying flags¶
- ::
>>> # SS.data_array is a numpy masked array. To "apply flags" is to change the mask of the data_array. >>> # The proper way to apply flags to the sky-subtracted data is to use the apply_flags method >>> # To apply the original flags in the raw data file, make the following call >>> ss.apply_flags(flag_choice='original') >>> # Note that the original flags are always stored in the flag_array attribute >>> # The flag_choice keyword is stored in an attribute >>> print(ss.flag_choice) original
>>> # You can apply flags from a custom flag array that is the same shape as the data >>> custom = np.zeros(ss.data_array.shape, dtype=bool) >>> # Let us make it so that only the first frequency channel is flagged and nothing else >>> custom[:, 0, 0, :] = True >>> # Apply these flags in the following way >>> ss.apply_flags(flag_choice='custom', custom=custom) >>> print(ss.flag_choice) custom
>>> # Unflag the data by setting flag_choice=None (note this is actually the default!!) >>> ss.apply_flags(flag_choice=None) >>> # Check if anything is flagged, for demonstration purposes >>> print(np.any(ss.data_array.mask)) False
(d) Plotting using Catalog_Plot¶
- ::
>>> from SSINS import Catalog_Plot as cp >>> import os
>>> # The Catalog_Plot library contains wrappers around plot_lib functions for basic plotting needs >>> # See the documentation: https://ssins.readthedocs.io/en/latest/Catalog_Plot.html >>> # Each function in Catalog_Plot requires a class instance and a filename prefix as arguments (a suffix is appended by the wrapper) >>> # Whatever unique identifying information for the plot should be specified in the prefix >>> prefix = 'SSINS/data/test_data'
>>> # To make a Histogram of the Visibility Differences (a VDH, figure 1 of paper), and save it as a pdf, do the following >>> # This also plots a fit estimated from the data >>> cp.VDH_plot(ss, prefix, file_ext='pdf', post_flag=False) >>> # Check to see that the file exists >>> print(os.path.exists('%s_VDH.pdf' % (prefix))) True
>>> # Let's apply flags and plot the flagged data alongside the unflagged data, without fits >>> # We also want legend labels and a legend >>> ss.apply_flags('original') >>> new_prefix = '%s_flag_unflag_nofits' % prefix >>> cp.VDH_plot(ss, new_prefix, file_ext='pdf', pre_flag=True, ... post_flag=True, pre_model=False, post_model=False, ... post_label='Post-Flag Data', pre_label='Pre-Flag Data', ... legend=True) >>> print(os.path.exists('%s_VDH.pdf' % (new_prefix))) True
Making and writing an incoherent noise spectrum¶
(a) Making an incoherent noise spectrum from sky-subtracted data¶
- ::
>>> from SSINS import INS
>>> # Making an INS from sky-subtracted data is as simple as passing an SS instance as an argument >>> ins = INS(ss) >>> # This averages the amplitudes of the sky-subtracted data over the baselines, taking into account flags that were applied
(b) Making an incoherent noise spectrum out of autocorrelations¶
- ::
>>> auto_ss = SS()
>>> # Read data by specifying a filepath as an argument to the read method >>> auto_filepath = 'SSINS/data/1061312640_autos.uvfits' >>> auto_ss.read(auto_filepath, diff=True) >>> auto_ins = INS(auto_ss, spectrum_type="auto")
(c) Plotting using Catalog_Plot¶
- ::
>>> # Plotting INS is similar to plotting a VDH, just with a different function >>> # This plots all polarizations present in the file separately >>> # The first column are the baseline-averaged amplitudes, while the second column shows the mean-subtracted data (z-scores) >>> cp.INS_plot(ins, prefix, file_ext='pdf') >>> print(os.path.exists('%s_SSINS.pdf' % prefix)) True
>>> # You can specify various plotting nuances with keywords >>> # Let's set some frequency ticks every 50 channels >>> xticks = np.arange(0, len(ins.freq_array), 50) >>> xticklabels = ['%.1f' % (ins.freq_array[tick]* 10 ** (-6)) for tick in xticks] >>> tick_prefix = '%s_ticks' % prefix >>> cp.INS_plot(ins, tick_prefix, file_ext='pdf', xticks=xticks, xticklabels=xticklabels) >>> print(os.path.exists('%s_SSINS.pdf' % tick_prefix)) True
(d) Plotting using the plot_lib library¶
- ::
>>> import matplotlib.pyplot as plt >>> from matplotlib import cm >>> from SSINS import plot_lib >>> # Let's plot the first polarization data and z-scores >>> fig, ax = plt.subplots(nrows=2, figsize=(16, 9)) >>> # The averaged amplitudes are stored in the metric_array parameter >>> plot_lib.image_plot(fig, ax[0], ins.metric_array[:, :, 0], ... title='XX Amplitudes', xticks=xticks, ... xticklabels=xticklabels) >>> # The z-scores are stored in the metric_ms parameter. >>> # Let's choose a diverging colorbar and center it on zero using the cmap and midpoint keywords. >>> plot_lib.image_plot(fig, ax[1], ins.metric_ms[:, :, 0], ... title='XX z-scores', xticks=xticks, ... xticklabels=xticklabels, cmap='coolwarm', ... midpoint=True) >>> fig.savefig('%s_plot_lib_SSINS.pdf' % prefix) >>> print(os.path.exists('%s_plot_lib_SSINS.pdf' % prefix)) True
(e) Saving out and reading in a spectrum¶
- ::
>>> # The INS.write method saves out h5 files that can be read both by INS objects and UVFlag objects >>> # By default it saves out the metric_array in the file, z-scores must be saved separately >>> # Set clobber=True to overwrite files with the same prefix (default is False) >>> ins.write(prefix, clobber=True) >>> ins.write(prefix, output_type='z_score', clobber=True) >>> print(os.path.exists('%s_SSINS_data.h5' % prefix)) True >>> print(os.path.exists('%s_SSINS_z_score.h5' % prefix)) True
>>> # This file can later be read upon instantiation of a new object >>> # The z-scores will be recalculated on instantiation, so no need to read in the z-scores >>> new_ins = INS('%s_SSINS_data.h5' % prefix) >>> # Check equality >>> print(np.all(ins.metric_array == new_ins.metric_array)) True
Flagging an INS using a match_filter (MF)¶
(a) Constructing a filter with no additional sub-bands¶
- ::
>>> from SSINS import MF
>>> # The MF class requires a frequency array and significance threshold as positional arguments >>> # We will disable searching for broadband streaks and provide no additional sub-bands for the filter >>> # First we need to define a sig_thresh dictionary for our only shape (narrowband) >>> sig_thresh = 5 >>> mf = MF(ins.freq_array, sig_thresh, streak=False, narrow=True, shape_dict={})
(b) Constructing a filter for streaks and Western Australian DTV in MWA EoR Highband¶
- ::
>>> # Use the shape_dict keyword to provide custom sub-bands to search over during flagging >>> # The input should be a dictionary, where the key is the name of the shape and the value are the lower/upper frequencies in hz >>> shape_dict = {'TV6': [1.74e8, 1.81e8], ... 'TV7': [1.81e8, 1.88e8], ... 'TV8': [1.88e8, 1.95e8], ... 'TV9': [1.95e8, 2.02e8]} >>> # We also need to apply significance thresholds for each shape, including 'narrow' and 'streak' >>> # In principle, these can be different values per shape, see advanced techniques. >>> sig_thresh = 5 >>> mf = MF(ins.freq_array, sig_thresh, shape_dict=shape_dict, streak=True, narrow=True)
(c) Constructing a filter for streaks and South African DTV in HERA below 200 Mhz¶
- ::
>>> # Use the shape_dict keyword to provide custom sub-bands to search over during flagging >>> # The input should be a dictionary, where the key is the name of the shape and the value are the lower/upper frequencies in hz >>> shape_dict = {'TV4': [1.74e8, 1.82e8], ... 'TV5': [1.82e8, 1.9e8], ... 'TV6': [1.9e8, 1.98e8]} >>> # Technically 2 Mhz of channel 7 should appear, but we omit that in this example >>> # We also need to apply significance thresholds for each shape, including 'narrow' and 'streak' >>> # In principle, these can be different values per shape >>> sig_thresh = 5 >>> mf = MF(ins.freq_array, sig_thresh, shape_dict=shape_dict, streak=True)
(d) Using the filter to flag the noise spectrum¶
- ::
>>> # Construct the filter that you want to use. >>> # For the test data we will use the MWA DTV example above >>> shape_dict = {'TV6': [1.74e8, 1.81e8], ... 'TV7': [1.81e8, 1.88e8], ... 'TV8': [1.88e8, 1.95e8], ... 'TV9': [1.95e8, 2.02e8]} >>> sig_thresh = 5 >>> mf = MF(ins.freq_array, sig_thresh, shape_dict=shape_dict, streak=True)
>>> # Use the apply_match_test method to flag the INS (this applies the flags to the mask of the metric array) >>> mf.apply_match_test(ins)
(e) Saving the INS mask out to an h5 file¶
- ::
>>> # Just use the write method as above, with the right output_type >>> ins.write(prefix, output_type='mask', clobber=True) >>> print(os.path.exists('%s_SSINS_mask.h5' % prefix)) True
(f) Getting time propagated flags from the INS mask¶
- ::
>>> from pyuvdata import UVData, UVFlag >>> # Each integration in the SSINS is a result of a difference of paired integrations >>> # To get flags for the raw data, we have to propagate flagged INS samples in time to all possible contributing times >>> # The mask_to_flags method returns an array where we have done this. This is useful for comparing to other UVFlag objects >>> flags = ins.mask_to_flags()
>>> # We can write these out to an h5 file as well, but we need to make a UVFlag object from the original data >>> uvd = UVData() >>> uvd.read(filepath, times=times) >>> uvf = UVFlag(uvd, waterfall=True, mode='flag') >>> ins.write(prefix, output_type='flags', clobber=True, uvf=uvf) >>> print(os.path.exists('%s_SSINS_flags.h5' % prefix)) True
(g) Applying time-propagated flags from INS to a UVData object and write new file¶
- ::
>>> from pyuvdata import utils as uvutils >>> uvf = ins.flag_uvf(uvf) >>> uvutils.apply_uvflag(uvd, uvf) # This is a pyuvdata utility function that safely applies flags to UVData from UVFlag >>> uvd.write_uvfits('SSINS/data/tutorial_test_writeout.uvfits')
(h) Writing flags to an mwaf file¶
- ::
>>> # We can add or replace flags from an existing mwaf file >>> # An mwaf file is a special fits file for storing flags of raw MWA data >>> # A special keyword option in ins.write() helps write them >>> # You must supply a list of existing mwaf files from which to gather the header data >>> # Currently you must flag at the same time/freq resolution as the data in the existing mwaf_files
>>> # For instance if you wanted to flag just the first two coarse bands for an obsid >>> mwaf_files = ['/path/to/obsid_01.mwaf', '/path/to/obsid/obsid_02.mwaf']
>>> # As usual you must supply a prefix for the file. >>> # You can choose to add flags to the file from SSINS flagging, or totally replace them >>> prefix_add = '/path/to/obsid_SSINS_add' >>> prefix_replace = '/path/to/obsid_SSINS_replace' >>> # Can use Ncoarse keyword if input data does not have 24 coarse channels in it (default is 24) >>> ins.write(prefix_add, output_type='mwaf', mwaf_files=mwaf_files, ... mwaf_method='add', Ncoarse=24) >>> ins.write(prefix_replace, output_type='mwaf', mwaf_files=mwaf_files, ... mwaf_method='replace', Ncoarse=24)
>>> # Be sure to set clobber=False (default) if using the same prefix >>> # as the original file and you don't want to overwrite
Getting Version Info¶
SSINS uses setuptools_scm to get the version from git tags.
- ::
>>> from SSINS import __version__ as SSINS_version >>> # This string will be recorded in the history string of written outputs >>> print(SSINS_version)
Advanced Techniques¶
The techniques below are for users who are already familiar with the basic tutorials above.
Using INS.mean_subtract¶
(a) Basic functionality¶
- ::
>>> from SSINS import INS >>> ins = INS('SSINS/data/1061313128_99bl_1pol_half_time_SSINS.h5')
>>> # The mean_subtract method returns the result of mean_subtraction >>> # It does NOT automatically change the metric_ms attribute >>> ms_arr = ins.mean_subtract()
>>> # You can do mean subtraction on just a subset of the frequencies to get a smaller output >>> # This functionality is used to speed up match filtering >>> # Let's just do the first ten frequency channels >>> ms_arr = ins.mean_subtract(freq_slice=slice(0, 10))
(b) Subtracting a polynomial fit instead of the mean¶
- ::
>>> # If the noise levels are expected to change over the course of the obs (due to a refrigeration cycle for instance) >>> # then may want to subtract a polynomial fit that describes the drift >>> # The mean_subtract method uses INS.order to determine what degree of polynomial to subtract >>> # Default order is 0, which just does mean subtraction >>> ins.order = 1 >>> ms_arr_ord_1 = ins.mean_subtract() >>> ins.order = 0 >>> ms_arr_ord_0 = ins.mean_subtract()
>>> # Can ask for the fit coefficients on a per-frequency basis >>> ins.order = 2 >>> ms_arr_ord_2, coeffs_ord_2 = ins.mean_subtract(return_coeffs=True) >>> # The shape is (INS.order + 1, Nfreqs, Npols) where Nfreqs is the number of frequencies in the slice >>> # It goes from higher degree coefficients to lower degree
Extra Flagging Bits¶
(a) Flagging all times for highly contaminated channels¶
- ::
>>> # Suppose you want to flag any channels with less than 40% clean data >>> # Construct a MF as follows >>> sig_thresh = {'narrow': 5, 'streak': 5} >>> mf = MF(ins.freq_array, sig_thresh, tb_aggro=0.4) >>> mf.apply_match_test(ins, time_broadcast=True)
(b) Broadcasting flags over subbands¶
- ::
>>> # Suppose you want to spread flags over certain subbands if RFI is found in those subbands >>> # For instance: maybe you want to flag a whole TV band if anything is found in it >>> # Make a broadcast_dict (this is a South Africa example) >>> broadcast_dict = {'TV4': [174e6, 182e6], 'TV5': [182e6, 190e6], 'TV6': [190e6, 192e6]} >>> mf = MF(ins.freq_array, 5, broadcast_dict=broadcast_dict) >>> # Note that intervals in SSINS are INCLUSIVE on both ends
(c) Broadcasting flags over subbands, with guard bands¶
>>> # Depending on the channelization, these subbands may overlap
>>> # This means events found at the very edge of one subbands may induce flags in the other, unless a guard band is thrown in
>>> # An example 100 kHz guard band program might look like this
>>> guard_width = 100e3
>>> broadcast_dict = {}
>>> broadcast_dict['TV4'] = [174e6, 182e6 - guard_width]
>>> broadcast_dict['guard_4_5'] = [182e6 - guard_width, 182e6 + guard_width]
>>> broadcast_dict['TV5'] = [182e6 + guard_width, 190e6 - guard_width]
>>> mf = MF(ins.freq_array, 5, broadcast_dict=broadcast_dict)
(d) Calculating occupancy¶
- ::
>>> # A dictionary that reports the occupancy of the shapes found by the flagger can be calculated >>> # See util.calc_occ docs >>> occ_dict = util.calc_occ(ins, mf, 0) >>> # This can then be written to a yaml >>> with open("SSINS/data/test_occ.yml", "w") as yaml_file: ... yaml.safe_dump(occ_dict, yaml_file)
(e) Setting different significance thresholds per shape¶
- ::
>>> # One may pass a dictionary of significance thresholds to set different >>> # thresholds per shape. All desired shapes, including narrow and broad if >>> # desired, must be included. >>> sig_thresh = {'narrow': 5, 'streak': 20, 'TV6': 5, 'TV7': 5, 'TV8': 5, 'TV9': 5} >>> shape_dict = {'TV6': [1.74e8, 1.81e8], ... 'TV7': [1.81e8, 1.88e8], ... 'TV8': [1.88e8, 1.95e8], ... 'TV9': [1.95e8, 2.02e8]} >>> mf = MF(ins.freq_array, sig_thresh, shape_dict=shape_dict)
(f) Writing out flags to a visibility file from a UVData object¶
- ::
>>> ss = SS() >>> ss.read(filepath, times=times, flag_choice='original', diff=True) >>> ss.write('SSINS/data/tutorial_test_writeout_2.uvfits', 'uvfits', UV=uvd)