API

Concept

midtools.Dataset is the basis of midtools. Among other things, it:

  • provides the command line interface,
  • reads the configuration file,
  • handles metadata,
  • starts the SLURMCluster,
  • runs analysis routines from submodules,
  • and finally stores the results in an HDF5-file.

Data are loaded and calibrated by midtools.Calibrator: midtools.Dataset calls midtools.Calibrator._get_data(), which performs the following processing steps:

  • loading the data as an xarray using the extra_data module,
  • slicing trains and pulses,
  • masking,
  • binning,
  • baseline correction.
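The processing steps listed above can be illustrated on a small NumPy array. This is a minimal sketch, not the midtools.Calibrator implementation: midtools operates on xarray objects loaded through extra_data, and the concrete choices below (keeping pulses from first_cell onward, setting masked pixels to NaN, 2x2 binning by summation, and subtracting a per-frame median as the "baseline") are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one detector module: (trains, pulses, rows, cols).
data = rng.poisson(2.0, size=(10, 20, 8, 8)).astype(float)
mask = np.ones((8, 8), dtype=bool)  # True = good pixel (assumed convention here)
mask[0, 0] = False                  # mark one pixel as bad

train_step, pulse_step, first_cell = 2, 1, 2

# 1) Slice trains and pulses: every train_step-th train, pulses from first_cell on.
sliced = data[::train_step, first_cell::pulse_step]

# 2) Mask bad pixels by replacing them with NaN (the mask broadcasts over frames).
masked = np.where(mask, sliced, np.nan)

# 3) Bin 2x2 pixel blocks by summing; a NaN pixel makes its whole bin NaN.
t, p, ny, nx = masked.shape
binned = masked.reshape(t, p, ny // 2, 2, nx // 2, 2).sum(axis=(3, 5))

# 4) Baseline correction, assumed here to be subtracting each frame's median.
baseline = np.nanmedian(binned, axis=(2, 3), keepdims=True)
corrected = binned - baseline

print(corrected.shape)  # (5, 18, 4, 4)
```

Keeping NaN for masked pixels (rather than zeros) lets later reductions such as np.nanmean ignore bad pixels instead of biasing averages toward zero.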

Additionally, midtools.corrections provides functions to be applied on individual workers.

midtools provides the following analysis submodules:

  • midtools.azimuthal_integration,
  • midtools.correlation,
  • midtools.statistics,
  • midtools.average.

The first three modules use the apply_along_axis function from dask.array to apply an algorithm to each train or pulse of a run.
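A minimal sketch of the apply_along_axis pattern, using a synthetic dask array in place of detector data. The per-frame function frame_summary is hypothetical; it only shows that the function receives one 1-D frame and returns a fixed-size result. In midtools the computation would be distributed to the cluster workers, whereas here compute() uses dask's default local scheduler.

```python
import dask.array as da
import numpy as np

# Synthetic stand-in for a run: 20 pulses, each a flattened frame of 1000 pixels.
frames_np = np.arange(20 * 1000, dtype=float).reshape(20, 1000)
frames = da.from_array(frames_np, chunks=(5, 1000))  # 4 chunks along the pulse axis


def frame_summary(frame):
    """Hypothetical per-frame algorithm: one 1-D frame in, two numbers out."""
    return np.array([frame.mean(), frame.std()])


# Apply the function along the pixel axis (axis=1), i.e., once per pulse.
# Passing shape and dtype describes the output of frame_summary, so dask
# can build the task graph without a trial call on dummy data.
summary = da.apply_along_axis(frame_summary, 1, frames, shape=(2,), dtype=float)
result = summary.compute()  # executes the graph; here on the local scheduler

print(result.shape)  # (20, 2)
```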

The Dataset Class

class midtools.Dataset(run_number, setupfile, proposal=None, analysis=None, datdir=None, first_train=0, last_train=1000000.0, pulses_per_train=500, dark_run_number=None, train_file='train-file.npy', pulse_file='pulse-file.npy', first_cell=2, train_step=1, pulse_step=1, is_dark=False, localcluster=False, is_flatfield=False, flatfield_run_number=None, out_dir='./', file_identifier=None, trainId_offset=0, **kwargs)

Dataset object to analyze MID datasets on Maxwell.

Parameters:
  • setupfile (str) – Setup file (.yml) containing the setup parameters.
  • analysis (str, optional) –

    Flags of the analyses to perform. Defaults to '00'. analysis is a string of ones and zeros in which a one at a given position means that the corresponding analysis is performed and a zero means that it is skipped. The analysis types are:

    flags | analysis
    ------+---------------------------
    1000  | average frames
    0100  | SAXS azimuthal integration
    0010  | XPCS correlation functions
    0001  | compute statistics
  • last_train (int, optional) – Index of the last train to analyze. If not provided, all trains are processed.
  • run_number (int, optional) – Run number. Defaults to None. If not given, the datdir in the setupfile has to contain the .h5 files.
  • dark_run_number (int, optional) – Dark run number for calibration.
  • pulses_per_train (int, optional) – Number of pulses per train. If not provided, all stored memory cells are used.
  • train_step (int, optional) – Step size for slicing the train axis.
  • pulse_step (int, optional) – Step size for slicing the pulse axis.
  • is_dark (bool, optional) – If True, switch to the dark routines, i.e., average the dark frames for dark subtraction and calculate the mask from the darks.
  • is_flatfield (bool, optional) – If True, use the flatfield algorithms.
  • flatfield_run_number (int, optional) – Run number of the processed flatfield for calibration.
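How the analysis flag string maps onto the analysis types can be sketched with a small decoder. decode_analysis is a hypothetical helper, not part of midtools; it assumes the four-position string from the table of the analysis parameter and reads combined flags such as '1001' as enabling several analyses at once.

```python
from typing import List

# Flag positions, left to right, as listed for the analysis parameter.
ANALYSIS_TYPES = (
    "average frames",
    "SAXS azimuthal integration",
    "XPCS correlation functions",
    "compute statistics",
)


def decode_analysis(flags: str) -> List[str]:
    """Hypothetical helper: return the analyses enabled by a flag string."""
    if len(flags) != len(ANALYSIS_TYPES) or not set(flags) <= {"0", "1"}:
        raise ValueError(f"expected 4 characters of '0'/'1', got {flags!r}")
    return [name for flag, name in zip(flags, ANALYSIS_TYPES) if flag == "1"]


print(decode_analysis("0110"))
# ['SAXS azimuthal integration', 'XPCS correlation functions']
print(decode_analysis("1001"))
# ['average frames', 'compute statistics']
```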

Note

A setupfile might look like this:

# setup.yml file

# Data
datdir: /path/to/data/r0522

# Maskfile
mask: /path/to/mask/agipd_mask_tmp.npy

# Beamline
photon_energy: 9 # keV
sample_detector: 8 # m
pixel_size: 200 # um

quadrant_positions:
    dx: -18
    dy: -15
    q1: [-500, 650]
    q2: [-550, -30]
    q3: [ 570, -216]
    q4: [ 620, 500]

# XPCS
xpcs_opt:
    q_range:
        q_first: .1 # smallest q in nm-1
        q_last: 1.  # largest q in nm-1
        steps: 10   # how many q-bins
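The beamline entries of the setup file determine the accessible momentum transfer. The sketch below loads a subset of the example with PyYAML and applies the textbook small-angle relations lambda = hc/E and q = (4*pi/lambda) * sin(theta), with 2*theta = arctan(r/L) for a pixel at distance r from the direct beam and sample-detector distance L. It assumes the units given in the comments (keV, m, um) and is not necessarily how midtools or Xana compute the qmap; q_at_pixel_distance is a hypothetical helper.

```python
import math

import yaml  # PyYAML

# A subset of the example setup file.
SETUP_YML = """
photon_energy: 9      # keV
sample_detector: 8    # m
pixel_size: 200       # um
xpcs_opt:
    q_range:
        q_first: .1   # smallest q in nm-1
        q_last: 1.    # largest q in nm-1
        steps: 10     # how many q-bins
"""

setup = yaml.safe_load(SETUP_YML)

# Photon wavelength in nm: lambda = h*c / E with h*c = 1.23984 keV nm.
wavelength_nm = 1.23984 / setup["photon_energy"]


def q_at_pixel_distance(n_pixels: float) -> float:
    """Hypothetical helper: q in nm^-1 for a pixel n_pixels away from the direct beam."""
    r_m = n_pixels * setup["pixel_size"] * 1e-6             # um -> m
    two_theta = math.atan(r_m / setup["sample_detector"])  # scattering angle 2*theta
    return 4 * math.pi / wavelength_nm * math.sin(two_theta / 2)


q_range = setup["xpcs_opt"]["q_range"]
print(f"wavelength: {wavelength_nm:.4f} nm")                       # wavelength: 0.1378 nm
print(f"q at 100 pixels: {q_at_pixel_distance(100):.3f} nm^-1")    # q at 100 pixels: 0.114 nm^-1
print(f"XPCS q range: {q_range['q_first']} to {q_range['q_last']} nm^-1, {q_range['steps']} bins")
```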

agipd_geom

AGIPD geometry obtained from extra_data.

Type: AGIPD_1MGeometry
center = None

Position of the direct beam in pixels.

Type: tuple
compute(create_file=True)

Start the actual computation based on the analysis attribute.

datdir

Data directory.

Type: str
file_name = None

HDF5 file name.

Type: str
h5_structure = None

Structure of the HDF5 file.

Type: dict
is_dark = None

True if current run is a dark run.

Type: bool
is_flatfield = None

True if current run is a flatfield run.

Type: bool
mask

Mask of shape (16, 512, 128) in which bad pixels are 0 and good pixels are 1.

Type: np.ndarray
merge_files(subset=None, delete_file=False)

Merge the existing HDF5 files of a run.

qmap = None

Momentum transfer q of each pixel, with shape (16, 512, 128).

Type: np.ndarray
run = None

Run data, e.g., as returned by extra_data.RunDirectory.

Type: DataCollection
run_number

Number of the run.

Type: int
setup = None

Xana setup instance.

setupfile = None

Path to the setupfile.

Type: str
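With the mask and qmap attributes, an azimuthally averaged intensity I(q) can be obtained by binning pixels in q. The following is a minimal NumPy sketch with random stand-in arrays of the documented shape (16, 512, 128). It is a generic binned average, not necessarily what midtools.azimuthal_integration does, and the ten linearly spaced bins between q_first = 0.1 and q_last = 1.0 (taken from the example setup file) are an assumption about the bin spacing.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random stand-ins with the documented shape (modules, rows, columns).
shape = (16, 512, 128)
frame = rng.poisson(1.0, size=shape).astype(float)  # one detector image
qmap = rng.uniform(0.05, 1.2, size=shape)           # hypothetical q per pixel, nm^-1
mask = (rng.random(shape) > 0.01).astype(int)       # 1 = good pixel, 0 = bad pixel

# Ten linearly spaced q bins from q_first to q_last (assumed spacing).
edges = np.linspace(0.1, 1.0, 10 + 1)

# Keep good pixels whose q falls inside the binned range.
good = (mask == 1) & (qmap >= edges[0]) & (qmap < edges[-1])
idx = np.digitize(qmap[good], edges) - 1  # bin index 0..9 for each kept pixel

# Mean intensity per bin: summed intensity divided by the number of pixels.
counts = np.bincount(idx, minlength=len(edges) - 1)
sums = np.bincount(idx, weights=frame[good], minlength=len(edges) - 1)
I_q = sums / counts
q_centers = 0.5 * (edges[:-1] + edges[1:])
```

np.bincount with weights performs the per-bin summation in a single pass over the pixels, which avoids a Python loop over the q bins.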

Command Line Interface

Analyze MID runs.

usage: midtools [-h] [-r RUN] [-dr DARK_RUN_NUMBER DARK_RUN_NUMBER]
                [--last-train LAST_TRAIN] [--first-train FIRST_TRAIN]
                [-ppt PULSES_PER_TRAIN] [-ts TRAIN_STEP] [-ps PULSE_STEP]
                [--is-dark [IS_DARK]] [--is-flatfield [IS_FLATFIELD]]
                [-ffr FLATFIELD_RUN FLATFIELD_RUN] [--out-dir [OUT_DIR]]
                [--chunk [CHUNK]] [--job-dir JOB_DIR] [--slurm [SLURM]]
                [--localcluster [LOCALCLUSTER]]
                [--file-identifier [FILE_IDENTIFIER]]
                [--first-cell FIRST_CELL] [--datdir DATDIR]
                setupfile analysis

Positional Arguments

setupfile

The YAML file used to configure midtools.

analysis

Which analysis to perform, given as a string of 0s and 1s: 1000 saves data averaged along a specific axis, 0100 runs the SAXS routines, 0010 the XPCS routines, and 0001 the statistics (pulse-resolved histograms).

Named Arguments

-r, --run

Run number.

-dr, --dark-run-number

Dark run number.
--last-train

Last train to analyze.

Default: 1000000

--first-train

First train to analyze.

Default: 0

-ppt, --pulses-per-train

Number of pulses per train.

Default: 500

-ts, --train-step

Spacing of trains, i.e., the step size for slicing the train axis.

Default: 1

-ps, --pulse-step

Spacing of pulses, i.e., the step size for slicing the pulse axis.

Default: 1

--is-dark

Whether the run is a dark run.

Default: False

--is-flatfield

Whether the run is a flatfield run.

Default: False

-ffr, --flatfield-run

Flatfield run number.
--out-dir

Output directory.

Default: "./"

--chunk

Split the trains into chunks of this size. By default, no chunking is applied.
--job-dir

Directory for the SLURM output and error files.

Default: "/gpfs/exfel/data/scratch/reiserm/mid-proteins/jobs/"

--slurm

Run midtools on a dedicated node as a SLURM job.

Default: False

--localcluster

Use Dask's LocalCluster to run midtools locally.

Default: False

--file-identifier

Identifier appended to the end of the file name.

Default: None
--first-cell

Cell ID of the first AGIPD memory cell with X-rays.

Default: 2

--datdir

Path to the data. This argument is only used if the data directory is not provided in the setupfile.
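The help text does not specify exactly how --chunk partitions the trains. As a minimal sketch, assuming consecutive half-open train ranges between --first-train and --last-train, the split could look like this; train_chunks is a hypothetical helper, not a midtools function.

```python
from typing import List, Tuple


def train_chunks(first_train: int, last_train: int, chunk: int) -> List[Tuple[int, int]]:
    """Hypothetical: split trains [first_train, last_train) into consecutive chunks."""
    if chunk <= 0:
        raise ValueError("chunk must be a positive integer")
    # The last chunk is shortened so that it never extends past last_train.
    return [(start, min(start + chunk, last_train))
            for start in range(first_train, last_train, chunk)]


print(train_chunks(0, 1000, 300))
# [(0, 300), (300, 600), (600, 900), (900, 1000)]
```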