API¶

Concept¶

midtools.Dataset is the basis of midtools. Among other things, it:

provides the command line interface,
reads the configuration file,
handles metadata,
starts the SLURMCluster,
runs analysis routines from submodules,
and finally stores the results in an HDF5-file.

Data are provided and calibrated by midtools.Calibrator. Its method midtools.Calibrator._get_data(), which is called by midtools.Dataset, performs the following data processing steps:

loading the data as an xarray using the extra_data module,
slicing trains and pulses,
masking,
binning,
baseline correction.
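The slicing step can be pictured with a small pure-Python sketch. It is not the code used by midtools.Calibrator: select_indices is a hypothetical helper, and treating the last index as exclusive is an assumption. The int() conversion is there because the documented default for last_train is the float 1000000.0.

```python
def select_indices(n_available, first, last, step):
    """Return the indices first, first + step, ... below min(last, n_available).

    Hypothetical helper: the half-open interval [first, last) is an assumption.
    """
    stop = min(int(last), n_available)  # int() because the default may be a float
    return list(range(first, stop, step))


# A run with 10 stored trains, keeping every third train starting at index 2.
train_idx = select_indices(n_available=10, first=2, last=1000000.0, step=3)

# 8 stored pulses per train, keeping every second pulse among the first 4.
pulse_idx = select_indices(n_available=8, first=0, last=4, step=2)

print(train_idx)  # [2, 5, 8]
print(pulse_idx)  # [0, 2]
```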

Additionally, midtools.corrections provides functions to be applied on individual workers.

midtools provides the following analysis submodules:

midtools.azimuthal_integration,
midtools.correlation,
midtools.statistics,
midtools.average.

The first three modules use the apply_along_axis function from dask.array to apply an algorithm to each train or pulse of a run.
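The pattern can be illustrated with a toy array. The following is a minimal sketch, assuming a data layout of (trains, pulses, pixels) and a hypothetical per-pulse function mean_intensity; it does not reproduce the algorithms in the midtools submodules.

```python
import numpy as np
import dask.array as da


def mean_intensity(pixels):
    """Hypothetical 1D algorithm: reduce the pixels of one pulse to a scalar."""
    return pixels.mean()


# Toy data: 4 trains, 3 pulses per train, 10 pixels per pulse (assumed layout).
data = np.arange(4 * 3 * 10, dtype=float).reshape(4, 3, 10)

# One chunk per train, so each train can be processed by a separate worker.
darr = da.from_array(data, chunks=(1, 3, 10))

# Apply the 1D function along the pixel axis (axis 2) for every train and pulse.
lazy = da.apply_along_axis(mean_intensity, 2, darr)

# Nothing is computed until compute() is called.
result = lazy.compute()
print(result.shape)  # (4, 3)
```

Chunking along the train axis is a choice made for this sketch; the chunk layout midtools uses is not stated here.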

`The Dataset Class`¶

class midtools.Dataset(run_number, setupfile, proposal=None, analysis=None, datdir=None, first_train=0, last_train=1000000.0, pulses_per_train=500, dark_run_number=None, train_file='train-file.npy', pulse_file='pulse-file.npy', first_cell=2, train_step=1, pulse_step=1, is_dark=False, localcluster=False, is_flatfield=False, flatfield_run_number=None, out_dir='./', file_identifier=None, trainId_offset=0, **kwargs)[source]¶

__init__(run_number, setupfile, proposal=None, analysis=None, datdir=None, first_train=0, last_train=1000000.0, pulses_per_train=500, dark_run_number=None, train_file='train-file.npy', pulse_file='pulse-file.npy', first_cell=2, train_step=1, pulse_step=1, is_dark=False, localcluster=False, is_flatfield=False, flatfield_run_number=None, out_dir='./', file_identifier=None, trainId_offset=0, **kwargs)[source]¶

Dataset object to analyze MID datasets on Maxwell.

Parameters:

setupfile (str) – Setupfile (.yml) that contains information on the setup parameters.
analysis (str, optional) –
Flags of the analysis to perform. Defaults to ‘00’. analysis is a string of ones and zeros: a one at a given position enables the corresponding analysis and a zero omits it. The analysis types are:

flags | analysis
1000 | average frames
0100 | SAXS azimuthal integration
0010 | XPCS correlation functions
0001 | compute statistics
last_train (int, optional) – Index of last train to analyze. If not provided, all trains are processed.
run_number (int, optional) – Specify run number. Defaults to None. If not defined, the datdir in the setupfile has to contain the .h5 files.
dark_run_number (int, optional) – Dark run number for calibration.
pulses_per_train (int, optional) – Specify the number of pulses per train. If not provided, take all stored memory cells.
train_step (int, optional) – Stepsize for slicing the train axis.
pulse_step (int, optional) – Stepsize for slicing the pulse axis.
is_dark (bool, optional) – If True switch to dark routines, i.e., average dark for dark subtraction, calculate mask from darks.
is_flatfield (bool, optional) – If True use flatfield algorithms.
flatfield_run_number (int, optional) – Run number of the processed flatfield for calibration.
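The flag string described for the analysis parameter can be decoded as sketched below. This is an illustration, not the parser used by midtools: decode_analysis and ANALYSIS_TYPES are hypothetical names, and padding short strings with zeros on the right is an assumption made so that the documented default ‘00’ is accepted.

```python
# Order follows the flag table: position 0 is the leftmost character.
ANALYSIS_TYPES = (
    "average frames",
    "SAXS azimuthal integration",
    "XPCS correlation functions",
    "compute statistics",
)


def decode_analysis(flags):
    """Return the names of the analyses enabled by a string of 0s and 1s."""
    if not flags or set(flags) - {"0", "1"}:
        raise ValueError("analysis must be a non-empty string of 0s and 1s")
    if len(flags) > len(ANALYSIS_TYPES):
        raise ValueError("analysis has more flags than analysis types")
    # Assumption: missing trailing flags are treated as 0.
    padded = flags.ljust(len(ANALYSIS_TYPES), "0")
    return [name for flag, name in zip(padded, ANALYSIS_TYPES) if flag == "1"]


print(decode_analysis("0110"))
# ['SAXS azimuthal integration', 'XPCS correlation functions']
```

Several analyses can be combined in one string, for example 1001 for averaged frames plus statistics.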

Note

A setupfile might look like this:

# setup.yml file

# Data
datdir: /path/to/data/r0522

# Maskfile
mask: /path/to/mask/agipd_mask_tmp.npy

# Beamline
photon_energy: 9 # keV
sample_detector: 8 # m
pixel_size: 200 # um

quadrant_positions:
    dx: -18
    dy: -15
    q1: [-500, 650]
    q2: [-550, -30]
    q3: [ 570, -216]
    q4: [ 620, 500]

# XPCS
xpcs_opt:
    q_range:
        q_first: .1 # smallest q in nm-1
        q_last: 1.  # largest q in nm-1
        steps: 10   # how many q-bins
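To relate the beamline values in this setupfile to the q_range given in nm-1, the following sketch evaluates the standard relation q = 4π/λ · sin(θ), with 2θ the scattering angle, for a pixel at a given distance from the direct beam. The function q_at_pixel is hypothetical and a flat detector perpendicular to the beam is assumed; how midtools builds its own qmap is not shown here.

```python
import math

HC_KEV_NM = 1.2398419843  # h*c in keV*nm, converts photon energy to wavelength


def q_at_pixel(r_pixels, photon_energy_kev, sample_detector_m, pixel_size_um):
    """Momentum transfer in nm^-1 for a pixel r_pixels away from the direct beam."""
    wavelength_nm = HC_KEV_NM / photon_energy_kev
    offset_m = r_pixels * pixel_size_um * 1e-6  # radial distance on the detector
    two_theta = math.atan2(offset_m, sample_detector_m)  # scattering angle
    return 4 * math.pi / wavelength_nm * math.sin(two_theta / 2)


# Values from the example setupfile: 9 keV, 8 m, 200 um pixels.
q = q_at_pixel(100, photon_energy_kev=9, sample_detector_m=8, pixel_size_um=200)
print(f"{q:.3f} nm^-1")  # 0.114 nm^-1
```

With these parameters, a pixel 100 pixels from the beam center sits just above the q_first value of 0.1 nm-1.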

agipd_geom¶

AGIPD geometry obtained from extra_data.

Type:	AGIPD_1MGeometry

center = None¶

Position of the direct beam in pixels.

Type:	tuple

compute(create_file=True)[source]¶: Start the actual computation based on the analysis attribute.

datdir¶

Data directory.

Type:	str

file_name = None¶

HDF5 file name.

Type:	str

h5_structure = None¶

Structure of the HDF5 file.

Type:	dict

is_dark = None¶

True if current run is a dark run.

Type:	bool

is_flatfield = None¶

True if current run is a flatfield run.

Type:	bool

mask¶

Mask of shape (16, 512, 128) where bad pixels are 0 and good pixels are 1.

Type:	np.ndarray
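The mask convention (0 for bad pixels, 1 for good pixels) can be applied to a detector frame as sketched below. The frame, the masked region and the variable names are made up for illustration; this is one common way to use such a mask, not necessarily what midtools does internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic frame with the AGIPD shape (modules, slow axis, fast axis).
frame = rng.poisson(2.0, size=(16, 512, 128)).astype(float)

# Mask with the documented convention: 1 = good pixel, 0 = bad pixel.
mask = np.ones((16, 512, 128), dtype=np.uint8)
mask[0, :10, :] = 0  # hypothetical bad region on the first module

# Replace bad pixels by NaN so that NaN-aware reductions ignore them.
masked = np.where(mask == 1, frame, np.nan)
mean_good = np.nanmean(masked)
```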

merge_files(subset=None, delete_file=False)[source]¶: Merge existing HDF5 files for a run.

qmap = None¶

Q-map of shape (16, 512, 128).

Type:	np.ndarray

run = None¶

Run data, e.g., as returned from extra_data.RunDirectory.

Type:	DataCollection

run_number¶

Number of the run.

Type:	int

setup = None¶: Xana setup instance

setupfile = None¶

Path to the setupfile.

Type:	str

Analyze MID runs.

usage: midtools [-h] [-r RUN] [-dr DARK_RUN_NUMBER DARK_RUN_NUMBER]
                [--last-train LAST_TRAIN] [--first-train FIRST_TRAIN]
                [-ppt PULSES_PER_TRAIN] [-ts TRAIN_STEP] [-ps PULSE_STEP]
                [--is-dark [IS_DARK]] [--is-flatfield [IS_FLATFIELD]]
                [-ffr FLATFIELD_RUN FLATFIELD_RUN] [--out-dir [OUT_DIR]]
                [--chunk [CHUNK]] [--job-dir JOB_DIR] [--slurm [SLURM]]
                [--localcluster [LOCALCLUSTER]]
                [--file-identifier [FILE_IDENTIFIER]]
                [--first-cell FIRST_CELL] [--datdir DATDIR]
                setupfile analysis

Positional Arguments¶

`setupfile`	the YAML file to configure midtools
`analysis`	which analysis to perform. String of 0s and 1s: 1000 saves average data along a specific axis, 0100 runs the SAXS routines, 0010 runs the XPCS routines, 0001 computes statistics (pulse-resolved histograms).

Named Arguments¶

`-r, --run`	Run number.
`-dr, --dark-run-number`
	Dark run number.
`--last-train`	Last train to analyze. Default: 1000000
`--first-train`	First train to analyze. Default: 0
`-ppt, --pulses-per-train`
	Number of pulses per train. Default: 500
`-ts, --train-step`
	Spacing of trains. Default: 1
`-ps, --pulse-step`
	Spacing of pulses. Default: 1
`--is-dark`	Whether the run is a dark run. Default: False
`--is-flatfield`	Whether the run is a flatfield run. Default: False
`-ffr, --flatfield-run`
	Flatfield run number.
`--out-dir`	Output directory. Default: “./”
`--chunk`	Split the trains into chunks of this size. By default, the trains are not chunked.
`--job-dir`	Directory for the Slurm output and error files. Default: “/gpfs/exfel/data/scratch/reiserm/mid-proteins/jobs/”
`--slurm`	Run midtools on a dedicated node as a Slurm job. Default: False
`--localcluster`	Use dask's LocalCluster to run midtools locally. Default: False
`--file-identifier`
	Identifier at the end of the file name. Default: None
`--first-cell`	Cell ID of the first AGIPD memory cell with X-rays. Default: 2
`--datdir`	Path to the data. This argument is only used if the data directory is not provided in the setupfile.
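The usage line shows three argument shapes: plain options such as -r RUN, options that take exactly two values such as -dr, and options with an optional value such as --is-dark [IS_DARK]. The sketch below reproduces a small subset of the interface with argparse to show how such a command line is parsed. It is not the midtools parser: the types and the const=True choice are assumptions, and the run numbers are made up.

```python
import argparse


def build_parser():
    """Hypothetical re-creation of a subset of the midtools command line."""
    parser = argparse.ArgumentParser(prog="midtools", description="Analyze MID runs.")
    parser.add_argument("setupfile", help="the YAML file to configure midtools")
    parser.add_argument("analysis", help="string of 0s and 1s, e.g. 0110")
    parser.add_argument("-r", "--run", type=int, help="Run number.")
    # Two values are expected, as in "-dr DARK_RUN_NUMBER DARK_RUN_NUMBER".
    parser.add_argument("-dr", "--dark-run-number", type=int, nargs=2,
                        help="Dark run number.")
    parser.add_argument("--first-train", type=int, default=0)
    parser.add_argument("--last-train", type=int, default=1000000)
    # Optional value, as in "[--is-dark [IS_DARK]]"; a bare flag means True (assumed).
    parser.add_argument("--is-dark", nargs="?", const=True, default=False)
    return parser


# Equivalent to: midtools setup.yml 0110 -r 522 -dr 520 521 --is-dark
args = build_parser().parse_args(
    ["setup.yml", "0110", "-r", "522", "-dr", "520", "521", "--is-dark"]
)
print(args.analysis, args.run, args.dark_run_number, args.is_dark)
```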
