API
Concept
midtools.Dataset is the basis of midtools. Among other things, it:
- provides the command line interface,
- reads the configuration file,
- handles metadata,
- starts the SLURMCluster,
- runs analysis routines from submodules,
- and finally stores the results in an HDF5 file.
Data are provided and calibrated by midtools.Calibrator. midtools.Dataset
calls midtools.Calibrator._get_data(), which performs the following data
processing steps:
- loading the data as an xarray using the extra_data module,
- slicing trains and pulses,
- masking,
- binning,
- baseline correction.
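The slicing step can be pictured with plain Python slices. The following is a minimal sketch, not the code used by midtools.Calibrator: the helper name select_indices is hypothetical, it treats the last index as an exclusive bound, and it assumes that the pulse selection starts at first_cell and spans pulses_per_train cells. None of these details are confirmed by the documentation.

```python
def select_indices(n_available, first=0, last=None, step=1):
    """Return the indices kept after slicing an axis of length n_available.

    Hypothetical helper: `last` is treated as an exclusive bound (assumption),
    and indices beyond the available data are silently dropped.
    """
    stop = n_available if last is None else min(int(last), n_available)
    return list(range(n_available))[first:stop:step]


# Trains: keep every second train of a 10-train run (first_train=0, train_step=2).
trains = select_indices(10, first=0, last=1_000_000, step=2)

# Pulses: start at memory cell 2 (first_cell=2) and keep 5 pulses (assumed layout).
pulses = select_indices(8, first=2, last=2 + 5, step=1)

print(trains)
print(pulses)
```

A large default such as last_train=1000000 then simply means "up to the end of the run", because the bound is clipped to the number of trains available.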
Additionally, midtools.corrections provides functions to be applied on
individual workers.
midtools provides the following analysis submodules:
- midtools.azimuthal_integration,
- midtools.correlation,
- midtools.statistics,
- midtools.average.
The first three modules use the apply_along_axis function from
dask.array to apply an algorithm to each train or pulse of a run.
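To illustrate the pattern, here is a minimal sketch of dask.array.apply_along_axis on a small synthetic array. The function pulse_stats, the array shape, and the (train, pulse, pixel) axis order are assumptions made for this example; they are not taken from the midtools submodules.

```python
import numpy as np
import dask.array as da


def pulse_stats(intensity):
    """Toy per-pulse algorithm: mean and variance of a 1D pixel array."""
    return np.array([intensity.mean(), intensity.var()])


# Hypothetical data with axes (train, pulse, pixel); real detector data is far larger.
frames = np.arange(24, dtype=float).reshape(2, 3, 4)
lazy_frames = da.from_array(frames, chunks=(1, 3, 4))  # one chunk per train

# Apply pulse_stats along the pixel axis; dtype and shape describe its output.
lazy_stats = da.apply_along_axis(
    pulse_stats, 2, lazy_frames, dtype=float, shape=(2,)
)

stats = lazy_stats.compute()  # nothing is computed before this call
print(stats.shape)
```

Passing dtype and shape explicitly spares Dask from calling the function on dummy data to infer the output layout. Because the array is chunked per train, each chunk can be processed by a different worker.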
The Dataset Class
class midtools.Dataset(run_number, setupfile, proposal=None, analysis=None, datdir=None, first_train=0, last_train=1000000.0, pulses_per_train=500, dark_run_number=None, train_file='train-file.npy', pulse_file='pulse-file.npy', first_cell=2, train_step=1, pulse_step=1, is_dark=False, localcluster=False, is_flatfield=False, flatfield_run_number=None, out_dir='./', file_identifier=None, trainId_offset=0, **kwargs)
__init__(run_number, setupfile, proposal=None, analysis=None, datdir=None, first_train=0, last_train=1000000.0, pulses_per_train=500, dark_run_number=None, train_file='train-file.npy', pulse_file='pulse-file.npy', first_cell=2, train_step=1, pulse_step=1, is_dark=False, localcluster=False, is_flatfield=False, flatfield_run_number=None, out_dir='./', file_identifier=None, trainId_offset=0, **kwargs)
Dataset object to analyze MID datasets on Maxwell.
Parameters:
- setupfile (str) – Setupfile (.yml) that contains information on the setup parameters.
- analysis (str, optional) – Flags selecting the analyses to perform. Defaults to ‘00’. analysis is a string of ones and zeros, where a one means the corresponding analysis is performed and a zero means it is omitted. The analysis types are:
| flags | analysis |
| 1000 | average frames |
| 0100 | SAXS azimuthal integration |
| 0010 | XPCS correlation functions |
| 0001 | compute statistics |
- last_train (int, optional) – Index of last train to analyze. If not provided, all trains are processed.
- run_number (int, optional) – Specify run number. Defaults to None. If not defined, the datdir in the setupfile has to contain the .h5 files.
- dark_run_number (int, optional) – Dark run number for calibration.
- pulses_per_train (int, optional) – Specify the number of pulses per train. If not provided, take all stored memory cells.
- train_step (int, optional) – Stepsize for slicing the train axis.
- pulse_step (int, optional) – Stepsize for slicing the pulse axis.
- is_dark (bool, optional) – If True, switch to dark routines, i.e., average the dark frames for dark subtraction and calculate the mask from the darks.
- is_flatfield (bool, optional) – If True, use flatfield algorithms.
- flatfield_run_number (int, optional) – Run number of the processed flatfield for calibration.
Note
A setupfile might look like this:
    # setup.yml file

    # Data
    datdir: /path/to/data/r0522

    # Maskfile
    mask: /path/to/mask/agipd_mask_tmp.npy

    # Beamline
    photon_energy: 9 # keV
    sample_detector: 8 # m
    pixel_size: 200 # um

    quadrant_positions:
        dx: -18
        dy: -15
        q1: [-500, 650]
        q2: [-550, -30]
        q3: [ 570, -216]
        q4: [ 620, 500]

    # XPCS
    xpcs_opt:
        q_range:
            q_first: .1 # smallest q in nm-1
            q_last: 1. # largest q in nm-1
            steps: 10 # how many q-bins
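To show how the Beamline and XPCS entries of the example setupfile relate to each other, the following sketch converts a pixel distance into a momentum transfer using the standard small-angle relation q = 4π/λ · sin(θ), with 2θ the scattering angle. It illustrates the physics only and is not the routine midtools uses to build its qmap. The function names are hypothetical, the dictionary mirrors what a YAML loader would be expected to return for the file, and the linear spacing of the q-bins is an assumption.

```python
import math

# Values copied from the example setupfile (as a YAML loader would return them).
setup = {
    "photon_energy": 9,    # keV
    "sample_detector": 8,  # m
    "pixel_size": 200,     # um
    "xpcs_opt": {"q_range": {"q_first": 0.1, "q_last": 1.0, "steps": 10}},
}

HC_KEV_NM = 1.239841984  # Planck constant times speed of light, in keV nm


def q_at_radius(radius_px, setup):
    """Momentum transfer in nm-1 for a pixel `radius_px` pixels from the beam."""
    wavelength = HC_KEV_NM / setup["photon_energy"]    # nm
    radius_m = radius_px * setup["pixel_size"] * 1e-6  # um -> m
    two_theta = math.atan2(radius_m, setup["sample_detector"])
    return 4 * math.pi / wavelength * math.sin(two_theta / 2)


def q_bin_edges(q_range):
    """Bin edges between q_first and q_last (linear spacing is an assumption)."""
    width = (q_range["q_last"] - q_range["q_first"]) / q_range["steps"]
    return [q_range["q_first"] + i * width for i in range(q_range["steps"] + 1)]


q_100 = q_at_radius(100, setup)
edges = q_bin_edges(setup["xpcs_opt"]["q_range"])
print(f"q at 100 pixels: {q_100:.4f} nm-1")
print(f"{len(edges) - 1} bins from {edges[0]:.2f} to {edges[-1]:.2f} nm-1")
```

With these numbers, a pixel roughly 100 pixels from the direct beam sits near the lower end of the configured q-range, which gives a quick plausibility check for q_first and q_last.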
- agipd_geom: AGIPD geometry obtained from extra_data.
  Type: AGIPD_1MGeometry
- center = None: Position of the direct beam in pixels.
  Type: tuple
- datdir: Data directory.
  Type: str
- file_name = None: HDF5 file name.
  Type: str
- h5_structure = None: Structure of the HDF5 file.
  Type: dict
- is_dark = None: True if current run is a dark run.
  Type: bool
- is_flatfield = None: True if current run is a flatfield run.
  Type: bool
- mask: Mask of shape (16, 512, 128) where bad pixels are 0 and good pixels are 1.
  Type: np.ndarray
- qmap = None: q-map of shape (16, 512, 128).
  Type: np.ndarray
- run = None: Run data, e.g., as returned by extra_data.RunDirectory.
  Type: DataCollection
- run_number: Number of the run.
  Type: int
- setup = None: Xana setup instance.
- setupfile = None: Path to the setupfile.
  Type: str
Analyze MID runs.
usage: midtools [-h] [-r RUN] [-dr DARK_RUN_NUMBER DARK_RUN_NUMBER]
[--last-train LAST_TRAIN] [--first-train FIRST_TRAIN]
[-ppt PULSES_PER_TRAIN] [-ts TRAIN_STEP] [-ps PULSE_STEP]
[--is-dark [IS_DARK]] [--is-flatfield [IS_FLATFIELD]]
[-ffr FLATFIELD_RUN FLATFIELD_RUN] [--out-dir [OUT_DIR]]
[--chunk [CHUNK]] [--job-dir JOB_DIR] [--slurm [SLURM]]
[--localcluster [LOCALCLUSTER]]
[--file-identifier [FILE_IDENTIFIER]]
[--first-cell FIRST_CELL] [--datdir DATDIR]
setupfile analysis
Positional Arguments
| setupfile | The YAML file to configure midtools. |
| analysis | Which analysis to perform, given as a string of 0s and 1s: 1000 saves average data along a specific axis, 0100 SAXS routines, 0010 XPCS routines, 0001 statistics (pulse-resolved histograms). |
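The flag string can be decoded position by position. The helper below is a hypothetical sketch of that mapping, not midtools code. The analysis names are taken from the Dataset parameter description. Padding shorter strings (such as the documented default ‘00’) with trailing zeros is an assumption.

```python
ANALYSIS_NAMES = (
    "average frames",              # 1000
    "SAXS azimuthal integration",  # 0100
    "XPCS correlation functions",  # 0010
    "compute statistics",          # 0001
)


def decode_analysis(flags):
    """Return the names of the analyses switched on in a flag string."""
    if len(flags) > len(ANALYSIS_NAMES) or set(flags) - {"0", "1"}:
        raise ValueError(f"invalid analysis flags: {flags!r}")
    # Assumption: missing trailing positions count as "0".
    padded = flags.ljust(len(ANALYSIS_NAMES), "0")
    return [name for bit, name in zip(padded, ANALYSIS_NAMES) if bit == "1"]


selected = decode_analysis("0110")
print(selected)
```

Flags combine freely, so "0110" requests both the SAXS and the XPCS routines in a single pass.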
Named Arguments
| -r, --run | Run number. |
| -dr, --dark-run-number | Dark run number. |
| --last-train | Last train to analyze. Default: 1000000 |
| --first-train | First train to analyze. Default: 0 |
| -ppt, --pulses-per-train | Number of pulses per train. Default: 500 |
| -ts, --train-step | Spacing of trains. Default: 1 |
| -ps, --pulse-step | Spacing of pulses. Default: 1 |
| --is-dark | Whether the run is a dark run. Default: False |
| --is-flatfield | Whether the run is a flatfield run. Default: False |
| -ffr, --flatfield-run | Flatfield run number. |
| --out-dir | Output directory. Default: “./” |
| --chunk | Split the trains into chunks of this size. Default: do not chunk |
| --job-dir | Directory for the Slurm output and error files. Default: “/gpfs/exfel/data/scratch/reiserm/mid-proteins/jobs/” |
| --slurm | Run midtools on a dedicated node as a Slurm job. Default: False |
| --localcluster | Use Dask's LocalCluster to run midtools locally. Default: False |
| --file-identifier | Identifier at file ending. Default: None |
| --first-cell | Cell ID of the first AGIPD memory cell with X-rays. Default: 2 |
| --datdir | Path to the data. This argument is only used if the data directory is not provided in the setupfile. |
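The usage string suggests how some options consume their values: -dr takes two values, and --is-dark takes an optional one. The following reduced argparse sketch reproduces that behaviour for a handful of options. It is a hypothetical reconstruction, not the parser shipped with midtools. The int types are assumptions, and the run numbers in the example call are made up.

```python
import argparse


def build_parser():
    """Reduced, hypothetical parser covering a few of the documented options."""
    parser = argparse.ArgumentParser(prog="midtools", description="Analyze MID runs.")
    parser.add_argument("setupfile", help="the YAML file to configure midtools")
    parser.add_argument("analysis", help="string of 0s and 1s selecting the analyses")
    parser.add_argument("-r", "--run", type=int, help="Run number.")
    # Two values, matching "-dr DARK_RUN_NUMBER DARK_RUN_NUMBER" in the usage string.
    parser.add_argument("-dr", "--dark-run-number", type=int, nargs=2)
    parser.add_argument("--last-train", type=int, default=1_000_000)
    parser.add_argument("-ts", "--train-step", type=int, default=1)
    # Optional value, matching "--is-dark [IS_DARK]"; a bare flag stores True.
    parser.add_argument("--is-dark", nargs="?", const=True, default=False)
    return parser


# Illustrative command line; the run numbers are placeholders.
args = build_parser().parse_args(
    ["setup.yml", "0110", "-r", "522", "-dr", "10", "11", "--is-dark"]
)
print(args.run, args.dark_run_number, args.is_dark)
```

Options declared with an optional value are best placed after the positional arguments, since a bare flag followed by a positional argument would consume it as its value.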