pyphi.batch module

Created on Mon Apr 11 14:58:35 2022

Batch data is assumed to come in an Excel file whose first column is the batch identifier and whose following columns are process variables. Optionally, the second column, labeled 'PHASE', 'Phase' or 'phase', indicates the phase of execution.

Change log:

  • added Dec 28 2023 Titles can be sent to contribution plots via the plot_title flag

    Monitoring diagnostics are also plotted against sample number, starting at 1

  • added Dec 27 2023 Corrected plots to use the correct x-axis, starting with sample = 1

    Amended indicator variable alignment so that it keeps the original data instead of replacing the IV with a linear sequence

  • added Dec 4 2023 Added a BatchVIP calculation

  • added Apr 23 2023 Corrected a very dumb mistake I made coding when tired

  • added Apr 18 2023 Added descriptors routine to obtain landmarks of the batch, such as min, max, ave of a variable [during a phase if indicated so]

    Modified plot_var_all_batches to plot against the values in a Time column and also add the legend for the BatchID

  • added Apr 10 2023 Added batch contribution plots

    Added build_rel_time to create a tag of relative run time from a timestamp

  • added Apr 7 2023 Added alignment using indicator variable per phase

  • added Apr 5 2023 Added the capability to monitor a variable in “Soft Sensor” mode, which implies there are no measurements for it (pure prediction), as opposed to a forecast, where new measurements keep coming in over time

  • added Jul 20 2022 Distribution of number of samples per phase plot

  • added Aug 10 2022 refold_horizontal | clean_empty_rows | predict

  • added Aug 12 2022 replicate_batch

@author: S. Garcia-Munoz sgarciam@ic.ak.uk salg@andrew.cmu.edu

pyphi.batch.unique(df, colid)[source]

Return unique values from a DataFrame column, preserving order of first occurrence.

A replacement for np.unique that does not sort the result, returning values in the order they first appear in the DataFrame.

Parameters:
  • df (pd.DataFrame) – Input DataFrame.

  • colid (str) – Name of the column to extract unique values from.

Returns:

Unique values in the order they first appear in df[colid].

Return type:

list
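The difference from np.unique can be illustrated with a minimal sketch. The helper name unique_in_order and the sample data are hypothetical, and this is one possible way to get first-occurrence ordering, not necessarily how pyphi.batch.unique is written.

```python
import numpy as np
import pandas as pd

def unique_in_order(df, colid):
    """Hypothetical stand-in: unique values of df[colid] in first-seen order."""
    # dict.fromkeys keeps insertion order, so duplicates are dropped
    # without sorting the values.
    return list(dict.fromkeys(df[colid].tolist()))

df = pd.DataFrame({"BatchID": ["B3", "B3", "B1", "B1", "B2"]})

ordered = unique_in_order(df, "BatchID")          # order of first appearance
sorted_vals = np.unique(df["BatchID"]).tolist()   # np.unique sorts the result
```

Here ordered keeps the batches in the order they were stacked in the file, which matters when rows of a model output must line up with the original batch sequence.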

pyphi.batch.mean(X, axis)[source]

Compute the mean of a 2-D array along an axis, ignoring NaN values.

Parameters:
  • X (np.ndarray) – 2-D input array, may contain np.nan.

  • axis (int) – Axis along which to compute the mean. 0 = column-wise (mean of each column across rows). 1 = row-wise (mean of each row across columns).

Returns:

1-D array of mean values, with NaN entries excluded from the denominator so results remain unbiased in the presence of missing data.

Return type:

np.ndarray
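The axis convention and the NaN handling described above can be reproduced with NumPy's own np.nanmean. This is only an illustration of the expected behavior on made-up data; it does not claim that pyphi.batch.mean is implemented this way.

```python
import numpy as np

X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])

# axis=0: mean of each column across rows, NaN entries left out of the count
col_means = np.nanmean(X, axis=0)

# axis=1: mean of each row across columns
row_means = np.nanmean(X, axis=1)
```

The first column has two valid entries (1 and 3), so its mean is 2 rather than 4/3, which is what dividing by the full row count would give.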

pyphi.batch.simple_align(bdata, nsamples)[source]

Align batch data to a common length by linear interpolation on row index.

Resamples every batch to exactly nsamples rows by linearly interpolating each variable against the original row sequence. No phase information is used; all samples are treated as a single continuous trajectory.

Parameters:
  • bdata (pd.DataFrame) – Batch data. First column is batch ID; second column may optionally be a phase label ('Phase', 'phase', or 'PHASE'); remaining columns are process variables. Batches are stacked vertically.

  • nsamples (int) – Target number of samples per batch after alignment.

Returns:

Aligned batch data with all batches resampled to nsamples rows. Phase labels (if present) are mapped to the nearest original sample using rounded interpolation indices.

Return type:

pd.DataFrame
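A minimal sketch of linear resampling on row index for a single variable of a single batch follows. The helper resample_linear is hypothetical and uses np.interp; the actual routine may differ in details such as how it treats missing values.

```python
import numpy as np

def resample_linear(values, nsamples):
    """Resample a 1-D trajectory to nsamples points by linear interpolation
    against its original row index (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    old_idx = np.arange(len(values))
    # Evenly spaced positions spanning the first to the last original row
    new_idx = np.linspace(0, len(values) - 1, nsamples)
    return np.interp(new_idx, old_idx, values)

# A short batch is stretched and a long batch is compressed to a common length
stretched = resample_linear([0.0, 10.0, 20.0], 5)
compressed = resample_linear([0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0], 4)
```

Both batches keep their first and last values; only the number of points in between changes.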

pyphi.batch.phase_simple_align(bdata, nsamples)[source]

Align batch data to a common length per phase by linear interpolation.

Resamples each phase of each batch independently to the specified number of samples, then concatenates phases back in order. Requires phase information in the second column.

Parameters:
  • bdata (pd.DataFrame) – Batch data. First column is batch ID; second column must be a phase label ('Phase', 'phase', or 'PHASE'); remaining columns are process variables. Batches are stacked vertically.

  • nsamples (dict) – Number of samples to generate per phase. Keys must match the phase labels in the data. Resampling is linear with respect to row number within each phase. Example: {'Heating': 100, 'Reaction': 200, 'Cooling': 10}.

Returns:

Aligned batch data with each phase resampled to the specified number of samples, phases concatenated in key order of nsamples.

Return type:

pd.DataFrame
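The per-phase idea can be sketched for one batch as follows. The helper resample_phase, the column names and the data are hypothetical; the sketch only shows how a dict of per-phase sample counts drives independent resampling and concatenation in key order.

```python
import numpy as np
import pandas as pd

def resample_phase(block, n):
    """Linearly resample every column of one phase block to n rows
    (hypothetical helper, interpolating against row number)."""
    old_idx = np.arange(len(block))
    new_idx = np.linspace(0, len(block) - 1, n)
    return pd.DataFrame({
        col: np.interp(new_idx, old_idx, block[col].to_numpy(dtype=float))
        for col in block.columns
    })

batch = pd.DataFrame({
    "phase": ["Heating"] * 3 + ["Cooling"] * 5,
    "Temp": [20.0, 40.0, 60.0, 60.0, 50.0, 40.0, 30.0, 20.0],
})
nsamples = {"Heating": 5, "Cooling": 3}

parts = []
for phase, n in nsamples.items():          # phases follow the dict key order
    block = batch.loc[batch["phase"] == phase, ["Temp"]]
    out = resample_phase(block, n)
    out.insert(0, "phase", phase)          # restore the phase label
    parts.append(out)

aligned = pd.concat(parts, ignore_index=True)
```

After alignment every batch has exactly 5 Heating rows and 3 Cooling rows, regardless of how long each phase originally ran.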

pyphi.batch.phase_iv_align(bdata, nsamples)[source]

Align batch data using an indicator variable (IV) or row index, per phase.

Provides the most flexible alignment: each phase can be aligned either by linear resampling (default) or by using a monotonically changing process variable (indicator variable) as the alignment axis.

Parameters:
  • bdata (pd.DataFrame) – Batch data. First column is batch ID; second column must be a phase label ('Phase', 'phase', or 'PHASE'); remaining columns are process variables. Batches are stacked vertically.

  • nsamples (dict) –

    Alignment specification per phase. Each value can be:

    • An integer for linear alignment, e.g. {'Heating': 100, 'Reaction': 200}.

    • A 4-element list for IV alignment with known start and end, e.g. ['TIC101', 100, 30, 50] (IVarID, num_samples, start_value, end_value).

    • A 3-element list for IV alignment with known end only, e.g. ['TIC101', 100, 50], where start_value is taken from the first row of that phase.

    The indicator variable must be monotonically increasing or decreasing within the phase; non-monotonic samples are removed with a warning before interpolation.

Returns:

Aligned batch data with each phase resampled as specified, phases concatenated in key order of nsamples.

Return type:

pd.DataFrame
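Indicator-variable alignment can be sketched for a single phase of a single batch. Everything here is an assumption made for illustration: the helper keep_monotonic_increasing, the rule it uses to drop non-monotonic samples, and the data. The sketch handles an increasing IV only; for a decreasing IV both arrays would need to be reversed first, because np.interp expects increasing x-coordinates.

```python
import numpy as np

def keep_monotonic_increasing(x):
    """Indices of samples that strictly exceed the last kept sample
    (one possible filtering rule, assumed for this sketch)."""
    keep = [0]
    for i in range(1, len(x)):
        if x[i] > x[keep[-1]]:
            keep.append(i)
    return np.array(keep)

# Indicator variable (e.g. a temperature ramp) with one backwards step
iv = np.array([30.0, 32.0, 31.0, 38.0, 45.0, 50.0])
# Another process variable measured at the same instants
other = np.array([1.0, 1.2, 9.9, 1.8, 2.5, 3.0])

idx = keep_monotonic_increasing(iv)        # drops the sample where iv == 31

# Equivalent of ['TIC101', 5, 30, 50]: 5 samples from IV = 30 to IV = 50
num_samples, start_value, end_value = 5, 30.0, 50.0
iv_grid = np.linspace(start_value, end_value, num_samples)

# Interpolate the other variable onto the evenly spaced IV grid
other_aligned = np.interp(iv_grid, iv[idx], other[idx])
```

The aligned trajectory is indexed by progress of the indicator variable rather than by elapsed rows, so batches that ramp at different speeds become comparable.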

pyphi.batch.plot_var_all_batches(bdata, *, which_var=False, plot_title='', mkr_style='.-', phase_samples=False, alpha_=0.2, timecolumn=False, lot_legend=False)[source]

Plot trajectories of one or more variables for all batches.

Produces one Matplotlib figure per variable, with each batch overlaid as a separate line. Optionally adds phase boundary annotations and a batch legend.

Parameters:
  • bdata (pd.DataFrame) – Batch data. First column is batch ID; second column may optionally be a phase label; remaining columns are process variables. Batches are stacked vertically.

  • which_var (list[str], str, or bool) – Variables to plot. If False (default), all process variables are plotted.

  • plot_title (str) – Title applied to all figures. Default ''.

  • mkr_style (str) – Matplotlib line/marker style string, e.g. '.-' (default), 'o', '-'.

  • phase_samples (dict or bool) – Phase structure used to annotate phase boundaries as vertical magenta lines. Pass the same nsamples dict used for alignment. Default False (no annotations).

  • alpha_ (float) – Transparency of phase boundary lines (0–1). Default 0.2.

  • timecolumn (str or bool) – If a column name is given, the x-axis uses the values in that column instead of the sample sequence index. Default False.

  • lot_legend (bool) – If True, adds a legend showing each batch ID. Default False.

Returns:

Displays one Matplotlib figure per variable.

Return type:

None

pyphi.batch.plot_batch(bdata, which_batch, which_var, *, include_mean_exc=False, include_set=False, phase_samples=False, single_plot=False, plot_title='')[source]

Plot the trajectory of one or more batches for selected variables.

Highlights the specified batch(es) in black against an optional backdrop of all other batch trajectories and/or their mean.

Parameters:
  • bdata (pd.DataFrame) – Batch data. First column is batch ID; second column may optionally be a phase label; remaining columns are process variables. Batches are stacked vertically.

  • which_batch (str or list[str]) – Batch ID(s) to highlight.

  • which_var (str or list[str]) – Variable name(s) to plot.

  • include_mean_exc (bool) – If True, overlays the mean trajectory of all other batches (excluding the highlighted batch) in red. Default False.

  • include_set (bool) – If True, overlays all other batch trajectories in light magenta for context. Default False.

  • phase_samples (dict or bool) – Phase structure for annotating phase boundaries. Pass the same nsamples dict used for alignment. Default False (no annotations).

  • single_plot (bool) – If True, plots all selected variables on a single axis. If False (default), each variable gets its own figure.

  • plot_title (str) – Text appended to each figure title after the batch ID. Default ''.

Returns:

Displays one or more Matplotlib figures.

Return type:

None

pyphi.batch.unfold_horizontal(bdata)[source]

Unfold batch data horizontally (batch-wise unfolding).

Reshapes aligned batch data from the vertical stacked format (samples × variables) into a 2-D matrix where each row is one batch and columns represent all variables at all time points: [Var1_t1, Var1_t2, ..., Var1_tN, Var2_t1, ...].

This is the standard preprocessing step before fitting mpca() or mpls() with batch-wise unfolding.

Parameters:

bdata (pd.DataFrame) – Aligned batch data. First column is batch ID; second column may optionally be a phase label; remaining columns are process variables. All batches must have the same number of rows.

Returns:

  • bdata_hor (pd.DataFrame): Unfolded matrix, one row per batch. First column is batch ID; remaining columns are variable-time combinations.

  • colnames (list[str]): Column names of the unfolded matrix (e.g. ['Var1_1', 'Var1_2', ..., 'VarN_T']).

  • bid (list[str]): Variable-block identifiers — for each column in bdata_hor, the name of the original process variable it belongs to. Used internally to reconstruct block structure.

Return type:

tuple
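The column ordering described above can be sketched with plain NumPy reshapes. The variable names and data are hypothetical, and the sketch works on a numeric array rather than a DataFrame with ID columns.

```python
import numpy as np

n_batches, nsamples, nvars = 2, 3, 2

# Vertically stacked data: rows are samples (batch 1 first), columns are variables
stacked = np.array([[1, 10],
                    [2, 20],
                    [3, 30],
                    [4, 40],
                    [5, 50],
                    [6, 60]], dtype=float)

# (batch, sample, variable) cube, then put all samples of Var1 before Var2
cube = stacked.reshape(n_batches, nsamples, nvars)
unfolded = cube.transpose(0, 2, 1).reshape(n_batches, nvars * nsamples)

# Column names and block identifiers in the same variable-then-time order
colnames = [f"Var{v + 1}_{t + 1}" for v in range(nvars) for t in range(nsamples)]
bid = [f"Var{v + 1}" for v in range(nvars) for _ in range(nsamples)]
```

Each row of unfolded now holds one complete batch, which is the shape a PCA or PLS routine expects when batches are the observations.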

pyphi.batch.refold_horizontal(xuf, nvars, nsamples)[source]

Refold a horizontally unfolded batch matrix back to vertically stacked form.

Inverts the operation of unfold_horizontal(), converting a 2-D unfolded matrix (one row per batch) back to a stacked 2-D arrangement (total_samples × nvars, with total_samples = n_batches × nsamples), suitable for conversion back to a DataFrame.

Parameters:
  • xuf (np.ndarray) – Horizontally unfolded batch data, shape (n_batches × (nvars × nsamples)). Strictly numeric — no ID column.

  • nvars (int) – Number of process variables per sample.

  • nsamples (int) – Number of time samples per batch.

Returns:

Refolded array of shape (n_batches × nsamples, nvars), where the rows for each batch are stacked vertically.

Return type:

np.ndarray
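Refolding is the reverse reshape. The sketch below assumes the columns of xuf are ordered variable by variable ([Var1_t1, ..., Var1_tN, Var2_t1, ...]), as described for unfold_horizontal(); the helper name refold is hypothetical.

```python
import numpy as np

def refold(xuf, nvars, nsamples):
    """Turn (n_batches, nvars * nsamples) into (n_batches * nsamples, nvars),
    assuming variable-then-time column order (hypothetical helper)."""
    n_batches = xuf.shape[0]
    cube = xuf.reshape(n_batches, nvars, nsamples)
    # Swap to (batch, sample, variable), then stack batches vertically
    return cube.transpose(0, 2, 1).reshape(n_batches * nsamples, nvars)

xuf = np.array([[1, 2, 3, 10, 20, 30],
                [4, 5, 6, 40, 50, 60]], dtype=float)

refolded = refold(xuf, nvars=2, nsamples=3)
```

The first three rows of refolded belong to the first batch and the last three to the second, with one column per process variable.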

pyphi.batch.loadings(mmvm_obj, dim, *, r2_weighted=False, which_var=False)[source]

Plot batch model loadings as a function of sample number.

For each process variable, produces a filled-area plot of the loading (W* for PLS, P for PCA) vs. sample index, so the temporal pattern of each variable’s influence on the model can be inspected visually.

For PLS models with initial conditions (ninit > 0), an additional bar chart is produced for the initial-condition variables.

Parameters:
  • mmvm_obj (dict) – Multi-way PCA model from mpca() or multi-way PLS model from mpls().

  • dim (int) – Component index to plot (1-indexed).

  • r2_weighted (bool) – If True, multiplies each loading by its corresponding per-variable R² value before plotting, so variables that explain more variance appear larger. Default False.

  • which_var (str, list[str], or bool) – Process variable name(s) to plot. If False (default), all variables are plotted.

Returns:

Displays one Matplotlib figure per variable (plus one bar chart for initial conditions if applicable).

Return type:

None

pyphi.batch.loadings_abs_integral(mmvm_obj, *, r2_weighted=False, addtitle=False)[source]

Plot the integral of absolute loadings per variable across all LVs/PCs.

For each latent variable / principal component, produces a bar chart where each bar represents the sum of absolute loading values over all time samples for that variable. This gives a scalar importance measure for each process variable per component.

Parameters:
  • mmvm_obj (dict) – Multi-way PCA model from mpca() or multi-way PLS model from mpls().

  • r2_weighted (bool) – If True, weights each loading by its per-variable R² before summing, emphasising variables that also explain more variance. Default False.

  • addtitle (str or bool) – Optional string to use as the figure title. If False (default), no title is added.

Returns:

Displays one Matplotlib figure per latent variable / PC.

Return type:

None
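The quantity behind each bar can be sketched for one component. The loading vector below is made up, and the sketch assumes the loadings are ordered variable by variable as in the batch-wise unfolded matrix; it leaves out the optional R² weighting.

```python
import numpy as np

nvars, nsamples = 2, 3

# Hypothetical loading vector for one component:
# three time samples of Var1 followed by three time samples of Var2
p = np.array([0.1, -0.2, 0.3, -0.4, 0.0, 0.5])

# Sum of absolute loadings over time, one value per process variable
abs_integral = np.abs(p).reshape(nvars, nsamples).sum(axis=1)
```

Taking the absolute value first prevents positive and negative loadings of the same variable from cancelling out over the course of the batch.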

pyphi.batch.batch_vip(mmvm_obj, *, addtitle=False)[source]

Plot a batch-level VIP score summarising variable importance across all LVs.

Computes a scalar VIP-like score for each process variable by summing the absolute loadings weighted by R²Y across all latent variables, then summing over all time samples. The result is shown as a bar chart, sorted by variable (not by VIP magnitude).

This is conceptually analogous to the standard VIP but adapted for the temporal, unfolded batch model structure.

Parameters:
  • mmvm_obj (dict) – Multi-way PLS model from mpls(). For PCA models the plot uses R²X weighting instead of R²Y.

  • addtitle (str or bool) – Optional string to use as the figure title. If False (default), no title is added.

Returns:

Displays a single Matplotlib bar chart.

Return type:

None

pyphi.batch.r2pv(mmvm_obj, *, which_var=False)[source]

Plot cumulative R² per variable as a function of sample number.

For each process variable, produces a stacked filled-area plot where each band represents the cumulative R² contribution of one LV/PC at each time sample. For PLS models, a separate stacked bar chart is also produced for the Y-space R²pvY.

For models with initial conditions (ninit > 0), an additional bar chart is produced for the initial-condition variable R² values.

Parameters:
  • mmvm_obj (dict) – Multi-way PCA model from mpca() or multi-way PLS model from mpls().

  • which_var (str, list[str], or bool) – Process variable name(s) to plot. If False (default), all variables are plotted.

Returns:

Displays one Matplotlib figure per variable, plus additional figures for Y-space and initial conditions where applicable.

Return type:

None

pyphi.batch.mpca(xbatch, a, *, unfolding='batch wise', phase_samples=False, cross_val=0)[source]

Fit a Multi-way PCA (MPCA) model to aligned batch data.

Unfolds the batch data into a 2-D matrix (batch-wise or variable-wise) and fits a PCA model. Low-variance columns are removed automatically and their positions are restored in the model loadings for consistent interpretation.

Parameters:
  • xbatch (pd.DataFrame) – Aligned batch data, all batches having the same number of samples. First column is batch ID; second column may optionally be a phase label; remaining columns are process variables. Batches are stacked vertically.

  • a (int) – Number of principal components to fit.

  • unfolding (str) – Unfolding strategy. 'batch wise' (default) unfolds to one row per batch; 'variable wise' keeps the observation-per-sample structure.

  • phase_samples (dict or bool) – Phase structure stored in the model for use in plotting functions. Pass the same nsamples dict used for alignment. Default False.

  • cross_val (int) – Cross-validation percentage of elements to remove per round. 0 (default) = no CV; 100 = leave-one-out.

Returns:

Multi-way PCA model object extending the standard PCA dict from pyphi.calc.pca() with additional batch-specific keys:

  • 'varidX' (list[str]): Variable column names in unfolded order.

  • 'bid' (list[str]): Block ID for each column (original variable name).

  • 'uf' (str): Unfolding strategy used ('batch wise').

  • 'phase_samples': Phase structure passed in (for plotting).

  • 'nvars' (int): Number of process variables per sample.

  • 'nbatches' (int): Number of batches in the training set.

  • 'nsamples' (int): Number of samples per batch.

  • 'ninit' (int): Number of initial-condition variables (always 0 for MPCA).

  • 'A' (int): Number of principal components fitted.

Return type:

dict

pyphi.batch.monitor(mmvm_obj, bdata, *, which_batch=False, zinit=False, build_ci=True, shush=False, soft_sensor=False)[source]

Mimic real-time batch monitoring and produce dynamic diagnostic plots.

Two-stage workflow:

Stage 1 — Build confidence intervals (call once after fitting):

monitor(mmvm_obj, training_data)

Simulates monitoring for every batch in bdata, computes per-sample confidence intervals for scores, HT², global SPE, and instantaneous SPE, and writes them back into mmvm_obj in place.

Stage 2 — Monitor a new batch (call after Stage 1):

diags = monitor(mmvm_obj, bdata, which_batch='Batch01')

Simulates real-time monitoring for the specified batch, plots dynamic score, HT², SPE, and (for PLS models) Y-forecast trajectories with the Stage 1 confidence interval overlays.

Parameters:
  • mmvm_obj (dict) – Multi-way PCA or PLS model from mpca() or mpls(). Confidence interval keys are added in Stage 1.

  • bdata (pd.DataFrame) – Batch data (aligned, same structure as training data). Used to look up batch trajectories.

  • which_batch (str, list[str], or bool) – Batch ID(s) to monitor. If False (default), Stage 1 is performed (CI building).

  • zinit (pd.DataFrame or bool) – Initial-condition data for the batch(es) being monitored. Required if the model was fitted with zinit. Default False.

  • build_ci (bool) – If True (default) and which_batch=False, builds and stores confidence intervals in mmvm_obj.

  • shush (bool) – If True, suppresses progress messages. Default False.

  • soft_sensor (str, list[str], or bool) – Variable name(s) to treat as soft-sensor targets — their measurements are set to NaN before prediction so that only model-based estimates are produced. Default False.

Returns:

In Stage 2, returns a diags dictionary (or a list of dicts if multiple batches are requested) with keys:

  • 'Batch' (str): Batch ID.

  • 't_mon' (ndarray): Score trajectories (nsamples × A).

  • 'HT2_mon' (ndarray): Hotelling’s T² trajectory.

  • 'spe_mon' (ndarray): Global SPE trajectory.

  • 'spei_mon' (ndarray): Instantaneous SPE trajectory.

  • 'cont_spe' (list[pd.DataFrame]): SPE contributions per sample.

  • 'cont_spei' (pd.DataFrame): Instantaneous SPE contributions.

  • 'cont_ht2' (list[pd.DataFrame]): HT² contributions per sample.

  • 'forecast' (list[pd.DataFrame]): X-space forecast per sample.

  • 'forecast y' (pd.DataFrame): Y forecast trajectory (PLS models only).

  • 'spe z', 'cont_spe_z', 'cont_ht2_z', 'reconstructed z': Initial-condition diagnostics (if zinit was provided).

Returns 'error batch not found' if the requested batch is not in bdata. In Stage 1, returns None (results written to mmvm_obj).

Return type:

dict, list[dict], str, or None

pyphi.batch.mpls(xbatch, y, a, *, zinit=False, phase_samples=False, mb_each_var=False, cross_val=0, cross_val_X=False)[source]

Fit a Multi-way PLS (MPLS) model to aligned batch data.

Unfolds the batch data batch-wise, optionally prepends initial-condition variables, and fits a PLS (or Multi-Block PLS) model to predict y. Low-variance columns are removed and their positions are restored for consistent interpretation.

Parameters:
  • xbatch (pd.DataFrame) – Aligned batch data, all batches having the same number of samples. First column is batch ID; second column may optionally be a phase label; remaining columns are process variables. Batches are stacked vertically.

  • y (pd.DataFrame or np.ndarray) – Response matrix, one row per batch. If a DataFrame, the first column is the batch ID.

  • a (int) – Number of latent variables.

  • zinit (pd.DataFrame or bool) – Initial-condition variables, one row per batch. First column must be batch ID. If False (default), no initial conditions are used.

  • phase_samples (dict or bool) – Phase structure stored in the model for use in plotting functions. Default False.

  • mb_each_var (bool) – If True, treats each process variable as a separate block in a Multi-Block PLS model. If False (default), trajectories form a single block (plus an initial-conditions block if zinit is provided).

  • cross_val (int) – Cross-validation level (0 = none, 100 = LOO). Default 0.

  • cross_val_X (bool) – If True, also cross-validates the X-space. Default False.

Returns:

Multi-way PLS model object extending the standard PLS dict from pyphi.calc.pls() (or pyphi.calc.mbpls()) with additional batch-specific keys:

  • 'Yhat' (np.ndarray): In-sample Y predictions.

  • 'varidX' (list[str]): Variable column names in unfolded order.

  • 'bid' (list[str]): Block ID for each column.

  • 'uf' (str): 'batch wise'.

  • 'nvars' (int): Number of process variables per sample.

  • 'nbatches' (int): Number of batches in the training set.

  • 'nsamples' (int): Number of samples per batch.

  • 'A' (int): Number of latent variables fitted.

  • 'phase_samples': Phase structure passed in.

  • 'mb_each_var' (bool): Whether MB-PLS was used.

  • 'ninit' (int): Number of initial-condition variables (0 if zinit was not provided).

Return type:

dict

pyphi.batch.find(a, func)[source]

Return indices of elements in a list that satisfy a predicate function.

Parameters:
  • a (list) – Input list to search.

  • func (callable) – A function that takes a single element and returns True if the element should be included. Example: lambda x: x == 0 finds all zero-valued elements.

Returns:

Indices of elements in a for which func returns True.

Return type:

list[int]
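The behavior described above amounts to a filtered enumeration. A minimal equivalent, with the hypothetical name find_indices, might look like this:

```python
def find_indices(a, func):
    """Indices i for which func(a[i]) is truthy (hypothetical equivalent)."""
    return [i for i, value in enumerate(a) if func(value)]

# Positions of all zero-valued elements
zeros = find_indices([3, 0, 7, 0, 1], lambda x: x == 0)

# Positions of all elements larger than 2
large = find_indices([3, 0, 7, 0, 1], lambda x: x > 2)
```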

pyphi.batch.clean_empty_rows(X, *, shush=False)[source]

Remove rows that are entirely NaN from a batch DataFrame.

Parameters:
  • X (pd.DataFrame) – Batch data. First column is batch ID; second column may optionally be a phase label ('Phase', 'phase', or 'PHASE'); remaining columns are process variables.

  • shush (bool) – If True, suppresses printed output listing removed rows. Default False.

Returns:

Batch data with fully empty rows removed. Returns the original DataFrame unchanged if no empty rows are found.

Return type:

pd.DataFrame
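One way to read "entirely NaN" is that every process variable in the row is missing while the batch ID and phase label are still present. The sketch below follows that reading, which is an assumption, on made-up data; resetting the index afterwards is also a choice made here, not something the description above specifies.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({
    "BatchID": ["B1", "B1", "B1"],
    "Phase": ["Heat", "Heat", "Cool"],
    "T": [20.0, np.nan, 25.0],
    "P": [1.0, np.nan, np.nan],
})

# Process variables are every column except the ID and the optional phase label
id_cols = ("BatchID", "Phase", "phase", "PHASE")
var_cols = [c for c in X.columns if c not in id_cols]

# A row is empty only if all of its process variables are NaN
empty = X[var_cols].isna().all(axis=1)
cleaned = X.loc[~empty].reset_index(drop=True)
```

The third row survives because T is still measured, even though P is missing.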

pyphi.batch.phase_sampling_dist(bdata, time_column=False, addtitle=False, use_phases=False)[source]

Plot and return the distribution of samples (or time) consumed per phase.

Produces a histogram panel — one subplot per phase plus one for the total — showing how many samples (or how much time) each batch spends in each phase. Useful for diagnosing alignment issues and batch variability before fitting a model.

Parameters:
  • bdata (pd.DataFrame) – Batch data. First column is batch ID; second column must be a phase label ('Phase', 'phase', or 'PHASE'); remaining columns are process variables. Batches are stacked vertically.

  • time_column (str or bool) – If a column name is given, the x-axis represents elapsed time in that column rather than sample count. Default False.

  • addtitle (str or bool) – Optional string used as the overall figure super-title. Default False.

  • use_phases (list[str] or bool) – Subset of phases to include. If False (default), all phases present in the data are used.

Returns:

Nested dictionary {phase: {batch_id: value}} where value is the sample count or elapsed time for that batch in that phase.

Return type:

dict
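The shape of the returned dictionary can be sketched by counting rows per phase and batch with pandas. The data and column names are hypothetical, and the sketch covers only the sample-count case, not the time_column option or the histograms.

```python
import pandas as pd

bdata = pd.DataFrame({
    "BatchID": ["B1"] * 5 + ["B2"] * 4,
    "Phase": ["Heat", "Heat", "Cool", "Cool", "Cool",
              "Heat", "Heat", "Heat", "Cool"],
})

# Number of rows for every (phase, batch) combination
counts = bdata.groupby(["Phase", "BatchID"], sort=False).size()

# Rearrange into the nested {phase: {batch_id: value}} layout
dist = {}
for (phase, batch_id), n in counts.items():
    dist.setdefault(phase, {})[batch_id] = int(n)
```

A wide spread of counts within one phase is a hint that simple row-index alignment may distort that phase and that an indicator variable could be worth trying.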

pyphi.batch.predict(xbatch, mmvm_obj, *, zinit=False)[source]

Generate predictions for all batches in a dataset using a fitted MPCA or MPLS model.

Unfolds xbatch batch-wise, projects through the model, and returns reconstructed X (and predicted Y for PLS models) refolded back to the original batch DataFrame structure.

Parameters:
  • xbatch (pd.DataFrame) – Aligned batch data, same variable structure and number of samples per batch as the training data. First column is batch ID; second column may optionally be a phase label.

  • mmvm_obj (dict) – Multi-way PCA or PLS model from mpca() or mpls().

  • zinit (pd.DataFrame or bool) – Initial-condition data, one row per batch. First column must be batch ID. Required if the model was fitted with initial conditions. Default False.

Returns:

Prediction results with keys:

  • 'Tnew' (ndarray): Batch scores (n_batches × A).

  • 'Xhat' (pd.DataFrame): Reconstructed X in original batch-stacked format (same structure as xbatch).

  • 'speX' (ndarray): X-space SPE per batch.

  • 'T2' (ndarray): Hotelling’s T² per batch.

  • 'Yhat' (pd.DataFrame): Predicted Y (PLS models only), one row per batch with batch IDs as the first column.

  • 'Zhat' (pd.DataFrame): Reconstructed initial conditions (PLS models only, when zinit is provided).

Return type:

dict
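How scores, reconstruction, SPE and Hotelling's T² relate to each other can be sketched with a textbook PCA computed by SVD on random, mean-centered data. This is not pyphi's algorithm: it ignores scaling, missing data, the PLS case and the refolding step, and all names other than the result keys listed above are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Training" matrix: 20 batches, 6 unfolded columns, mean-centered
X = rng.normal(size=(20, 6))
mean = X.mean(axis=0)
Xc = X - mean

# Textbook PCA via SVD, keeping A components
A = 2
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:A].T                          # loadings, shape (6, A)
T = Xc @ P                            # training scores
score_var = T.var(axis=0, ddof=1)     # variance of each score

# Project three new (unfolded) batches through the model
Xnew = rng.normal(size=(3, 6)) - mean
Tnew = Xnew @ P                                    # scores, (n_batches, A)
Xhat = Tnew @ P.T                                  # reconstruction
speX = ((Xnew - Xhat) ** 2).sum(axis=1)            # squared prediction error
T2 = ((Tnew ** 2) / score_var).sum(axis=1)         # Hotelling's T²
```

SPE measures how far a batch lies from the model plane, while T² measures how far its projection sits from the center of the training scores; the two are complementary.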

pyphi.batch.contributions(mmvmobj, X, cont_type, *, to_obs=False, from_obs=False, lv_space=False, phase_samples=False, dyn_conts=False, which_var=False, plot_title='')[source]

Plot variable contributions to scores, HT², or SPE for a batch model.

Computes and visualises how much each process variable at each time point contributes to the specified monitoring statistic for the given batch(es). Both a summary bar chart (absolute contributions summed over time) and an optional dynamic time-series plot are produced.

Parameters:
  • mmvmobj (dict) – Multi-way PCA or PLS model from mpca() or mpls().

  • X (pd.DataFrame) – Batch data, same structure as the training data (aligned, batch-wise stacked).

  • cont_type (str) – Type of contribution to compute. Options: 'scores', 'ht2', 'spe'.

  • to_obs (list[str] or bool) – Batch ID(s) to diagnose. This argument must be supplied by the caller, even though the signature default is False.

  • from_obs (list[str] or bool) – Reference batch ID(s) for difference-based contributions ('scores' and 'ht2' only). If False (default), the model origin is used as the reference. Ignored for 'spe'.

  • lv_space (int, list[int], or bool) – Component index/indices to compute contributions for ('scores' only). If False (default), contributions are summed across all components.

  • phase_samples (dict or bool) – Phase structure for annotating phase boundaries in dynamic contribution plots. Default False.

  • dyn_conts (bool) – If True, also produces dynamic time-series contribution plots (one per variable) in addition to the summary bar chart. Default False.

  • which_var (str, list[str], or bool) – Variables to include in the dynamic contribution plots (only used when dyn_conts=True). If False (default), all variables are shown.

  • plot_title (str) – Title applied to all figures. Default ''.

Returns:

Displays Matplotlib figures (bar chart always; time-series plots if dyn_conts=True).

Return type:

None

pyphi.batch.build_rel_time(bdata, *, time_unit='min')[source]

Convert a 'Timestamp' column to relative elapsed time from batch start.

For each batch, computes elapsed time since the first timestamp and adds it as a new 'Time (<unit>)' column, replacing the original 'Timestamp' column.

Parameters:
  • bdata (pd.DataFrame) – Batch data containing a 'Timestamp' column with datetime-compatible values. First column is batch ID.

  • time_unit (str) – Unit for the output time column. 'min' (default) produces minutes; 'hr' produces hours; 's' keeps seconds.

Returns:

Batch data with 'Timestamp' replaced by 'Time (<time_unit>)', where values are elapsed time from the start of each batch.

Return type:

pd.DataFrame
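The conversion can be sketched with a pandas groupby. The data are made up, and placing the new column right after the batch ID is a choice made for this sketch; only the minutes case is shown.

```python
import pandas as pd

bdata = pd.DataFrame({
    "BatchID": ["B1", "B1", "B1", "B2", "B2"],
    "Timestamp": pd.to_datetime([
        "2023-04-10 08:00:00", "2023-04-10 08:01:30", "2023-04-10 08:05:00",
        "2023-04-11 14:00:00", "2023-04-11 14:10:00",
    ]),
    "T": [20.0, 21.0, 25.0, 19.0, 30.0],
})

# First timestamp of each batch, repeated on every row of that batch
start = bdata.groupby("BatchID")["Timestamp"].transform("first")

# Elapsed time since batch start, in seconds, then converted to minutes
elapsed_s = (bdata["Timestamp"] - start).dt.total_seconds()

out = bdata.drop(columns="Timestamp")
out.insert(1, "Time (min)", elapsed_s / 60.0)
```

Each batch restarts at zero, so trajectories from different days can be overlaid on a common time axis, for example through the timecolumn argument of plot_var_all_batches.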

pyphi.batch.descriptors(bdata, which_var, desc, *, phase=False)[source]

Compute summary descriptors for batch trajectories, optionally per phase.

Calculates one or more statistical descriptors for each variable in each batch, returning a single row per batch suitable for use as input to a PLS or PCA model.

Parameters:
  • bdata (pd.DataFrame) – Batch data. First column is batch ID; second column may optionally be a phase label ('Phase', 'phase', or 'PHASE'); remaining columns are process variables.

  • which_var (list[str]) – Variable names to compute descriptors for.

  • desc (list[str]) –

    Descriptor types to calculate. Supported values:

    • 'min': Minimum value.

    • 'max': Maximum value.

    • 'mean': Arithmetic mean.

    • 'median': Median value.

    • 'std': Standard deviation (ddof=1).

    • 'var': Variance (ddof=1).

    • 'range': Max minus min.

    • 'ave_slope': Average linear slope (estimated via least squares).

  • phase (list[str] or bool) – If a list of phase names is provided, descriptors are computed separately within each phase, and column names are suffixed with '_<phase>_<descriptor>'. If False (default), descriptors are computed over the full batch trajectory.

Returns:

One row per batch, first column is batch ID, remaining columns are descriptor values named '<variable>_<phase>_<descriptor>' (with phase) or '<variable>_<descriptor>' (without phase).

Return type:

pd.DataFrame
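A few of the descriptors can be sketched for one variable without phases. The data are made up, and computing the slope against the row index with np.polyfit is an assumption; the routine may regress against a different axis.

```python
import numpy as np
import pandas as pd

bdata = pd.DataFrame({
    "BatchID": ["B1"] * 4 + ["B2"] * 4,
    "T": [1.0, 2.0, 3.0, 4.0, 2.0, 4.0, 6.0, 8.0],
})

def ave_slope(y):
    """Least-squares slope of y against its row index (assumed x-axis)."""
    x = np.arange(len(y))
    return np.polyfit(x, y, 1)[0]

rows = []
for batch_id, grp in bdata.groupby("BatchID", sort=False):
    y = grp["T"].to_numpy()
    rows.append({
        "BatchID": batch_id,
        "T_min": y.min(),
        "T_max": y.max(),
        "T_range": y.max() - y.min(),
        "T_std": y.std(ddof=1),          # sample standard deviation
        "T_ave_slope": ave_slope(y),
    })

desc = pd.DataFrame(rows)   # one row per batch, '<variable>_<descriptor>' columns
```

The resulting table has one row per batch, so it can be used directly as a regular (non-batch) data matrix.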