pyphi.calc module¶
Phi for Python (pyPhi) — Version 2.0
By Sal Garcia (sgarciam@ic.ac.uk salvadorgarciamunoz@gmail.com)
- Added Feb 23 2026
Added _validate_inputs function for input validation and observation reconciliation
Integrated validation into pca, pls, lpls entry points
Replaced np.tile with numpy broadcasting throughout
Optimized _Ab_btbinv with fast path for complete data
var_t (score covariance matrix) stored in model objects to avoid recalculation
Added _extract_array and _calc_r2 helper functions to reduce duplication
Replaced hardcoded F-distribution and chi2 lookup tables with scipy.stats
Replaced hardcoded t-distribution with scipy.stats
- Added Feb 07 2026
Fixed cat_2_matrix so that its output is consistent with MBPLS
- Added Jan 30 2025
Added a pinv alternative protection in spectra_savgol for the case where inv fails
- Added Jan 20 2025
Added the 'cca' flag to the pls routine to calculate CCA between the Ts and each of the Ys (one by one), calculating loadings and scores equivalent to a perfectly orthogonalized OPLS model. The covariant scores (Tcv), the covariant loadings (Pcv), and the predictive weights (Wcv) are added as keys to the model object. [The covariant loadings (Pcv) are equivalent to the predictive loadings in OPLS.]
Added cca and cca-multi routines to perform PLS-CCA (an alternative to OPLS); as of now, cca-multi remains unused.
- Added Nov 18th, 2024
Replaced interp2d with RectBivariateSpline
Protected SPE limit calculations for near-zero residuals
Added build_polynomial function to create linear regression models with variable selection assisted by PLS
- by merge from James
Added spectra preprocessing methods
bootstrap PLS
- Added Dec 19th 2023, by Salvador Garcia (sgarciam@ic.ac.uk salvadorgarciamunoz@gmail.com)
phi.clean_htmls removes all html files in the working directory
clean_empty_rows returns also the names of the rows removed
- Added May 1st
YMB is now added in the same structure as the XMB
Corrected the dimensionality of the lwpls prediction, it was a double-nested array.
- Added Apr 30
Modified Multi-block PLS to include the block name in the variable name
- Added Apr 29
Included the unique routine and adjusted the parse_materials routine so materials and lots are in the same order as in the raw data
- Added Apr 27
Enhanced adapt_pls_4_pyomo to use variable names as indices if flag is sent
- Added Apr 25
Enhanced the varimax_rotation to adjust the r2 and r2pv to the rotated loadings
- Added Apr 21
Re added varimax_rotation with complete model rotation for PCA and PLS
- Added Apr 17
Added tpls and tpls_pred
- Added Apr 15
Added jrpls model and jrpls_pred
Added routines to reconcile column identifiers to row identifiers so that the X and R matrices correspond correctly
Added routines to reconcile rows across a list of dataframes and produce a list of dataframes containing only those observations present in all dataframes
- Added on Apr 9 2023
Added lpls and lpls_pred routines
Added parse_materials to read linear table and produce R or Ri
- Release as of Nov 23 2022
Added a function to export PLS model to gPROMS code
- Release as of Aug 22 2022
Fixed access to NEOS server and use of GAMS instead of IPOPT
- Release as of Aug 12 2022
Fixed the SPE calculations in pls_pred and pca_pred
Changed to a more efficient inversion in pca_pred (=pls_pred)
Added a pseudo-inverse option in pmp for pca_pred
- Release as of Aug 2 2022
Added replicate_data
- pyphi.calc.f99(i, j)[source]¶
Return the F-distribution critical value at 99% confidence.
- Parameters:
df1 (int) – Numerator degrees of freedom.
df2 (int) – Denominator degrees of freedom.
- Returns:
F critical value at alpha = 0.01.
- Return type:
float
- pyphi.calc.f95(i, j)[source]¶
Return the F-distribution critical value at 95% confidence.
- Parameters:
df1 (int) – Numerator degrees of freedom.
df2 (int) – Denominator degrees of freedom.
- Returns:
F critical value at alpha = 0.05.
- Return type:
float
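The changelog above notes that the hardcoded F-distribution tables were replaced with scipy.stats. As a minimal sketch of how such critical values can be obtained (not necessarily how f99 and f95 are written internally), the inverse CDF of the F distribution can be queried directly. The helper name f_critical is hypothetical.

```python
from scipy import stats


def f_critical(df1, df2, confidence):
    """Upper critical value of the F distribution (hypothetical helper)."""
    # ppf is the inverse CDF: ppf(0.95, ...) leaves alpha = 0.05 in the upper tail.
    return float(stats.f.ppf(confidence, df1, df2))


f95_value = f_critical(1, 10, 0.95)  # alpha = 0.05
f99_value = f_critical(1, 10, 0.99)  # alpha = 0.01
```

With one numerator degree of freedom, the value equals the square of the corresponding two-sided Student t critical value, which gives a quick sanity check.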
- pyphi.calc.spe_ci(spe)[source]¶
Estimate SPE control limits from training data using a chi-squared approximation.
- Parameters:
spe_values (ndarray) – SPE values from the training set (n_obs × 1).
alpha (float) – Confidence level. Default 0.95 (also returns 99%).
- Returns:
(lim95, lim99): SPE control limits at 95% and 99%.
- Return type:
tuple
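One common form of the chi-squared approximation matches the mean and variance of the training SPE to a scaled chi-squared distribution. The sketch below illustrates that moment-matching idea; it is an assumption about the general technique, not a copy of spe_ci, and the name spe_limits is hypothetical.

```python
import numpy as np
from scipy import stats


def spe_limits(spe_values, levels=(0.95, 0.99)):
    """Moment-matched chi-squared limits for SPE (illustrative sketch)."""
    spe_values = np.asarray(spe_values, dtype=float).ravel()
    m = spe_values.mean()
    v = spe_values.var(ddof=1)
    g = v / (2.0 * m)        # scale factor of the g * chi2(h) approximation
    h = 2.0 * m**2 / v       # effective degrees of freedom
    return tuple(float(g * stats.chi2.ppf(level, h)) for level in levels)


# Synthetic SPE values, used only to exercise the function.
rng = np.random.default_rng(0)
spe_train = rng.chisquare(5, size=2000)
lim95, lim99 = spe_limits(spe_train)
```

Roughly 5% of the training SPE values should fall above lim95 when the approximation fits the data well.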
- pyphi.calc.single_score_conf_int(t)[source]¶
Calculate confidence ellipse radii for score scatter plots.
- Parameters:
mvmobj (dict) – Fitted PCA or PLS model.
alpha (float) – Confidence level. Default 0.95.
- Returns:
Ellipse radii for each pair of scores.
- Return type:
ndarray
- pyphi.calc.scores_conf_int_calc(st, N)[source]¶
Calculate per-score univariate confidence intervals.
- Parameters:
mvmobj (dict) – Fitted PCA or PLS model.
alpha (float) – Confidence level. Default 0.95.
- Returns:
Confidence interval half-widths for each latent variable (A,).
- Return type:
ndarray
- pyphi.calc.z2n(X, X_nan_map)[source]¶
Replace zeros with NaN (zero to NaN).
- Parameters:
X (np.ndarray) – Input array.
- Returns:
Array with zeros replaced by np.nan.
- Return type:
np.ndarray
- pyphi.calc.n2z(X)[source]¶
Replace NaN with zero (NaN to zero).
- Parameters:
X (np.ndarray) – Input array.
- Returns:
(X_filled, nan_map): array with NaNs replaced by 0, and a boolean mask where True indicates original non-NaN positions.
- Return type:
tuple
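The round trip between NaN and zero can be sketched with a boolean mask, as described for n2z and z2n. The function names below are hypothetical, and the mask convention (True marks an originally observed value) follows the description above.

```python
import numpy as np


def nan_to_zero(X):
    """Return a zero-filled copy of X and a mask of observed entries."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                  # True where a value was present
    return np.where(observed, X, 0.0), observed


def zero_to_nan(X, observed):
    """Restore NaN only where the mask says the value was missing."""
    X = np.asarray(X, dtype=float)
    return np.where(observed, X, np.nan)


X = np.array([[1.0, np.nan], [0.0, 4.0]])
X_filled, observed = nan_to_zero(X)
X_restored = zero_to_nan(X_filled, observed)
```

Carrying the mask along means a genuine zero in the data (second row, first column) is not turned into NaN on the way back.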
- pyphi.calc.meancenterscale(X, *, mcs=True)[source]¶
Mean-center and/or scale a data matrix.
- Parameters:
X (np.ndarray) – Data matrix to preprocess (n_obs × n_vars).
mcs (str or bool) – Preprocessing mode.
'autoscale': mean-center and scale to unit variance.
'center': mean-center only.
False: return unchanged.
- Returns:
(X_processed, x_mean, x_std): preprocessed matrix, column means, and column standard deviations.
- Return type:
tuple
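A minimal sketch of column-wise centering and scaling with NumPy broadcasting follows. It assumes NaN-aware statistics and a sample standard deviation (ddof=1); both are assumptions for illustration, and the name center_scale is hypothetical.

```python
import numpy as np


def center_scale(X, mode="autoscale"):
    """Mean-center, and optionally scale, the columns of X (sketch)."""
    X = np.asarray(X, dtype=float)
    x_mean = np.nanmean(X, axis=0)                 # ignores missing values
    if mode == "center":
        x_std = np.ones(X.shape[1])                # no scaling
    else:
        x_std = np.nanstd(X, axis=0, ddof=1)       # assumed sample std
    # Broadcasting applies the (n_vars,) vectors to every row.
    return (X - x_mean) / x_std, x_mean, x_std


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_scaled, x_mean, x_std = center_scale(X)
```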
- pyphi.calc.find(a, func)[source]¶
Find row indices where the first column equals a given value.
- Parameters:
X (pd.DataFrame or np.ndarray) – Data matrix to search.
value – Value to search for in the first column.
- Returns:
Row indices where the match was found.
- Return type:
list
- pyphi.calc.pca(X, A, *, mcs=True, md_algorithm='nipals', force_nipals=True, shush=False, cross_val=0)[source]¶
Fit a Principal Component Analysis (PCA) model.
Supports missing data via NIPALS. Can use SVD for complete data as well.
- Parameters:
X (pd.DataFrame or np.ndarray) – Observations × variables matrix. If a DataFrame, the first column must contain observation IDs.
A (int) – Number of principal components to extract.
mcs (str or bool) – Mean-centering/scaling flag. 'autoscale' (default): mean-center and scale to unit variance. 'center': mean-center only. False: no preprocessing.
md_algorithm (str) – Missing-data algorithm. 'nipals' (default) or 'nlp'.
force_nipals (bool) – If True, forces NIPALS even when data is complete. Default False.
cross_val (int) – Cross-validation percentage of elements to remove per round. 0 = no CV, 100 = leave-one-out, 1–99 = element-wise removal. Default 0.
shush (bool) – If True, suppresses printed output. Default False.
tolerance (float) – NIPALS convergence tolerance. Default 1e-10.
maxit (int) – Maximum NIPALS iterations per component. Default 5000.
- Returns:
Fitted PCA model with keys:
T (ndarray): Scores matrix (n_obs × A).
P (ndarray): Loadings matrix (n_vars × A).
r2x (float): Cumulative R² for X.
r2xpv (ndarray): Per-variable R² (n_vars × A).
mx (ndarray): Variable means used for preprocessing.
sx (ndarray): Variable std devs used for preprocessing.
var_t (ndarray): Score covariance matrix (A × A).
T2 (ndarray): Hotelling’s T² for training observations.
T2_lim95 (float): 95% T² control limit.
T2_lim99 (float): 99% T² control limit.
speX (ndarray): X-space SPE for training observations.
speX_lim95 (float): 95% SPE control limit.
speX_lim99 (float): 99% SPE control limit.
obsidX (list): Observation IDs (only if X was a DataFrame).
varidX (list): Variable IDs (only if X was a DataFrame).
q2x (float): Cross-validated Q² (only if cross_val > 0).
- Return type:
dict
NLP algorithm for missing data as in:
de la Fuente, R.L.N., García‐Muñoz, S. and Biegler, L.T., 2010. An efficient nonlinear programming strategy for PCA models with incomplete data sets. Journal of Chemometrics, 24(6), pp.301-311.
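For orientation, the classical NIPALS iteration for PCA can be sketched as below. This is a textbook version for complete, mean-centered data only; it does not reproduce the missing-data handling, scaling options, or diagnostics of pca(), and the name nipals_pca is hypothetical.

```python
import numpy as np


def nipals_pca(X, n_components, tol=1e-10, max_iter=5000):
    """Textbook NIPALS PCA for complete data (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                      # mean-center only, for brevity
    n_obs, n_vars = X.shape
    T = np.zeros((n_obs, n_components))
    P = np.zeros((n_vars, n_components))
    for a in range(n_components):
        # Start from the column with the largest variance.
        t = X[:, np.argmax(X.var(axis=0))].copy()
        for _ in range(max_iter):
            p = X.T @ t / (t @ t)               # regress columns of X on t
            p /= np.linalg.norm(p)              # unit-length loading
            t_new = X @ p                       # updated score
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a], P[:, a] = t, p
        X = X - np.outer(t, p)                  # deflate before next component
    return T, P


# Synthetic data with clearly separated column variances.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 4)) * np.array([5.0, 3.0, 1.0, 0.5])
T_demo, P_demo = nipals_pca(X_demo, 2)
```

On complete data the loadings agree, up to sign, with the leading right singular vectors of the centered matrix, which is why an SVD can be used instead when nothing is missing.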
- pyphi.calc.pls(X, Y, A, *, mcsX=True, mcsY=True, md_algorithm='nipals', force_nipals=True, shush=False, cross_val=0, cross_val_X=False, cca=False)[source]¶
Fit a Partial Least Squares (PLS) regression model.
Supports missing data in both X and Y via NIPALS. Optionally computes CCA-based covariant components (equivalent to OPLS predictive space).
- Parameters:
X (pd.DataFrame or np.ndarray) – Predictor matrix (n_obs × n_x). If a DataFrame, the first column must contain observation IDs.
Y (pd.DataFrame or np.ndarray) – Response matrix (n_obs × n_y). If a DataFrame, the first column must contain observation IDs. Observation IDs are reconciled with X automatically.
A (int) – Number of latent variables.
mcsX – Preprocessing flag for X. Can be 'autoscale', 'center', or False. Default 'autoscale'.
mcsY – Preprocessing flag for Y. Can be 'autoscale', 'center', or False. Default 'autoscale'.
md_algorithm (str) – Missing-data algorithm. 'nipals' (default) or 'nlp'.
force_nipals (bool) – Force NIPALS even for complete data. Default False.
cross_val (int) – Cross-validation level. 0 = none, 100 = LOO, 1–99 = element-wise. Default 0.
cross_val_X (bool) – Also cross-validate X-space. Default False.
shush (bool) – Suppress printed output. Default False.
tolerance (float) – NIPALS convergence tolerance. Default 1e-10.
maxit (int) – Max NIPALS iterations per component. Default 5000.
cca (bool) – If True, compute CCA-based covariant components and add Tcv, Pcv, Wcv to the model. Default False.
- Returns:
Fitted PLS model with keys:
T (ndarray): X-scores (n_obs × A).
P (ndarray): X-loadings (n_x × A).
Q (ndarray): Y-loadings (n_y × A).
W (ndarray): X-weights (n_x × A).
Ws (ndarray): Rotated weights W*(P’W)⁻¹ (n_x × A).
r2x (float): Cumulative R² for X.
r2xpv (ndarray): Per-variable R² for X (n_x × A).
r2y (float): Cumulative R² for Y.
r2ypv (ndarray): Per-variable R² for Y (n_y × A).
mx, sx (ndarray): X preprocessing parameters.
my, sy (ndarray): Y preprocessing parameters.
var_t (ndarray): Score covariance matrix (A × A).
T2, T2_lim95, T2_lim99: Hotelling’s T² and limits.
speX, speX_lim95, speX_lim99: X-space SPE and limits.
speY, speY_lim95, speY_lim99: Y-space SPE and limits.
obsidX, varidX: IDs (only if X was a DataFrame).
obsidY, varidY: IDs (only if Y was a DataFrame).
q2x, q2y (float): Cross-validated Q² (if cross_val > 0).
Tcv, Pcv, Wcv: CCA covariant components (if cca=True).
- Return type:
dict
NLP approach to missing data as in:
Puwakkatiya‐Kankanamage, E.H., García‐Muñoz, S. and Biegler, L.T., 2014. An optimization‐based undeflated PLS (OUPLS) method to handle missing data in the training set. Journal of Chemometrics, 28(7), pp.575-584.
- pyphi.calc.pls_(X, Y, A, *, mcsX=True, mcsY=True, md_algorithm='nipals', force_nipals=True, shush=False, cca=False)[source]¶
- pyphi.calc.hott2(mvmobj, *, Xnew=False, Tnew=False)[source]¶
Compute Hotelling’s T² statistic.
- Parameters:
mvmobj (dict) – Fitted PCA or PLS model.
Xnew (pd.DataFrame or np.ndarray) – New X observations (optional). If provided, scores are computed internally before T² calculation.
Tnew (np.ndarray) – Pre-computed scores (optional). Used directly if provided; avoids redundant projection.
- Returns:
T² value for each observation (n_obs,).
- Return type:
ndarray
Note
If neither Xnew nor Tnew is provided, returns T² for the training set stored in mvmobj.
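As a sketch of the statistic itself, Hotelling’s T² for each observation is the quadratic form of its score vector with the inverse score covariance matrix (the var_t key described for the model objects). The helper name hotelling_t2 is hypothetical, and the code is an illustration rather than the body of hott2.

```python
import numpy as np


def hotelling_t2(T, var_t=None):
    """T² per observation from scores T (n_obs x A) (illustrative sketch)."""
    T = np.asarray(T, dtype=float)
    if var_t is None:
        # Fall back to the sample covariance of the supplied scores.
        var_t = np.atleast_2d(np.cov(T, rowvar=False))
    # Row-wise quadratic form t_i' * inv(var_t) * t_i.
    return np.einsum("ij,jk,ik->i", T, np.linalg.inv(var_t), T)


scores = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, -2.0]])
t2_values = hotelling_t2(scores, var_t=np.diag([1.0, 4.0]))
```

Passing a stored covariance matrix, rather than recomputing it from new scores, keeps new observations on the same scale as the training set.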
- pyphi.calc.pca_pred(Xnew, pcaobj, *, algorithm='p2mp')[source]¶
Project new observations onto a fitted PCA model.
- Parameters:
Xnew (pd.DataFrame or np.ndarray) – New observations to project. Variables must match those used to train pcaobj.
pcaobj (dict) – Fitted PCA model from pca().
algorithm (str) – Projection algorithm. 'p2mp' (default) handles missing data; 'standard' uses direct matrix multiplication and requires complete data.
- Returns:
Prediction results with keys:
Tnew (ndarray): Projected scores (n_new × A).
Xhat (ndarray): Reconstructed X in original scale.
speX (ndarray): SPE for each new observation.
T2 (ndarray): Hotelling’s T² for each new observation.
- Return type:
dict
- pyphi.calc.pls_pred(Xnew, plsobj)[source]¶
Predict Y for new observations using a fitted PLS model.
- Parameters:
Xnew (pd.DataFrame or np.ndarray) – New predictor observations. Variables must match those used to train plsobj.
plsobj (dict) – Fitted PLS model from pls().
algorithm (str) – Projection algorithm. 'p2mp' (default) handles missing data; 'standard' requires complete data.
- Returns:
Prediction results with keys:
Tnew (ndarray): X-scores for new observations (n_new × A).
Yhat (ndarray): Predicted Y in original scale (n_new × n_y).
Xhat (ndarray): Reconstructed X in original scale.
speX (ndarray): X-space SPE for each new observation.
T2 (ndarray): Hotelling’s T² for each new observation.
Tcv (ndarray): CCA covariant scores (only if model has Wcv).
- Return type:
dict
- pyphi.calc.spe(mvmobj, Xnew, *, Ynew=False)[source]¶
Compute Squared Prediction Error (SPE / Q statistic).
- Parameters:
mvmobj (dict) – Fitted PCA or PLS model.
Xnew (pd.DataFrame or np.ndarray) – New X observations.
Ynew (pd.DataFrame or np.ndarray) – New Y observations (optional). Only used for PLS models to also return Y-space SPE.
- Returns:
If Ynew is not provided (or the model is PCA): returns speX (ndarray, shape n_obs × 1).
If Ynew is provided and the model is PLS: returns a (speX, speY) tuple of arrays.
- Return type:
ndarray or tuple
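The SPE of an observation is the sum of squared residuals left after the model reconstructs it. A minimal sketch for already-preprocessed, complete data and orthonormal loadings follows; the name squared_prediction_error is hypothetical, and the projection step is a simplification of what a fitted model would do.

```python
import numpy as np


def squared_prediction_error(X_scaled, P):
    """SPE per row for preprocessed X and orthonormal loadings P (sketch)."""
    X_scaled = np.asarray(X_scaled, dtype=float)
    T = X_scaled @ P                        # scores by direct projection
    residuals = X_scaled - T @ P.T          # part of X the model does not explain
    return np.sum(residuals**2, axis=1, keepdims=True)   # shape (n_obs, 1)


P_demo = np.array([[1.0], [0.0]])           # one component along the first variable
X_demo = np.array([[3.0, 4.0], [1.0, 0.0]])
spe_demo = squared_prediction_error(X_demo, P_demo)
```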
- pyphi.calc.lwpls(xnew, loc_par, mvmobj, X, Y, *, shush=False)[source]¶
Locally Weighted PLS (LWPLS) prediction for a single new observation.
Per Kim et al. Int. J. Pharmaceutics 421 (2011) 269–274.
- Parameters:
xnew (np.ndarray or pd.DataFrame) – Single new observation (1 × n_x).
loc_par (float) – Locality parameter controlling the width of the Gaussian kernel. Larger values include more training observations.
mvmobj (dict) – Global PLS model from pls(), used to define the score space for distance calculation.
X (pd.DataFrame or np.ndarray) – Training X data.
Y (pd.DataFrame or np.ndarray) – Training Y data.
shush (bool) – Suppress printed output. Default False.
- Returns:
Prediction results with keys:
Yhat (ndarray): Locally predicted Y (1 × n_y).
weights (ndarray): Observation weights used in local model.
- Return type:
dict
- pyphi.calc.contributions(mvmobj, X, cont_type, *, Y=False, from_obs=False, to_obs=False, lv_space=False)[source]¶
Compute variable contributions to monitoring statistics.
- Parameters:
mvmobj (dict) – Fitted PCA or PLS model.
Xnew (pd.DataFrame or np.ndarray) – Observations to diagnose.
cont_type (str) – Type of contribution to compute.
'scores': contribution to each score.
'spex': contribution to X-space SPE.
'spey': contribution to Y-space SPE (PLS only).
't2': contribution to Hotelling’s T².
Ynew (pd.DataFrame or np.ndarray) – Y observations (optional, required for cont_type='spey').
- Returns:
Contribution values (n_obs × n_vars).
- Return type:
ndarray
Ref: Miller, P., Swanson, R.E. and Heckler, C.E., 1998. Contribution plots: a missing link in multivariate quality control. Applied Mathematics and Computer Science, 8(4), pp.775-792.
- pyphi.calc.clean_empty_rows(X, *, shush=False)[source]¶
Remove rows that are entirely NaN.
- Parameters:
X (pd.DataFrame or np.ndarray) – Input data matrix.
shush (bool) – Suppress printed output. Default False.
- Returns:
Data with fully empty rows removed.
- Return type:
pd.DataFrame or np.ndarray
- pyphi.calc.clean_low_variances(X, *, shush=False, min_var=1e-10)[source]¶
Remove columns with variance below a threshold.
- Parameters:
X (pd.DataFrame or np.ndarray) – Input data matrix.
min_var (float) – Minimum acceptable variance. Default 1e-10.
shush (bool) – Suppress printed output. Default False.
- Returns:
Data with low-variance columns removed.
- Return type:
pd.DataFrame or np.ndarray
- pyphi.calc.spectra_snv(x)[source]¶
Apply Standard Normal Variate (SNV) correction to spectra.
Each spectrum (row) is mean-centered and scaled by its own standard deviation. Removes multiplicative scatter effects.
- Parameters:
X (pd.DataFrame or np.ndarray) – Spectra matrix (n_samples × n_wavelengths). If a DataFrame, the first column must contain sample IDs.
- Returns:
SNV-corrected spectra (same type as input).
- Return type:
pd.DataFrame or np.ndarray
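The row-wise operation described above can be sketched in a few lines of NumPy. The sketch handles plain arrays only (no ID column) and assumes a sample standard deviation (ddof=1); the name snv is hypothetical.

```python
import numpy as np


def snv(spectra):
    """Standard Normal Variate: center and scale each row (sketch)."""
    spectra = np.asarray(spectra, dtype=float)
    row_mean = spectra.mean(axis=1, keepdims=True)
    row_std = spectra.std(axis=1, ddof=1, keepdims=True)   # assumed ddof
    return (spectra - row_mean) / row_std


base = np.array([1.0, 2.0, 3.0, 4.0])
# The second spectrum is the first with a gain of 2 and an offset of 10.
spectra = np.vstack([base, 2.0 * base + 10.0])
corrected = snv(spectra)
```

Both rows come out identical, which illustrates how SNV removes a per-spectrum gain and offset.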
- pyphi.calc.spectra_savgol(ws, od, op, Dm)[source]¶
Apply Savitzky-Golay smoothing and/or differentiation to spectra.
- Parameters:
X (pd.DataFrame or np.ndarray) – Spectra matrix (n_samples × n_wavelengths). If a DataFrame, the first column must contain sample IDs.
window (int) – Window length (must be odd and greater than poly).
poly (int) – Polynomial order for the filter.
deriv (int) – Derivative order. 0 = smoothing only, 1 = first derivative, 2 = second derivative.
- Returns:
Filtered spectra (same type as input).
- Return type:
pd.DataFrame or np.ndarray
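To illustrate what the window, polynomial order, and derivative order do, the sketch below swaps in SciPy's scipy.signal.savgol_filter instead of the package's own routine, so results at the spectrum edges may differ from spectra_savgol. The wrapper name savgol_rows is hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def savgol_rows(spectra, window, poly, deriv=0):
    """Savitzky-Golay filter applied along each row (sketch using SciPy)."""
    spectra = np.asarray(spectra, dtype=float)
    return savgol_filter(spectra, window_length=window, polyorder=poly,
                         deriv=deriv, axis=1)


x = np.arange(11, dtype=float)
spectra = np.vstack([x**2, 3.0 * x**2 + 1.0])      # exact quadratics
smoothed = savgol_rows(spectra, window=5, poly=2)
first_derivative = savgol_rows(spectra, window=5, poly=2, deriv=1)
```

A second-order filter reproduces a quadratic exactly, so the smoothed output equals the input here and the first derivative equals the analytical slope (unit spacing between points is assumed).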
- pyphi.calc.spectra_mean_center(Dm)[source]¶
Mean-center each wavelength across the sample set.
- Parameters:
X (pd.DataFrame or np.ndarray) – Spectra matrix (n_samples × n_wavelengths).
- Returns:
Mean-centered spectra.
- Return type:
pd.DataFrame or np.ndarray
- pyphi.calc.spectra_autoscale(Dm)[source]¶
Autoscale spectra (mean-center and scale each wavelength to unit variance).
- Parameters:
X (pd.DataFrame or np.ndarray) – Spectra matrix (n_samples × n_wavelengths).
- Returns:
Autoscaled spectra.
- Return type:
pd.DataFrame or np.ndarray
- pyphi.calc.spectra_baseline_correction(Dm)[source]¶
Apply piecewise linear baseline correction to spectra.
- Parameters:
X (pd.DataFrame or np.ndarray) – Spectra matrix (n_samples × n_wavelengths). If a DataFrame, the first column must contain sample IDs.
anchor_points (list of int) – Column indices to use as baseline anchor points for the piecewise linear interpolation.
- Returns:
Baseline-corrected spectra.
- Return type:
pd.DataFrame or np.ndarray
- pyphi.calc.spectra_msc(Dm, reference_spectra=None)[source]¶
Apply Multiplicative Scatter Correction (MSC) to spectra.
- Parameters:
X (pd.DataFrame or np.ndarray) – Spectra matrix (n_samples × n_wavelengths). If a DataFrame, the first column must contain sample IDs.
reference (np.ndarray) – Reference spectrum to correct against. Defaults to the mean spectrum of X.
- Returns:
MSC-corrected spectra (same type as input).
- Return type:
pd.DataFrame or np.ndarray
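In its usual form, MSC regresses each spectrum on the reference and then removes the fitted offset and slope. The sketch below shows that standard formulation for plain arrays; it is not taken from spectra_msc, and the name msc is hypothetical.

```python
import numpy as np


def msc(spectra, reference=None):
    """Multiplicative Scatter Correction for a 2D array (sketch)."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)        # mean spectrum as reference
    corrected = np.empty_like(spectra)
    for i, spectrum in enumerate(spectra):
        # Fit spectrum ≈ intercept + slope * reference.
        slope, intercept = np.polyfit(reference, spectrum, 1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected


reference = np.array([1.0, 2.0, 3.0, 5.0])
spectra = np.vstack([2.0 * reference + 1.0, 0.5 * reference - 3.0])
corrected = msc(spectra, reference=reference)
```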
- pyphi.calc.bootstrap_pls(X, Y, num_latents, num_samples, **kwargs)[source]¶
Estimate PLS loading uncertainty via bootstrap resampling.
- Parameters:
X (pd.DataFrame or np.ndarray) – Training X data.
Y (pd.DataFrame or np.ndarray) – Training Y data.
A (int) – Number of latent variables.
n_boots (int) – Number of bootstrap iterations.
mcs (tuple) – Preprocessing flags. Default ('autoscale', 'autoscale').
shush (bool) – Suppress per-iteration output. Default True.
- Returns:
Bootstrap results with keys:
W_boot (ndarray): Bootstrap distribution of W (n_boots × n_x × A).
Q_boot (ndarray): Bootstrap distribution of Q (n_boots × n_y × A).
W_mean, W_std: Mean and std of bootstrap W.
Q_mean, Q_std: Mean and std of bootstrap Q.
- Return type:
dict
- pyphi.calc.bootstrap_pls_pred(X_new, bootstrap_pls_obj, quantiles=[0.025, 0.975])[source]¶
Predict Y with uncertainty estimates using a bootstrap PLS ensemble.
- Parameters:
Xnew (pd.DataFrame or np.ndarray) – New X observations to predict.
boot_obj (dict) – Bootstrap model from bootstrap_pls().
alpha (float) – Confidence level for prediction intervals. Default 0.95.
- Returns:
Prediction results with keys:
Yhat (ndarray): Mean predicted Y (n_new × n_y).
Yhat_lb (ndarray): Lower bound of prediction interval.
Yhat_ub (ndarray): Upper bound of prediction interval.
Yhat_std (ndarray): Std dev of bootstrap predictions.
- Return type:
dict
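Once every bootstrap model has produced a prediction, the summary keys listed above can be formed from the stack of predictions. The sketch below assumes a percentile interval and an array of shape (n_boots, n_new, n_y); both are assumptions for illustration, and summarize_bootstrap is a hypothetical name.

```python
import numpy as np


def summarize_bootstrap(predictions, quantiles=(0.025, 0.975)):
    """Mean, percentile bounds and std across bootstrap predictions (sketch)."""
    predictions = np.asarray(predictions, dtype=float)   # (n_boots, n_new, n_y)
    lower, upper = np.quantile(predictions, quantiles, axis=0)
    return {
        "Yhat": predictions.mean(axis=0),
        "Yhat_lb": lower,
        "Yhat_ub": upper,
        "Yhat_std": predictions.std(axis=0, ddof=1),
    }


# 101 fake bootstrap predictions (0, 1, ..., 100) for one observation and one response.
fake_predictions = np.arange(101, dtype=float).reshape(101, 1, 1)
summary = summarize_bootstrap(fake_predictions)
```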
- pyphi.calc.np2D2pyomo(arr, *, varids=False)[source]¶
Convert a 2D NumPy array to a Pyomo-compatible dictionary.
- Parameters:
data (np.ndarray) – 2D array to convert.
- Returns:
Dictionary keyed by (i, j) integer index tuples.
- Return type:
dict
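The conversion amounts to flattening the array into a dictionary keyed by index tuples, which is the form Pyomo accepts for initializing an indexed Param. In the sketch below the 1-based start index is an assumption, and array2d_to_indexed_dict is a hypothetical name.

```python
import numpy as np


def array2d_to_indexed_dict(arr, start=1):
    """Map a 2D array to {(i, j): value} (sketch; index base is assumed)."""
    arr = np.asarray(arr)
    n_rows, n_cols = arr.shape
    return {
        (i + start, j + start): arr[i, j].item()    # plain Python scalars
        for i in range(n_rows)
        for j in range(n_cols)
    }


indexed = array2d_to_indexed_dict(np.array([[1.0, 2.0], [3.0, 4.0]]))
```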
- pyphi.calc.np1D2pyomo(arr, *, indexes=False)[source]¶
Convert a 1D NumPy array to a Pyomo-compatible dictionary.
- Parameters:
data (np.ndarray) – 1D array to convert.
- Returns:
Dictionary keyed by integer index.
- Return type:
dict
- pyphi.calc.adapt_pls_4_pyomo(plsobj, *, use_var_ids=False)[source]¶
Convert PLS model arrays to Pyomo-compatible dictionaries.
Transforms P, Q, W, Ws, mx, sx, my, sy into the indexed dict format required by Pyomo Param objects.
- Parameters:
plsobj (dict) – Fitted PLS model from pls().
- Returns:
Model parameters as Pyomo-indexed dictionaries.
- Return type:
dict
- pyphi.calc.prep_pca_4_MDbyNLP(pcaobj, X)[source]¶
Prepare a PCA model for missing-data imputation by NLP.
Extracts and formats the loadings and preprocessing parameters needed to set up a Pyomo optimization problem for MD imputation.
- Parameters:
pcaobj (dict) – Fitted PCA model from pca().
- Returns:
Parameters formatted for use in a Pyomo MD-by-NLP formulation.
- Return type:
dict
- pyphi.calc.prep_pls_4_MDbyNLP(plsobj, X, Y)[source]¶
Prepare a PLS model for missing-data imputation by NLP.
- Parameters:
plsobj (dict) – Fitted PLS model from pls().
- Returns:
Parameters formatted for use in a Pyomo MD-by-NLP formulation.
- Return type:
dict
- pyphi.calc.conv_pls_2_eiot(plsobj, *, r_length=False)[source]¶
Convert a PLS model for use in EIOT (Extended Iterative Optimization Technology).
- Parameters:
plsobj (dict) – Fitted PLS model from pls().
r2y_threshold (float) – Minimum cumulative R²Y to determine the number of LVs to retain. Default 0.95.
- Returns:
EIOT-compatible model parameters.
- Return type:
dict
- pyphi.calc.cat_2_matrix(X)[source]¶
Convert a categorical variable column to a binary indicator matrix.
- Parameters:
x (pd.DataFrame) – DataFrame with columns of categorical data. The first column is the variable ID.
shush (bool) – Suppress printed output. Default False.
- Returns:
Binary indicator matrix with one column per unique category (same type as input), with all categories concatenated.
xmb (pd.DataFrame): Binary indicator matrix with one column per unique category (same type as input), with categories organized by block for multi-block models (if the DataFrame has multiple columns).
- Return type:
x_binary (pd.DataFrame)
- pyphi.calc.mbpls(XMB, YMB, A, *, mcsX=True, mcsY=True, md_algorithm_='nipals', force_nipals_=False, shush_=False, cross_val_=0, cross_val_X_=False, cca=False)[source]¶
Fit a Multi-Block PLS (MBPLS) model.
- Parameters:
Xmb (dict) – Dictionary of X blocks {'block_name': pd.DataFrame}. Each DataFrame’s first column must contain observation IDs.
Y (pd.DataFrame or np.ndarray) – Response matrix. First column is observation IDs if a DataFrame.
A (int) – Number of latent variables.
mcs (tuple) – Preprocessing flags (mcs_X, mcs_Y). Default ('autoscale', 'autoscale').
shush (bool) – Suppress printed output. Default False.
cross_val (int) – Cross-validation level (same as pls()).
cross_val_X (bool) – Cross-validate X-space. Default False.
- Returns:
Fitted MBPLS model, extending the standard PLS model dict with per-block keys:
T (ndarray): Super-scores.
Tb (dict): Per-block scores keyed by block name.
Pb (dict): Per-block loadings.
Wb (dict): Per-block weights.
r2xb (dict): Per-block R² contributions.
block_importance (ndarray): Variance importance per block.
Plus all standard PLS keys (Q, r2y, speX, etc.).
- Return type:
dict
- pyphi.calc.replicate_data(mvm_obj, X, num_replicates, *, as_set=False, rep_Y=False, Y=False)[source]¶
Augment a dataset by adding small noise replicates.
Useful for regularizing models when training data is limited.
- Parameters:
X (pd.DataFrame or np.ndarray) – Original data matrix.
n_reps (int) – Number of noisy replicates to add. Default 2.
noise_level (float) – Standard deviation of additive Gaussian noise relative to each variable’s std dev. Default 0.01.
- Returns:
Augmented matrix with original + replicated rows (same type as input).
- Return type:
pd.DataFrame or np.ndarray
- pyphi.calc.export_2_gproms(mvmobj, *, fname='phi_export.txt')[source]¶
Export PLS model to gPROMS syntax.
- pyphi.calc.unique(df, colid)[source]¶
Return unique values preserving original order.
- Parameters:
x (list or np.ndarray) – Input sequence.
- Returns:
Unique values in the order they first appear.
- Return type:
list
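Order-preserving de-duplication can be sketched with a plain dictionary, since Python dictionaries keep insertion order. This is a generic illustration on a list; the documented function takes a DataFrame and a column identifier, and unique_in_order is a hypothetical name.

```python
def unique_in_order(values):
    """Unique items in order of first appearance (sketch)."""
    # dict keys are unique and remember insertion order.
    return list(dict.fromkeys(values))


lots = unique_in_order(["B", "A", "B", "C", "A"])
```

Unlike sorting-based approaches, this keeps materials and lots in the same order as in the raw data.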
- pyphi.calc.parse_materials(filename, sheetname)[source]¶
Build R matrices for JRPLS from linear table in Excel.
- pyphi.calc.reconcile_rows(df_list)[source]¶
Align two DataFrames by their observation IDs (first column).
Reorders Y to match the row order of X. Observations present in one but not the other are dropped, with a warning printed.
- Parameters:
X (pd.DataFrame) – Reference DataFrame. First column is observation IDs.
Y (pd.DataFrame) – DataFrame to align. First column is observation IDs.
- Returns:
(X_aligned, Y_aligned): DataFrames sharing the same ordered set of observation IDs.
- Return type:
tuple
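A minimal pandas sketch of the reconciliation idea follows: keep only the observation IDs present in every DataFrame and order all of them like the first one. It assumes IDs are unique within each DataFrame and prints no warnings; keep_common_rows is a hypothetical name.

```python
import pandas as pd


def keep_common_rows(df_list):
    """Keep rows whose ID (first column) appears in every DataFrame (sketch)."""
    id_sets = [set(df.iloc[:, 0]) for df in df_list]
    common = set.intersection(*id_sets)
    # Use the row order of the first DataFrame as the reference order.
    reference_order = [obs for obs in df_list[0].iloc[:, 0] if obs in common]
    aligned = []
    for df in df_list:
        indexed = df.set_index(df.columns[0], drop=False)
        aligned.append(indexed.loc[reference_order].reset_index(drop=True))
    return aligned


X = pd.DataFrame({"obs": ["a", "b", "c"], "x1": [1.0, 2.0, 3.0]})
Y = pd.DataFrame({"obs": ["c", "a", "d"], "y1": [30.0, 10.0, 40.0]})
X_aligned, Y_aligned = keep_common_rows([X, Y])
```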
- pyphi.calc.reconcile_rows_to_columns(df_list_r, df_list_c)[source]¶
Map DataFrame rows to the columns of another DataFrame.
Used in L-shaped data structures where material lot IDs appear as column headers in X and as row IDs in R.
- Parameters:
X (pd.DataFrame) – Process data where columns (after the first) correspond to lot IDs.
R (pd.DataFrame) – Material property data where the first column contains lot IDs.
- Returns:
(X_matched, R_matched): aligned matrices ready for LPLS.
- Return type:
tuple
- pyphi.calc.lpls(X, R, Y, A, *, shush=False)[source]¶
Fit an L-shaped PLS (LPLS) model.
Models the relationship between lot physical properties (R), process observations (X), and product quality (Y), where X rows correspond to lots described by R columns.
Per Muteki et al., Chemom. Intell. Lab. Syst. 85 (2007) 186–194.
- Parameters:
X (pd.DataFrame or np.ndarray) – Process data matrix (n_obs × n_x). First column is observation IDs if a DataFrame.
R (pd.DataFrame or np.ndarray) – Raw material property matrix (n_lots × n_r). Columns of X map to rows of R.
Y (pd.DataFrame or np.ndarray) – Quality/response matrix (n_lots × n_y). Rows match rows of R.
A (int) – Number of latent variables.
shush (bool) – Suppress printed output. Default False.
- Returns:
Fitted LPLS model with keys:
T (ndarray): X-space scores (n_obs × A).
P (ndarray): X-loadings (n_x × A).
Q (ndarray): Y-loadings (n_y × A).
H (ndarray): R-space scores (n_lots × A).
V (ndarray): R-space loadings (n_r × A).
Rscores (ndarray): R projected scores.
Ss (ndarray): Rotated R weights S*(V’S)⁻¹.
r2x, r2xpv: R² for X space.
r2y, r2ypv: R² for Y space.
r2r, r2rpv: R² for R space.
mx, sx, my, sy, mr, sr: Preprocessing params.
var_t: Score covariance matrix.
T2, T2_lim95, T2_lim99: Hotelling’s T² and limits.
speX, speX_lim95, speX_lim99: X SPE and limits.
speY, speY_lim95, speY_lim99: Y SPE and limits.
speR, speR_lim95, speR_lim99: R SPE and limits.
- Return type:
dict
- pyphi.calc.lpls_pred(rnew, lpls_obj)[source]¶
Predict Y for new lot(s) using a fitted LPLS model.
- Parameters:
rnew (np.ndarray or pd.DataFrame) – R-space observation(s) for new lot(s). Variables must match those in lpls_obj.
lpls_obj (dict) – Fitted LPLS model from lpls().
- Returns:
Prediction results with keys:
Tnew (ndarray): Projected scores (n_new × A).
Yhat (ndarray): Predicted Y in original scale.
speR (ndarray): R-space SPE for each new lot.
- Return type:
dict
- pyphi.calc.jrpls(Xi, Ri, Y, A, *, shush=False)[source]¶
Fit a Joint R-LPLS (JRPLS) model across multiple campaigns.
Extends LPLS to handle multiple manufacturing campaigns, each with their own X (process) and R (raw material) blocks sharing a common Y.
Per Garcia-Munoz, Chemom. Intell. Lab. Syst. 133 (2014) 49–62.
- Parameters:
Xi (dict) – Process data blocks {'campaign': pd.DataFrame}. Each DataFrame’s first column is observation IDs.
Ri (dict) – Raw material property blocks {'campaign': pd.DataFrame}. Keys must match Xi. First column is lot IDs.
Y (pd.DataFrame or np.ndarray) – Shared response matrix. Rows match lots across all campaigns.
A (int) – Number of latent variables.
shush (bool) – Suppress printed output. Default False.
- Returns:
Fitted JRPLS model with per-campaign sub-dicts and shared keys. Structure mirrors lpls() output but indexed by campaign.
- Return type:
dict
- pyphi.calc.jrpls_pred(rnew, jrplsobj)[source]¶
Predict Y for a new observation using a fitted JRPLS model.
- Parameters:
xnew (pd.DataFrame or np.ndarray) – New process observation(s). Variables must match the specified campaign’s X block.
rnew (pd.DataFrame or np.ndarray) – New raw material lot properties. Variables must match the specified campaign’s R block.
campaign (str) – Name of the campaign this observation belongs to.
jrpls_obj (dict) – Fitted JRPLS model from jrpls().
- Returns:
Prediction results with keys:
Tnew (ndarray): Projected X-scores.
Yhat (ndarray): Predicted Y in original scale.
speX (ndarray): X-space SPE.
speR (ndarray): R-space SPE.
T2 (ndarray): Hotelling’s T².
- Return type:
dict
Example for rnew:
rnew = {
    'MAT1': [('A0129', 0.557949425), ('A0130', 0.442050575)],
    'MAT2': [('Lac0003', 1)],
    'MAT3': [('TLC018', 1)],
    'MAT4': [('M0012', 1)],
    'MAT5': [('CS0017', 1)],
}
- pyphi.calc.tpls(Xi, Ri, Z, Y, A, *, shush=False)[source]¶
Fit a TPLS model.
Models relationships between time-varying process trajectories (Z), raw material properties (R), and product quality (Y).
- Parameters:
Z (pd.DataFrame or np.ndarray) – Process trajectory matrix. First column is observation IDs if a DataFrame.
Xi (dict) – Process data blocks {'campaign': pd.DataFrame}. Each DataFrame’s first column is observation IDs.
Ri (dict) – Raw material property blocks {'campaign': pd.DataFrame}. Keys must match Xi. First column is lot IDs.
Y (pd.DataFrame or np.ndarray) – Shared response matrix. Rows match lots across all campaigns.
A (int) – Number of latent variables.
shush (bool) – Suppress printed output. Default False.
- Returns:
Fitted TPLS model. Keys mirror jrpls() with an additional Ws (ndarray) rotated weight matrix for Z-space.
- Return type:
dict
- pyphi.calc.jypls(Xi, Yi, A, *, shush=False)[source]¶
Fit a Joint-Y PLS (JYPLS) model across multiple campaigns.
Each campaign has its own X block (different variables allowed), but all campaigns share a common Y column space and a jointly estimated Q matrix.
Per Garcia-Munoz, MacGregor, Kourti, Chemom. Intell. Lab. Syst. 79 (2005) 101–114.
- Parameters:
Xi (dict) – Predictor blocks {'campaign_name': pd.DataFrame}. Each X can have a different number of columns. First column of each DataFrame is observation IDs.
Yi (dict) – Response blocks {'campaign_name': pd.DataFrame}. Keys must match Xi. All Y blocks must have identical columns (same Y variable space across campaigns). First column of each DataFrame is observation IDs.
A (int) – Number of latent variables.
shush (bool) – Suppress printed output. Default False.
- Returns:
Fitted JYPLS model with keys:
Q (ndarray): Shared Y-loadings (n_y × A).
T (dict): Per-campaign X-scores.
P (dict): Per-campaign X-loadings.
W (dict): Per-campaign X-weights.
Ws (dict): Per-campaign rotated weights W*(P’W)⁻¹.
r2xi (dict): Per-campaign R² for X.
r2yi (dict): Per-campaign R² for Y.
r2y (float): Overall R² for Y.
mx, sx (dict): Per-campaign X preprocessing params.
my, sy (ndarray): Shared Y preprocessing params.
blk_scale (dict): Per-campaign block scaling factors.
var_t (ndarray): Pooled score covariance matrix.
campaigns (list): Ordered list of campaign names.
- Return type:
dict
- pyphi.calc.jypls_pred(xnew, campaign, jypls_obj)[source]¶
Predict Y for a new observation using a fitted JYPLS model.
- Parameters:
- Returns:
Prediction results with keys:
Tnew (ndarray): Projected X-scores (n_new × A).
Yhat (ndarray): Predicted Y in original scale (n_new × n_y).
speX (ndarray): X-space SPE for each new observation.
T2 (ndarray): Hotelling’s T² using pooled score covariance.
- Return type:
dict
- pyphi.calc.tpls_pred(rnew, znew, tplsobj)[source]¶
Predict Y for new observations using a fitted TPLS model.
- Parameters:
rnew (np.ndarray or pd.DataFrame) – New R-space (raw material) data.
znew (np.ndarray or pd.DataFrame) – New Z-space (trajectory) data.
tpls_obj (dict) – Fitted TPLS model from tpls().
- Returns:
Prediction results with keys:
Tnew (ndarray): Projected scores.
Yhat (ndarray): Predicted Y in original scale.
speR (ndarray): R-space SPE.
speZ (ndarray): Z-space SPE.
T2 (ndarray): Hotelling’s T².
- Return type:
dict
Example for rnew:
rnew = {
    'MAT1': [('A0129', 0.557949425), ('A0130', 0.442050575)],
    'MAT2': [('Lac0003', 1)],
    'MAT3': [('TLC018', 1)],
    'MAT4': [('M0012', 1)],
    'MAT5': [('CS0017', 1)],
}
- pyphi.calc.varimax_rotation(mvm_obj, X, *, Y=False)[source]¶
Apply Varimax rotation to PCA or PLS loadings.
Rotates loadings toward a simple structure (sparse, interpretable). Updates the model object in-place and returns the rotated model.
- Parameters:
mvm_obj (dict) – Fitted PCA or PLS model.
X (pd.DataFrame or np.ndarray) – Training X data used to reproject scores after rotation.
Y (pd.DataFrame or np.ndarray) – Training Y data (optional, for PLS).
- Returns:
Model with rotated loadings and reprojected scores.
- Return type:
dict
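The rotation step alone can be sketched with the widely used SVD-based varimax iteration. This generic version rotates a loading matrix only; it does not reproject scores or adjust r2 and r2pv as the package routine is described to do, and the name varimax is hypothetical.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Generic SVD-based varimax rotation of a loading matrix (sketch)."""
    L = np.asarray(loadings, dtype=float)
    n_vars, n_comp = L.shape
    R = np.eye(n_comp)                      # accumulated orthogonal rotation
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ R
        column_ss = np.sum(rotated**2, axis=0)
        grad = L.T @ (rotated**3 - (gamma / n_vars) * rotated @ np.diag(column_ss))
        U, s, Vt = np.linalg.svd(grad)
        R = U @ Vt                          # closest orthogonal matrix
        new_criterion = s.sum()
        if criterion != 0.0 and new_criterion < criterion * (1.0 + tol):
            break
        criterion = new_criterion
    return L @ R, R


rng = np.random.default_rng(1)
P_demo = rng.normal(size=(6, 2))            # synthetic loadings
P_rotated, R_demo = varimax(P_demo)
```

Because R is orthogonal, the rotation redistributes variance among components without changing the total explained by the retained components for each variable.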
- pyphi.calc.findstr(string)[source]¶
Find indices of strings containing a given pattern.
- Parameters:
str_list (list of str) – List of strings to search.
pattern (str) – Substring to search for.
- Returns:
Indices of elements in str_list that contain pattern.
- Return type:
list
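The search described by the parameters above reduces to a substring test over an enumerated list. A minimal sketch, with the hypothetical name find_substring_indices:

```python
def find_substring_indices(str_list, pattern):
    """Indices of the strings in str_list that contain pattern (sketch)."""
    return [i for i, text in enumerate(str_list) if pattern in text]


matches = find_substring_indices(["temp_in", "pressure", "temp_out"], "temp")
```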
- pyphi.calc.build_polynomial(data, factors, response, *, bias_term=True)[source]¶
Linear regression with variable selection assisted by PLS.
- pyphi.calc.cca(X, Y, tol=1e-06, max_iter=1000)[source]¶
Canonical Correlation Analysis (CCA) between PLS scores and Y.
Computes the maximum covariance directions between the score matrix T and response Y. Equivalent to computing the predictive component in OPLS.
- Parameters:
T (np.ndarray) – Score matrix from a fitted PLS model (n_obs × A).
Y (pd.DataFrame or np.ndarray) – Response matrix (n_obs × n_y).
mcs (tuple) – Preprocessing flags for T and Y. Default ('autoscale', 'autoscale').
- Returns:
CCA results with keys:
Tcv (ndarray): Covariant scores.
Pcv (ndarray): Covariant loadings (predictive loadings in the OPLS sense).
Wcv (ndarray): Covariant weights.
- Return type:
dict