pyphi module
Phi for Python (pyPhi)
By Sal Garcia (sgarciam@ic.ac.uk salvadorgarciamunoz@gmail.com)
- Added Jan 30 2025
Added a pseudo-inverse (pinv) fallback in spectra_savgol for the case where inv fails
- Added Jan 20 2025
Added the ‘cca’ flag to the pls routine to calculate CCA between the Ts and each of the Ys (one by one), calculating loadings and scores equivalent to a perfectly orthogonalized OPLS model. The covariant scores (Tcv), the covariant loadings (Pcv) and the predictive weights (Wcv) are added as keys to the model object. [The covariant loadings (Pcv) are equivalent to the predictive loadings in OPLS.]
Added the cca and cca_multi routines to perform PLS-CCA (an alternative to OPLS); as of now cca_multi remains unused.
- Added Nov 18th, 2024
Replaced interp2d with RectBivariateSpline
Protected SPE limit calculations against near-zero residuals
Added build_polynomial function to create linear regression models with variable selection assisted by PLS
- by merge from James
Added spectra preprocessing methods
bootstrap PLS
By Salvador Garcia (sgarciam@ic.ac.uk salvadorgarciamunoz@gmail.com)
- Added Dec 19th 2023
phi.clean_htmls removes all html files in the working directory
clean_empty_rows returns also the names of the rows removed
- Added May 1st
YMB now uses the same structure as XMB
Corrected the dimensionality of the lwpls prediction, it was a double-nested array.
- Added Apr 30 {Happy Children’s Day}
Modified Multi-block PLS to include the block name in the variable name
- Added Apr 29
Included the unique routine and adjusted the parse_materials routine so materials and lots are in the same order as in the raw data
- Added Apr 27
Enhanced adapt_pls_4_pyomo to use variable names as indices if flag is sent
- Added Apr 25
Enhanced the varimax_rotation to adjust the r2 and r2pv to the rotated loadings
- Added Apr 21
Re-added varimax_rotation with complete model rotation for PCA and PLS
- Added Apr 17
Added tpls and tpls_pred
- Added Apr 15
Added jrpls model and jrpls_pred
Added routines to reconcile columns to row identifiers so that X and R matrices correspond correctly
Added routines to reconcile rows across a list of dataframes, producing a list of dataframes that contains only the observations present in all dataframes
- Added on Apr 9 2023
Added lpls and lpls_pred routines
Added parse_materials to read linear table and produce R or Ri
- Release as of Nov 23 2022
Added a function to export PLS model to gPROMS code
Release as of Aug 22 2022 What was done:
*Fixed access to NEOS server and use of GAMS instead of IPOPT
Release as of Aug 12 2022 What was done:
Fixed the SPE calculations in pls_pred and pca_pred
Changed to a more efficient inversion in pca_pred (=pls_pred)
Added a pseudo-inverse option in pmp for pca_pred
Release as of Aug 2 2022 What was done:
*Added replicate_data
Release Unknown What was done:
Fixed a bug in kernel PCA calculations
Changed the syntax of MBPLS arguments
Corrected a pretty severe error in pls_pred
Fixed a really bizarre bug in mbpls
Release Dec 5, 2021 What was done:
*Added some small documentation to utility routines
Release Jan 15, 2021 What was done:
Added routine cat_2_matrix to convert categorical classifiers to matrices
Added Multi-block PLS model
Release Date: Nov 16, 2020 What was done:
Fixed small bug in clean_low_variances routine
Release Date: Sep 26 2020 What was done:
Added rotation of loadings so that var(t) for ti>=0 is always larger than var(t) for ti<0
Release Date: May 27 2020 What was done:
Added the estimation of PLS models with missing data using non-linear programming per Journal of Chemometrics, 28(7), pp.575-584.
Release Date: March 30 2020 What was done:
Added the estimation of PCA models with missing data using non-linear programming per Lopez-Negrete et al. J. Chemometrics 2010; 24: 301–311
Release Date: Aug 22 2019
What was done:
This header is now included to track high level changes
Fixed LWPLS; it now works for scalar and multivariable Y’s
fixed minor bug in phi.pca and phi.pls when mcsX/Y = False
- pyphi.adapt_pls_4_pyomo(plsobj, *, use_var_ids=False)
Routine to convert all the parameters in a PLS object into the structure needed by Pyomo by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)
All parameters are added to the original plsobj with the prefix pyo_
plsobj_pyomo = pyphi.adapt_pls_4_pyomo(plsobj, <use_var_ids=False>)
- Parameters:
plsobj – A PLS object created with pyphi.pls
use_var_ids – If True, all variable IDs from plsobj are used as indexes for the dictionaries required by Pyomo
- Returns:
plsobj_pyomo – A dictionary augmented with all parameters in dictionary format
- pyphi.bootstrap_pls(X, Y, num_latents, num_samples, **kwargs)
Generates a list of PLS objects to be used in prediction
- pyphi.bootstrap_pls_pred(X_new, bootstrap_pls_obj, quantiles=[0.025, 0.975])
Finds the prediction quantiles using bootstrapped PLS with Gaussian errors. Only works with 1-D outputs
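The bootstrap idea behind these two routines can be sketched as follows. This is a minimal illustration, not pyphi's implementation. It swaps in an ordinary least-squares fit (numpy `lstsq`) for the PLS model, resamples rows with replacement, and reads prediction quantiles off the resulting ensemble. It does not model the Gaussian error term mentioned above. The function name `bootstrap_quantiles` is hypothetical.

```python
import numpy as np

def bootstrap_quantiles(X, y, X_new, num_samples=200,
                        quantiles=(0.025, 0.975), seed=0):
    """Hypothetical sketch: refit a least-squares model (stand-in for PLS)
    on bootstrap resamples and return prediction quantiles for X_new."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xa = np.column_stack([np.ones(n), X])                # intercept + regressors
    Xn = np.column_stack([np.ones(X_new.shape[0]), X_new])
    preds = []
    for _ in range(num_samples):
        idx = rng.integers(0, n, size=n)                 # rows drawn with replacement
        beta, *_ = np.linalg.lstsq(Xa[idx], y[idx], rcond=None)
        preds.append(Xn @ beta)                          # one prediction per model
    preds = np.array(preds)                              # (num_samples, n_new)
    return np.quantile(preds, quantiles, axis=0)         # (len(quantiles), n_new)

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=40)
bounds = bootstrap_quantiles(X, y, np.array([[0.0, 0.0], [1.0, 1.0]]))
print(bounds)   # row 0: lower quantiles, row 1: upper quantiles
```

Each column of the result is a prediction interval for one new observation. Only 1-D outputs are handled, which matches the restriction stated for bootstrap_pls_pred.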
- pyphi.build_polynomial(data, factors, response, *, bias_term=True)
Function to create linear regression models with variable selection assisted by PLS
- Parameters:
data – Pandas DataFrame; the first column is the observation ID column
factors –
List of factors to be included in the expression. Powers, multiplications and divisions are allowed, e.g.:
[‘Variable 1’, ‘Variable 1^2’, ‘Variable 2’, ‘Variable 3 * Variable 1’, ‘Variable 1^2 / Variable 4’]
response – String with the response variable in the dataset (must be a column of ‘data’)
- Returns:
betasOLSlssq – Coefficients for the factors
factors_out – Factors
Xaug, Y – Numpy arrays with X and Y data
eqstr – Full equation
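The sketch below shows how factor strings like those above could be turned into regression columns and fitted. It assumes that `^` means power and that `*` and `/` are applied left to right, and it holds the data in a plain dict of numpy arrays rather than a DataFrame. The PLS-assisted variable selection step is not shown. `eval_factor` and `fit_factors` are hypothetical names, not pyphi functions.

```python
import re
import numpy as np

def eval_factor(expr, data):
    """Evaluate a factor string such as 'A^2 / B' or 'A * C' column-wise.
    `data` maps variable names to 1-D numpy arrays (hypothetical layout)."""
    tokens = re.split(r'\s*([*/])\s*', expr.strip())   # keep the operators
    value, op = None, '*'
    for tok in tokens:
        if tok in ('*', '/'):
            op = tok
            continue
        name, _, power = tok.partition('^')
        term = data[name.strip()] ** (float(power) if power else 1.0)
        if value is None:
            value = term
        elif op == '*':
            value = value * term
        else:
            value = value / term
    return value

def fit_factors(data, factors, response, bias_term=True):
    """Ordinary least squares on the evaluated factor columns."""
    X = np.column_stack([eval_factor(f, data) for f in factors])
    if bias_term:
        X = np.column_stack([np.ones(X.shape[0]), X])  # intercept first
    beta, *_ = np.linalg.lstsq(X, data[response], rcond=None)
    return beta

a = np.arange(1.0, 7.0)
b = np.array([2.0, 3.0, 5.0, 7.0, 11.0, 13.0])
data = {'a': a, 'b': b, 'y': 3.0 + 2.0 * a**2 - 0.5 * a * b}
print(fit_factors(data, ['a^2', 'a * b'], 'y'))   # approx. [3.0, 2.0, -0.5]
```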
- pyphi.cat_2_matrix(X)
Function to convert categorical data into binary matrices for regression by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)
Xmat,XmatMB = cat_2_matrix(X)
- Parameters:
X – Pandas DataFrame with categorical descriptors for each observation
- Returns:
Xmat – Matrix of binary coded data
XmatMB – Binary coded data organized as a list of matrices for Multi-block modeling
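Binary coding of categorical descriptors can be sketched as one 0/1 indicator column per category level, stacked across descriptors. The blocks are also kept as a list, which mirrors the Xmat/XmatMB pair above. This is an illustration under stated assumptions: the input is a dict of lists instead of a DataFrame, and levels are ordered by first appearance. pyphi's own column order is not confirmed. `one_hot` and `cat_to_matrices` are hypothetical names.

```python
import numpy as np

def one_hot(column):
    """Binary-code one categorical column: one 0/1 indicator column per level,
    with levels kept in order of first appearance (an assumption)."""
    levels = list(dict.fromkeys(column))
    mat = np.array([[1.0 if v == lev else 0.0 for lev in levels] for v in column])
    return mat, levels

def cat_to_matrices(columns):
    """columns: dict of descriptor name -> list of categories per observation.
    Returns the stacked binary matrix and the per-descriptor blocks."""
    blocks = [one_hot(col)[0] for col in columns.values()]
    return np.hstack(blocks), blocks

Xmat, XmatMB = cat_to_matrices({'color': ['red', 'blue', 'red'],
                                'size': ['S', 'S', 'L']})
print(Xmat)
# [[1. 0. 1. 0.]
#  [0. 1. 1. 0.]
#  [1. 0. 0. 1.]]
```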
- pyphi.cca(X, Y, tol=1e-06, max_iter=1000)
Perform Canonical Correlation Analysis (CCA) on two datasets, X and Y. by sgarcia@imperial.ac.uk
- Parameters:
X (numpy.ndarray) – An (n x p) matrix where n is the number of samples and p is the number of features in X.
Y (numpy.ndarray) – An (n x q) matrix where n is the number of samples and q is the number of features in Y.
tol (float) – Tolerance for convergence. Default is 1e-6.
max_iter (int) – Maximum number of iterations. Default is 1000.
- Returns:
Contains canonical correlation (float), and the canonical directions (w_x, w_y).
- Return type:
(tuple)
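For reference, the first canonical pair can also be computed in closed form. Take the SVD of the whitened cross-covariance Cxx^-1/2 Cxy Cyy^-1/2. Its largest singular value is the canonical correlation, and its singular vectors, mapped back through the inverse square roots, give w_x and w_y. This sketch uses that SVD route, not the iterative scheme (tol, max_iter) that pyphi.cca uses. It assumes X and Y have full column rank. The function names are hypothetical.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def first_canonical_pair(X, Y):
    """First canonical correlation and directions via SVD of the
    whitened cross-covariance Cxx^-1/2 Cxy Cyy^-1/2."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx, Cyy, Cxy = Xc.T @ Xc / (n - 1), Yc.T @ Yc / (n - 1), Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    return s[0], Kx @ U[:, 0], Ky @ Vt[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = np.column_stack([X[:, 0] + 0.5 * rng.normal(size=200), rng.normal(size=200)])
rho, w_x, w_y = first_canonical_pair(X, Y)
print(rho, np.corrcoef(X @ w_x, Y @ w_y)[0, 1])   # the two numbers agree
```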
- pyphi.cca_multi(X, Y, num_components=1, tol=1e-06, max_iter=1000)
Perform Canonical Correlation Analysis (CCA) on two datasets, X and Y, to compute multiple canonical variates. by sgarciam@imperial.ac.uk
- Args:
X (numpy.ndarray): An (n x p) matrix where n is the number of samples and p is the number of features in X.
Y (numpy.ndarray): An (n x q) matrix where n is the number of samples and q is the number of features in Y.
num_components (int): Number of canonical variates (components) to compute. Default is 1.
tol (float): Tolerance for convergence. Default is 1e-6.
max_iter (int): Maximum number of iterations. Default is 1000.
- Returns:
Contains canonical correlations, and the canonical direction vectors for X and Y.
- Return type:
(dict)
- pyphi.clean_empty_rows(X, *, shush=False)
Routine to remove rows containing only missing data from a matrix by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)
X,rows_removed = pyphi.clean_empty_rows(X)
- Parameters:
X – Matrix to be cleaned of empty rows (all np.nan)
- Returns:
X – Matrix with the empty observations removed
rows_removed – List of rows removed from X
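The core of this cleaning step is a row mask that flags rows in which every entry is NaN. Here is a minimal numpy sketch. Note that the pyphi routine works on DataFrames and, per the changelog, reports the names of the removed rows. This sketch returns positional indices instead. `drop_empty_rows` is a hypothetical name.

```python
import numpy as np

def drop_empty_rows(X):
    """Remove rows in which every entry is NaN; also return their indices."""
    X = np.asarray(X, dtype=float)
    empty = np.all(np.isnan(X), axis=1)
    return X[~empty], [int(i) for i in np.flatnonzero(empty)]

X = np.array([[1.0, np.nan], [np.nan, np.nan], [3.0, 4.0]])
X_clean, rows_removed = drop_empty_rows(X)
print(rows_removed)   # [1]
```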
- pyphi.clean_htmls()
Routine to clean html files
pyphi.clean_htmls()
Deletes all .html files in the current directory
- Parameters:
none
- Returns:
none
- pyphi.clean_low_variances(X, *, shush=False, min_var=1e-10)
Routine to remove columns of negligible variance by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)
X,columns_removed = pyphi.clean_low_variances(X,<min_var=1E-10,shush=False>)
- Parameters:
X – Matrix to be cleaned of columns with low variance
min_var – Minimum variance required to keep a column (default = 1E-10)
shush – ‘True’ disables output to console
- Returns:
X_clean – Matrix without low variance columns
cols_removed – Columns removed
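The variance screen can be sketched as follows. Compute each column's variance while ignoring NaNs, then keep the columns at or above min_var. Two details are assumptions here, not confirmed pyphi behavior: whether pyphi uses the sample (ddof=1) or population variance, and that the removed columns are reported as positions rather than names. `drop_low_variance_columns` is a hypothetical name.

```python
import numpy as np

def drop_low_variance_columns(X, min_var=1e-10):
    """Remove columns whose variance (ignoring NaNs) is below min_var."""
    X = np.asarray(X, dtype=float)
    var = np.nanvar(X, axis=0, ddof=1)     # sample variance per column
    keep = var >= min_var
    return X[:, keep], [int(j) for j in np.flatnonzero(~keep)]

X = np.array([[1.0, 5.0, 0.0], [2.0, 5.0, np.nan], [3.0, 5.0, 1.0]])
X_clean, cols_removed = drop_low_variance_columns(X)
print(cols_removed)   # [1]
```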
- pyphi.contributions(mvmobj, X, cont_type, *, Y=False, from_obs=False, to_obs=False, lv_space=False)
Function to calculate contributions to diagnostics by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
contrib = pyphi.contributions(mvmobj,X,cont_type,<Y=False,from_obs=False,to_obs=False,lv_space=False>)
- Parameters:
mvmobj – A dictionary created by phi.pls or phi.pca
X/Y – Data [numpy arrays or pandas dataframes] - Y space is optional
cont_type – ‘ht2’: contributions to Hotelling’s T2; ‘spe’: contributions to SPE; ‘scores’: contributions to scores
to_obs – Scalar or list of scalars with the observation number(s) for which to calculate contributions (TO). Note: from_obs is ignored when cont_type=‘spe’
from_obs – Scalar or list of scalars with the observation number(s) to use as offset (FROM). If not sent, contributions are calculated with respect to the mean.
lv_space – Latent spaces over which to do the calculations [applicable to ‘ht2’ and ‘scores’]. If not sent, all dimensions are considered.
- Returns:
contrib – A vector of scalars with the corresponding contributions
- pyphi.export_2_gproms(mvmobj, *, fname='phi_export.txt')
Function to export a PLS model to build a hybrid model in gPROMS
by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
pyphi.export_2_gproms(mvmobj,fname=‘phi_export.txt’)
Exports the multivariate object coded in gPROMS syntax; typically one would use the variables X_NEW and Y_PRED as the Input/Output variables.
- Parameters:
mvmobj – A PLS model created with pyphi.pls
fname – Name of the text file to be created
- Returns:
None
- pyphi.jrpls(Xi, Ri, Y, A, *, shush=False)
JRPLS Algorithm per Garcia-Munoz Chemom.Intel.Lab.Syst., 133, pp.49-62.
by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
jrpls_obj = pyphi.jrpls(Xi,Ri,Y,A)
- Parameters:
Xi – Physical properties: dictionary of DataFrames of materials_i x material properties, e.g.
X = {‘MatA’:df_with_props_for_mat_A (one row per lot of MatA, one col per property),
‘MatB’:df_with_props_for_mat_B (one row per lot of MatB, one col per property)}
Ri – Blending ratios: dictionary of DataFrames of blends x materials_i, e.g.
R = {‘MatA’: df_with_ratios_of_lots_of_A_used_per_blend,
‘MatB’: df_with_ratios_of_lots_of_B_used_per_blend}
Rows of X[i] must correspond to columns of R[i]
Y – [b x n] Product characteristics DataFrame of blends x product properties
A – Number of components
The first column of all DataFrames is the observation identifier
- Returns:
jrpls_obj – A dictionary with all the parameters for the JRPLS model
- pyphi.jrpls_pred(rnew, jrplsobj)
Routine to produce the prediction for a new observation of Ri in a JRPLS model by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
preds = pyphi.jrpls_pred(rnew,jrplsobj)
- Parameters:
rnew –
A dictionary with the format: rnew = {‘matid’: [(lotid, rvalue)], }
For example, a prediction for the scenario:
rvalue   material   lot to use
0.5      API        A0129
0.1      Lactose    Lac0003
0.2      Lactose    Lac1010
0.02     MgSt       M0012
0.18     MCC        MCC0017
would be encoded like:
rnew = {‘API’: [(‘A0129’, 0.5)], ‘Lactose’: [(‘Lac0003’, 0.1), (‘Lac1010’, 0.2)], ‘MgSt’: [(‘M0012’, 0.02)], ‘MCC’: [(‘MCC0017’, 0.18)]}
- Returns:
preds – A dictionary of the form {‘Tnew’: tnew, ‘Yhat’: yhat, ‘speR’: sper}, where speR contains the speR of each material
- pyphi.lpls(X, R, Y, A, *, shush=False)
LPLS Algorithm per Muteki et al. Chemom.Intell.Lab.Syst.85(2007) 186 – 194 by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
lpls_obj = pyphi.lpls(X,R,Y,A)
- Parameters:
X – Physical properties DataFrame of materials x material properties
R – [b x m] Blending ratios DataFrame of blends x materials
Y – [b x n] Product characteristics DataFrame of blends x product properties
A – Number of components
The first column of all DataFrames is the observation identifier
- Returns:
lpls_obj – A dictionary with all the LPLS parameters
- pyphi.lpls_pred(rnew, lpls_obj)
Function to evaluate a new observation for LPLS by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
Do a prediction with an LPLS model:
pred = pyphi.lpls_pred(rnew,lpls_obj)
- Parameters:
rnew – np.array, list or DataFrame with the elements of rnew; if multiple rows are passed, one prediction is made per row
lpls_obj – LPLS object built with pyphi.lpls routine
- Returns:
pred – A dictionary {‘Tnew’: tnew, ‘Yhat’: yhat, ‘speR’: sper}
- pyphi.lwpls(xnew, loc_par, mvmobj, X, Y, *, shush=False)
LWPLS algorithm as in: International Journal of Pharmaceutics 421 (2011) 269– 274
Implemented by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
yhat = pyphi.lwpls(xnew,loc_par,mvmobj,X,Y,<shush=False>)
- Parameters:
xnew (Numpy vector) – Regressor vector to make prediction
loc_par (scalar) – Localization parameter
mvmobj – PLS model between X and Y built with PLS routine
X (DataFrame or Numpy) – Training set for mvmobj (PLS model)
Y (DataFrame or Numpy) – Training set for mvmobj (PLS model)
shush – True silences output; False (default if not sent) displays output
- Returns:
yhat – Prediction of y for xnew
- pyphi.ma57_dummy_check()
Instantiates a trivial NLP to solve with IPOPT and MA57.
- Returns:
ma57_ok – Boolean; True if IPOPT solved with SolverStatus.ok
- pyphi.mbpls(XMB, YMB, A, *, mcsX=True, mcsY=True, md_algorithm_='nipals', force_nipals_=False, shush_=False, cross_val_=0, cross_val_X_=False)
Function to calculate a Multi-Block PLS model by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
Multi-block PLS model using the approach by Westerhuis, J. Chemometrics, 12, 301–321 (1998)
- mbpls_obj = pyphi.mbpls(XMB,YMB,A,<mcsX=True,mcsY=True,md_algorithm_=’nipals’,force_nipals_=False,
shush_=False,cross_val_=0,cross_val_X_=False>)
- Parameters:
XMB –
Dictionary of Pandas DataFrames, one key per block of data. Dictionary structure: {‘BlockName1’: block_1_data_pd, ‘BlockName2’: block_2_data_pd}
YMB –
Dictionary or Pandas DataFrame. Dictionary structure: {‘BlockName1’: block_1_data_pd, ‘BlockName2’: block_2_data_pd}
- Returns:
mbpls_obj – Dictionary with all the parameters of a Multi-block PLS model
- pyphi.meancenterscale(X, *, mcs=True)
Function to mean center and scale a matrix by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)
X,xmean,xstd= pyphi.meancenterscale(X,<mcs=Flag>)
- Parameters:
X – Matrix to be mean centered; this call ONLY works with Numpy matrices
mcs – True | ‘center’ | ‘autoscale’
- Returns:
X – Post-processed X matrix
xmean – Mean values per column
xstd – Standard deviation values per column
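The four mcs options can be sketched in numpy as shown below. Two conventions are assumptions, not confirmed pyphi behavior: the use of the sample standard deviation (ddof=1) and returning a mean of zeros or a std of ones for the options that skip a step. `mean_center_scale` is a hypothetical name. New data would be preprocessed with the training xmean and xstd, not with its own statistics.

```python
import numpy as np

def mean_center_scale(X, mcs=True):
    """Sketch of the mcs options: True (center + autoscale), 'center',
    'autoscale', or False (no pre-processing)."""
    X = np.asarray(X, dtype=float)
    xmean = np.nanmean(X, axis=0)
    xstd = np.nanstd(X, axis=0, ddof=1)    # sample standard deviation
    if mcs is True:
        return (X - xmean) / xstd, xmean, xstd
    if mcs == 'center':
        return X - xmean, xmean, np.ones_like(xstd)
    if mcs == 'autoscale':
        return X / xstd, np.zeros_like(xmean), xstd
    return X.copy(), np.zeros_like(xmean), np.ones_like(xstd)

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Xp, xmean, xstd = mean_center_scale(X)
print(Xp, xmean, xstd)   # column 0 becomes [-1, 0, 1]; xmean = [2, 20]; xstd = [1, 10]
```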
- pyphi.np1D2pyomo(arr, *, indexes=False)
Routine to convert a vector into a 1D dictionary for Pyomo by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)
Xdic=pyphi.np1D2pyomo(X,<indexes=varId_list>)
- Parameters:
X (Numpy) – Vector to be converted
indexes – False | table of ids to be assigned as indexes
- Returns:
Xdic – Vector in dictionary format (as Pyomo likes it)
- pyphi.np2D2pyomo(arr, *, varids=False)
Routine to convert a Numpy matrix into a 2D dictionary for Pyomo by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)
Xdic=pyphi.np2D2pyomo(X,<varids=varId_list>)
- Parameters:
X (Numpy) – Matrix to be converted
varids – False | table of ids to be assigned as indexes
- Returns:
Xdic – Matrix in dictionary format (as Pyomo likes it)
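"Dictionary format as Pyomo likes it" means a dict keyed by index (or tuple of indices). That is the shape an indexed Pyomo `Param` accepts as `initialize` data. The sketch below produces such dicts from numpy arrays. It assumes 1-based integer keys and that `varids` labels the rows, as for a loadings matrix with one row per variable. pyphi's exact key convention is not confirmed here. `vec_to_pyomo` and `mat_to_pyomo` are hypothetical names.

```python
import numpy as np

def vec_to_pyomo(arr, indexes=None):
    """1-D array -> {key: value}; 1-based integer keys unless ids are given."""
    keys = indexes if indexes is not None else range(1, len(arr) + 1)
    return {k: float(v) for k, v in zip(keys, arr)}

def mat_to_pyomo(arr, varids=None):
    """2-D array -> {(row_key, col): value} with 1-based column numbers;
    rows are numbered from 1 unless varids are given."""
    rows = varids if varids is not None else range(1, arr.shape[0] + 1)
    return {(r, j + 1): float(arr[i, j])
            for i, r in enumerate(rows)
            for j in range(arr.shape[1])}

P = np.array([[0.1, 0.2], [0.3, 0.4]])       # e.g. loadings: one row per variable
print(mat_to_pyomo(P, varids=['Temp', 'Press']))
# {('Temp', 1): 0.1, ('Temp', 2): 0.2, ('Press', 1): 0.3, ('Press', 2): 0.4}
```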
- pyphi.parse_materials(filename, sheetname)
- Function to build R matrices for JRPLS model from linear table
by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
Routine to parse out compositions from a linear table. This reads an Excel file with four columns:
‘Finished Product Lot’ ‘Material Lot’ ‘Ratio or Quantity’ ‘Material’
where the usage per batch of finished product is recorded, e.g.:
Finished Product Lot   Material Lot   Ratio or Quantity   Material
A001                   A              0.75                Drug
A001                   B              0.25                Drug
A001                   Z              1.0                 Excipient
...                    ...            ...                 ...
- Args:
filename: Name of Excel workbook containing the data
sheetname: Name of the sheet in the workbook with the data
- Returns:
JR = Joint R matrix of material consumption, list of dataframes
materials_used = Names of materials
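The pivot from the linear table to one R block per material can be sketched in plain Python. This sketch makes several assumptions: rows arrive as tuples instead of being read from Excel, values are used as given (no normalization), and finished lots and material lots are ordered by first appearance. The A002 rows in the example are made up for illustration. `linear_table_to_R` is a hypothetical name.

```python
def linear_table_to_R(records):
    """records: (finished_lot, material_lot, ratio, material) tuples, one per
    row of the linear table. Returns {material: (finished_lots, lots, rows)}
    where rows[i][j] is the amount of lots[j] used in finished_lots[i]."""
    finished = list(dict.fromkeys(r[0] for r in records))   # order of appearance
    R = {}
    for material in dict.fromkeys(r[3] for r in records):
        recs = [r for r in records if r[3] == material]
        lots = list(dict.fromkeys(r[1] for r in recs))
        rows = [[0.0] * len(lots) for _ in finished]
        for fp_lot, lot, ratio, _ in recs:
            rows[finished.index(fp_lot)][lots.index(lot)] += ratio
        R[material] = (finished, lots, rows)
    return R

table = [('A001', 'A', 0.75, 'Drug'), ('A001', 'B', 0.25, 'Drug'),
         ('A001', 'Z', 1.0, 'Excipient'), ('A002', 'B', 1.0, 'Drug'),
         ('A002', 'Z', 1.0, 'Excipient')]
print(linear_table_to_R(table)['Drug'])
# (['A001', 'A002'], ['A', 'B'], [[0.75, 0.25], [0.0, 1.0]])
```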
- pyphi.pca(X, A, *, mcs=True, md_algorithm='nipals', force_nipals=False, shush=False, cross_val=0)
Function to create a Principal Components Analysis model
by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
pca_object = pyphi.pca(X,A,<mcs=True,md_algorithm=‘nipals’,force_nipals=False,shush=False,cross_val=0>)
- Parameters:
X (Dataframe or Numpy) – Data to train the model
A (int) – Number of Principal Components to calculate
mcs – ‘True’ : Meancenter + autoscale (default if not sent); ‘False’ : No pre-processing; ‘center’ : Only center; ‘autoscale’ : Only autoscale
md_algorithm – Missing Data algorithm to use: ‘nipals’ (default if not sent); ‘nlp’ uses the non-linear programming approach by Lopez-Negrete et al. J. Chemometrics 2010; 24: 301–311
force_nipals – If True, NIPALS is used. If False (default if not sent) and X is complete, SVD is used.
shush – If True, suppresses all printed output; False is the default if not sent
cross_val –
If sent a scalar between 0 and 100, cross-validates element-wise, removing cross_val% of the data every round
If 0: bypasses cross-validation (default if not sent)
- Returns:
A dictionary with all PCA loadings, scores and other diagnostics.
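The NIPALS iteration named above alternates two regressions: columns of X on the score t (giving the loading p), then rows of X on p (giving a new t). Each component is deflated from X before the next one is extracted. This sketch covers only complete, already preprocessed data. The pyphi routine also handles missing values and uses SVD for complete data unless force_nipals is set. `nipals_pca` is a hypothetical name.

```python
import numpy as np

def nipals_pca(X, A, tol=1e-10, max_iter=500):
    """NIPALS PCA for a complete, already preprocessed matrix X.
    Returns scores T (n x A) and unit-length loadings P (k x A)."""
    E = np.array(X, dtype=float)
    T, P = np.zeros((E.shape[0], A)), np.zeros((E.shape[1], A))
    for a in range(A):
        t = E[:, np.argmax(E.var(axis=0))].copy()   # start: highest-variance column
        for _ in range(max_iter):
            p = E.T @ t / (t @ t)                    # regress columns on t
            p /= np.linalg.norm(p)                   # normalize loading
            t_new = E @ p                            # regress rows on p
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a], P[:, a] = t, p
        E -= np.outer(t, p)                          # deflate before next component
    return T, P

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(40, 3)))
V, _ = np.linalg.qr(rng.normal(size=(3, 3)))
X = U @ np.diag([5.0, 2.0, 0.5]) @ V.T          # known singular structure
T, P = nipals_pca(X, 2)
print(np.abs(P.T @ V[:, :2]).round(6))          # approximately the identity
```

For complete data the NIPALS loadings agree with the SVD loadings up to sign, which is why the two routes are interchangeable when X has no missing values.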
- pyphi.pca_pred(Xnew, pcaobj, *, algorithm='p2mp')
Function to evaluate new data using an already built PCA model
pred = pyphi.pca_pred(Xnew,pcaobj)
- Parameters:
Xnew (DataFrame) – Data to be evaluated with the given PCA model
pcaobj – PCA object created by pyphi.pca routine
- Returns:
pred – Dictionary with reconstructed values for X, scores, Hotelling’s T2 and SPE for Xnew
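For complete, already preprocessed data with orthonormal loadings P, the quantities returned here follow directly from four relations:
- scores: t = x P
- reconstruction: x̂ = t Pᵀ
- SPE: the sum of squared residuals x − x̂
- Hotelling's T2: the sum of squared scores, each divided by the training variance of that score

This numpy sketch covers only that case. pyphi's `algorithm` options for pca_pred are not reproduced. `pca_project` is a hypothetical name.

```python
import numpy as np

def pca_project(Xnew, P, T_train):
    """Project preprocessed rows of Xnew onto orthonormal loadings P and
    return scores, reconstruction, Hotelling's T2 and SPE."""
    Xnew = np.atleast_2d(Xnew)
    Tnew = Xnew @ P                                 # scores
    Xhat = Tnew @ P.T                               # reconstructed X
    spe = np.sum((Xnew - Xhat) ** 2, axis=1)        # squared prediction error
    t2 = np.sum(Tnew ** 2 / np.var(T_train, axis=0, ddof=1), axis=1)
    return Tnew, Xhat, t2, spe

rng = np.random.default_rng(0)
Xc = rng.normal(size=(30, 4))
Xc -= Xc.mean(axis=0)                               # training data, mean centered
P = np.linalg.svd(Xc, full_matrices=False)[2][:2].T # first two loadings
T = Xc @ P
Tnew, Xhat, t2, spe = pca_project(Xc[:3], P, T)
print(t2, spe)
```

As a sanity check, the T2 values of the training set itself sum to A·(n − 1). SPE plus the squared norm of the reconstruction equals the squared norm of each row, because the residual is orthogonal to the model plane.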
- pyphi.pls(X, Y, A, *, mcsX=True, mcsY=True, md_algorithm='nipals', force_nipals=True, shush=False, cross_val=0, cross_val_X=False, cca=False)
Function to create a Projection to Latent Structures model
by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
- pls_object = pyphi.pls(X,Y,A,<mcsX=True,mcsY=True,md_algorithm=’nipals’,force_nipals=True,shush=False,
cross_val=0,cross_val_X=False,cca=False>)
- Parameters:
X (DataFrame or Numpy) – Training Data
Y (DataFrame or Numpy) – Training Data
A (int) – Number of Latent Variables to calculate
mcsX/mcsY – ‘True’ : Will meancenter and autoscale the data (default if not sent); ‘False’ : No pre-processing; ‘center’ : Will only center; ‘autoscale’ : Will only autoscale
md_algorithm – ‘nipals’ (default); ‘nlp’ uses the algorithm described in Journal of Chemometrics, 28(7), pp.575-584.
force_nipals – If set to True and if X is complete, will use NIPALS. Otherwise, if X is complete will use SVD.
shush – If set to True, suppresses all printed output.
cross_val –
If sent a scalar between 0 and 100, cross-validates element-wise, removing cross_val% of the data every round
If 0: bypasses cross-validation (default if not sent)
cross_val_X – True: calculates Q2 values for the X and Y matrices; False: cross-validation strictly on the Y matrix (default if not sent)
cca – True: calculates the covariant space of X with Y (analogous to the predictive space in OPLS), returned as “Tcv” and “Pcv”, the covariant scores and loadings. If there is more than one Y, there will be as many Tcv and Pcv vectors as columns in Y.
- Returns:
A dictionary with all PLS loadings, scores and other diagnostics.
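The sketch below implements the single-response special case (PLS1) on complete, preprocessed data. In that case the weight is simply w ∝ Eᵀf, so no inner iteration is needed. The regression coefficients follow from B = W (PᵀW)⁻¹ q. Multiple Y columns, missing data, scaling, cross-validation and the cca option of pyphi.pls are not covered. `pls1` is a hypothetical name.

```python
import numpy as np

def pls1(X, y, A):
    """NIPALS-style PLS1 (single y) on preprocessed X and y.
    Returns weights W, loadings P, y-loadings q and regression coefficients B."""
    E, f = np.array(X, dtype=float), np.array(y, dtype=float)
    k = E.shape[1]
    W, P, q = np.zeros((k, A)), np.zeros((k, A)), np.zeros(A)
    for a in range(A):
        w = E.T @ f
        w /= np.linalg.norm(w)                    # X weight
        t = E @ w                                 # X score
        tt = t @ t
        P[:, a] = E.T @ t / tt                    # X loading
        q[a] = f @ t / tt                         # y loading
        W[:, a] = w
        E -= np.outer(t, P[:, a])                 # deflate X
        f -= q[a] * t                             # deflate y
    B = W @ np.linalg.solve(P.T @ W, q)           # coefficients: y_hat = X @ B
    return W, P, q, B

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
X -= X.mean(axis=0)
y = X @ np.array([1.0, -2.0, 0.5])
print(pls1(X, y, 3)[3])   # with A equal to the number of columns, B equals the OLS solution
```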
- pyphi.pls_(X, Y, A, *, mcsX=True, mcsY=True, md_algorithm='nipals', force_nipals=True, shush=False, cca=False)
- pyphi.pls_pred(Xnew, plsobj)
Function to evaluate new data using an already built PLS model by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)
pred = pyphi.pls_pred(Xnew,plsobj)
- Parameters:
Xnew – DataFrame with new data to be projected onto the PLS model
plsobj – PLS object created by pyphi.pls routine
- Returns:
pred – Dictionary with predicted values for X and Y, scores, Hotelling’s T2 and SPE for Xnew. If cca = True in plsobj, Tcv is also calculated.
- pyphi.reconcile_rows(df_list)
Function to reconcile observations across multiple dataframes
by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
df_list_reconciled = pyphi.reconcile_rows(df_list)
Routine to reconcile the observation names in a list of dataframes. The returned list of dataframes has exactly the same observation names in the same order; handy when analyzing data for TPLS, LPLS and JRPLS.
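Row reconciliation comes down to two steps: intersect the observation IDs of all tables, then reorder every table to one common order. In this sketch each table is a dict mapping observation ID to row, standing in for a DataFrame. Following the first table's order is an assumption, not confirmed pyphi behavior. `reconcile` is a hypothetical name.

```python
def reconcile(tables):
    """tables: list of dicts {observation_id: row}. Returns new dicts that keep
    only the observations present in every table, in the order of the first."""
    common = set(tables[0]).intersection(*tables[1:])
    order = [obs for obs in tables[0] if obs in common]
    return [{obs: t[obs] for obs in order} for t in tables]

X = {'L1': [1.0], 'L2': [2.0], 'L3': [3.0]}
Y = {'L3': [30.0], 'L1': [10.0], 'L4': [40.0]}
Xr, Yr = reconcile([X, Y])
print(list(Xr), list(Yr))   # ['L1', 'L3'] ['L1', 'L3']
```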
- pyphi.reconcile_rows_to_columns(df_list_r, df_list_c)
- Function to reconcile the rows of the dataframes in df_list_r with the
columns in the list of dataframes df_list_c
by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
df_list_reconciled_r,df_list_reconciled_c = pyphi.reconcile_rows_to_columns(df_list_r,df_list_c)
Handy to align X and R datasets for TPLS, LPLS and JRPLS
- pyphi.spectra_autoscale(Dm)
Autoscaling all spectra to have variance one. Dm: Spectra
Outputs: Processed spectra
- pyphi.spectra_baseline_correction(Dm)
Shifting all spectra to have minimum zero. Only works with pandas DataFrames. Dm: Spectra
Outputs: Processed spectra
- pyphi.spectra_mean_center(Dm)
Mean centering all spectra to have mean zero. Dm: Spectra
Outputs: Processed spectra
- pyphi.spectra_msc(Dm, reference_spectra=None)
Perform Multivariate Scatter Correction transform
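Scatter correction of this kind is commonly described as multiplicative scatter correction. Each spectrum s is regressed on a reference spectrum as s ≈ a + b·reference, and the corrected spectrum is (s − a)/b. This numpy sketch uses the mean spectrum when no reference is given, which mirrors the `reference_spectra=None` default above. That this matches pyphi's internal choice is an assumption. `msc` is a hypothetical name.

```python
import numpy as np

def msc(spectra, reference=None):
    """Scatter correction: fit each spectrum s as a + b * reference by least
    squares and return (s - a) / b. The reference defaults to the mean spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        b, a = np.polyfit(ref, s, 1)        # slope and intercept
        corrected[i] = (s - a) / b
    return corrected

ref = 2.0 + np.sin(np.linspace(0.0, 3.0, 50))
raw = np.vstack([1.5 * ref + 0.3, 0.8 * ref - 0.1])   # scaled and offset copies
print(np.abs(msc(raw, ref) - ref).max())              # close to zero
```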
- pyphi.spectra_savgol(ws, od, op, Dm)
Function to do a row-wise Savitzky-Golay filter for spectra by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)
Dm_sg, M = pyphi.spectra_savgol(ws,od,op,Dm)
- Parameters:
ws – Window size
od – Order of the derivative
op – Order of the polynomial
Dm – Spectra
- Returns:
Dm_sg – Processed spectra
M – Transformation matrix for new vector samples
- pyphi.spectra_snv(x)
Function to do a row-wise SNV transform for spectroscopic data by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)
X=pyphi.spectra_snv(X)
- Parameters:
x – Spectra dataframe
- Returns:
x – Post-processed spectra dataframe
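The row-wise SNV transform centers each spectrum on its own mean and divides by its own standard deviation. This numpy sketch assumes the sample standard deviation (ddof=1) and a plain array instead of a DataFrame. `snv` is a hypothetical name.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row)
    by its own mean and standard deviation."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

raw = np.array([[1.0, 2.0, 3.0], [10.0, 30.0, 50.0]])
print(snv(raw))   # both rows become [-1, 0, 1]
```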
- pyphi.tpls(Xi, Ri, Z, Y, A, *, shush=False)
TPLS Algorithm per Garcia-Munoz Chemom.Intel.Lab.Syst., 133, pp.49-62.
by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
tpls_obj = pyphi.tpls(Xi,Ri,Z,Y,A)
- Parameters:
Xi – Physical properties: dictionary of DataFrames of materials_i x material properties, e.g.
X = {‘MatA’:df_with_props_for_mat_A (one row per lot of MatA, one col per property),
‘MatB’:df_with_props_for_mat_B (one row per lot of MatB, one col per property)}
Ri – Blending ratios: dictionary of DataFrames of blends x materials_i, e.g.
R = {‘MatA’: df_with_ratios_of_lots_of_A_used_per_blend,
‘MatB’: df_with_ratios_of_lots_of_B_used_per_blend}
Rows of X[i] must correspond to columns of R[i]
Z – [b x p] Process conditions DataFrame of blends x process variables
Y – [b x n] Product characteristics DataFrame of blends x product properties
A – Number of components
The first column of all DataFrames is the observation identifier
- Returns:
tpls_obj – A dictionary with all the parameters for the TPLS model
- pyphi.tpls_pred(rnew, znew, tplsobj)
Routine to produce the prediction for a new observation of Ri using a TPLS model by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
preds = pyphi.tpls_pred(rnew,znew,tplsobj)
- Parameters:
rnew –
A dictionary with the format: rnew = {‘matid’: [(lotid, rvalue)], }
For example, for a prediction for the scenario:
rvalue   material   lot to use
0.5      API        A0129
0.1      Lactose    Lac0003
0.2      Lactose    Lac1010
0.02     MgSt       M0012
0.18     MCC        MCC0017
use:
rnew = {‘API’: [(‘A0129’, 0.5)], ‘Lactose’: [(‘Lac0003’, 0.1), (‘Lac1010’, 0.2)], ‘MgSt’: [(‘M0012’, 0.02)], ‘MCC’: [(‘MCC0017’, 0.18)]}
znew – DataFrame or Numpy array with the new observation
- Returns:
preds – A dictionary of the form {‘Tnew’: tnew, ‘Yhat’: yhat, ‘speR’: sper, ‘speZ’: spez}, where speR contains the speR of each material
- pyphi.unique(df, colid)
Returns the unique values in a column of a DataFrame in order of occurrence by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
Replacement for the np.unique routine, specifically for dataframes; returns the unique values in the order found in the dataframe
unique_values = pyphi.unique(df,colid)
- Parameters:
df – A pandas dataframe
colid – Column identifier
- Returns:
unique_values – List of unique values in the order they appear
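The key difference from `np.unique` is ordering: `np.unique` returns sorted values, while this routine keeps the order of first occurrence. A plain-Python way to get that behavior relies on dicts preserving insertion order, which Python guarantees from 3.7 on. For a DataFrame column one would pass `df[colid]`. pandas' own `Series.unique()` also returns values in order of appearance. `unique_in_order` is a hypothetical name.

```python
def unique_in_order(values):
    """Unique values in order of first occurrence (np.unique sorts instead)."""
    return list(dict.fromkeys(values))

print(unique_in_order(['Lac3', 'API', 'Lac3', 'MgSt', 'API']))   # ['Lac3', 'API', 'MgSt']
```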
- pyphi.varimax_rotation(mvm_obj, X, *, Y=False)
Function to do a Varimax Rotation on a PCA or PLS model by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)
Routine to perform a VariMax rotation on a PCA or PLS model (I have also tested it with MBPLS models)
rotated_model=varimax_rotation(model_object,X,<Y=Ydata>)
- Parameters:
model_object – A PCA or PLS or MBPLS model object
- Returns:
rotated_model – The same model after VariMax rotation; scores and loadings are all rotated
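The sketch below uses the widely used SVD-based varimax iteration on a loadings matrix. Each step takes the orthogonal factor of the criterion gradient, repeating until the criterion stops improving. It returns only the rotated loadings and the rotation R. The pyphi routine also rotates the rest of the model and, per the changelog, adjusts r2 and r2pv. Applying the same orthogonal R to the scores leaves T Pᵀ unchanged. `varimax` is a hypothetical name.

```python
import numpy as np

def varimax(P, gamma=1.0, max_iter=100, tol=1e-8):
    """Varimax (orthomax with gamma=1) rotation of a loadings matrix P (k x A).
    Returns the rotated loadings P @ R and the orthogonal rotation R."""
    k, A = P.shape
    R = np.eye(A)
    d = 0.0
    for _ in range(max_iter):
        L = P @ R
        G = P.T @ (L ** 3 - (gamma / k) * L @ np.diag(np.sum(L ** 2, axis=0)))
        u, s, vt = np.linalg.svd(G)
        R = u @ vt                                 # closest orthogonal matrix to G
        d_old, d = d, np.sum(s)
        if d_old != 0 and d < d_old * (1 + tol):   # criterion stopped improving
            break
    return P @ R, R

rng = np.random.default_rng(0)
P = np.linalg.qr(rng.normal(size=(6, 2)))[0]       # orthonormal PCA-style loadings
T = rng.normal(size=(20, 2))                        # matching scores
L, R = varimax(P)
print(np.allclose(T @ P.T, (T @ R) @ L.T))         # True: T R and P R reproduce T P'
```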