pyphi module

Phi for Python (pyPhi)

By Sal Garcia (sgarciam@ic.ac.uk salvadorgarciamunoz@gmail.com)

Added Jan 30 2025

  • Added a pinv alternative protection in spectra_savgol for the case where inv fails

Added Jan 20 2025
  • Added the ‘cca’ flag to the pls routine to calculate CCA between the Ts and each of the Ys (one by one), calculating loadings and scores equivalent to a perfectly orthogonalized OPLS model. The covariant scores (Tcv), the covariant loadings (Pcv) and the predictive weights (Wcv) are added as keys to the model object. [The covariant loadings (Pcv) are equivalent to the predictive loadings in OPLS]

  • Added cca and cca_multi routines to perform PLS-CCA (an alternative to OPLS); as of now cca_multi remains unused.

Added Nov 18th, 2024
  • replaced interp2d with RectBivariateSpline

  • Protected SPE lim calculations for near zero residuals

  • Added build_polynomial function to create linear regression models with variable selection assisted by PLS

by merge from James
  • Added spectra preprocessing methods

  • bootstrap PLS

by Salvador Garcia (sgarciam@ic.ac.uk salvadorgarciamunoz@gmail.com)

Added Dec 19th 2023

  • phi.clean_htmls removes all html files in the working directory

  • clean_empty_rows returns also the names of the rows removed

Added May 1st
  • YMB is now added in the same structure as the XMB

  • Corrected the dimensionality of the lwpls prediction, it was a double-nested array.

Added Apr 30 {feliz día de los niños}
  • Modified Multi-block PLS to include the block name in the variable name

Added Apr 29
  • Included the unique routine and adjusted the parse_materials routine so materials and lots are in the same order as in the raw data

Added Apr 27
  • Enhanced adapt_pls_4_pyomo to use variable names as indices if flag is sent

Added Apr 25
  • Enhanced the varimax_rotation to adjust the r2 and r2pv to the rotated loadings

Added Apr 21
  • Re added varimax_rotation with complete model rotation for PCA and PLS

Added Apr 17
  • Added tpls and tpls_pred

Added Apr 15
  • Added jrpls model and jrpls_pred

  • Added routines to reconcile column identifiers to row identifiers so that the X and R matrices correspond correctly

  • Added routines to reconcile rows across a list of dataframes and produce a list of dataframes containing only those observations present in all dataframes

Added on Apr 9 2023
  • Added lpls and lpls_pred routines

  • Added parse_materials to read linear table and produce R or Ri

Release as of Nov 23 2022
  • Added a function to export PLS model to gPROMS code

Release as of Aug 22 2022 What was done:

  • Fixed access to NEOS server and use of GAMS instead of IPOPT

Release as of Aug 12 2022 What was done:

  • Fixed the SPE calculations in pls_pred and pca_pred

  • Changed to a more efficient inversion in pca_pred (=pls_pred)

  • Added a pseudo-inverse option in pmp for pca_pred

Release as of Aug 2 2022 What was done:

  • Added replicate_data

Release Unknown What was done:

  • Fixed a bug in kernel PCA calculations

  • Changed the syntax of MBPLS arguments

  • Corrected a pretty severe error in pls_pred

  • Fixed a really bizarre one in mbpls

Release Dec 5, 2021 What was done:

  • Added some small documentation to utilities routines

Release Jan 15, 2021 What was done:

  • Added routine cat_2_matrix to convert categorical classifiers to matrices

  • Added Multi-block PLS model

Release Date: Nov 16, 2020 What was done:

  • Fixed a small bug in the clean_low_variances routine

Release Date: Sep 26 2020 What was done:

  • Added rotation of loadings so that var(t) for ti>=0 is always larger than var(t) for ti<0

Release Date: May 27 2020 What was done:

  • Added the estimation of PLS models with missing data using non-linear programming per Journal of Chemometrics, 28(7), pp.575-584.

Release Date: March 30 2020 What was done:

  • Added the estimation of PCA models with missing data using non-linear programming per Lopez-Negrete et al. J. Chemometrics 2010; 24: 301–311

Release Date: Aug 22 2019

What was done:

  • This header is now included to track high level changes

  • fixed LWPLS; it now works for scalar and multivariable Y’s

  • fixed minor bug in phi.pca and phi.pls when mcsX/Y = False

pyphi.adapt_pls_4_pyomo(plsobj, *, use_var_ids=False)[source]

Routine to convert all the parameters in a PLS object to the structure needed by Pyomo

by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)

All parameters are added to the original plsobj with the prefix pyo_

plsobj_pyomo = pyphi.adapt_pls_4_pyomo(plsobj, <use_var_ids=False>)

Parameters:
  • plsobj – A PLS object created with pyphi.pls

  • use_var_ids – If True, then all variable IDs from plsobj are used as indexes for the dictionaries required by Pyomo

Returns:

A dictionary augmented with all parameters in dictionary format

Return type:

plsobj_pyomo

pyphi.bootstrap_pls(X, Y, num_latents, num_samples, **kwargs)[source]

Generates a list of PLS objects to be used in prediction

pyphi.bootstrap_pls_pred(X_new, bootstrap_pls_obj, quantiles=[0.025, 0.975])[source]

Finds the prediction quantiles using bootstrapped PLS with Gaussian errors. Only works with 1-D outputs.

pyphi.build_polynomial(data, factors, response, *, bias_term=True)[source]

Function to create linear regression models with variable selection assisted by PLS

Parameters:
  • data – DataFrame. Pandas data frame; the first column is the observation id column

  • factors – List of factors to be included in the expression. Powers, multiplications and divisions are allowed. E.g.:

    structure=[

    ‘Variable 1’,

    ‘Variable 1^2’,

    ‘Variable 2’,

    ‘Variable 3 * Variable 1’,

    ‘Variable 1^2 / Variable 4’ ]

  • response – String. Response variable in the dataset (must be a column of ‘data’)

Returns:
  • betasOLSlssq – Coefficients for factors

  • factors_out – Factors

  • Xaug, Y – Numpy arrays with X and Y data

  • eqstr – Full equation
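To make the factor syntax more concrete, here is a minimal pure-Python sketch of how a factor string with powers, multiplications and divisions could be evaluated for one observation. It is not pyphi's parser: the helper name `eval_factor`, the left-to-right evaluation order, and the assumption that variable names never contain `*`, `/` or `^` are all choices made for this illustration.

```python
import re

def eval_factor(expr, row):
    """Evaluate a factor such as 'Variable 1^2 / Variable 4' for one observation.

    `row` maps variable names to numeric values (hypothetical layout).
    """
    # Split into terms and the '*' or '/' operators that separate them.
    tokens = [t.strip() for t in re.split(r"([*/])", expr)]
    result = 1.0
    op = "*"
    for token in tokens:
        if token in ("*", "/"):
            op = token
            continue
        # An optional '^power' suffix raises the variable to that power.
        name, _, power = token.partition("^")
        value = row[name.strip()] ** (float(power) if power else 1.0)
        result = result * value if op == "*" else result / value
    return result

row = {"Variable 1": 3.0, "Variable 2": 5.0, "Variable 3": 4.0, "Variable 4": 2.0}
for factor in ["Variable 1", "Variable 1^2", "Variable 3 * Variable 1", "Variable 1^2 / Variable 4"]:
    print(factor, "=", eval_factor(factor, row))
```

Evaluating every factor for every observation in this way would give the columns of an augmented regressor matrix, which is presumably the role of Xaug in the return values.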

pyphi.cat_2_matrix(X)[source]

Function to convert categorical data into binary matrices for regression by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)

Xmat,XmatMB = cat_2_matrix(X)

Parameters:

X – Pandas DataFrame with categorical descriptors for each observation

Returns:
  • Xmat – Matrix of binary coded data

  • XmatMB – Binary coded data organized as a list of matrices for Multi-block modeling
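As a rough illustration of the binary coding described here, the following pure-Python sketch one-hot encodes a single categorical column. It is not pyphi's implementation; the helper name `one_hot`, the example labels, and the alphabetical column order are assumptions made for the example.

```python
def one_hot(values):
    """One-hot encode a list of category labels (hypothetical helper)."""
    # Assumption: columns are ordered alphabetically by category label.
    categories = sorted(set(values))
    matrix = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, matrix

categories, matrix = one_hot(["wet", "dry", "wet", "roller"])
print(categories)
for row in matrix:
    print(row)
```

With several categorical descriptors, one such binary matrix per descriptor would correspond to the list-of-matrices layout that the multi-block output describes.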

pyphi.cca(X, Y, tol=1e-06, max_iter=1000)[source]

Perform Canonical Correlation Analysis (CCA) on two datasets, X and Y.

by sgarcia@imperial.ac.uk

Parameters:
  • X (numpy.ndarray) – An (n x p) matrix where n is the number of samples and p is the number of features in X.

  • Y (numpy.ndarray) – An (n x q) matrix where n is the number of samples and q is the number of features in Y.

  • tol (float) – Tolerance for convergence. Default is 1e-6.

  • max_iter (int) – Maximum number of iterations. Default is 1000.

Returns:

Contains canonical correlation (float), and the canonical directions (w_x, w_y).

Return type:

(tuple)

pyphi.cca_multi(X, Y, num_components=1, tol=1e-06, max_iter=1000)[source]

Perform Canonical Correlation Analysis (CCA) on two datasets, X and Y, to compute multiple canonical variates. by sgarciam@imperial.ac.uk

Parameters:
  • X (numpy.ndarray) – An (n x p) matrix where n is the number of samples and p is the number of features in X.

  • Y (numpy.ndarray) – An (n x q) matrix where n is the number of samples and q is the number of features in Y.

  • num_components (int) – Number of canonical variates (components) to compute. Default is 1.

  • tol (float) – Tolerance for convergence. Default is 1e-6.

  • max_iter (int) – Maximum number of iterations. Default is 1000.

Returns:

Contains canonical correlations, and the canonical direction vectors for X and Y.

Return type:

(dict)

pyphi.clean_empty_rows(X, *, shush=False)[source]

Routine to clean a matrix from rows containing all missing data by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)

X,rows_removed = pyphi.clean_empty_rows(X)

Parameters:

X – Matrix to be cleaned of empty rows (all np.nan)

Returns:
  • X – Matrix with the empty rows removed

  • rows_removed – List of rows removed from X
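The idea of dropping rows that contain only missing values can be sketched in plain Python as follows. This is an illustration under assumptions, not the pyphi code: the helper name `drop_empty_rows` is hypothetical, the data are nested lists rather than a DataFrame, and removed rows are reported by position rather than by observation name.

```python
import math

def drop_empty_rows(rows):
    """Split rows into (kept_rows, removed_positions).

    A row counts as empty when every entry is NaN.
    """
    kept, removed = [], []
    for i, row in enumerate(rows):
        if all(math.isnan(v) for v in row):
            removed.append(i)
        else:
            kept.append(row)
    return kept, removed

nan = float("nan")
data = [[1.0, nan], [nan, nan], [3.0, 4.0]]
kept, removed = drop_empty_rows(data)
print(len(kept), "rows kept; removed positions:", removed)
```

Note that a row with only some missing entries (the first row here) is kept; only rows that are missing everywhere are discarded.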

pyphi.clean_htmls()[source]

Routine to clean html files

pyphi.clean_htmls()

Deletes all .html files in the current directory

Parameters:

none

Returns:

none

pyphi.clean_low_variances(X, *, shush=False, min_var=1e-10)[source]

Routine to remove columns of negligible variance by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)

X,columns_removed = pyphi.clean_low_variances(X,<min_var=1E-10,shush=False>)

Parameters:
  • X – Matrix to be cleaned for columns of low variance

  • min_var – minimum required variance to keep a column (default = 1E-10)

  • shush – ‘True’ disables output to console

Returns:
  • X_clean – Matrix without low variance columns

  • cols_removed – Columns removed
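A minimal sketch of the variance filter, using only the standard library, might look like the following. The helper name `drop_low_variance_columns`, the use of the population variance, and the `>= min_var` comparison are assumptions for illustration; the sketch also assumes complete data, whereas the real routine works on matrices that may contain missing values.

```python
from statistics import pvariance

def drop_low_variance_columns(rows, min_var=1e-10):
    """Return (cleaned_rows, removed_column_positions)."""
    columns = list(zip(*rows))
    # Keep a column only if its variance reaches the threshold (assumed rule).
    keep = [j for j, col in enumerate(columns) if pvariance(col) >= min_var]
    removed = [j for j in range(len(columns)) if j not in keep]
    cleaned = [[row[j] for j in keep] for row in rows]
    return cleaned, removed

data = [
    [1.0, 5.0, 0.1],
    [2.0, 5.0, 0.2],
    [3.0, 5.0, 0.3],
]
cleaned, removed = drop_low_variance_columns(data)
print("removed columns:", removed)
print(cleaned)
```

The constant second column carries no information for a latent variable model and would also cause a division by zero during autoscaling, which is why such columns are removed up front.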

pyphi.contributions(mvmobj, X, cont_type, *, Y=False, from_obs=False, to_obs=False, lv_space=False)[source]

Function to calculate contributions to diagnostics by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

contrib = pyphi.contributions(mvmobj,X,cont_type,<Y=False,from_obs=False,to_obs=False,lv_space=False>)

Parameters:
  • mvmobj – A dictionary created by phi.pls or phi.pca

  • X/Y – Data [numpy arrays or pandas dataframes] - Y space is optional

  • cont_type – ‘ht2’ : Contributions to Hotelling’s T2; ‘spe’ : Contributions to SPE space; ‘scores’ : Contributions to scores

  • to_obs – Scalar or list of scalars with observation(s) number(s) to calculate contributions (TO). Note: from_obs is ignored when cont_type=‘spe’

  • from_obs – Scalar or list of scalars with observation(s) number(s) to offset (FROM). If not sent, contributions are calculated with respect to the mean.

  • lv_space – Latent spaces over which to do the calculations [applicable to ‘ht2’ and ‘scores’] if not sent all dimensions are considered.

Returns:

A vector of scalars with the corresponding contributions

Return type:

contrib

pyphi.conv_pls_2_eiot(plsobj, *, r_length=False)[source]
pyphi.evalvar(data, vname)[source]
pyphi.export_2_gproms(mvmobj, *, fname='phi_export.txt')[source]

Function to export a PLS model to build a hybrid model in gPROMS

by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

pyphi.export_2_gproms(mvmobj,fname=’phi_export.txt’)

Exports the multivariate object coded in gPROMS syntax

Typically one would use the variables X_NEW and Y_PRED as the Input/Output variables.

Parameters:
  • mvmobj – A PLS model created with pyphi.pls

  • fname – Name of the text file to be created

Returns:

None

pyphi.f95(i, j)[source]
pyphi.f99(i, j)[source]
pyphi.find(a, func)[source]
pyphi.findstr(string)[source]
pyphi.hott2(mvmobj, *, Xnew=False, Tnew=False)[source]
pyphi.isin_ordered_col0(df, alist)[source]
pyphi.jrpls(Xi, Ri, Y, A, *, shush=False)[source]

JRPLS Algorithm per Garcia-Munoz Chemom.Intel.Lab.Syst., 133, pp.49-62.

by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

jrpls_obj = pyphi.jrpls(Xi,Ri,Y,A)

Parameters:
  • X – Phys. Prop. dictionary of Dataframes of materials_i x mat. properties:

    X = {‘MatA’:df_with_props_for_mat_A (one row per lot of MatA, one col per property),

    ‘MatB’:df_with_props_for_mat_B (one row per lot of MatB, one col per property)}

  • R – Blending ratios dictionary of Dataframes of blends x materials_i:

    R = {‘MatA’: df_with_ratios_of_lots_of_A_used_per_blend,

    ‘MatB’: df_with_ratios_of_lots_of_B_used_per_blend, }

  • Y – [ b x n ] Product characteristics dataframe of blends x prod. properties

Rows of X[i] must correspond to columns of R[i]. The first column of all dataframes is the observation identifier.

Returns:

A dictionary with all the parameters for the JRPLS model

Return type:

jrpls_obj

pyphi.jrpls_pred(rnew, jrplsobj)[source]

Routine to produce the prediction for a new observation of Ri in a JRPLS model by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

preds = pyphi.jrpls_pred(rnew,jrplsobj)

Parameters:
  • rnew – A dictionary with the format:

    rnew={ ‘matid’:[(lotid,rvalue )], }

    For example, a prediction for the scenario:

    material   lot to use   rvalue
    API        A0129        0.5
    Lactose    Lac0003      0.1
    Lactose    Lac1010      0.2
    MgSt       M0012        0.02
    MCC        MCC0017      0.18

    would be encoded like:

    rnew={ ‘API’:[(‘A0129’,0.5)], ‘Lactose’:[(‘Lac0003’,0.1 ),(‘Lac1010’,0.2 )], ‘MgSt’:[(‘M0012’,0.02)], ‘MCC’:[(‘MCC0017’,0.18)], }

Returns:

A dictionary of the form preds = {‘Tnew’:tnew, ‘Yhat’:yhat, ‘speR’:sper}, where speR has the speR per each material

Return type:

preds
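The rnew layout can be assembled programmatically from a flat list of (material, lot, rvalue) rows. The sketch below uses the scenario from the description above; the helper name `encode_rnew` is hypothetical and is not part of pyphi.

```python
import math

# Scenario rows: (material, lot to use, rvalue), as in the example above.
scenario = [
    ("API", "A0129", 0.5),
    ("Lactose", "Lac0003", 0.1),
    ("Lactose", "Lac1010", 0.2),
    ("MgSt", "M0012", 0.02),
    ("MCC", "MCC0017", 0.18),
]

def encode_rnew(rows):
    """Group (material, lot, rvalue) rows into the rnew dictionary layout."""
    rnew = {}
    for material, lot, rvalue in rows:
        rnew.setdefault(material, []).append((lot, rvalue))
    return rnew

rnew = encode_rnew(scenario)
print(rnew)

# Sanity check for this scenario: the blending ratios add up to one.
total = sum(rvalue for lots in rnew.values() for _, rvalue in lots)
print(math.isclose(total, 1.0))
```

A material that is sourced from several lots (Lactose here) simply gets several tuples in its list.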

pyphi.lpls(X, R, Y, A, *, shush=False)[source]

LPLS Algorithm per Muteki et al. Chemom.Intell.Lab.Syst.85(2007) 186 – 194 by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

lpls_obj = pyphi.lpls(X,R,Y,A)

Parameters:
  • X – Physical properties DataFrame of materials x mat. properties

  • R – [ b x m ] Blending ratios DataFrame of blends x materials

  • Y – [ b x n ] Product characteristics DataFrame of blends x prod. properties

  • A – Number of components

The first column of all dataframes is the observation identifier.

Returns:

A dictionary with all the LPLS parameters

Return type:

lpls_obj

pyphi.lpls_pred(rnew, lpls_obj)[source]

Function to evaluate a new observation for LPLS by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

Do a prediction with an LPLS model:

pred = pyphi.lpls_pred(rnew,lpls_obj)

Parameters:
  • rnew – np.array, list or dataframe with elements of rnew; if multiple rows are passed, then multiple predictions are done

  • lpls_obj – LPLS object built with pyphi.lpls routine

Returns:

A dictionary {‘Tnew’:tnew, ‘Yhat’:yhat, ‘speR’:sper}

Return type:

pred

pyphi.lwpls(xnew, loc_par, mvmobj, X, Y, *, shush=False)[source]

LWPLS algorithm as in: International Journal of Pharmaceutics 421 (2011) 269– 274

Implemented by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

yhat = pyphi.lwpls(xnew,loc_par,mvmobj,X,Y,<shush=False>)

Parameters:
  • xnew (Numpy vector) – Regressor vector to make prediction

  • loc_par (scalar) – Localization parameter

  • mvmobj – PLS model between X and Y built with PLS routine

  • X (DataFrame or Numpy) – Training set for mvmobj (PLS model)

  • Y (DataFrame or Numpy) – Training set for mvmobj (PLS model)

  • shush – ‘True’ silences the output; ‘False’ displays the output (default if not sent)

Returns:

y prediction from xnew

pyphi.ma57_dummy_check()[source]

Instantiates a trivial NLP to solve with IPOPT and MA57.

Returns:

boolean, True if IPOPT solved with SolverStatus.ok

Return type:

ma57_ok

pyphi.mbpls(XMB, YMB, A, *, mcsX=True, mcsY=True, md_algorithm_='nipals', force_nipals_=False, shush_=False, cross_val_=0, cross_val_X_=False)[source]

Function to calculate a Multi-Block PLS model by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

Multi-block PLS model using the approach by Westerhuis, J. Chemometrics, 12, 301–321 (1998)

mbpls_obj = pyphi.mbpls(XMB,YMB,A,<mcsX=True,mcsY=True,md_algorithm_=’nipals’,force_nipals_=False,shush_=False,cross_val_=0,cross_val_X_=False>)

Parameters:
  • XMB – Dictionary of Pandas DataFrames, one key per block of data. Dictionary structure:

    {‘BlockName1’:block_1_data_pd, ‘BlockName2’:block_2_data_pd}

  • YMB – Dictionary or Pandas DataFrame. Dictionary structure:

    {‘BlockName1’:block_1_data_pd, ‘BlockName2’:block_2_data_pd}

Returns:

Dictionary with all the parameters of a Multi-block PLS model

Return type:

mbpls_obj

pyphi.mean(X)[source]
pyphi.meancenterscale(X, *, mcs=True)[source]

Function to mean center and scale a matrix by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)

X,xmean,xstd = pyphi.meancenterscale(X,<mcs=Flag>)

Parameters:
  • X – Matrix to be mean-centered; this call ONLY works with Numpy matrices

  • mcs – True | ‘center’ | ‘autoscale’

Returns:
  • X – Post-processed X matrix

  • xmean – Mean values per column

  • xstd – Standard deviation values per column
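For readers unfamiliar with the pre-processing, the sketch below shows mean centering followed by scaling to unit variance on a small complete matrix, using only the standard library. It is an illustration, not the pyphi routine: the helper name `center_and_scale` is hypothetical, the sample standard deviation (n - 1 denominator) is an assumption, and missing values and zero-variance columns are not handled.

```python
from statistics import mean, stdev

def center_and_scale(rows):
    """Mean-center each column and scale it to unit standard deviation."""
    columns = list(zip(*rows))
    xmean = [mean(col) for col in columns]
    xstd = [stdev(col) for col in columns]  # assumption: sample std (n - 1)
    scaled = [
        [(v - m) / s for v, m, s in zip(row, xmean, xstd)]
        for row in rows
    ]
    return scaled, xmean, xstd

X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
X_scaled, xmean, xstd = center_and_scale(X)
print("means:", xmean)
print("stds: ", xstd)
print(X_scaled)
```

After this transformation both columns are on the same footing even though their original ranges differ by a factor of ten, which is the usual motivation for autoscaling before PCA or PLS.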

pyphi.n2z(X)[source]
pyphi.np1D2pyomo(arr, *, indexes=False)[source]

Routine to convert a vector into a 1D dictionary for Pyomo by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)

Xdic=pyphi.np1D2pyomo(X,<indexes=varId_list>)

Parameters:
  • X (Numpy) – Vector to be converted

  • indexes – False | table of ids to be assigned as indexes

Returns:

Vector in dictionary format (as Pyomo likes it)

Return type:

Xdic

pyphi.np2D2pyomo(arr, *, varids=False)[source]

Routine to convert a Numpy matrix into a 2D dictionary for Pyomo by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)

Xdic=pyphi.np2D2pyomo(X,<varids=varId_list>)

Parameters:
  • X (Numpy) – Matrix to be converted

  • varids – False | table of ids to be assigned as indexes

Returns:

Matrix in dictionary format (as Pyomo likes it)

Return type:

Xdic
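Pyomo parameters are typically initialized from dictionaries keyed by index (or by index tuples for 2D data). The sketch below shows what such conversions could look like for plain Python sequences. It is not the pyphi code: the helper names are hypothetical, and both the 1-based default indices and the use of the supplied ids as the column index are assumptions.

```python
def vector_to_dict(vec, indexes=None):
    """Map a 1D sequence to {index: value}."""
    # Assumption: without explicit ids, indices start at 1.
    keys = indexes if indexes is not None else range(1, len(vec) + 1)
    return dict(zip(keys, vec))

def matrix_to_dict(mat, varids=None):
    """Map a 2D sequence to {(row, column): value}."""
    ncols = len(mat[0])
    # Assumption: supplied ids label the columns; rows stay numbered from 1.
    cols = varids if varids is not None else range(1, ncols + 1)
    return {
        (i, c): value
        for i, row in enumerate(mat, start=1)
        for c, value in zip(cols, row)
    }

print(vector_to_dict([0.5, 1.5]))
print(vector_to_dict([0.5, 1.5], indexes=["temp", "pres"]))
print(matrix_to_dict([[1, 2], [3, 4]], varids=["temp", "pres"]))
```

Using variable names as keys makes the resulting optimization model easier to read and debug than purely positional indices.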

pyphi.parse_materials(filename, sheetname)[source]
Function to build R matrices for JRPLS model from linear table

by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

Routine to parse out compositions from a linear table. This reads an Excel file with four columns:

‘Finished Product Lot’ ‘Material Lot’ ‘Ratio or Quantity’ ‘Material’

where the usage per batch of finished product is recorded, e.g.

‘Finished Product Lot’   ‘Material Lot’   ‘Ratio or Quantity’   ‘Material’
A001                     A                0.75                  Drug
A001                     B                0.25                  Drug
A001                     Z                1.0                   Excipient
.                        .                .                     .
.                        .                .                     .
.                        .                .                     .

Parameters:
  • filename – Name of the Excel workbook containing the data

  • sheetname – Name of the sheet in the workbook with the data

Returns:
  • JR – Joint R matrix of material consumption, list of dataframes

  • materials_used – Names of materials
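The reshaping from a linear usage table to one ratio table per material can be sketched without Excel or pandas. In the sketch below the rows come from the example table above; the helper name `build_ratio_tables` and the nested-dictionary output are assumptions for illustration, whereas the real routine returns a list of dataframes.

```python
# Linear table rows: (finished product lot, material lot, ratio, material).
table = [
    ("A001", "A", 0.75, "Drug"),
    ("A001", "B", 0.25, "Drug"),
    ("A001", "Z", 1.0, "Excipient"),
]

def build_ratio_tables(rows):
    """Return {material: {product_lot: {material_lot: ratio}}}."""
    tables = {}
    for product_lot, material_lot, ratio, material in rows:
        per_material = tables.setdefault(material, {})
        per_material.setdefault(product_lot, {})[material_lot] = ratio
    return tables

R = build_ratio_tables(table)
# Dictionaries preserve insertion order, so materials appear as in the raw data.
materials_used = list(R)
print(materials_used)
print(R["Drug"])
```

Each inner table has one row per finished product lot and one entry per material lot used, which mirrors the blends x lots layout of the R matrices expected by the JRPLS routine.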

pyphi.pca(X, A, *, mcs=True, md_algorithm='nipals', force_nipals=False, shush=False, cross_val=0)[source]

Function to create a Principal Components Analysis model

by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

pca_object = pyphi.pca (X,A,<mcs=True,md_algorithm=’nipals’,force_nipals=False,shush=False,cross_val=0>)

Parameters:
  • X (Dataframe or Numpy) – Data to train the model

  • A (int) – Number of Principal Components to calculate

  • mcs – ‘True’ : Meancenter + autoscale (default if not sent); ‘False’ : No pre-processing; ‘center’ : Only center; ‘autoscale’ : Only autoscale

  • md_algorithm – Missing data algorithm to use: ‘nipals’ (default if not sent); ‘nlp’ : Uses the non-linear programming approach by Lopez-Negrete et al. J. Chemometrics 2010; 24: 301–311

  • force_nipals – If True, will use NIPALS. If False and X is complete, will use SVD (default if not sent)

  • shush – If True, suppresses all printed output. False is the default if not sent

  • cross_val – If sent a scalar between 0 and 100, will cross-validate element-wise, removing cross_val% of the data every round. If == 0, cross-validation is bypassed (default if not sent)

Returns:

A dictionary with all PCA loadings, scores and other diagnostics.

pyphi.pca_(X, A, *, mcs=True, md_algorithm='nipals', force_nipals=False, shush=False)[source]
pyphi.pca_pred(Xnew, pcaobj, *, algorithm='p2mp')[source]

Function to evaluate new data using an already built PCA model

pred = pyphi.pca_pred(Xnew,pcaobj)

Parameters:
  • Xnew (DataFrame) – Data to be evaluated with the given PCA model

  • pcaobj – PCA object created by pyphi.pca routine

Returns:

Dictionary with reconstructed values for X, Scores, Hotellings T2 and SPE for Xnew

Return type:

pred

pyphi.pls(X, Y, A, *, mcsX=True, mcsY=True, md_algorithm='nipals', force_nipals=True, shush=False, cross_val=0, cross_val_X=False, cca=False)[source]

Function to create a Projection to Latent Structures model

by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

pls_object = pyphi.pls(X,Y,A,<mcsX=True,mcsY=True,md_algorithm=’nipals’,force_nipals=True,shush=False,cross_val=0,cross_val_X=False,cca=False>)

Parameters:
  • X (DataFrame or Numpy) – Training Data

  • Y (DataFrame or Numpy) – Training Data

  • A (int) – Number of Latent Variables to calculate

  • mcsX/mcsY – ‘True’ : Will meancenter and autoscale the data (default if not sent); ‘False’ : No pre-processing; ‘center’ : Will only center; ‘autoscale’ : Will only autoscale

  • md_algorithm – ‘nipals’ (default); ‘nlp’ : Uses the algorithm described in Journal of Chemometrics, 28(7), pp.575-584.

  • force_nipals – If set to True and if X is complete, will use NIPALS. Otherwise, if X is complete will use SVD.

  • shush – If set to True, suppresses all printed output.

  • cross_val – If sent a scalar between 0 and 100, will cross-validate element-wise, removing cross_val% of the data every round. If == 0, cross-validation is bypassed (default if not sent)

  • cross_val_X – True : Calculates Q2 values for the X and Y matrices; False : Cross-validation strictly on the Y matrix (default if not sent)

  • cca – True : Calculates the covariable space of X with Y (analogous to the predictive space in OPLS). “Tcv” and “Pcv” are the covariant scores and loadings; if there is more than one Y, there will be as many Tcv and Pcv vectors as columns in Y

Returns:

A dictionary with all PLS loadings, scores and other diagnostics.

pyphi.pls_(X, Y, A, *, mcsX=True, mcsY=True, md_algorithm='nipals', force_nipals=True, shush=False, cca=False)[source]
pyphi.pls_cca(pls_obj, Xmcs, Ymcs, not_Xmiss)[source]
pyphi.pls_pred(Xnew, plsobj)[source]

Function to evaluate new data using an already built PLS model by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)

pred = pyphi.pls_pred(Xnew,plsobj)

Parameters:
  • Xnew – DataFrame with new data to be projected onto the PLS model

  • plsobj – PLS object created by pyphi.pls routine

Returns:

Dictionary with predicted values for X and Y, scores, Hotelling’s T2 and SPE for Xnew. If cca=True in plsobj, then Tcv is also calculated

Return type:

pred

pyphi.prep_pca_4_MDbyNLP(pcaobj, X)[source]
pyphi.prep_pls_4_MDbyNLP(plsobj, X, Y)[source]
pyphi.reconcile_rows(df_list)[source]

Function to reconcile observations across multiple dataframes

by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

df_list_reconciled = pyphi.reconcile_rows(df_list)

Routine to reconcile the observation names in a list of dataframes. The returned list of dataframes has exactly the same observation names in the same order; handy when analyzing data for TPLS, LPLS and JRPLS.
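The reconciliation idea can be sketched with plain dictionaries keyed by observation name. This is an illustration only: the helper name `reconcile` is hypothetical, dictionaries stand in for dataframes, and taking the row order from the first table is an assumption about the ordering rule.

```python
def reconcile(tables):
    """Keep only observations present in every table, in a shared order.

    Each table maps observation name -> row of values.
    """
    common = set(tables[0])
    for t in tables[1:]:
        common &= set(t)
    # Assumption: the order of the first table defines the shared order.
    order = [obs for obs in tables[0] if obs in common]
    return [{obs: t[obs] for obs in order} for t in tables]

X = {"L1": [1.0], "L2": [2.0], "L3": [3.0]}
Y = {"L3": [0.3], "L1": [0.1]}
X_rec, Y_rec = reconcile([X, Y])
print(list(X_rec), list(Y_rec))
```

Observation L2 is dropped because it has no counterpart in Y, and Y is reordered so that row i refers to the same observation in both tables.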

pyphi.reconcile_rows_to_columns(df_list_r, df_list_c)[source]
Function to reconcile the rows of the dataframes in df_list_r with the columns in the list of dataframes df_list_c

by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

df_list_reconciled_r,df_list_reconciled_c = pyphi.reconcile_rows_to_columns(df_list_r,df_list_c)

handy to align X - R datasets for TPLS, LPLS and JRPLS

pyphi.replicate_data(mvm_obj, X, num_replicates, *, as_set=False)[source]
pyphi.scores_conf_int_calc(st, N)[source]
pyphi.single_score_conf_int(t)[source]
pyphi.spe(mvmobj, Xnew, *, Ynew=False)[source]
pyphi.spe_ci(spe)[source]
pyphi.spectra_autoscale(Dm)[source]

Autoscaling all spectra to have variance one. Dm: Spectra

Outputs: Processed spectra

pyphi.spectra_baseline_correction(Dm)[source]

Shifting all spectra to have minimum zero. Only works with pandas dataframe. Dm: Spectra

Outputs: Processed spectra

pyphi.spectra_mean_center(Dm)[source]

Mean centering all spectra to have mean zero. Dm: Spectra

Outputs: Processed spectra

pyphi.spectra_msc(Dm, reference_spectra=None)[source]

Perform Multivariate Scatter Correction transform

pyphi.spectra_savgol(ws, od, op, Dm)[source]

Function to do row wise Savitzky-Golay filter for spectra by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)

Dm_sg, M = pyphi.spectra_savgol(ws,od,op,Dm)

Parameters:
  • ws – Window size

  • od – Order of the derivative

  • op – Order of the polynomial

  • Dm – Spectra

Returns:
  • Dm_sg – Processed spectra

  • M – Transformation matrix for new vector samples

pyphi.spectra_snv(x)[source]

Function to do row wise SNV transform for spectroscopic data by Salvador Garcia (sgarciam@ic.ac.uk) / (salvadorgarciamunoz@gmail.com)

X=pyphi.spectra_snv(X)

Parameters:

x – Spectra dataframe

Returns:

Post-processed Spectra dataframe

Return type:

x
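The standard normal variate (SNV) transform treats every spectrum independently: it subtracts the spectrum's own mean and divides by its own standard deviation. The pure-Python sketch below illustrates this; it is not the pyphi routine, the helper name `snv` is hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption.

```python
from statistics import mean, stdev

def snv(spectra):
    """Row-wise SNV: center and scale each spectrum by its own statistics."""
    out = []
    for spectrum in spectra:
        m = mean(spectrum)
        s = stdev(spectrum)  # assumption: sample std (n - 1)
        out.append([(v - m) / s for v in spectrum])
    return out

# Two spectra with the same shape but a tenfold difference in intensity.
spectra = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
corrected = snv(spectra)
for row in corrected:
    print(row)
```

Both rows become identical after the transform, which shows how SNV removes multiplicative and additive scatter effects that differ from sample to sample.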

pyphi.std(X)[source]
pyphi.tpls(Xi, Ri, Z, Y, A, *, shush=False)[source]

TPLS Algorithm per Garcia-Munoz Chemom.Intel.Lab.Syst., 133, pp.49-62.

by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

tpls_obj = pyphi.tpls(Xi,Ri,Z,Y,A)

Parameters:
  • X – Phys. Prop. dictionary of Dataframes of materials_i x mat. properties:

    X = {‘MatA’:df_with_props_for_mat_A (one row per lot of MatA, one col per property),

    ‘MatB’:df_with_props_for_mat_B (one row per lot of MatB, one col per property)}

  • R – Blending ratios dictionary of Dataframes of blends x materials_i:

    R = {‘MatA’: df_with_ratios_of_lots_of_A_used_per_blend,

    ‘MatB’: df_with_ratios_of_lots_of_B_used_per_blend, }

  • Y – [ b x n ] Product characteristics dataframe of blends x prod. properties

  • Z – [ b x p ] Process conditions dataframe of blends x process variables

Rows of X[i] must correspond to columns of R[i]. The first column of all dataframes is the observation identifier.

Returns:

A dictionary with all the parameters for the TPLS model

Return type:

tpls_obj

pyphi.tpls_pred(rnew, znew, tplsobj)[source]

Routine to produce the prediction for a new observation of Ri using a TPLS model by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

preds = pyphi.tpls_pred(rnew,znew,tplsobj)

Parameters:
  • rnew – A dictionary with the format:

    rnew={ ‘matid’:[(lotid,rvalue )], }

    For example, a prediction for the scenario:

    material   lot to use   rvalue
    API        A0129        0.5
    Lactose    Lac0003      0.1
    Lactose    Lac1010      0.2
    MgSt       M0012        0.02
    MCC        MCC0017      0.18

    use:

    rnew={ ‘API’:[(‘A0129’,0.5)], ‘Lactose’:[(‘Lac0003’,0.1 ),(‘Lac1010’,0.2 )], ‘MgSt’:[(‘M0012’,0.02)], ‘MCC’:[(‘MCC0017’,0.18)], }

  • znew – Dataframe or numpy with new observation

Returns:

A dictionary of the form preds = {‘Tnew’:tnew, ‘Yhat’:yhat, ‘speR’:sper, ‘speZ’:spez}, where speR has the speR per each material

Return type:

preds

pyphi.unique(df, colid)[source]

Returns unique values in the column of a DataFrame in order of occurrence by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

Replacement of the np.unique routine, specifically for dataframes; returns unique values in the order found in the dataframe.

unique_values = pyphi.unique(df,columnid)

Parameters:
  • df – A pandas dataframe

  • columnid – Column identifier

Returns:

List of unique values in the order they appear

Return type:

unique_values
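The difference from a sorted unique can be seen in a few lines of plain Python. This sketch works on a list rather than a DataFrame column, and the helper name `unique_in_order` is hypothetical; it relies on the fact that Python dictionaries preserve insertion order.

```python
def unique_in_order(values):
    """Return the distinct values, keeping the order of first appearance."""
    return list(dict.fromkeys(values))

lots = ["B", "A", "B", "C", "A"]
print(unique_in_order(lots))   # order of first appearance
print(sorted(set(lots)))       # sorted order, as np.unique would return
```

Keeping the original order matters when materials and lots must line up with the raw data, as noted in the changelog entry for parse_materials.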

pyphi.varimax_(X, gamma=1.0, q=20, tol=1e-06)[source]
pyphi.varimax_rotation(mvm_obj, X, *, Y=False)[source]

Function to do a Varimax Rotation on a PCA or PLS model by Salvador Garcia-Munoz (sgarciam@ic.ac.uk ,salvadorgarciamunoz@gmail.com)

Routine to perform a VariMax rotation on a PCA or PLS model (I have also tested it with MBPLS models)

rotated_model=varimax_rotation(model_object,X,<Y=Ydata>)

Parameters:

model_object – A PCA or PLS or MBPLS model object

Returns:

The same model after VariMax rotation, scores and loadings are all rotated

Return type:

rotated_model

pyphi.writeeq(beta_, features_)[source]
pyphi.z2n(X, X_nan_map)[source]