pyphi_batch package
Submodules
pyphi_batch.pyphi_batch module
Created on Mon Apr 11 14:58:35 2022
Batch data is assumed to come in an Excel file with the first column being the batch identifier and the following columns being process variables. Optionally, the second column, labeled 'PHASE', 'Phase' or 'phase', indicates the phase of execution.
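The layout described above can be sketched as a small pandas DataFrame built in memory instead of being read from Excel. The column names `BatchID`, `TIC101` and `PIC201` and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical batch data: first column = batch identifier,
# optional second column = phase, remaining columns = process variables.
bdata = pd.DataFrame({
    "BatchID": ["B1", "B1", "B1", "B2", "B2"],
    "Phase":   ["Heating", "Heating", "Reaction", "Heating", "Reaction"],
    "TIC101":  [30.0, 41.5, 50.2, 29.8, 50.1],
    "PIC201":  [1.01, 1.02, 1.10, 1.00, 1.12],
})

# Batches are stacked vertically; each row is one time sample.
has_phase = bdata.columns[1] in ("PHASE", "Phase", "phase")
variables = list(bdata.columns[2:] if has_phase else bdata.columns[1:])
```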
Change log:
- added Dec 28 2023 Titles can be sent to contribution plots via the plot_title flag. Monitoring diagnostics are also plotted against sample, starting with 1
- added Dec 27 2023 Corrected plots to use the correct x-axis, starting with sample = 1. Amended indicator variable alignment not to replace the IV with a linear sequence but to keep the original data
- added Dec 4 2023 Added a BatchVIP calculation
- added Apr 23 2023 Corrected a very dumb mistake I made coding when tired
- added Apr 18 2023 Added descriptors routine to obtain landmarks of the batch such as min, max, ave of a variable [during a phase if indicated so]. Modified plot_var_all_batches to plot against the values in a Time column and also add the legend for the BatchID
- added Apr 10 2023 Added batch contribution plots. Added build_rel_time to create a tag of relative run time from a timestamp
- added Apr 7 2023 Added alignment using indicator variable per phase
- added Apr 5 2023 Added the capability to monitor a variable in "Soft Sensor" mode, which implies there are no measurements for it (pure prediction), as opposed to a forecast, where new measurements keep coming in over time
- added Jul 20 2022 Distribution of number of samples per phase plot
- added Aug 10 2022 refold_horizontal | clean_empty_rows | predict
- added Aug 12 2022 replicate_batch
@author: S. Garcia-Munoz sgarciam@ic.ak.uk salg@andrew.cmu.edu
- pyphi_batch.pyphi_batch.batch_vip(mmvm_obj, *, addtitle=False)[source]
Plot the summation across components of the integral of the absolute value of loadings for a batch, multiplied by the R2 [which roughly mimics the VIP]
batch_vip(mmvm_obj,*,addtitle=False)
- Parameters:
mmvm_obj – A multiway PCA or PLS model
by Salvador Garcia Munoz sgarciam@imperial.ac.uk salvadorgarciamunoz@gmail.com
- pyphi_batch.pyphi_batch.build_rel_time(bdata, *, time_unit='min')[source]
Converts the column ‘Timestamp’ into ‘Time’ in time_units relative to the start of each batch
bdata_new = build_rel_time(bdata,*,time_unit='min')
- by Salvador Garcia Munoz
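A minimal sketch of the idea, not the library's implementation: elapsed time since the first sample of each batch, computed with pandas. The helper name `relative_time`, the unit names other than 'min', and the choice to drop the 'Timestamp' column are assumptions for illustration.

```python
import pandas as pd

def relative_time(bdata, time_unit="min"):
    """Add a 'Time' column: elapsed time since the first sample of each batch."""
    # Assumed unit names; only 'min' appears in the documented signature.
    seconds_per_unit = {"sec": 1.0, "min": 60.0, "hr": 3600.0}[time_unit]
    out = bdata.copy()
    batch_col = out.columns[0]          # first column is the batch identifier
    ts = pd.to_datetime(out["Timestamp"])
    start = ts.groupby(out[batch_col]).transform("min")  # first timestamp per batch
    out["Time"] = (ts - start).dt.total_seconds() / seconds_per_unit
    return out.drop(columns="Timestamp")  # assumption: 'Time' replaces 'Timestamp'

bdata = pd.DataFrame({
    "BatchID": ["B1", "B1", "B2", "B2"],
    "Timestamp": ["2023-04-10 08:00:00", "2023-04-10 08:05:00",
                  "2023-04-11 09:30:00", "2023-04-11 09:32:30"],
    "TIC101": [30.0, 35.0, 29.0, 31.0],
})
bdata_new = relative_time(bdata, time_unit="min")
```

Each batch restarts at zero, so batches recorded on different days become directly comparable on the 'Time' axis.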
- pyphi_batch.pyphi_batch.clean_empty_rows(X, *, shush=False)[source]
Cleans empty rows in batch data
- Input:
X: Batch data to be cleaned of empty rows (all np.nan), as a DataFrame
- Output:
X: Batch data with the empty observations removed
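A sketch of what "empty row" could mean in practice, assuming a row counts as empty when every process variable is NaN while the batch identifier (and phase, if present) is still filled in. The helper name `drop_empty_rows` and the index reset are illustrative choices, not the library's code.

```python
import numpy as np
import pandas as pd

def drop_empty_rows(X):
    """Drop rows whose process-variable columns are all NaN."""
    # Skip the batch ID column, and the phase column when there is one.
    n_id = 2 if X.columns[1] in ("PHASE", "Phase", "phase") else 1
    var_cols = X.columns[n_id:]
    empty = X[var_cols].isna().all(axis=1)
    return X.loc[~empty].reset_index(drop=True)

X = pd.DataFrame({
    "BatchID": ["B1", "B1", "B1", "B2"],
    "TIC101": [30.0, np.nan, 50.0, np.nan],
    "PIC201": [1.0, np.nan, np.nan, 1.2],
})
X_clean = drop_empty_rows(X)  # only the second row is fully empty
```

Rows with some missing values are kept; only rows with no measurements at all are removed.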
- pyphi_batch.pyphi_batch.contributions(mmvmobj, X, cont_type, *, to_obs=False, from_obs=False, lv_space=False, phase_samples=False, dyn_conts=False, which_var=False, plot_title='')[source]
Plot batch contribution plots to Scores, HT2 or SPE
- contributions (mmvmobj,X,cont_type,*,to_obs=False,from_obs=False,
lv_space=False,phase_samples=False,dyn_conts=False,which_var=False, plot_title='')
- Parameters:
mmvmobj – Multiway Model
X – batch data
cont_type – 'scores' | 'ht2' | 'spe'
to_obs – Observation to calculate contributions to
from_obs – Relative basis to calculate contributions to [for 'scores' and 'ht2' only]; if not sent, the origin of the model is used as the base.
- Returns:
contribution_vector
- by Salvador Garcia Munoz
- pyphi_batch.pyphi_batch.descriptors(bdata, which_var, desc, *, phase=False)[source]
Get descriptor values for a batch trajectory
descriptors_df = descriptors(bdata,which_var,desc,*,phase=False)
- Parameters:
bdata – Dataframe of batch data, first column is batch ID, second column can be phase id
which_var – List of variables to get descriptors for
desc – List of descriptors to calculate, options are: ‘min’ ‘max’ ‘mean’ ‘median’ ‘std’ ‘var’ ‘range’ ‘ave_slope’
phase – to specify what phases to do this for
- Returns:
A dataframe with the descriptors per batch
- Return type:
descriptors
- by Salvador Garcia Munoz
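As a rough illustration of per-batch descriptors, the sketch below uses a pandas group-by. It covers only the descriptors that map directly onto pandas aggregation names ('min', 'max', 'mean', 'median', 'std', 'var'); 'range' and 'ave_slope' would need custom functions. The helper name and the `variable_descriptor` column naming are assumptions.

```python
import pandas as pd

def simple_descriptors(bdata, which_var, desc):
    """One row per batch, one column per (variable, descriptor) pair."""
    batch_col = bdata.columns[0]
    # sort=False keeps batches in the order they appear in the data.
    table = bdata.groupby(batch_col, sort=False)[which_var].agg(desc)
    # Flatten the (variable, descriptor) column pairs into single names.
    table.columns = [f"{var}_{d}" for var, d in table.columns]
    return table

bdata = pd.DataFrame({
    "BatchID": ["B1", "B1", "B1", "B2", "B2"],
    "TIC101": [30.0, 40.0, 50.0, 20.0, 30.0],
})
descriptors_df = simple_descriptors(bdata, ["TIC101"], ["min", "max", "mean"])
```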
- pyphi_batch.pyphi_batch.loadings(mmvm_obj, dim, *, r2_weighted=False, which_var=False)[source]
Plot batch loadings for variables as a function of time/sample
loadings(mmvm_obj,dim,*,r2_weighted=False,which_var=False)
- Parameters:
mmvm_obj – Multiway PCA or PLS object
dim – What component or latent variable to plot
r2_weighted – If True => weight the loading by the R2pv
which_var – Variable for which the plot is done, if not sent all are plotted
by Salvador Garcia Munoz sgarciam@imperial.ac.uk salvadorgarciamunoz@gmail.com
- pyphi_batch.pyphi_batch.loadings_abs_integral(mmvm_obj, *, r2_weighted=False, addtitle=False)[source]
Plot the integral of the absolute value of loadings for a batch
loadings_abs_integral(mmvm_obj,*,r2_weighted=False,addtitle=False)
- Parameters:
mmvm_obj – A multiway PCA or PLS model
r2_weighted – Boolean flag, if True then it weights the loading by the R2pv
addtitle – Text to place in the title of the figure
by Salvador Garcia Munoz sgarciam@imperial.ac.uk salvadorgarciamunoz@gmail.com
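To make "integral of the absolute value of loadings" concrete, here is a sketch using the trapezoid rule with unit spacing between samples. The array layout (rows are samples, columns are variables, one component at a time) and the integration rule are assumptions; the library may organize and integrate the loadings differently.

```python
import numpy as np

def abs_loading_integral(P):
    """Trapezoidal integral of |loading| along the sample axis.

    P: array of shape (n_samples, n_vars) holding one component's loadings,
       refolded so rows are samples and columns are variables (assumed layout).
    """
    A = np.abs(np.asarray(P, dtype=float))
    # Trapezoid rule with unit spacing between consecutive samples.
    return ((A[1:] + A[:-1]) / 2.0).sum(axis=0)

P = np.array([[ 0.1, -0.4],
              [ 0.3, -0.2],
              [-0.1,  0.0]])
integral = abs_loading_integral(P)  # one value per variable
```

A variable whose loading stays far from zero over most of the batch gets a large integral, regardless of sign changes along the trajectory.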
- pyphi_batch.pyphi_batch.monitor(mmvm_obj, bdata, *, which_batch=False, zinit=False, build_ci=True, shush=False, soft_sensor=False)[source]
Routine to mimic the real-time monitoring of a batch given a model
monitor(mmvm_obj,bdata,*,which_batch=False,zinit=False,build_ci=True,shush=False,soft_sensor=False):
- usage: first run monitor(mmvm_obj,bdata)
to mimic monitoring for all batches in bdata and build the CI; these new parameters are written back to mmvm_obj
Then you can run:
- diagnostics = monitor(mmvm_obj,bdata,which_batch=your_batchid)
to mimic monitoring for your_batchid; this produces all dynamic metrics and forecasts
- diagnostics = monitor(mmvm_obj,bdata,which_batch=your_batchid,zinit=your_z_data)
to mimic monitoring for your_batchid using initial conditions; this produces all dynamic metrics and forecasts
- diagnostics = monitor(mmvm_obj,bdata,which_batch=your_batchid,soft_sensor=your_variable)
to mimic monitoring for your_batchid; this produces all dynamic metrics and forecasts, plus soft-sensor predictions for your_variable
Returns:
diagnostics: A dictionary with all the monitoring diagnostics and contributions
- by Salvador Garcia Munoz
- pyphi_batch.pyphi_batch.mpca(xbatch, a, *, unfolding='batch wise', phase_samples=False, cross_val=0)[source]
Multi-way PCA for batch analysis
mpca_obj = mpca(xbatch,a,*,unfolding='batch wise',phase_samples=False,cross_val=0)
- Parameters:
xbatch – Pandas dataframe with aligned batch data; it is assumed that all batches have the same number of samples
a – Number of PC’s to fit
unfolding – ‘batch wise’ or ‘variable wise’
phase_samples – information about samples per phase [optional]
cross_val – percent of elements for cross validation (default is 0 = no cross val)
- Returns:
A Dictionary with all the parameters for the MPCA model
- Return type:
mpca_obj
by Salvador Garcia Munoz sgarciam@imperial.ac.uk salvadorgarciamunoz@gmail.com
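The requirement that all batches share the same number of samples follows from unfolding: each batch becomes one row of a two-dimensional matrix. The sketch below shows one way to unfold batch-wise with NumPy. The column ordering (all variables of sample 1, then all variables of sample 2, and so on) is an assumption and may differ from the ordering used inside the library.

```python
import numpy as np
import pandas as pd

def unfold_batch_wise(xbatch):
    """Return (batch_ids, matrix) with one row per batch."""
    batch_col = xbatch.columns[0]
    var_cols = list(xbatch.columns[1:])
    ids, rows = [], []
    for batch_id, grp in xbatch.groupby(batch_col, sort=False):
        ids.append(batch_id)
        # (n_samples, n_vars) flattened row by row: sample-major order.
        rows.append(grp[var_cols].to_numpy(dtype=float).ravel())
    # np.vstack raises if batches differ in length, i.e. if data is not aligned.
    return ids, np.vstack(rows)

xbatch = pd.DataFrame({
    "BatchID": ["B1"] * 3 + ["B2"] * 3,
    "T": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "P": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})
ids, X_unfolded = unfold_batch_wise(xbatch)
```

With 2 batches, 3 samples and 2 variables, the unfolded matrix has shape (2, 6).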
- pyphi_batch.pyphi_batch.mpls(xbatch, y, a, *, zinit=False, phase_samples=False, mb_each_var=False, cross_val=0, cross_val_X=False)[source]
Multi-way PLS for batch analysis
mpls_obj = mpls(xbatch,y,a,*,zinit=False,phase_samples=False,mb_each_var=False,cross_val=0,cross_val_X=False):
- Parameters:
xbatch – Pandas dataframe with aligned batch data; it is assumed that all batches have the same number of samples
y – Response to predict, one row per batch
a – Number of PC’s to fit
zinit – Initial conditions <optional>
phase_samples – alignment information
mb_each_var – if “True” will make each variable measured a block otherwise zinit is one block and xbatch another
- Returns:
A dictionary with all the parameters of the MPLS model
- Return type:
mpls_obj
- by Salvador Garcia Munoz
- pyphi_batch.pyphi_batch.phase_iv_align(bdata, nsamples)[source]
Batch alignment using an indicator variable
batch_aligned_data = phase_iv_align(bdata,nsamples)
- Parameters:
bdata – is a Pandas DataFrame where 1st column is Batch Identifier, the second column is a phase indicator and following columns are variables; each row is a new time sample. Batches are concatenated vertically.
nsamples –
if nsamples is a dictionary: samples to generate per phase, e.g.
nsamples = {'Heating':100,'Reaction':200,'Cooling':10}
If an indicator variable is used, with known start and end values, indicate it with a list like this:
[IVarID,num_samples,start_value,end_value]
example:
nsamples = {'Heating':['TIC101',100,30,50],'Reaction':200,'Cooling':10}
During the 'Heating' phase, use TIC101 as an indicator variable: take 100 samples equidistant from TIC101=30 to TIC101=50 and align against that variable as a measure of batch evolution (instead of time).
If an indicator variable is used, with unknown start but known end values, indicate it with a list like this:
[IVarID,num_samples,end_value]
example:
nsamples = {'Heating':['TIC101',100,50],'Reaction':200,'Cooling':10}
During the 'Heating' phase, use TIC101 as an indicator variable: take 100 samples equidistant from the value of TIC101 at the start of the phase to the point when TIC101=50 and align against that variable as a measure of batch evolution (instead of time).
If no IV is sent, the resampling is linear with respect to row number per phase.
- Returns:
A pandas dataframe with batch data resampled (aligned)
by Salvador Garcia Munoz sgarciam@imperial.ac.uk salvadorgarciamunoz@gmail.com
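The indicator-variable idea can be sketched for a single variable within a single phase: build a grid that is equidistant in the indicator variable and interpolate the measurement onto it. This is an illustrative sketch using `numpy.interp`, which assumes the indicator variable increases monotonically through the phase. The helper name and the sample data are made up.

```python
import numpy as np

def align_on_indicator(iv, values, num_samples, start_value, end_value):
    """Resample `values` at num_samples points equidistant in the indicator variable."""
    iv = np.asarray(iv, dtype=float)
    values = np.asarray(values, dtype=float)
    grid = np.linspace(start_value, end_value, num_samples)
    # np.interp needs increasing x; this sketch assumes the IV rises monotonically.
    return grid, np.interp(grid, iv, values)

# Hypothetical 'Heating' phase: TIC101 rises from 30 to 50 at an uneven pace.
tic101 = [30.0, 32.0, 38.0, 46.0, 50.0]
pressure = [1.00, 1.10, 1.40, 1.80, 2.00]
grid, aligned = align_on_indicator(tic101, pressure, 5, 30.0, 50.0)
```

After alignment, sample k of every batch corresponds to the same TIC101 value rather than the same elapsed time, so batches that heat at different rates line up.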
- pyphi_batch.pyphi_batch.phase_sampling_dist(bdata, time_column=False, addtitle=False, use_phases=False)[source]
Count and plot a histogram of the distribution of samples (or time if time_column is indicated) consumed per phase on a batch dataset
phase_sampling_dist(bdata,time_column=False,addtitle=False,use_phases=False)
- Parameters:
bdata –
Batch data organized as:
column[0] = Batch Identifier (column name is unrestricted)
column[1] = Phase information per sample; must be called 'Phase', 'phase', or 'PHASE' (this information is optional)
column[2:] = Variables measured throughout the batch
time_column – Indicates the name of the column with time; if not sent, counting is done in terms of samples
addtitle – Optional text to be placed as the figure title
use_phases – In case the user wants to only do counting for a subset of phases
- by Salvador Garcia Munoz
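The counting step behind such a histogram can be sketched with a pandas group-by. This shows only the tabulation of samples per batch and phase, not the plotting; the data is invented for illustration.

```python
import pandas as pd

bdata = pd.DataFrame({
    "BatchID": ["B1"] * 5 + ["B2"] * 5,
    "Phase": ["Heating"] * 2 + ["Reaction"] * 3
           + ["Heating"] * 4 + ["Reaction"] * 1,
    "TIC101": [30.0, 40.0, 50.0, 51.0, 52.0, 30.0, 35.0, 40.0, 45.0, 50.0],
})

batch_col, phase_col = bdata.columns[0], bdata.columns[1]
# Rows = batches, columns = phases, cells = number of samples consumed.
counts = bdata.groupby([batch_col, phase_col], sort=False).size().unstack(fill_value=0)
```

Each column of `counts` holds the values that would feed the histogram for one phase.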
- pyphi_batch.pyphi_batch.phase_simple_align(bdata, nsamples)[source]
Simple batch alignment (0 to 100%) per phase
bdata_aligned = phase_simple_align(bdata,nsamples)
- Parameters:
bdata – is a Pandas DataFrame where 1st column is Batch Identifier the second column is a phase indicator and following columns are variables, each row is a new time sample. Batches are concatenated vertically.
nsamples –
if integer: Number of samples to collect per phase
if dictionary: samples to generate per phase e.g.
nsamples = {'Heating':100,'Reaction':200,'Cooling':10}
resampling is linear with respect to row number
- Returns:
a pandas dataframe with batch data resampled (aligned)
by Salvador Garcia Munoz (sgarciam@imperial.ac.uk , salvadorgarciamunoz@gmail.com)
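A sketch of per-phase linear resampling for one variable of one batch, assuming each phase is mapped onto a 0 to 100% axis by row number and interpolated linearly. The helper names are hypothetical and the code is not taken from the library.

```python
import numpy as np

def resample_rows(values, n):
    """Linearly resample a trajectory to n points on a 0-to-1 (0 to 100%) axis."""
    values = np.asarray(values, dtype=float)
    old = np.linspace(0.0, 1.0, len(values))
    new = np.linspace(0.0, 1.0, n)
    return np.interp(new, old, values)

def align_phases(phase_labels, values, nsamples):
    """Resample one variable of one batch, phase by phase."""
    phase_labels = np.asarray(phase_labels)
    values = np.asarray(values, dtype=float)
    pieces = [resample_rows(values[phase_labels == phase], n)
              for phase, n in nsamples.items()]
    return np.concatenate(pieces)

phases = ["Heating"] * 3 + ["Reaction"] * 5
temps = [30.0, 40.0, 50.0, 50.0, 52.0, 54.0, 56.0, 58.0]
aligned = align_phases(phases, temps, {"Heating": 5, "Reaction": 3})
```

Here the 3-sample 'Heating' phase is stretched to 5 samples and the 5-sample 'Reaction' phase is compressed to 3, so every batch ends up with the same number of samples in each phase.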
- pyphi_batch.pyphi_batch.plot_batch(bdata, which_batch, which_var, *, include_mean_exc=False, include_set=False, phase_samples=False, single_plot=False, plot_title='')[source]
Plotting routine for batch data
- plot_batch(bdata,which_batch,which_var,*,include_mean_exc=False,include_set=False,
phase_samples=False,single_plot=False,plot_title='')
- Parameters:
bdata –
Batch data organized as:
column[0] = Batch Identifier (column name is unrestricted)
column[1] = Phase information per sample; must be called 'Phase', 'phase', or 'PHASE' (this information is optional)
column[2:] = Variables measured throughout the batch
The data for each batch is one on top of the other in a vertical matrix
which_batch – Which batches to plot
which_var – Which variables are to be plotted, if not sent, all are.
include_mean_exc – Include the mean trajectory of the set EXCLUDING the one batch being plotted
include_set – Include all other trajectories (will be colored in light gray)
phase_samples – Information used to align the batch, so that phases are marked in the plot
single_plot – If True => Plot everything in a single axis
plot_title – Optional text to be added to the title of all figures
by Salvador Garcia Munoz sgarciam@imperial.ac.uk salvadorgarciamunoz@gmail.com
- pyphi_batch.pyphi_batch.plot_var_all_batches(bdata, *, which_var=False, plot_title='', mkr_style='.-', phase_samples=False, alpha_=0.2, timecolumn=False, lot_legend=False)[source]
Plotting routine for batch data plot data for all batches in a dataset
- plot_var_all_batches(bdata,*,which_var=False,plot_title='',mkr_style='.-',
phase_samples=False,alpha_=0.2,timecolumn=False,lot_legend=False):
- Parameters:
bdata –
Batch data organized as:
column[0] = Batch Identifier (column name is unrestricted)
column[1] = Phase information per sample; must be called 'Phase', 'phase', or 'PHASE' (this information is optional)
column[2:] = Variables measured throughout the batch
The data for each batch is one on top of the other in a vertical matrix
which_var – Which variables are to be plotted, if not sent, all are.
plot_title – Optional text to be used as the title of all figures
phase_samples – information used to align the batch, so that phases are marked in the plot
alpha_ – Transparency for the phase dividing line
timecolumn – Name of the column that indicates time, if given all data is plotted against time
lot_legend – Flag to add a legend for the batch identifiers
by Salvador Garcia Munoz sgarciam@imperial.ac.uk salvadorgarciamunoz@gmail.com
- pyphi_batch.pyphi_batch.predict(xbatch, mmvm_obj, *, zinit=False)[source]
Generate predictions for a Multi-way PCA/PLS model
predictions = predict(xbatch,mmvm_obj,*,zinit=False)
- Parameters:
xbatch – Batch data with the same variables and alignment as the model; predictions are generated for all batches
mmvm_obj – Multi-way PLS or PCA
zinit – Initial conditions [if any]
- Returns:
A dictionary with keys ['Yhat', 'Xhat', 'Tnew', 'speX', 'T2']
- Return type:
preds
- by Salvador Garcia Munoz
- pyphi_batch.pyphi_batch.r2pv(mmvm_obj, *, which_var=False)[source]
Plot batch r2 for variables as a function of time/sample
r2pv(mmvm_obj,*,which_var=False)
- Parameters:
mmvm_obj – Multiway PCA or PLS object
which_var – Variable for which the plot is done, if not sent all are plotted
by Salvador Garcia Munoz sgarciam@imperial.ac.uk salvadorgarciamunoz@gmail.com
- pyphi_batch.pyphi_batch.simple_align(bdata, nsamples)[source]
- Simple alignment for batch data using row number to linearly interpolate
to the same number of samples
bdata_aligned= simple_align(bdata,nsamples)
- Parameters:
bdata – is a Pandas DataFrame where 1st column is Batch Identifier and following columns are variables; each row is a new time sample. Batches are concatenated vertically.
nsamples – is the new number of samples to generate per batch, irrespective of phase
- Returns:
A pandas dataframe with batch data resampled to nsamples for all batches
by Salvador Garcia Munoz (sgarciam@imperial.ac.uk, salvadorgarciamunoz@gmail.com)
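At the DataFrame level, row-number interpolation to a common length might look like the sketch below, which assumes a layout with the batch identifier in the first column and no phase column. The helper name is hypothetical and this is not the library's implementation.

```python
import numpy as np
import pandas as pd

def resample_batches(bdata, nsamples):
    """Interpolate every variable of every batch onto nsamples evenly spaced rows."""
    batch_col = bdata.columns[0]
    var_cols = list(bdata.columns[1:])
    frames = []
    for batch_id, grp in bdata.groupby(batch_col, sort=False):
        old_rows = np.arange(len(grp))                     # original row numbers
        new_rows = np.linspace(0, len(grp) - 1, nsamples)  # fractional row positions
        data = {batch_col: [batch_id] * nsamples}
        for var in var_cols:
            data[var] = np.interp(new_rows, old_rows, grp[var].to_numpy(dtype=float))
        frames.append(pd.DataFrame(data))
    return pd.concat(frames, ignore_index=True)

bdata = pd.DataFrame({
    "BatchID": ["B1"] * 3 + ["B2"] * 5,
    "TIC101": [30.0, 40.0, 50.0, 20.0, 22.0, 24.0, 26.0, 28.0],
})
bdata_aligned = resample_batches(bdata, 5)
```

Both batches come out with 5 rows each, even though they started with 3 and 5 samples.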
- pyphi_batch.pyphi_batch.unique(df, colid)[source]
Replacement of the np.unique routine, specifically for dataframes
unique(df,colid)
- Parameters:
df – A pandas dataframe
colid – Column identifier
- Returns:
A list with unique values in the order found in the dataframe
by Salvador Garcia (sgarciam@ic.ac.uk)
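The difference from np.unique is ordering: np.unique returns sorted values, while this routine keeps the order of first appearance. One way to get that behavior, shown here as an illustrative sketch rather than the library's code, relies on Python dictionaries preserving insertion order.

```python
import pandas as pd

def unique_in_order(df, colid):
    """Unique values of a column, in order of first appearance."""
    # dict.fromkeys drops duplicates while keeping insertion order.
    return list(dict.fromkeys(df[colid].tolist()))

df = pd.DataFrame({"BatchID": ["B2", "B1", "B2", "B3", "B1"]})
batch_ids = unique_in_order(df, "BatchID")
```

Keeping the original order matters for batch data because the rows of an unfolded matrix should follow the order in which batches appear in the file.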