Basic Workflow: From Data to Decomposition#

Example Reaction Spectroscopy Dataset#

Let’s demonstrate the API using a real-world reaction spectroscopy dataset included with the package:

import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# for loading the dataset
from mcrnmf.datasets import load_rxn_spectra

# import all necessary algorithms
from mcrnmf import MinVol, SNPA

# Load Raman spectroscopy data from a chemical reaction
# X: spectral data matrix (wavenumber × time points)
# wv: wavenumber values (x-axis for spectra)
# time: time points of the reaction (in hours)
X, wv, time = load_rxn_spectra()

# Visualization of spectral data colored by time
norm = Normalize(vmin=time[0], vmax=time[-1])
fig, ax = plt.subplots(figsize=(6, 3))
for i in range(X.shape[1]):
    ax.plot(wv, X[:, i], color=plt.cm.viridis(norm(time[i])), linewidth=0.75)
ax.set_xlabel("Raman shift [1/cm]")
ax.set_ylabel("Intensity [a.u.]")
cbar = plt.colorbar(
    mappable=plt.cm.ScalarMappable(cmap=plt.cm.viridis, norm=norm), ax=ax
)
cbar.set_label("Time [hr]")
plt.show()

Raman spectra colored by reaction time. Each line represents one spectrum at a specific time point. Changes in the spectra over time indicate chemical transformations.

Workflow Overview#

A typical MCR-NMF workflow consists of:

Initial estimation of \(W\) and \(H\) matrices using SNPA
Refinement of these estimates using algorithms like MinVol, FroALS, or FroALS
Analysis of the resulting component spectra and concentration profiles

Note

While the example below demonstrates SNPA, the same workflow initialization with SNPA, followed by fitting and evaluation also applies to FroALS and FroFPGM. We use MinVol here because, in our experience, it often produces the best results on real-world datasets.

Initializing Component Estimates with `SNPA`#

Successive Nonnegative Projection Algorithm (SNPA) provides good initial values for \(W\) and \(H\) matrices. Here is an example of how you could use it.

# Choose the number of chemical components to extract
num_components = 4

# Initialize and run SNPA algorithm
# SNPA identifies vertices in the data cloud that likely represent pure components
snpa = SNPA(rank=num_components, iter_max=1000)
snpa.fit(X=X)

# Get the initial estimates
Wi = snpa.W.copy()  # Initial estimate for component spectra
Hi = snpa.H.copy()  # Initial estimate for concentration profiles

# Visualization of the initial estimates
fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(6, 4))
axs[0].plot(wv, Wi)
axs[0].set_xlabel("Wavenumber [1/cm]")
axs[0].set_ylabel("Intensity [a.u.]")
axs[0].set_title("Initial estimate of pure component spectra (W)")
axs[1].plot(time, Hi.T)
axs[1].set_xlabel("Time [h]")
axs[1].set_ylabel("Mole fraction")
axs[1].set_title("Initial estimate of concentration profiles (H)")
fig.tight_layout()
plt.show()

Refining Spectral Decomposition with `MinVol`#

Create an instance of MinVol and fit the model to the data.

# Create a MinVol instance with constraints
mvol_uni_all = MinVol(
    rank=num_components,
    constraint_kind=1,  # Sum of concentrations at each timepoint ≤ 1
    unimodal={"H": True},  # All concentration profiles should be unimodal
    lambdaa=1e-4,  # Regularization parameter controlling the simplex volume
    iter_max=2000,  # Maximum number of iterations
)

# Fit the MinVol model using initial estimates from SNPA
mvol_uni_all.fit(X=X, Wi=Wi, Hi=Hi)

# Visualize the refined concentration profiles
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(time, mvol_uni_all.H.T) # 'H' result accessed from the property `H`
ax.legend([f"{i}" for i in range(num_components)], title="Component")
ax.set_xlabel("Time [h]")
ax.set_ylabel("Mole fraction")
ax.set_title("Concentration Evolution with All Components Unimodal")
fig.tight_layout()
plt.show()

Concentration profiles with unimodal constraints

Evaluating Algorithm Convergence and Fit Quality#

After fitting a model, you can evaluate its convergence and performance using several properties. The MinVol algorithm provides three properties for evaluating results:

is_converged: Boolean indicating whether the algorithm reached convergence based on the tolerance criteria. If False, you might consider increasing iter_max.
rel_loss_ls: List of relative loss values at each iteration, combining both reconstruction error and volume regularization terms. This shows the overall optimization progress.
rel_reconstruction_error_ls: List of just the relative reconstruction error component \(\dfrac{||X-WH||_F}{||X||_F}\) at each iteration. This measures how well your factorization approximates the original data.

We now examine the convergence behavior of the previously fitted MinVol model.

# Check if the algorithm converged within the maximum iterations
converged = mvol_uni_all.is_converged
print(f"Did the algorithm converge? {converged}")

fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(6, 5))

# Plot the overall loss (includes both reconstruction error and ...
# volume regularization term)
axs[0].plot(mvol_uni_all.rel_loss_ls) # property containing relative loss
axs[0].set_xlabel("Iteration")
axs[0].set_ylabel("Relative Loss")
axs[0].set_title("Combined Loss (Reconstruction + Volume Regularization)")

# Plot just the reconstruction error
# property containing the relative reconstruction error
axs[1].plot(mvol_uni_all.rel_reconstruction_error_ls)
axs[1].set_xlabel("Iteration")
axs[1].set_ylabel("Reconstruction Error")
axs[1].set_title("Relative Reconstruction Error: ||X-WH||_F/||X||_F")

fig.tight_layout()
plt.show()

# Get the final reconstruction error
final_error = mvol_uni_all.rel_reconstruction_error_ls[-1]
print(f"Final relative reconstruction error: {final_error:.6f}")