Analysis Methods Reference
This comprehensive reference documents all analysis methods available in the Raman Spectroscopy Analysis Application, including preprocessing algorithms, exploratory analysis, statistical tests, and machine learning methods.
Purpose of This Reference
Each method is documented with:
Purpose - What the method does and when to use it
Theory - Brief explanation of the underlying algorithm
Parameters - Complete parameter reference with recommendations
Interpretation Guide - How to understand and report results
Examples - Practical usage examples
Common Issues - Troubleshooting and limitations
References - Primary literature citations
Method Categories
Preprocessing Methods
40+ methods for spectral preprocessing:
| Category | Methods | Purpose |
|---|---|---|
| Baseline Correction | AsLS, AirPLS, Polynomial, Whittaker, FABC, Butterworth | Remove fluorescence background |
| Smoothing | Savitzky-Golay, Gaussian, Moving Average, Median Filter | Reduce noise |
| Normalization | Vector, Min-Max, Area, SNV, MSC, Quantile, PQN | Scale spectra |
| Derivatives | 1st, 2nd order Savitzky-Golay | Enhance peaks |
| Feature Engineering | Peak Ratio, Wavelet Transform, Rank Transform | Extract features |
| Advanced | CDAE (Deep Learning), MCR-ALS, NMF | Denoising and decomposition |
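To make the normalization row concrete, here is a minimal NumPy sketch of three of the listed scalings (Vector, SNV, Min-Max). The function names are illustrative and are not the application's API. The SNV variant shown uses the population standard deviation, which is an assumption; some implementations use the sample standard deviation instead.

```python
import numpy as np

def vector_normalize(spectrum):
    """Scale a spectrum to unit Euclidean (L2) norm."""
    return spectrum / np.linalg.norm(spectrum)

def snv(spectrum):
    """Standard Normal Variate: zero mean, unit standard deviation per spectrum.

    Uses the population standard deviation (ddof=0); this choice is an assumption.
    """
    return (spectrum - spectrum.mean()) / spectrum.std()

def min_max(spectrum):
    """Rescale a spectrum linearly so its values span [0, 1]."""
    return (spectrum - spectrum.min()) / (spectrum.max() - spectrum.min())

# Toy five-point "spectrum" used only for illustration
spectrum = np.array([2.0, 4.0, 10.0, 4.0, 2.0])

vec = vector_normalize(spectrum)
snv_out = snv(spectrum)
mm = min_max(spectrum)
```

All three operate on one spectrum at a time, so they need no fitting step and can be applied identically to training and test spectra.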
Exploratory Analysis
Unsupervised methods for data exploration:
| Method | Purpose | When to Use |
|---|---|---|
| PCA | Dimensionality reduction, variance visualization | First step for all analyses |
| UMAP | Non-linear manifold learning | Complex high-dimensional data |
| t-SNE | Non-linear clustering visualization | Identifying subgroups |
| Hierarchical Clustering | Dendrogram-based grouping | Unknown group structure |
| K-means Clustering | Partition-based clustering | Known number of clusters |
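As an illustration of dendrogram-based grouping, the following sketch uses SciPy's hierarchical clustering on synthetic data. The Ward linkage and the two-cluster cut are assumptions chosen for the example, not necessarily the application's defaults.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Two synthetic groups of 10 "spectra" with 5 features each (illustrative data)
group_a = rng.normal(loc=0.0, scale=1.0, size=(10, 5))
group_b = rng.normal(loc=10.0, scale=1.0, size=(10, 5))
X = np.vstack([group_a, group_b])

# Build the merge tree (the data behind a dendrogram) with Ward linkage
Z = linkage(X, method="ward")

# Cut the tree so that at most two flat clusters remain
labels = fcluster(Z, t=2, criterion="maxclust")
```

The linkage matrix `Z` can be passed to `scipy.cluster.hierarchy.dendrogram` for plotting; cutting at a different `t` gives a different number of groups without recomputing the tree.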
Statistical Methods
Hypothesis testing and correlation analysis:
| Method | Purpose | Data Type |
|---|---|---|
| Pairwise Tests | Compare two groups | t-test (normal), Mann-Whitney (non-parametric) |
| ANOVA | Compare multiple groups | F-test (normal), Kruskal-Wallis (non-parametric) |
| Correlation Analysis | Find relationships | Pearson, Spearman, Kendall |
| Band Ratio Analysis | Calculate biochemical ratios | Protein/Lipid, Amide I/II, etc. |
| Peak Detection | Identify significant peaks | Automatic peak finding |
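The pairwise row can be sketched with SciPy as follows. This page does not say how the application chooses between the parametric and non-parametric test, so the Shapiro-Wilk normality check, the Welch variant of the t-test, and the helper name `compare_two_groups` are all assumptions made for illustration. The data are synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic band intensities for two groups (illustrative values only)
healthy = rng.normal(loc=1.0, scale=0.2, size=25)
disease = rng.normal(loc=1.3, scale=0.2, size=25)

def compare_two_groups(a, b, alpha=0.05):
    """Hypothetical helper: pick a two-group test and report Cohen's d."""
    # Assumed rule: use the t-test only if both groups pass Shapiro-Wilk
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if normal:
        name = "Welch t-test"
        result = stats.ttest_ind(a, b, equal_var=False)
    else:
        name = "Mann-Whitney U"
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
    # Cohen's d using the pooled sample standard deviation
    pooled_var = ((len(a) - 1) * np.var(a, ddof=1)
                  + (len(b) - 1) * np.var(b, ddof=1)) / (len(a) + len(b) - 2)
    d = (np.mean(b) - np.mean(a)) / np.sqrt(pooled_var)
    return name, result.pvalue, d

test_name, p_value, effect_size = compare_two_groups(healthy, disease)
```

Reporting the effect size alongside the p-value matters because, with many spectra, very small differences can reach statistical significance.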
Machine Learning
Supervised classification algorithms:
| Algorithm | Type | Best For |
|---|---|---|
| Support Vector Machine (SVM) | Margin-based | High-dimensional, small sample |
| Random Forest | Ensemble tree | Robust, interpretable |
| XGBoost | Gradient boosting | High accuracy, large datasets |
| Logistic Regression | Linear | Baseline, interpretability |
Note: Multi-Layer Perceptron (MLP) and PLS-DA are planned for future releases.
Validation Methods:
Stratified K-Fold
Leave-One-Patient-Out (LOPOCV)
Time-series splits
Interpretability:
SHAP values (feature importance)
Permutation importance
Partial dependence plots
Decision boundary visualization
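Of the interpretability options listed, permutation importance is the simplest to sketch with scikit-learn alone. The example below is a sketch on synthetic data in which only one feature (index 3, chosen arbitrarily) carries class information; it is not the application's own implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 20 features, class decided by feature 3 only
X = rng.normal(size=(200, 20))
y = (X[:, 3] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature on held-out data and measure the drop in accuracy
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
top_feature = int(np.argmax(result.importances_mean))
```

Computing the importance on held-out data, as here, measures what the model relies on for generalization rather than what it memorized during training.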
Quick Method Selector
By Research Question
"Are my groups different?"
Start with: PCA (visual)
Confirm with: Statistical Tests (quantitative)
"What are the key differentiating features?"
Use: Band Ratio Analysis
Or: SHAP Values from ML models
"Can I predict group membership?"
Use: Machine Learning (classification)
Validate with: GroupKFold
"What are the pure components in my mixture?"
Use: MCR-ALS (spectral unmixing)
"How many clusters exist in my data?"
Use: Hierarchical Clustering (dendrogram)
Or: Elbow Method with K-means
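The elbow method mentioned in the last item can be sketched as follows. The data are synthetic (three well-separated blobs), and locating the elbow by the largest second difference of the inertia curve is one simple heuristic assumed here; in practice the curve is usually also inspected visually.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three well-separated synthetic clusters (illustrative data)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)

# Within-cluster sum of squares (inertia) for k = 1..7
ks = list(range(1, 8))
inertias = np.array([
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in ks
])

# Assumed heuristic: the elbow is where the curve bends most sharply,
# i.e. the largest second difference. second_diff[j] is centred on k = j + 2.
second_diff = np.diff(inertias, n=2)
elbow_k = int(np.argmax(second_diff)) + 2
```

On real spectra the bend is often much less pronounced than in this toy case, which is why the dendrogram is listed as the first option.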
By Data Characteristics
Small sample size (n < 30 per group)
Avoid: Deep learning, complex models
Use: Random Forest, SVM with careful validation
Validate with: LOPOCV
Large sample size (n > 100 per group)
Can use: All methods, including deep learning
Recommended: XGBoost, Neural Networks
Validate with: Hold-out test set + cross-validation
High class imbalance (e.g., 90% healthy, 10% disease)
Use: Stratified sampling, SMOTE
Metrics: ROC-AUC, F1-score (not accuracy)
See: Handling Imbalance
Multiple groups (>2)
Exploratory: PCA, UMAP with color-coded groups
Statistical: ANOVA with post-hoc tests
ML: Multi-class classification
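The advice to avoid accuracy under class imbalance can be illustrated with the 90/10 example above. This sketch uses scikit-learn metrics on a deliberately useless classifier that always predicts the majority class; the labels and scores are constructed by hand for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# 90 healthy (label 0) and 10 disease (label 1) samples
y_true = np.array([0] * 90 + [1] * 10)

# A trivial classifier: always predicts "healthy" with the same score
y_pred = np.zeros_like(y_true)
y_score = np.full(len(y_true), 0.1)

accuracy = accuracy_score(y_true, y_pred)        # looks good despite no skill
f1 = f1_score(y_true, y_pred, zero_division=0)   # no disease case is detected
auc = roc_auc_score(y_true, y_score)             # constant scores give chance level
```

Accuracy comes out at 0.9 even though no disease sample is ever detected, while F1 (0.0) and ROC-AUC (0.5) expose the lack of discrimination.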
Method Selection Flowchart
graph TD
A[Start] --> B{Research Goal?}
B -->|Explore| C{Known Groups?}
B -->|Test Hypothesis| D{How Many Groups?}
B -->|Predict| E{Labeled Data?}
C -->|Yes| F[PCA with Groups]
C -->|No| G[Hierarchical Clustering]
D -->|2 groups| H[t-test or Mann-Whitney]
D -->|>2 groups| I[ANOVA or Kruskal-Wallis]
E -->|Yes| J{Sample Size?}
E -->|No| K[Unsupervised Learning]
J -->|Small| L[Random Forest + LOPOCV]
J -->|Large| M[XGBoost + Hold-out Test]
F --> N[Examine Loadings]
G --> O[Cut Dendrogram]
H --> P[Calculate Effect Size]
I --> P
L --> Q[SHAP Interpretation]
M --> Q
K --> F
Preprocessing Guidelines
Recommended Pipeline for Raman Data
Minimum pipeline:
1. Baseline Correction (AsLS or AirPLS)
2. Smoothing (Savitzky-Golay)
3. Normalization (Vector or SNV)
For noisy data:
1. Cosmic Ray Removal
2. Baseline Correction (AirPLS)
3. Smoothing (Savitzky-Golay, window=11)
4. Normalization (SNV)
5. Outlier Detection & Removal
For classification:
1. Baseline Correction (AsLS)
2. Smoothing (Savitzky-Golay, window=7)
3. Normalization (Vector)
4. (Optional) Peak Ratio Features
For biomarker discovery:
1. Baseline Correction (AirPLS)
2. Minimal Smoothing (window=5)
3. Normalization (SNV)
4. NO derivative (preserves peak positions)
See Preprocessing: Recommended Pipelines for detailed guidance.
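The minimum pipeline listed above (AsLS baseline correction, Savitzky-Golay smoothing, vector normalization) can be sketched with NumPy and SciPy. This is an illustration under stated assumptions, not the application's code: the AsLS routine follows the commonly used asymmetric least squares scheme of Eilers and Boelens, and the parameter values (`lam`, `p`, `n_iter`, `window`, `polyorder`) and the synthetic spectrum are chosen only for the example.

```python
import numpy as np
from scipy import sparse
from scipy.signal import savgol_filter
from scipy.sparse.linalg import spsolve

def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline estimate (illustrative parameters)."""
    n = len(y)
    # Second-difference operator that penalizes baseline curvature
    D = sparse.diags([1, -2, 1], [0, 1, 2], shape=(n - 2, n), format="csc")
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        W = sparse.diags(w, 0, format="csc")
        z = spsolve((W + penalty).tocsc(), w * y)
        # Points above the baseline (peaks) get the small weight p
        w = np.where(y > z, p, 1 - p)
    return z

def minimum_pipeline(spectrum, window=7, polyorder=3):
    """Baseline correction, then smoothing, then vector normalization."""
    corrected = spectrum - asls_baseline(spectrum)
    smoothed = savgol_filter(corrected, window_length=window, polyorder=polyorder)
    return smoothed / np.linalg.norm(smoothed)

# Synthetic spectrum: one peak at 1000 cm^-1 on a sloping background plus noise
rng = np.random.default_rng(0)
wavenumbers = np.arange(400.0, 1800.0, 2.0)
peak = np.exp(-0.5 * ((wavenumbers - 1000.0) / 8.0) ** 2)
background = 0.5 + 0.0005 * (wavenumbers - 400.0)
raw = peak + background + rng.normal(0.0, 0.01, wavenumbers.size)

processed = minimum_pipeline(raw)
```

The order matters: smoothing before baseline correction can blur the boundary between peaks and background, and normalizing before baseline removal would scale spectra by their fluorescence rather than by their Raman signal.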
Validation Best Practices
Critical Rules
Never use test data for preprocessing
Fit preprocessing on training data only
Transform test data using training parameters
Use patient-level splitting
If there are multiple spectra per patient, keep each patient's spectra together in the same fold
Use GroupKFold with patient IDs
See: GroupKFold Guide
Report all metrics
Accuracy, Precision, Recall, F1, ROC-AUC
Confusion matrix
Confidence intervals or standard deviations
See: Reporting Results
Validate preprocessing choices
Test multiple preprocessing pipelines
Use cross-validation to select pipeline
Document final choices
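The first two rules can be sketched together with scikit-learn. Wrapping the preprocessing step and the classifier in one pipeline means the scaler is refit on the training folds only, and `GroupKFold` keeps every patient's spectra in a single fold. The data, the patient IDs, and the choice of `StandardScaler` with a linear SVM are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic data: 12 patients, 5 spectra each, 50 features per spectrum
n_patients, spectra_per_patient, n_features = 12, 5, 50
patient_ids = np.repeat(np.arange(n_patients), spectra_per_patient)
patient_labels = np.arange(n_patients) % 2      # one class label per patient
y = patient_labels[patient_ids]
X = rng.normal(size=(len(y), n_features)) + 0.8 * y[:, None]

# The scaler is fit inside each training fold, never on test spectra
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cv = GroupKFold(n_splits=4)
scores = cross_val_score(model, X, y, groups=patient_ids, cv=cv, scoring="accuracy")

# Check that no patient appears on both sides of any split
leak_free = all(
    not set(patient_ids[train_idx]) & set(patient_ids[test_idx])
    for train_idx, test_idx in cv.split(X, y, groups=patient_ids)
)
```

For leave-one-patient-out validation, scikit-learn's `LeaveOneGroupOut` can be swapped in for `GroupKFold` with the same `groups` argument. Reporting `scores.mean()` together with `scores.std()` covers the spread requested under "Report all metrics".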
Parameter Selection Guides
Each method page includes:
Parameter Tables
Example from PCA:
| Parameter | Type | Default | Range | Recommendation |
|---|---|---|---|---|
| n_components | int | 3 | 2-10 | Start with 2-3 for visualization, 5-10 for detailed analysis |
| scaling | str | StandardScaler | StandardScaler, MinMaxScaler, None | Always use StandardScaler for Raman data |
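For orientation, the two parameters in this table map naturally onto scikit-learn objects. The sketch below assumes that `scaling` corresponds to a scaler applied before PCA and that `n_components` is passed through to `sklearn.decomposition.PCA`; how the application wires these internally is not stated here. The input is random data standing in for a matrix of spectra.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Placeholder data: 60 spectra with 200 wavenumber channels each
X = rng.normal(size=(60, 200))

# scaling = StandardScaler, n_components = 3 (the defaults from the table)
pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=3))
scores = pca_pipeline.fit_transform(X)

pca = pca_pipeline.named_steps["pca"]
explained = pca.explained_variance_ratio_   # fraction of variance per component
loadings = pca.components_                  # one row of channel weights per component
```

The `scores` array is what a PCA scatter plot shows, and each row of `loadings` can be plotted against wavenumber to see which bands drive a component.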
Visual Parameter Guides
Example effects of changing parameters, with figures showing:
Before/after comparisons
Parameter sweep results
Optimal parameter ranges highlighted
Decision Trees
For complex methods, decision trees guide parameter selection:
Is your data noisy?
├─ Yes → Use higher lambda in baseline correction
└─ No → Use default lambda
Are peaks sharp or broad?
├─ Sharp → Use smaller smoothing window (5-7)
└─ Broad → Use larger smoothing window (11-15)
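The reason for the smoothing-window rule can be seen on a noise-free synthetic peak. The sketch below uses SciPy's `savgol_filter`; the peak width and the polynomial order of 2 are assumptions picked for the example.

```python
import numpy as np
from scipy.signal import savgol_filter

# A sharp Gaussian peak of unit height (standard deviation of 2 data points)
x = np.arange(200)
sharp_peak = np.exp(-0.5 * ((x - 100) / 2.0) ** 2)

# Same filter, two window lengths from the ranges in the decision tree
small_window = savgol_filter(sharp_peak, window_length=5, polyorder=2)
large_window = savgol_filter(sharp_peak, window_length=15, polyorder=2)

height_small = small_window.max()
height_large = large_window.max()
```

With the small window the peak keeps almost all of its height, while the large window flattens it noticeably. For broad peaks this distortion is much smaller, so the larger window's extra noise suppression comes at little cost.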
Interpretation Guides
Every method includes:
Example Results
Annotated figures showing typical outputs
Good vs problematic results
Edge cases and how to handle them
What to Report
Required information for publications
Suggested figure panels
Statistical reporting guidelines
Common Misinterpretations
Frequent mistakes
How to avoid overinterpretation
Validity checks
Citations and References
Using This Documentation in Publications
If you use these methods, cite:
This software:
Rozain, M. H. (2025). Raman Spectroscopy Analysis Application. University of Toyama. https://github.com/zerozedsc/Raman-Spectroscopy-Analysis-Application
Original method papers (provided for each method)
Key libraries:
scikit-learn: Pedregosa et al., 2011
RamanSPy: Georgiev et al., 2023
pybaselines: Erb et al., 2023
Bibliography
Complete bibliography with DOIs is provided in References.
Contributing Method Documentation
Want to add a new method or improve existing documentation?
Follow the Contributing Guide
Include all required sections (Purpose, Theory, Parameters, etc.)
Provide at least one working example
Cite primary literature
Submit via pull request
See Contributing Guide for details.
Support
For method-specific questions:
Check the FAQ
Search GitHub Discussions
Review Troubleshooting Guide
For method requests or bug reports:
Open an issue on GitHub