Frequently Asked Questions (FAQ)

General Questions

What is this application for?

The Raman Spectroscopy Analysis Application is designed for analyzing Raman spectroscopy data, particularly for disease detection and biomedical research. It provides preprocessing, exploratory analysis, statistical testing, and machine learning classification in an easy-to-use desktop interface.

Who developed this software?

This software was developed by Muhammad Helmi bin Rozain as a final-year project for a BSc in Information Intelligence Engineering (Intelligent Information Engineering course, Faculty of Engineering; fourth-year undergraduate) at the University of Toyama, in the Laboratory for Clinical Photonics and Information Engineering (臨床光情報工学研究室), supervised by Oshima Yusuke (大嶋佑介) and Taketani Akinori (竹谷皓規).

Is this software free?

Yes, this software is open-source and released under the MIT License. You can use it freely for academic research, commercial applications, and personal projects.

Can I use this for clinical diagnosis?

No. This software is intended for research use only and is not approved for clinical diagnostic purposes. Always consult qualified medical professionals for medical decisions.

What platforms are supported?

Windows: 10/11 (fully supported, installer and portable versions available)
macOS: 10.14+ (supported from source)
Linux: Ubuntu 18.04+ (supported from source)

Installation Questions

Do I need Python installed?

For source installation: Yes, Python 3.12+ is required
For Windows executable/installer: No, Python is bundled

Why is the executable so large (375 MB)?

The executable bundles Python, all libraries, and dependencies for complete portability. This lets it run on a supported Windows system (10/11) without a separate Python installation.

How do I update to a new version?

From source:

git pull origin main
uv pip install -e .

From executable: Download the latest version from Releases and replace the old executable.

Can I install on a computer without internet?

Yes, the portable executable runs completely offline once downloaded. For source installation, download dependencies on a connected computer and transfer them.

Data Questions

What file formats are supported?

Fully supported:

CSV (.csv)
TXT (.txt)
ASCII (.asc, .ascii)
Pickle (.pkl) (Python/pandas)

Folder import:

A folder containing multiple single-spectrum .txt files
A folder containing multiple single-spectrum .asc/.ascii files

Planned support (not implemented yet):

SPC (Galactic)
WDF (Renishaw WiRE)

What data structure is required?

Recommended format:

Rows: Wavenumber values (cm⁻¹)
Columns: Individual spectra
First row: Optional column headers (spectrum IDs)
First column: Wavenumber values

Example CSV:

Wavenumber,Sample1,Sample2,Sample3
400,125.3,134.2,128.7
401,126.1,135.4,129.3
...
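As a minimal sketch of what this layout means in practice, the snippet below parses the example with Python's standard csv module. The function name read_spectra_csv is hypothetical and is not part of the application; it assumes a header row is present and that every value is numeric.

```python
import csv
import io


def read_spectra_csv(handle):
    """Parse a CSV whose first column is wavenumber and whose other columns are spectra.

    Returns (wavenumbers, spectra), where spectra maps each column header
    to its list of intensities. Assumes a header row is present.
    """
    reader = csv.reader(handle)
    header = next(reader)
    names = header[1:]  # spectrum IDs; header[0] labels the wavenumber column
    wavenumbers = []
    spectra = {name: [] for name in names}
    for row in reader:
        if not row:
            continue  # skip blank lines
        wavenumbers.append(float(row[0]))
        for name, value in zip(names, row[1:]):
            spectra[name].append(float(value))
    return wavenumbers, spectra


example = io.StringIO(
    "Wavenumber,Sample1,Sample2,Sample3\n"
    "400,125.3,134.2,128.7\n"
    "401,126.1,135.4,129.3\n"
)
wavenumbers, spectra = read_spectra_csv(example)
print(wavenumbers)
print(spectra["Sample2"])
```

To read a file on disk instead, pass an open file object, for example open(path, newline="").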

My data has x-axis in nm, not cm⁻¹. What should I do?

Convert wavelength (nm) to wavenumber (cm⁻¹):

Formula: Wavenumber = 10,000,000 / wavelength (nm)

Example:

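785 nm laser → 12,739 cm⁻¹ (10,000,000 / 785 ≈ 12,738.9)
For Raman shift, subtract the scattered-light wavenumber from the excitation wavenumber: Raman shift = 10,000,000 / excitation wavelength (nm) - 10,000,000 / scattered wavelength (nm)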
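The conversion can be sketched in a few lines of Python. The function names are illustrative only, and the 850 nm scattered wavelength is an arbitrary example value, not taken from the application.

```python
def wavelength_to_wavenumber(wavelength_nm):
    """Absolute wavenumber in cm^-1 from a wavelength in nm (1 cm = 1e7 nm)."""
    return 1e7 / wavelength_nm


def raman_shift(excitation_nm, scattered_nm):
    """Raman shift in cm^-1: excitation wavenumber minus scattered wavenumber."""
    return wavelength_to_wavenumber(excitation_nm) - wavelength_to_wavenumber(scattered_nm)


# 785 nm excitation, light detected at 850 nm (hypothetical example value)
print(round(wavelength_to_wavenumber(785.0), 1))
print(round(raman_shift(785.0, 850.0), 1))
```

With this sign convention, Stokes-scattered light (longer wavelength than the laser) gives a positive Raman shift.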

Can I import multiple files at once?

Yes, use the batch import feature:

Click Import Data in the Data Package tab
Select multiple files (Ctrl+Click or Shift+Click)
All files will be imported as separate datasets

How do I handle replicates?

Option 1: Average replicates during import

Select “Average Replicates” in the import dialog
Specify replicate pattern (e.g., Sample1_rep1, Sample1_rep2)

Option 2: Keep replicates separate

Import all spectra individually
Use Groups to organize (e.g., “Sample1” group contains all Sample1 replicates)
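For readers who prefer to average replicates before import, the idea behind Option 1 can be sketched as follows. This is not the application's implementation: the function name and the regular expression are assumptions that match the Sample1_rep1 naming shown above, and all spectra are assumed to share one wavenumber axis.

```python
import re
from collections import defaultdict


def average_replicates(spectra, pattern=r"^(?P<sample>.+)_rep\d+$"):
    """Average spectra whose names share a sample prefix.

    spectra maps name -> list of intensities on a common wavenumber axis.
    Names that do not match the pattern are kept as their own group.
    """
    groups = defaultdict(list)
    for name, intensities in spectra.items():
        match = re.match(pattern, name)
        key = match.group("sample") if match else name
        groups[key].append(intensities)
    # Point-by-point mean across the replicates of each sample
    return {
        key: [sum(values) / len(values) for values in zip(*members)]
        for key, members in groups.items()
    }


replicates = {
    "Sample1_rep1": [10.0, 20.0, 30.0],
    "Sample1_rep2": [12.0, 22.0, 32.0],
    "Sample2_rep1": [5.0, 5.0, 5.0],
}
averaged = average_replicates(replicates)
print(averaged)
```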

Preprocessing Questions

What preprocessing should I use?

Minimum recommended pipeline:

Baseline Correction (AsLS or AirPLS)
Smoothing (Savitzky-Golay, window=11)
Normalization (Vector or SNV)

See Preprocessing Guide for specific use cases.

What is the difference between AsLS and AirPLS?

AsLS (Asymmetric Least Squares): Fast, works well for smooth baselines
AirPLS (Adaptive Iteratively Reweighted Penalized Least Squares): Better for complex baselines with sharp peaks

Try both and compare the preview. AsLS is a good starting point.

Should I normalize before or after baseline correction?

Always apply baseline correction first, then normalization:

Baseline correction removes additive background
Normalization handles multiplicative intensity differences

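If you normalize first, the scaling factor is computed from signal plus background, so differences in background level distort the relative peak intensities.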
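A toy illustration of why the order matters, using only the Python standard library. Subtracting the minimum is a crude stand-in for a real baseline method (it is not AsLS or AirPLS), and the two spectra are made-up data with the same peak shape on different constant backgrounds.

```python
import math


def subtract_offset(spectrum):
    """Crude stand-in for baseline correction: remove a constant background."""
    floor = min(spectrum)
    return [value - floor for value in spectrum]


def vector_normalize(spectrum):
    """Scale a spectrum to unit Euclidean (L2) norm."""
    norm = math.sqrt(sum(value * value for value in spectrum))
    return [value / norm for value in spectrum]


shape = [0.0, 1.0, 4.0, 1.0, 0.0]  # same underlying peak in both spectra
low_background = [value + 10.0 for value in shape]
high_background = [value + 50.0 for value in shape]

# Recommended order: baseline first, then normalization.
correct_a = vector_normalize(subtract_offset(low_background))
correct_b = vector_normalize(subtract_offset(high_background))

# Reversed order: the background inflates the norm used for scaling.
reversed_a = subtract_offset(vector_normalize(low_background))
reversed_b = subtract_offset(vector_normalize(high_background))

peak = 2  # index of the peak maximum
print("baseline then normalize:", correct_a[peak], correct_b[peak])
print("normalize then baseline:", reversed_a[peak], reversed_b[peak])
```

In the recommended order the two spectra come out identical; in the reversed order the same peak ends up with very different heights purely because of the background level.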

Can I save my preprocessing pipeline?

Yes:

Build your pipeline in the Preprocessing tab
Click Save Pipeline
Give it a descriptive name (e.g., “MGUS_Classification_Pipeline”)
Load it later with the Load Pipeline button

My preview shows all zeros after preprocessing. What’s wrong?

Common causes:

Parameter out of range: Check parameter values are within valid ranges
Incorrect order: Ensure baseline correction comes before normalization
Data already processed: Don’t apply preprocessing twice
Negative intensities: Some methods require positive intensities only

Check the console/log for error messages.

Analysis Questions

PCA shows no group separation. What should I do?

Possible reasons and solutions:

Groups are actually similar
- Try a supervised method (PLS-DA) instead
- Check if differences are subtle (small effect size)
Preprocessing issue
- Verify baseline is removed
- Check normalization is applied
- Try a different preprocessing pipeline
Outliers dominating
- Use outlier detection and remove bad spectra
- Check for cosmic rays
Need more components
- Try PC2 vs PC3 plot
- Examine scree plot for variance distribution

How many principal components should I use?

For visualization: 2-3 components (PC1 vs PC2 plot)

For analysis: Keep components until cumulative explained variance > 80-90%

For classification: Use scree plot elbow point (typically 5-10 components for Raman data)
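The cumulative-variance rule can be sketched as below. The explained-variance ratios are invented for illustration; in practice they would come from your PCA results. The function name is hypothetical.

```python
from itertools import accumulate


def components_for_variance(explained_variance_ratio, threshold=0.90):
    """Smallest number of leading components whose cumulative ratio reaches the threshold."""
    for count, cumulative in enumerate(accumulate(explained_variance_ratio), start=1):
        if cumulative >= threshold:
            return count
    # Threshold never reached: keep every available component
    return len(explained_variance_ratio)


# Hypothetical per-component explained variance ratios (PC1, PC2, ...)
ratios = [0.55, 0.20, 0.10, 0.06, 0.04, 0.03, 0.02]
print(components_for_variance(ratios, threshold=0.80))
print(components_for_variance(ratios, threshold=0.90))
```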

What is “multiple testing correction” and do I need it?

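When you test at many wavenumbers (~1400 points), some will appear significant by chance: at a 0.05 significance level, about 70 false positives are expected from 1400 independent tests even when no real difference exists. Multiple testing correction adjusts p-values to control either the family-wise error rate (Bonferroni) or the false discovery rate (FDR, Benjamini-Hochberg).

Always use correction when testing across the full spectrum. Bonferroni is conservative; FDR (Benjamini-Hochberg) offers a better balance between false positives and statistical power.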
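To make the difference concrete, here is a small standard-library sketch of both corrections applied to five made-up p-values. It illustrates the standard procedures and is not the application's own code; the function names are chosen for this example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def bonferroni(p_values):
    """Return Bonferroni adjusted p-values (multiplied by the test count, capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


p_values = [0.001, 0.008, 0.02, 0.03, 0.20]  # hypothetical raw p-values
bh = benjamini_hochberg(p_values)
bonf = bonferroni(p_values)
print("BH significant at 0.05:", sum(p < 0.05 for p in bh))
print("Bonferroni significant at 0.05:", sum(p < 0.05 for p in bonf))
```

On this toy input Benjamini-Hochberg keeps more tests significant than Bonferroni, which is the sense in which Bonferroni is the more conservative choice.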

Machine Learning Questions

What algorithm should I choose?

For beginners: Random Forest

Easy to use
Robust to overfitting
Provides feature importance
Few hyperparameters to tune

For best performance: XGBoost

Often highest accuracy
Requires careful hyperparameter tuning
May overfit on small datasets

For interpretability: Logistic Regression

Simple, transparent
Works well for linearly separable data
Fast training

How much data do I need?

Minimum:

20-30 samples per group
At least 5 patients per group (if using LOPOCV)

Recommended:

50-100 samples per group
10+ patients per group

For deep learning:

100+ samples per group minimum
500+ samples for robust models

My model has 100% accuracy. Is that good?

Probably not! 100% accuracy often indicates:

Data leakage - Test data leaked into training (for example, spectra from the same patient in both sets)
Overfitting - Model memorized the training data
Problem is too easy - Groups are perfectly separable (rare)

Check:

Use external validation set
Verify proper data splitting is used
Simplify model and see if performance drops
Check confusion matrix for patterns
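One way to check the splitting is to make sure no patient contributes spectra to both sides of a fold. The sketch below shows leave-one-patient-out splitting, which is how LOPOCV is read here; the function name and patient IDs are hypothetical, and this is not the application's implementation.

```python
def leave_one_patient_out(patient_ids):
    """Yield (held_out_patient, train_indices, test_indices), one fold per patient."""
    for held_out in sorted(set(patient_ids)):
        train = [i for i, pid in enumerate(patient_ids) if pid != held_out]
        test = [i for i, pid in enumerate(patient_ids) if pid == held_out]
        yield held_out, train, test


# One entry per spectrum: which patient it was measured from (made-up IDs)
patient_ids = ["P1", "P1", "P2", "P2", "P2", "P3"]
folds = list(leave_one_patient_out(patient_ids))
for held_out, train, test in folds:
    print(held_out, train, test)
```

If you work with scikit-learn directly, LeaveOneGroupOut in sklearn.model_selection provides the same kind of split when the patient IDs are passed as groups.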

Export and Results Questions

How do I export results?

Each analysis method has an Export Results button that saves:

Figures: High-resolution PNG images (300 DPI)
Data: CSV files with numerical results
Report: Text file with analysis summary

Can I get publication-quality figures?

Yes! Figures are exported at 300 DPI (publication quality). You can also customize:

Figure size
Font sizes
Color schemes
Line widths

In the Settings menu, adjust “Figure Export Settings”.

How do I cite this software?

@software{rozain2025raman,
  author = {Rozain, Muhammad Helmi bin},
  title = {Raman Spectroscopy Analysis Application},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/zerozedsc/Raman-Spectroscopy-Analysis-Application},
  institution = {University of Toyama}
}

Language and Localization Questions

Can I use the interface in Japanese?

Yes! Go to Settings → Interface → Language → 日本語

The application supports:

English (default)
Japanese (日本語)
Malay (planned)

Some text is still in English after changing language. Why?

Some elements (error messages, console output) may not be fully translated yet. We’re continuously improving localization. Report incomplete translations via GitHub Issues.

Still Have Questions?

Documentation

User Guide - Comprehensive tutorials
Analysis Methods Reference - Detailed method documentation
Troubleshooting Guide - Common issues and solutions

Community

GitHub Discussions: Ask questions
GitHub Issues: Report bugs
Email: Contact via @zerozedsc

Contributing

Found an error in the FAQ? Want to add a question?

Fork the repository
Edit docs/faq.md
Submit a pull request

Your contributions help everyone!