Data Import Guide
Complete guide to importing, organizing, and managing spectral data in the application.
Supported File Formats
Primary Formats
CSV Files (Recommended)
Format: Comma-Separated Values
Structure:
Wavenumber,Sample1,Sample2,Sample3
400.0,100.5,98.3,102.1
401.0,101.2,99.1,103.5
402.0,102.8,100.4,104.2
...
Requirements:
First column: Wavenumbers (numeric, ascending)
Subsequent columns: Intensity values for each spectrum
Header row: Sample identifiers (optional but recommended)
Decimal separator: Period (.)
No missing values (use 0 or interpolate)
Example Import:
# File: blood_plasma_data.csv
# Columns: wavenumber, patient_001, patient_002, patient_003
# Rows: 1000+ wavenumber points
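The CSV requirements above can also be checked in a script before a file is handed to the application. The following is a minimal sketch, assuming pandas is installed; `load_spectra_csv` is a hypothetical helper for illustration, not an application function.

```python
import io

import pandas as pd


def load_spectra_csv(source):
    """Load a CSV laid out as: wavenumber column, then one column per spectrum."""
    df = pd.read_csv(source)
    # First column is the wavenumber axis; non-numeric entries raise ValueError.
    wavenumbers = pd.to_numeric(df.iloc[:, 0])
    if wavenumbers.isna().any():
        raise ValueError("missing wavenumber values found")
    if not (wavenumbers.is_monotonic_increasing and wavenumbers.is_unique):
        raise ValueError("wavenumbers must be strictly ascending")
    # Remaining columns are intensities, one spectrum per column.
    spectra = df.iloc[:, 1:].apply(pd.to_numeric)
    if spectra.isna().any().any():
        raise ValueError("missing intensity values found")
    spectra.index = pd.Index(wavenumbers, name="wavenumber")
    return spectra


example = io.StringIO(
    "Wavenumber,Sample1,Sample2,Sample3\n"
    "400.0,100.5,98.3,102.1\n"
    "401.0,101.2,99.1,103.5\n"
    "402.0,102.8,100.4,104.2\n"
)
spectra = load_spectra_csv(example)
print(spectra.shape)  # (number of wavenumber points, number of spectra)
```

The helper accepts a file path or any file-like object, so the same call works on a real file such as `blood_plasma_data.csv`.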
TXT Files (Text Format)
Format: Tab or space-delimited
Structure:
400.0 100.5 98.3 102.1
401.0 101.2 99.1 103.5
402.0 102.8 100.4 104.2
...
Requirements:
Similar to CSV but using tabs or spaces
Optional header row
Consistent delimiter throughout file
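For checking a TXT file outside the application, whitespace-delimited data can be read with pandas. This is a sketch under the assumption that pandas is available; `read_spectra_txt` and its `has_header` flag are hypothetical names.

```python
import io

import pandas as pd


def read_spectra_txt(source, has_header=False):
    """Read tab- or space-delimited spectra; first column is the wavenumber."""
    # sep=r"\s+" treats any run of spaces or tabs as a single delimiter.
    return pd.read_csv(source, sep=r"\s+", header=0 if has_header else None)


space_file = io.StringIO("400.0 100.5 98.3 102.1\n401.0 101.2 99.1 103.5\n")
tab_file = io.StringIO("400.0\t100.5\t98.3\t102.1\n401.0\t101.2\t99.1\t103.5\n")
space_table = read_spectra_txt(space_file)
tab_table = read_spectra_txt(tab_file)
```

Without a header row, pandas numbers the columns from 0, so column 0 holds the wavenumbers.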
ASC/ASCII Files
Format: Text format containing two columns: wavenumber and intensity
Supported extensions: .asc, .ascii
PKL Files
Format: Pickled pandas DataFrame
Supported extension: .pkl
Future Import Support (Planned)
SPC: Galactic SPC binary format
WDF: Renishaw WiRE format
Import Workflow
Step 2: Select Files for Import
Method A: File Browser
Click [Import Data] button
File dialog opens
Navigate to your data directory
Select one or multiple files (CSV/TXT/ASC/PKL)
Click [Open]
Method B: Drag and Drop
Open file explorer (Windows Explorer, Finder)
Navigate to your data files
Drag files directly into the import area
Release to drop
Method C: Paste File Paths
Copy file path(s) from explorer
Click [Import from Path]
Paste paths (one per line for multiple files)
Click [Import]
Step 3: Data Validation
During import, the application validates the file automatically and shows a validation status panel or toast listing items such as:
File format (CSV/TXT/ASC/PKL)
Wavenumber column detected
Number of spectra (samples)
Wavenumber range (e.g., 400–1800 cm⁻¹)
Data integrity checks
Missing values handling (if enabled)
Visual reference: See the Data Package Page screenshot in interface-overview.md.
Validation Checks:
File format compatibility
Wavenumber column detection
Consistent wavenumber spacing
No duplicate wavenumbers
Numeric data types
Missing value handling
Outlier detection (optional)
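Several of these checks concern the wavenumber axis alone and are easy to reproduce. The sketch below, assuming NumPy, shows one way to test for numeric values, duplicates, and consistent spacing; `check_wavenumber_axis` and its tolerance are illustrative assumptions, not the application's actual rules.

```python
import numpy as np


def check_wavenumber_axis(wavenumbers, rel_tol=1e-6):
    """Return a list of problems found in a wavenumber axis (empty if none)."""
    # Conversion raises ValueError if any entry is not numeric.
    axis = np.asarray(wavenumbers, dtype=float)
    problems = []
    if np.isnan(axis).any():
        problems.append("missing wavenumber values")
        return problems
    if len(np.unique(axis)) != len(axis):
        problems.append("duplicate wavenumbers")
    steps = np.diff(axis)
    # Every step must match the first step within the relative tolerance.
    if len(steps) > 0 and not np.allclose(steps, steps[0], rtol=rel_tol, atol=0.0):
        problems.append("inconsistent wavenumber spacing")
    return problems


ok = check_wavenumber_axis([400.0, 401.0, 402.0])
uneven = check_wavenumber_axis([400.0, 401.0, 403.0])
repeated = check_wavenumber_axis([400.0, 401.0, 401.0])
```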
Step 4: Preview and Confirm
The preview dialog shows:
Selected file name
Sample count
Wavenumber range
A preview plot (typically the first few spectra)
Import options (auto-detect wavenumber column, interpolate missing values, etc.)
Cancel / Import actions
Options:
Auto-detect wavenumber column: Automatically identify x-axis
Interpolate missing values: Fill gaps with linear interpolation
Apply baseline correction: Pre-process during import (optional)
Step 5: Confirmation
After import completes, a success notification confirms the number of spectra imported and the source file.
Data Organization
Project Structure
Data is organized hierarchically:
Project: blood_plasma_study/
├── Data Packages/
│ ├── batch1_healthy/
│ │ ├── healthy_001.csv
│ │ ├── healthy_002.csv
│ │ └── metadata.json
│ ├── batch2_disease/
│ │ ├── disease_001.csv
│ │ ├── disease_002.csv
│ │ └── metadata.json
│ └── batch3_validation/
│ └── validation_set.csv
├── Preprocessing Pipelines/
│ └── standard_pipeline.json
└── Results/
├── analysis/
└── ml_models/
Creating Data Packages
A data package is a collection of related spectra.
Create New Package:
Click [+ New Package] in Data Package page
Enter package name: batch1_healthy
Add description (optional): “Healthy controls, batch 1”
Import files into this package
Benefits:
Organize by experimental batch
Group by sample type
Separate training/validation/test sets
Apply batch-specific processing
Metadata Management
Each data package can have metadata:
{
"package_name": "batch1_healthy",
"description": "Healthy control samples from first batch",
"acquisition_date": "2025-12-15",
"laser_power": 50,
"integration_time": 10,
"spectrometer": "RamanSpecPro 5000",
"notes": "Room temperature, 785nm laser"
}
Edit Metadata:
Right-click on data package
Select Edit Metadata
Fill in fields
Click Save
Group Management
Creating Sample Groups
Groups are used for:
Classification labels
Statistical comparisons
Visualization colors
Cross-validation splits
Create Group:
Click [Manage Groups] button
Click [+ New Group]
Enter group details:
Name: Healthy Control
Label: 0 (numeric for ML)
Color: 🟢 Green
Description: “Healthy patients without disease”
Click Create
Common Group Naming:
For Classification:
- Healthy Control (label: 0)
- Disease Group A (label: 1)
- Disease Group B (label: 2)
For Regression:
- Low Concentration (value: 0-5)
- Medium Concentration (value: 5-10)
- High Concentration (value: 10-20)
Assigning Samples to Groups
Method A: Manual Selection
Select samples in data list (Ctrl+Click for multiple)
Right-click → Assign to Group
Select group from dropdown
Click Assign
Method B: Bulk Assignment
Click [Bulk Assign] button
Use pattern matching:
Pattern: healthy_* → Group: Healthy Control
Pattern: disease_* → Group: Disease
Preview assignments
Click Apply
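The pattern matching in Method B can be previewed in a script. This sketch assumes the patterns follow shell-style wildcard rules, which the guide does not state explicitly; `assign_groups` is a hypothetical helper built on the standard-library `fnmatch` module.

```python
from fnmatch import fnmatchcase


def assign_groups(sample_names, pattern_to_group):
    """Map each sample name to the group of the first pattern it matches."""
    assignments = {}
    for name in sample_names:
        for pattern, group in pattern_to_group.items():
            if fnmatchcase(name, pattern):
                assignments[name] = group
                break  # first matching pattern wins
    return assignments


rules = {"healthy_*": "Healthy Control", "disease_*": "Disease"}
samples = ["healthy_001", "disease_001", "blank_01"]
preview = assign_groups(samples, rules)
unassigned = [name for name in samples if name not in preview]
```

Samples that match no pattern are left out of the result, so they can be listed for manual assignment.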
Method C: CSV Mapping
Create a CSV file with sample-to-group mapping:
sample_name,group_label
healthy_001,Healthy Control
healthy_002,Healthy Control
disease_001,Disease
disease_002,Disease
Import:
Click [Import Group Mapping]
Select CSV file
Verify mappings
Click Apply
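Before importing a mapping file, it can help to confirm that every sample has an entry. A minimal sketch using only the standard library; `read_group_mapping` is a hypothetical helper, and the column names are taken from the example file above.

```python
import csv
import io


def read_group_mapping(source):
    """Parse sample_name,group_label rows into a dict."""
    mapping = {}
    for row in csv.DictReader(source):
        mapping[row["sample_name"].strip()] = row["group_label"].strip()
    return mapping


mapping_file = io.StringIO(
    "sample_name,group_label\n"
    "healthy_001,Healthy Control\n"
    "healthy_002,Healthy Control\n"
    "disease_001,Disease\n"
    "disease_002,Disease\n"
)
mapping = read_group_mapping(mapping_file)

# Names of imported samples (illustrative); report any without a mapping row.
imported = ["healthy_001", "disease_001", "disease_003"]
unmapped = [name for name in imported if name not in mapping]
```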
Multi-Group Assignment
Some samples may belong to multiple groups:
Example: Clinical study with multiple factors
Group 1: Disease Status (Healthy, Disease A, Disease B)
Group 2: Gender (Male, Female)
Group 3: Age Range (<30, 30-50, >50)
Enable:
Settings → Data Management → Allow Multiple Groups
Assign samples to multiple group hierarchies
Select active grouping for analysis
Data Validation
Automatic Checks
The application performs the following validation checks on import:
1. Wavenumber Consistency
Check: All spectra must have identical wavenumber axis
✓ All spectra: 400-1800 cm⁻¹, 1000 points
✗ Mismatch detected:
- File 1: 400-1800 cm⁻¹
- File 2: 500-1700 cm⁻¹ (different range)
Solution:
Interpolate to common grid
Crop to common range
Use “Align Wavenumbers” tool
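Cropping to the common range and interpolating onto a shared grid can be combined in one step. The sketch below assumes NumPy and ascending axes; `align_to_common_grid` and the 1 cm⁻¹ default step are illustrative choices, not a description of the Align Wavenumbers tool.

```python
import numpy as np


def align_to_common_grid(spectra, step=1.0):
    """Resample (wavenumbers, intensities) pairs onto their overlapping range."""
    low = max(w[0] for w, _ in spectra)    # largest starting wavenumber
    high = min(w[-1] for w, _ in spectra)  # smallest ending wavenumber
    if low >= high:
        raise ValueError("spectra have no overlapping wavenumber range")
    grid = np.arange(low, high + step / 2, step)
    # np.interp performs piecewise-linear interpolation onto the new grid.
    aligned = np.vstack([np.interp(grid, w, y) for w, y in spectra])
    return grid, aligned


axis_1 = np.arange(400.0, 1801.0, 1.0)   # 400-1800 cm⁻¹
axis_2 = np.arange(500.0, 1701.0, 2.0)   # 500-1700 cm⁻¹, coarser spacing
grid, aligned = align_to_common_grid([(axis_1, 2 * axis_1), (axis_2, axis_2)])
```

Restricting the grid to the overlap avoids extrapolating any spectrum beyond the range it was measured over.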
2. Missing Values
Check: No NaN or infinite values
⚠ Missing values detected:
- Spectrum 15: 3 NaN values at 1200-1202 cm⁻¹
- Spectrum 47: 1 NaN value at 850 cm⁻¹
Solutions:
Linear interpolation (default)
Polynomial interpolation
Remove affected spectra
Manual correction
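Linear interpolation, the default listed above, fills each gap from the nearest valid points on either side. A minimal NumPy sketch; `fill_missing_linear` is a hypothetical name, and the application's own handling may differ in detail.

```python
import numpy as np


def fill_missing_linear(wavenumbers, intensities):
    """Replace NaN or infinite intensities by linear interpolation."""
    x = np.asarray(wavenumbers, dtype=float)
    y = np.array(intensities, dtype=float)  # copy, so the input is not modified
    bad = ~np.isfinite(y)
    if bad.all():
        raise ValueError("spectrum has no finite values to interpolate from")
    # Gaps at either end take the nearest valid value (np.interp does not extrapolate).
    y[bad] = np.interp(x[bad], x[~bad], y[~bad])
    return y


x = [1199.0, 1200.0, 1201.0, 1202.0, 1203.0]
y = [10.0, np.nan, np.nan, np.nan, 18.0]
filled = fill_missing_linear(x, y)
```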
3. Outlier Detection
Check: Identify spectra with unusual intensity values
⚠ Potential outliers:
- Spectrum 32: Intensity >10σ from mean
- Spectrum 88: Negative intensity values
Solutions:
Flag for review (don’t remove yet)
Visual inspection (plot spectrum)
Remove if confirmed (after manual check)
Note in metadata (keep but annotate)
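One possible reading of the two example flags is sketched below with NumPy: each spectrum's mean intensity is compared against the spread of means across the dataset, and negative values are flagged separately. The function name, the use of per-spectrum means, and the threshold parameter are assumptions; the application's outlier criteria are not specified here.

```python
import numpy as np


def flag_outliers(spectra, n_sigma=10.0):
    """Return {spectrum index: [reasons]} for spectra that need review."""
    data = np.asarray(spectra, dtype=float)  # shape: (n_spectra, n_points)
    flags = {}
    means = data.mean(axis=1)
    spread = means.std()
    if spread > 0:
        z = np.abs(means - means.mean()) / spread
        for i in np.flatnonzero(z > n_sigma):
            flags.setdefault(int(i), []).append("mean intensity beyond threshold")
    for i in np.flatnonzero((data < 0).any(axis=1)):
        flags.setdefault(int(i), []).append("negative intensity values")
    return flags


data = np.full((20, 5), 100.0)
data[7] = 1000.0    # unusually intense spectrum
data[3, 2] = -1.0   # one negative reading
flags = flag_outliers(data, n_sigma=3.0)
```

With a plain z-score over n spectra, no single spectrum can exceed √(n−1) standard deviations, so a 10σ threshold can only trigger in datasets of more than about 100 spectra. Smaller datasets need a lower threshold or a robust statistic.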
4. Duplicate Spectra
Check: Detect identical or near-identical spectra
⚠ Duplicates detected:
- Spectra 15 and 47: 99.8% correlation
- Spectra 22 and 23: Identical (100%)
Solutions:
Remove exact duplicates (keep one copy)
Flag near-duplicates (may be technical replicates)
Keep all (if intentional replicates)
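Duplicate candidates can be found by comparing spectra pairwise. This sketch assumes NumPy and uses the Pearson correlation as the similarity measure; the 0.998 threshold mirrors the 99.8% example above, and `find_duplicates` is a hypothetical helper.

```python
import numpy as np


def find_duplicates(spectra, threshold=0.998):
    """Return (i, j, kind) for identical or highly correlated spectrum pairs."""
    data = np.asarray(spectra, dtype=float)  # one spectrum per row
    corr = np.corrcoef(data)                 # pairwise Pearson correlation
    pairs = []
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            if np.array_equal(data[i], data[j]):
                pairs.append((i, j, "identical"))
            elif corr[i, j] >= threshold:
                pairs.append((i, j, "near-duplicate"))
    return pairs


base = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
spectra = [base, base.copy(), np.array([5.0, 3.0, 1.0, 4.0, 2.0]), 2 * base + 1]
pairs = find_duplicates(spectra)
```

Correlation ignores overall scale and offset, so a rescaled copy is reported as a near-duplicate; this is consistent with flagging such pairs for review instead of removing them automatically.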
Manual Validation Tools
Spectrum Viewer
Inspect individual spectra:
Click on spectrum in list
Viewer shows:
Full spectrum plot
Statistics (mean, std, min, max)
Peak detection
Quality metrics
Actions:
Accept: Mark as validated
Reject: Remove from dataset
Edit: Manually correct issues
Notes: Add comments
Batch Validation
Review multiple spectra:
Click [Batch Validation]
Spectra displayed in grid (e.g., 3x3)
Navigate: Next/Previous pages
Actions: Accept, Reject, Flag
Use the on-screen controls for review actions.
Advanced Features
Wavenumber Calibration
Purpose: Correct systematic shifts in wavenumber axis
Calibration Methods:
Reference Peak Calibration
Select known peak (e.g., the polystyrene band near 1001 cm⁻¹)
Specify expected position
Apply linear shift correction
Multi-Peak Calibration
Use multiple reference peaks
Fit polynomial correction curve
Apply non-linear calibration
Workflow:
# Example: Calibrate using the 1001 cm⁻¹ polystyrene peak
1. Click [Calibration] in Data Package page
2. Select calibration standard spectrum
3. Mark expected peak position: 1001 cm⁻¹
4. Detected peak: 1003.5 cm⁻¹
5. Shift: -2.5 cm⁻¹
6. Apply to all spectra in package
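The arithmetic behind the single-peak workflow is a constant offset: shift = expected − detected. The NumPy sketch below reproduces the numbers from the workflow on a synthetic spectrum; `reference_peak_shift` and the search window are illustrative assumptions.

```python
import numpy as np


def reference_peak_shift(wavenumbers, intensities, expected, window=10.0):
    """Return the offset to add to the axis so the detected peak lands on expected."""
    x = np.asarray(wavenumbers, dtype=float)
    y = np.asarray(intensities, dtype=float)
    near = np.abs(x - expected) <= window  # search only around the expected position
    if not near.any():
        raise ValueError("no data points within the search window")
    detected = x[near][np.argmax(y[near])]
    return expected - detected


axis = np.arange(990.0, 1015.0, 0.5)
standard = np.exp(-(((axis - 1003.5) / 2.0) ** 2))  # synthetic peak at 1003.5 cm⁻¹
shift = reference_peak_shift(axis, standard, expected=1001.0)
corrected_axis = axis + shift  # the same offset would be applied to every spectrum
```

Taking the highest sample limits the detected position to the grid spacing; fitting a peak shape around the maximum would give sub-sample precision.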
Data Merging
Combine multiple datasets:
Select data packages to merge
Click [Merge Packages]
Choose merge strategy:
Concatenate: Stack spectra (keep all)
Average: Mean of all spectra per group
Interleave: Alternate between datasets
Handle wavenumber mismatches:
Interpolate: Resample to common grid
Crop: Use common wavenumber range only
Click Merge
Use Cases:
Combine multiple experimental batches
Create larger training sets
Merge technical replicates
Data Splitting
Split dataset into train/validation/test:
Select data package
Click [Split Dataset]
Configure split ratios:
Training: 70%
Validation: 15%
Test: 15%
Choose split strategy:
Random: Random assignment
Stratified: Maintain group proportions
Patient-level: Keep all spectra from one patient together
Click Split
Result: Three new data packages created automatically
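The patient-level strategy splits by patient first and then maps spectra to subsets, so no patient contributes to more than one subset. A minimal standard-library sketch; `patient_level_split`, its seed, and the rounding rule are assumptions for illustration.

```python
import random


def patient_level_split(patient_ids, ratios=(0.70, 0.15, 0.15), seed=0):
    """Return spectrum indices per subset, keeping each patient in one subset."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)  # reproducible shuffle
    n_train = round(ratios[0] * len(patients))
    n_val = round(ratios[1] * len(patients))
    members = {
        "train": set(patients[:n_train]),
        "validation": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),  # whatever remains
    }
    return {
        name: [i for i, pid in enumerate(patient_ids) if pid in group]
        for name, group in members.items()
    }


# 20 patients with 3 spectra each; patient_ids[i] is the patient of spectrum i.
patient_ids = [f"patient_{p:03d}" for p in range(20) for _ in range(3)]
split = patient_level_split(patient_ids)
```

The ratios apply to patients, not spectra, so subset sizes in spectra can deviate from 70/15/15 when patients contribute unequal numbers of spectra.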
Export Data
Export for external use:
In the current application:
The Data Package page can export metadata as JSON.
The Analysis page can export:
Plots: PNG, SVG
Data tables: CSV, XLSX, JSON, TXT, PKL
Options:
Available export options depend on the selected analysis method and output type.
Batch Import
Import multiple files at once:
Click [Batch Import]
Select folder containing CSV files
Options:
Recursive: Include subfolders
Pattern: Filter by filename (e.g., *.csv)
Auto-group: Assign groups by folder name
Preview file list
Click Import All
Progress:
During batch import, the application shows a progress indicator with:
Overall percent complete
Processed/total file count
Current file name
Estimated time remaining
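The Recursive, Pattern, and Auto-group options correspond to a simple file search. The sketch below uses only the standard library to list matching files and derive a group from each file's parent folder; `discover_files` and `auto_group` are hypothetical helpers, not the application's implementation.

```python
import tempfile
from pathlib import Path


def discover_files(folder, pattern="*.csv", recursive=True):
    """List files under folder whose names match pattern, in sorted order."""
    root = Path(folder)
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(p for p in matches if p.is_file())


def auto_group(paths):
    """Map each file to the name of the folder that directly contains it."""
    return {p: p.parent.name for p in paths}


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("healthy/patient_001.csv", "disease/patient_101.csv", "notes.txt"):
        target = Path(tmp, rel)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    found = discover_files(tmp)
    groups = {p.name: g for p, g in auto_group(found).items()}
    top_level_only = [p.name for p in discover_files(tmp, recursive=False)]
```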
Best Practices
File Organization
Recommended folder structure:
data/
├── raw/
│ ├── batch1/
│ │ ├── healthy/
│ │ │ ├── patient_001.csv
│ │ │ └── patient_002.csv
│ │ └── disease/
│ │ ├── patient_101.csv
│ │ └── patient_102.csv
│ └── batch2/
│ └── ...
└── processed/
└── ...
Benefits:
Clear organization by batch and condition
Easy batch import
Automatic group assignment
Simplified version control
Naming Conventions
Files:
Good: patient_001_healthy.csv
Bad: p1.csv
Good: disease_group_a_replicate_1.csv
Bad: data.csv
Groups:
Good: Healthy_Control, Disease_GroupA, Disease_GroupB
Bad: Group1, Group2, G3
Quality Control
Before analysis:
✓ Visual inspection of spectra
✓ Check for outliers
✓ Verify group assignments
✓ Validate wavenumber calibration
✓ Document any issues in metadata
During project:
Keep raw data unchanged
Version processed datasets
Document preprocessing steps
Backup regularly
Troubleshooting
Import Fails
Error: “Could not parse CSV file”
Solutions:
Check delimiter (comma vs tab vs semicolon)
Verify decimal separator (period vs comma)
Check for non-numeric characters
Use UTF-8 encoding
Wavenumber Mismatch
Error: “Spectra have different wavenumber axes”
Solutions:
Use Align Wavenumbers tool
Interpolate to common grid
Crop to common range
Import separately and merge later
Memory Issues
Error: “Out of memory during import”
Solutions:
Import in smaller batches
Close other applications
Enable “Chunked Loading” in settings
Use 64-bit version of application
Missing Groups
Error: “No groups defined for classification”
Solutions:
Create groups first
Assign samples to groups
Verify group labels are correct
Check for unassigned samples
See Also
Interface Overview - Navigate the Data Package page
Preprocessing Guide - Next step after import
FAQ - Data Import - Common questions
Troubleshooting - Detailed error solutions
Next: Preprocessing Guide →