/ Nextflow · nf-core template · v0.1.0-dev

Soil community composition,
from one measurement.

nf-core-soillifeatlas takes raw LC-MS lipidomics spectra and returns a quantitative readout of bacteria, fungi, plants, archaea, animals and protozoa — a cross-kingdom source-decomposition pipeline in a single command.

Method: SIMPER fingerprint decomposition
Atlas: 12,710 features × 18 phyla
Runtime: ~20 min on a laptop · ~5 min on a cluster
CI Status: 32 unit · 11 nf-test · green

§ 02 · Workflow

Five stages.
ClimGrass demo, end‑to‑end.

The pipeline implements the APPLY path for cross-kingdom source decomposition, with an IS/RIE/ref-filter correction stack on top. Every node below maps one-to-one onto a Nextflow module, which wraps one function in framework/*.py.

Fig. 1 · APPLY subworkflow, v0.1. TRAIN path scaffolded, ships v0.2+.

stage 01

Ingest

MS2 · feature table

The pipeline ingests a consensus feature table (Parquet) and MS2 spectra (MGF), both produced upstream by MZmine 4.9.14 IIMN + GNPS2 FBMN. Optional sample metadata (TSV) enables treatment-effect verification.

inputs

soil_intensity.parquet
soil.mgf
sample_metadata.tsv (opt.)

stage 02

Match

SIMPER fingerprint atlas

Sample spectra are matched against the v0.1 SIMPER fingerprint atlas (12,710 cross-batch features across 18 phyla). A match requires a matchms cosine score ≥0.7, a precursor mass error ≤5 ppm, and ≥4 matched peaks. Decomposition depends only on feature_id matching, not on lipid annotations.

artefacts

verified_simper_matches.parquet
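
To make the three match criteria above concrete, here is a minimal sketch of the acceptance filter. It assumes the cosine score and matched-peak count have already been computed (for example by matchms); the `CandidateMatch` record, its field names, and the example m/z values are hypothetical and not taken from the pipeline.

```python
from dataclasses import dataclass

# Thresholds quoted in the Match stage description.
MIN_COSINE = 0.7
MAX_PPM = 5.0
MIN_MATCHED_PEAKS = 4


@dataclass
class CandidateMatch:
    """Hypothetical record for one sample-feature / atlas-feature pairing."""
    feature_id: str
    sample_precursor_mz: float
    atlas_precursor_mz: float
    cosine: float          # assumed to be precomputed upstream
    matched_peaks: int     # assumed to be precomputed upstream


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Absolute precursor mass error in parts per million."""
    return abs(observed_mz - reference_mz) / reference_mz * 1e6


def passes_match_filter(m: CandidateMatch) -> bool:
    """A candidate is kept only if all three criteria hold."""
    return (
        m.cosine >= MIN_COSINE
        and ppm_error(m.sample_precursor_mz, m.atlas_precursor_mz) <= MAX_PPM
        and m.matched_peaks >= MIN_MATCHED_PEAKS
    )


# Made-up candidates, one per failure mode.
candidates = [
    CandidateMatch("F001", 760.5851, 760.5856, 0.91, 7),  # passes all three
    CandidateMatch("F002", 760.5851, 760.5956, 0.95, 9),  # precursor error > 5 ppm
    CandidateMatch("F003", 482.3241, 482.3240, 0.64, 6),  # cosine below 0.7
    CandidateMatch("F004", 482.3241, 482.3240, 0.88, 3),  # fewer than 4 matched peaks
]
verified = [m.feature_id for m in candidates if passes_match_filter(m)]
print(verified)  # -> ['F001']
```

Only the surviving feature_id values matter downstream, which is consistent with the note that decomposition does not depend on lipid annotations.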

stage 03

Correct

IS · RIE(floor=0.20) · ref-filter

Three correction layers from the analysis-19 feedback loop are applied. L2 scales intensities by the spiked internal standard (default LPE 18:0(d7), 100 pmol). L3 applies the response-ionization-efficiency (RIE) correction with a floor of 0.20, which blocks over-amplification of weak ionizers. L5 restricts archaea to the ArchLips reference set.

artefacts

corrected_matrix.parquet
correction_report.tsv
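
The arithmetic behind the three layers can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: it assumes L2 is a simple ratio to the internal-standard intensity scaled by the spiked amount, that L3 divides by the RIE after clamping it to the floor, and that L5 drops archaeal features absent from the reference set. All function names, field names, and numbers are hypothetical.

```python
RIE_FLOOR = 0.20        # floor quoted in the stage description
IS_AMOUNT_PMOL = 100.0  # default spike amount quoted in the stage description


def is_normalise(intensity, is_intensity, is_pmol=IS_AMOUNT_PMOL):
    """L2 (assumed form): express intensity in pmol-equivalents of the internal standard."""
    if is_intensity <= 0:
        raise ValueError("internal standard not detected")
    return intensity / is_intensity * is_pmol


def rie_correct(amount, rie, floor=RIE_FLOOR):
    """L3 (assumed form): divide by the RIE, clamped so weak ionizers are not over-amplified."""
    return amount / max(rie, floor)


def ref_filter(features, reference_ids):
    """L5 (assumed form): keep archaeal features only if they are in the reference set."""
    return [
        f for f in features
        if f["kingdom"] != "archaea" or f["feature_id"] in reference_ids
    ]


# Made-up example values.
is_intensity = 2.0e6
features = [
    {"feature_id": "F1", "kingdom": "bacteria", "intensity": 1.0e6, "rie": 0.50},
    {"feature_id": "F2", "kingdom": "fungi",    "intensity": 4.0e5, "rie": 0.05},
    {"feature_id": "F3", "kingdom": "archaea",  "intensity": 2.0e5, "rie": 1.00},
    {"feature_id": "F4", "kingdom": "archaea",  "intensity": 2.0e5, "rie": 1.00},
]
reference_ids = {"F4"}  # stand-in for the ArchLips validated set

corrected = {
    f["feature_id"]: rie_correct(is_normalise(f["intensity"], is_intensity), f["rie"])
    for f in ref_filter(features, reference_ids)
}
print(corrected)
```

In this example F2 has an RIE of 0.05. Without the floor its amount would be multiplied by 20; with the floor at 0.20 the multiplier is capped at 5.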

stage 04

Decompose

4 methods in parallel

The corrected matrix is decomposed against the SIMPER atlas using four methods run in parallel: non-negative least squares (NNLS), standard Bray-Curtis, enriched-weighted Bray-Curtis, and fold-change-weighted Bray-Curtis (the primary method). Each method writes its own output file, so results can be checked side by side.

artefacts

composition_nnls.parquet
composition_std_bc.parquet
composition_enriched_bc.parquet
composition_fc_weighted_bc.parquet
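
As an illustration of how a weighted Bray-Curtis variant differs from the standard one, here is a small sketch. The text does not specify how the pipeline derives its weights or how distances become a composition, so this covers only the distance itself; the per-feature weights and the fingerprint vectors below are made up.

```python
def bray_curtis(a, b, weights=None):
    """Bray-Curtis dissimilarity between two non-negative vectors.

    With weights w_i this computes sum(w_i * |a_i - b_i|) / sum(w_i * (a_i + b_i));
    weights=None gives the standard, unweighted form.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(a)
    if len(weights) != len(a):
        raise ValueError("weights must match the vector length")
    numerator = sum(w * abs(x - y) for w, x, y in zip(weights, a, b))
    denominator = sum(w * (x + y) for w, x, y in zip(weights, a, b))
    return numerator / denominator if denominator > 0 else 0.0


# Made-up intensities over four shared features.
sample = [4.0, 0.0, 6.0, 2.0]
fingerprints = {
    "bacteria": [5.0, 0.0, 5.0, 2.0],
    "fungi":    [0.0, 6.0, 1.0, 5.0],
}
# Hypothetical fold-change weights: feature 0 is treated as highly discriminating.
fc_weights = [3.0, 1.0, 1.0, 1.0]

for source, fingerprint in fingerprints.items():
    standard = bray_curtis(sample, fingerprint)
    weighted = bray_curtis(sample, fingerprint, fc_weights)
    print(f"{source}: standard={standard:.4f} weighted={weighted:.4f}")
```

Up-weighting a feature makes disagreement on that feature count for more, which is why the weighted and unweighted distances can rank sources differently on real data.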

stage 05

Report

kingdom + phylum · plausibility · treatment

Each run emits kingdom and phylum composition, a plausibility metric (Bray-Curtis vs. the expected kingdom distribution), treatment-effect tests (Mann-Whitney U), and a top-features diagnostic table that flags RIE over-amplification. A MultiQC HTML report ties it together, and a provenance.yaml records git SHA, atlas DOI, container digests, and params.

artefacts

composition.parquet
plausibility.tsv
treatment_effects.tsv
diagnostic.tsv
soillifeatlas_report.html
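
For readers unfamiliar with the Mann-Whitney U test used for the treatment-effect table, the statistic itself can be computed in a few lines. This is a standard-library sketch for illustration only; it computes U without a p-value, and a real run would more likely rely on an existing implementation such as scipy.stats.mannwhitneyu. The group names and proportions are made up.

```python
def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y); U_x counts pairs where an x value exceeds a y value (ties count 0.5)."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Made-up relative abundances of one phylum in two treatment groups.
control = [0.42, 0.45, 0.40, 0.47, 0.44, 0.43]
warmed = [0.51, 0.49, 0.46, 0.53, 0.50, 0.48]

u_control, u_warmed = mann_whitney_u(control, warmed)
print(u_control, u_warmed)  # -> 1.0 35.0
```

A U value close to 0 for one group means almost every value in that group is below every value in the other, which is the pattern a strong treatment effect would produce.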

§ 03 · Run it

Three ways in.
Pick whichever you have time for.

01

Try the demo

The 12-sample ClimGrass bundle

Download the bundle, run it, and inspect the exact input files the pipeline consumes and the output files it produces. Takes ~20 minutes on a laptop.

# fetch demo bundle and run APPLY end-to-end
curl -L -o demo_climgrass.tar.gz \
  https://github.com/soillifeatlas/nf-core-soillifeatlas/releases/\
download/v0.1.0-dev/demo_climgrass.tar.gz
tar -xzf demo_climgrass.tar.gz

nextflow run soillifeatlas/nf-core-soillifeatlas \
  -r v0.1.0-dev -profile docker \
  --mode apply \
  --atlas_path demo_climgrass/atlas \
  --soil_intensity demo_climgrass/soil_intensity.parquet \
  --soil_mgf demo_climgrass/soil.mgf \
  --sample_metadata demo_climgrass/sample_metadata.tsv

02

Run on your data

Configure a run in four steps.

Upload → metadata → instrument → advanced. You walk away with a copy-paste nextflow run command for your laptop, Docker, SLURM, or Google Batch.

Zero upload. The form is entirely client-side — your files never leave your browser. All it produces is a command line.

03

Hosted cloud

Upload it, we run it.
Coming soon, post-funding.

Eventually we'll run the pipeline on Google Batch for you and email the results. The google_batch profile ships as a stub today — execution unlocks when the grant lands.

§ 03.2 · Configure

Configure a run in
four steps — no upload.

The form below never contacts a server. It reads file sizes and names locally, composes a nextflow run command, and hands it back to you. Copy-paste into a terminal that has Nextflow installed and you're off.
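
The command-composition step described above can be sketched like this. The actual configurator runs in the browser; this Python version only illustrates the idea, reusing the flags shown in the demo command. The `build_command` helper and the example paths are hypothetical.

```python
import shlex


def build_command(params, profile="docker", revision="v0.1.0-dev"):
    """Compose a shell-safe `nextflow run` command from a parameter mapping.

    Parameters whose value is None (for example, optional metadata that was
    not provided) are skipped.
    """
    argv = [
        "nextflow", "run", "soillifeatlas/nf-core-soillifeatlas",
        "-r", revision,
        "-profile", profile,
    ]
    for name, value in params.items():
        if value is None:
            continue
        argv += [f"--{name}", str(value)]
    # shlex.join quotes any argument containing spaces or shell metacharacters.
    return shlex.join(argv)


command = build_command({
    "mode": "apply",
    "atlas_path": "demo_climgrass/atlas",
    "soil_intensity": "my data/soil_intensity.parquet",  # note the space
    "soil_mgf": "my data/soil.mgf",
    "sample_metadata": None,  # optional input left out
})
print(command)
```

Quoting through shlex.join rather than plain string concatenation keeps the result paste-safe when a path contains spaces.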

v0.1 scope · mode=apply · POS ion mode · atlas=v0.1.0

§ 04 · Demo bundle

What you get
when you run the demo.

Twelve ClimGrass grassland samples, positive ion mode, already aligned. See exactly what the pipeline consumes and what it produces — every artefact is plain Parquet, TSV, HTML, or YAML.

inputs · 4 items · 18 MB

soil_intensity.parquet (1.8 MB): consensus feature × sample matrix (MZmine 4.9.14 IIMN output)
soil.mgf (14.2 MB): MS2 consensus spectra, one per feature
sample_metadata.tsv (2.1 KB): treatment · replicate · ClimGrass plot id (optional)
atlas/: v0.1.0 reference (simper_fp, RIE, IS masses, ArchLips)
- simper_fingerprint_atlas.parquet
- rie_table_s10.csv
- equisplash_IS_masses_POS.csv
- archlips_validated_features.csv
- expected_kingdom_composition.csv

outputs · 6 items · ~1.6 MB

composition/: one Parquet per decomposition method
- composition_nnls.parquet
- composition_std_bc.parquet
- composition_enriched_bc.parquet
- composition_fc_weighted_bc.parquet
plausibility.tsv (0.9 KB): Bray-Curtis distance to the expected kingdom composition
treatment_effects.tsv (3.4 KB): Mann-Whitney U per phylum × treatment
diagnostic.tsv (2.2 KB): top 50 features driving each phylum (RIE guard)
soillifeatlas_report.html (1.1 MB): MultiQC + custom panels
pipeline_info/provenance.yaml (0.6 KB): git_sha · atlas DOI · container digests · params

§ 05 · Hosted (soon)

Notify me when the
hosted version is live.

Note · This form does nothing yet. The hosted service is scheduled for post-funding (see the Google.org Impact Challenge). In the meantime, the pipeline runs locally (Docker), on a SLURM cluster, or on any Nextflow-compatible compute.
