
/  Nextflow  ·  nf-core template  ·  v0.1.0-dev

Soil community composition,
from one measurement.

nf-core-soillifeatlas takes raw LC-MS lipidomics spectra and returns a quantitative readout of bacteria, fungi, plants, archaea, animals and protozoa — a cross-kingdom source-decomposition pipeline in a single command.

  • Method: SIMPER fingerprint decomposition
  • Atlas: 12,710 features × 18 phyla
  • Runtime: ~20 min laptop · ~5 min cluster
  • CI status: 32 unit · 11 nf-test · green

§  02 · Workflow

Five stages.
ClimGrass demo, end‑to‑end.

The pipeline implements the APPLY path for cross-kingdom source decomposition, with an IS/RIE/ref-filter correction stack on top. Every node below maps one-to-one onto a Nextflow module, which wraps one function in framework/*.py.

Pipeline DAG: Ingest → Match → Correct → Decompose → Report

  • 01 Ingest: MS2 · feature table
  • 02 Match: SIMPER fingerprint atlas
  • 03 Correct: IS · RIE(floor=0.20) · ref-filter
  • 04 Decompose: 4 methods in parallel
  • 05 Report: kingdom + phylum · plausibility · treatment

Fig. 1 · APPLY subworkflow, v0.1. The TRAIN path is scaffolded and ships in v0.2+.

stage 01

Ingest

MS2 · feature table

Inputs are a consensus feature table (Parquet) and MS2 spectra (MGF), both produced upstream by MZmine 4.9.14 IIMN + GNPS2 FBMN. An optional sample metadata file (TSV) unlocks treatment-effect verification.

inputs

  • soil_intensity.parquet
  • soil.mgf
  • sample_metadata.tsv (opt.)

stage 02

Match

SIMPER fingerprint atlas

Sample spectra are matched against the v0.1 SIMPER fingerprint atlas — 12,710 cross-batch features across 18 phyla — using matchms cosine ≥0.7, precursor ≤5 ppm, ≥4 matched peaks. Decomposition depends only on feature_id matching, not lipid annotations.

artefacts

  • verified_simper_matches.parquet
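
The three match thresholds can be read as a simple filter over candidate matches. The sketch below is illustrative only: it assumes the cosine score and matched-peak count have already been computed (for example by matchms), and the `CandidateMatch` record, its field names, and the example values are hypothetical, not the pipeline's actual schema.

```python
from dataclasses import dataclass


@dataclass
class CandidateMatch:
    """Hypothetical record for one sample-spectrum vs atlas-feature comparison."""
    feature_id: str      # atlas feature the spectrum was compared against
    cosine: float        # spectral cosine score, computed upstream
    sample_mz: float     # precursor m/z observed in the soil sample
    atlas_mz: float      # precursor m/z stored in the atlas
    matched_peaks: int   # number of fragment peaks shared by both spectra


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Absolute precursor mass error in parts per million."""
    return abs(observed_mz - reference_mz) / reference_mz * 1e6


def passes_thresholds(m: CandidateMatch,
                      min_cosine: float = 0.7,
                      max_ppm: float = 5.0,
                      min_peaks: int = 4) -> bool:
    """Apply the three thresholds quoted in the text; all must hold."""
    return (m.cosine >= min_cosine
            and ppm_error(m.sample_mz, m.atlas_mz) <= max_ppm
            and m.matched_peaks >= min_peaks)


# Made-up candidates, one failing each criterion in turn.
candidates = [
    CandidateMatch("F0001", 0.91, 760.5856, 760.5851, 6),  # passes all three
    CandidateMatch("F0002", 0.65, 760.5856, 760.5851, 6),  # cosine too low
    CandidateMatch("F0003", 0.88, 760.5950, 760.5851, 7),  # roughly 13 ppm off
    CandidateMatch("F0004", 0.80, 496.3398, 496.3398, 3),  # too few matched peaks
]

verified = [m.feature_id for m in candidates if passes_thresholds(m)]
print(verified)  # ['F0001']
```

Only the surviving `feature_id` values matter downstream, which is consistent with the statement that decomposition does not depend on lipid annotations.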

stage 03

Correct

IS · RIE(floor=0.20) · ref-filter

Three correction layers from the analysis-19 feedback loop. L2 scales intensities by the spiked internal standard (default LPE 18:0(d7), 100 pmol). L3 applies the response-ionization-efficiency correction with a floor of 0.20 to block over-amplification of weak ionizers. L5 restricts archaea to the ArchLips reference set.

artefacts

  • corrected_matrix.parquet
  • correction_report.tsv
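
To make the effect of the RIE floor concrete, here is a minimal sketch of the L2 and L3 layers. It assumes that internal-standard scaling is a simple ratio to the spiked amount and that the RIE correction divides by the tabulated efficiency; both formulas, and the function names, are assumptions for illustration and may differ from the pipeline's implementation.

```python
RIE_FLOOR = 0.20        # floor quoted in the text
IS_AMOUNT_PMOL = 100.0  # spiked amount of the default internal standard


def is_normalise(intensity: float, is_intensity: float,
                 is_amount_pmol: float = IS_AMOUNT_PMOL) -> float:
    """Assumed L2 form: express a raw intensity in pmol-equivalents of the IS."""
    if is_intensity <= 0:
        raise ValueError("internal standard signal must be positive")
    return intensity / is_intensity * is_amount_pmol


def rie_correct(amount: float, rie: float, floor: float = RIE_FLOOR) -> float:
    """Assumed L3 form: divide by the RIE, clamped so it never drops below the floor."""
    return amount / max(rie, floor)


# A feature at half the internal-standard intensity is about 50 pmol-equivalents.
amount = is_normalise(intensity=5e5, is_intensity=1e6)

# A weak ionizer (RIE = 0.05) would be scaled up 20x without the floor;
# with the floor in place the scale-up is capped at 5x.
unfloored = amount / 0.05
floored = rie_correct(amount, rie=0.05)

# A feature whose RIE is above the floor is unaffected by the clamp.
regular = rie_correct(amount, rie=0.5)
```

Under these assumptions the floor caps the multiplier at 1 / 0.20 = 5, which is the sense in which it blocks over-amplification of weak ionizers.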

stage 04

Decompose

4 methods in parallel

The corrected matrix is decomposed against the SIMPER atlas by four methods run in parallel: non-negative least squares (NNLS), standard Bray-Curtis, enriched-weighted Bray-Curtis, and fold-change-weighted Bray-Curtis (the primary method). Each method writes its own output file, so results can be compared side by side.

artefacts

  • composition_nnls.parquet
  • composition_std_bc.parquet
  • composition_enriched_bc.parquet
  • composition_fc_weighted_bc.parquet

stage 05

Report

kingdom + phylum · plausibility · treatment

Each run emits kingdom and phylum composition, a plausibility metric (Bray-Curtis vs. the expected kingdom distribution), treatment-effect tests (Mann-Whitney U), and a top-features diagnostic table that flags RIE over-amplification. A MultiQC HTML report ties it together, and a provenance.yaml records git SHA, atlas DOI, container digests, and params.

artefacts

  • composition.parquet
  • plausibility.tsv
  • treatment_effects.tsv
  • diagnostic.tsv
  • soillifeatlas_report.html
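
For readers unfamiliar with the treatment-effect test, the Mann-Whitney U statistic can be computed by direct pair counting. This is a teaching sketch with made-up abundances, not the pipeline's code; it omits the p-value, which a statistics library such as SciPy (`scipy.stats.mannwhitneyu`) would normally supply.

```python
def mann_whitney_u(x, y):
    """U statistic for group x: number of (x_i, y_j) pairs with x_i > y_j,
    counting ties as half a pair."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical relative abundances of one phylum under two treatments.
control = [0.12, 0.15, 0.11]
warmed = [0.20, 0.18, 0.15]

u_control = mann_whitney_u(control, warmed)
u_warmed = mann_whitney_u(warmed, control)

# The two statistics always sum to the number of pairs (3 * 3 here).
print(u_control, u_warmed)  # 0.5 8.5
```

A U close to 0 or to the total pair count means one group's values sit almost entirely above the other's.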

§  03 · Run it

Three ways in.
Pick whichever you have time for.

  1. Try the demo

    The 12-sample ClimGrass bundle

    Download the bundle, run it, and inspect the exact input files the pipeline consumes and the output files it produces. About 20 minutes on a laptop.

    Inspect demo bundle
    # fetch demo bundle and run APPLY end-to-end
    curl -L -o demo_climgrass.tar.gz \
      https://github.com/soillifeatlas/nf-core-soillifeatlas/releases/\
    download/v0.1.0-dev/demo_climgrass.tar.gz
    tar -xzf demo_climgrass.tar.gz
    
    nextflow run soillifeatlas/nf-core-soillifeatlas \
      -r v0.1.0-dev -profile docker \
      --mode apply \
      --atlas_path demo_climgrass/atlas \
      --soil_intensity demo_climgrass/soil_intensity.parquet \
      --soil_mgf demo_climgrass/soil.mgf \
      --sample_metadata demo_climgrass/sample_metadata.tsv
  2. Run on your data

    Configure a run in four steps.

    Upload → metadata → instrument → advanced. You walk away with a copy-paste nextflow run command for your laptop, Docker, SLURM, or Google Batch.

    Zero upload. The form is entirely client-side — your files never leave your browser. All it produces is a command line.

  3. Hosted cloud

    Upload it, we run it.
    Coming soon, post-funding.

    Eventually we'll run the pipeline on Google Batch for you and email the results. The google_batch profile ships as a stub today — execution unlocks when the grant lands.

§  03.2 · Configure

Configure a run in
four steps — no upload.

The form below never contacts a server. It reads file sizes and names locally, composes a nextflow run command, and hands it back to you. Copy-paste into a terminal that has Nextflow installed and you're off.
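
The idea of composing a command from local file names can be sketched in a few lines. The form itself runs in the browser; the Python below is only an illustration of the same logic, and `compose_command` is a hypothetical helper. The flags are the ones used in the demo command on this page.

```python
import shlex


def compose_command(soil_intensity, soil_mgf, atlas_path,
                    sample_metadata=None,
                    profile="docker", revision="v0.1.0-dev"):
    """Build a copy-paste `nextflow run` command; nothing is executed or uploaded."""
    args = [
        "nextflow", "run", "soillifeatlas/nf-core-soillifeatlas",
        "-r", revision, "-profile", profile,
        "--mode", "apply",
        "--atlas_path", atlas_path,
        "--soil_intensity", soil_intensity,
        "--soil_mgf", soil_mgf,
    ]
    # Sample metadata is optional, so the flag is added only when a file is given.
    if sample_metadata:
        args += ["--sample_metadata", sample_metadata]
    # shlex.join quotes any argument containing spaces or shell metacharacters.
    return shlex.join(args)


cmd = compose_command(
    soil_intensity="demo_climgrass/soil_intensity.parquet",
    soil_mgf="demo_climgrass/soil.mgf",
    atlas_path="demo_climgrass/atlas",
    sample_metadata="demo_climgrass/sample_metadata.tsv",
)
print(cmd)
```

Quoting through `shlex.join` keeps the command safe to paste even when a path contains spaces.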

v0.1 scope · mode=apply · POS ion mode · atlas=v0.1.0

§  04 · Demo bundle

What you get
when you run the demo.

Twelve ClimGrass grassland samples, positive ion mode, already aligned. See exactly what the pipeline consumes and what it produces — every artefact is plain Parquet, TSV, HTML, or YAML.

inputs

4 items · 18 MB

  • soil_intensity.parquet 1.8 MB

    consensus feature × sample matrix (MZmine 4.9.14 IIMN output)

  • soil.mgf 14.2 MB

    MS2 consensus spectra, one per feature

  • sample_metadata.tsv 2.1 KB

    treatment · replicate · ClimGrass plot id (optional)

  • atlas/

    v0.1.0 reference (simper_fp, RIE, IS masses, ArchLips)

    • simper_fingerprint_atlas.parquet
    • rie_table_s10.csv
    • equisplash_IS_masses_POS.csv
    • archlips_validated_features.csv
    • expected_kingdom_composition.csv

outputs

6 items · ~1.6 MB

  • composition/

    one Parquet per decomposition method

    • composition_nnls.parquet
    • composition_std_bc.parquet
    • composition_enriched_bc.parquet
    • composition_fc_weighted_bc.parquet
  • plausibility.tsv 0.9 KB

    BC distance to expected kingdom composition

  • treatment_effects.tsv 3.4 KB

    Mann-Whitney U per phylum × treatment

  • diagnostic.tsv 2.2 KB

    top 50 features driving each phylum (RIE guard)

  • soillifeatlas_report.html 1.1 MB

    MultiQC + custom panels

  • pipeline_info/provenance.yaml 0.6 KB

    git_sha · atlas DOI · container digests · params

See a rendered results page

§  05 · Hosted (soon)

Notify me when the
hosted version is live.

Note · This form does nothing yet. The hosted service is scheduled for post-funding (see the Google.org Impact Challenge). In the meantime, the pipeline runs locally (Docker), on a SLURM cluster, or on any Nextflow-compatible compute.