
About

What is Snakemake?

The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Workflows are described in a human-readable, Python-based language. They can be scaled seamlessly to server, cluster, grid and cloud environments without modifying the workflow definition. Finally, Snakemake workflows can include a description of the required software, which is then deployed automatically to any execution environment.

Basic usage

Snakemake usage is documented extensively in the official Snakemake documentation.

A Snakemake workflow is organized into rules, each of which defines specific input and output files. The files are processed by code that can, for example, be given directly with the shell directive, run as external Python or R scripts, or even take the form of Markdown-based notebooks that are rendered directly.
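As a sketch of the external-script route, the following Python script selects the rows of one country from a CSV table, the same job the select_by_country example below does with xsv. It assumes it is referenced from a rule through the script directive, in which case Snakemake makes a `snakemake` object available that exposes the rule's input, output and wildcards. The file name scripts/select_by_country.py and the function select_by_country are hypothetical, and unlike xsv search (which matches a regular expression) the sketch compares values exactly.

```python
# scripts/select_by_country.py  (hypothetical file name)
# Referenced from a Snakefile rule, for example:
#     rule select_by_country:
#         input: "data/worldcitiespop.csv"
#         output: "by-country/{country}.csv"
#         script: "scripts/select_by_country.py"
import csv


def select_by_country(in_path: str, out_path: str, country: str) -> int:
    """Copy the header and every row whose Country column equals `country`.

    Returns the number of data rows written.
    """
    written = 0
    with open(in_path, newline="") as src:
        reader = csv.DictReader(src)
        with open(out_path, "w", newline="") as dst:
            writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
            writer.writeheader()
            for row in reader:
                # Exact comparison; xsv search would treat the value as a regex.
                if row["Country"] == country:
                    writer.writerow(row)
                    written += 1
    return written


# When run through the script directive, Snakemake injects a global
# `snakemake` object; outside Snakemake this block is skipped.
if "snakemake" in globals():
    select_by_country(
        snakemake.input[0],
        snakemake.output[0],
        snakemake.wildcards.country,
    )
```

Keeping the logic in a plain function means the script can also be tested without running Snakemake at all.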

To get a first impression:

rule select_by_country:
    input:
        "data/worldcitiespop.csv"
    output:
        "by-country/{country}.csv"
    shell:
        "xsv search -s Country '{wildcards.country}' "
        "{input} > {output}"

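In this rule, the input table data/worldcitiespop.csv is searched for the value of the country wildcard, which is also used to construct the output file name. For each requested output file, such as by-country/fr.csv, Snakemake infers the wildcard value (here fr) from the file name, and xsv search keeps only the lines whose Country column matches it. Requesting one such file per country therefore splits the original table by Country into separate files in the by-country/ directory.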
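To illustrate how a wildcard value can be recovered from a requested file name, here is a plain-Python sketch. It is a simplification, not Snakemake's actual implementation: each {name} in the output pattern becomes a named regular-expression group matching one or more characters (Snakemake's default wildcard constraint is ".+"), and the literal parts are escaped. The function name infer_wildcards is hypothetical, and the sketch ignores custom wildcard constraints and repeated wildcard names.

```python
import re
from typing import Optional


def infer_wildcards(pattern: str, path: str) -> Optional[dict]:
    """Match `path` against an output pattern such as "by-country/{country}.csv".

    Returns a dict of wildcard values, or None if the path does not match.
    """
    regex = ""
    last = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        # Escape the literal text before the wildcard, then add a named group.
        regex += re.escape(pattern[last:m.start()])
        regex += f"(?P<{m.group(1)}>.+)"
        last = m.end()
    regex += re.escape(pattern[last:])
    match = re.fullmatch(regex, path)
    return match.groupdict() if match else None


print(infer_wildcards("by-country/{country}.csv", "by-country/fr.csv"))
# {'country': 'fr'}
print(infer_wildcards("by-country/{country}.csv", "data/fr.csv"))
# None
```

The recovered value is what the rule sees as wildcards.country, so the same rule serves every country without listing them in the rule itself.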

Create your own workflows

To do: Add a short description of how to create workflows. Link to the template.