We maintain a list of datasets that are commonly used across groups and projects. Keeping a single shared copy saves storage space and quota for your group or project.
If you believe a dataset qualifies for inclusion in this list, please contact us at m3c-support@zdv.uni-tuebingen.de and explain why the dataset is considered common, which groups are likely to use it, and any other details that justify its placement in the common space. If the dataset requires encryption, please indicate so.
If your dataset is commonly used but small enough to not consume significant storage space, it may be preferable for each project or group to maintain its own copy, as this can lead to faster access times.
Unless noted otherwise (as is currently the case for Kraken 2, HUMAnN and MetaPhlAn), dataset updates can be performed on request; they are done out-of-place and announced on the mailing list.
We explicitly cannot guarantee the correctness of the datasets.
Before relying on them, please try to reproduce previous results.
If you find problems, inconsistencies, etc. with the provided datasets, let us know as soon as possible.
Please note that these datasets were published under various licenses.
We provide the license texts in LICENSE files in the respective dataset subfolders.
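To review the licenses before using a dataset, the LICENSE files can be located with find. The following is a minimal sketch: the function name list_licenses, the DATASETS_ROOT variable and the search depth of 3 are assumptions made for illustration, and the actual folder layout may differ.

```shell
# Root of the shared datasets (parent of the dataset paths on this page).
DATASETS_ROOT="${DATASETS_ROOT:-/mnt/lustre/datasets}"

# Print every file named LICENSE below the given root (default: DATASETS_ROOT).
# The maximum depth of 3 is an assumption about the folder layout.
list_licenses() {
    find "${1:-$DATASETS_ROOT}" -maxdepth 3 -type f -name LICENSE 2>/dev/null | sort
}
```

Calling list_licenses without arguments searches the shared dataset root; passing a directory restricts the search to that dataset.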
/mnt/lustre/datasets/igenomes
/mnt/lustre/datasets/kraken2
and can be used by starting the Kraken 2 call with
kraken2 --db /mnt/lustre/datasets/kraken2 ...
The database was downloaded/generated
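A fuller Kraken 2 call might look like the sketch below. Only the --db path comes from this page; the sample name, the read files, the output file names and the thread count are hypothetical, and the DRY_RUN switch is merely a convenience of this sketch for inspecting the command before running it.

```shell
# Shared Kraken 2 database (path from this page).
KRAKEN2_DB="${KRAKEN2_DB:-/mnt/lustre/datasets/kraken2}"

# Classify one paired-end sample against the shared database.
# Arguments: sample name, forward reads, reverse reads (all hypothetical).
# Set DRY_RUN=1 to print the command instead of running it.
classify_sample() {
    sample=$1
    r1=$2
    r2=$3
    set -- kraken2 --db "$KRAKEN2_DB" --threads "${THREADS:-4}" \
        --paired --report "$sample.kreport" --output "$sample.kraken" "$r1" "$r2"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, DRY_RUN=1 classify_sample s1 s1_R1.fastq s1_R2.fastq prints the assembled command without starting Kraken 2.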
/mnt/lustre/datasets/humann
and can be used with the CLI options
--nucleotide-database /mnt/lustre/datasets/humann/chocophlan
# AND/OR
--protein-database /mnt/lustre/datasets/humann/uniref
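Both options can be combined in a single HUMAnN invocation. The sketch below assumes a single input file and an output directory, both hypothetical; the function name run_humann and the DRY_RUN switch are illustrative and not part of HUMAnN.

```shell
# Shared HUMAnN database folder (path from this page).
HUMANN_DB="${HUMANN_DB:-/mnt/lustre/datasets/humann}"

# Run HUMAnN on one input file with both shared databases.
# Arguments: input reads, output directory (both hypothetical).
# Set DRY_RUN=1 to print the command instead of running it.
run_humann() {
    set -- humann --input "$1" --output "$2" \
        --nucleotide-database "$HUMANN_DB/chocophlan" \
        --protein-database "$HUMANN_DB/uniref"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```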
Alternatively, HUMAnN can be configured to permanently use these locations with
humann_config --update database_folders nucleotide /mnt/lustre/datasets/humann/chocophlan
# AND/OR
humann_config --update database_folders protein /mnt/lustre/datasets/humann/uniref
The databases were downloaded/generated
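After a permanent configuration change it can be worth confirming that HUMAnN really points at the shared folders. The sketch below assumes the current settings were first saved to a file, for example with humann_config --print > humann.cfg; the file name and both function names are hypothetical, and the check is a plain substring match rather than a parser for the configuration format.

```shell
# Succeed if the configuration text on stdin mentions the given path.
config_mentions() {
    grep -qF -- "$1"
}

# Report, for each shared database folder, whether the saved
# configuration dump (first argument) mentions it.
check_humann_config() {
    dump=$1
    for path in /mnt/lustre/datasets/humann/chocophlan \
                /mnt/lustre/datasets/humann/uniref; do
        if config_mentions "$path" < "$dump"; then
            echo "ok: $path"
        else
            echo "missing: $path"
        fi
    done
}
```

A "missing" line suggests that the corresponding humann_config --update call has not been applied for the current user or environment.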
/mnt/lustre/datasets/metaphlan
and can be used with the CLI argument(s)
--bowtie2db /mnt/lustre/datasets/metaphlan
# AND OPTIONALLY
--index ...
The databases were downloaded/generated
mpa_vJan21_CHOCOPhlAnSGB_202103
mpa_vOct22_CHOCOPhlAnSGB_202212
mpa_vJun23_CHOCOPhlAnSGB_202307
mpa_vJun23_CHOCOPhlAnSGB_202403 (default)
mpa_vOct22_CHOCOPhlAnSGB_202403
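A MetaPhlAn call that selects one of the index names listed above explicitly might look like the following sketch. The input file, the output file, the FASTQ input type and the thread count are hypothetical; the index chosen here is only an example from the list, and the DRY_RUN switch is a convenience of this sketch.

```shell
# Shared MetaPhlAn database folder (path from this page).
METAPHLAN_DB="${METAPHLAN_DB:-/mnt/lustre/datasets/metaphlan}"
# One of the index names listed on this page; change as needed.
MPA_INDEX="${MPA_INDEX:-mpa_vOct22_CHOCOPhlAnSGB_202212}"

# Profile one FASTQ file against the shared database.
# Arguments: input reads, output profile (both hypothetical).
# Set DRY_RUN=1 to print the command instead of running it.
profile_sample() {
    set -- metaphlan "$1" --input_type fastq \
        --bowtie2db "$METAPHLAN_DB" --index "$MPA_INDEX" \
        --nproc "${THREADS:-4}" -o "$2"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Pinning the index by name keeps results comparable across runs even if the default changes after a database update.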