
What is stored where

Introduction

We work with multiple storage systems, ranging from large, parallel, shared storage available on multiple servers to small, local storage available on a single server. Some of these storage systems are optimized for high performance (HP), others for high availability (HA), and yet others for slow but cheap long-term archiving of data. The executive summary:

 /apps/software/:   Applications centrally deployed with EasyBuild
 /apps/modules/:    Lmod module files for the centrally deployed software
 /apps/sources/:    Source code for the centrally deployed applications
 /apps/data/:       Centrally deployed reference data sets like the human genome
 /home/${user}/:    Your small home dir for personal settings/configs only
 /${group}/prm*/:   PeRManent dirs: Your group's large, fast dirs for rawdata and final results
 /${group}/arc*/:   ARChive   dirs: Your group's larger, slow dirs for archived rawdata and final results
 /${group}/tmp*/:   TeMPorary dirs: Your group's large, fastest dirs for shared temporary data
 /${group}/scr*/:   SCRatch   dirs: Your group's fast dirs for local temporary data

Please consult the info below and make sure you know what to store where!
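As a rough aid for checking where a given path lives, the summary above can be sketched as a small lookup. This is only an illustration: the function name `storage_class`, the table `STORAGE_CLASSES` and the rule that a storage dir is recognised by a path component such as `prm02` or `tmp03` are assumptions made for this sketch. The backup and cleanup flags follow the statements on this page; scr is treated like tmp because both are described as HP file systems without backups.

```python
import re
from pathlib import PurePosixPath

# Hypothetical lookup table built from the summary on this page.
# backup / auto_cleanup follow the page text: prm and arc are backed up,
# tmp and scr have no backups and old files are deleted automatically.
STORAGE_CLASSES = {
    "prm": {"purpose": "permanent: rawdata and final results", "backup": True, "auto_cleanup": False},
    "arc": {"purpose": "archive: archived rawdata and final results", "backup": True, "auto_cleanup": False},
    "tmp": {"purpose": "temporary: shared temporary data", "backup": False, "auto_cleanup": True},
    "scr": {"purpose": "scratch: local temporary data", "backup": False, "auto_cleanup": True},
}

# Assumption: storage dirs are named <class><digits>, e.g. prm02 or tmp04.
_CLASS_DIR = re.compile(r"^(prm|arc|tmp|scr)\d+$")


def storage_class(path):
    """Return 'home', 'apps', 'prm', 'arc', 'tmp', 'scr', or None if unknown."""
    parts = PurePosixPath(path).parts
    # /home/... and /apps/... are recognised by their top-level dir.
    if len(parts) > 1 and parts[1] in ("home", "apps"):
        return parts[1]
    # Otherwise look for a component such as prm02, arc01, tmp03 or scr01.
    for part in parts:
        match = _CLASS_DIR.match(part)
        if match:
            return match.group(1)
    return None
```

For example, `storage_class("/groups/mygroup/tmp02/run1")` gives `"tmp"`, and `STORAGE_CLASSES["tmp"]["backup"]` then tells you that nothing stored there is backed up (`mygroup` is a made-up group name).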

Top 4 blunders that will result in disaster sooner rather than later

  1. Use many jobs to create a massive IO load on your home dir, making everybody's home dir very slow or worse...
  2. Use a suboptimal data structure or experimental design that results in many thousands of files in a single directory, either by using many small files instead of a relatively small number of large files or by never creating sub dirs.
    As (our) large parallel file systems are optimized for large files, creating very many small files will result in a high load on the meta-data servers, killing performance or worse...
  3. Never clean up and run out of space, crashing both your own jobs and those of all other users from the same group.
  4. Never finish an experiment and forever postpone the task of moving the final results from the HP tmp/scr file systems to the HA prm/arc file systems.
    As the HP tmp/scr file systems have no backups and files older than 90 days are deleted automatically, you will lose your results automagically.
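To spot blunder 2 before the meta-data servers do, you could periodically count the entries per directory. The sketch below is one possible way to do that; the function name `crowded_dirs` and the default threshold of 1000 entries are assumptions chosen for illustration, not limits enforced on our systems.

```python
import os


def crowded_dirs(root, max_entries=1000):
    """Yield (directory, entry_count) for each directory under root
    that directly contains more than max_entries files and sub dirs.

    max_entries is an arbitrary, hypothetical threshold.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        count = len(dirnames) + len(filenames)
        if count > max_entries:
            yield dirpath, count


# Example use (the path is made up):
# for directory, count in crowded_dirs("/groups/mygroup/tmp02/myproject"):
#     print(f"{count:>8}  {directory}")
```

Note that walking a very large tree creates meta-data load of its own, so run a check like this on a project sub dir rather than on a complete file system.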

Details

List of storage devices / mount points

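 S:                                   O
 Path:                                /apps/modules/...
                                      /apps/software/...
                                      /apps/sources/...
 Function:                            Centrally installed applications, their sources and accompanying modules.
                                      Use the module command to locate software (and don't use hard-coded paths to /apps/software/...)
 Soft Quota:                          none (write access only for admins)
 Hard Quota:                          none (write access only for admins)
 Quota Timer:                         none
 Backup:                              yes
 Mounted on User Interface servers:   yes
 Mounted on File Sharing servers:     no
 Mounted on cluster nodes:            original: no; rsynced copy: yes
 Status:                              Online

 S:                                   O
 Path:                                /apps/data/...
 Function:                            Centrally installed reference data sets.
 Soft Quota:                          none (write access only for admins)
 Hard Quota:                          none (write access only for admins)
 Quota Timer:                         none
 Backup:                              yes
 Mounted on User Interface servers:   yes
 Mounted on File Sharing servers:     no
 Mounted on cluster nodes:            original: no; rsynced copy: yes
 Status:                              Online

 S:                                   O
 Path:                                /home/${user}/
 Function:                            Your home dir with personal settings/environment/configs.
 Soft Quota:                          --- (1 GB)
 Hard Quota:                          - (2 GB)
 Quota Timer:                         7 days
 Backup:                              yes
 Mounted on User Interface servers:   yes
 Mounted on File Sharing servers:     yes
 Mounted on cluster nodes:            yes
 Status:                              Online

 S:                                   O
 Path:                                /groups/${group}/prm02/...
 Function:                            Various folders for permanent large data. Layout varies per group.
 Soft Quota:                          + (Several TBs; varies per group)
 Hard Quota:                          ++ (1.5 or 2 * soft limit)
 Quota Timer:                         14 days
 Backup:                              yes
 Mounted on User Interface servers:   yes
 Mounted on File Sharing servers:     no
 Mounted on cluster nodes:            no
 Status:                              Online

 S:                                   O
 Path:                                /groups/${group}/prm02/rawdata/
 Function:                            We suggest a sub folder for raw data, which is typically organised by data type, machine on which the data was produced, data production date or...
 Status:                              Online

 S:                                   O
 Path:                                /groups/${group}/prm02/projects/
 Function:                            We suggest a folder for final results of analysis / experiments, where data is typically not organised by data type, but by research question / project instead.
 Status:                              Online

 S:                                   O
 Path:                                /groups/${group}/prm03/...
 Function:                            Various folders for permanent large data. Layout varies per group.
 Soft Quota:                          + (Several TBs; varies per group)
 Hard Quota:                          ++ (1.5 or 2 * soft limit)
 Quota Timer:                         14 days
 Backup:                              yes
 Mounted on User Interface servers:   yes
 Mounted on File Sharing servers:     no
 Mounted on cluster nodes:            no
 Status:                              Online

 S:                                   D
 Path:                                /groups/${group}/arc01/...
 Function:                            Archive folder for older data you no longer need often, but do want to keep for now.
                                      Stored on slower storage systems with more capacity.
 Soft Quota:                          +++ (Several TBs; varies per group)
 Hard Quota:                          +++++ (1.5 or 2 * soft limit)
 Quota Timer:                         14 days
 Backup:                              yes
 Mounted on User Interface servers:   none
 Mounted on File Sharing servers:     none
 Mounted on cluster nodes:            none
 Status:                              Down
                                      In development; needs testing...

 S:                                   O
 Path:                                /groups/${group}/tmp02/...
 Function:                            High performance file system for temporary data.
                                      Files older than 3 months are automatically deleted.
 Soft Quota:                          +++++ (Several TBs; varies per group)
 Hard Quota:                          +++++++ (1.5 or 2 * soft limit)
 Quota Timer:                         14 days
 Backup:                              no
 Mounted on User Interface servers:   yes
 Mounted on File Sharing servers:     no
 Mounted on cluster nodes:            yes
 Status:                              Online

 S:                                   O
 Path:                                /groups/${group}/tmp03/...
 Function:                            High performance file system for temporary data.
                                      Files older than 3 months are automatically deleted.
 Soft Quota:                          +++++ (Several TBs; varies per group)
 Hard Quota:                          +++++++ (1.5 or 2 * soft limit)
 Quota Timer:                         14 days
 Backup:                              no
 Mounted on User Interface servers:   yes
 Mounted on File Sharing servers:     no
 Mounted on cluster nodes:            yes
 Status:                              Online

 S:                                   O
 Path:                                /groups/${group}/tmp04/...
 Function:                            High performance file system for temporary data.
                                      Files older than 3 months are automatically deleted.
 Soft Quota:                          +++++ (Several TBs; varies per group)
 Hard Quota:                          +++++++ (1.5 or 2 * soft limit)
 Quota Timer:                         14 days
 Backup:                              no
 Mounted on User Interface servers:   yes
 Mounted on File Sharing servers:     no
 Mounted on cluster nodes:            yes
 Status:                              Online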
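
Since files on the tmp file systems listed above are deleted automatically once they are older than 3 months, it can help to list files that are getting close to that age, so you can move final results to a prm dir in time. The sketch below is one way to do this. It assumes the cleanup looks at the modification time, which this page does not specify, and the function name `files_older_than` and the 75 day warning threshold in the example are made up for illustration.

```python
import os
import time


def files_older_than(root, days, now=None):
    """Return a sorted list of files under root whose modification time
    is more than `days` days before `now` (default: the current time).

    Assumption: age is measured by mtime; the real cleanup may use
    another timestamp.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file disappeared or is not readable; skip it
            if mtime < cutoff:
                old.append(path)
    return sorted(old)


# Example use (the path and the 75 day threshold are made up):
# for path in files_older_than("/groups/mygroup/tmp02/myproject", days=75):
#     print(path)
```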
Last modified on 2017-01-10T19:30:49+01:00