wiki:HPC_storage

What is stored where

Introduction

We work with multiple storage systems, ranging from large, parallel, shared storage available on multiple servers to small local storage available on a single server. Some of these storage systems are optimized for high performance (HP), others for high availability (HA) and yet others for slow, but cheap long term archiving of data. The executive summary:

 #
 # Centrally installed software and reference data:
 #
 /apps/software/:          Applications centrally deployed with EasyBuild
 /apps/modules/:           Lmod module files for the centrally deployed software
 /apps/sources/:           Source code for the centrally deployed applications
 /apps/data/:              Centrally deployed reference data sets like the human genome
 #
 # Users:
 #
 /home/${user}/:           Your small home dir for personal settings/configs only
 #
 # Groups:
 #
 /groups/${group}/prm*/:   PeRManent dirs: Your group's large, fast dirs for rawdata and final results
 /groups/${group}/arc*/:   ARChive   dirs: Your group's larger, slow dirs for archived rawdata and final results
 /groups/${group}/tmp*/:   TeMPorary dirs: Your group's large, fastest dirs for shared temporary data
 /groups/${group}/scr*/:   SCRatch   dirs: Your group's fast dirs for local temporary data

Please consult the info below and make sure you know what to store where!

Top 4 blunders that will result in disaster sooner rather than later

  1. Using many jobs to create a massive IO load on your home dir, making everybody's home dir very slow or worse...
  2. Using a suboptimal data structure or experimental design that results in many thousands of files in a directory, either by using many small files instead of a relatively small number of large files or by never creating sub dirs.
    As (our) large parallel file systems are optimized for large files, creating many small files results in a high load on the meta-data servers, killing performance or worse...
  3. Never cleaning up and hence running out of space, crashing both your own jobs as well as those of all other users from the same group.
  4. Never finishing an experiment and postponing the task of moving the final results from the HP tmp/scr file systems to the HA prm/arc file systems forever.
    As the HP tmp/scr file systems have no backups and files older than 90 days are deleted automatically, you will lose your results automagically.
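
A quick way to check whether you are heading for blunder 2 is to count how many files a directory tree contains before it gets out of hand. A minimal example; the path is hypothetical, replace it with your own folder:

    $> find /groups/${group}/tmp02/my_analysis/ -type f | wc -l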

Details

Centrally installed software and reference data @ /apps/…

Software

We deploy software with EasyBuild in a central place on the User Interface (UI) servers. From there the software is synced to various tmp or local storage devices, which may vary per cluster node. Do not use hard-coded paths to tools in your scripts: these will vary per cluster node and may change without notice in case we need to switch from one tmp storage system to another due to (un)scheduled downtime. We use the Lua based module system (Lmod) to make software transparently available on all servers.

  • To get a list of available apps:
    $> module avail
    

  • If your pet app is not available, please use the following procedure:
    1.    First try to install the software in a /groups/${group}/tmp0*/... folder (without EasyBuild).
    2.    Test the software and evaluate if it is useful to do a proper reproducible deployment.
          If yes, continue and otherwise cleanup.
    3.A   If you work here for < 6 months we don't expect you to learn how to use EasyBuild.
          Ask someone from the depad group to deploy the software with an EasyConfig.
          Ask your supervisor first; if he/she is not part of the depad group, you can send a request to the GCC helpdesk.
    3.B   If you work here for > 6 months it's time to learn how to create an EasyConfig for EasyBuild
          and deploy with EasyBuild in a /groups/${group}/tmp0*/... folder (see the sketch further below).
    4.B   Fork the easybuild-easyconfigs repo on GitHub and create a pull request with your new EasyConfig.
    5.B   If you are not yet a member of the depad group: request membership by sending an email to the GCC helpdesk.
          Include in your email:
              * a link to the pull request.
              * the path to the module file created at the end of the deployment with EasyBuild in /groups/${group}/tmp0*/...
          If the EasyConfig is sane and the software was deployed properly, you've passed the test and will be added to the depad group.
    6.B   If you already are a member of the depad group: deploy with EasyBuild in /apps/...
    

Please visit this page to learn how to make an EasyConfig file.

Note: At the moment, we're using the foss-2015b toolchain.
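
For steps 3.B and 6.B, a minimal sketch of a test deployment into a group tmp folder could look like the example below. The EasyBuild module name, the install path and the EasyConfig file name are assumptions and may differ on our systems:

    $> module load EasyBuild
    # Deploy your new EasyConfig into your group's tmp folder and resolve dependencies with --robot
    # (the install prefix and EasyConfig name below are only examples):
    $> eb --robot --installpath=/groups/${group}/tmp02/easybuild myFavoriteApp-1.0-foss-2015b.eb
    # Make the freshly created modules available and test the software:
    $> module use /groups/${group}/tmp02/easybuild/modules/all
    $> module load myFavoriteApp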

  • To load an app in your environment if you don't care about the version:
    $> module load ModuleName
    
  • To load a specific version of an app:
    $> module load ModuleName/Version
    
  • To see which modules are currently active in your environment:
    $> module list
    
    • Note that some modules may have dependencies on others. In that case the dependencies are loaded automatically, so module list may report more loaded modules than you loaded explicitly with module load.
    • Suggested good practice: always run module list after module load in your analysis scripts and write the output to a log file (see the sketch after the GATK example below). This way you can always trace back which versions of which tools and their dependencies were used.
  • If you need multiple conflicting versions of apps for different parts of your analysis, you can also unload a module and load another version:
    $> module load ModuleName/SomeVersion
    $> module list
    $> [analyse some data...]
    $> module unload ModuleName
    $> module load ModuleName/AnotherVersion
    $> module list
    $> [analyse more data...]
    
  • Example for the Genome Analysis Toolkit (GATK)
    List available GATK versions:
      $> module avail GATK
    ----------------------------------- /apps/modules/bio -----------------------------------
       GATK/3.3-0-Java-1.7.0_80    GATK/3.4-0-Java-1.7.0_80    GATK/3.4-46-Java-1.7.0_80 (D)
    
    To select version 3.4-46-Java-1.7.0_80 and check what got loaded:
      $> module load GATK/3.4-46-Java-1.7.0_80
         To execute GATK run: java -jar ${EBROOTGATK}/GenomeAnalysisTK.jar
      $> module list
    Currently Loaded Modules:
      1) GCC/4.8.4                                                  13) libpng/1.6.17-goolf-1.7.20
      2) numactl/2.0.10-GCC-4.8.4                                   14) NASM/2.11.06-goolf-1.7.20
      3) hwloc/1.10.1-GCC-4.8.4                                     15) libjpeg-turbo/1.4.0-goolf-1.7.20
      4) OpenMPI/1.8.4-GCC-4.8.4                                    16) bzip2/1.0.6-goolf-1.7.20
      5) OpenBLAS/0.2.13-GCC-4.8.4-LAPACK-3.5.0                     17) freetype/2.6-goolf-1.7.20
      6) gompi/1.7.20                                               18) pixman/0.32.6-goolf-1.7.20
      7) FFTW/3.3.4-gompi-1.7.20                                    19) fontconfig/2.11.94-goolf-1.7.20
      8) ScaLAPACK/2.0.2-gompi-1.7.20-OpenBLAS-0.2.13-LAPACK-3.5.0  20) expat/2.1.0-goolf-1.7.20
      9) goolf/1.7.20                                               21) cairo/1.14.2-goolf-1.7.20
     10) libreadline/6.3-goolf-1.7.20                               22) Java/1.8.0_45
     11) ncurses/5.9-goolf-1.7.20                                   23) R/3.2.1-goolf-1.7.20
     12) zlib/1.2.8-goolf-1.7.20                                    24) GATK/3.4-46-Java-1.7.0_80
    
    The GATK is written in Java and therefore the Java dependency was loaded automatically. R was also loaded, as some parts of the GATK use R for creating plots/graphs. R itself was compiled from scratch and has a large list of dependencies of its own, ranging from compilers like GCC to graphics libs like libpng. Java and R provide binaries, which can be executed without specifying the path to where they are located on the system: the module system has added the directories where they are located to the ${PATH} environment variable, which is used as the search path for binaries.

    If the GATK were a binary, you could now simply call it without specifying its path, but as the GATK is a Java *.jar we need to call the java binary and specify the path to the GATK *.jar. To make sure we don't need an absolute path to the GATK *.jar hard-coded in our jobs/scripts, the GATK module created an environment variable named ${EBROOTGATK}, so we can resolve the path to the GATK transparently even if it varies per server. The EB stands for EasyBuild, which we use to deploy software. For each module EasyBuild creates an environment variable pointing to the root of where the software was installed, according to the scheme EB + ROOT + [NAMEOFSOFTWAREPACKAGEINCAPITALS]. Hence for myFavoriteApp it would be ${EBROOTMYFAVORITEAPP}. Let's see what's installed in ${EBROOTGATK}:
    $> ls -hl ${EBROOTGATK}
       drwxrwsr-x 2 umcg-pneerincx umcg-depad 4.0K Aug  5 15:59 easybuild
       -rw-rw-r-- 1 umcg-pneerincx umcg-depad  13M Jul  9 23:41 GenomeAnalysisTK.jar
       drwxrwsr-x 2 umcg-pneerincx umcg-depad 4.0K Aug  5 15:58 resources
    
    Hence we can now execute the GATK and verify we loaded the correct version like this:
    $> java -jar ${EBROOTGATK}/GenomeAnalysisTK.jar --version
       3.4-46-gbc02625
    
    Note that we did not have to specify a hard-coded path to java nor to the GATK *.jar.
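
As suggested under good practice above, a minimal sketch of a job script that logs which modules (and hence which tool versions) were used; the module version and log file name are just examples:

    #!/bin/bash
    set -e
    # Load the tools this analysis needs:
    module load GATK/3.4-46-Java-1.7.0_80
    # Log the complete list of loaded modules, so the exact versions used can be traced back later:
    module list 2>&1 | tee -a my_analysis.log
    # Run the actual analysis; note there is no hard-coded path to java nor to the GATK *.jar:
    java -jar ${EBROOTGATK}/GenomeAnalysisTK.jar --version >> my_analysis.log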

Reference data

We deploy reference data sets like for example the human genome in a central place, which is available on all servers:

/apps/data/...

Please use them from that location as opposed to downloading yet another copy elsewhere. If your pet reference data set is missing, contact the GCC to have it added.
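
For example, to see what is available and to use a reference data set directly from the central location (the sub folder and file names below are hypothetical):

    $> ls /apps/data/
    # Reference the data directly in your scripts instead of copying it, e.g.:
    REFERENCE='/apps/data/someReferenceSet/reference.fa'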

Your personal home dir @ /home/${user}

This is where you have limited space to store your personal settings/preferences/configs. Your home is available on all servers of a cluster, but different clusters have separate homes. A typical home dir contains << 100 MB of data:

  • ~/.bashrc file with custom settings, aliases, commands for bash.
  • ~/.ssh folder with keys and settings required for SSH access.
  • Various other (hidden) files / folders that contain settings.

Your home is designed to be a private folder; do not try to change permissions to share data located in your home with other users.

Your home is on HA and hence not on HP storage. Therefore you should try to minimize the IO load on your home to make sure everyone can enjoy a fast responsive home.

Do not abuse your home dir, so:

  • Don't waste resources by installing in your private home dir
    • yet another copy of the same software package
    • yet another copy of a (large) reference data set like the human genome
  • Do not run anything that causes massive random IO on your home dir.
    • E.g. don't store job scripts submitted to cluster nodes in homes.
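
A minimal sketch of keeping job scripts and job output on your group's tmp storage instead of in your home; the project layout is only an example and the sbatch command assumes a Slurm scheduler (adjust for whatever scheduler your cluster uses):

    $> mkdir -p /groups/${group}/tmp02/projects/myProject/jobs
    $> cd /groups/${group}/tmp02/projects/myProject/jobs
    # Submit from here, so job scripts, logs and intermediate files land on HP tmp storage and not in ${HOME}:
    $> sbatch myJob.sh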

Group dirs @ /groups/${group}/…

Every user is a member of at least one group. A group has access to large shared storage systems of which we have 4 types:

  • /groups/${group}/prm*/: PeRManent dirs: Your group's large, fast dirs for rawdata and final results
  • /groups/${group}/arc*/: ARChive dirs: Your group's larger, slow dirs for archived rawdata and final results
  • /groups/${group}/tmp*/: TeMPorary dirs: Your group's large, fastest dirs for shared temporary data
  • /groups/${group}/scr*/: SCRatch dirs: Your group's fast dirs for local temporary data

Not all groups have access to all types of storage systems and not all types are available on all machines. See the list of storage types for an overview and differences.

The minimal requirements / setup for a group is as follows:

  • Group leaders / PIs can request new groups. When the group is created they will be registered as the group owners.
  • Group owners are responsible for
    • Processing (accepting or rejecting) requests for group membership.
    • Securing funding and paying the bills.
    • Appointing data managers for their group.
  • Data managers are responsible for the group's data on prm and arc storage systems.
    • They ensure the group makes arrangements on what to store, how and where, e.g. file naming conventions, file formats to use, etc.
    • They enforce the group's policy on what to store how and where by reviewing data sets produced by other group members on tmp or scr file systems before migrating/copying them to prm and arc.
    • They have read-write access to all file systems including prm and arc.
  • Other 'regular' group members:
    • Have read-only access to prm and arc file systems to check-out existing data sets.
    • Have read-write access to tmp and scr file systems to produce new results.
    • Can request a data manager to review and migrate a newly produced data set to prm or arc file systems.
  • A group has at least one owner and one data manager, but to prevent delays in processing membership requests and data set reviews a group preferably has more than one owner and more than one data manager.
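
To check which groups you are a member of on a given server (the group names normally correspond to the folders under /groups/):

    $> id -Gn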

Quota

We use a quota system, which limits how much storage space you can consume. If you exceed your limits, you cannot write any new data before you delete something else to free up space. Home directories have user based quota, which means that if you run out of space, you are the only one affected. All other file systems use group or file set based quota, which means that if you run out of space, everybody from your group (or file set) is also out of space on that file system, but other groups are not affected. There are two limits and a timer that determine how these interact:

  • quota (soft): exceed your quota and you can still write data, but the system will start to issue warnings that you are running low on storage space.
  • limit (hard): exceed your limit and you are instantly prohibited from writing any data. You will need to delete something else to free up space before you can write new data.
  • timers: after exceeding your quota the timer kicks in and if you do not reduce your data volume to less than your quota, the soft quota will become your hard limit when the timer expires.

The combination of quota, larger limits and timers prevents users from permanently exceeding their quota while allowing them to temporarily consume more space to handle peak loads. Note that if you write a lot of data fast, it is possible to exceed both your quota and limit in a time frame that is much shorter than the quota reporting interval. In that case you may run out of disk space before you receive your first warning.

Checking your quota:

  • Different types of file systems come with their own quota tools, which produce different reports. Therefore we use a custom wrapper to unify the output for various file systems:
    $> module load cluster-utils
    $> quota
    
  • The report will show 11 columns:
     1 Quota type = one of:
        (U) = user quota
        (P) = (private) group quota: group with only one user and mostly used for home dirs.
        (G) = (regular) group quota: group with multiple users.
        (F) = file set quota: in our setup a different technology to manage quota for a group with multiple users.
     2 Path/Filesystem = (part of) a storage system controlled by the quota settings listed.
     3 used   = total amount of disk space your data consumes.
     4 quota  = soft limit for space.
     5 limit  = hard limit for space.
     6 grace  = days left before the timer for space quota expires.
     7 used   = total number of files and folders your data consists of.
     8 quota  = the soft limit for the number of files and folders.
     9 limit  = the hard limit for the number of files and folders.
    10 grace  = days left before the timer for the number of files and folders quota expires.
    11 status = whether you exceed your quota or not.
    
  • We currently only use quota for the space consumed and not for the number of files and folders you can create.
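
If you (almost) exceed your quota, a quick way to find out which sub folders consume the most space (the path is just an example; replace it with your own folder):

    $> du -h --max-depth=1 /groups/${group}/tmp02/ | sort -h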

List of storage devices / mount points

For each storage device / mount point the overview below lists its function, the quota settings (soft quota, hard quota and quota timer), whether it is backed up, on which servers it is mounted (User Interface servers, File Sharing servers and cluster nodes) and its current status.

 /apps/modules/..., /apps/software/..., /apps/sources/...
    Function:                           Centrally installed applications, their sources and the accompanying modules.
                                        Use the module command to locate software (and don't use hard-coded paths to /apps/software/...).
    Soft quota:                         none (write access only for admins)
    Hard quota:                         none (write access only for admins)
    Quota timer:                        none
    Backup:                             yes
    Mounted on User Interface servers:  yes
    Mounted on File Sharing servers:    no
    Mounted on cluster nodes:           original: no; rsynced copy: yes
    Status:                             Online

 /apps/data/...
    Function:                           Centrally installed reference data sets.
    Soft quota:                         none (write access only for admins)
    Hard quota:                         none (write access only for admins)
    Quota timer:                        none
    Backup:                             yes
    Mounted on User Interface servers:  yes
    Mounted on File Sharing servers:    no
    Mounted on cluster nodes:           original: no; rsynced copy: yes
    Status:                             Online

 /home/${user}/
    Function:                           Your home dir with personal settings/environment/configs.
    Soft quota:                         --- (1 GB)
    Hard quota:                         - (2 GB)
    Quota timer:                        7 days
    Backup:                             yes
    Mounted on User Interface servers:  yes
    Mounted on File Sharing servers:    yes
    Mounted on cluster nodes:           yes
    Status:                             Online

 /groups/${group}/prm02/...
    Function:                           Various folders for permanent large data. Layout varies per group.
    Soft quota:                         + (several TBs; varies per group)
    Hard quota:                         ++ (1.5 or 2 * the soft quota)
    Quota timer:                        14 days
    Backup:                             yes
    Mounted on User Interface servers:  yes
    Mounted on File Sharing servers:    no
    Mounted on cluster nodes:           no
    Status:                             Online

 /groups/${group}/prm02/rawdata/
    Function:                           We suggest a sub folder for raw data, which is typically organised by data type, machine on which the data was produced, data production date or...
    Status:                             Online

 /groups/${group}/prm02/projects/
    Function:                           We suggest a folder for final results of analyses / experiments, where data is typically not organised by data type, but by research question / project instead.
    Status:                             Online

 /groups/${group}/prm03/...
    Function:                           Various folders for permanent large data. Layout varies per group.
    Soft quota:                         + (several TBs; varies per group)
    Hard quota:                         ++ (1.5 or 2 * the soft quota)
    Quota timer:                        14 days
    Backup:                             yes
    Mounted on User Interface servers:  yes
    Mounted on File Sharing servers:    no
    Mounted on cluster nodes:           no
    Status:                             Online

 /groups/${group}/arc01/...
    Function:                           Archive folder for older data you no longer need often, but do want to keep for now.
                                        Stored on slower storage systems with more capacity.
    Soft quota:                         +++ (several TBs; varies per group)
    Hard quota:                         +++++ (1.5 or 2 * the soft quota)
    Quota timer:                        14 days
    Backup:                             yes
    Mounted on User Interface servers:  no
    Mounted on File Sharing servers:    no
    Mounted on cluster nodes:           no
    Status:                             Down; in development, needs testing...

 /groups/${group}/tmp02/...
    Function:                           High performance file system for temporary data.
                                        Files older than 90 days are automatically deleted.
    Soft quota:                         +++++ (several TBs; varies per group)
    Hard quota:                         +++++++ (1.5 or 2 * the soft quota)
    Quota timer:                        14 days
    Backup:                             no
    Mounted on User Interface servers:  yes
    Mounted on File Sharing servers:    no
    Mounted on cluster nodes:           yes
    Status:                             Online

 /groups/${group}/tmp03/...
    Function:                           High performance file system for temporary data.
                                        Files older than 90 days are automatically deleted.
    Soft quota:                         +++++ (several TBs; varies per group)
    Hard quota:                         +++++++ (1.5 or 2 * the soft quota)
    Quota timer:                        14 days
    Backup:                             no
    Mounted on User Interface servers:  yes
    Mounted on File Sharing servers:    no
    Mounted on cluster nodes:           yes
    Status:                             Online

 /groups/${group}/tmp04/...
    Function:                           High performance file system for temporary data.
                                        Files older than 90 days are automatically deleted.
    Soft quota:                         +++++ (several TBs; varies per group)
    Hard quota:                         +++++++ (1.5 or 2 * the soft quota)
    Quota timer:                        14 days
    Backup:                             no
    Mounted on User Interface servers:  yes
    Mounted on File Sharing servers:    no
    Mounted on cluster nodes:           yes
    Status:                             Online

The life cycle of experimental data

  1. Generate "raw" data in the lab and upload that to a folder in /groups/${group}/prm*/rawdata/... on HA storage.
  2. Select a (sub)set of raw data you want to analyze on the cluster and stage this data by copying it from /groups/${group}/prm*/rawdata/... to /groups/${group}/tmp*/... on HP storage.
    Make sure your in-silico experiment processes a chunk of data that can easily be analysed in << 90 days.
  3. Generate jobs, which read and write to and from folders in /groups/${group}/tmp*/... on HP storage systems.
  4. Submit your jobs on the cluster.
  5. Once all jobs have finished successfully, assess the results and if they pass QC, store your final results by copying them to a folder in /groups/${group}/prm*/projects/... on HA storage.
  6. Cleanup by deleting your tmp data from /groups/${group}/tmp*/... to free up space for your next experiment.
    Failure to do so may result in temporarily running out of space; eventually all tmp data older than 90 days will be deleted automagically.
  7. Document and publish your experiment/data.
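
A minimal sketch of steps 2, 5 and 6 of this life cycle using rsync; the group, project and folder names are hypothetical:

    # 2. Stage a (sub)set of raw data from permanent (HA) to temporary (HP) storage:
    $> rsync -av /groups/${group}/prm02/rawdata/myProject/ /groups/${group}/tmp02/myProject/rawdata/
    # 3-4. Generate and submit jobs that read from and write to /groups/${group}/tmp02/myProject/...
    # 5. Once all jobs finished and the results passed QC, copy the final results back to permanent storage:
    $> rsync -av /groups/${group}/tmp02/myProject/results/ /groups/${group}/prm02/projects/myProject/
    # 6. Clean up the temporary data to free up space for your next experiment:
    $> rm -rf /groups/${group}/tmp02/myProject/
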
Last modified on 2017-12-08T14:24:38+01:00