Storage on Sherlock#
Sherlock provides access to several file systems, each with distinct storage characteristics. Each user and each PI group gets access to a set of predefined directories in these file systems to store their data.
Sherlock is a compute cluster, not a storage system
Sherlock's storage resources are limited and are shared among many users. They are meant to store data and code associated with projects for which you are using Sherlock's computational resources. This space is for work actively being computed on with Sherlock, and should not be used as a target for backups from other systems.
If you're looking for a long-term storage solution for research data, Stanford Research Computing offers the Oak storage system, which is specifically intended for this usage.
These file systems are shared with other users and are subject to quota limits. Some of them are also subject to purge policies (time-residency limits).
Filesystem overview#
Features and purpose#
Name | Type | Backups / Snapshots | Performance | Purpose | Cost |
---|---|---|---|---|---|
$HOME , $GROUP_HOME | NFS | / | low | small, important files (source code, executable files, configuration files...) | free |
$SCRATCH , $GROUP_SCRATCH | Lustre | / | high bandwidth | large, temporary files (checkpoints, raw application output...) | free |
$L_SCRATCH | local SSD | / | low latency, high IOPS | job specific output requiring high IOPS | free |
$OAK | Lustre | option / | moderate | long term storage of research data | volume-based¹ |
Access scope#
Name | Scope | Access sharing level |
---|---|---|
$HOME | cluster | user |
$GROUP_HOME | cluster | group |
$SCRATCH | cluster | user |
$GROUP_SCRATCH | cluster | group |
$L_SCRATCH | compute node | user |
$OAK | cluster (optional, purchase required) | group |
Group storage locations are typically shared between all the members of the same PI group. User locations are only accessible by the user.
Quotas and limits#
Volume and inodes
Quotas are applied on both volume (the amount of data stored in bytes) and inodes: an inode (index node) is a data structure in a Unix-style file system that describes a file-system object such as a file or a directory. In practice, each filesystem entry (file, directory, link) counts as an inode.
Name | Quota type | Volume quota | Inode quota | Retention |
---|---|---|---|---|
$HOME | directory | 15 GB | n/a | |
$GROUP_HOME | directory | 1 TB | n/a | |
$SCRATCH | directory | 100 TB | 20 million | time limited |
$GROUP_SCRATCH | directory | 100 TB | 20 million | time limited |
$L_SCRATCH | n/a | n/a | n/a | job lifetime |
$OAK | directory | amount purchased | function of the volume purchased | |
Quota types:
- directory: based on file location; accounts for all the files that are in a given directory.
- user: based on file ownership; accounts for all the files that belong to a given user.
- group: based on file ownership; accounts for all the files that belong to a given group.
Retention types:
- (empty cell in the table above): files are kept as long as the user account exists on Sherlock.
- time limited: files are kept for a fixed length of time after they've been last modified. Once the limit is reached, files expire and are automatically deleted.
- job lifetime: files are only kept for the duration of the job and are automatically purged when the job ends.
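As an illustration of the time-limited retention type, the sketch below lists files that have not been modified for more than a given number of days, which may help spot data that is getting close to expiring. It relies only on GNU find and sort. The function name `stale_files` is hypothetical, and the threshold is left as a parameter because the actual retention period is not stated in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list regular files under a directory that were last
# modified more than a given number of days ago, oldest first.
# The threshold is an assumption supplied by the caller, not a documented value.
stale_files() {
    local dir="$1" days="$2"
    # -mtime +N matches files whose age, rounded down to whole days, is
    # greater than N. -xdev keeps find on the starting filesystem.
    # -printf prints the modification date (YYYY-MM-DD) followed by the path.
    find "$dir" -xdev -type f -mtime +"$days" -printf '%TY-%Tm-%Td  %p\n' | sort
}
```

For instance, `stale_files "$SCRATCH" 30` would list the files under $SCRATCH that were last modified more than 30 days ago (30 being an arbitrary value chosen for the example).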
Global fail-safe user and group quotas on /scratch
To prevent the file system from filling up completely and becoming unusable for everyone, additional user- and group-level quotas are in place on the /scratch file system, as a fail-safe:
- a user will not be able to use more than 250 TB (50M inodes) in total, across all the /scratch directories they have access to.
- a group will not be able to use more than 1 PB (200M inodes) in total, across all the /scratch directories its group members have access to.
Checking quotas#
To check your quota usage on the different filesystems you have access to, you can use the sh_quota command:
$ sh_quota
+---------------------------------------------------------------------------+
| Disk usage for user kilian (group: ruthm) |
+---------------------------------------------------------------------------+
| Filesystem | volume / limit | inodes / limit |
+---------------------------------------------------------------------------+
HOME | 9.4GB / 15.0GB [|||||| 62%] | - / - ( -%)
GROUP_HOME | 562.6GB / 1.0TB [||||| 56%] | - / - ( -%)
SCRATCH | 65.0GB / 100.0TB [ 0%] | 143.8K / 20.0M ( 0%)
GROUP_SCRATCH | 172.2GB / 100.0TB [ 0%] | 53.4K / 20.0M ( 0%)
OAK | 30.8TB / 240.0TB [| 12%] | 6.6M / 36.0M ( 18%)
+---------------------------------------------------------------------------+
Several options are provided to list quotas for a specific filesystem only, or in the context of a different group (for users who are members of several PI groups). Please see the sh_quota usage information for details:
$ sh_quota -h
sh_quota: display user and group quota information for all accessible filesystems.
Usage: sh_quota [OPTIONS]
    Optional arguments:
        -f FILESYSTEM   only display quota information for FILESYSTEM.
                        For instance: "-f $HOME"
        -g GROUP        for users with multiple group memberships, display
                        group quotas in the context of that group
        -n              don't display headers
        -j              JSON output (implies -n)
Examples#
For instance, to only display your quota usage on $HOME:
$ sh_quota -f HOME
If you belong to multiple groups, you can display the group quotas for your secondary groups with:
$ sh_quota -g <group_name>
Finally, for greater control over the output, quota usage can be displayed in JSON with the -j option:
$ sh_quota -f SCRATCH -j
{
  "SCRATCH": {
    "quotas": {
      "type": "user",
      "blocks": {
        "usage": "47476660",
        "limit": "21474836480"
      },
      "inodes": {
        "usage": "97794",
        "limit": "20000000"
      }
    }
  }
}
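The JSON output lends itself to scripting, for example to warn when a quota is nearly reached. The sketch below turns the usage and limit values into percentages, so it does not need to know which unit the numbers are expressed in. The function name `quota_percent` is hypothetical, and the sketch assumes that `python3` is available on the system and that the JSON has the structure shown above. Entries whose values are not numeric are skipped, which is also an assumption about how unlimited quotas might be reported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "sh_quota -j" style JSON on standard input and
# print, for each filesystem, the percentage of the volume ("blocks") and
# inode quotas in use. Assumes python3 is installed.
quota_percent() {
    python3 -c '
import json, sys

for fs, entry in json.load(sys.stdin).items():
    quotas = entry["quotas"]
    for kind in ("blocks", "inodes"):
        try:
            usage = float(quotas[kind]["usage"])
            limit = float(quotas[kind]["limit"])
        except (KeyError, ValueError):
            continue  # assumption: non-numeric values mean "no quota"
        if limit > 0:
            print(f"{fs} {kind} {100 * usage / limit:.2f}%")
'
}
```

It could be used as `sh_quota -f SCRATCH -j | quota_percent`. With the JSON shown above, this prints `SCRATCH blocks 0.22%` and `SCRATCH inodes 0.49%`.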
Locating large directories#
It's not always easy to identify files and directories that take the most space when getting close to the quota limits. Some tools can help with that.
- du can be used to display the volume used by files and directories in a given folder:

    $ cd mydir/
    $ du --human-readable --summarize *
    101M    dir
    2.0M    file

  Note
  Used with the * pattern as above, du will ignore hidden entries (everything that starts with a dot (.)), since the shell does not expand * to names starting with a dot. So when using it in your $HOME directory, it will skip things like .cache or .conda, which can contain significant volumes.

- ncdu is an interactive disk usage analyzer that generates a visual representation of the volume (and inode count) of directories. To run it, you need to load the ncdu module, and then run it on your directory of choice:

    $ ml system ncdu
    $ ncdu $HOME

  For very large directories, running ncdu in an interactive shell on a compute node is recommended, via sh_dev.

  You'll then be presented with an interactive file browser showing the volume used by your directories, which should make it easy to pinpoint where most of the space is used.
Info
Note that any tool you use to view directory contents will only be able to show files that your user account has read access to. So on group-shared spaces, if you see a major difference between the totals from a tool like ncdu and the information reported by sh_quota, that can be an indicator that one of your group members has restricted permissions on a large number of items in your space.
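For non-interactive use, the two helpers sketched below cover the cases discussed above: per-entry volume that also includes hidden entries, and a count of filesystem entries per directory as a rough proxy for inode usage. They rely on GNU find, du, sort and wc. The function names `usage_by_entry` and `entries_by_dir` are hypothetical, and the entry count will be off for file names that contain newline characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: disk usage of each direct child of a directory,
# hidden entries included, sorted from smallest to largest.
usage_by_entry() {
    local dir="${1:-.}"
    # find selects the direct children itself, so names starting with a dot
    # are included, unlike with the shell "*" pattern.
    find "$dir" -mindepth 1 -maxdepth 1 \
        -exec du --summarize --human-readable {} + | sort -h
}

# Hypothetical helper: number of filesystem entries (files, directories,
# links) found in each direct subdirectory, the subdirectory itself included,
# sorted from fewest to most.
entries_by_dir() {
    local dir="${1:-.}" child
    find "$dir" -mindepth 1 -maxdepth 1 -type d -print0 |
        while IFS= read -r -d '' child; do
            # One line of find output per entry; -xdev stays on one filesystem.
            printf '%s\t%s\n' "$(find "$child" -xdev | wc -l)" "$child"
        done | sort -n
}
```

For instance, `usage_by_entry "$HOME"` would also report directories such as .cache or .conda, and `entries_by_dir "$SCRATCH"` would show which subdirectories contribute most to the inode count. As noted above, both only see the entries your account can read.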
Where should I store my files?#
Not all filesystems are equivalent
Choosing the appropriate storage location for your files is an essential step towards making your use of the cluster as efficient as possible. It will make your own experience much smoother, yield better performance for your jobs and simulations, and contribute to making Sherlock a useful and well-functioning resource for everyone.
Here is where we recommend storing different types of files and data on Sherlock:
- personal scripts, configuration files and software installations → $HOME
- group-shared scripts, software installations and medium-sized datasets → $GROUP_HOME
- temporary output of jobs, large checkpoint files → $SCRATCH
- curated output of job campaigns, large group-shared datasets, archives → $OAK
Accessing filesystems#
On Sherlock#
Filesystem environment variables
To facilitate access and data management, user and group storage locations on Sherlock are identified by a set of environment variables, such as $HOME or $SCRATCH.
We strongly recommend using those variables in your scripts rather than explicit paths, to facilitate transition to new systems for instance. By using those environment variables, you'll be sure that your scripts will continue to work even if the underlying filesystem paths change.
To see the contents of these variables, you can use the echo command. For instance, to see the absolute path of your $SCRATCH directory:
$ echo $SCRATCH
/scratch/users/kilian
Or for instance, to move to your group-shared home directory:
$ cd $GROUP_HOME
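In a script, the recommendation to use these variables rather than explicit paths could look like the sketch below. The function name `run_dir` and the `results` subdirectory are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the path of a per-run output directory under
# $SCRATCH without hard-coding the underlying filesystem path.
run_dir() {
    local name="$1"
    # ${SCRATCH:?message} stops with an error if SCRATCH is unset or empty,
    # so a missing variable cannot silently produce a path like "/results/...".
    printf '%s/results/%s\n' "${SCRATCH:?SCRATCH is not set}" "$name"
}
```

A job script could then create its output directory with `mkdir -p "$(run_dir my_run)"`, and would keep working if the path behind $SCRATCH changed.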
From other systems#
External filesystems cannot be mounted on Sherlock
For a variety of security, manageability and technical reasons, we can't mount external filesystems or data storage systems on Sherlock. The recommended approach is to make Sherlock's data available on external systems.
You can mount any of your Sherlock directories on any external system you have access to by using SSHFS. For more details, please refer to the Data Transfer page.
¹ For more information about Oak, its characteristics and cost model, please see the Oak Service Description page.