Filesystems

The following sections describe the characteristics and best uses of each of Sherlock's filesystems.

$HOME#

Summary

$HOME is your home directory. It's the best place to keep your code and important data as it provides snapshots and off-site replication. It is not meant to host data that will be actively read and written to by compute jobs.

Characteristics
Type          Isilon high-speed NFS file system
Quota         15 GB for the whole $HOME directory
Snapshots     yes (see Snapshots for more info)
Backups       off-site replication
Purge policy  not purged
Scope         all login and compute nodes

$HOME is best suited for personal configuration files, scripts, small reference files or datasets, source code, and individual software installations.

When you log in, the system automatically sets the current working directory to $HOME: it's the location you'll end up in when connecting to Sherlock. You can store your source code and build your executables there.

We strongly recommend using $HOME to reference your home directory in scripts, rather than its explicit path.
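For instance, here is a minimal sketch of a script that builds code stored in a home directory (the my_project path is hypothetical):

#!/bin/bash
# reference the home directory via $HOME rather than a hard-coded
# path such as /home/users/<sunetid>
cd $HOME/my_project
make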

Checking quota usage#

The sh_quota tool can be used to display quota usage on $HOME:

$ sh_quota -f HOME
See the Checking Quotas section for more details.


$GROUP_HOME#

Summary

$GROUP_HOME is your group home directory. It's the best place to keep your group's shared code, software installations and important data as it provides snapshots and off-site replication. It is not meant to host data that will be actively read and written to by compute jobs.

$HOME and $GROUP_HOME are based on the same physical file system.

Characteristics
Type          Isilon high-speed NFS file system
Quota         1 TB for the whole $GROUP_HOME directory
Snapshots     yes (see Snapshots for more info)
Backups       off-site replication
Purge policy  not purged
Scope         all login and compute nodes

$GROUP_HOME is best suited for group shared source code, common software installations, shared data sets and scripts.

We strongly recommend using $GROUP_HOME to reference your group home directory in scripts, rather than its explicit path.
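For instance, group members could use a shared software installation by referencing $GROUP_HOME (the software directory layout is hypothetical):

# make a group-wide software installation available in the current shell
export PATH=$GROUP_HOME/software/bin:$PATH
export LD_LIBRARY_PATH=$GROUP_HOME/software/lib:$LD_LIBRARY_PATH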

Checking quota usage#

The sh_quota tool can be used to display quota usage on $GROUP_HOME:

$ sh_quota -f GROUP_HOME
See the Checking Quotas section for more details.


$SCRATCH#

Summary

$SCRATCH is your personal scratch space. It's the best place to store temporary files, such as raw job output, intermediate files, unprocessed results, and so on.

Purge policy

Files are automatically purged from $SCRATCH after an inactivity period.

$SCRATCH is not meant to store permanent data, and should only be used for data associated with currently running jobs. It's not a target for backups, archived data, etc. See the Expiration Policy section for details.

Characteristics
Type          Parallel, high-performance Lustre file system
Quota         20 TB per user (based on file ownership)
Snapshots     NO
Backups       NO
Purge policy  files not modified in the last 6 months are automatically purged
Scope         all login and compute nodes

$SCRATCH is best suited for large files, such as raw job output, intermediate job files, unprocessed simulation results, and so on. This is the recommended location to run jobs from, and to store files that will be read or written to during job execution.

Old files are automatically purged on $SCRATCH so users should avoid storing long-term data there.

Each compute node has a low-latency, high-bandwidth InfiniBand link to $SCRATCH. The aggregate bandwidth of the filesystem is about 75 GB/s, so any job with high data performance requirements will benefit from using $SCRATCH for I/O.

We strongly recommend using $SCRATCH to reference your scratch directory in scripts, rather than its explicit path.
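For instance, here is a minimal batch script sketch that runs a job from $SCRATCH (the job name, directory and program names are hypothetical):

#!/bin/bash
#SBATCH --job-name=my_simulation
# run from $SCRATCH so that job I/O goes to the fast, parallel filesystem
cd $SCRATCH/my_simulation
./simulate input.dat > output.dat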

Checking quota usage#

The sh_quota tool can be used to display quota usage on $SCRATCH:

$ sh_quota -f SCRATCH
See the Checking Quotas section for more details.

NB: user quotas are based on file ownership, meaning that all files belonging to a given user count towards their user quota, no matter where they're located on the file system. That means that if you have files in $GROUP_SCRATCH, those will also count toward your user quota.
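To list the files you own under the group scratch directory (which count toward your user quota), you could run something like this (note that traversing a large directory tree on a parallel filesystem can take a long time):

$ find $GROUP_SCRATCH -user $USER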

Expiration policy#

Inactive files are automatically purged

Files that have not been modified in the last 6 months will be automatically deleted from the filesystem.

To manage available space and maintain optimal performance for all jobs, all files on $SCRATCH are subject to automatic purges: after a period of inactivity, files that are no longer used will be automatically deleted from the filesystem.

File activity is defined by the last time a file's contents (the actual data in the file) were modified: files whose contents have not been modified in the previous 6 months will be automatically deleted.

Each time a file's contents are modified, the expiration countdown is reset, and the file gets another 6 months of lifetime.

Metadata changes don't qualify as an update

Modifying a file's contents is the only way to reset the expiration countdown and extend the file's lifetime on the filesystem.

Metadata modifications, such as reading the file, renaming it, moving it to a different directory, changing its permissions or ownership, or "touching" it to update its last modification or access times, won't have any effect on the purge countdown.

Purges are based on an internal filesystem property that reflects the last date a file's data has been modified, and which is unfortunately not readily accessible by users.

Please note that tools like ls will only display the date of the last metadata[1] modification for a file, which is not necessarily relevant to determine a file's eligibility for deletion. For instance, using the touch command on a file to update its last modification date will only update the metadata, not the data, and as such, will not reset the purge countdown timer.
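As an illustration, here is a hypothetical sequence of commands on a scratch file, with their effect on the purge countdown noted in comments:

$ touch $SCRATCH/results.dat                 # metadata only: does NOT reset the countdown
$ mv $SCRATCH/results.dat $SCRATCH/old.dat   # rename, metadata only: does NOT reset it
$ echo "more data" >> $SCRATCH/old.dat       # contents modified: resets the 6-month countdown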

Filesystem purges are a continuous process: they don't run at particular times, but are carried out permanently in the background. Files are not necessarily deleted right away when they become eligible for deletion. For instance, if you create a file on February 1st and never modify it afterwards, it will automatically become eligible for deletion on August 1st, and can be deleted anytime after that date.

Empty directories that would remain after all their files have been purged are not automatically deleted, because user workflows may rely on and require specific directory trees to be present, and there's no good way to distinguish between a directory that was intentionally created empty and a directory emptied by automatic purges.


$GROUP_SCRATCH#

$SCRATCH and $GROUP_SCRATCH are based on the same physical file system.

Summary

$GROUP_SCRATCH is your group shared scratch space. It's the best place to store temporary files, such as raw job output, intermediate files, or unprocessed results that need to be shared among users within a group.

$GROUP_SCRATCH is NOT a backup target

$GROUP_SCRATCH is not meant to store permanent data, and should only be used for data associated with currently running jobs. It's not a target for backups, archived data, etc.

Characteristics
Type          Parallel, high-performance Lustre file system
Quota         30 TB per group (based on file ownership)
Snapshots     NO
Backups       NO
Purge policy  files not modified in the last 6 months are automatically purged
Scope         all login and compute nodes

$GROUP_SCRATCH is best suited for large files, such as raw job output, intermediate job files, unprocessed simulation results, and so on. This is the recommended location to run jobs from, and to store files that will be read or written to during job execution.

Old files are automatically purged on $GROUP_SCRATCH so users should avoid storing long-term data there.

We strongly recommend using $GROUP_SCRATCH to reference your group scratch directory in scripts, rather than its explicit path.

Checking quota usage#

The sh_quota tool can be used to display quota usage on $GROUP_SCRATCH:

$ sh_quota -f GROUP_SCRATCH
See the Checking Quotas section for more details.

NB: group quotas are based on file ownership, meaning that all files belonging to a given group count towards that group's quota, no matter where they're located on the file system. That means that files belonging to any of the group's members count toward the group quota.

Expiration policy#

As $SCRATCH and $GROUP_SCRATCH are on the same filesystem, the same expiration policy applies to both. Please see the $SCRATCH section above for more details.


$L_SCRATCH#

Summary

$L_SCRATCH is local to each compute node, and can be used to store temporary files for jobs with high IOPS requirements. Files stored in $L_SCRATCH are purged at the end of the job.

Characteristics
Type          Local filesystem, specific to each node, based on SSD
Quota         n/a (usable space limited by the size of the physical storage devices, typically around 150 GB)
Snapshots     NO
Backups       NO
Purge policy  data immediately purged at the end of the job
Scope         local to each node, not shared across nodes

$L_SCRATCH is best suited for small temporary files and applications which require low latency and high IOPS levels, typically intermediate job files, checkpoints, dumps of temporary states, etc.

Files stored in $L_SCRATCH are local to each node and can't be accessed from other nodes, nor from login nodes.

Please note that an additional, job-specific environment variable, $L_SCRATCH_JOB, will be set to a subdirectory of $L_SCRATCH for each job. So, if you have two jobs running on the same compute node, $L_SCRATCH will be the same and accessible from both jobs, while $L_SCRATCH_JOB will be different for each job.

For instance, if you have jobs 98423 and 98672 running on the same node, the variables will be set as follows:

Job ID  $L_SCRATCH        $L_SCRATCH_JOB
98423   /lscratch/kilian  /lscratch/kilian/98423
98672   /lscratch/kilian  /lscratch/kilian/98672

We strongly recommend using $L_SCRATCH to reference your local scratch directory in scripts, rather than its full path.

Expiration policy#

All files stored in $L_SCRATCH_JOB are automatically purged at the end of the job, whether the job was successful or not. If you need to conserve files that were generated in $L_SCRATCH_JOB after the job ends, don't forget to add a command at the end of your batch script to copy them to one of the more persistent storage locations, such as $HOME or $SCRATCH.
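For instance, the end of a batch script could look like the following sketch (the application and file names are hypothetical):

#!/bin/bash
# point the application's temporary files at the job-specific local scratch
export TMPDIR=$L_SCRATCH_JOB
./my_app --input $SCRATCH/input.dat
# $L_SCRATCH_JOB is purged when the job ends, so save needed files first
cp $L_SCRATCH_JOB/checkpoint.dat $SCRATCH/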

Data stored in $L_SCRATCH is purged at the end of a job only if no other job from the same user is still running on the node. This means that data stored in $L_SCRATCH (but not in $L_SCRATCH_JOB) will persist on the node until the user's last job terminates.


$OAK#

Summary

$OAK is SRCC's research data storage offering. It provides an affordable, long-term storage option for labs and researchers, and is ideally suited to host large datasets, curated, post-processed results from job campaigns, and final results used for publication.

OAK is opt-in

$OAK is available as an option on Sherlock, meaning that only members of groups which have purchased storage on Oak can access this filesystem. For complete details and characteristics, including pricing, please refer to the Oak Storage Service page.

Characteristics
Type          Parallel, capacitive Lustre filesystem
Quota         amount purchased (in 10 TB increments)
Snapshots     NO
Backups       NO
Purge policy  not purged
Scope         all login and compute nodes (also available through gateways outside of Sherlock)

$OAK is ideally suited for large shared datasets, archival data and curated, post-processed results from job campaigns, as well as final results used for publication.

Although jobs can directly read and write to $OAK during execution, it is recommended to first stage files from $OAK to $SCRATCH at the beginning of a series of jobs, and save the desired results back from $SCRATCH to $OAK at the end of the job campaign.
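For instance, a job campaign could follow a staging pattern like this (the paths are hypothetical):

# before the campaign: stage input data from $OAK to $SCRATCH
$ cp -r $OAK/datasets/my_dataset $SCRATCH/

# ... run the jobs, reading and writing on $SCRATCH ...

# after the campaign: save curated results back to $OAK
$ cp -r $SCRATCH/my_dataset/results $OAK/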

We strongly recommend using $OAK to reference your Oak directory in scripts, rather than its explicit path.

Checking quota usage#

The sh_quota tool can be used to display quota usage on $OAK:

$ sh_quota -f OAK
See the Checking Quotas section for more details.


[1] Metadata are data such as a file's size, name, path, owner, permissions, etc.