Sherlock 2.0 transition guide#
This transition guide is intended for users already familiar with Sherlock, to help them move to Sherlock 2.0: differences with the existing environment, temporary conditions during the transition, etc.
Sherlock 2.0 orders
If you are looking to buy into Sherlock 2.0, please see the Sherlock Order page (SUNet ID required).
Why Sherlock 2.0?#
Sherlock started in 2014 with 120 general use compute nodes and their associated networking (high-performance interconnect) and storage (Lustre scratch filesystem) infrastructure. Those nodes have been consistently heavily used. An additional 730+ nodes were purchased by a wide variety of Stanford and SLAC researchers and added to the initial base.
However, the Infiniband fabric (the supporting network that interconnects all of the nodes and storage) eventually reached capacity, effectively putting a halt to Sherlock's growth. With the help of new recurring funding from the University, a new, completely separate cluster has been kick-started, with its own Infiniband fabric, new management servers, and a set of more capable compute nodes. These new nodes, along with an updated software environment, form the basis of Sherlock 2.0.
For the sake of simplicity, we'll refer to the new environment as Sherlock 2.0, and to the existing one as Sherlock 1.0.
For the whole duration of the transition process, storage will be shared between Sherlock 1.0 and Sherlock 2.0 and will be accessible from both, meaning that users will be able to access and work on the same files whichever cluster they connect to.
To allow every user to smoothly transition to the new system, a 3-phase process will take place.
- Phase 1: coexistence
For the duration of the transition period, both Sherlock 1.0 and Sherlock 2.0 will coexist. As two separate clusters, they will use distinct login nodes and provide access to different hardware and software.
Two isolated clusters
Specific compute nodes will only be accessible from one cluster at a time: the new compute nodes from Sherlock 2.0, and the old nodes from Sherlock 1.0. It also means that jobs submitted on Sherlock 1.0 will not be visible from Sherlock 2.0, and vice-versa.
- Phase 2: merge
After sufficient time for the majority of users to validate and migrate to the new environment, the original Sherlock nodes will be merged into the new cluster. We'll work with individual owners to determine the best time frame for the migration of their own nodes, and will announce a date for the migration of all the general nodes from the public partitions.
- Phase 3: retirement
As they age out, compute nodes older than 5 years will be retired, to comply with the data center usage guidelines. When all the remaining Sherlock 1.0 nodes have been merged into Sherlock 2.0, the Sherlock 1.0 and 2.0 distinction will disappear, and Sherlock will be a single system again.
The Sherlock 1.0 to Sherlock 2.0 transition will happen according to the following timeline:
- 05/08/2018: no new user accounts will be created on Sherlock 1.0
- 05/14/2018: the `bigmem` partitions are migrated to Sherlock 2.0
- 06/25/2018: the `normal` partition is migrated to Sherlock 2.0
- 07/02/2018: the login and DTN nodes are migrated to Sherlock 2.0
This will effectively mark the end of the Sherlock 1.0 environment.
What's new?#
Quite a lot!
Most notable upgrades
Additional cores per CPU, increased Infiniband throughput, twice the amount of memory per node and higher memory/core ratio, increased power redundancy, a new OS and software stack, and mandatory 2FA.
As part of our documentation and communication effort, we're announcing three new web resources for Sherlock:
- a changelog for news and updates, to get a better sense of what's happening on Sherlock, beyond general email announcements,
- a status dashboard, where you'll find up-to-date information about the status of the main Sherlock components and services. You can also subscribe to receive notifications in case of events or outages,
- and of course the very documentation you're reading right now.
The most notable changes are visible right from the connection process.
| Main changes | Sherlock 1.0 | Sherlock 2.0 |
| --- | --- | --- |
| Authentication scheme | GSSAPI (Kerberos) | SUNet ID password or GSSAPI |
- New login hostname
Sherlock 2.0 uses a new set of login nodes, under a different load-balanced hostname. To connect, you'll need to use (note the `login` part of the hostname):

```
$ ssh <sunet>@login.sherlock.stanford.edu
```
New server fingerprint
When you connect to the new login nodes for the first time, you'll be presented with the new servers' fingerprint (host keys). Please see the Connecting page for more details.
- New authentication scheme
GSSAPI (Kerberos) is not required anymore. To compensate for the next item, we relaxed the Kerberos authentication requirement for Sherlock 2.0: you can now connect using your SUNet ID password, without having to get a Kerberos ticket via `kinit` first. We hope this will also ease the connection process from clients and OSes that don't correctly support GSSAPI. And if you still want to use Kerberos for authentication, you can (see the sketch at the end of this list).
- Two-step authentication
Two-step (or two-factor) authentication (2FA) is now a requirement in the Minimum Security Standards mandated by the Information Security Office, and we're implementing it on Sherlock. Upon connection, you will be prompted for your password (if you're not using GSSAPI) and presented with a two-factor prompt like this:

```
Duo two-factor login for johndoe

Enter a passcode or select one of the following options:

 1. Duo Push to XXX-XXX-4425
 2. Phone call to XXX-XXX-4425
 3. SMS passcodes to XXX-XXX-4425 (next code starts with: 1)

Passcode or option (1-3):
```

Upon validation of the passcode, you'll be connected to Sherlock's login nodes as usual.
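For those who prefer to keep using Kerberos, here is a minimal sketch of a GSSAPI-based connection (it assumes the standard Stanford Kerberos realm and that your SSH client has GSSAPI support enabled):

```
$ kinit <sunet>@stanford.edu                                            # get a Kerberos ticket first (realm assumed)
$ ssh -o GSSAPIAuthentication=yes <sunet>@login.sherlock.stanford.edu   # authenticate with the ticket
```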
For more details about the login procedure, see the Connecting page.
A number of changes differentiate Sherlock 2.0 from the first generation: technology has evolved and those changes are reflected in the new hardware specifications. Here are the main changes on typical compute nodes:
|Main changes||Sherlock 1.0||Sherlock 2.0|
|Nodes||1U Dell R630 servers||0.5U C6320 servers in a 2U chassis|
|CPUs||2x 8-core (Ivy-Bridge/Haswell)||2x 10-core (Broadwell)|
|Memory||64G (base node)||128G (base node)|
|Local storage||200G SSD (+ 500G HDD)||200G SSD|
|Interconnect||56G FDR Infiniband||100G EDR Infiniband|
- Nodes
Base node model moves from the Dell R630 1U server to the Dell C6320 system. More servers will fit in a single rack, which is essential for us to accommodate the ever-increasing demand for servers. Another change is that the new model provides redundant power supplies, in case one fails. Each C6320 chassis houses four independent servers in only two rack units of space.
- CPUs
Upgrade from Intel® Xeon® E5-2640v3 (8-core, 2.6GHz) to E5-2640v4 (10-core, 2.4GHz), meaning that each CPU gets two additional cores, and that to maintain a 90W power envelope, the clock speed dropped from 2.6GHz to 2.4GHz.
- Memory
With 2 additional cores per CPU, for a total of 4 more per node, keeping a 64GB minimum RAM requirement was no longer a good fit. Originally, 64GB for 16 cores gave a ratio of 4GB/core. With the same 64GB per node, the memory-per-core ratio on Sherlock 2.0 would have dropped to 3.2GB/core. Usage statistics showed that more RAM per core has been needed for a wide range of applications, so for Sherlock 2.0, the minimum memory amount per node has been set to 128GB, or 6.4GB/core, a much more appropriate ratio for typical user needs (see the sketch after this list for how this can translate into job requests).
- Local storage
The slightly smaller form factor of each node prompted the elimination of the 500GB spinning hard drive, keeping only the 200GB SSD. Sherlock statistics showed that the hard drive was rarely used, as users favored the very fast Lustre /scratch filesystem available to all nodes.
- Interconnect
Substantial network improvement by way of an almost 2x increase in Infiniband (IB) speed: Sherlock 1.0 uses FDR IB at 56Gbps, while Sherlock 2.0 uses EDR IB at 100Gbps.
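As an illustration of how the new memory/core ratio can translate into job requests, here is a minimal sketch (the per-core memory value, job name, and application are purely illustrative; actual defaults and limits are set by the scheduler):

```bash
#!/bin/bash
#SBATCH --job-name=mem_example   # hypothetical job name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=6G         # example request, close to the new 6.4GB/core ratio

srun ./my_program                # hypothetical application
```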
The same organization of compute nodes will be used on Sherlock 2.0, with the same partition nomenclature, although with different hardware characteristics:
| Partition | Sherlock 1.0 | Sherlock 2.0 |
| --- | --- | --- |
| `normal` | 116x 16c/64GB nodes | 56x 20c/128GB nodes |
| `dev` | 2x 16c/64GB nodes | 2x 20c/128GB nodes |
| `bigmem` | 2x 32c/1.5TB nodes | 1x 56c/3.0TB node, 1x 32c/512GB node |
| `gpu` | 6x 16c/256GB nodes w/ 8x K20/K80/TITAN Black | 5x 20c/256GB nodes w/ 4x P40/P100_PCIE/V100_NVLINK |
After the first coexistence phase of the transition, Sherlock 1.0 nodes will be merged into Sherlock 2.0 partitions, and the resulting general partitions will be the combination of both. For instance, the `normal` partition will feature 172 nodes.
Job submission parameters#
QOS and limits are slightly different too, because of the different characteristics of Sherlock 2.0 and the new version of Slurm (cf. below).
The main change is that your jobs generally won't need to specify any QOS, except for the `long` one. In other words, `--qos=long` is the only QOS option you will need to specify in your jobs. QOSes such as `gpu` are not used anymore for job submission, and will lead to errors if used. The goal is to simplify job submission for new users and limit the number of parameters to specify when submitting jobs.
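For instance, a job that needs to run beyond the default time limits could be submitted with something like the following sketch (job name, time limit, and application are purely illustrative):

```bash
#!/bin/bash
#SBATCH --job-name=long_example   # hypothetical job name
#SBATCH --time=3-00:00:00         # example: 3 days
#SBATCH --qos=long                # the only QOS you still need to specify explicitly

srun ./my_long_simulation         # hypothetical application
```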
Sherlock 2.0 runs a newer version of the Linux operating system (OS), and as a result, most of the software available on Sherlock 1.0 needs to be recompiled and reinstalled to work optimally in the new environment.
Sherlock 2.0 runs an updated software stack, from kernel to user-space applications, including management software. Most user-facing applications are newer versions of the same software, while the management stack has been completely redesigned from scratch.
| Main changes | Sherlock 1.0 | Sherlock 2.0 |
| --- | --- | --- |
| OS | CentOS 6.x | CentOS 7.x |
| Scheduler | Slurm 16.05.x | Slurm 17.02.x |
What stays the same?#
We're well aware that transitioning to a new system takes time and effort away from research, so we wanted to minimize disruption. We focused our attention on revamping and improving the required elements, but we didn't change everything: we kept the best parts as they are.
All the storage systems (including
$OAK) are shared between Sherlock 1.0 and Sherlock 2.0. So all the
files and folders available on Sherlock 1.0 will be available on Sherlock 2.0.
Be advised that the paths of the `/share/PI` directories have changed on Sherlock 2.0. We strongly recommend referring to them through their environment variables instead of using the full paths.
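As an illustration, a script can stay portable across both clusters by relying on environment variables rather than hard-coded paths (`$OAK` is mentioned on this page; `$PI_HOME` is used here only as a hypothetical group-directory variable):

```bash
# Portable across Sherlock 1.0 and 2.0: reference directories via variables,
# not hard-coded paths such as /share/PI/...
cd "$PI_HOME"                 # hypothetical group home variable
cp results.tar.gz "$OAK"/     # $OAK is shared between both clusters
```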
To transfer files, you can continue to use the existing data-transfer node (DTN): since all the filesystems are shared between the two clusters, files uploaded through the DTN will be immediately available on Sherlock 2.0.
For more information, see the Data transfer page.
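For example, an upload through the data-transfer node could look like this (the DTN hostname below is an assumption; check the Data transfer page for the actual address):

```
$ scp mydata.tar.gz <sunet>@dtn.sherlock.stanford.edu:~/
```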
In terms of scheduling, the same principles users are already familiar with on Sherlock 1.0 carry over:
- a set of general partitions (`normal`, `dev`, `bigmem`, `gpu`), available to all users,
- owner-specific partitions, regrouping nodes bought by PI groups,
- a global `owners` partition, to allow other owners to use idle nodes,
- same backfilling and fair share scheduling rules,
- similar limits on most partitions,
Owners' partition on Sherlock 2.0
While the two clusters operate as separate entities (phase 1 of the transition process), owners on Sherlock 1.0 and Sherlock 2.0 will be considered two distinct groups: Sherlock 1.0 owners won't be able to use the `owners` partition on Sherlock 2.0, and vice-versa.
As we move through Phase 2, and as Sherlock 1.0 owners' nodes are folded into Sherlock 2.0, Sherlock 1.0 owners will gain access to the `owners` partition on Sherlock 2.0.
For more details about the scheduler, see the Running Jobs page.
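As a quick way to see which partitions are available to you on the cluster you're connected to, something like the following should work (the output columns are just an illustrative selection):

```
$ sinfo -o "%P %D %c %m"     # partition, node count, cores and memory per node
```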
Part of the current Sherlock user software stack has been ported over to Sherlock 2.0, with a refreshed but similar modules system.
The software stack on Sherlock 2.0 is an ongoing effort, and we'll continue to add new applications over the coming days and weeks.
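To explore what's currently installed and load an application, the usual module commands apply (the module name below is just an example; available versions may differ between clusters):

```
$ module avail          # list the software available on the cluster you're logged into
$ module load python    # load an example module
```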
How to differentiate clusters?#
Determine the Sherlock version
To check which cluster environment you're running on, check the value of the `$SHERLOCK` environment variable.
To ease the transition to the new software environment, and to make re-using the same scripts more convenient, we provide a way to check which cluster your jobs are running on. By checking the value of the `$SHERLOCK` environment variable, you'll be able to determine which environment your scripts are running in:

| Command | Output on Sherlock 1.0 | Output on Sherlock 2.0 |
| --- | --- | --- |
| `echo $SHERLOCK` | `1` | `2` |
So, for instance, the following batch script will check that environment variable to determine where it's running, and load the appropriate Python module:
```bash
#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --time 10:00
#
if [[ "$SHERLOCK" == "1" ]]; then
    echo "Running on Sherlock 1.0"
    module load python/2.7.5
elif [[ "$SHERLOCK" == "2" ]]; then
    echo "Running on Sherlock 2.0"
    module load python/2.7.13
else
    echo "Uh-oh, not sure where we are..."
    exit 1
fi

python -c "print 'Hello Sherlock'"
```
That script could be submitted with
sbatch on either cluster, and will
produce the exact same output.
The `$SHERLOCK` environment variable will remain available throughout all the
transition phases, and will stay defined even after the
two clusters have been merged.
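As a usage illustration, submitting the script above (saved under a hypothetical filename) works the same way on either cluster:

```
$ sbatch sherlock_test.sbatch
Submitted batch job 123456
```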
Feedback and support#
Your feedback is very important to us, especially in the early stages of a new system. So please tell us about any issues you run into and successes you have. Feel free to contact us by email at firstname.lastname@example.org.