Common datasets
To help researchers save time on downloads, Stanford Research Computing hosts databases and models for commonly used software in $COMMON_DATASETS. This is a read-only storage space that is accessible to all Sherlock users.
Available datasets
The following datasets are currently available in $COMMON_DATASETS:
| Dataset | Path | Description |
|---|---|---|
| AlphaFold 3 | $COMMON_DATASETS/alphafold3 | Genetic sequence and structural template databases for AlphaFold 3. See our AlphaFold documentation for instructions on running it on Sherlock. |
| NCBI BLAST databases | $COMMON_DATASETS/blast | Sequence databases for use with NCBI BLAST and related tools. Copy the databases you need to $SCRATCH, then set BLASTDB=$SCRATCH/blast before running BLAST. |
| Ollama models | $COMMON_DATASETS/ollama | Pre-downloaded LLMs for use with Ollama. Automatically integrated with the ollama module (no manual setup needed). See our Ollama documentation for more details. |
To see the full, up-to-date list of available datasets, run `ls $COMMON_DATASETS`.
Optimizing performance
For best performance, do NOT run jobs directly against $COMMON_DATASETS. Instead, copy the datasets you need to $SCRATCH or $GROUP_SCRATCH, and reference that copy in your jobs.
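As an illustration of this staging step, the sketch below copies one dataset directory to scratch. The helper name `stage_dataset` is hypothetical (it is not provided on Sherlock), and plain `cp -r` is an assumption; any recursive copy tool would work.

```shell
# Hypothetical helper (not part of Sherlock): copy one dataset directory
# from the shared read-only space to a scratch location.
#   $1: dataset name, e.g. "blast"
#   $2: source root (defaults to $COMMON_DATASETS)
#   $3: destination root (defaults to $SCRATCH)
stage_dataset() {
    local name="$1"
    local src_root="${2:-$COMMON_DATASETS}"
    local dest_root="${3:-$SCRATCH}"

    mkdir -p "$dest_root/$name" || return 1
    # The trailing "/." copies the directory's contents, so running the
    # helper twice does not create a nested "$name/$name" directory.
    cp -r "$src_root/$name/." "$dest_root/$name/"
}
```

For the BLAST databases listed above, this could be used as `stage_dataset blast` followed by `export BLASTDB=$SCRATCH/blast`. To stage into group scratch instead, pass both roots explicitly: `stage_dataset blast "$COMMON_DATASETS" "$GROUP_SCRATCH"`.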
Maintaining local copies
Tools such as rsync or dsync can be used to restore files that may have been deleted from $SCRATCH due to the 90-day purge policy.
For example, to synchronize the AlphaFold 3 databases to $SCRATCH:
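One way to do this with rsync is sketched below. The helper name `restore_dataset` and the choice of flags are assumptions rather than a documented Sherlock recipe: `--ignore-existing` copies only files that are missing from the scratch copy, which matches the "restore purged files" use case but does not refresh files that have changed at the source.

```shell
# Hypothetical helper (not part of Sherlock): re-copy files that are
# missing from a scratch copy of a dataset.
#   $1: dataset name, e.g. "alphafold3"
#   $2: source root (defaults to $COMMON_DATASETS)
#   $3: destination root (defaults to $SCRATCH)
restore_dataset() {
    local name="$1"
    local src_root="${2:-$COMMON_DATASETS}"
    local dest_root="${3:-$SCRATCH}"

    mkdir -p "$dest_root/$name" || return 1
    # -r: recurse into directories; -l: copy symlinks as symlinks.
    # --ignore-existing: skip files already present at the destination.
    # Trailing slashes make rsync sync directory contents, not the directory.
    rsync -rl --ignore-existing "$src_root/$name/" "$dest_root/$name/"
}
```

On Sherlock this would be invoked as `restore_dataset alphafold3`. Because archive mode (`-a`) is not used, restored files get a fresh modification time instead of inheriting the source's; whether that matters depends on how the purge policy selects files, which is not covered here.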