PostgreSQL is a powerful, open source object-relational database system with a strong focus on reliability, feature robustness, and performance.
The following documentation specifically intended for using PostgreSQL on Sherlock. For more complete documentation about PostgreSQL in general, please see the PostgreSQL documentation.
PostgreSQL on Sherlock#
We don't provide any centralized database service on Sherlock, but we provide a centralized installation of PostgreSQL, and each user is welcome to start their own instance of the database server to fit their jobs' needs.
The overall process to run an instance of PostgreSQL on Sherlock would look like this:
- configure and initialize your environment so you can start a database instance under your user account,
- start the database server,
- run SQL queries from the same node (via a local socket), or from other nodes and/or jobs (via the network).
In that example, the database server and client will run within the same job, on the same compute node.
You first need to let PostgreSQL know where to store its database. The commands below only need to be executed once.
Assuming you'll want to store your database files in a
db/ directory in your
$SCRATCH folder, you can run the following commands:
$ export DB_DIR=$SCRATCH/db $ mkdir $DB_DIR
Once you have your
$DB_DIR in place, you need to initialize your database with some internal data that PostgreSQL needs. In the same terminal, run the following commands:
$ ml system postgresql $ initdb $DB_DIR
Start the server#
You can now start the PostgreSQL server. For this, first get an allocation on a compute node, note the hostname of the compute node your job has been allocated, load the
postgresql module, and then run the
$ srun --pty bash $ echo $SLURM_JOB_NODELIST sh-01-01 $ ml system postgresql $ export DB_DIR=$SCRATCH/db $ postgres -D $DB_DIR [...] 2018-10-09 17:42:08.094 PDT  LOG: database system is ready to accept connections
postgres process will be blocking, meaning it will not give the prompt back for as long as the PostgreSQL server runs.
You're now ready to run queries against that PostgreSQL instance, from the same node your job is running on.
From another terminal on Sherlock, connect to your job's compute node (here, it's
sh-01-01, as shown above), load the
postgresql module, and then run the
createdb command: it will create a database that you can use as a testbed:
$ ssh sh-01-01 $ ml system postgresql $ createdb test_db
Once this is done, from the same shell, you can run the
psql command, which will open the PostgreSQL shell, ready to run your SQL queries:
$ psql test_db psql (10.5) Type "help" for help. test_db=#
Once you're done with your PostgreSQL instance, you can just terminate your job, and all the processes will be terminated automatically.
In case you need to run a more persistent instance of PostgreSQL, you can for instance submit a dedicated job to run the server, make it accessible over the network, and run queries from other jobs and/or nodes.
Enable network access#
The preparation steps are pretty similar to the single-node case, except the PostgreSQL server instance will be accessed over the network rather than through a local socket.
Network access must be secured
When running an networked instance of PostgreSQL, please keep in mind that any user on Sherlock could potentially be able to connect to the TCP ports that
postgres runs on, and that proper configuration must be done to prevent unauthrozied access.
Like in the single-node case, you need to start the
postgres server process, but with the
-i option to enable network connections, and define user access in your
$DB_DIR/pg_hba.conf file (see below).
To allow network connections to the database server, a password will need to be defined for the PostgreSQL user. That will allow this user to connect to the PostgreSQL instance from any node. Please make sure to replace the
my-secure-password string below by the actual password of your choice.
Choose a proper password
This password will only be used to access this specific instance of PostgreSQL. Note that anybody knowing that password will be allowed to connect to your PostgreSQL instances and modify data in the tables.
- do NOT use
- do NOT use your SUNet ID password
Once you've chosen your password, you can now start the PostgreSQL server on a compute, as described in the previous section, initialize the database, and set the user password:
$ srun --pty bash $ echo $SLURM_JOB_NODELIST sh-01-01 $ export DB_DIR=$SCRATCH/db $ mkdir $DB_DIR $ ml system postgresql $ initdb $DB_DIR $ createdb test_db $ psql -c "ALTER USER $USER PASSWORD 'my-secure-password';" test_db
Then, we need to edit the
$DB_DIR/ph_hba.conf file to allow network access for user
$ cat << EOF > $DB_DIR/pg_hba.conf local all all trust host all all 127.0.0.1/32 trust host all all ::1/128 trust host all $USER samenet md5 EOF
Once you've done that, you're ready to terminate that interactive job, and start a dedicated PostgreSQL server job.
$ pg_ctl stop -D $DB_DIR $ logout
Start PostgreSQL in a job#
You can use the following
postgresql.sbatch job as a template:
#!/bin/bash #SBATCH --job-name=postgresql #SBATCH --time=8:0:0 #SBATCH --dependency=singleton export DB_DIR=$SCRATCH/db ml system postgresql postgres -i -D $DB_DIR
and submit it with:
$ sbatch postgresql.sbatch
Concurrent instances will lead to data corruption
An important thing to keep in mind is that having multiple instances of a PostgreSQL server running at the same time, using the same database files, will certainly lead to catastrophic situations and the corruption of those files.
To prevent this from happening, the
--dependency=singleton job submission option will make sure that only one instance of that job (based on its name and user) will run at any given time.
Connect to the running instance#
Now, from any node on Sherlock, whether from a login node, an interactive job, or a batch job, using the
mysql CLI or any application binding in any language, you should be able to connect to your running PostgreSQL instance,
First, identify the node your job is running on with
$ squeue -u $USER -n postgresql JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 21383445 normal postgresql kilian R 0:07 1 sh-01-02
and then, point your PostgreSQL client to that node:
$ ml system postgresql $ mpsql -h sh-06-34 test_db Password: psql (10.5) Type "help" for help. test_db=#
That's it! You can now run SQL queries from anywhere on Sherlock to your own PostgreSQL instance.
Persistent DB instances#
SQL data is persistent
All the data you import in your SQL databases will be persistent across jobs. Meaning that you can run a PostgreSQL server job for the day, import data in its database, stop the job, and resubmit the same PostgreSQL server job the next day: all your data will still be there as long as the location you've chosen for your database (the
$DB_DIR defined in the Preparation steps) is on a persistent storage location.
If you need database access for more than the maximum runtime of a job, you can use the instructions provided to define self-resubmitting recurring jobs and submit long-running database instances.