MariaDB
Introduction#
MariaDB is a community-developed fork of the MySQL relational database management system. It is completely compatible with MySQL and could be use as a drop-in replacement in the vast majority of cases.
More documentation#
The following documentation specifically intended for using MariaDB on Sherlock. For more complete documentation about MariaDB in general, please see the MariaDB documentation.
MariaDB on Sherlock#
We don't provide any centralized database service on Sherlock, but we provide a centralized installation of MariaDB, and each user is welcome to start their own instance of the database server to fit their jobs' needs.
The overall process to run an instance of MariaDB on Sherlock would look like this:
- configure and initialize your environment so you can start a database instance under your user account,
- start the database server,
- run SQL queries from the same node (via a local socket), or from other nodes and/or jobs (via the network).
Single-node access#
In that example, the database server and client will run within the same job, on the same compute node.
Preparation#
You first need to let MariaDB know where to store its database, where to log things, and how to allow connections from clients. The commands below only need to be executed once.
For this, you'll need to create a .my.cnf
file in your home directory. Assuming you'll want to store your database files in a db/
directory in your $SCRATCH
folder, you can run the following commands:
$ export DB_DIR=$SCRATCH/db
$ mkdir $DB_DIR
$ cat << EOF > ~/.my.cnf
[mysqld]
datadir=$DB_DIR
socket=$DB_DIR/mariadb.sock
user=$USER
symbolic-links=0
skip-networking
[mysqld_safe]
log-error=$DB_DIR/mariadbd.log
pid-file=$DB_DIR/mariadbd.pid
[mysql]
socket=$DB_DIR/mariadb.sock
EOF
.my.cnf
doesn't support environment variables
Please note that if you edit your ~/.my.cnf
file directly in a file editor, without using the HEREDOC syntax above, environment variables such as $DB_DIR
, $HOME
or $USER
won't work: you will need to specify absolute paths explicitly, such as /scratch/users/kilian/db/mariadbd.log
.
If you use the HEREDOC syntax, you can verify that the resulting .my.cnf
file does actually contain full paths, and not environment variables anymore.
Once you have the .my.cnf
file in place, you need to initialize your database with some internal data that MariaDB needs. In the same terminal, run the following commands:
$ ml system mariadb
$ $MARIADB_DIR/scripts/mysql_install_db --basedir=$MARIADB_DIR --datadir=$DB_DIR
Start the server#
You can now start the MariaDB server. For this, first get an allocation on a compute node, note the hostname of the compute node your job has been allocated, load the mariadb
module, and then run the mysqld_safe
process:
$ srun --pty bash
$ echo $SLURM_JOB_NODELIST
sh-01-01
$ ml system mariadb
$ mysqld_safe
180705 18:14:27 mysqld_safe Logging to '/home/users/kilian/db/mysqld.log'.
180705 18:14:28 mysqld_safe Starting mysqld daemon with databases from /home/users/kilian/db/
The mysqld_safe
will be blocking, meaning it will not give the prompt back for as long as the MariaDB server runs.
If it does return on its own, it probably means that something went wrong, and you'll find more information about the issue in the $DB_DIR/mysqld.log
file you defined in ~/.my.cnf
.
Run queries#
You're now ready to run queries against that MariaDB instance, from the same node your job is running on.
From another terminal on Sherlock, connect to your job's compute node (here, it's sh-01-01
, as shown above), load the mariadb
module, and then run the mysql
command: it will open the MariaDB shell, ready to run your SQL queries:
$ ssh sh-01-01
$ ml system mariadb
$ mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 8
Server version: 10.2.11-MariaDB Source distribution
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]>
Once you're done with your MariaDB instance, you can just terminate your job, and all the processes will be terminated automatically.
Multi-node access#
In case you need to run a more persistent instance of MariaDB, you can for instance submit a dedicated job to run the server, make it accessible over the network, and run queries from other jobs and/or nodes.
Enable network access#
The preparation steps are pretty similar to the single-node case, except the MariaDB server instance will be accessed over the network rather than through a local socket.
Network access must be secured
When running an networked instance of MariaDB, please keep in mind that any user on Sherlock will be able to connect to the TCP ports that mysqld
runs on, and that proper configuration must be done to prevent unauthrozied access.
Like in the single-node case, you need to create a ~/.my.cnf
file, but without the skip-networking
directive.
$ export DB_DIR=$SCRATCH/db
$ mkdir $DB_DIR
$ cat << EOF > ~/.my.cnf
[mysqld]
datadir=$DB_DIR
socket=$DB_DIR/mariadb.sock
user=$USER
symbolic-links=0
[mysqld_safe]
log-error=$DB_DIR/mariadbd.log
pid-file=$DB_DIR/mariadbd.pid
[mysql]
socket=$DB_DIR/mariadb.sock
EOF
And then initiate the database:
$ ml system mariadb
$ $MARIADB_DIR/scripts/mysql_install_db --basedir=$MARIADB_DIR --datadir=$DB_DIR
Secure access#
We will now set a password for the MariaDB root
user to a random string, just for the purpose of preventing unauthorized access, since we won't need it for anything.
We will actually create a MariaDB user with all privileges on the databases, that will be able to connect to this instance from any node. This user will need a real password, though. So please make sure to replace the my-secure-password
string below by the actual password of your choice.
Choose a proper password
This password will only be used to access this specific instance of MariaDB. Note that anybody knowing that password will be allowed to connect to your MariaDB instances and modify data in the tables.
- do NOT literally use
my-secure-password
- do NOT use your SUNet ID password
Once you've chosen your password, you can start the mysqld
process on a compute node, like before:
$ srun --pty bash
$ echo $SLURM_JOB_NODELIST
sh-01-01
$ ml system mariadb
$ mysqld_safe
And then, from another terminal, run the following commands to secure access to your MariaDB database.
$ ssh sh-01-01
$ mysql -u root << EOF
UPDATE mysql.user SET Password=PASSWORD(RAND()) WHERE User='root';
DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1');
DELETE FROM mysql.user WHERE User='';
DELETE FROM mysql.db WHERE Db='test' OR Db='test_%';
GRANT ALL PRIVILEGES ON *.* TO '$USER'@'%' IDENTIFIED BY 'my-secure-password' WITH GRANT OPTION;
FLUSH PRIVILEGES;
EOF
Once you've done that, you're ready to terminate that interactive job, and start a dedicated MariaDB server job.
Start MariaDB in a job#
You can use the following mariadb.sbatch
job as a template:
#!/bin/bash
#SBATCH --job-name=mariadb
#SBATCH --time=8:0:0
#SBATCH --dependency=singleton
ml system mariadb
mysqld_safe
and submit it with:
$ sbatch mariadb.sbatch
Concurrent instances will lead to data corruption
An important thing to keep in mind is that having multiple instances of a MariaDB server running at the same time, using the same database files, will certainly lead to catastrophic situations and the corruption of those files.
To prevent this from happening, the --dependency=singleton
job submission option will make sure that only one instance of that job (based on its name and user) will run at any given time.
Connect to the running instance#
Now, from any node on Sherlock, whether from a login node, an interactive job, or a batch job, using the mysql
CLI or any application binding in any language, you should be able to connect to your running MariaDB instance,
First, identify the node your job is running on with squeue
:
$ squeue -u $USER -n mariadb
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
21383445 normal mariadb kilian R 0:07 1 sh-01-02
and then, point your MariaDB client to that node:
$ ml system mariadb
$ mysql -h sh-01-02 -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 15
Server version: 10.2.11-MariaDB Source distribution
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]>
That's it! You can now run SQL queries from anywhere on Sherlock to your own MariaDB instance.
Persistent DB instances#
SQL data is persistent
All the data you import in your SQL databases will be persistent across jobs. Meaning that you can run a PostgreSQL server job for the day, import data in its database, stop the job, and resubmit the same PostgreSQL server job the next day: all your data will still be there as long as the location you've chosen for your database (the $DB_DIR
defined in the Preparation steps) is on a persistent storage location.
If you need database access for more than the maximum runtime of a job, you can use the instructions provided to define self-resubmitting recurring jobs and submit long-running database instances.