Python

Introduction

Python is an interpreted high-level programming language for general-purpose programming. Its design philosophy emphasizes code readability. It provides constructs that enable clear programming on both small and large scales, which makes it both easy to learn and very well-suited for rapid prototyping.

Python on Sherlock

Sherlock features multiple versions of Python.

Some applications only work with legacy features of version 2.x, while more recent code will require specific version 3.x features. Modules on Sherlock may only be available in a single flavor (as denoted by their suffix: _py27 or _py36), because the application only supports one or the other.

You can load either version on Sherlock with one of the following commands:

$ ml python/2.7.13

or

$ ml python/3.6.1

The Python3 interpreter is python3

The Python 3 executable is named python3, not python. So, once you have the "python/3.6.1" module loaded on Sherlock, you need to use python3 to invoke the proper interpreter: python still refers to the default, older system-level Python installation, and using it to run Python 3 code may result in errors.

This is an upstream decision detailed in PEP-394, not something specific to Sherlock.
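If a script relies on Python 3 features, it can check which interpreter is running it and stop with an explicit message instead of failing later with a confusing error. The sketch below uses the standard sys.version_info tuple; the minimum version (3, 6) and the function name check_interpreter are assumptions chosen to match the python/3.6.1 module above.

```python
import sys

# Minimum interpreter version this script expects (assumption: matches python/3.6.1)
REQUIRED = (3, 6)


def check_interpreter(required=REQUIRED):
    """Exit with an explicit message when started by an older interpreter."""
    if sys.version_info[:2] < required:
        # %-formatting keeps this message valid even under Python 2.7
        sys.exit("This script needs Python %d.%d or later, but was started with %s. "
                 "Did you run 'python' instead of 'python3'?"
                 % (required[0], required[1], sys.executable))
    return sys.version_info[:2]


if __name__ == "__main__":
    print("Running with Python %d.%d" % check_interpreter())
```

The check only runs if the file can be parsed at all: Python 2 rejects Python 3-only syntax such as f-strings before executing anything, so that kind of code is best kept in a separate module imported after the check.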

Using Python

Once your environment is configured (i.e. when the Python module is loaded), Python can be started by simply typing python at the shell prompt:

$ python
Python 2.7.13 (default, Apr 27 2017, 14:19:21)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

Python in batch jobs

Python output is buffered by default

By default, Python buffers its standard output when it is not connected to a terminal. This means that when running Python in a batch job through Slurm, where output is redirected to a file, you may see output less often than you would when running interactively.

When output is buffered, print statements are accumulated until there is enough data to write, and then the messages are all written at once. As a consequence, job output files (as specified with the --output and --error job submission options) are refreshed less often, which may give the impression that the job is not running.

For debugging or checking that a Python script is producing the correct output, you may want to switch off buffering.
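Whether buffering kicks in depends on where standard output goes: when it is a terminal, Python 3 writes it out line by line; when it is redirected to a file, as Slurm does with job output, it is block-buffered and written in larger chunks. The small sketch below reports which case applies to a given stream; the function name describe_stdout is just for illustration.

```python
import sys


def describe_stdout(stream=None):
    """Report whether a stream is a terminal, and hence how Python buffers it."""
    stream = sys.stdout if stream is None else stream
    # Streams without isatty() (or closed ones) are treated as non-terminals
    try:
        interactive = stream.isatty()
    except (AttributeError, ValueError):
        interactive = False
    if interactive:
        return "terminal: output is line-buffered and appears line by line"
    return "file or pipe: output is block-buffered and appears in chunks"


if __name__ == "__main__":
    print(describe_stdout())
```

Run interactively, this reports a terminal; run through sbatch, the same script reports a file, which is the situation described above.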

Switching off buffering

For a single Python script, you can use the -u option, as in python -u my_script.py. The -u option stands for "unbuffered".

For instance:

#!/bin/bash
#SBATCH -n 1

python -u my_script.py

Tip

You can also use the environment variable PYTHONUNBUFFERED to set unbuffered I/O for your whole batch script.

#!/bin/bash
#SBATCH -n 1

export PYTHONUNBUFFERED=True
python my_script.py

NB: There is some performance penalty for having unbuffered print statements, so you may want to reduce the number of print statements, or run buffered for production runs.
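Instead of switching off buffering for all output, a script can flush only the messages you want to see promptly in the job output file, such as progress reports. In Python 3.3 and later, print() accepts a flush argument for this; with Python 2.7, calling sys.stdout.flush() after the print statement has the same effect. A minimal sketch, where the loop and the function name run are hypothetical stand-ins for real work:

```python
def run(n_steps=6, report_every=2):
    """Hypothetical workload: results stay in memory, progress lines are flushed."""
    results = []
    for step in range(1, n_steps + 1):
        results.append(step * step)  # placeholder for real computation
        if step % report_every == 0:
            # flush=True writes this line out immediately, even when stdout is a file
            print("progress: step %d/%d" % (step, n_steps), flush=True)
    # The summary goes through the normal, buffered path
    print("done, %d results" % len(results))
    return results


if __name__ == "__main__":
    run()
```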

Python packages

The capabilities of Python can be extended with packages developed by third parties. In general, to simplify operations, it is left up to individual users and groups to install these third-party packages in their own directories. However, Sherlock provides tools to help you install the third-party packages that you need.

Among many others, the following common Python packages are provided on Sherlock:

NumPy
SciPy

Python modules on Sherlock generally follow the naming scheme below:

py-<package_name>/version_py<python_version>

For instance, NumPy modules follow the pattern py-numpy/<version>_py27 or py-numpy/<version>_py36.

You can list all available module versions for a package with ml spider <package_name>. For instance:

$ ml spider tensorflow
-------------------------------------------------------------------------------
  py-tensorflow:
-------------------------------------------------------------------------------
    Description:
      TensorFlow™ is an open source software library for numerical computation using data flow graphs.

     Versions:
        py-tensorflow/1.6.0_py27
        py-tensorflow/1.6.0_py36
        py-tensorflow/1.7.0_py27
        py-tensorflow/1.9.0_py27
        py-tensorflow/1.9.0_py36

Dependencies are handled automatically

When you decide to use NumPy on Sherlock, you just need to load the py-numpy module of your choice, and the correct Python interpreter will be loaded automatically. There is no need to load a python module explicitly.
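To confirm which interpreter and which copy of a package a job actually picks up after loading a module, a script can query the import system without importing the package itself. A minimal sketch using the standard importlib.util.find_spec (Python 3.4+); the helper name where_is is hypothetical:

```python
import importlib.util
import sys


def where_is(package_name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(package_name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name in ("numpy", "scipy"):
        # None here suggests the corresponding py-* module is not loaded
        print(name, "->", where_is(name))
```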

Installing packages

If you need to use a Python package that is not already provided as a module on Sherlock, you can use the pip command. This command takes care of compiling and installing most Python packages and their dependencies. All of pip's commands and options are explained in detail in the Pip user guide.

A comprehensive index of Python packages can be found at PyPI.

To install Python packages with pip, you'll need to use the --user option. This will make sure that those packages are installed in a user-writable location (by default, your $HOME directory). Since your $HOME directory is shared across nodes on Sherlock, you'll only need to install your Python packages once, and they'll be ready to be used on every single node in the cluster.

For example:

$ pip install --user <package_name>

For Python 3, use pip3:

$ pip3 install --user <package_name>
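Since python/python3 and pip/pip3 are separate commands, it is easy to install a package for a different interpreter than the one your script runs with. A widely used safeguard, not specific to Sherlock, is to run pip through the interpreter itself, as in python3 -m pip install --user <package_name>. The sketch below builds that command from sys.executable; the helper name pip_install_user and its dry_run default are illustrative assumptions.

```python
import subprocess
import sys


def pip_install_user(package_name, dry_run=True):
    """Build (and optionally run) a --user install for the running interpreter."""
    # sys.executable is the interpreter running this code, so "-m pip" uses
    # the pip that belongs to it rather than whichever pip comes first in $PATH
    cmd = [sys.executable, "-m", "pip", "install", "--user", package_name]
    if not dry_run:
        subprocess.check_call(cmd)
    return cmd


if __name__ == "__main__":
    print(" ".join(pip_install_user("urllib3")))
```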

Python packages will be installed in $HOME/.local/lib/python<version>/site-packages, meaning that packages for Python 2.x and Python 3.x are kept separate. On one hand, they won't interfere with each other; on the other hand, if you need to use a package with both Python 2.x and 3.x, you'll need to install it twice, once for each Python version.
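To see the exact per-version directory that a --user install writes to for the interpreter you are running, the standard site module can report it. A minimal sketch (the wrapper name user_site is only for illustration):

```python
import site
import sys


def user_site():
    """Return the --user install directory for the running interpreter."""
    # On Linux: $PYTHONUSERBASE (default ~/.local)/lib/pythonX.Y/site-packages
    return site.getusersitepackages()


if __name__ == "__main__":
    print("Python %d.%d user packages: %s"
          % (sys.version_info[0], sys.version_info[1], user_site()))
```

The same path is printed from the shell by python3 -m site --user-site.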

List installed packages

You can easily see the list of the Python packages installed in your environment, and their location, with pip list:

$ pip list -v
Package    Version Location                                                            Installer
---------- ------- ------------------------------------------------------------------- ---------
pip        18.1    /share/software/user/open/python/2.7.13/lib/python2.7/site-packages pip
setuptools 28.8.0  /share/software/user/open/python/2.7.13/lib/python2.7/site-packages pip
urllib3    1.24    /home/users/kilian/.local/lib/python2.7/site-packages               pip
virtualenv 15.1.0  /share/software/user/open/python/2.7.13/lib/python2.7/site-packages pip
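From inside a running script, the standard library can produce a similar name/version/location listing, which can be handy for logging the environment of a job. This sketch assumes Python 3.8 or later, since importlib.metadata does not exist in older versions such as the 3.6.1 module above; the function name installed_packages is hypothetical.

```python
from importlib import metadata


def installed_packages():
    """Return sorted (name, version, location) tuples, similar to `pip list -v`."""
    rows = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:  # skip broken or partial installs without a name
            continue
        # locate_file("") resolves to the directory the distribution was found in
        rows.append((name, dist.version, str(dist.locate_file(""))))
    return sorted(rows, key=lambda row: row[0].lower())


if __name__ == "__main__":
    for name, version, location in installed_packages():
        print("%-20s %-10s %s" % (name, version, location))
```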

Alternative installation path

Python paths

While theoretically possible, installing Python packages in alternate locations can be tricky, so we recommend trying to stick to the pip install --user way as often as possible. But in case you absolutely need it, we provide some guidelines below.

One common case of needing to install Python packages in alternate locations is to share those packages with a group of users. Here's an example that will show how to install the urllib3 Python package in a group-shared location and let users from the group use it without having to install it themselves.

First, you need to create a directory to store those packages. We'll put it in $GROUP_HOME:

$ mkdir -p $GROUP_HOME/python/

Then, we load the Python module we need, and we instruct pip to install its packages in the directory we just created:

$ ml python/2.7.13
$ PYTHONUSERBASE=$GROUP_HOME/python pip install --user urllib3

We still use the --user option, but with PYTHONUSERBASE pointing to a different directory, pip will install packages there.
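The effect of PYTHONUSERBASE can be checked without installing anything: start a child interpreter with the variable set and ask the site module where --user packages would go. The base directory below is a placeholder; on Sherlock you would pass the expanded value of $GROUP_HOME/python. The helper name user_site_for is hypothetical.

```python
import os
import subprocess
import sys


def user_site_for(user_base):
    """Return the --user site-packages path a child interpreter uses for this base."""
    env = dict(os.environ, PYTHONUSERBASE=user_base)
    # Run the same interpreter; site.getusersitepackages() honors PYTHONUSERBASE
    out = subprocess.check_output(
        [sys.executable, "-c", "import site; print(site.getusersitepackages())"],
        env=env,
        universal_newlines=True,
    )
    return out.strip()


if __name__ == "__main__":
    # Placeholder path: replace with your group's directory
    print(user_site_for("/path/to/group_home/python"))
```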

Now, since the package hasn't been installed in a default directory, you (and every member of the group who wants to use it) need to add the new shared directory to your PYTHONPATH¹:

$ export PYTHONPATH=$GROUP_HOME/python/lib/python2.7/site-packages:$PYTHONPATH

And now, the module should be visible:

$ pip list -v
Package    Version Location                                                            Installer
---------- ------- ------------------------------------------------------------------- ---------
pip        18.1    /share/software/user/open/python/2.7.13/lib/python2.7/site-packages pip
setuptools 28.8.0  /share/software/user/open/python/2.7.13/lib/python2.7/site-packages pip
urllib3    1.24    /home/groups/ruthm/python/lib/python2.7/site-packages               pip
virtualenv 15.1.0  /share/software/user/open/python/2.7.13/lib/python2.7/site-packages pip

$PYTHONPATH depends on the Python version

The directory to add to $PYTHONPATH depends on the Python version you're using: for Python 3.6, it should be $GROUP_HOME/python/lib/python3.6/site-packages.
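Because the directory name embeds the major.minor version, a small helper can derive the right entry for whichever interpreter is running and prepend it to any existing PYTHONPATH value. This is a sketch assuming the Linux --user layout used above (lib/pythonX.Y/site-packages); the function names are hypothetical and the group_home argument stands in for $GROUP_HOME.

```python
import os
import sys


def group_site_packages(group_home, version=None):
    """Return <group_home>/python/lib/pythonX.Y/site-packages for a given version."""
    major, minor = version if version is not None else sys.version_info[:2]
    return os.path.join(group_home, "python", "lib",
                        "python%d.%d" % (major, minor), "site-packages")


def prepend_to_pythonpath(entry, current=None):
    """Return a PYTHONPATH value with entry first, keeping any existing entries."""
    current = os.environ.get("PYTHONPATH", "") if current is None else current
    return entry if not current else entry + os.pathsep + current


if __name__ == "__main__":
    group_home = os.environ.get("GROUP_HOME", "/path/to/group_home")
    print(prepend_to_pythonpath(group_site_packages(group_home)))
```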

$PATH may also need to be updated

Some Python packages also install executable scripts. To make them easily accessible in your environment, you may also want to modify your $PATH to include their installation directory.

For instance, if you installed Python packages in $GROUP_HOME/python:

$ export PATH=$GROUP_HOME/python/bin:$PATH

Installing from GitHub

pip also supports installing packages from a variety of sources, including GitHub repositories.

For instance, to install HTTPie, you can do:

$ pip install --user git+https://github.com/jkbr/httpie.git

Installing from a requirements file

pip can install all the packages listed in a file, which is a convenient way to install several dependencies at once.

In order to do this, create a text file called requirements.txt and place each package you would like to install on its own line:

requirements.txt

numpy
scikit-learn
keras
tensorflow

You can now install your packages like so:

$ ml python
$ pip install --user -r requirements.txt

Upgrading packages

pip can update already installed packages with the following command:

$ pip install --user --upgrade <package_name>

Upgrading packages also works with requirements.txt files:

$ pip install --user --upgrade -r requirements.txt

Uninstalling packages

To uninstall a Python package, you can use the pip uninstall command (note that it doesn't take any --user option):

$ pip uninstall <package_name>
$ pip uninstall -r requirements.txt

Virtual environments

Work in progress

This page is a work in progress and is not complete yet. We are actively working on adding more content and information.

¹ This line can also be added to a user's ~/.profile file for a more permanent setting.