Thursday a week ago I gave a brief introductory talk in our Heidelberg Chaostreff about the Bohrium project. Especially after the HPC day at the Niels Bohr Institute during my recent visit to Copenhagen, I became rather enthusiastic about Bohrium and wanted to pass on some of my experiences.

The main idea of Bohrium is to build a fully numpy-compatible
framework for **high-performance computing**,
which can automatically parallelise numpy array operations and/or
execute them on a general-purpose graphics cards.
The hope is that this eradicates the step of rewriting a prototypical python
implementation of a scientific model in more low-level languages like C++ or
CUDA before dealing with the actual
real-world problems in mind.

In practice Bohrium achieves this by translating the python code (via some
intermediate steps) into small pieces of C or CUDA code.
These are then automatically compiled at runtime of the script,
taking into account the current hardware setup, and afterwards executed.
The results of such a **just-in-time** compiled **kernel** are again
available in numpy-like
arrays and can be passed to other scripts for post-processing,
e.g. plotting in matplotlib.

It is important to note, that the effect of Bohrium is limited to
array operations. So for example the usual Python `for`

loops
are not touched.
This is, however, hardly a problem if the practice of
so-called **array programming**
is followed.
In array programming one avoids plain `for`

-loops and similar traditional
python language elements in preference for special
syntax which works on blocks (numpy *arrays*) of data at once.
Examples of such operations is pretty much the typical
numpy workflow:

- views and slices:
`array[1:3]`

- broadcasting:
`array[:, np.newaxis]`

- elementwise operations:
`array1 * array2`

- reduction:
`np.sum(array1, axis=1)`

A slightly bigger drawback of Bohrium is, that the just-in-time compilation takes time, where no results are produced. In other words Bohrium does only start to pay of at larger problem sizes or if exactly the same sequence of instructions is to be executed many times.

In my c¼h I demonstrate Bohrium by the means of this example script

```
#!/usr/bin/env python3
import numpy as np
import sys
import time
def moment(n, a):
avg = np.sum(a) / a.size
return np.sum((a - avg)**n) / a.size
def compute(a):
start = time.time()
mean = np.sum(a) / a.size
var = moment(2, a)
m3 = moment(3, a)
m4 = moment(4, a)
end = time.time()
fmt = "After {0:8.3f}s: {1:8.3f} {2:8.3f} {3:8.3f} {4:8.3f}"
print(fmt.format(end - start, float(mean), float(var),
float(m3), float(m4)))
def main(size, repeat=6):
for i in range(repeat):
compute(np.random.rand(size))
if __name__ == "__main__":
size = 30
if len(sys.argv) >= 2:
size = int(sys.argv[1])
main(size)
```

which is also available for download.
The script performs a very simple analysis of the (randomly generated)
input data: It computes some statistical moments and displays them to
the user.
For bigger arrays the single-threaded numpy starts to get very slow,
whereas the multi-threaded Bohrium version wins even thought it needs
to compile first.
Running the script with Bohrium does not require one to change even
a *single* line of code!
Just

```
python3 -m bohrium ./2017.07.13_moments.py
```

does kick off the **automatic parallelisation**.

The talk has been recorded and is available on youtube. Note, that the title of the talk and the description are German, but the talk by itself is in English.