PHAST Library

Overview

PHAST Library (Parallel Heterogeneous-Architecture STL-like Template Library) is a modern C++ programming library based on the classic STL "containers, iterators, algorithms" approach.

It defines three main containers: the canonical one-dimensional vector, a two-dimensional matrix, and a three-dimensional cube (strictly speaking it should be called a parallelepiped, but that's an ugly name!). All the containers are dynamically sized, just like STL containers.

Many types of iterators are defined accordingly, so that containers can be iterated in multiple ways: algorithms can work not only on ranges of scalar elements, but also on rows, matrices, or three-dimensional blocks. In general, each container can be seen as a collection of sections of a lesser or equal dimensionality.

This level of abstraction gives the user a powerful tool to natively express complex problems that can't be expressed by means of other similar libraries.

All the algorithms are parallel under the hood. The whole library can be targeted at multi-core systems (standard C++ thread implementation) or NVIDIA GPUs (CUDA implementation) via a single #define statement.

PHAST Library is in fact a multi-platform parallel library, faithful to the "code once" philosophy.

Different platforms require different parallelization techniques. These have already been implemented in the inner layers, so users don't have to worry about them. The techniques have been condensed into a set of parallelization parameters.

PHAST Library uses heuristics to infer the values of these parameters, trying to identify the configuration that leads to the best performance for the task at hand. It also allows users to set them manually. This way, custom optimization is still possible, and it can be achieved with minimal effort.

Now, let's explore the main features of PHAST Library!

Abstractions

In PHAST Library there are three containers: vector, matrix, and cube. They are one-, two-, and three-dimensional, respectively. A coordinate system is attached to them, with the three axes labelled i, j, and k.

Various kinds of iterators can be obtained from each container via begin/end methods. These iterators are completely described by:

the axes they span while moving;
the dimensionality of the sections they point to.

The axes spanned by an iterator are clearly specified in its name and in the begin/end methods used to obtain it. For instance, an iterator_i of a matrix spans axis i and can be obtained by calling begin_i or end_i.

The dimensionality of the sections pointed to by an iterator can be calculated immediately by subtracting the number of axes spanned by the iterator from the dimensionality of the container. For instance, an iterator_i of a matrix points to sections of dimensionality 1, i.e. vectors.
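The subtraction rule can be written down as a small compile-time check. The sketch below is plain standard C++ and does not use PHAST itself; section_dim is a hypothetical helper introduced only for illustration, and the cases listed are the ones that follow from the rule stated above.

```cpp
// Illustrative only: section_dim is a hypothetical helper, not part of PHAST.
// It encodes the rule "section dimensionality = container dimensionality
// minus the number of axes spanned by the iterator".
constexpr int section_dim(int container_dim, int axes_spanned) {
    return container_dim - axes_spanned;
}

// vector (1-D): an iterator_i spans one axis and points to scalars (0-D).
static_assert(section_dim(1, 1) == 0, "vector iterator_i -> scalars");

// matrix (2-D): iterator_i points to vectors, iterator_ij to scalars.
static_assert(section_dim(2, 1) == 1, "matrix iterator_i -> vectors");
static_assert(section_dim(2, 2) == 0, "matrix iterator_ij -> scalars");

// cube (3-D): iterator_i -> matrices, iterator_ij -> vectors,
// iterator_ijk -> scalars.
static_assert(section_dim(3, 1) == 2, "cube iterator_i -> matrices");
static_assert(section_dim(3, 2) == 1, "cube iterator_ij -> vectors");
static_assert(section_dim(3, 3) == 0, "cube iterator_ijk -> scalars");
```

Because the checks are static_assert declarations, a wrong entry would fail at compile time rather than at run time.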

Some applications require accessing data in a blocked fashion, with blocks of variable size. To achieve this, a grid object can be constructed on a container and iterated through grid_iterators, special iterators that point to sections of the same dimensionality as the container.
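To make the idea of blocked access concrete, here is a minimal sketch in plain standard C++ that tiles a two-dimensional index space into blocks. It is not the PHAST grid API: block and make_grid are hypothetical names, and clipping the blocks at the border is an assumption of this sketch, not documented PHAST behaviour.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative only: none of these names belong to PHAST.
// A block is a 2-D section of a 2-D matrix, described by origin and extent.
struct block {
    std::size_t row, col;       // top-left corner of the block
    std::size_t height, width;  // extent, clipped at the matrix border
};

// Enumerate the blocks of a rows x cols matrix tiled with block_h x block_w
// tiles. Returns an empty list if either tile extent is zero.
inline std::vector<block> make_grid(std::size_t rows, std::size_t cols,
                                    std::size_t block_h, std::size_t block_w) {
    std::vector<block> blocks;
    if (block_h == 0 || block_w == 0) return blocks;
    for (std::size_t r = 0; r < rows; r += block_h) {
        for (std::size_t c = 0; c < cols; c += block_w) {
            // Edge blocks are smaller when the tile does not divide the matrix.
            blocks.push_back({r, c,
                              std::min(block_h, rows - r),
                              std::min(block_w, cols - c)});
        }
    }
    return blocks;
}
```

Each block has the same dimensionality as the matrix it tiles, which is the property the text attributes to the sections pointed to by grid_iterators.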

The following table shows the available iterators for each container, the nature of the sections pointed by each of them, and the axes they span.

Container	iterator_i	iterator_ij	iterator_ijk	grid_iterator

For instance, a cube defines a range of iterator_i, [begin_i(), end_i()), that spans axis i and points to matrices that lie on axes j and k.

Similarly, a range [begin_ij(), end_ij()) in a matrix object is a range of iterator_ij pointing to scalar elements.

PHAST Library provides many STL-like algorithms that permit applying the most common computations on ranges of iterators. Here is a full list:

accumulate
accumulate_for_each
accumulate_prod
accumulate_prod_for_each
copy
count
count_if
dot_product
fill
find
find_if
for_each
generate_normal
generate_uniform
max_element
min_element
replace
replace_if
reverse
sort
sort_desc

Some algorithms, like for_each, count_if, and find_if, apply a unary, binary, or ternary operation to each of the sections pointed by the iterators in the range. The particular operation performed on each section is embodied by a PHAST-functor passed to the algorithm as a parameter.

PHAST-functors are modern C++ structs that define an operator() method. They must derive from predefined base-functors that determine the nature of the functor, i.e., the number and types of parameters its operator() method accepts. A full list of these base-functors follows:

Unary base structs

func_scal
func_vec
func_mat
func_cube

Binary base structs

func_scal_scal
func_scal_vec
func_scal_mat
func_scal_cube
func_vec_scal
func_vec_vec
func_vec_mat
func_vec_cube
func_mat_scal
func_mat_vec
func_mat_mat
func_mat_cube
func_cube_scal
func_cube_vec
func_cube_mat
func_cube_cube

Ternary base structs

func_scal_scal_scal

The parameters of operator() can be references or const-references to the scalar, vector, matrix, or cube containers in the phast::functor:: namespace. These are not low-level structures, but in-functor containers that can be iterated via in-functor iterators and manipulated via in-functor algorithms, in the same way that containers are iterated via iterators and manipulated via algorithms. This way, code can be kept concise and highly expressive even inside functors!
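The "derive from a base struct and define operator()" pattern can be sketched in plain standard C++ as follows. This is not the PHAST API: everything lives in a stand-in namespace called sketch, the base struct only borrows the name func_scal from the list above, a plain reference replaces the in-functor scalar type, and the for_each shown here is a simple sequential loop rather than a parallel algorithm.

```cpp
#include <type_traits>
#include <vector>

namespace sketch {  // stand-in namespace; this is NOT the PHAST API

// Stand-in for a unary base functor: deriving from it marks a functor as
// "operates on one scalar element of type T".
template <typename T>
struct func_scal {
    using value_type = T;
};

// A user functor derives from the base and defines operator().
template <typename T>
struct add_one : func_scal<T> {
    void operator()(T& x) const { x += T(1); }
};

// Stand-in sequential for_each. It rejects, at compile time, functors
// that do not derive from the matching base struct.
template <typename T, typename Functor>
void for_each(std::vector<T>& v, Functor f) {
    static_assert(std::is_base_of<func_scal<T>, Functor>::value,
                  "functor must derive from func_scal<T>");
    for (T& x : v) f(x);
}

}  // namespace sketch
```

Deriving from a base struct lets an algorithm check at compile time that the functor it receives has the expected nature, which is one plausible reason for this kind of design.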

All that said, the best way to understand how to write PHAST code is to see what correct code looks like. Across different containers and iterators the code changes accordingly, yet it remains brief and concise.

As we said, an important feature of PHAST Library is the possibility to manipulate in-functor containers via in-functor algorithms.

These are methods of the inherited base-functor, and can be accessed via this-> inside a functor's operator(). Here is a full list of them:

this->accumulate
this->accumulate_for_each
this->accumulate_prod
this->accumulate_prod_for_each
this->copy
this->count
this->count_if
this->dot_product
this->fill
this->find
this->find_if
this->for_each
this->max_element
this->min_element
this->replace
this->replace_if

Some in-functor algorithms accept a unary operation. This operation can be embodied by an inner-functor, which can be declared similarly to the main-functor.

So, PHAST Library admits a multi-layered functor structure that leads to a hierarchical, nested parallelism.
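The nesting of functors, and the reason in-functor algorithms are reached through this->, can be sketched in plain standard C++. Again, this is not the PHAST API: the sketch namespace, the member signatures, and the functor names normalize_row and scale_by are assumptions made for illustration, std::vector stands in for an in-functor vector, and everything runs sequentially. For brevity the inner functor here does not derive from a base struct.

```cpp
#include <vector>

namespace sketch {  // stand-in namespace; this is NOT the PHAST API

// Stand-in base functor for operations on vectors (e.g. matrix rows).
// It exposes "in-functor algorithms" as member functions.
template <typename T>
struct func_vec {
    // Apply an inner functor to every element of the row.
    template <typename Inner>
    void for_each(std::vector<T>& row, Inner inner) const {
        for (T& x : row) inner(x);
    }
    // Sum the elements of the row.
    T accumulate(const std::vector<T>& row) const {
        T sum = T();
        for (const T& x : row) sum += x;
        return sum;
    }
};

// Inner functor: works on a single scalar.
template <typename T>
struct scale_by {
    T factor;
    void operator()(T& x) const { x *= factor; }
};

// Outer functor: works on one row and delegates to the inner functor.
template <typename T>
struct normalize_row : func_vec<T> {
    void operator()(std::vector<T>& row) const {
        // func_vec<T> is a dependent base class, so its members are only
        // found when the call is qualified with this->.
        T sum = this->accumulate(row);
        if (sum == T()) return;  // leave all-zero rows untouched
        this->for_each(row, scale_by<T>{T(1) / sum});
    }
};

}  // namespace sketch
```

In standard C++, unqualified names are not looked up in a base class that depends on a template parameter, which is why the calls above need this->.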

Multi-platform

PHAST code can be written once and targeted at different platforms via a single preprocessor define, the platform selection macro.

The platforms supported so far are multi-core processors and NVIDIA GPUs, which are selected with the macros _PHAST_USING_MULTI_CORE and _PHAST_USING_CUDA, respectively.

The platform selection macro must be visible to all translation units during compilation. The best way to achieve this is to specify it as a preprocessor define in the project Makefile. In the examples here, however, it is defined directly in the source files for clarity's sake.
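The mechanism of selecting a backend with one macro can be sketched with ordinary preprocessor conditionals. Only the two macro names come from the text above; the helper selected_platform is hypothetical, the fallback define exists solely so the sketch compiles on its own, and rejecting the case where both macros are defined is an assumption of this sketch rather than documented PHAST behaviour.

```cpp
#include <string>

// Normally the macro is passed by the build system, for example with the
// compiler option -D_PHAST_USING_MULTI_CORE. The fallback below exists only
// so that this sketch compiles without any build configuration.
#if !defined(_PHAST_USING_MULTI_CORE) && !defined(_PHAST_USING_CUDA)
#define _PHAST_USING_MULTI_CORE
#endif

// Assumption of this sketch: exactly one platform may be selected.
#if defined(_PHAST_USING_MULTI_CORE) && defined(_PHAST_USING_CUDA)
#error "Select exactly one platform macro"
#endif

// Hypothetical helper (not part of PHAST) reporting the selected platform.
inline std::string selected_platform() {
#if defined(_PHAST_USING_CUDA)
    return "cuda";
#else
    return "multi-core";
#endif
}
```

Since the choice is made by the preprocessor, the rest of the source code stays identical across platforms; only the define changes.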

So, application code is not affected by the underlying platform!

It can be migrated seamlessly from one platform to another with a single macro definition: no setup code, no contexts, no explicit device manipulation, no boilerplate related to the underlying device.

This "code once" philosophy has been a major concern while developing PHAST Library and will continue to be honored in future developments.

Optimization

The way algorithms execute on a given platform can be tuned by setting some parallelization parameters. PHAST Library tries its best to automatically select the optimal values for them, considering the particular architecture chosen for that execution. However, the selection process is based on heuristics, and sometimes a sub-optimal set of values may be selected for the given task.

For this reason, PHAST Library lets users explicitly set the values of some parameters.

These parameters are architecture-specific and non-overlapping: this way, users can specify all of them once and only the relevant ones will affect the program execution.
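The "specify everything once, only the relevant values apply" idea can be sketched as follows. All names here (platform, tuning, cpu_threads, gpu_block_size, effective) are hypothetical and chosen only for illustration; they are not PHAST parameters, and treating 0 as "let the heuristics decide" is an assumption of this sketch.

```cpp
#include <string>

// Illustrative only: every name here is hypothetical, not a PHAST parameter.
enum class platform { multi_core, cuda };

// One settings object holding parameters for every architecture.
// The two groups do not overlap, so they can all be set at once.
struct tuning {
    int cpu_threads = 0;     // multi-core only; 0 means "use the heuristic"
    int gpu_block_size = 0;  // GPU only; 0 means "use the heuristic"
};

// Describe which setting takes effect on the given platform.
// Parameters belonging to the other platform are simply ignored.
inline std::string effective(const tuning& t, platform p) {
    if (p == platform::multi_core) {
        return t.cpu_threads > 0
                   ? "threads=" + std::to_string(t.cpu_threads)
                   : std::string("threads=auto");
    }
    return t.gpu_block_size > 0
               ? "block_size=" + std::to_string(t.gpu_block_size)
               : std::string("block_size=auto");
}
```

With a layout like this, the same tuning code can stay in the program regardless of the platform selected at compile time.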

Try PHAST

Now that you have read the main features of PHAST Library, try it yourself!

Click here and download the library.
