Keywords for Parallel Runs

On all systems the parallel input preparation is done automatically. Details for the parallel installation are given in Section 3.2.2. The following keyword is optional for parallel MPI runs:

`$parallel_platform` *architecture*

Currently the following parallel platforms are supported:

`SMP` - for systems with very fast communication; all CPUs are used
for the linear algebra part.
Synonyms for `SMP` are: `HP V-Class`, `SP3-SMP` and `HP S/X-Class`.

`MPP` - for systems with fast communication like Fast-Ethernet; the number of
CPUs that will be taken for the linear algebra part depends on the
size of the matrices.
Synonyms for `MPP` are: `SP3` and `linuxcluster`.

`cluster` - for systems with slow communication; the linear algebra part will
be done on one single node.
Synonyms for `cluster` are: `HP Cluster` and every platform that is not
known by `TURBOMOLE`.
Use this setting if you encounter problems with ScaLAPACK. ScaLAPACK
routines may print warnings or errors to the master output file (lines
often start with PD), especially if the number of CPUs is large and the
size of the matrices is small to medium.

`SGI` - similar to `SMP`, but here the server task is treated differently:
the MPI implementation on the SGIs would otherwise cause this task to
request too much CPU time.
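For illustration, selecting one of the architectures above in the `control` file would look like the following; the chosen value `cluster` is just an example, any of the supported platform names may be given:

```
$parallel_platform cluster
```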

If you want to run `mpgrad`, `$traloop` has to be equal to or a
multiple of the number of parallel workers.
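For example, with 8 parallel workers `$traloop` could be set to 8, 16, or 24; the value shown here is an illustrative assumption, not a recommendation:

```
$traloop 16
```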

For very large parallel runs it may be impossible to allocate the
scratch files in the working directory.
In this case the `$scratch files` option can be specified;
an example for a `dscf` run is given below.
The scratch directory must be accessible from all nodes.

$scratch files
    dscf  dens        /home/dfs/cd00/cd03_dens
    dscf  fock        /home/dfs/cd00/cd03_fock
    dscf  dfock       /home/dfs/cd00/cd03_dfock
    dscf  ddens       /home/dfs/cd00/cd03_ddens
    dscf  xsv         /home/dfs/cd00/cd03_xsv
    dscf  pulay       /home/dfs/cd00/cd03_pulay
    dscf  statistics  /home/dfs/cd00/cd03_statistics
    dscf  errvec      /home/dfs/cd00/cd03_errvec
    dscf  oldfock     /home/dfs/cd00/cd03_oldfock
    dscf  oneint      /home/dfs/cd00/cd03_oneint

For all programs employing density functional theory (DFT)
(i.e. `dscf`/`grad` and `ridft`/`rdgrad`) `$pardft` can be specified:

$pardft tasksize=1000 memdiv=0

`tasksize` is the approximate number of points in one DFT task
(default: 1000) and `memdiv` indicates whether the nodes are dedicated
exclusively to your job (`memdiv=1`) or not (default: `memdiv=0`).

For `dscf` and `grad` runs you need a parallel statistics file
which has to be generated in advance. The filename is specified with

`$2e-ints_shell_statistics file=DSCF-par-stat`

or

`$2e-ints'_shell_statistics file=GRAD-par-stat`

respectively.

The statistics files have to be generated with a single node `dscf`
or `grad` run. For a `dscf` statistics run one uses the keywords:

$statistics  dscf parallel
$2e-ints_shell_statistics    file=DSCF-par-stat
$parallel_parameters
    maxtask=400
    maxdisk=0
    dynamic_fraction=0.300000

and for a `grad` statistics run:

$statistics  grad parallel
$2e-ints'_shell_statistics   file=GRAD-par-stat
$parallel_parameters
    maxtask=400
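The two-step procedure can be sketched as a shell session; the program name `dscf` and the single-node statistics run come from the text above, while the exact invocation and the output redirection are assumptions about a typical setup:

```
# 1. add the statistics keywords above to the control file, then
#    perform a single-node statistics run (assumed invocation):
dscf > dscf_statistics.out
# 2. the file DSCF-par-stat now exists; the parallel dscf run
#    can be started with the same control file
```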

`maxtask` is the maximum number of two-electron integral tasks,
`maxdisk` defines the maximum task size with respect to mass storage
(MBytes), and `dynamic_fraction` is the fraction of two-electron integral
tasks which will be allocated dynamically.
For parallel `grad` and `rdgrad` runs one can also specify:

`$grad_send_dens`

This means that the density matrix is computed by one node and distributed to the other nodes rather than being computed by every slave.

In the parallel version of `ridft`, the first client reads the keyword
`$ricore` from the `control` file and uses the given memory
for the additional RI matrices and for RI-integral storage. All other
clients use the same amount of memory as the first client, although
they do not need to store any of those matrices. This leads to better
usage of the available memory per node. However, for a large number of
auxiliary basis functions the RI matrices may become bigger than
the specified `$ricore`, and all clients will then use as much memory
as those matrices require, even if that amount is much larger than
the given memory. To avoid this behavior one can use:

`$ricore_slave`

*integer*

specifying the number of MB that shall be used on each client.
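As a sketch, the master could be given a large `$ricore` for the RI data while each client is limited to a small amount; the values below are illustrative assumptions, not recommendations:

```
$ricore       4000
$ricore_slave  200
```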

For parallel `jobex` runs one has to specify all the parallel keywords
needed for the different parts of the geometry optimization,
i.e. those for `dscf` and `grad`, or those for
`ridft` and `rdgrad`, or those for `dscf` and `mpgrad`.
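As a summary, a `control` file for a parallel `ridft`/`rdgrad` geometry optimization with `jobex` might combine the keywords described above; the values shown are illustrative assumptions:

```
$parallel_platform MPP
$pardft tasksize=1000 memdiv=0
```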