2.2 Installation problems: How to solve

Please check your user limits!

If one or several tests of the test suite fail, it is very likely that your user limits for stack size and/or memory are too small.

sh/bash/ksh users: run

ulimit -a

to display your current limits. The output should look like:

core file size (blocks)     0  
data seg size (kbytes)      unlimited  
file size (blocks)          unlimited  
max locked memory (kbytes)  unlimited  
max memory size (kbytes)    unlimited  
open files                  1024  
pipe size (512 bytes)       8  
stack size (kbytes)         unlimited  
cpu time (seconds)          unlimited  
max user processes          8191  
virtual memory (kbytes)     unlimited

The most important entries are data seg size, stack size, max memory size, and virtual memory. These should be either unlimited or at least as large as your total RAM.

To set, for example, the stack size to the maximum allowed on your system (the so-called hard limit), run:

ulimit -s hard
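
As a minimal sketch of the same step, the following compares the current (soft) stack size limit with the hard limit before raising it. It assumes a shell whose ulimit builtin accepts the -S and -H options (bash, ksh and dash do); the variable names are only illustrative.

```shell
#!/bin/sh
# Read the current (soft) and the maximum allowed (hard) stack size limit.
soft=$(ulimit -S -s)
hard=$(ulimit -H -s)
echo "stack size: soft=$soft hard=$hard"

# Raise the soft limit to the hard limit if the two differ.
if [ "$soft" != "$hard" ]; then
    ulimit -S -s "$hard"
fi
```

The new limit applies only to the current shell and the processes started from it.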

csh/tcsh users: run limit instead of ulimit and check the output.

As described above, the limits should be at least as high as your available memory. The corresponding csh/tcsh syntax for changing the stack size limit is:

limit stacksize hard

Note that on 32-bit machines, unlimited can be equivalent to 4 GB (4194303 kbytes).

If you are using a queuing system:

Note that if you submit jobs to a queue, the user limits may differ from those you get when you log in to the machines. To check your limits, add ulimit or limit to the script that is sent to the queue:

....  
ulimit -a > mylimits.out  
jobex -ri -c 200 -statpt > jobex.out  
...

Then submit the script to the queue and check the file mylimits.out to find out which limits are set.
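
The check can also be made inside the job script itself, so that a limit that is too small is reported before the calculation starts. The following is only a sketch for sh/bash/ksh job scripts: limit_ok is a hypothetical helper, and the 4 GB threshold is an assumption taken from the example value used in this section.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a value reported by ulimit is
# "unlimited" or at least the required number of kbytes.
limit_ok() {
    value=$1
    required_kb=$2
    [ "$value" = "unlimited" ] && return 0
    [ "$value" -ge "$required_kb" ]
}

required_kb=4194303   # assumed threshold (4 GB), adjust to your RAM
stack=$(ulimit -s)
if ! limit_ok "$stack" "$required_kb"; then
    echo "WARNING: stack size limit is only $stack kbytes" >&2
fi
```

The warning goes to standard error, so it ends up wherever the queuing system stores the error output of the job.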

Parallel version:

The parallel binaries are started by the mpirun command, which often uses ssh to start a process on a remote node. In that case the user cannot set the stack size limit, so settings in $HOME/.profile, $HOME/.bashrc, etc. will not solve the problem.

To check the limits on a remote node, try (sh/bash/ksh syntax):

ssh <hostname> ulimit -a
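
When several nodes are involved, the same check can be repeated in a loop. This is a sketch only: report_stack_limits and the RSH variable are hypothetical names, the hostnames in the example call are placeholders, and it assumes that the login shell on the nodes is sh, bash or ksh.

```shell
#!/bin/sh
# RSH defaults to ssh; it is a variable only so that the loop can be
# tried out without network access.
RSH=${RSH:-ssh}

# Hypothetical helper: print the stack size limit reported by each
# node given as argument, one "host<TAB>limit" line per node.
report_stack_limits() {
    for host in "$@"; do
        limit=$($RSH "$host" ulimit -s 2>/dev/null) || limit="unreachable"
        printf '%s\t%s\n' "$host" "$limit"
    done
}

# Example call with placeholder hostnames:
# report_stack_limits node01 node02
```

Any node that does not report unlimited or a large number needs the change described next.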

If the ssh command reports a stack size that is neither unlimited nor a large number, you have to change the file

/etc/security/limits.conf

on all nodes where the parallel binaries might run, and add the following line (example for a 4 GB limit):

*                soft    stack           4194303

Run ssh <hostname> ulimit -a again and you should now get a 4 GB stack size limit, as set in limits.conf.