CASTEP

CASTEP on Parallel Computers

Q. How should I choose the best number of cores for a parallel run on ARCHER?

A. The most efficient way to use any parallel computer for CASTEP is to parallelise over the k-points. In general you should therefore try to maximise the k-point parallelism, to the point that each core (or set of cores -- see later) has only a single k-point.

If CASTEP is given more cores than it has k-points for a calculation, then it will further distribute the data and workload over the plane-waves (`G-vectors'). This parallelism works well for small numbers of cores, but requires more communication between the processes so you need to be careful not to use too many cores (known as `over-parallelising').

If you do a `dryrun' of CASTEP at a high verbosity level, CASTEP will tell you the number of k-points and plane-waves for your calculation. On ARCHER (and most parallel machines with a good interconnect) you should find the plane-wave parallelism works very well down to 1000 plane-waves per core, and reasonably well down to 500 plane-waves per core.

This means that if you have a 5 k-point calculation with 60,000 plane-waves per k-point, you might reasonably choose:

  • 5 nodes (5*24 = 120 cores), which will put 1 k-point on each node and distribute the 60,000 plane-waves for that k-point over the 24 cores of that node, so 2500 each.
  • 10 nodes, which will spread each k-point over 2 nodes. The 60,000 plane-waves are spread over 48 cores, so roughly 1250 each.

You could also choose 15 or 20 nodes (around 800 and 600 plane-waves/core) -- multiples of 5 being good for this example because there are 5 k-points. 25 nodes (500 plane-waves/core) would probably be OK, but beyond that the plane-wave parallelism is likely to become too inefficient.
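The arithmetic in this example can be reproduced with a short script. This is only a rough sketch: the function names are made up for illustration, it assumes 24 cores per node as in the example above, and it assumes the cores are shared equally between the k-points. The 1000 and 500 plane-waves/core thresholds are the rules of thumb quoted earlier, not hard limits.

```python
def plane_waves_per_core(n_kpoints, pw_per_kpoint, n_nodes, cores_per_node=24):
    """Plane-waves per core when the cores are shared equally between k-points."""
    total_cores = n_nodes * cores_per_node
    if total_cores % n_kpoints != 0:
        # Assumption: only consider core counts that divide evenly among k-points.
        raise ValueError("core count should be a multiple of the number of k-points")
    cores_per_kpoint = total_cores // n_kpoints
    return pw_per_kpoint / cores_per_kpoint


def rate(pw_per_core):
    """Apply the rule-of-thumb thresholds quoted in the text."""
    if pw_per_core >= 1000:
        return "works very well"
    if pw_per_core >= 500:
        return "works reasonably well"
    return "probably over-parallelised"


# The worked example: 5 k-points, 60,000 plane-waves per k-point.
for nodes in (5, 10, 15, 20, 25, 30):
    ppc = plane_waves_per_core(5, 60000, nodes)
    print(f"{nodes:2d} nodes: {ppc:6.0f} plane-waves/core ({rate(ppc)})")
```

Replace the k-point and plane-wave counts with the values reported by your own dryrun to get a shortlist of sensible node counts.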

The only real exception to this `maximise k-point parallelism' rule is if you are running out of memory and are performing a spectral or NMR calculation -- in this case, plane-wave parallelism may be preferred. You can always force this by putting "data_distribution : Gvector" in your .param file. You might also want to investigate using OpenMP threads to do "shared-memory parallelism".

Q. My job stopped before it reached the end. How can I find out how and why it failed?

A. You should always check to see whether CASTEP has written an error file. CASTEP's error files have names which end ".err" and should explain any problems CASTEP had. On large parallel machines you will often have to use a queuing system, and this will also have an error file -- you should always check this for any system error messages (e.g. the CASTEP executable could not be found).
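On a parallel run there may be several such files, so it can be convenient to gather them in one go. The following is a minimal sketch, assuming only that the error files sit in the run directory and have names ending ".err" as described above; the helper name collect_error_files is made up for illustration and is not part of CASTEP.

```python
from pathlib import Path


def collect_error_files(run_dir="."):
    """Return {filename: contents} for every non-empty file ending in .err."""
    reports = {}
    for path in sorted(Path(run_dir).glob("*.err")):
        # errors="replace" avoids a crash if a file contains odd bytes.
        text = path.read_text(errors="replace").strip()
        if text:
            reports[path.name] = text
    return reports


if __name__ == "__main__":
    # Print every error report found in the current directory.
    for name, text in collect_error_files().items():
        print(f"--- {name} ---")
        print(text)
```

This only looks at CASTEP's own error files; the queuing system's error file still needs to be checked separately.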

Q. My parallel job fails and the messages in the PBS log file talk about the "OOM Killer". What does this mean?

A. This is a system error from the "OOM Killer" process. OOM stands for Out Of Memory, and this message means that CASTEP tried to use more memory than was available on the nodes. You will need to find a way to reduce the memory (RAM) needed per node in order to run the calculation; possible ways include using more nodes, using OpenMP threads and, for spectral or NMR calculations, switching from k-point to G-vector parallelism (see earlier answers).

Q. How many atoms can I sensibly include in my model?

A. This depends on the parallel machine, but a few hundred atoms should not normally cause any problems. In fact it isn't the number of atoms which is most important, but the number of electrons (i.e. the number of bands you need to compute). On a large machine (e.g. ARCHER) you can run a simulation of a few thousand s- or p-block atoms on 500-2000 cores, but d- and f-elements will take more time (and memory).

Q. How can I check that my input files do not contain a mistake before submitting to the batch queue?

A. If you do a `dryrun' then CASTEP will check your input files for any mistakes. You can do a dryrun in serial or parallel, and since it doesn't do very much computation it is usually quick to run even on a PC. If you want to do the check on the parallel computer itself, it is usually best to run it in parallel on a short, 1-node queue (e.g. ARCHER's 20 minute "short" queue).

Q. How can I find out how much memory my run will use?

A. If you do a `dryrun' CASTEP will try to estimate how much memory it will need. This is only an estimate, but it should give you a rough idea.