Setup and Installation

Introduction

ARCIMBOLDO was designed and developed to run on a local machine that accesses a local or remote grid environment. However, current releases of all of our programs also include the option of running in multiprocessing mode on a single machine. When a grid is used, the program runs on the local machine and distributes the computationally demanding jobs over the grid. Currently, we support Condor, SGE/OpenGrid, MOAB and Torque.


Software requirements

The programs are distributed through pip, which automatically installs all the required Python dependencies; a short installation sketch is given below. In addition, some third-party software is required:

  • Phaser: versions 2.7.x
  • SHELXE: current version
  • Python: version 2.6 or newer

The required software is available from each project website and is most likely already installed in any macromolecular crystallography laboratory.
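
The ARCIMBOLDO programs themselves are installed through pip, as noted above. A minimal sketch of a typical installation, assuming the published package name for your release is arcimboldo (replace it with the name given in the release instructions if it differs):

# install the package and its Python dependencies for the current user
pip install --user arcimboldo

# confirm the installation and report the installed version
pip show arcimboldo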


Recommended Local Hardware Requirements

The local machine should be an up-to-date workstation. For standalone use, 8 or more cores are advantageous; for grid computing this is less critical. The specifications of a typical machine dedicated to this task in our local setup are:

  • Processor: Intel i7 950 (4 cores × 3.07 GHz)
  • Memory: 6 GB
  • Hard Drive: >150 GB
  • OS: GNU/Linux

Remote or Local Grid Machines

If available, the majority of the calculations are distributed over a grid. The grid can be local, such as the one used in our lab for testing and development, which consists of any available machine from our crystallography cluster (110 cores, approximately 130 GFlops peak performance, and a minimum of 2 GB of memory per core, coming from Intel i7 or Xeon processors), or a remote supercomputer where a Condor, SGE, Torque or MOAB installation is available.

The documentation to deploy a Condor grid is available on the main Condor project (now HTCondor) site.

The documentation to deploy an SGE grid is available, for example, on the Son of Grid Engine website, which is the implementation we use in our setup.

The documentation and downloads for Torque are available, for example, on the Torque website.
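
Once the grid is deployed, it is useful to confirm from the frontend that the scheduler can actually see the execution nodes before submitting jobs. A quick sanity check, assuming the standard client tools are on the PATH of the submitting user:

# HTCondor: list the execution slots known to the pool
condor_status

# SGE / Son of Grid Engine: list the execution hosts and the cluster queue summary
qhost
qstat -g c

# Torque: list the nodes and their current state
pbsnodes -a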


Scientific software on remote or local grid machines

The required scientific software should be available on all machines where jobs will run.

Phaser is distributed through CCP4 and Phenix. These suites are updated frequently, and such updates may introduce changes to the programs ARCIMBOLDO uses, causing unexpected results (breaking the program). In our setup we therefore keep a separate copy of Phaser, so that both suites can be updated safely without the risk of breaking anything. The required files are isolated, so there is no need to keep an extra copy of the full CCP4 and Phenix suites.

Instructions for our setup are found below:

Phaser

Required libraries (located inside ccp4_folder/lib):

  • libcctbx.so
  • libiotbx_pdb.so
  • libomptbx.so
  • libcctbx_sgtbx_asu.so
  • libmmtbx_masks.so
  • libboost_filesystem.so
  • libboost_system.so
  • libboost_thread.so

Copy these libraries and the phaser binary to a folder of your choice and create a file called condor_phaser with the following content (for bash):

#!/bin/bash
# add the folder containing the copied libraries to the library search path
export LD_LIBRARY_PATH=/your_path_of_choice:$LD_LIBRARY_PATH

# then launch phaser, forwarding any command-line arguments
/your_path_of_choice/phaser "$@"

Give condor_phaser execute permission:

chmod +x condor_phaser
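
For reference, the preparation of the isolated Phaser copy can be scripted. The following is a minimal sketch, assuming ccp4_folder is your CCP4 installation (with the phaser binary under bin/ and the libraries under lib/, as listed above) and /your_path_of_choice is the destination folder; both paths are placeholders:

#!/bin/bash
# destination folder for the isolated Phaser copy (placeholder)
DEST=/your_path_of_choice
mkdir -p "$DEST"

# copy the phaser binary and the required libraries from the CCP4 installation
cp ccp4_folder/bin/phaser "$DEST"/
for lib in libcctbx.so libiotbx_pdb.so libomptbx.so libcctbx_sgtbx_asu.so \
           libmmtbx_masks.so libboost_filesystem.so libboost_system.so libboost_thread.so
do
    cp ccp4_folder/lib/"$lib" "$DEST"/
done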

Setup for the programs

When using a local or remote grid, a configuration file setup.bor is required.

The parameters defined in setup.bor are:

[LOCAL]
path_local_phaser: /path/to/phaser
path_local_shelxe: /path/to/shelxe
# If the python interpreter is not the default one, the following variable indicates its path
python_local_interpreter: 

# The next sections depend on the local grid implementation

[CONDOR]
# Parameters for each executable under Condor (memory constraints, 
# CPU speed ...) 
requirements_shelxe: 
requirements_phaser:
requirements_borges:

[SGE]
qname: 
# If there are no special rules to use a queue, 
# there is no need to edit this value
fraction: 1

[TORQUE]
qname:

[MOAB]
partition:

[SLURM]
partition:

[GRID]
# The next parameters are independent of the grid implementation and contain the information needed to connect to a remote grid
path_remote_phaser: 
path_remote_shelxe: 
path_remote_borgesclient: 
# If the python interpreter is not the default one, the following
# variable indicates its path
python_remote_interpreter: 
remote_frontend_username: 
remote_frontend_host:
path_remote_sgepy: 
home_frontend_directory:  
remote_frontend_port: 
# The scheduler system on the remote grid
type_remote: Condor | SGE
# Boolean variable, set to True for NFS filesystem, otherwise set to False
remote_fylesystem_isnfs: True | False
remote_frontend_prompt: $
remote_submitter_username: 
remote_submitter_host: 
remote_submitter_port: 
remote_submitter_prompt: $
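
As an illustration, a minimal setup.bor for a workstation that submits to a local Condor pool might contain at least the following; all paths are placeholders and the requirements lines are example Condor ClassAd expressions to adapt to your pool:

[LOCAL]
path_local_phaser: /usr/local/arcimboldo/condor_phaser
path_local_shelxe: /usr/local/bin/shelxe
python_local_interpreter:

[CONDOR]
# example constraint: only use machines with more than 2 GB of memory
requirements_shelxe: Memory > 2048
requirements_phaser: Memory > 2048
requirements_borges: Memory > 2048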

Once the configuration file is set up and the external software requirements are met and configured, the program is ready to be used.