Manual



The following documentation describes the current version of the programs. We encourage upgrading to the latest version, as bugs have been corrected and new features are available.



ARCIMBOLDO_LITE manual

Overview

Proteins that are mainly helical, or that have helices as a prominent feature, are particularly suited for ARCIMBOLDO phasing. Therefore, polyalanine model helices of a selected length are provided internally. Any other model can be provided externally through a PDB file.

Input

  1. An mtz file containing the reflection data
  2. A SHELX reflection file hkl containing the reflection data
  3. The configuration .bor file with the parameters for an ARCIMBOLDO run, which is defined as follows.
[CONNECTION]:
#Values for the following keyword are mutually exclusive
distribute_computing: multiprocessing
#distribute_computing: local_grid 
#distribute_computing: remote_grid
#default is to search for your rsa private keyfile in ~/.ssh/id_rsa
#remote_frontend_passkey: 
#setup_bor_path:


[GENERAL]:
working_directory:
mtz_path:
hkl_path:
#ent_path: 

[ARCIMBOLDO-LITE]
name_job:
molecular_weight:
number_of_component:
i_label:
sigi_label:
#f_label:
#sigf_label:
fragment_to_search: 2
helix_length:
search_inverted_helix: False
search_inverted_helix_from_fragment: -1
top_inverted_solution_per_cluster: 1000
#model_file:
#fixed_models_directory:
coiled_coil: False
rmsd: 0.2
resolution_rotation: 1.0
sampling_rotation: -1
resolution_translation: 1.0
sampling_translation: -1
resolution_refinement: 1.0
sampling_refinement: -1
exclude_llg: 0
exclude_zscore: 0
use_packing: True
pack_clashes: 0
pack_distance: 3.0
pack_tra: False
occ: False
tncs: True
vrms: False
update_rmsd: False
bfac: False
solution_sorting_scheme: AUTO
#solution_sorting_scheme: LLG
#solution_sorting_scheme: ZSCORE
#solution_sorting_scheme: INITCC
#solution_sorting_scheme: COMBINED
#shelxe_line:
nice: 0
savephs: False
usepdo: False
topfrf_1: 1000
topftf_1: 150
toppack_1: 10000
toprnp_1: 1000
topexp_1: 60
topfrf_n: 200
topftf_n: 150
toppack_n: 10000
toprnp_n: 150
topexp_n: 60
force_core: -1
force_exp: False

#The following section is only required in multiprocessing mode
[LOCAL] 
path_local_phaser: 
path_local_shelxe: 

[CONNECTION]

[distribute_computing]: multiprocessing or local_grid or remote_grid. Default is multiprocessing on a single machine. If a grid is used, the next variables should be defined.
[setup_bor_path]: path to the configuration file for program setup.
[remote_frontend_passkey]: False. If you want to use your personal id_rsa key that is not stored in the default path ~/.ssh/id_rsa, then put the full path here.


[GENERAL]

These configuration variables refer to settings that need to be set up in any of the modes.
[mtz_path]: path to the mtz file with reflection data.
[hkl_path]: path to the hkl file with reflection data.
[working_directory]: absolute path to working directory.


[ARCIMBOLDO-LITE]

The following variables are mandatory and must be set by the user.
[name_job]: string, should be unique to the run and have a maximum length of 20 characters, with no special characters other than "_" (underscore).
[molecular_weight]: used by PHASER to calculate the composition of the ASU, assuming that the protein/nucleic acid has the average distribution of amino acids and bases.
[number_of_component]: number of copies of the protein/nucleic acid defined by the molecular weight.
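
The naming rule for [name_job] can be checked before launching a run. The following is a minimal sketch, assuming that "non-special characters" means ASCII letters and digits; the function name is hypothetical and not part of ARCIMBOLDO.

```python
import re

# Assumption: allowed characters are ASCII letters, digits and "_".
NAME_JOB_PATTERN = re.compile(r"[A-Za-z0-9_]{1,20}")


def is_valid_name_job(name):
    """Return True if name has 1 to 20 characters, all letters, digits or '_'."""
    return NAME_JOB_PATTERN.fullmatch(name) is not None


for candidate in ("lysozyme_run1", "my job", "x" * 21):
    print(repr(candidate), is_valid_name_job(candidate))
```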

The latest PHASER versions are able to work directly with intensities, because new likelihood targets and functions have been defined. We strongly recommend using this feature if possible. For that purpose, use the keywords i_label and sigi_label in the bor file to indicate the columns from the mtz file.
[i_label]: label for the intensities in the mtz file.
[sigi_label]: label for the standard deviation of the intensities in the mtz file.
[f_label]: label for the amplitudes in the mtz file.
[sigf_label]: label for the standard deviation of the amplitudes in the mtz file.
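
The preference for intensities over amplitudes can be expressed as a small selection helper. This is only a sketch: the column names "I", "SIGI", "F" and "SIGF" are placeholders for whatever labels your mtz file actually contains, and the function is hypothetical.

```python
def label_keywords(mtz_columns, i_label="I", sigi_label="SIGI",
                   f_label="F", sigf_label="SIGF"):
    """Return the .bor label keywords to set, preferring intensities."""
    if i_label in mtz_columns and sigi_label in mtz_columns:
        return {"i_label": i_label, "sigi_label": sigi_label}
    if f_label in mtz_columns and sigf_label in mtz_columns:
        return {"f_label": f_label, "sigf_label": sigf_label}
    raise ValueError("no intensity or amplitude column pair found")


# Hypothetical column list; intensities are chosen because both pairs exist.
print(label_keywords(["H", "K", "L", "I", "SIGI", "F", "SIGF"]))
```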

A search model can be defined in four different ways:
[helix_length]: number of residues. Searches for an ideal helix of the specified length as many times as defined in [fragment_to_search].
or
[model_file]: /path/to/model_file.pdb. Searches for the specified model as many times as defined in [fragment_to_search].
or
[helix_length_n]: number of residues. To search for as many helices of different sizes as [fragment_to_search] (e.g. [helix_length_1], [helix_length_2], [helix_length_3], etc.).
or
[model_file_n]: name_model_file. To provide as many different models as [fragment_to_search] (e.g. [model_file_1], [model_file_2], [model_file_3], etc.).
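
The choice between [helix_length] and the numbered [helix_length_n] keywords can be sketched as follows. The helper is hypothetical; it only illustrates the keyword naming described above and returns a dictionary of keyword/value pairs to place in the [ARCIMBOLDO-LITE] section.

```python
def helix_keywords(lengths):
    """Map a list of helix lengths (one per fragment) to .bor keywords.

    If all fragments share one length, helix_length is used; otherwise one
    helix_length_n keyword is produced per fragment, numbered from 1.
    """
    if not lengths:
        raise ValueError("at least one helix length is required")
    options = {"fragment_to_search": len(lengths)}
    if len(set(lengths)) == 1:
        options["helix_length"] = lengths[0]
    else:
        for n, length in enumerate(lengths, start=1):
            options[f"helix_length_{n}"] = length
    return options


print(helix_keywords([14, 14]))
print(helix_keywords([18, 14, 10]))
```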

The following parameters may be modified but are not mandatory because defaults are provided (where alternatives are given, the default is the one listed first). They are presented in two groups (basic and advanced):

Basic

[fragment_to_search]: 2. Total number of models that you will search for.
[shelxe_line]: String, command line for SHELXE. If unset, sensible default values depending on resolution will be used.
[rmsd]: 0.2. The expected RMS deviation of the coordinates from the "real" structure, used by PHASER. As fragments are small, they can be considered very similar to the target. In any case, the accepted range is 0.2 - 2.4.


Advanced

[coiled_coil]: False or True. If True, automatic parameterisation for coiled coil cases will be used.
[search_inverted_helix]: False or True. If True, after the rotation search, helices are tested both in their current orientation and in the alternative one at 180°.
[search_inverted_helix_from_fragment]: -1. Fragment search cycle from which to apply the inverted helix search. By default, it is applied to all fragments; if an integer is given, it is applied starting from that fragment.
[top_inverted_solution_per_cluster]: 1000. Number of solutions to test for inversion. Default is the first 1000 rotations per cluster, sorted by LLG.
[fixed_models_directory]: path to a folder containing pdb files. Each of them will be independently used as a fixed fragment before performing the search. [fragment_to_search] must be changed accordingly.
[resolution_rotation]: 1.0
[sampling_rotation]: -1. Default PHASER sampling, which is dynamically calculated considering model and data.
[resolution_translation]: 1.0
[sampling_translation]: -1. Default PHASER sampling, which is dynamically calculated considering model and data.
[resolution_refinement]: 1.0
[sampling_refinement]: -1. Default PHASER sampling, which is dynamically calculated considering model and data.
[exclude_llg]: 0. Threshold to exclude solutions that have an LLG below the value set.
[exclude_zscore]: 0. Threshold to exclude solutions that have a ZSCORE below the value set.
[use_packing]: True or False. If True, PHASER's packing test will be applied to the rototranslated solutions. Otherwise, packing test will be skipped.
[pack_tra]: False or True. If True, the top translation peaks will be tested for packing with the same criteria as those used afterwards by ARCIMBOLDO. This ensures that the top solutions from the translation search will survive the packing test.
[pack_clashes]: 0. Number of allowed clashes in the PHASER packing test.
[pack_distance]: 3.0. Threshold for atom contacts in the PHASER packing test.
[occ]: False or True. If True, activates PHASER's occupancy refinement of solutions.
[tncs]: True or False. Enables or disables the PHASER feature for translational non-crystallographic symmetry.
[vrms]: False or True. If True, activates PHASER's variance rms refinement.
[bfac]: False or True. If True, activates PHASER's bfactor refinement.
[gimble]: False or True. If True, and in case model_file has been divided in rigid groups by chains, PHASER's gimble refinement of solutions will be performed.
[sigr]: 0.0. Limit in degrees for the rotation in gimble refinement.
[sigt]: 0.0. Limit in Ångström for the translation in gimble refinement.
[solution_sorting_scheme]: AUTO or LLG or ZSCORE or INITCC or COMBINED. Method to prioritize solutions for SHELXE expansion.
[nice]: 0. In multiprocessing mode, it can be used to invoke ARCIMBOLDO_LITE with a particular priority for the CPU time. -20 is the highest priority and 19 is the lowest one.
[savephs]: False or True. If True, the phs map files from the initial correlation coefficient calculation step will be saved.
[usepdo]: False or True. If shelxe -o optimization has been set on the shelxe line, and fragment search includes more than one fragment, on each subsequent search, the trimmed model and not the whole helix will be used as the fixed fragment for searching the next one.
[topfrf_1]: 1000. Limit of rotation solutions at the first fragment search.
[topftf_1]: 150. Limit of translation solutions at the first fragment search.
[toppack_1]: 10000. Limit of packing solutions at the first fragment search.
[toprnp_1]: 1000. Limit of refinement solutions at the first fragment search.
[topexp_1]: 60. Limit of expansion solutions at the first fragment search.
[topfrf_n]: 200. Limit of rotation solutions for the remaining fragments to search.
[topftf_n]: 150. Limit of translation solutions for the remaining fragments to search.
[toppack_n]: 10000. Limit of packing solutions for the remaining fragments to search.
[toprnp_n]: 150. Limit of refinement solutions for the remaining fragments to search.
[topexp_n]: 60. Limit of expansion solutions for the remaining fragments to search.
[force_core]: -1. Default means that in multiprocessing mode, all physical cores minus one will be used for running ARCIMBOLDO. If set to an integer value, it will use that number of cores.
[force_exp]: False or True. If True, it will expand double the number of solutions that the program would normally expand.
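
The meaning of the [force_core] default and the valid range of [nice] can be illustrated with a short sketch. Both functions are hypothetical and are not ARCIMBOLDO's own implementation. Note that `os.cpu_count()` reports logical cores, which may be more than the physical cores the manual refers to.

```python
import os


def resolve_cores(force_core=-1):
    """Number of cores implied by force_core.

    -1 means all available cores minus one (never fewer than 1);
    any positive integer is used as given.
    """
    if force_core == -1:
        # Assumption: logical cores stand in for physical cores here.
        return max(1, (os.cpu_count() or 1) - 1)
    if force_core < 1:
        raise ValueError("force_core must be -1 or a positive integer")
    return force_core


def check_nice(nice=0):
    """Validate a nice value: -20 is the highest priority, 19 the lowest."""
    if not -20 <= nice <= 19:
        raise ValueError("nice must be between -20 and 19")
    return nice


print(resolve_cores(-1), resolve_cores(4), check_nice(0))
```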


[LOCAL]

[path_local_phaser]: path to your local PHASER installation.
[path_local_shelxe]: path to your local SHELXE installation.
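
As an illustration of how the sections fit together, the following sketch renders a minimal ARCIMBOLDO_LITE configuration as text. It assumes that the "keyword: value" layout of the template above is all that is required; every path, label and number is a placeholder rather than a recommended value, and `render_bor` is a hypothetical helper.

```python
def render_bor(sections):
    """Render {section: {keyword: value}} as .bor text with 'keyword: value' lines."""
    lines = []
    for section, options in sections.items():
        lines.append(f"[{section}]")
        for keyword, value in options.items():
            lines.append(f"{keyword}: {value}")
        lines.append("")  # blank line between sections
    return "\n".join(lines)


# Placeholder values only; replace them with your own data.
minimal_lite = {
    "CONNECTION": {"distribute_computing": "multiprocessing"},
    "GENERAL": {
        "working_directory": "/path/to/workdir",
        "mtz_path": "/path/to/data.mtz",
        "hkl_path": "/path/to/data.hkl",
    },
    "ARCIMBOLDO-LITE": {
        "name_job": "test_helix",
        "molecular_weight": 12000,
        "number_of_component": 1,
        "i_label": "I",
        "sigi_label": "SIGI",
        "fragment_to_search": 2,
        "helix_length": 14,
    },
    "LOCAL": {
        "path_local_phaser": "/path/to/phaser",
        "path_local_shelxe": "/path/to/shelxe",
    },
}

bor_text = render_bor(minimal_lite)
print(bor_text)
```

The resulting text can be saved as input.bor; all keywords that are left out keep their defaults.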


Execution

  1. Interactively:

    ARCIMBOLDO_LITE input.bor 
  2. In background:

    When using a grid in which a password is required, it may be given as input in a text file (e.g. password). Otherwise, just redirecting the output and using & will launch the job in the background.
    ARCIMBOLDO_LITE input.bor  < password >& log & 
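
The same background launch can be scripted. The sketch below uses Python's standard subprocess module and assumes the ARCIMBOLDO executable is on the PATH; the function names and the default log file name are illustrative.

```python
import subprocess


def build_command(program, bor_file):
    """Return the argument list, e.g. ['ARCIMBOLDO_LITE', 'input.bor']."""
    return [program, bor_file]


def launch_in_background(program, bor_file, log_file="log", password_file=None):
    """Start the job without waiting for it to finish.

    stdout and stderr are written to log_file (like '>& log'); if
    password_file is given, its content is fed to stdin (like '< password').
    """
    stdin = open(password_file) if password_file else subprocess.DEVNULL
    try:
        with open(log_file, "w") as log:
            return subprocess.Popen(build_command(program, bor_file),
                                    stdin=stdin, stdout=log,
                                    stderr=subprocess.STDOUT)
    finally:
        if password_file:
            stdin.close()


# Example (not executed here):
# process = launch_in_background("ARCIMBOLDO_LITE", "input.bor")
print(build_command("ARCIMBOLDO_LITE", "input.bor"))
```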
    

Output

The .html output file summarizes the figures of merit produced by PHASER when determining the partial structure and by SHELXE when autotracing the total structure. If the procedure gave rise to an interpretable map that SHELXE could trace, the correlation coefficient between the model and the data should be higher than 30%. If this is the case, the protein was probably solved. Links to the best trace and map can be found in the html output.





ARCIMBOLDO_BORGES manual

Overview:

ARCIMBOLDO_BORGES can phase a structure expected to contain a given small fold using a library of tertiary structure fragments and diffraction data to 2 Å.

Input

  1. An mtz file containing the reflection data
  2. A SHELX reflection file hkl containing the reflection data
  3. The configuration .bor file with the parameters for ARCIMBOLDO-BORGES, which has the [CONNECTION], [GENERAL] and [LOCAL] sections defined as in an ARCIMBOLDO run, and a particular [ARCIMBOLDO-BORGES] section.
[CONNECTION]:
#Values for the following keyword are mutually exclusive
distribute_computing: multiprocessing
#distribute_computing: local_grid 
#distribute_computing: remote_grid
#default is to search for your rsa private keyfile in ~/.ssh/id_rsa
#remote_frontend_passkey: 
#setup_bor_path:

[GENERAL]:
working_directory:
mtz_path:
hkl_path:
ent_path:

[ARCIMBOLDO-BORGES]
name_job: 
molecular_weight: 
number_of_component:
i_label:
sigi_label:
#f_label:
#sigf_label:
library_path:
clusters: all
n_clusters: 4
prioritize_phasers: True
rmsd: 0.2
resolution_rotation: 1.0
sampling_rotation: -1
resolution_translation: 1.0
sampling_translation: -1
resolution_refinement: 1.0
sampling_refinement: -1
resolution_gyre: 1.0
sampling_gyre: -1
exclude_llg: 0
exclude_zscore: 0
use_packing: True
pack_clashes: 0
pack_distance: 3.0
pack_tra: False
occ: False
prioritize_occ: True
tncs: True
vrms: False
bfac: False
gimble: False
nma: False
sigr: 0.0
sigt: 0.0
gyre_preserve_chains: False
rotation_model_refinement: NO GYRE
#rotation_model_refinement: BOTH
#rotation_model_refinement: GYRE
solution_sorting_scheme: AUTO
#solution_sorting_scheme: LLG
#solution_sorting_scheme: ZSCORE
#solution_sorting_scheme: INITCC
#solution_sorting_scheme: COMBINED
#shelxe_line:
#nice: 0
alixe: False
alixe_mode: one_step
savephs: False
extend_with_secondary_structure: False
parameters_elongation: 4.8 60 150
#parameters_elongation: 5 150 1
topfrf: 200
topftf: 70
toppack: -1
toprnp: 200
topexp: 60
force_core: -1
force_nsol: -1
force_exp: False

[LOCAL] 
path_local_phaser:
path_local_shelxe:


[CONNECTION]

[distribute_computing]: multiprocessing or local_grid or remote_grid. Default is multiprocessing on a single machine. If a grid is used, the next variables should be defined.
[setup_bor_path]: path to the configuration file for program setup.
[remote_frontend_passkey]: False. If you want to use your personal id_rsa key that is not stored in the default path ~/.ssh/id_rsa, then put the full path here.


[GENERAL]

These configuration variables refer to settings that need to be set up in any of the modes.
[mtz_path]: path to the mtz file with reflection data.
[hkl_path]: path to the hkl file with reflection data.
[working_directory]: absolute path to working directory.


[ARCIMBOLDO-BORGES]

The following variables are mandatory and must be set by the user.
[name_job]: string, should be unique to the run and have a maximum length of 20 characters, with no special characters other than "_" (underscore).
[number_of_component]: number of copies of the protein/nucleic acid defined by the molecular weight.
[molecular_weight]: used by PHASER to calculate the composition of the ASU, assuming that the protein/nucleic acid has the average distribution of amino acids and bases.
The latest PHASER versions are able to work directly with intensities, because new likelihood targets and functions have been defined. We strongly recommend using this feature if possible. For that purpose, use the keywords i_label and sigi_label in the bor file to indicate the columns from the mtz file.
[i_label]: label for the intensities in the mtz file.
[sigi_label]: label for the standard deviation of the intensities in the mtz file.
[f_label]: label for the amplitudes in the mtz file.
[sigf_label]: label for the standard deviation of the amplitudes in the mtz file.

[library_path]: path to the library folder.
[clusters]: "all" or number/list of numbers. All will perform sequentially all clusters, otherwise a list of cluster numbers separated by commas may be specified.
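
The two accepted forms of the [clusters] value can be told apart with a small parser. This is a sketch with a hypothetical function name; it returns None for "all" and a list of integers otherwise.

```python
def parse_clusters(value):
    """Parse a clusters value: 'all' gives None, '0,2,5' gives [0, 2, 5]."""
    text = value.strip()
    if text.lower() == "all":
        return None  # None stands for "evaluate every cluster sequentially"
    return [int(item) for item in text.split(",") if item.strip()]


print(parse_clusters("all"))
print(parse_clusters("0, 2,5"))
```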

The following parameters may be modified but are not mandatory because defaults are provided (where alternatives are given, the default is the one listed first). They are presented in two groups (basic and advanced):

Basic

[fragment_to_search]: 2. Total number of models that you will search for.
[shelxe_line]: String, command line for SHELXE. If unset, sensible default values depending on resolution will be used.
[rmsd]: 0.2. The expected RMS deviation of the coordinates from the "real" structure, used by PHASER. As fragments are small, they can be considered very similar to the target. In any case, the accepted range is 0.2 - 2.4.


Advanced

[n_clusters]: 4. Number of prioritised clusters to evaluate.
[prioritize_phasers]: True or False. If True, all PHASER steps are performed for all selected rotation clusters and SHELXE expansions are performed at the end.
[coiled_coil]: False or True. If True, automatic parameterisation for coiled coil cases will be used.
[resolution_rotation]: 1.0
[sampling_rotation]: -1. Default PHASER sampling, which is dynamically calculated considering model and data.
[resolution_translation]: 1.0
[sampling_translation]: -1. Default PHASER sampling, which is dynamically calculated considering model and data.
[resolution_refinement]: 1.0
[sampling_refinement]: -1. Default PHASER sampling, which is dynamically calculated considering model and data.
[resolution_gyre]: 1.0.
[sampling_gyre]: -1. Default PHASER sampling, which is dynamically calculated considering model and data.
[exclude_llg]: 0. Threshold to exclude solutions that have an LLG below the value set.
[exclude_zscore]: 0. Threshold to exclude solutions that have a ZSCORE below the value set.
[use_packing]: True or False. If True, PHASER's packing test will be applied to the rototranslated solutions. Otherwise, packing test will be skipped.
[pack_tra]: False or True. If True, the top translation peaks will be tested for packing with the same criteria as those used afterwards by ARCIMBOLDO. This ensures that the top solutions from the translation search will survive the packing test.
[pack_clashes]: 0. Number of allowed clashes in the PHASER packing test.
[pack_distance]: 3.0. Threshold for atom contacts in the PHASER packing test.
[occ]: False or True. If True, activates PHASER's occupancy refinement of solutions.
[prioritize_occ]: True or False. If True, solutions that have been occupancy-refined will be prioritised for expansion with shelxe.
[tncs]: True or False. Enables or disables the PHASER feature for translational non-crystallographic symmetry.
[vrms]: False or True. If True, activates PHASER's variance rms refinement.
[bfac]: False or True. If True, activates PHASER's bfactor refinement.
[gimble]: False or True. If True, and in case model_file has been divided in rigid groups by chains, PHASER's gimble refinement of solutions will be performed.
[nma]: False or True. If True, PHASER's Normal Mode Analysis will be tested after the packing analysis as another approach for solution refinement. The program will compute the initial CC with SHELXE for the models obtained and select the improved ones. Please note that activating this keyword can result in thousands of PHASER and SHELXE jobs being run.
[sigr]: 0.0. Limit in degrees for the rotation in gimble refinement.
[sigt]: 0.0. Limit in Ångström for the translation in gimble refinement.
[gyre_preserve_chains]: False or True. If True, the chains present in the input models will be used for gyre and gimble refinement.
[rotation_model_refinement]: NO GYRE or GYRE or BOTH.
[solution_sorting_scheme]: AUTO or LLG or ZSCORE or INITCC or COMBINED. Method to prioritize solutions for SHELXE expansion.
[nice]: 0. In multiprocessing mode, it can be used to invoke ARCIMBOLDO_BORGES with a particular priority for the CPU time. -20 is the highest priority and 19 is the lowest one.
[alixe]: False or True. If True, phase combination will be applied to the density-modified solutions before expansion.
[alixe_mode]: one_step.
[savephs]: False or True. If True, the phs map files from the initial correlation coefficient calculation step will be saved.
[extend_with_secondary_structure]: False or True.
[parameters_elongation]: 4.8 60 150 or 5 150 1
[topfrf]: 200. Limit of rotation solutions.
[topftf]: 70. Limit of translation solutions.
[toppack]: -1. Limit of packing solutions. By default, all surviving solutions are kept.
[toprnp]: 200. Limit of rigid body refined solutions.
[topexp]: 60. Limit of solutions to expand.
[force_core]: -1. Default means that in multiprocessing mode, all physical cores minus one will be used for running ARCIMBOLDO. If set to an integer value, it will use that number of cores.
[force_nsol]: -1. By default, the number of solutions to keep is chosen automatically based on the available hardware. If an integer is given, that number of solutions will be kept per step.
[force_exp]: False or True. If True, it will expand double the number of solutions that the program would normally expand.

Execution

  1. Interactively:

    ARCIMBOLDO_BORGES  input.bor 
  2. In background:

    When using a grid in which a password is required, it may be given as input in a text file (e.g. password). Otherwise, just redirecting the output and using & will launch the job in the background.
    ARCIMBOLDO_BORGES input.bor  < password >& log & 
    

Output

The .html output file summarizes the figures of merit produced by PHASER when determining the partial structure and by SHELXE when autotracing the total structure. If the procedure gave rise to an interpretable map that SHELXE could trace, the correlation coefficient between the model and the data should be higher than 30%. If this is the case, the protein was probably solved. Links to the best trace and map can be found in the html output.





ARCIMBOLDO-SHREDDER manual

Overview:

A model expected to have some structural similarity to the target protein is the template to generate a set of search fragments with SHREDDER for use in ARCIMBOLDO. The experimental data are used to guide removal of the more incorrect parts of the model. Various approaches for model selection are covered in the following sections.

Input

  1. An mtz file containing the reflection data
  2. A SHELX reflection file hkl containing the reflection data
  3. The configuration .bor file with the parameters for ARCIMBOLDO-SHREDDER, which has the [CONNECTION], [GENERAL] and [LOCAL] sections defined as in an ARCIMBOLDO run, and a particular [ARCIMBOLDO-SHREDDER] section.
[CONNECTION]:
#Values for the following keyword are mutually exclusive
distribute_computing: multiprocessing
#distribute_computing: local_grid 
#distribute_computing: remote_grid
#default is to search for your rsa private keyfile in ~/.ssh/id_rsa
#remote_frontend_passkey: 
#setup_bor_path:

[GENERAL]:
working_directory:
mtz_path:
hkl_path:
ent_path:

[ARCIMBOLDO-SHREDDER]:
name_job:
molecular_weight:
number_of_component:
i_label: 
sigi_label:
#f_label:
#sigf_label:
number_cycles_model_refinement: 2
model_file:
fragment_to_search: 1
trim_to_polyala: True
maintaincys: False
clusters: all
n_clusters: 4
prioritize_phasers: True
rmsd_shredder: 1.2
rmsd_arcimboldo: 0.8
resolution_rotation_shredder: 1.0
sampling_rotation_shredder: -1
resolution_rotation_arcimboldo: 1.0
sampling_rotation_arcimboldo: -1
resolution_translation: 1.0
sampling_translation: -1
resolution_refinement: 1.0
sampling_refinement: -1
resolution_gyre: 1.0
sampling_gyre: -1
exclude_llg: 0
exclude_zscore: 0
use_packing: True
pack_clashes: 3
pack_distance: 3.0
pack_tra: False
occ: False
prioritize_occ: True
tncs: True
vrms: False
update_rmsd: False
bfac: False
bfacnorm: True
gimble: False
sigr: 0.0
sigt: 0.0
gyre_preserve_chains: False
rotation_model_refinement: BOTH
#rotation_model_refinement: NOGYRE
#rotation_model_refinement: GYRE
solution_sorting_scheme: AUTO
#solution_sorting_scheme: LLG
#solution_sorting_scheme: ZSCORE
#solution_sorting_scheme: INITCC
#solution_sorting_scheme: COMBINED
#shelxe_line:
#nice: 0
alixe: False
alixe_mode: one_step
savephs: False
shred_method: sequential
#shred_method: spherical
shred_range: 4 20 1 omit all
topfrf: 200
topftf: 70
toppack: -1
toprnp: 200
topexp: 40
force_core: -1
force_nsol: -1
force_exp: False

[LOCAL] 
path_local_phaser:
path_local_shelxe:

The following variables are mandatory and should be input by the user.
[name_job]: string, should be unique to the run and have a max length of 20 non-special characters except "_" (underscore).
[molecular_weight]: used by PHASER to calculate the composition of the ASU.
[number_of_component]: number of copies of the protein/nucleic acid defined by the molecular weight.

The latest PHASER versions are able to work directly with intensities, because new likelihood targets and functions have been defined. We strongly recommend using this feature if possible. For that purpose, use the keywords i_label and sigi_label in the bor file to indicate the columns from the mtz file.
[i_label]: label for the intensities in the mtz file.
[sigi_label]: label for the standard deviation of the intensities in the mtz file.
[f_label]: label for the amplitudes in the mtz file.
[sigf_label]: label for the standard deviation of the amplitudes in the mtz file.

The following variables can be modified but are not mandatory because they have sensible preset defaults (where alternatives are given, the default is the one listed first). They are presented in two groups (basic and advanced):

Basic

[fragment_to_search]: 1. Number of model copies to be located in the asymmetric unit.
[shelxe_line]: String, command line for SHELXE. If unset, sensible default values depending on resolution will be used.
[rmsd_shredder]: 1.2. The expected RMS deviation in Å of the coordinates from the "real" structure, used by PHASER in the evaluation of the shredded models. As fragments are not very similar overall, this value is intentionally underestimated. In any case, the accepted range is 0.2 - 2.4.
[rmsd_arcimboldo]: 1.0. The expected RMS deviation in Å of the coordinates from the target structure, used by PHASER in the ARCIMBOLDO runs from the best models. As models have been both reduced and optimized, a lower figure than for the SHREDDER procedure is appropriate. In any case, the accepted range is 0.2 - 2.4.
[shred_method]: sequential or spherical.
If the selected method is sequential: [shred_range]: 4 20 1 omit. Range of sizes for the shreds (4-20), step between starting residues (1). If the keyword omit is chosen, shreds will be eliminated from the structure. Otherwise, with the keyword fragment, the shreds will be extracted as search fragments.
If the selected method is spherical: [sphere_definition]: default 1 remove_coil 7 4 0.45 0.2. Size of the model (default is based on the eLLG target), step between starting residues (1), remove_coil or maintain_coil, minimum size of alpha helices in the template, minimum size of beta strands in the template, minimum threshold for helix annotation, minimum threshold for beta annotation.
[ellg_target]: 60. Target eLLG to guide the selection of the model size for the given rmsd.
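
To make the [shred_range] fields concrete, the sketch below parses the value and enumerates the contiguous residue windows it describes. It illustrates the size range and step only; it is not SHREDDER's actual code, the function names are hypothetical, and any fields after the fourth (the template shows a trailing "all") are ignored because they are not described here.

```python
def parse_shred_range(value):
    """Split e.g. '4 20 1 omit' into (min_size, max_size, step, mode)."""
    fields = value.split()
    min_size, max_size, step = (int(x) for x in fields[:3])
    mode = fields[3]
    if mode not in ("omit", "fragment"):
        raise ValueError("mode must be 'omit' or 'fragment'")
    return min_size, max_size, step, mode


def sequential_shreds(n_residues, min_size, max_size, step):
    """Yield (start, size) for each window, with a 0-based start index."""
    for size in range(min_size, max_size + 1):
        for start in range(0, n_residues - size + 1, step):
            yield start, size


min_size, max_size, step, mode = parse_shred_range("4 20 1 omit")
# Hypothetical 100-residue template.
windows = list(sequential_shreds(100, min_size, max_size, step))
print(mode, len(windows), windows[:3])
```

With omit, each window would be removed from the template to give one model; with fragment, each window would itself become a search model.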


Advanced

[number_cycles_model_refinement]: 2. Number of gyre refinement cycles to perform on the models.
[trim_to_polyala]: True or False. If True, template model will be trimmed to polyalanine.
[maintaincys]: False or True. If True, when trimmed to polyalanine, cysteine residues will keep their side chain.
[clusters]: "all" or number/list of numbers. All will perform sequentially all clusters, otherwise a list of cluster numbers separated by commas may be specified.
[n_clusters]: 4. Number of prioritised clusters to evaluate.
[prioritize_phasers]: True or False. If True, all PHASER steps are performed for all selected rotation clusters and SHELXE expansions are performed at the end.
[resolution_rotation_shredder]: 1.0. The resolution limit for the FRF from which to cluster rotations in the SHREDDER runs. Default is using 1.0 or full resolution if data are not available up to 1.0.
[sampling_rotation_shredder]: -1. Default PHASER sampling, which is dynamically calculated considering model and data.
[resolution_rotation_arcimboldo]: 1.0. The resolution limit for the FRF from which to cluster rotations in the ARCIMBOLDO_LITE or ARCIMBOLDO_BORGES runs. Default is using 1.0 or full resolution if data are not available up to 1.0.
[sampling_rotation_arcimboldo]: -1. Default PHASER sampling, which is dynamically calculated considering model and data.
[resolution_translation]: 1.0. The resolution limit for the translation function in the ARCIMBOLDO_LITE or ARCIMBOLDO_BORGES runs. Default is using 1.0 or full resolution if data are not available up to 1.0.
[sampling_translation]: -1. Default PHASER sampling, which is dynamically calculated considering model and data.
[resolution_refinement]: 1.0. The resolution limit for the refinement in the ARCIMBOLDO_LITE or ARCIMBOLDO_BORGES runs. Default is using 1.0 or full resolution if data are not available up to 1.0.
[sampling_refinement]: -1. Default PHASER sampling, which is dynamically calculated considering model and data.
[resolution_gyre]: 1.0. The resolution limit for the gyre refinement in the ARCIMBOLDO_BORGES runs. Default is using 1.0 or full resolution if data are not available up to 1.0.
[sampling_gyre]: -1. Default PHASER sampling, which is dynamically calculated considering model and data.
[exclude_llg]: 0. Threshold to exclude solutions that have an LLG below the value set.
[exclude_zscore]: 0. Threshold to exclude solutions that have a ZSCORE below the value set.
[use_packing]: True or False. If True, PHASER's packing test will be applied to the rototranslated solutions. Otherwise, packing test will be skipped.
[pack_tra]: False or True. If True, the top translation peaks will be tested for packing with the same criteria as those used afterwards by ARCIMBOLDO. This ensures that the top solutions from the translation search will survive the packing test.
[pack_clashes]: 3. Percentage of allowed clashes in the PHASER packing test.
[pack_distance]: 3.0. Threshold for atom contacts in the PHASER packing test.
[occ]: False or True. If True, activates PHASER's occupancy refinement of solutions.
[prioritize_occ]: True or False. If True, solutions that have been occupancy-refined will be prioritised for expansion with shelxe.
[tncs]: True or False. Enables or disables the PHASER feature for translational non-crystallographic symmetry.
[vrms]: False or True. If True, activates PHASER's variance rms refinement.
[bfac]: False or True. If True, activates PHASER's bfactor refinement.
[bfacnorm]: True or False. If True, bfactors in the input template are set to a constant value.
[gimble]: False or True. If True, and in case model_file has been divided in rigid groups by chains, PHASER's gimble refinement of solutions will be performed.
[sigr]: 0.0. Limit in degrees for the rotation in gimble refinement.
[sigt]: 0.0. Limit in Ångström for the translation in gimble refinement.
[gyre_preserve_chains]: False or True. If True, the chains present in the input models will be used for gyre and gimble refinement.
[rotation_model_refinement]: NOGYRE for sequential mode, BOTH for spherical mode. Can also be GYRE.
[solution_sorting_scheme]: AUTO or LLG or ZSCORE or INITCC or COMBINED. Method to prioritize solutions for SHELXE expansion.
[nice]: 0. In multiprocessing mode, it can be used to invoke ARCIMBOLDO_SHREDDER with a particular priority for the CPU time. -20 is the highest priority and 19 is the lowest one.
[alixe]: False or True. If True, phase combination will be applied to the density-modified solutions before expansion.
[alixe_mode]: one_step.
[savephs]: False or True. If True, the phs map files from the initial correlation coefficient calculation step will be saved.
[topfrf]: 200. Limit of rotation solutions.
[topftf]: 70. Limit of translation solutions.
[toppack]: -1. Limit of packing solutions. Default is all the surviving ones.
[toprnp]: 200. Limit of rigid body refined solutions.
[topexp]: 40. Limit of solutions to expand.
[force_core]: -1. Default means that in multiprocessing mode, all physical cores minus one will be used for running ARCIMBOLDO. If set to an integer value, it will use that number of cores.
[force_nsol]: -1. By default, the number of solutions to keep is chosen automatically based on the available hardware. If an integer is given, that number of solutions will be kept per step.
[force_exp]: False or True. If True, it will expand double the number of solutions that the program would normally expand.

Execution

  1. Interactively:

    ARCIMBOLDO_SHREDDER  input.bor
  2. In background:

    When using a grid in which a password is required, it may be given as input in a text file (e.g. password). Otherwise, just redirecting the output and using & will launch the job in the background.
    ARCIMBOLDO_SHREDDER input.bor  < password >& log & 
    

Output

Sequential mode

For each rotation cluster, shredder will generate five ARCIMBOLDO search models. The ARCIMBOLDO-SHREDDER work folder contains a set of folders called ARCI_*/, where * refers to the number of the rotation cluster. Each contains five folders with ARCIMBOLDO runs called after the search model used. Models are saved into the ./library/ folder. You will find:

  • peaks: a run using the peaks_*_0.pdb model, that is, the model created by selecting the peaks determined in the Shred LLG function.
  • overt: a run using the overt_*_0.pdb model, that is, the model created by discarding all residues that are above the minimum peak height in the Shred LLG function.
  • percentile70: a run using the percentile70_*_0.pdb model, that is, the model created by discarding all residues that are above the 70th percentile in the Shred LLG function.
  • percentile75: a run using the percentile75_*_0.pdb model, that is, the model created by discarding all residues that are above the 75th percentile in the Shred LLG function.
  • pklat: a run using the pklat_*_0.pdb model, that is, the model created by eliminating both the peaks selection and the plateau regions.

Each of the folders will contain an html file with the ARCIMBOLDO run output.
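
As a convenience, the html reports of a sequential run can be collected with a short Python script. This is a minimal sketch that relies only on the folder layout described above (ARCI_*/ folders, each holding one sub-folder per search model). The function name find_reports is hypothetical, and the script assumes that each html file sits directly inside its run folder.

```python
from pathlib import Path


def find_reports(work_dir):
    """Map each rotation-cluster folder to the html reports found in its runs."""
    reports = {}
    for cluster_dir in sorted(Path(work_dir).glob("ARCI_*")):
        if cluster_dir.is_dir():
            # One sub-folder per search model, each expected to hold an html file.
            reports[cluster_dir.name] = sorted(cluster_dir.glob("*/*.html"))
    return reports


if __name__ == "__main__":
    # Run from inside the ARCIMBOLDO_SHREDDER work folder.
    for cluster, html_files in find_reports(".").items():
        print(cluster)
        for html in html_files:
            print("   ", html)
```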

Spherical mode

All generated models are evaluated as a library in an ARCIMBOLDO_BORGES run. The working directory contains a folder called ARCIMBOLDO_BORGES and one called models. The models folder contains the library, and ARCIMBOLDO_BORGES is a run of ARCIMBOLDO_BORGES using that library. The working directory also contains an html file, but it only holds the echo of the bor file and a link to the html of the ARCIMBOLDO_BORGES run. The best* files and the html of that run are found inside the ARCIMBOLDO_BORGES folder.
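
The best* files of a spherical run can be located with a few lines of Python. This sketch assumes that they sit directly inside the ARCIMBOLDO_BORGES folder, which is not stated explicitly above. The function name find_best_files is hypothetical.

```python
from pathlib import Path


def find_best_files(work_dir):
    """Return the best* files of the ARCIMBOLDO_BORGES run inside work_dir."""
    borges_dir = Path(work_dir) / "ARCIMBOLDO_BORGES"
    # Assumption: the best* files are not nested in sub-folders.
    return sorted(p for p in borges_dir.glob("best*") if p.is_file())


if __name__ == "__main__":
    for path in find_best_files("."):
        print(path)
```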





ALEPH manual

Execution:

The ALEPH software can be run from its Graphical User Interface or from the command line. It requires Python 3.

  • Graphical User Interface:

    ALEPHUI
  • Command line:

    The generic command line is composed of the following elements:
     ALEPH ALEPH_mode --parameter1 parameter1_value --parameter2 parameter2_value...
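
The generic pattern above can be assembled programmatically, for instance when scripting several ALEPH runs. The following Python sketch only builds and prints the command line; it does not run ALEPH. The function name build_aleph_command and the file name model.pdb are hypothetical, and the sketch does not cover how False/True parameters are passed.

```python
import shlex


def build_aleph_command(mode, **parameters):
    """Assemble the argument list 'ALEPH mode --name value ...'."""
    command = ["ALEPH", mode]
    for name, value in parameters.items():
        # Each parameter becomes a '--name value' pair, as in the generic pattern.
        command += ["--" + name, str(value)]
    return command


cmd = build_aleph_command("annotate", pdbmodel="model.pdb", strictness_ah=0.5)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run once ALEPH is installed and on the PATH.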

Input:

ALEPH_mode: annotate (A), decompose (D), generate_library (L) or superpose (S).

Note: there is another ALEPH_mode called find_folds. It activates a special mode of decompose in which an iterative hierarchical clustering is performed.

Parameters depend on the mode selected. The following list contains all the parameters and their descriptions. The mode(s) in which they apply are specified in parentheses. Default values and parameter ranges are given:

[pdbmodel](A,D,L): /path/to/model_file.pdb. Pdb model for input.
[strictness_ah](A,D,L,S): 0.5; 0-1. Strictness threshold for accepting ah CVs. High values give a precise annotation, while low values extend secondary structure elements.
[strictness_bs](A,D,L,S): 0.3; 0-1. Strictness threshold for accepting bs CVs. High values give a precise annotation, while low values extend secondary structure elements.
[algorithm](D): fastgreedy, infomap, eigenvectors, label_propagation, community_multilevel, edge_betweenness, spinglass, walktrap. Algorithm for the community clustering procedure.
[homogeinity](D): False or True. If True, homogeneously sized clusters are favoured.
[pack_beta_sheet](D): False or True. If True, avoids splitting a beta sheet in different groups.
[work_directory](L): /path/to/working/directory.

The dataset from which the library is generated can be restricted in different ways:
[directory_database](L): Set the database as the given input folder.
[cath_id](L): Extract from the PDB a database of the given CATH ID.
[target_sequence](L): /path/to/sequence_file.seq. Extract a database from the given target sequence. The sequence is used to perform a BLAST search and for significant hits the SCOP_ID and CATH_ID are read, then structures of the same CATH_ID or SCOP_ID are downloaded to generate the input database.

ALEPH generation of libraries can be parallelized. By default it runs in multiprocessing mode, but the following options are available:
[supercomputer](L): /path/to/configuration_file.bor.
[remote_grid](L): /path/to/configuration_file.bor.
[local_grid](L): /path/to/configuration_file.bor.

[score_intra_fragment](L,S): 95; 0-100. Global geometrical secondary structure match expressed as score percentage.
[score_inter_fragments](L,S): 90; 0-100. Global geometrical tertiary structure match expressed as score percentage.
[rmsd_min](L): 0.0. Superposition threshold, minimum rmsd against the template.
[rmsd_max](L): 6.0. Superposition threshold, maximum rmsd against the template.
[clustering_model](L): no_clustering, rmsd, rmsd_range, random_sampling. Clustering modes in library generation.
[rmsd_clustering](L, rmsd): Threshold for pairwise comparison.
[number_of_ranges](L, rmsd_range): 500. Number of groups.
[number_of_clusters](L, rmsd_range, random_sampling): 7000. Number of representative models extracted from the library.
[exclude_sequence](L): False or True. If True, avoid the extraction of models from any chain that aligns with a sequence identity >= 90% to this sequence.
[test](L): False or True. If True, test with a reduced sample of models to check parameterisation.
[representative](L): False or True. If True, for each structure in the PDB database extracts only the model with lowest rmsd.
[reference](S): /path/to/model_file.pdb. Reference pdb model fixed in superposition.
[target](S): /path/to/model_file.pdb, or [targets](S): /path/to/directory. Target pdb model, or directory of pdb models, to be moved in the superposition.
[rmsd_thresh](S): 1.5. Rmsd threshold to accept a superposition.
[peptide_length](A,D,L,S): 3. Peptide length for computing a CV.
[width_pic](A,D,L,S): 100.0. Width in inches for pictures.
[height_pic](A,D,L,S): 20.0. Height in inches for pictures.
[min_ah_dist](A,D,L,S): 0.0. Minimum distance allowed among ah CVs in the graph.
[max_ah_dist](A,D,L,S): 20.0. Maximum distance allowed among ah CVs in the graph.
[min_bs_dist](A,D,L,S): 0.0. Minimum distance allowed among bs CVs in the graph.
[max_bs_dist](A,D,L,S): 15.0. Maximum distance allowed among bs CVs in the graph.
[write_graphml](A,D,L,S): False or True. If True, write graphml files.
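
Because each parameter applies only in certain modes and some have a documented range, a small pre-flight check can catch mistakes before ALEPH is launched. The Python sketch below is not part of ALEPH. It encodes a subset of the list above in two hypothetical tables, and treats the range bounds as inclusive, which is an assumption.

```python
# Modes in which each parameter applies, copied from the list above
# (A = annotate, D = decompose, L = generate_library, S = superpose).
PARAM_MODES = {
    "pdbmodel": "ADL",
    "strictness_ah": "ADLS",
    "strictness_bs": "ADLS",
    "algorithm": "D",
    "work_directory": "L",
    "score_intra_fragment": "LS",
    "score_inter_fragments": "LS",
    "reference": "S",
    "rmsd_thresh": "S",
    "peptide_length": "ADLS",
}

# Documented numeric ranges (inclusive bounds are an assumption).
PARAM_RANGES = {
    "strictness_ah": (0.0, 1.0),
    "strictness_bs": (0.0, 1.0),
    "score_intra_fragment": (0.0, 100.0),
    "score_inter_fragments": (0.0, 100.0),
}


def check_parameters(mode_letter, parameters):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for name, value in parameters.items():
        modes = PARAM_MODES.get(name)
        if modes is None:
            problems.append(f"{name}: not in the table")
        elif mode_letter not in modes:
            problems.append(f"{name}: not used in mode {mode_letter}")
        if name in PARAM_RANGES:
            low, high = PARAM_RANGES[name]
            if not low <= float(value) <= high:
                problems.append(f"{name}: {value} outside {low}-{high}")
    return problems


print(check_parameters("A", {"pdbmodel": "model.pdb", "strictness_ah": 0.5}))
print(check_parameters("A", {"reference": "ref.pdb", "strictness_bs": 1.5}))
```

The first call reports no problems. The second flags reference, which only applies to superpose, and the out-of-range strictness_bs value.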

Output:

ALEPH outputs annotated secondary structure fragments in a pdb file and plots geometrical properties of CVs in png files. The decomposition mode marks the groups with different chain IDs in the pdb file. If library generation is performed, ALEPH outputs the pdb files of the superposed models in a new directory called library. In cases when clustering is activated, another directory called clusters is written containing the representative models. The superposition outputs a pdb with the superposed target. All these files are displayed through ALEPHUI.

