PHASER: MR with Maximum Likelihood

Drs. Mariusz Jaskolski (UAM) & Alexander Wlodawer (NCI):

Although numerous crystal structures of several retroviral proteases (PR) have been solved in the past (e.g. from HIV, RSV, FIV, EIAV) and all these enzymes share significant sequence and structure homology, the new structure of human T-cell leukemia virus (HTLV) PR could not be solved (in our hands) by any of the standard MR programs (AMoRe, EPMR, MolRep). However, the structure was solved in a straightforward manner using the PHASER program, as implemented in ccp4i.

The failure of the "older" MR programs is partly explained by the comparatively low levels of sequence identity/similarity between the models and the target (ranging in simple CLUSTAL comparisons from 23.9/59.3% for FIV PR to 32.3/63.6% for HIV PR) and by the complicated architecture of the unknown structure (three homodimeric molecules forming a highly pseudosymmetric trimer), combined with a high degree of uncertainty about the asymmetric unit contents.

In automatic runs, which included the full available resolution of 2.6 Å, PHASER correctly solved the trimeric model, but failed (correctly again) to locate any additional copies of the molecule due to packing considerations. The solutions could be unambiguously identified by their high Z-score parameter (> 10) for the more similar models (HIV PR, EIAV PR). The less similar models had Z-score values just below 8, and corresponded to a correct (7.96, RSV PR) or incorrect (7.80, FIV PR) solution. The quality of the solutions is also recognizable from the final LL-gain parameter, which at a value of 110 could distinguish between incorrect and correct solutions, increased to about 180 for "strong", unambiguous solutions (Z-score about 11), and rose sharply to 270 for the very best case (Z-score 14.54).
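The Z-score ranges reported above can be written down as a small triage helper. This is only a sketch: the function name and the labels are hypothetical, the cut-offs (10 and 8) are simply the values observed in this one case, and the range between 8 and 10 is not covered by the runs described here, so it is labelled separately as an assumption.

```shell
#!/bin/sh
# triage_zscore Z
# Classify a PHASER Z-score using the ranges observed for HTLV PR only.
# The labels and the handling of the 8-10 range are assumptions.
triage_zscore() {
    awk -v z="$1" 'BEGIN {
        z = z + 0                       # force numeric comparison
        if (z > 10)      print "unambiguous"
        else if (z >= 8) print "intermediate"
        else             print "ambiguous"
    }'
}
```

For example, `triage_zscore 11.16` prints `unambiguous`, while both `triage_zscore 7.96` (a correct solution) and `triage_zscore 7.80` (an incorrect one) print `ambiguous`, which is why the LL-gain should be consulted as well in that range.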

In post factum analyses, the levels of sequence identity/similarity are reflected in the r.m.s. deviations between the Cα atoms of the corresponding models and the final HTLV PR structure, which are high even for the best models: HIV PR 1.6 Å, EIAV PR 1.7 Å, FIV PR 1.8 Å, RSV PR 1.9 Å.

In practical terms, the structure was cracked using the best available model of HIV PR determined at atomic resolution (Z-score 11.16). However, in a posteriori calculations the clearest solution was obtained with a medium-resolution HIV PR model, which in Cα comparison with the final target did not show any obvious superiority. This observation reinforces the notion that the performance of even the most robust MR algorithms may depend critically on the initial conditions and that, whenever several models are available, all of them should be tried.
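Trying several models is easy to automate. The sketch below writes one PHASER keyword file per candidate model and runs PHASER on each if the program is installed. It uses only keywords that appear in the script quoted later in this newsletter. The function names, the `name:identity` argument format, the file naming (`name.pdb`, `name.inp`, `name.log`) and the composition values are hypothetical and must be adapted to the actual project.

```shell
#!/bin/sh
# write_phaser_input NAME IDENTITY MTZ MW NCOMP NSEARCH
# Writes NAME.inp for an automatic search with the single model NAME.pdb.
# MW/NCOMP describe the asymmetric unit contents; NSEARCH is the number
# of copies of the model to look for. All values are supplied by the user.
write_phaser_input() {
    name=$1; identity=$2; mtz=$3; mw=$4; ncomp=$5; nsearch=$6
    cat > "${name}.inp" <<EOF
TITLe trial with ${name}
MODE MR_AUTO
HKLIn ${mtz}
LABIn F=FP SIGF=SIGFP
ENSEmble ${name} PDBfile ${name}.pdb IDENtity ${identity}
COMPosition PROTein MW ${mw} NUM ${ncomp}
SEARch ENSEmble ${name} NUM ${nsearch}
ROOT ${name}
EOF
}

# try_models MTZ MW NCOMP NSEARCH NAME:IDENTITY [NAME:IDENTITY ...]
# Generates one input file per model and runs PHASER when it is available.
try_models() {
    mtz=$1; mw=$2; ncomp=$3; nsearch=$4
    shift 4
    for spec in "$@"; do
        name=${spec%%:*}        # text before the first colon
        identity=${spec#*:}     # text after the first colon
        write_phaser_input "$name" "$identity" "$mtz" "$mw" "$ncomp" "$nsearch"
        # Without PHASER on the PATH, only the input file is written.
        if command -v phaser >/dev/null 2>&1; then
            phaser < "${name}.inp" > "${name}.log"
        fi
    done
}
```

A call such as `try_models your-data.mtz MW NCOMP NSEARCH hiv-pr:32.3 fiv-pr:23.9` (with MW, NCOMP and NSEARCH replaced by real values) would set up runs for two of the models discussed above, using the CLUSTAL identities quoted earlier. The resulting logs can then be compared by Z-score and LL-gain.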

Xinhua Ji (NCI): All Data and Small Search Models

My group has solved about 10 structures using PHASER. Our limited experience suggests using all data and small search models. At higher resolution, the solution is more likely to be unique. Small models are often solid and likely to be more accurate. Below is a script for the first, and often successful, MR attempt with PHASER, where protein-1 contains a single domain while protein-2 contains two different domains.

phaser << eof > auto.log # PHASER v1.3
TITLe your-project automatic
MODE MR_AUTO
HKLIn your-data.mtz
LABIn F=FP SIGF=SIGFP
ENSEmble model-1 PDBfile model-1.pdb IDENtity 95
ENSEmble model-2a PDBfile model-2a.pdb IDENtity 90
ENSEmble model-2b PDBfile model-2b.pdb IDENtity 90
COMPosition PROTein MW 32000 NUM 1 # protein-1
COMPosition PROTein MW 66000 NUM 1 # protein-2
SEARch ENSEmble model-1 NUM 1
SEARch ENSEmble model-2a NUM 1
SEARch ENSEmble model-2b NUM 1
ROOT auto
eof

Multiple search models (as shown below) help, especially when the search model is not that solid, as indicated by high B factors (of each model), high RMS values (between the models), and low sequence identity (to the unknown).

ENSEmble model-2b PDBfile model-2b1.pdb IDENtity 90 &
PDBfile model-2b2.pdb IDENtity 90 &
PDBfile model-2b3.pdb IDENtity 90
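When the number of superposed models varies from project to project, the continued ENSEmble line can be generated rather than typed. The helper below is a hypothetical sketch, not part of PHASER. It assumes that every model in the ensemble is given the same IDENtity value, as in the example above, and it reproduces the `&` continuation layout shown there.

```shell
#!/bin/sh
# ensemble_card NAME IDENTITY PDB [PDB ...]
# Prints one ENSEmble definition listing every PDB file, ending each
# line except the last with the "&" continuation character.
ensemble_card() {
    if [ "$#" -lt 3 ]; then
        echo "usage: ensemble_card NAME IDENTITY PDB [PDB ...]" >&2
        return 1
    fi
    name=$1; identity=$2
    shift 2
    total=$#
    count=0
    printf 'ENSEmble %s' "$name"
    for pdb in "$@"; do
        count=$((count + 1))
        if [ "$count" -lt "$total" ]; then
            printf ' PDBfile %s IDENtity %s &\n' "$pdb" "$identity"
        else
            printf ' PDBfile %s IDENtity %s\n' "$pdb" "$identity"
        fi
    done
}
```

Calling `ensemble_card model-2b 90 model-2b1.pdb model-2b2.pdb model-2b3.pdb` prints the three-line definition given above, with the continuation lines indented by one space, and the output can be pasted or redirected into the keyword script.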

Dr. Mark Mayer (NICHD): Keeping up with new software is time-consuming, especially for small labs. The NIH X-Ray Diffraction Interest Group Newsletter gives us a chance as a community to share information about newer programs and tips for using them. On the Mac side of things, the Structural Biology Grid has versions of CNS and Refmac that support multithreading and fast FFT calculations on G4/G5 systems, which significantly speed up calculation of composite omit maps. The binaries are available from: http://www.sbgrid.org/osx.php?software=1&id=0