ClusPro 2.0: protein-protein docking

Help

For additional examples, please refer to our Nature Protocols publication.

Tutorials

FAQ

I keep getting errors that my file has unknown residues. What should I do?

This error means that at least one record in your PDB file is marked as ATOM, but its residue is not one of the 20 standard amino acids or an RNA base. Some programs write heteroatoms (HETATM) as ATOM records. You can edit the file to either remove these residues or change them to HETATM records.
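If editing the file by hand is tedious, the relabeling can be scripted. The following is a minimal Python sketch, not part of ClusPro: the function name fix_unknown_residues is hypothetical, and the set of RNA residue names (A, C, G, U) is an assumption about how your file names its bases. It relies only on the fixed-column PDB layout, where the record name occupies columns 1-6 and the residue name columns 18-20.

```python
# The 20 standard amino acids, by their three-letter PDB residue names.
STANDARD_AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
# Assumption: RNA bases appear under these one-letter residue names.
RNA_BASES = {"A", "C", "G", "U"}
ALLOWED = STANDARD_AMINO_ACIDS | RNA_BASES


def fix_unknown_residues(lines, drop=False):
    """Relabel (or drop) ATOM records whose residue name is not in ALLOWED.

    lines: iterable of PDB lines.
    drop:  if True, remove the offending records instead of relabeling them.
    """
    fixed = []
    for line in lines:
        # Columns 1-6 hold the record name; columns 18-20 the residue name.
        if line[:6] == "ATOM  " and line[17:20].strip() not in ALLOWED:
            if drop:
                continue
            line = "HETATM" + line[6:]
        fixed.append(line)
    return fixed


if __name__ == "__main__":
    import sys

    # Usage: python fix_pdb.py input.pdb > output.pdb
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(fix_unknown_residues(handle))
```

Check the output before submitting it, since a residue that the script relabels may be one you actually want to remove.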

I see four different choices for my docking results, "Balanced", "Electrostatic-favored", and so on. Which one should I choose?

We provide many different options for docking because we believe good results go hand-in-hand with experimental knowledge of the complex. If you don't have any prior knowledge of what forces dominate in your complex, we recommend using the balanced coefficients. If your complex is antibody-antigen, we recommend using our antibody mode.

I only see my receptor in my results. Where is the ligand?

This is probably due to your molecular viewer. Some molecular viewers do not support multiple PDB entries in one file. There are two ways to proceed.

The first is to switch to a molecular viewer that supports multiple entries in a single file, such as PyMOL.

The second is to split the model file into receptor and ligand and load them independently into your viewer. On Linux and Mac OS X, you can do this by calling file=model.000.00.pdb;csplit --prefix=${file/pdb/} --suffix-format="%02d.pdb" $file %HEADER% /HEADER/ , substituting your chosen model for model.000.00.pdb. This should give you two files, model.000.00.00.pdb and model.000.00.01.pdb, that contain the receptor and ligand respectively. On Windows, you can open the file in Notepad or Notepad++, search for the END record in the middle of the file, and manually copy the two halves into separate files.

Alternatively, simply removing the lines that begin with HEADER or END may allow your viewer to load both the receptor and ligand into one object. On Linux and Mac OS X, this can be done by calling grep -Ev '^(HEADER|END)' model.000.00.pdb > model.000.00.stripped.pdb . On Windows, you can manually remove those lines in one of the text editors mentioned above. (If you have a simple way to do either of these that is built into Windows, we would love to hear about it.)
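For a route that does not depend on csplit, the split can also be sketched in Python, which runs on Windows, Linux, and Mac OS X wherever Python 3 is installed (it is not built into Windows). This is an illustration only: the function names split_model and write_split are hypothetical, and the sketch assumes that the receptor and the ligand each begin with a HEADER record, as the csplit command implies. Unlike that command, it starts a new file at every line beginning with HEADER, not just the second one.

```python
from pathlib import Path


def split_model(text):
    """Split PDB text into chunks, each starting at a line beginning with HEADER."""
    chunks = []
    for line in text.splitlines(keepends=True):
        if line.startswith("HEADER"):
            chunks.append([])  # a new entry starts here
        if chunks:  # anything before the first HEADER is skipped
            chunks[-1].append(line)
    return ["".join(chunk) for chunk in chunks]


def write_split(model_path):
    """Write each chunk next to the model file, e.g. model.000.00.00.pdb."""
    path = Path(model_path)
    out_paths = []
    for index, part in enumerate(split_model(path.read_text())):
        out = path.with_name(f"{path.stem}.{index:02d}.pdb")
        out.write_text(part)
        out_paths.append(out)
    return out_paths


if __name__ == "__main__":
    import sys

    # Usage: python split_model.py model.000.00.pdb
    for written in write_split(sys.argv[1]):
        print(written)
```

With a two-entry model file, the first output file should hold the receptor and the second the ligand, following the same naming as the csplit command.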

What is Piper and what is ClusPro? How does this version differ from the previous ClusPro?

Piper is the FFT-based rigid docking program developed in our lab. It provides 1000 low-energy results to our clustering program, ClusPro, which attempts to find the native site under the assumption that it will have a wide free-energy attractor with the largest number of results. The previous version of ClusPro used a similar clustering algorithm, but obtained 2000 results from other docking programs, not Piper.

What are these Model Scores? Should I use them to rank my results?

We provide the scores coming from Piper for our models only because a large number of people have asked for them. Our experience shows that the best way to rank models is by cluster size, which is how the models are ranked coming out of ClusPro. This is the method we've used to great success in CAPRI and on various protein docking benchmarks.

As a brief explanation, the way ClusPro works is:

1. We rotate the ligand with 70,000 rotations. For each rotation, we translate the ligand in x, y, z relative to the receptor on a grid. We choose the translation with the best score from each rotation.
2. Of the 70,000 rotations, we choose the 1000 rotation/translation combinations that have the lowest score.
3. We do a greedy clustering of these 1000 ligand positions with a 9 angstrom C-alpha RMSD radius. This means we find the ligand position with the most "neighbors" within 9 angstroms, and it becomes a cluster center, and its neighbors the members of the cluster. These are then removed from the set and we then look for a second cluster center, and so on.
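The greedy clustering in the last step can be illustrated with a short Python sketch. This is not the ClusPro implementation: the function name greedy_cluster is hypothetical, the input is assumed to be a precomputed symmetric matrix of pairwise C-alpha RMSD values, and details such as counting neighbors only among the positions not yet clustered, treating a distance exactly equal to the radius as a neighbor, and breaking ties by lowest index are assumptions made for the example.

```python
def greedy_cluster(dist, radius=9.0):
    """Greedily cluster items given a symmetric pairwise distance matrix.

    dist:   list of lists; dist[i][j] is the distance between items i and j.
    radius: items within this distance of a center join its cluster.
    Returns a list of (center_index, sorted_member_indices) in the order found.
    """
    remaining = set(range(len(dist)))
    clusters = []

    def neighbors(i):
        # Neighbors are counted among unclustered items; i is its own neighbor.
        return [j for j in remaining if dist[i][j] <= radius]

    while remaining:
        # The item with the most neighbors becomes the next cluster center.
        # Iterating in sorted order makes ties resolve to the lowest index.
        center = max(sorted(remaining), key=lambda i: len(neighbors(i)))
        members = sorted(neighbors(center))
        clusters.append((center, members))
        # Remove the whole cluster before looking for the next center.
        remaining -= set(members)
    return clusters


if __name__ == "__main__":
    # Toy example: five "positions" on a line, distance = absolute difference.
    positions = [0, 5, 10, 30, 32]
    toy = [[abs(a - b) for b in positions] for a in positions]
    for center, members in greedy_cluster(toy, radius=9.0):
        print(center, members)
```

Because each cluster is removed before the next center is chosen, clusters come out in order of non-increasing size, which corresponds to ranking models by cluster size.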

Note that in step 1, we sample around 10⁹ positions of the ligand relative to the receptor. From this 10⁹, we choose 1000, or 10³, positions. That means these 1000 are in the top millionth of all positions of the ligand relative to the receptor. At this level, the scoring function is too rough to discriminate meaningfully between these 1000. The scoring function's purpose is to pull them out of the 10⁹ positions we started from.

In summary, we strongly encourage you not to judge models based on these scores, because that is not what the scoring was designed for.

I want to submit a lot of jobs. Is there a way I can do that?

ClusPro should only be used for noncommercial purposes.
Vajda Lab and ABC Group
Boston University and Stony Brook University