Michelanglo — Documentation

Starting with a gene


Getting a structure

You can search for structures corresponding to your gene name on the name query page. However, this is a simple form, so for more extensive descriptions and details of crystal structures, use the PDB.

Tips when using the PDB

A search will often return either no structures or only a couple.

When you click on one, keep an eye out for the organism. Also look at the feature view (e.g. ID:P42212), which provides a really nice overview of your protein, showing how much of the protein each crystal structure covers. If you cannot find your protein, take your UniProt ID and add it to the end of the URL www.rcsb.org/pdb/protein/. At the bottom is the premade SWISS-MODEL structure based on homologues. It will be longer than the crystal structures, but it is a threaded structure, so it cannot be fully trusted without checking whether any observations also hold true for the template structure.
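The URL trick above can be written as a small helper. This is only a sketch: the base URL is the one quoted in the text, the https:// scheme and the function name feature_view_url are assumptions made for illustration, and the address may have changed on the PDB site since this page was written.

```python
def feature_view_url(uniprot_id: str) -> str:
    """Append a UniProt ID to the protein feature view URL quoted in the text."""
    accession = uniprot_id.strip().upper()
    if not accession.isalnum():
        raise ValueError(f"Unexpected characters in UniProt ID: {uniprot_id!r}")
    # Base URL taken from the text; the https:// scheme is an assumption.
    return "https://www.rcsb.org/pdb/protein/" + accession


print(feature_view_url("p42212"))
```

The helper only builds the address; it does not check that the page exists.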

Once you know the PDB code you are after, head to Michelaɴɢʟo: PDB conversion, enter your PDB code (four characters) or upload the PDB file, and then choose the view that suits you best.
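If you are collecting PDB codes in a script before entering them, a quick sanity check can catch typos. The sketch below assumes the classic four-character format (a digit followed by three letters or digits, e.g. 1EMA); the function name normalise_pdb_code is hypothetical and is not part of Michelaɴɢʟo.

```python
import re

# Assumed classic PDB code format: one digit (1-9), then three letters or digits.
PDB_CODE = re.compile(r"[1-9][A-Za-z0-9]{3}")


def normalise_pdb_code(text: str) -> str:
    """Return the code in upper case, or raise ValueError if it looks wrong."""
    code = text.strip()
    if not PDB_CODE.fullmatch(code):
        raise ValueError(f"Not a four-character PDB code: {text!r}")
    return code.upper()


print(normalise_pdb_code("1ema"))
```

This only checks the shape of the code, not whether the entry exists in the PDB.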

If you want to show one or more mutations, go to edit (the pencil button in the description card, visible when logged in) and then press the button that says "Make mutations". Enter the chain and the mutations in the form M1W or A2D etc. (original residue, position, new residue), separated by spaces. The program will do the rest.
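To check a list of mutations before pasting it into the form, something like the following may help. It is a minimal sketch of the M1W notation described above (one-letter original residue, position, one-letter new residue), restricted to the 20 standard amino acids. It is not the parser Michelaɴɢʟo itself uses, and the names Mutation and parse_mutations are made up for this example.

```python
import re
from typing import List, NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
MUTATION = re.compile(rf"([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])")


class Mutation(NamedTuple):
    wild_type: str
    position: int
    mutant: str


def parse_mutations(text: str) -> List[Mutation]:
    """Split a space-separated string such as 'M1W A2D' into its parts."""
    mutations = []
    for token in text.split():
        match = MUTATION.fullmatch(token.upper())
        if match is None:
            raise ValueError(f"Not a valid mutation: {token!r}")
        wild_type, position, mutant = match.groups()
        mutations.append(Mutation(wild_type, int(position), mutant))
    return mutations


for mutation in parse_mutations("M1W A2D"):
    print(mutation.wild_type, mutation.position, mutation.mutant)
```

It is also worth confirming that the original residue really is at that position in the chain you chose, since crystal structure numbering does not always match the UniProt sequence.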

In the case of crystal structures, the protein is often found with bound partners. The identity of each chain can be found in the PDB entry in the "Structure summary" tab (the first one), in the "Macromolecules" card. Sometimes the protein from a different organism is bound to an interesting partner, and it may be worth using that structure instead. Which bound macromolecule or small molecule to show is up to you, but if in doubt, check the UniProt entry or, better still, the literature to find out what the proteins are.

Modelling

If you don't find a structure, there are several servers that will model the protein for you. First, determine the domains by looking in UniProt or the PDB protein feature view, because modelling servers generally do not handle sequences longer than about 500 amino acids. A lazy way of getting the sequence of a region is to click on a range for a domain in UniProt and change the numbers in the URL.

I-TASSER is a tool that has consistently come top in the protein structure prediction challenge (CASP), but it is slow (2-3 days). Phyre2 is a lot quicker, and its authors have also recently released a set of over 1,000 computed structures. EVFold is best for totally unknown structures, because it uses covariance to predict which residues should be close to each other.

Normally, parts of a structure are either threaded (the residues on the template are simply replaced) or computed ab initio using force field calculations. Ab initio predictions for totally unknown genes are very imprecise, but the accuracy is increased by assuming that residues that change together are likely to be close together.

Do note that if you opt for a model, keep track of what the closest template is and what the root mean square deviation is (low single-digit numbers are best), and make it clear to other users that you are using a model.
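The modelling servers report the root mean square deviation for you, but it helps to know what the number means: it is the square root of the mean squared distance between matching atoms in the model and the template, usually in ångströms. The sketch below assumes the two structures are already superposed and that the coordinates are paired atom by atom (for example one C-alpha per residue); the function name rmsd and the toy coordinates are invented for illustration.

```python
import math
from typing import Sequence, Tuple

Coordinate = Tuple[float, float, float]


def rmsd(model: Sequence[Coordinate], template: Sequence[Coordinate]) -> float:
    """Root mean square deviation between two paired, superposed atom lists."""
    if len(model) != len(template) or len(model) == 0:
        raise ValueError("Both structures need the same, non-zero number of atoms")
    # Sum the squared distance between each pair of matching atoms.
    squared = sum(
        (a - b) ** 2
        for atom_model, atom_template in zip(model, template)
        for a, b in zip(atom_model, atom_template)
    )
    return math.sqrt(squared / len(model))


# Toy example: two atoms, the second displaced by 2 Å along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(round(rmsd(model, template), 3))
```

Real tools superpose the structures first to minimise this value, which this sketch does not attempt.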

Other: add bilayers

If, for illustrative purposes, you want to add a lipid bilayer, I recommend using the CHARMM-GUI Membrane Builder, going a few steps in and stopping at the step that solvates the molecule (CHARMM is a molecular dynamics simulator). Another feature offered is modifying residues by phosphorylation, chemical attack or linkage with a few cyanine dyes.

Not all common residue changes (e.g. methylation) are possible; for another approach (using Rosetta), see this post.