Sequence Conservation Display in GRASP (and O)


Preparing your data

To show sequence conservation you need a multiple sequence alignment. Make one in DNASTAR (Vector NTI might work, but I haven't tested it). Export it from DNASTAR in GCG/PILEUP format and transfer the file to the SGI.

Check that the top of the file looks reasonable. There seem to be a couple of format variations but it always looks something like this:

humancullins.msf  MSF: 847   Type: P Wednesday, September 27, 2000 Check: 8585 
..
cul1human   Len: 847  Check:  7781    Weight: 1.00
cul2human   Len: 847  Check:  4572    Weight: 1.00
cul3human   Len: 847  Check:  9282    Weight: 1.00
cul4Ahuman  Len: 847  Check:  6545    Weight: 1.00
cul5human   Len: 847  Check:  405    Weight: 1.00
//
            1                                                                          80
 cul1human  MSSAA..TRSQNPHGLKQIGL.........DQIWDDLRAGIQQVYTRQSMAKSRYMELYTHVYNYCTSVHQFVGLELYKR
 cul2human  MSL....KP.......RVVDF.........DETWNKLLTTIKAVVMLEYVERATWNDRFSDIYALCVAYPEPLGERLYTE
 cul3human  MSNLSKGTGSRKDTKMRIRAFPMTMDEKYVNSIWDLLKNAIQEI.QRKNNSGLSFEELYRNAYTMVLHKH...GEKLYTG
cul4Ahuman  M...........................................................................LYKQ
 cul5human  MAT.........SNLLKNKG.SLQFEDK.....WDFMRPIVLKLLRQESVTKQQWFDLFSDVHAVCLW.DDKGPAKIHQA
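
If you would rather not check the header by eye, something like the following Python sketch lists the sequence names and lengths found above the "//" separator. It is only an illustration based on the sample shown here: the function name read_msf_header and the regular expression are my own assumptions, and other format variants may need the pattern adjusted.

```python
import re

# Matches header lines such as
#   cul1human   Len: 847  Check:  7781    Weight: 1.00
# An optional leading "Name:" is tolerated (assumed format variation).
HEADER_LINE = re.compile(
    r"^\s*(?:Name:\s*)?(\S+)\s+Len:\s*(\d+)\s+Check:\s*(\d+)\s+Weight:\s*([\d.]+)"
)

def read_msf_header(text):
    """Return (name, length) pairs from the header, stopping at '//'."""
    entries = []
    for line in text.splitlines():
        if line.strip() == "//":
            break  # end of header, alignment block follows
        m = HEADER_LINE.match(line)
        if m:
            entries.append((m.group(1), int(m.group(2))))
    return entries

sample = """humancullins.msf  MSF: 847   Type: P Wednesday, September 27, 2000 Check: 8585
..
cul1human   Len: 847  Check:  7781    Weight: 1.00
cul5human   Len: 847  Check:  405    Weight: 1.00
//
"""

entries = read_msf_header(sample)
lengths = {length for _, length in entries}
print(entries)
print("lengths consistent" if len(lengths) == 1 else "length mismatch")
```

All sequences in an alignment should report the same length, so a mismatch is a sign that the export went wrong.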

Nikola has written a program that calculates sequence conservation from a multiple sequence alignment and writes O and GRASP macros. The program is simple and brutal, but nevertheless provides a good starting point for making your own macros. It ignores alignment positions where the first sequence in the file has a gap (i.e. insertions relative to the first sequence). Typically the first sequence would be your known structure and the rest of the sequences would be orthologues.

It is wise to choose the sequences in your alignment carefully - if you have a whole bunch of sequences that are nearly identical (e.g. mammalian) then your sequence conservation might nearly always be >80%, which doesn't make for a good figure.
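
To make the idea concrete, here is a minimal Python sketch of a per-residue conservation score that skips insertions relative to the first sequence. The scoring scheme (fraction of sequences identical to the first sequence at each position) and the function name column_conservation are assumptions for illustration only; the actual program may well score things differently.

```python
GAP_CHARS = ".-~"  # characters treated as gaps (assumption)

def column_conservation(seqs):
    """Score each residue of the first sequence by fraction of identical rows.

    Columns where the first sequence has a gap are skipped, so the
    returned list has one entry per residue of the first sequence.
    """
    ref = seqs[0]
    scores = []
    for col, ref_res in enumerate(ref):
        if ref_res in GAP_CHARS:
            continue  # insertion relative to the first sequence
        matches = sum(1 for s in seqs if s[col] == ref_res)
        scores.append(matches / len(seqs))
    return scores

# First ten alignment columns of cul1human, cul2human and cul3human
aligned = [
    "MSSAA..TRS",
    "MSL....KP.",
    "MSNLSKGTGS",
]
scores = column_conservation(aligned)
for number, score in enumerate(scores, start=1):
    print(number, round(score, 2))
```

With a set of nearly identical sequences almost every score comes out close to 1.0, which is the problem described above.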


Generating the macros

The executable for the program is ~xtal/bin/megalign_conservation.

Run this program. It will prompt for:

The last input option is important because the program implicitly counts from residue 1 in the sequence, and if that residue is not residue 1 in your PDB file you have to tell it so or the macros will not work.
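
The numbering correction amounts to a constant shift. The Python sketch below shows the arithmetic, assuming the shift is simply (first residue number in the PDB file) minus 1; the function paint_zone_lines and its parameters are hypothetical and not part of megalign_conservation.

```python
def paint_zone_lines(positions, first_residue=1, obj="con", colour="cyan"):
    """Turn 1-based sequence positions into O paint_zone lines.

    first_residue is the PDB residue number of the first residue in
    the alignment's first sequence (assumed to be a constant offset).
    """
    offset = first_residue - 1
    lines = []
    for pos in positions:
        n = pos + offset
        # field widths chosen to mimic the macro layout shown in this document
        lines.append(f" paint_zone {obj}{n:5d}{n:4d} {colour}")
    return lines

# Sequence starts at residue 15 in the PDB file, so position 5 is residue 19
for line in paint_zone_lines([5, 7, 8, 12], first_residue=15):
    print(line)
```

If the PDB file has numbering breaks or insertion codes a constant offset will not be enough, and the residue numbers need to be fixed by hand.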

The O Macro

The O macro file written out by megalign_conservation looks something like this:
 paint_zone con    5   5 cyan 
 paint_zone con    7   7 cyan 
 paint_zone con    8   8 cyan 
 paint_zone con   12  12 cyan 
This macro is written assuming your object is called "con". If it is not called con you can either create one, or use your friendly text editor to replace "con" with the actual name of the object in the macro file. Either way it is simple to invoke the macro:
@macro.file
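
If you prefer a script to a text editor for the renaming, a Python sketch along these lines would do it. The object name "cul1" and the function rename_object are made up for the example; only the object field of each paint_zone line is touched.

```python
import re

def rename_object(macro_text, old="con", new="cul1"):
    """Replace the object name in every paint_zone line of an O macro."""
    pattern = re.compile(
        r"^([ \t]*paint_zone[ \t]+)" + re.escape(old) + r"(?=[ \t])",
        re.MULTILINE,
    )
    # keep the leading "paint_zone " part and swap only the object name
    return pattern.sub(lambda m: m.group(1) + new, macro_text)

macro = " paint_zone con    5   5 cyan \n paint_zone con    7   7 cyan \n"
renamed = rename_object(macro, old="con", new="cul1")
print(renamed, end="")
```

Anchoring the match on the paint_zone keyword avoids accidentally changing other text that happens to contain the letters "con".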

The GRASP Macro

The GRASP macro file written out by megalign_conservation looks something like this:
Macro Name: cons
line: rn=   5, c=4
line: rn=   7, c=4
line: rn=   8, c=4
line: rn=  12, c=4
       .
       .
 (lines deleted)
       .
       .
line: vc=a
Macro End
In GRASP-speak this means: "color residue number X as color 4". You can obviously change the color number if you want. The file defines a macro named "cons" with the "Macro Name" and "Macro End" commands. In GRASP you can do READ:GRASP MACRO to read this macro file into memory, and then execute it from the MACRO menu (its name will be cons) when you want. Or you can do the usual trick of cutting and pasting the data lines into your own macro file derived from the session.record. Remember to include the vc=a line, which tells GRASP to color the surface vertices from the atom colors.

One unpleasant little quirk is that GRASP ignores any line in the macro which begins with a space.
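
If you write your own GRASP macro from a list of residue numbers, it is easy to guard against this. The Python sketch below builds a macro in the layout shown above and checks that no line starts with a space; the function grasp_macro is hypothetical, and the layout is copied from the example in this document rather than from any GRASP specification.

```python
def grasp_macro(residues, colour=4, name="cons"):
    """Build GRASP macro text that colors the given residue numbers."""
    lines = [f"Macro Name: {name}"]
    lines += [f"line: rn={n:4d}, c={colour}" for n in residues]
    lines.append("line: vc=a")  # color the surface from the atom colors
    lines.append("Macro End")
    return "\n".join(lines) + "\n"

macro = grasp_macro([5, 7, 8, 12])

# GRASP ignores lines that begin with a space, so check for them
bad = [line for line in macro.splitlines() if line.startswith(" ")]
print(macro, end="")
print("lines starting with a space:", len(bad))
```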

Sequence conservation projected onto an existing structure

Sometimes you want to do something a little more elaborate: e.g. calculate sequence conservation in Cul2 orthologues and project this conservation back onto the known Cul1 structure. In order to do this you need to generate a multiple sequence alignment with the known structure (Cul1) as the first sequence and the rest of the (Cul2) sequences aligned onto it. If you don't want to include the known structure in the sequence conservation calculations you can use ~xtal/bin/megalign_project_conservation which calculates the sequence conservation excluding the first sequence. Obviously you need at least 3 sequences in the file for this to make any sense at all. Apart from ignoring the first sequence in the alignment it works the same as megalign_conservation.
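
As a rough illustration of the projection idea, the Python sketch below scores each residue of the first sequence using only the other sequences. The scoring rule (frequency of the most common residue among the remaining sequences, with gaps counted against conservation) is my assumption and not necessarily what megalign_project_conservation does; the function name projected_conservation and the toy data are also just for the example.

```python
from collections import Counter

GAP_CHARS = ".-~"  # characters treated as gaps (assumption)

def projected_conservation(seqs):
    """Conservation among seqs[1:], reported per residue of seqs[0]."""
    ref, others = seqs[0], seqs[1:]
    scores = []
    for col, ref_res in enumerate(ref):
        if ref_res in GAP_CHARS:
            continue  # no residue in the known structure to paint
        residues = [s[col] for s in others if s[col] not in GAP_CHARS]
        if not residues:
            scores.append(0.0)  # all other sequences have a gap here
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(others))
    return scores

# Toy data: first ten columns of cul1human, then cul2human, cul3human, cul5human
aligned = [
    "MSSAA..TRS",
    "MSL....KP.",
    "MSNLSKGTGS",
    "MAT.......",
]
scores = projected_conservation(aligned)
for number, score in enumerate(scores, start=1):
    print(number, round(score, 2))
```

Note that the first sequence only decides which columns are reported; its own residues never enter the score.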

You can get the source code from Phil if you want to hack your own versions.

Phil Jeffrey, Sept 2002.