The latest bleeding-edge version of Annotate is found in xray0/bin/annotate. Unlike the rest of the Ogre suite, Annotate may not be copied or redistributed without permission from Nikola: his algorithm for structure propensity is not in the public domain.
Annotate stores data in objects. Objects have no mystical property; they are simply the containers Annotate keeps its data in. Each object has an automatically assigned name that distinguishes it from other objects. Annotate can hold multiple objects, but at present I recommend working with one at a time. A new object is created each time you invoke the read command, so each sequence alignment lives in a separate object.
Each time derived data (DSSP, PHD output, etc.) is added to a sequence object, it is stored in a property. The property name is supposed to reflect what the data is, but the creativity of the programmer is always limited. Properties are associated with each sequence within an object, not just with the object itself, so you can load and store PhD runs for several different sequences within one object. If you load multiple PhD runs for the same sequence, multiple properties will be created, probably with the same name, and mayhem may result.
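To make the object/property layout concrete, here is a minimal Python sketch. It is not Annotate's implementation. The class and function names (`AnnotateObject`, `Sequence`, `read_alignment`) and the `obj1`, `obj2`, ... naming scheme are hypothetical. Each sequence stores its properties as a list of (name, values) pairs, which shows how loading two PhD runs for one sequence leaves two properties under the same name.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical model of Annotate's data layout; not the program's real code.
_object_ids = count(1)


@dataclass
class Sequence:
    name: str
    residues: str
    # Properties are kept in a list, not a dict, so duplicate names can coexist
    # (the "mayhem" case of loading two PhD runs for the same sequence).
    properties: list = field(default_factory=list)

    def add_property(self, prop_name, values):
        self.properties.append((prop_name, values))

    def find(self, prop_name):
        """Return the values of every property stored under prop_name."""
        return [values for name, values in self.properties if name == prop_name]


@dataclass
class AnnotateObject:
    # Each read creates a new object with an automatically assigned name.
    name: str = field(default_factory=lambda: f"obj{next(_object_ids)}")
    sequences: dict = field(default_factory=dict)


def read_alignment(records):
    """Create a new object from (name, aligned_sequence) pairs."""
    obj = AnnotateObject()
    for seq_name, residues in records:
        obj.sequences[seq_name] = Sequence(seq_name, residues)
    return obj


if __name__ == "__main__":
    obj = read_alignment([("cul1", "MSS-PT"), ("cul2", "MAS-PS")])
    seq = obj.sequences["cul1"]
    seq.add_property(f"{obj.name}.phd", "HHH-LL")
    seq.add_property(f"{obj.name}.phd", "HHE-LL")  # a second PhD run, same name
    print(len(seq.find(f"{obj.name}.phd")))  # 2: which one is "the" prediction?
```

Lookups by name cannot tell the two runs apart, which is why loading one PhD run per sequence is the safe practice.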
Annotate does not perform sequence alignment, so there is no way to build or update a multiple sequence alignment within the program; use CLUSTALX or DNASTAR for that. This was a deliberate design decision: multiple sequence alignment is a hard problem and remains a major research topic in bioinformatics labs. Unless I wake up one morning with a huge insight into how to implement it, this lack of a feature will remain a feature of the program.
When no data has been loaded, Annotate offers only these commands:

Command | Description |
---|---|
read | Read a sequence/alignment |
exit | Exit the program |
quit | Exit the program |
? | Show help info |
help | Show help info |

No other commands are shown because they are unavailable: no sequences have been read into the program yet, so there is no data for them to act on. Once you have read data with the read command, the extended list is:

Command | Description |
---|---|
average | Average sequence properties |
conservation | Calculate sequence conservation |
dssp | Annotate from DSSP results |
dump | List ALL object data |
excel | EXCEL-style data dump |
kill | Delete current object |
list | List object sequence data |
npp | Calculate structure propensity |
omac | Write Grasp and O macros |
pdj | Phil's tweaked structure propensity |
phd | Annotate from PHD results |
plot | Write data to plot file (portrait) |
lplot | Write data to plot file (landscape) |
read | Read a sequence/alignment |
start | Define sequence start number |
write | Write a sequence (alignment) |
end | Exit the program |
exit | (ditto) |
quit | (ditto) |
? | Show help info |


Command | Description |
---|---|
read filename | Reads sequence data into a new object. Supported formats: MSF/PILEUP, FASTA, BLAST, PHD. The PHD format extracts the MAXHOM alignment from PHD output but does not append the PHD and PROF data. FASTA is designed for single sequences; it will read multiple sequences from one file, but it cannot align them. BLAST attempts to extract a multiple sequence alignment from BLAST web output; it cannot handle HTML tags. |
start number | Defines the sequence start number. Multiple sequence alignments often start at 1 by default. Annotate tries to extract the real start number anyway, but you can always redefine the start point if you know better. |
kill | Deletes an object and all its contents. It is not necessary to do this before exiting the program, so it's not obvious why you would want to do this ;) |

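Redefining the start number only changes how alignment columns map onto residue numbers. The sketch below assumes gaps are written as `-` or `.` and that the count advances only on non-gap residues. `residue_numbers` is a hypothetical helper, not part of Annotate.

```python
def residue_numbers(aligned_seq, start=1, gap_chars="-."):
    """Map each alignment column to a residue number (None for gap columns).

    `start` is the number given to the first non-gap residue; redefining the
    start number amounts to changing this value.
    """
    numbers = []
    current = start
    for ch in aligned_seq:
        if ch in gap_chars:
            numbers.append(None)
        else:
            numbers.append(current)
            current += 1
    return numbers


if __name__ == "__main__":
    # Alignment fragment whose first residue is really residue 42 of the protein.
    print(residue_numbers("-MK.LV", start=42))  # [None, 42, 43, None, 44, 45]
```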
Reading derived data into the program

Command | Description |
---|---|
phd filename | Reads PhD and PROF data from a PHD run. Attempts to deal with the vagaries of the format, but currently does not handle HTML-format output correctly; I am working on this feature. Creates properties called obj.phd_aa (sequence), obj.phd (secondary structure prediction), obj.phd_rel (reliability of prediction), and obj.phd_prH, obj.phd_prE, obj.phd_prL (probabilities of helix, strand and loop). The "obj" part is replaced by the name of your sequence object. It now also extracts PROF data from the same file and stores obj.prof (secondary structure prediction) and obj.prof_rel (reliability of prediction). The phd_aa property will probably vanish; it was created for debugging purposes. |
dssp filename | Extracts the deduced secondary structure from DSSP output and stores it as obj.dssp_ss. Also stores the sequence from the file as obj.dssp_aa, but this is a debugging feature that will ultimately disappear. |

Calculating derived data within the program

Command | Description |
---|---|
npp exponent | Nikola's structure propensity plot. You need to supply the exponent value for the smoothing (the default is 1.2). Creates obj.sp_smooth for the smoothed data and obj.sp_smoothexp for the exponentially smoothed data. The version in this program corresponds to version 3.6 of Nikola's stand-alone program. |
pdj exponent | Phil's tweaked structure propensity plot. Creates obj.pdj_smooth for the smoothed data and obj.pdj_smoothexp for the exponentially smoothed data. This is my somewhat tweaked version of Nikola's algorithm and is very much a development version. The parameters are currently much the same as Nikola's, so the results do not differ much from those of npp. |
average [npp\|phd\|prof] sequence | Average sequence properties. Associates averaged values with the reference sequence. You can average structure propensity (npp), PhD (phd) and PROF (prof) properties. Insertions are handled correctly. All applicable properties in all sequences in the object are averaged. For PhD and PROF averaging, the averaged secondary structure is the one with the greatest summed probability across the sequences. |
conservation sequence | Calculate sequence conservation. There are two styles: with respect to the whole object, or with respect to a reference sequence. Produces obj.conservation (numerical % conservation across all sequences), obj.ref_cons (numerical % conservation with respect to the reference sequence), obj.consensus (most populous amino acid at each position), and obj.identity (ASCII representation of 50, 75 and 100% identity as ".", ":" and "*"). |

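The averaging rule for PhD/PROF data can be sketched in a few lines. The rule picks the state with the greatest summed probability and reports it at reference-sequence positions. Several details are assumptions rather than Annotate's documented behavior. Each sequence carries per-column H/E/L probabilities aligned to the alignment columns. Gap columns contribute nothing. Columns where the reference has a gap (insertions relative to the reference) are dropped from the output. The function name is hypothetical.

```python
GAP = "-"
STATES = ("H", "E", "L")


def average_secondary_structure(alignment, probs, reference):
    """Average PhD-style predictions onto the reference sequence.

    alignment: dict name -> aligned sequence (all the same length)
    probs:     dict name -> list of per-column {"H": p, "E": p, "L": p} dicts,
               with None at gap columns
    reference: name of the reference sequence
    Returns a list of (residue, state) pairs, one per reference residue.
    """
    ref_seq = alignment[reference]
    result = []
    for col, ref_res in enumerate(ref_seq):
        if ref_res == GAP:
            # Insertion relative to the reference: no reference residue to attach it to.
            continue
        totals = dict.fromkeys(STATES, 0.0)
        for name, seq in alignment.items():
            p = probs[name][col]
            if seq[col] == GAP or p is None:
                continue
            for state in STATES:
                totals[state] += p[state]
        # The averaged state is the one with the greatest summed probability.
        best = max(STATES, key=lambda s: totals[s])
        result.append((ref_res, best))
    return result
```

In this sketch a tie between states goes to whichever comes first in `STATES`. That is an arbitrary choice, and Annotate may break ties differently.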
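The four conservation outputs can be illustrated with a short sketch. The exact formulas are assumptions, not taken from Annotate's source. Conservation is the percentage of sequences sharing the most common residue in a column. Reference conservation is the percentage of sequences matching the reference residue, and gaps never count as matches. The identity string uses `*` at 100%, `:` at 75% or more, `.` at 50% or more, and a blank otherwise. The function name `conservation` is hypothetical.

```python
from collections import Counter

GAP = "-"


def conservation(alignment, reference):
    """Per-column conservation statistics for an alignment.

    alignment: dict name -> aligned sequence (equal lengths)
    reference: name of the reference sequence
    Returns (conservation %, ref_cons %, consensus string, identity string).
    """
    seqs = list(alignment.values())
    ref = alignment[reference]
    n = len(seqs)
    cons, ref_cons, consensus, identity = [], [], [], []
    for col in range(len(ref)):
        column = [s[col] for s in seqs]
        residues = Counter(r for r in column if r != GAP)
        if residues:
            top_res, top_count = residues.most_common(1)[0]
        else:
            top_res, top_count = GAP, 0  # all-gap column
        pct = 100.0 * top_count / n
        cons.append(pct)
        consensus.append(top_res)
        # Conservation relative to the reference residue in this column.
        if ref[col] == GAP:
            ref_cons.append(0.0)
        else:
            ref_cons.append(100.0 * column.count(ref[col]) / n)
        # Identity symbols at the 100, 75 and 50% thresholds.
        if pct == 100.0:
            identity.append("*")
        elif pct >= 75.0:
            identity.append(":")
        elif pct >= 50.0:
            identity.append(".")
        else:
            identity.append(" ")
    return cons, ref_cons, "".join(consensus), "".join(identity)
```

The description above does not say whether gapped sequences count in the denominator. This sketch divides by the total number of sequences, so a column containing any gap can never reach 100%.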
Output

Command | Description |
---|---|
excel sequence | Writes an EXCEL-style data dump: comma-delimited, with sequence numbers relative to a reference sequence. Mostly tweaked to Nikola's preferences, since he was the one who requested the feature. |
plot | Writes data to a PostScript plot file in portrait orientation. I strongly recommend using landscape (lplot) instead. Creates test.ps. |
lplot | Writes data to a PostScript plot file in landscape orientation. Sequence data and character data are written as characters, while numerical data are drawn as simple graphs. Highlights identical sequences but otherwise does no additional post-processing of the data. Creates test.ps. |
omac | Writes Grasp and O macros to color your molecule based on sequence conservation. Conservation cutoffs are prompted for. Could be more user-friendly. |
list | Lists object sequence data: just the sequence alignment. Good for sanity checking. |
dump | Crude data dump of all sequences and properties. Mainly for debugging, but the output could also be processed by a program of your own. |

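An EXCEL-style comma-delimited dump keyed to reference-sequence numbering might look like the sketch below. Several details are hypothetical: the function name `excel_dump`, the column layout (residue number, reference residue, then one column per property) and the header names. Annotate's actual layout follows Nikola's preferences and is not described here. Columns where the reference has a gap are skipped, because they have no reference residue number.

```python
import csv
import io


def excel_dump(ref_seq, properties, start=1, gap="-"):
    """Return CSV text with one row per reference residue.

    ref_seq:    aligned reference sequence
    properties: dict property name -> per-column values (same length as ref_seq)
    start:      residue number of the first non-gap reference residue
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    names = list(properties)
    writer.writerow(["resnum", "residue"] + names)
    number = start
    for col, res in enumerate(ref_seq):
        if res == gap:
            continue  # column not present in the reference: no number to report
        writer.writerow([number, res] + [properties[name][col] for name in names])
        number += 1
    return out.getvalue()


if __name__ == "__main__":
    print(excel_dump("MK-L", {"obj1.conservation": [100.0, 75.0, 50.0, 25.0]}, start=10))
```

Using the standard `csv` module rather than joining strings by hand keeps quoting correct if a property value ever contains a comma.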
Read a multiple sequence alignment and list it to the screen:

    xray0/bin/annotate << EOF
    read tests/cullins.gcgpileup
    pileup
    list
    quit
    EOF

Read a multiple sequence alignment and calculate structure propensity:

    xray0/bin/annotate << EOF
    read tests/cullins.gcgpileup
    pileup
    list
    npp 1.2
    dump
    quit
    EOF

Add some PhD/PROF output to the above:

    xray0/bin/annotate << EOF
    read tests/cullins.gcgpileup
    pileup
    list
    npp 1.2
    phd tests/cullins.phd
    dump
    quit
    EOF

Finally, do something useful and plot it out:

    xray0/bin/annotate << EOF
    read tests/cullins.gcgpileup
    pileup
    list
    npp 1.2
    phd tests/cullins.phd
    lplot
    quit
    EOF

Read a MAXHOM alignment from PhD output, add the PhD/PROF predictions, add structure propensity and plot it:

    xray0/bin/annotate << EOF
    read tests/mouse_pred.txt
    phd
    phd tests/mouse_pred.txt
    list
    npp 1.2
    lplot
    quit
    EOF