A Manual of sorts for Peek2

This program is about a decade old. I figured it was about time I wrote some documentation for it.

The latest version of Peek2 is always found in xray0/bin/peek2. This is the version I use, so any obvious bugs are fixed. There is now (finally) a version of Peek2 for Linux, now that I have found a replacement for qsort(), but it lacks the finer aspects of formatting. Nevertheless it should actually work.

Peek2 grew out of a very old and primitive program I wrote as a graduate student to manipulate molecular replacement solutions. It's ballooned a little in recent years (aka creeping featurism) to accommodate numerous requests for "features" within the program, to the extent that I cannot always remember what's in there. Peek2 is short, simple and brutal. It is not elegant, but it is designed to do a small number of things relatively well. Programs like 6D_LSQMAN do some of the things Peek2 does in a more elaborate manner.

PEEK2 IS NOT PUBLIC DOMAIN in either source code or executable form. I give out the executable for the SGI freely, but the source code is not public domain and you may not make any copies of it. The successor, peek3, in development as of writing, is completely public domain.

Peek2 has a simple parser. The command line is split into words separated by spaces or other "white space" and Peek2 acts on one word at a time. The exceptions are things like segment IDs or chain labels, where Peek2 takes just the first word or first character and discards the rest. Generally, however, you can enter commands and parameters in one continuous line. Just read the output carefully to see if it worked.

Most commands can be abbreviated to 3 or 4 characters. Peek2 will warn you if what you type is ambiguous.
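
To make the word splitting and abbreviation rules concrete, here is a rough Python sketch. It is not Peek2's source: the names resolve and parse are made up for illustration, the command list is only a subset of the help listing, and the idea that only the first four characters are significant is an assumption (it is at least consistent with "center" being accepted for CENTRE).

```python
# Hypothetical sketch, not Peek2's actual parser.
COMMANDS = ["COORDSET", "CURRENT", "CENTRE", "CHAIN", "CA", "CSPLIT",
            "READ", "REMOVE", "RENUMBER", "REVERSE", "ROTATE"]

def resolve(word, commands=COMMANDS, significant=4):
    """Map a typed word onto a command, or raise ValueError.

    Assumption: only the first `significant` characters are compared.
    """
    key = word.upper()[:significant]
    if key in commands:                      # exact match wins (e.g. "CA")
        return key
    hits = [c for c in commands if c.startswith(key)]
    if len(hits) == 1:
        return hits[0]
    if not hits:
        raise ValueError("unknown command: " + word)
    raise ValueError("ambiguous: " + word + " matches " + ", ".join(hits))

def parse(line):
    """Split a line on white space: first word is the command, the rest are parameters."""
    words = line.split()
    if not words:
        return None, []
    return resolve(words[0]), words[1:]
```

With this sketch, parse("coord 2") gives the command COORDSET with the single parameter "2", while "re" on its own is rejected as ambiguous because it matches READ, REMOVE, RENUMBER and REVERSE.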

Peek2 maintains two coordinate "sets" simultaneously in memory. You'll really only ever need the second one if you want to do structural alignments. You can switch between them using the COORDSET command ("coord 1" or "coord 2"). The first coordinate set (coord 1) is the default. ALL OPERATIONS APPLY TO THE CURRENT COORDINATE SET. The command CURRENT shows you which coordinate set you are using.

You can READ and WRITE PDB files. Peek2 strips off all the non-ATOM and HETATM lines, which is not always its most endearing feature. It doesn't change chain labels or residue numbers on read/write but it does recalculate the line number in the PDB file. If you try to read a file into a coordinate set that already contains coordinates, Peek2 will ask whether you want to overwrite the existing data or append to it. The default is to overwrite the existing data with the new data.
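
The effect of a READ followed by a WRITE can be sketched in a few lines of Python. This is an illustration rather than Peek2's code: the function name is hypothetical, and it assumes that "line number" means the atom serial number in columns 7 to 11 of the standard fixed-column PDB record.

```python
# Hypothetical sketch of what survives a READ/WRITE round trip.
def filter_and_renumber(lines):
    """Keep only ATOM/HETATM records and rewrite the serial number (columns 7-11).

    Chain labels and residue numbers are left untouched.
    """
    kept = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            serial = len(kept) + 1
            kept.append(line[:6] + "%5d" % serial + line[11:])
    return kept
```

Anything else in the file (REMARK, CRYST1, TER and so on) is simply dropped, which is the behaviour described above.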

The commands EXIT, BYE, STOP and QUIT all terminate the program. Hitting control-C (^C) also works, with no side effects.

There are some simple commands designed to be useful for molecular replacement. ROTATE rotates the molecule around the origin, using one of several input formats. ROTATE can also parse O-style datablocks, in which case the entire transformation is applied - rotation and translation. TRANSLATE translates the coordinates in the Cartesian reference frame - there is no facility for fractional or axial translations. CENTRE moves the centre-of-gravity of the molecule to the origin (and also reports the previous C-of-G). It applies equal weight to all atoms - there's no weighting by atomic number. Since the first four characters are unique you Americans can type "center" without it causing a problem. CA strips the coordinates to just the Calpha atoms. ALAGLY turns all non-GLY residues into ALA, and GLY remains as GLY. SPHERE deletes atoms based on a radius from the centre of gravity. SIZE gives you some information on the extent of the coordinate data.
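
For the record, the equal-weight centring that CENTRE does amounts to the following. This is a minimal Python sketch with a made-up function name, assuming coordinates are held as (x, y, z) tuples; it is not lifted from Peek2.

```python
# Hypothetical sketch of an unweighted centre-of-gravity shift.
def centre(coords):
    """Move the unweighted mean of `coords` to the origin.

    Returns (shifted_coords, previous_centre). Every atom counts equally.
    """
    n = len(coords)
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    shifted = [(x - cx, y - cy, z - cz) for x, y, z in coords]
    return shifted, (cx, cy, cz)
```

Because there is no mass weighting, a hydrogen and a sulphur pull on the centre equally; for protein-sized molecules the difference is rarely worth worrying about.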

The two commands to select ranges within the coordinate data are INCLUDE and DELETE. Peek2 allows multiple ranges of coordinates to be selected: just specify each one on a new command line and terminate the list with a blank line. Range specifications can (must) include the chain designator, and the start and terminus of each range must have the same chain label. The specification "A100" means "residue 100 in chain A". "100" means "residue 100 with blank chain ID", not "residue 100 with ANY chain ID".
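
The range rules are easier to see in code. The following Python sketch is illustrative only: parse_spec and in_range are hypothetical names, and treating a leading letter as the chain label is my shorthand for the rule above, not a description of how Peek2 actually reads the specification.

```python
# Hypothetical sketch of the "A100" / "100" range convention.
def parse_spec(spec):
    """'A100' -> ('A', 100); '100' -> (' ', 100), i.e. a blank chain ID."""
    if spec[0].isalpha():
        return spec[0], int(spec[1:])
    return " ", int(spec)

def in_range(chain, resnum, start, end):
    """True if residue (chain, resnum) lies inside the range start..end."""
    s_chain, s_num = parse_spec(start)
    e_chain, e_num = parse_spec(end)
    if s_chain != e_chain:
        raise ValueError("start and end of a range must share a chain label")
    return chain == s_chain and s_num <= resnum <= e_num
```

Note that in_range("A", 150, "100", "200") is False: a specification without a chain letter only ever matches residues whose chain ID is blank.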

The CHAIN command manipulates chain labels (CNS removes them during refinement). The SEGID command manipulates segment IDs (columns 73 to 76 in the PDB file), an X-PLOR and CNS-specific concept that overlaps partially with the old PDB standard. Similarly, DNAPDB and DNACNS flip between the PDB and CNS conventions for DNA/RNA. REMOVEH removes the hydrogens that X-PLOR puts on a coordinate file (although CNS doesn't do this by default any more). SPLIT splits a coordinate set into separate files by segment ID - the filenames are derived from the segment ID itself. This used to be frequently used prior to the generate procedure for X-PLOR. It is also not so useful for CNS, which is smart enough to read them all in one file. CSPLIT does the same thing but on chain labels.

SEGID and CHAIN reprompt at what Peek2 considers chain breaks. Chain breaks are determined empirically - if the segment ID or chain label changes, that's a chain break. Peek2 also reprompts if the distance between successive Calpha atoms is too large. If you've done a bit of rebuilding and haven't been very careful about the main-chain (e.g. with lego_ca) then Peek2 will probably find more chain breaks than you think exist. ADDTERM adds X-PLOR/CNS style C-termini to any breaks that Peek2 finds. Use with caution.
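
A chain-break test along these lines might look like the Python sketch below. It is only a sketch: the function name is hypothetical, and the 4.5 Angstrom cutoff is an assumed value chosen because adjacent Calpha atoms normally sit about 3.8 Angstrom apart. The cutoff Peek2 really uses is not stated here.

```python
import math

# Hypothetical sketch; the cutoff is an assumed value, not Peek2's.
def chain_breaks(calphas, max_dist=4.5):
    """Return the indices at which a new chain starts.

    `calphas` is a list of (chain_label, segid, (x, y, z)) in file order.
    A break is flagged when either label changes or when successive
    Calpha atoms are further apart than `max_dist`.
    """
    breaks = []
    for i in range(1, len(calphas)):
        prev, cur = calphas[i - 1], calphas[i]
        label_changed = prev[0] != cur[0] or prev[1] != cur[1]
        too_far = math.dist(prev[2], cur[2]) > max_dist
        if label_changed or too_far:
            breaks.append(i)
    return breaks
```

This also shows why sloppy rebuilding produces spurious breaks: a single stretched peptide is enough to push one Calpha-Calpha distance over the cutoff.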

RENUMBER renumbers the PDB file, reprompting at chain breaks. By default the order is sequential. REVERSE will reverse the number order of selected ranges of amino acids, including resorting the file, but you still have to fix the backbone via lego_ca in O. (The equivalent command in peek3 uses my DGNL library to fix this problem.)

BFACTOR shows B-factor statistics for the entire coordinate set including highest and lowest B-factors. MEANB shows residue-by-residue breakdowns of this data. BADD allows you to add (or subtract) an overall B-factor value, and BSET allows you to change all the B-factors to the specified value.
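
The arithmetic behind these commands is nothing special. Here is a Python sketch of the sort of thing involved, with hypothetical function names, and with the assumption that the per-residue figure is a plain mean over the atoms of each residue; what Peek2 prints in detail may differ.

```python
# Hypothetical sketch of B-factor bookkeeping; atoms are (chain, resnum, b).
def bfactor_stats(atoms):
    """Return (lowest, highest, mean) B-factor over all atoms."""
    bs = [b for _, _, b in atoms]
    return min(bs), max(bs), sum(bs) / len(bs)

def mean_b_by_residue(atoms):
    """Return {(chain, resnum): mean B-factor} in order of first appearance."""
    grouped = {}
    for chain, resnum, b in atoms:
        grouped.setdefault((chain, resnum), []).append(b)
    return {key: sum(v) / len(v) for key, v in grouped.items()}

def badd(atoms, delta):
    """Add `delta` (which may be negative) to every B-factor."""
    return [(chain, resnum, b + delta) for chain, resnum, b in atoms]
```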

Peek2 has a couple of sanity-check commands. OCCUPANCY lists all non-unitary occupancies in the PDB file, and DUPLICATE looks for duplicate atom entries that usually screw up CNS. LISTRESIDUE just lists all the residues in the set. SEQUENCE shows the sequence in triple letter code, whereas SINGLE does the same thing in single letter code (resembles FASTA format).

ALIGN is a simple, fast implementation of the Kabsch alignment routine. ALIGN requires the residue numbers and chain labels to be the same in both coordinate sets. This is naive and stupid of it, but a lot of issues with that can be fixed using the CHAIN and RENUMBER commands. For this option you need two coordinate sets, one of which you designate "fixed"; the other one ("moving") is superimposed onto it. Peek2 lists all sorts of arcane information in the output about the transformation, including an O datablock that can be used with the RAVE suite, and an ODL block that can draw the axis via the O "draw" command. The newest versions of Peek2 silently write the file ".o_ncs" which contains the O datablock suitable for use in (R)AVE. The command DEVIATION shows the deviations between Calpha atoms of aligned coordinate files.
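
The matching rule (same chain label, same residue number) and the per-residue deviations can be sketched as follows. This Python fragment is an illustration with a hypothetical function name; it does not do the Kabsch superposition itself, it only measures distances between Calpha atoms that are already superimposed, and reporting an overall RMSD alongside them is my assumption.

```python
import math

# Hypothetical sketch: deviations between already-aligned Calpha sets.
def ca_deviations(fixed, moving):
    """Pair Calpha atoms by (chain, resnum) and measure their separation.

    `fixed` and `moving` map (chain, resnum) -> (x, y, z).
    Returns ({key: distance}, rmsd) over residues present in both sets.
    """
    common = sorted(set(fixed) & set(moving))
    devs = {key: math.dist(fixed[key], moving[key]) for key in common}
    if not devs:
        return devs, 0.0
    rmsd = math.sqrt(sum(d * d for d in devs.values()) / len(devs))
    return devs, rmsd
```

Residues that exist in only one set are silently skipped, which is why mismatched numbering between the two sets gives you a much shorter list than you expected.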

Finally, the ? command provides a simple help listing designed to jog your memory of the options available in Peek2. Here it is:

COORDSET    - change coordinate set (1 or 2)
CURRENT     - show current coord set (1 or 2)
  [all actions apply to the *current* coord set]
READ        - read a PDB file
WRITE       - write a PDB file
SPLIT       - write PDB file(s) based on SEGID
CSPLIT      - write PDB file(s) based on CHAIN
?           - this text
EXIT, BYE   - various ways to exit
STOP, QUIT
ROTATE      - rotate a coordinate set
TRANSLATE   - translate a coordinate set
CENTRE      - centre a coordinate set at (0,0,0)
REMOVE      - remove hydrogens (X-PLOR style)
DELETE      - delete coordinate ranges
INCLUDE     - include only specified ranges
CA          - reduce set to just Calpha
ALAGLY      - reduce set to Ala/Gly
SPHERE      - delete atoms based on radius
CHAIN       - change chain labels
SEGID       - change segment labels (X-PLOR,CNS)
RENUMBER    - renumber a coordinate set
ALIGN       - simple alignment method
DEVIATION   - show deviations by residue
BFACTOR     - show certain B-factor stats
MEANB       - per-residue B-factor stats
BADD        - add/subtract B-factors
BSET        - set B-factor to single value
OCCUPANCY   - show non-unit occupancies
LISTRESIDU  - list residues
SEQUENCE    - show sequence in triple code
SINGLE      - show sequence in single code
ADDTERM     - add X-PLOR-style C-termini
DNAPDB      - switch to PDB DNA convention
DNACNS      - switch to CNS DNA convention
REVERSE     - reverse order of amino acids
SIZE        - show coordinate dimensions
DUPLICATE   - look for duplicate atom entries

[there are undocumented commands best avoided]

Enjoy.
Phil Jeffrey, June 2002 - modified Jan 2006.