Originally created May 10th 2002
Revised Sept 16th 2002 to include MAD examples, another spot profile
Sundry small revisions Feb 2005, more in Nov 2006
Introduction
This is not meant to teach you anything about how to collect data on home or synchrotron sources. This is intended to teach you how to process the data once you have collected it. Collecting 180 degrees of data "just in case" is not a substitute for using your brain and experience to figure out how to collect it better.

It is a very good idea to process your data while you are collecting it. Frequently you can catch errors in data collection strategy or machine problems by doing this. If you wait until the end of the data collection you have no recourse - you are stuck with what you have, because your crystal has long since died from radiation damage.
There is an alternative data processing tutorial accessible on the Web. Or at least there was.
Where are my frames?
At the synchrotron: it's best to run a few test frames before you start and see where the beamline software is writing them. Beamline policies vary. At NSLS X25 and X29 the latest "upgrades" to CBASS do something asinine like write successive test images collected at 0 and 90 degrees into different directories. (Heaven knows what the software developer was thinking.) However, contiguous data collection wedges are usually written into one directory. This might not have the numbering you expect it to, so the best thing is to watch the status window and look for something like /img11/data1/pxuser/myname/ ...

At Princeton: that's a good question. They're usually wherever you put them when you FTP'd them off the host machine. If there's space, put them somewhere on Xray8 temporarily, say in /usr/people5/data/public, then back them up and remove them once you have processed the data. Xray8 is currently the fastest box that can do data processing, and it's a lot faster if the images are on the local disk because I/O over the network is s-l-o-w. With the upcoming data collection machine upgrade, the images will be accessible on the data collection computer itself.
I have recently made some disk space available in the directory /usr/people5/data/public for the purposes of storing frames for data processing. Here's what you do to copy your frames:
cd /usr/people5/data/public
find /collect -mtime -1 -type f -size +3000
cp /wherever/you/found/them/*.osc .

The first line should be pretty obvious. The second line gives you a list of all recently created (-1 = within the last day) large (+3000 = more than 3K blocks) files on /collect; your frames should be amongst those. The last line simply copies the data from the desired location reported by find into the current directory (i.e. /usr/people5/data/public).
Auto-Indexing Your Data
What follows is specific to the HKL suite of programs, namely xdisp, denzo and scalepack. There are alternative data processing programs (e.g. MOSFLM, XDS) but we do not use them, or at least we're too lazy to try most of the time. One would expect very similar results from HKL and MOSFLM (or we're all in for a lot of data re-processing). Some people have suggested that MOSFLM can give slightly more complete data in difficult situations - it's certainly worth a try. Some people have suggested that DENZO and SCALEPACK are the worst of these three options, but the jury is still out on that one and the difference is generally not large even if it exists. HKL has been repackaged with a graphical user interface (GUI) as HKL2000. Being a Luddite I react to HKL2000 with something akin to loathing, since it has some really ill-advised restrictions on how many times you can attempt to index your data that make it about 10x less convenient to process troublesome data than just hacking the command files. There are also now meta-processing programs like autoPROC and Xia2 that drive other data processing programs (usually XDS or MOSFLM). XDS and MOSFLM are better choices than HKL if you don't already have HKL.

The pattern of spots on the detector depends on basic detector geometry (how big it is, where it is with respect to your crystal), the X-ray wavelength, the direct beam position, the unit cell dimensions and the orientation of the crystal. Most of these things are known beforehand to some approximation, so the data processing software has to find a total of nine parameters (6 unit cell parameters, 3 crystal "missetting" angles) with which to best describe the diffraction pattern. A method called "auto-indexing" finds these parameters. The missetting angles tell the programs what angle your crystal is sitting at relative to some standard position. Often you know your cell dimensions already, which turns out to be a good sanity check for the results.
I'm assuming here that you know that a "spot" on the detector is nothing more than the detector intercepting a beam of scattered X-rays that is locally intense because of X-ray diffraction and the underlying scattering by the crystal. The "spot" in real space is related to a point (region) in diffraction space that is formed from the reciprocal space unit cell. Once the crystal is rotated such that the reciprocal lattice point is in diffraction condition, where the spot occurs is a matter of simple geometry. Since the relationship between the real space unit cell and reciprocal space unit cell is fixed, it's also a matter of simple geometry (and the Ewald sphere construction) to work out what spots are in diffraction condition.
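To make the "simple geometry" concrete: for the simple case of a flat detector normal to the beam (no 2theta swing) at crystal-to-detector distance D, a reflection with Bragg spacing d diffracts through an angle 2θ and lands at a radius r from the direct beam position, so

$$ \lambda = 2d\sin\theta, \qquad r = D\tan 2\theta \;\;\Longrightarrow\;\; d = \frac{\lambda}{2\sin\!\left(\tfrac{1}{2}\arctan(r/D)\right)} $$

This is also why XDISP can only quote sensible resolution numbers once it has been told the distance, wavelength and beam position (see below).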
The first step in figuring out what your unit cell and crystal missetting angles are is to pick spots to tell Denzo where the diffraction maxima are on the frame. Launch XDISP to view your frame with the following syntax:
Detector | Command |
---|---|
Raxis IIc | xdisp raxis name_of_file |
Raxis IV (e.g. home) | xdisp raxis4 100 name_of_file |
Quantum 4 CCD (e.g. CHESS) | xdisp ccd adsc unsupported-q4 name_of_file |
MAR 165 CCD (e.g. the old X9A) | xdisp ccd unsupported-m165 name_of_file |
ADSC Q315 (e.g. X29) | xdisp ccd unsupported-q315 binned name_of_file |
XDISP is /programs2/hkl/HKLsgi_1.97.2/xdisp on the SGIs, and /programs2/denzo/xdisp on Helium. If you've done source /labcshrc in your .cshrc, the command sethkl7 will create the appropriate aliases.
The screen on XDISP looks like this:

This is a synchrotron frame from NSLS X9A - the white horizontal band through the middle is the beamstop holder shadow. The white central disk is the beamstop shadow - synchrotrons typically have horizontal rotation axes and therefore horizontal beamstops. Atypically for a CCD detector the Mar-165 detector is round (they are usually square). At home the beamstop would be oriented vertically (vertical rotation axis) and the detector is square.
There are various options on this screen, not all of which are important here. You may prefer the color representation (color button) to the black-and-white representation.

Adjust the view of the frame to your taste using the "Dim" and "Bright" buttons. You can take a closer look at areas of the image using the zoom window and you select areas to be looked at using the middle mouse button.

Middle mouse works in both the main window and the zoom window. "Zoom In" and "Zoom Out" are obvious. If you zoom in far enough the actual numeric pixel values are displayed. For the purposes of auto-indexing, you want to press the "Peak Search" button and select "More Peaks", "Fewer Peaks" or "OK" depending on how many it finds. Generally a few hundred peaks is sufficient to get auto indexing to work OK. Red circles surround the peaks selected. It's OK to have peaks that are not on diffraction spots - Denzo is good at sorting out the real peaks from the bogus ones.

Leave the XDISP window open, and start Denzo. You may want to put XDISP into the background (via ^Z, bg or the & syntax) to run Denzo in the same terminal window. Note that the resolution estimates in XDISP are wrong until you do auto-indexing, since XDISP knows nothing about the experimental setup until Denzo tells it something.
Denzo needs to know quite a lot about your experiment to get started. Below is an annotated "index.dat" file that can be read into Denzo. Comments in Denzo are enclosed within square brackets []. Everything else is a command. Start indexing by the following procedure ($ represents a Unix command line prompt here):
$ denzo @index.dat

The "@" syntax tells Denzo to read command data from the file. Denzo is /programs2/hkl/HKLsgi_1.97.2/denzo on the SGIs, /programs2/denzo/denzo on Helium, and something else altogether at X9A/X29/X25/CHESS.
Here are the contents of index.dat for an Raxis IV:
format raxis4                        [ Tell Denzo what type of detector you are using ]
                                     [ This also sets default parameters for detector and ]
                                     [ goniostat geometry etc ]
monochromator filter                 [ This is the type of X-ray optics i.e. mirrors ]

[ Relatively constant values for a given machine ]
wavelength 1.5418                    [ Wavelength of X-rays. 1.5418 is CuKalpha ]
Y scale 1.000                        [ The non-squareness of pixels, in this case square ]
film rotation 180.0                  [ Relative orientation of the detector ]

[ Detector values that may vary for a particular experiment ]
distance 204.5                       [ Distance from the crystal to the detector ]
X beam 145.2                         [ Where the direct beam would strike the detector ]
Y beam 149.4                         [ These values may change with distance, 2theta ]

[ Things that need to be optimized for each crystal ]
spot elliptical 0.60 0.60 0.0        [ The size/shape of the actual spot ]
background elliptical 0.70 0.70 0.0  [ Size/shape of guard region around spot ]
box print 2.1 2.1                    [ Size/shape of background box ]
overlap spot                         [ Reject spots that overlap ]
profile fitting radius 30.0          [ Parameter that affects how spot profiles ]
                                     [ are calculated ]

[ Mosaicity is a per-crystal parameter that describes spot width in the ]
[ oscillation direction - set a reasonable value here and change it based ]
[ on the results of Scalepack after processing the first 10 frames ]
mosaicity 0.4

resolution limits 20.0 3.2           [ Not critical, but set to reasonable value ]
space group P422                     [ If you know this enter it here, otherwise use P1 ]

[ Indexing parameter tweaks only necessary in difficult cases - consult ]
[ the Denzo manual ]
[ longest vector 200 ]
[ weak level 5 ]

[ Where the frames are, what they are called, how wide they are ]
oscillation range 1.0                [ 1.0 is typical ]
oscillation start 0.0                [ Arbitrary - usually best to leave this at zero ]
                                     [ Some people prefer to change it to the actual phi ]
[ Denzo inserts integer sector numbers into ### ]
raw data file '/data2/ccd/mskcc/amoac12/l1p_1_###.img'
film output file 'l1p_1_###.x'
TITLE 'my data'
sector 1                             [ This is the frame number - usually we index the ]
                                     [ first frame of the dataset - must correspond to ]
                                     [ the frame number you picked spots from ]
peak search file 'peaks.file'        [ Tells Denzo to do auto-indexing ]
write predictions
[ Some commands to control the formatting of output ]
print statistics
print zones
go                                   [ Start the indexing ]
Change things like the direct beam position, the crystal to detector distance, frame width and name etc and run the script in Denzo. For parameters you don't know anything about, leave them at sensible defaults, and then go read the HKL manual as to what they mean. This particular indexing file works for the RaxisIV and RaxisIV++ detectors on our home sources. Indexing files for other locations (e.g. synchrotrons) differ somewhat, especially in detector type and geometry. This script file is the one I used for 1.97 Denzo. For versions of Denzo with HKL2000 it probably makes more sense to find the corresponding command file and edit that, since parameters might change with major version changes.
Auto-indexing is most sensitive to the oscillation range, direct beam position, distance and wavelength. Make sure you have these parameters correct. Synchrotrons often have partially-transparent beamstops so you can often "see" the direct beam position. On a home source the direct beam position is usually listed on the machine. Synchrotrons also have notoriously incorrect detector distances, so it is wise not to put too much stock in them. Unless you know what your space group or point group for this crystal is, set the space group to P1, at least initially.
Sometimes Denzo refuses to index. It gives you a candidate list of reasons why this happens, but often it's just user error - you tried to index a different frame than the one you picked spots from, or you have bad parameters in the index.dat file (wrong distance, wrong wavelength etc). If you did everything right, then try picking more or fewer spots and running indexing again. Or change the mosaicity. Notice that during indexing the spots that Denzo is using turn green in the XDISP display. If there are virtually no green circles shown on the frame, that's a pretty good indication of trouble.
There is an alternative indexing server at http://adder.lbl.gov/labelit/ which you might try if you are desperate. However it's MOSFLM oriented so you'll have to convert between MOSFLM and HKL conventions (there's a denzo2mosflm program on the web, but not the inverse program so you're going to have to do some hacking).
The output from Denzo is as follows (just to be awkward I used a different indexing file from the example above):
Oscillation data processing
Title: Mar CCD data
Wavelength (A) 1.0000
Raster size (mm) 7.93480E-02
Raster size (mm) 7.93480E-02
Film width (mm) 162.50 (default)
Film length (mm) 162.50 (default)
Record length (pixels) 2048 (default)
Number of records 2048 (default)
Top limit of useful data 0.00000E+00 (default)
Left limit of useful data 0.00000E+00 (default)
spots rejected when pixel overflow at value : 64000.0
pixels rejected at value: 0
Oscillation starts at 0.00000E+00
Oscillation range 1.0000
Lattice type: primitive
Orientation axis 1 (vertical plane) 0*h 1*k 0*l
Orientation axis 2 (spindle) 0*h 0*k 1*l
Mosaicity 0.25000
CrysZ (beam) axis 0.00000E+00 (default)
CrysY (vertical) axis 0.00000E+00 (default)
CrysX (spindle) axis 0.00000E+00 (default)
unit cell parameters not entered
Detector (mis)orientation angles:
CassZ (beam) axis 0.00000E+00 (default)
CassY (vertical) axis 0.00000E+00 (default)
CassX (spindle) axis 0.00000E+00 (default)
Detector 2 theta 0.00000E+00 (default)
Detector rotation -90.000
Flat detector (default)
Detector to crystal distance 350.00
X beam 81.500
Y beam 82.400
Beam polarization 0.98000
Detector absorption 100.00 (default)
Air absorption length 3450.0
Crossfire y 0.00000E+00 (default)
Crossfire x 0.00000E+00 (default)
Crossfire xy 0.00000E+00 (default)
Horizontal box size 2.3804
Vertical box size 2.3804
Overlap type : none
Raw data file /xtreme4/data2/raxis/X9A/ctmp4/ctmp4_2_001.img
Error increase due to pixel overflow 1.58696E-02 (default)
Error in measurments of optical density 0.15000
Minimum positional error 5.00000E-02
Error increase when too close to X axis 0.20000 (default)
Error of partiality 0.10000
Systematic error factor 5.0000
Resolution in the corner 3.153 edge 4.334 half corner 3.605
Highest resolution 2.8000
Lowest resolution 30.000
Spot too weak for refinement when below sigma * 3.0000
Beam spot not used in refinement (default)
Profile fitting radius 20.000
peak search file peaks.file has 192 peaks
Vector lengths in autoindexing from 25.0 to 667 Angstroms
Volume of the primitive cell 1560777.
Lattice                  Metric tensor      Best cell (symmetrized)
                         distortion index   Best cell (without symmetry restrains)

primitive cubic            95.80%
      58.11  58.38  515.01  90.22  89.67  63.31
     210.50 210.50  210.50  90.00  90.00  90.00
I centred cubic           130.58%
     517.95  61.13  518.09  87.11 169.02  92.83
     365.72 365.72  365.72  90.00  90.00  90.00
F centred cubic           128.39%
      61.13  99.16 1031.28  89.92  92.87  90.30
     397.19 397.19  397.19  90.00  90.00  90.00
primitive rhombohedral      2.07%
     518.07 517.95  515.01   6.44   6.78   6.46
     517.01 517.01  517.01   6.56   6.56   6.56
      59.75  59.75 1547.65  90.00  90.00 120.00
primitive hexagonal         1.61%
      58.11  61.13  515.01  90.53  89.67 121.43
      59.62  59.62  515.01  90.00  90.00 120.00
primitive tetragonal       11.85%
      58.11  58.38  515.01  90.22  89.67  63.31
      58.24  58.24  515.01  90.00  90.00  90.00
I centred tetragonal       11.93%
      58.11  58.38 1031.28  88.43  91.44  63.31
      58.24  58.24 1031.28  90.00  90.00  90.00
primitive orthorhombic     11.85%
      58.11  58.38  515.01  90.22  89.67  63.31
      58.11  58.38  515.01  90.00  90.00  90.00
C centred orthorhombic      0.25%
      61.13  99.16  515.01  89.94  89.47  90.30
      61.13  99.16  515.01  90.00  90.00  90.00
I centred orthorhombic     11.93%
      58.11  58.38 1031.28  88.43  91.44  63.31
      58.11  58.38 1031.28  90.00  90.00  90.00
F centred orthorhombic      1.19%
      61.13  99.16 1031.28  89.92  92.87  90.30
      61.13  99.16 1031.28  90.00  90.00  90.00
primitive monoclinic        0.22%
      58.11 515.01   58.38  90.22 116.69  90.33
      58.11 515.01   58.38  90.00 116.69  90.00
C centred monoclinic        0.13%
      61.13  99.16  515.01  90.06  90.53  90.30
      61.13  99.16  515.01  90.00  90.53  90.00
primitive triclinic         0.00%
      58.11  58.38  515.01  90.22  90.33 116.69

autoindex unit cell      57.25  57.25 514.99  90.00  90.00 120.00
crystal rotx, roty, rotz    -2.261  53.683  -91.468
Autoindex Xbeam, Ybeam   81.63  82.60
position 179 chi**2 x 3.26 y 8.38  pred. decrease: 0.000 * 179 = 0.0
partiality 179 chi**2 4.16  pred. decrease: 0.000 * 179 = 0.0
I've highlighted the "interesting" parts in red. The top of the output just reiterates what you've put in with index.dat, plus showing all the default values Denzo has set that you've not modified. If you are having real problems indexing look through that section carefully to check for weird parameter values. The most interesting part of the output usually is the lattice table. Primitive triclinic is always a valid indexing for a lattice. Denzo lists the primitive auto-indexing cell it finds under "primitive triclinic" at the bottom of the lattice table.
In the remainder of the table, Denzo tries to fit this primitive triclinic cell into the symmetry constraints in other lattice systems (e.g. in primitive hexagonal, a=b, alpha=beta=90, gamma=120). The metric tensor distortion index is a measure of how much Denzo has to mangle the primitive triclinic cell to do this. Low numbers mean that the indexed lattice is consistent with a lattice type. High numbers mean that it is not.
The actual values of "good" % numbers depend on how accurate your direct beam position etc. was, and how strong your data is. Generally anything less than 2% is a good candidate. Anything less than 3% should at least be considered. In this case the true space group is P6122 (primitive hexagonal) with a distortion value of 1.61% (a little high - that large unit cell dimension doesn't help).
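If you want a feel for what the distortion index is measuring, here is a toy sketch in the spirit of (but not identical to) Denzo's metric-tensor calculation: force the refined triclinic cell to obey a lattice's constraints and ask how large a fractional change that required. The function name and the particular "distortion" measure are my own inventions for illustration, so the number printed will not reproduce Denzo's 1.61%.

```python
# Toy illustration of the idea behind the distortion index: how much must the
# refined triclinic cell be perturbed to satisfy a lattice's constraints?
# This is NOT Denzo's actual metric-tensor formula - just the concept.
def hexagonal_distortion(cell):
    a, b, c, alpha, beta, gamma = cell
    ideal = ((a + b) / 2, (a + b) / 2, c, 90.0, 90.0, 120.0)   # symmetrized cell
    length_shifts = [abs(x - y) / y for x, y in zip((a, b, c), ideal[:3])]
    angle_shifts = [abs(x - y) / y for x, y in zip((alpha, beta, gamma), ideal[3:])]
    return 100.0 * max(length_shifts + angle_shifts)

# refined triclinic cell from the lattice table above
print("%.2f%%" % hexagonal_distortion((58.11, 58.38, 515.01, 90.22, 90.33, 116.69)))
```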
Note that Denzo only looks at the physical dimensions of your lattice. Since you haven't integrated the data it has no idea what the intensities are, so it cannot know which of the possible lattice types is the correct one for your data. You can only test data symmetry upon scaling later on with Scalepack. The best approach is to integrate 10-15 frames using the highest symmetry lattice that gives a low distortion index, scale it, and if this doesn't work then try something further down the list. Note that all data is compatible with P1, so if the data does not scale in P1 something has gone awry. For large cell dimensions as in the above example, an accurate direct beam position is essential.
Once you have selected what you think your lattice is, run auto-indexing again with a space group consistent with this lattice. It doesn't really make much difference which of the several possible space groups you choose within a given lattice, since Denzo only worries about lattice dimensions, not the symmetry of the data.
Lattice system | Space groups | Point Group(s) |
---|---|---|
Primitive cubic | P23, P213 | 23 |
 | P432, P4132, P4332 | 432 |
I centred cubic | I23, I213 | 23 |
 | I432, I4132 | 432 |
F centred cubic | F23 | 23 |
 | F432, F4132 | 432 |
Primitive rhombohedral | R3, R32 | 3, 32 |
Primitive hexagonal | P3, P31, P32 | 3 |
 | P321, P3121, P3221 | 32 |
 | P312, P3112, P3212 | 32 |
 | P6, P61, P62, P63, P64, P65 | 6 |
 | P622, P6122, P6222, P6322, P6422, P6522 | 622 |
Primitive tetragonal | P4, P41, P42, P43 | 4 |
 | P422, P4122, P4222, P4322, P4212, P41212, P42212, P43212 | 422 |
Primitive orthorhombic | P222, P2221, P21212, P212121 | 222 |
C centred orthorhombic | C222, C2221 | 222 |
I centred orthorhombic | I222, I212121 | 222 |
F centred orthorhombic | F222 | 222 |
Primitive monoclinic | P21, P2 | 2 |
C centred monoclinic | C2 | 2 |
Primitive triclinic | P1 | 1 |
Only space groups compatible with chiral molecules (proteins etc) are listed. Look at the number of potential space groups in primitive hexagonal! There are actually four point groups here: 3, 32 (i.e. 312 and 321), 6 and 622, each with its own set of space groups. We'll discuss how to distinguish them later, but the point is that it doesn't matter which one you pick, since the lattice indexing and geometry are identical between all possibilities - symmetry constraints on intensity only come into play in Scalepack.
The nine parameters that we were originally in search of are listed as the six from "autoindex unit cell" and three from "crystal rotx, roty, rotz". Check that the autoindex Xbeam and Ybeam correspond relatively closely to the ones you used in index.dat - usually if they differ considerably the autoindexing has failed.
Now we need to test that the lattice really is consistent with the lattice system we pick. We do this by improving the fit between the observed diffraction peak positions and the corresponding predicted ones. To do this we use a refinement procedure embodied in the file integrate.dat:
start refinement
refine partiality
use partials position
weak level 10.
print no profiles
resolution limits 20. 4.6
fix all
fit crystal rotx roty rotz
fit x beam y beam distance
go go go go go go
resolution limits 20. 4.6
go go go go go go
fix distance
fit cell
go go go go go
resolution limits 20.0 4.0
[fit cassette rotx roty - probably not necessary to refine this]
go go go
fix cell
fit distance
[fit Y scale - do not refine this ]
go go go go go go go go go go go
print profiles 1 1
go
calculate
go                [ this actually tells Denzo to integrate the frame ]
end of pack       [ this increments the file counter and oscillation range ]
                  [ in Denzo ready for the next frame ]
The syntax of this file should be pretty clear - we gradually increase the resolution of the data being refined against while increasing the number of parameters being fit. The position chi**2 ("chi squared") gives an idea of how good the fit is: if chi**2 is at or around 1.0, then the agreement between observed and calculated spot positions is as expected (this is good). If chi**2 is << 1.0 then the fit is "too good", which is unusual. If chi**2 is >> 1.0 then the fit is poor and you should look very closely to see if the lattice is indeed correctly predicted. The partiality chi**2 should also be close to unity, but this has to do with the width of the peak along the direction of rotation (i.e. mosaicity/beam divergence).
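For reference, the positional chi**2 is, in essence, the mean squared discrepancy between observed and predicted spot positions measured in units of the expected positional error (my paraphrase, not a quote from the manual):

$$ \chi^2_{x} \approx \frac{1}{N}\sum_{i=1}^{N}\frac{\left(x_i^{\mathrm{obs}}-x_i^{\mathrm{calc}}\right)^2}{\sigma_{x,i}^2} $$

and similarly for y and for partiality. That is why chi**2 of about 1 means "the errors are about as large as the program was told to expect", and why overly generous error estimates can make chi**2 look spuriously good.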
The progress of the fit can be followed in the Denzo output:
position 282 chi**2 x 1.71 y 5.15  pred. decrease: 0.000 * 282 = 0.0
partiality 282 chi**2 1.19  pred. decrease: 0.000 * 282 = 0.0
Highest resolution 4.6000 (input)
Lowest resolution 20.000 (input)
Spot too weak for refinement when below sigma * 10.000 (input)
position 261 chi**2 x 0.74 y 0.63  pred. decrease: 0.000 * 261 = 0.1
partiality 283 chi**2 1.40  pred. decrease: 0.001 * 283 = 0.2
CrysZ (beam)      -91.453  shift  0.020  error 0.017
CrysY (vertical)   54.814  shift -0.182  error 0.006
CrysX (spindle)    -2.448  shift -0.073  error 0.006
distance          349.689  shift -0.311  error 0.102
X beam             81.573  shift -0.022  error 0.016
Y beam             82.703  shift  0.105  error 0.016
Highest resolution 4.6000 (input)
Lowest resolution 20.000 (input)
position 263 chi**2 x 0.74 y 0.62  pred. decrease: 0.000 * 263 = 0.0
partiality 285 chi**2 1.45  pred. decrease: 0.000 * 285 = 0.0
CrysZ (beam)      -91.454  shift  0.000  error 0.017
CrysY (vertical)   54.813  shift -0.001  error 0.006
CrysX (spindle)    -2.451  shift -0.004  error 0.006
distance          349.686  shift -0.003  error 0.101
X beam             81.573  shift  0.000  error 0.016
Y beam             82.703  shift  0.000  error 0.016
position 265 chi**2 x 0.61 y 0.48  pred. decrease: 0.000 * 265 = 0.0
partiality 285 chi**2 1.40  pred. decrease: 0.000 * 285 = 0.0
CrysZ (beam)      -91.450  shift  0.002  error 0.017
CrysY (vertical)   54.815  shift  0.001  error 0.006
CrysX (spindle)    -2.446  shift  0.002  error 0.006
Cell, a 57.21 b 57.21 c 514.80 alpha 90.00 beta 90.00 gamma 120.00
      shifts 0.02 -0.30  errors 0.02 0.17
X beam             81.573  shift -0.001  error 0.016
Y beam             82.705  shift  0.002  error 0.016
Highest resolution 4.0000 (input)
Lowest resolution 20.000 (input)
position 280 chi**2 x 0.43 y 0.34  pred. decrease: 0.002 * 280 = 0.5
partiality 301 chi**2 1.42  pred. decrease: 0.000 * 301 = 0.1
CrysZ (beam)      -91.450  shift  0.000  error 0.016
CrysY (vertical)   54.814  shift -0.001  error 0.006
CrysX (spindle)    -2.444  shift  0.002  error 0.006
Cell, a 57.23 b 57.23 c 514.70 alpha 90.00 beta 90.00 gamma 120.00
      shifts 0.02 -0.10  errors 0.02 0.16
CassY (vertical)   -0.046  shift -0.046  error 0.133
CassX (spindle)     0.232  shift  0.232  error 0.118
X beam             81.594  shift  0.021  error 0.020
Y beam             82.704  shift -0.001  error 0.018
position 280 chi**2 x 0.44 y 0.32  pred. decrease: 0.000 * 280 = 0.0
partiality 305 chi**2 1.40  pred. decrease: 0.000 * 305 = 0.0
CrysZ (beam)      -91.452  shift -0.001  error 0.016
CrysY (vertical)   54.814  shift  0.000  error 0.006
CrysX (spindle)    -2.444  shift  0.002  error 0.006
CassY (vertical)   -0.043  shift  0.003  error 0.133
CassX (spindle)     0.242  shift  0.010  error 0.118
distance          349.629  shift -0.056  error 0.097
X beam             81.593  shift -0.001  error 0.020
Y beam             82.704  shift -0.001  error 0.018
position 280 chi**2 x 0.43 y 0.32  pred. decrease: 0.000 * 280 = 0.0
partiality 306 chi**2 1.40  pred. decrease: 0.000 * 306 = 0.0
CrysZ (beam)      -91.452  shift  0.000  error 0.016
CrysY (vertical)   54.814  shift  0.000  error 0.006
CrysX (spindle)    -2.444  shift  0.001  error 0.006
CassY (vertical)   -0.043  shift  0.000  error 0.133
CassX (spindle)     0.242  shift  0.000  error 0.118
distance          349.625  shift -0.004  error 0.097
X beam             81.593  shift -0.001  error 0.020
Y beam             82.703  shift  0.000  error 0.018
position 280 chi**2 x 0.43 y 0.32  pred. decrease: 0.000 * 280 = 0.0
partiality 305 chi**2 1.40  pred. decrease: 0.000 * 305 = 0.0

Notice that more reflections are added to the refinement as the fit improves because there are fewer extreme outliers. Denzo uses only strong spots for positional refinement based on your definition of weak level - you might consider reducing this from (e.g.) 7 to (e.g.) 3 for really weak data.
Really weak data often causes real problems in auto-indexing for all sorts of reasons.
The chi**2 on partiality should also be close to 1.0, and this is primarily affected by your estimate of mosaicity. However there are other good ways to estimate mosaicity besides monitoring chi**2, as discussed below.
You should always check the correspondence between predicted and observed spot locations in XDISP. If a pattern of green, red and yellow circles doesn't already show on the window, press "Update Predictions" until it does. Green circles are fully recorded reflections ("fulls") that pass completely through the diffraction condition during the frame. If your mosaicity is greater than the frame width then you will have no fulls. Yellow circles are partial reflections ("partials") that are clipped at one end of the frame or the other; Denzo can add their contributions from adjacent frames. Red circles are reflections with problems. Sometimes they are at or near the backstop. Sometimes they have background problems (e.g. reflections near zingers on CCD detectors). Sometimes they are very close to other spots (i.e. overlaps). Overlaps are bad - you've possibly mis-collected your data. Sometimes they are too intense (i.e. overloads). A few overloads are normal on CCD data, but almost unheard of on image plate data. If you have a lot of overloads you probably have a problem with your exposure time. If you have a lot of red circles then there is a problem with either your indexing or your data collection strategy. Note that until you've refined the crystal orientation using the integrate script above, many reflections may be flagged with red circles because the observed and predicted positions differ too much.
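A rough rule of thumb for the fulls/partials balance, assuming reflection centres are spread uniformly in phi: a reflection can only be fully recorded if its reflecting range η (mosaicity plus beam divergence) is smaller than the oscillation range Δφ, and then

$$ \text{fraction of fulls} \;\approx\; \frac{\Delta\varphi-\eta}{\Delta\varphi} \quad (\eta<\Delta\varphi), \qquad 0 \ \text{otherwise} $$

so with 1.0 degree frames and 0.4 degree mosaicity only around 60% of reflections can be fulls, and once the mosaicity exceeds the frame width everything is a partial - which is why Scalepack's handling of partials (below) matters so much.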
Press on the zoom window and take a close look at the predicted and observed locations of the diffraction data. If the fit is not precise, then you may need to re-index, or tweak the integration script. If there are more predicted reflections than observed on the frame, then you've over-estimated the mosaicity value: decrease it and try again. If there are more observed reflections than predicted, then you've under-estimated the mosaicity value: increase it and try again.
This is an example of a prediction based on an indexing without refinement. In the zoom window you can see discrepancies between the predicted and observed spot positions. There are some spots flagged (red circles) due to the bad fit:

This is an example of the same frame after refinement; you can see how the predicted and observed spots are in close agreement.

In this case some spots are still flagged red, mainly because they are too close together for this very long cell dimension (i.e. overlaps). Denzo also flags reflections that lie outside the active area of the detector due to excessively optimistic resolution limits in integration.
This next part of the frame illustrates what we call a lune - the elliptical arrangement of spots arises from a single lattice plane in reciprocal space cutting the Ewald sphere. A plane cutting a sphere generates an ellipse, but this ellipse is "fat" because the crystal rotates through the oscillation range (broadened further by the mosaicity) during the frame. The "full" reflections with green circles lie in the middle of the lune because they pass completely through the diffraction condition during the frame. The "partials" lie on the edge of the lune because they are either not fully in or not fully out of the diffraction condition at the start or end of the frame.
Here is the section without the predicted spots (i.e. just the observed data):

Here is the same section with the predictions overlayed:

Again check all those red-flagged reflections - in this case we have overlap problems but it's unusual to see too many "bad" reflections.
Selecting a Spot Profile and Mosaicity
The best way to optimize integration parameters is to do a small trial integration and look carefully at the output. In particular the initial estimates for the spot size and the mosaicity may be somewhat non-optimal, so I always check these after integrating several frames and adjust as necessary.

Denzo prints out the spot profiles for each of nine regions of the detector. The ones in the center of the detector (e.g. 2,2) should be pretty good, but sometimes the ones at the edge of the detector look crummy because you have little or very weak data out there. Spot profiles are constructed using strong reflections in each region. If you set the profile fitting radius too low, you can lose a lot of reflections because they have no viable spot profile - anisotropic data is prone to this since there may be no nearby strong spots to form the profile from. Make sure this parameter is set to at least 20, and probably more like 30. The second of the two spot profiles shown below is atypical and illustrates some of the problems you may run into.
Averaged spot profile in sector 1, 2 (x,y)   # of spots 139
Weighted position of the spots 56.631, 95.020 (x,y)

-2 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1   . . . . . . . . . . . . . . .
-2 -2 -1  0  0  1  1  1  1  1  0  0  0 -1 -1   . . . . . . . . . . . . . . .
-1 -1  0  1  2  3  4  5  5  4  3  2  1  0 -1   . . . . . . . . . . . . . . .
-1 -1  1  3  5  8 11 12 11  8  6  5  3  1  0   . . . . . - - - - - . . . . .
-1  0  2  4  8 14 21 23 19 13  9  7  5  2  0   . . . . - + + + + + - . . . .
-1  0  2  5 10 19 30 33 26 16 10  8  5  2  0   . . . - + + + + + + + - . . .
-1  0  2  5 11 22 38 45 34 18 10  7  4  1  0   . . . - + + + + + + + - . . .
 0  1  2  4 10 23 46 58 40 18  8  5  2  0 -1   . . . - + + + + + + + - . . .
 0  1  2  4  9 21 45 55 35 15  7  2  1  0 -1   . . . - + + + + + + + - . . .
 0  1  2  3  7 16 33 35 21 10  4  1  0  0 -1   . . . - + + + + + + + - . . .
 0  0  1  3  5 10 16 16 10  5  2  1  0  0 -1   . . . . - + + + + + - . . . .
 0  0  1  2  3  5  6  5  3  2  1  0 -1  0 -1   . . . . . - - - - - . . . . .
-1  0  0  1  1  2  2  1  1  0  0 -1 -1 -1 -1   . . . . . . . . . . . . . . .
-1 -1  0  0  0  0  0  0  0  0 -1 -1 -1 -1 -1   . . . . . . . . . . . . . . .
-1 -1  0 -1  0  0  0  0  0  0 -1 -1 -1 -1 -1   . . . . . . . . . . . . . . .

This is a fairly typical spot profile, although perhaps the size of the spot box is a little too small (its area should be 2-3x larger than the spot). But the subtracted background is relatively flat (near zero) and the spot itself is within the region marked with "+", indicating that it's actually going to be integrated.
Averaged spot profile in sector 2, 2 (x,y) # of spots 17 Weighted position of the spots 77.869, 91.588 (x,y) 0 0 0 1 0 0-1-1 0-1-1 0 1 0 0-1 0 0-1 0 1 1 0 0 0 1 0-1 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0-1 0 0 0-1-1 0 0-1-1 0 0 0-1 0 0-1-1 0 0 0 0 0 0 1 0-1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 0-1-1-1-1 0-1 0 0 0-1-1 0 0 0 0 0 0-1-1-1 0-1 0-1 0 0-1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 0-1-1-1-1-1 0 0 0-1-1-1-1-1 0 0 1 0-2-1-1-1 0 0 0 0-2-1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 0 0-1-1 0-1 0-1 0-1-1-1 0-1 0 0 0 0 0-1 0 0 0 0 0 0 0 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . -1 0 0-1 0 0-1-1 0 0-1 0 0 0-1-1 0 0 0 0 0 0 0-1-1-1-1 0 0-1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 1 0-1 0 0 0-1-1 0 0 0 0 1 0 0 0 0 1 1 0-1-1-1-1 0 0 0 0-1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . -1 0 0-1 0 0 0 0 0 0 1 0 0 1 1 1 1 0 0 0-1 0 0 0 0 1 1 0-1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0-1-1 0 0 1 0 0 0 0 0 0 0 1 1 1 2 1 0 0 0 1 1 1 0 0 0-1 0 1 . . . . . . . . . . . . . - - - - . . . . . . . . . . . . . 0 0-1 0 0 1 1 1 1 0 0 0 1 2 2 3 3 2 1 1 0 0 1 1 1 1 1 0 0 0 . . . . . . . . . . . - - - - - - - - . . . . . . . . . . . 1 1 0 0 1 2 3 3 3 1 1 1 2 3 6 7 5 3 2 1 0 0 1 2 2 1 1 0 0 0 . . . . . . . . . . - - - - - - - - - - . . . . . . . . . . 1 1 1 1 2 3 6 6 4 2 2 2 2 6141510 5 2 2 0 1 3 4 4 1 1 1 1 0 . . . . . . . . . - - - - - + + - - - - - . . . . . . . . . 2 1 1 1 2 61314 7 3 3 3 412283419 7 3 2 1 2 5 9 9 4 2 2 0 1 . . . . . . . . . - - - + + + + + + - - - . . . . . . . . . 3 2 2 2 310242512 4 4 3 61851633111 5 3 2 2 71615 6 2 2 1 2 . . . . . . . . - - - - + + + + + + - - - - . . . . . . . . 4 2 1 1 415374017 6 4 4 72574924413 6 3 2 3 92122 9 2 2 1 2 . . . . . . . . - - - + + + + + + + + - - - . . . . . . . . 4 1 1 2 416444519 6 3 4 72781994713 5 3 2 3102222 9 3 2 1 2 . . . . . . . . - - - + + + + + + + + - - - . . . . . . . . 3 1 1 2 415363414 4 3 3 62262723410 4 3 2 3 81616 7 2 1 1 2 . . . . . . . . - - - - + + + + + + - - - - . . . . . . . . 2 1 1 2 3102017 7 3 2 2 414313617 5 3 2 2 2 4 9 9 3 1 0 0 1 . . . . . . . . . - - - + + + + + + - - - . . . . . . . . . 1 0-1 0 2 4 7 7 3 1 1 1 2 71313 7 2 1 0 1 1 2 3 3 1 0 0-1 0 . . . . . . . . . - - - - - + + - - - - - . . . . . . . . . 0 0 0 0 1 2 3 2 1 1 1 0 1 3 4 4 3 1 1 0 1 1 0 1 1 1 0 0-1 0 . . . . . . . . . . - - - - - - - - - - . . . . . . . . . . 0 1 1 0 1 1 1 1 0 0 1 0 0 1 2 3 2 0 1 1 0 0 0 1 0 0 0-1 0 0 . . . . . . . . . . . - - - - - - - - . . . . . . . . . . . 0-1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 1-1 0 0 1 0 0 1 . . . . . . . . . . . . . - - - - . . . . . . . . . . . . . 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 1 1 0 0 0 0 0 0-1-1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0-1 0 0-1-1 0 1 1 0 0 0 0 0 0 1 1 1-1 0 1 0-1 0 1 0 0-1-1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0 0-1 0 0 0-1 0 1 0 0 0 0 0 0 0 0 0 0 0-1 0 0-1 0 0 0 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 0 0 0 0-1 0 0 0 0 0 0 0-1 0 1 0-1 0 0-1-1 0 0 0-1 0 0 0-1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . -1 0 0 0 0-1-1-1-1 0 0 0-1 0-1-1-1 0 0-1-1 0 0-1 0 0 1 0 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . -1 0 0-1 0 0 0 0 0 0 0 0 0-1 0 0-1-2 0 0-1 0 0-1-1-2-1-1 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0-1-1-1-1-1-1-1-1 0-1 0-1-1-1-1-1-1-1 0 0-1 0-1-2-1 0-1 0 0 . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . 0-1 0 0-1-1-1-1-1-1-1 0 0-1 0-1 0-1-1 0 1-1-1-1-1 0 0-1 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

This is an atypical spot profile illustrating some problems - the background is fairly flat (good) and most of the spot is inside the "+" region, i.e. it's going to be integrated, but there are positive values outside the spot. This might mean you have satellite reflections that are not on integral Miller indices - some sort of supercell/subcell issue - or you might have mis-indexed your data and one of your unit cell axes is actually smaller than it needs to be. In this specific case I was lying to the program about the spot size so I could integrate some data with a very long unit cell axis. Sometimes data processing is about finding the best compromise, but in the vast majority of cases you want the spots separated well enough that you don't have to introduce systematic errors in the data just to get it processed.
The numbers on each line show the averaged spot profile values, and the symbols alongside show the assignment of each pixel (. = background, - = guard region, + = spot).
The background region (".") is used to fit a least squares plane with which to subtract the background counts from the spot. If this has gone well all values in the background should be zero or close to it. In the second case there is intensity from neighboring spots in the background, but the subtraction still has worked since non-spot regions are still in the range -2 to +2. These neighboring spots do not contribute to the integrated intensity for this reflection because they are not in the spot region.
The guard region ("-") is simply an area near the spot that you want excluded from both the background and spot calculations (e.g. if your spot is smeary). The spot itself ("+") should fit the observed spot relatively closely, but should not clip it. In the second case the spot is defined somewhat too tightly, to try to stop neighboring spots from being thrown out as overlaps. Sometimes you play this trade-off to get better (more complete) data, but it's nearly always better to make the spot a little too large and let the learnt profile take out the excess pixels. Generally the guard region is 0.1 or 0.15 mm larger than the spot radius.
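To make the spot/guard/background geometry concrete, here is a small Python sketch that carves up a measurement box the way I read the spot elliptical / background elliptical / box keywords (sizes in mm, treated as elliptical radii; the pixel size, function name and details are my own assumptions for illustration, not Denzo's code):

```python
# Sketch: classify each pixel of a measurement box as spot, guard or background,
# given elliptical radii in mm (my reading of the Denzo keywords, not its code).
def classify_box(box=(2.1, 2.1), spot=(0.45, 0.45), guard=(0.55, 0.55), pixel=0.08):
    nx, ny = int(round(box[0] / pixel)), int(round(box[1] / pixel))
    cx, cy = (nx - 1) / 2.0, (ny - 1) / 2.0
    rows = []
    for j in range(ny):
        row = []
        for i in range(nx):
            dx, dy = (i - cx) * pixel, (j - cy) * pixel
            if (dx / spot[0]) ** 2 + (dy / spot[1]) ** 2 <= 1.0:
                row.append('+')   # spot region: these pixels are integrated
            elif (dx / guard[0]) ** 2 + (dy / guard[1]) ** 2 <= 1.0:
                row.append('-')   # guard region: excluded from spot and background
            else:
                row.append('.')   # background region: fitted by a plane and subtracted
        rows.append(' '.join(row))
    return rows

print('\n'.join(classify_box()))
```

Printing the result gives a little '.', '-', '+' map much like the symbol half of the Denzo profile printouts above.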
Spot parameters in the second case were:
ibox 30 30
spot elliptical 0.30 0.30
background elliptical 0.55 0.55

whereas typical synchrotron parameters would be:

ibox 21 21
spot elliptical 0.35 0.35
background elliptical 0.45 0.45

and those at home more like:

ibox 27 27
spot elliptical 0.45 0.45
background elliptical 0.55 0.55

although these parameters vary somewhat with the size of the beam, size of the crystal etc. The first spot profile is pretty good, although I would usually make the ibox larger and the spot a little larger (10-20%) to be on the safe side. What's confusing in Denzo is that "background" really means the size of the guard region, and "ibox" really defines the size of the area that's used to determine the background to be subtracted.
We typically use "overlap spot" in Denzo, where spots are only rejected if the actual spot region ("+") overlaps. This helps in cases where the spots are relatively close. There are more conservative overlap schemes, but overlap spot seems to work well enough.
For most well-behaved data the numerical spot size should match the observed spot size, with perhaps a small amount of extra space. Since Denzo does profile fitting, the spot profile is learnt from strong spots and assumed to apply to all spots local to it. However for badly-behaved data one may want to play certain tricks. Large unit cell dimensions often put the spots very close together on the detector - in this case you might systematically underestimate the spot size until you can integrate the data, so that reflections do not get thrown away as overlaps. Smeary spots often have a bright center with long trails on them due to crystal disorder/splitting - here also we (often substantially) underestimate the size of the true spot so as to integrate only the center of the smear, which often gives better integrated intensities than including all the disordered junk.
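For what it's worth, the reason a slightly-too-large spot is usually harmless is the profile fitting itself: schematically, the reported intensity is not a plain sum of pixels but a least-squares scaling of the learnt profile p_i onto the background-subtracted counts c_i,

$$ I_{\mathrm{prof}} \;\approx\; \frac{\sum_i w_i\, p_i\,(c_i-b_i)}{\sum_i w_i\, p_i^{2}}, \qquad w_i \approx 1/\sigma_i^{2} $$

so pixels where the profile is essentially zero contribute essentially nothing. (This is the standard profile-fitting estimator; I am not claiming it is Denzo's exact implementation.)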
Crystal mosaicity can vary widely, from 0.2 for well-behaved crystals at synchrotron sources, to greater than 1.5 for badly-behaved crystals at home sources. Bad cryo conditions and heavy atom soaking often increase crystal mosaicity. If you get high mosaicity for native data, you may want to explore other stabilisation solutions and cryo buffers. If your crystal is thin, high mosaicities may also indicate handling problems (e.g. thin plate crystals often bend/warp during manipulation).
There's nothing inherently wrong with high mosaicity - it smears the data out along the rotation direction a bit, making it a little weaker, and data with high mosaicity can be processed perfectly well provided you make a correct estimate of the true mosaicity. Mosaicity does cause more crowding on a frame, because each spot is spread over more frames, so high mosaicities can cause more overlap problems. HKL2000 refines mosaicity on a per-frame basis by default, which for problem crystals means that the mosaicity estimate can inappropriately blow up through modelling of spot smearing. In this case you should not refine mosaicity during data integration - this is another instance of the trade-off between "correctly" modeling a parameter and modeling it so that you can actually integrate the data. If the mosaicity is set too high, all the spots on the frame will overlap with those on adjacent frames, and all your data will be thrown out.
One of the easiest ways to estimate mosaicity is to process the data, scale it with Scalepack, and then reprocess it with the refined mosaicity value from Scalepack. However it's not a bad thing to have the mosaicity estimated approximately correctly from the start.
Denzo shows a partiality histogram for each frame:
zone           av. part.        -10...0..10..20..30..40..50..60..70..80..90.100.110.120.130.140
-0.125 - -0.112   1.364 +- 0.259  .*******************************************************
-0.112 - -0.100   1.194 +- 0.350  .************************************************
-0.100 - -0.087   0.644 +- 0.277  .**************************
-0.087 - -0.075   1.071 +- 0.214  .*******************************************
-0.075 - -0.062   0.874 +- 0.245  .***********************************
-0.062 - -0.050   1.525 +- 0.223  .*******************************************************
-0.050 - -0.037   0.970 +- 0.180  .***************************************
-0.037 - -0.025   0.629 +- 0.195  .*************************
-0.025 - -0.012   0.796 +- 0.253  .********************************
-0.012 -  0.000   0.961 +- 0.143  .**************************************
 0.000 -  0.013   0.431 +- 0.113  .*****************
 0.013 -  0.025   0.637 +- 0.097  .*************************
 0.025 -  0.038   0.321 +- 0.078  .*************
 0.038 -  0.050   0.237 +- 0.063  .*********
 0.050 -  0.063   0.186 +- 0.057  .*******
 0.063 -  0.075   0.239 +- 0.044  .**********
 0.075 -  0.088   0.035 +- 0.037  .*
 0.088 -  0.100   0.050 +- 0.025  .**
 0.100 -  0.113   0.129 +- 0.022  .*****
 0.113 -  0.125   0.082 +- 0.024  .***

This adopts an approximately sigmoidal shape, although it's frequently noisy, as here. If your mosaicity estimate is much too low, it looks more like:
zone           av. part.        -10...0..10..20..30..40..50..60..70..80..90.100.110.120.130.140
-0.050 - -0.045   0.403 +- 0.711  .****************
-0.045 - -0.040   1.698 +- 0.572  .*******************************************************
-0.040 - -0.035   0.478 +- 0.293  .*******************
-0.035 - -0.030   0.492 +- 0.382  .********************
-0.030 - -0.025   1.501 +- 0.360  .*******************************************************
-0.025 - -0.020   0.224 +- 0.472  .*********
-0.020 - -0.015   1.010 +- 0.342  .****************************************
-0.015 - -0.010   0.771 +- 0.285  .*******************************
-0.010 - -0.005   0.530 +- 0.312  .*********************
-0.005 -  0.000   0.927 +- 0.576  .*************************************
 0.000 -  0.005   0.605 +- 0.169  .************************
 0.005 -  0.010   0.931 +- 0.147  .*************************************
 0.010 -  0.015   0.592 +- 0.103  .************************
 0.015 -  0.020   0.284 +- 0.082  .***********
 0.020 -  0.025   0.388 +- 0.065  .****************
 0.025 -  0.030   0.109 +- 0.055  .****
 0.030 -  0.035   0.993 +- 0.059  .****************************************
 0.035 -  0.040   0.133 +- 0.041  .*****
 0.040 -  0.045   0.152 +- 0.046  .******
 0.045 -  0.050   0.145 +- 0.046  .******

i.e. fairly flat with not much of a trail, and if it is too high it looks more like:
zone           av. part.        -10...0..10..20..30..40..50..60..70..80..90.100.110.120.130.140
-0.300 - -0.270   1.045 +- 0.205  .******************************************
-0.270 - -0.240   1.092 +- 0.241  .********************************************
-0.240 - -0.210   0.654 +- 0.223  .**************************
-0.210 - -0.180   1.021 +- 0.179  .*****************************************
-0.180 - -0.150   0.877 +- 0.209  .***********************************
-0.150 - -0.120   0.921 +- 0.179  .*************************************
-0.120 - -0.090   1.331 +- 0.158  .*****************************************************
-0.090 - -0.060   1.077 +- 0.129  .*******************************************
-0.060 - -0.030   1.225 +- 0.115  .*************************************************
-0.030 -  0.000   0.732 +- 0.098  .*****************************
 0.000 -  0.030   0.445 +- 0.083  .******************
 0.030 -  0.060   0.245 +- 0.068  .**********
 0.060 -  0.090   0.205 +- 0.061  .********
 0.090 -  0.120   0.060 +- 0.042  .**
 0.120 -  0.150   0.098 +- 0.037  .****
 0.150 -  0.180   0.051 +- 0.023  .**
 0.180 -  0.210   0.015 +- 0.021  .*
 0.210 -  0.240   0.023 +- 0.017  .*
 0.240 -  0.270   0.018 +- 0.017  .*
 0.270 -  0.300   0.020 +- 0.016  .*

i.e. too quickly tapering off toward zero. Find a happy medium before you process the data.
Integrating Your Data
Most of the work has already been done - the auto-indexing procedure gives us initial estimates for the unit cell and the so-called "crystal missetting angles" (i.e. the orientation of your crystal in some reference frame), and the refinement procedure improves these estimates for the first frame. Once you've tweaked the spot size and the mosaicity, you are ready to integrate your entire dataset.
$ denzo @index.dat sector 1 to 100 @integrate.dat stop

assuming you have 100 frames to integrate. This will take a while, and the output is voluminous, but if you haven't processed data before it may be worth saving a log file and looking through it. Many labs use a different integrate.dat for refinement from the one used for bulk integration. A stripped down version of the above integrate.dat suitable for bulk integration is shown below - it assumes that the orientation and unit cell parameters have been refined after auto-indexing, since this script does only limited refinement:
start refinement
refine partiality
use partials position
weak level 10.
resolution limits 20.0 4.0
fit cassette rotx roty
go go go
fix cell
fit distance
go go go
fit all
go go go go go go go go
print profiles 1 1
go
calculate
go                [ this actually tells Denzo to integrate the frame ]
end of pack       [ this increments the file counter and oscillation range ]
                  [ in Denzo ready for the next frame ]
i.e. no extension from low resolution, but it's possible to get too fussy about this. The "fit all" line can sometimes cause problems with data that is weak or low resolution since parameters can be too highly correlated with each other and/or not well-defined.
What matters, critically, is that Denzo assigns the correct index to every reflection it integrates, and that the predicted and observed reflection centroids are close enough for Denzo to integrate the spots. Beyond that, it's not really important just how you achieve it.
The integrated data is written into the files defined by "film output file" in index.dat (aka the ".x" files in conventional lab nomenclature). These contain the integrated reflection data plus the current values of the parameters for each frame. Since Denzo refines only on a per-frame basis, the parameters are often in local minima rather than at their globally optimal values. Some drift in distance, cell dimensions etc is to be expected. Do:
$ grep cell *.x
$ grep dist *.x
$ grep beam *.x

to see how some parameters vary with frame. The ".x" files are ASCII, so you can view them using cat, more or your favorite editor.
Scaling Your Data
Assuming all Hell hasn't broken loose during integration, all you have to do is scale the data using Scalepack. Scaling is driven by comparing the intensities of symmetry-related reflections, whose intensities are expected to be identical within error.

For example: the unique reflection (h,k,l) in point group 222 should be identical in intensity to (h,-k,-l), (-h,k,-l), and (-h,-k,l) within experimental error. In the absence of measurable anomalous scattering, Friedel's Law applies, giving rise to a center of symmetry in the data, which means that another four reflections are equivalent: (-h,-k,-l), (-h,k,l), (h,-k,l) and (h,k,-l). These are often referred to as symmetry-related reflections, although they are also referred to as Bijvoet (pron. Boy-vert) pairs or Friedel (pron. Free-del) pairs. Scalepack ultimately merges all the symmetry-related reflections into a set of unique reflections (i.e. those which are not related to each other by symmetry).
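A minimal sketch of what "symmetry-related" means for the point group 222 example above (the function and its name are just for illustration; Scalepack's symmetry handling is of course completely general):

```python
# Generate the point-group 222 equivalents of (h,k,l) listed in the text,
# optionally adding the Friedel mates.  Illustration only.
def equivalents_222(h, k, l, friedel=True):
    ops = [(h, k, l), (h, -k, -l), (-h, k, -l), (-h, -k, l)]
    if friedel:
        # Friedel's law adds the inverse of each equivalent
        ops += [(-a, -b, -c) for (a, b, c) in ops]
    return sorted(set(ops))

print(equivalents_222(3, 5, 7))   # eight indices expected to agree in intensity
```

All eight indices printed should, within error, have the same measured intensity; that redundancy is what both the scaling and the Rsymm statistic below feed on.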
Technically, Friedel Pairs are pairs of reflections related by (h,k,l) to (-h,-k,-l) whereas Bijvoet Pairs are related by a combination of crystallographic symmetry and Friedel's Law, but of course the usage tends to be pretty equivalent. Also technically, point group 222 in the presence of Friedel's Law becomes point group mmm, since this reflects the effect of the centrosymmetric symmetry in the diffraction pattern (point group 2 becomes 2/m etc.). These designations are used somewhat flexibly in the non-technical literature.
One crude monitor of data quality is the extent to which observations of each unique reflection deviate from being equal to each other. The percentage deviation is termed Rsymm. As a precise monitor of data quality Rsymm is very flawed, but it is nevertheless widely used. Sometimes people use the term Rmerge when they mean Rsymm, but I prefer to use the former when merging datasets from multiple crystals and the latter when referring to scaling within the same crystal. Community-wide, there's no clear consensus on the usage of Rsymm vs Rmerge. Ironically Rsymm gets worse as the multiplicity/redundancy of your data increases, which is at odds with the fact that your data is getting better by virtue of this multiplicity. Two measures that attempt to compensate for this behavior are Rpim and Rmeas - both of which are discussed elsewhere and neither of which is reported by Scalepack.
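For the record, the usual definitions (summing over unique reflections hkl, each with n observations I_i and mean ⟨I⟩) are:

$$ R_{\mathrm{symm/merge}}=\frac{\sum_{hkl}\sum_i \left|I_i-\langle I\rangle\right|}{\sum_{hkl}\sum_i I_i},\qquad R_{\mathrm{meas}}=\frac{\sum_{hkl}\sqrt{\tfrac{n}{n-1}}\sum_i \left|I_i-\langle I\rangle\right|}{\sum_{hkl}\sum_i I_i},\qquad R_{\mathrm{pim}}=\frac{\sum_{hkl}\sqrt{\tfrac{1}{n-1}}\sum_i \left|I_i-\langle I\rangle\right|}{\sum_{hkl}\sum_i I_i} $$

The √(n/(n-1)) factor is what stops Rmeas punishing you for high multiplicity, and the √(1/(n-1)) factor makes Rpim report the precision of the merged average instead of the scatter of individual observations.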
Scalepack refines a number of parameters during scaling. There is a per-frame scale factor (k) that models things like beam decay at synchrotrons and the volume of the crystal in the beam. It is defined relative to the reference frame (normally frame #1): if it increases your data is getting stronger, if it decreases your data is getting weaker. It is therefore sensitive to things like beam intensity variation at synchrotrons, and particularly sensitive to crystal mis-centering. There is also a per-frame B-factor (B) that models radiation damage in crystals. The (relative) B-factor models the extent to which the fall-off of diffracted intensity with resolution varies during the dataset. As the B-factor increases, your high resolution data is getting weaker relative to your low resolution data - a classical hallmark of radiation damage as order in the crystal is lost. Radiation damage happens at synchrotrons (especially at places like CHESS and APS), but frozen crystals are essentially immortal at home sources over timeframes of several days. The k and B (and other) parameters are refined by minimising Rsymm (or a residual resembling it).

Scales (k) and B-factors can be poorly determined in the early stages of data collection when there are few symmetry-related reflections, so it's often useful to restrain them during scaling (the SCALE RESTRAIN and B RESTRAIN keywords in the example below). These values should be set greater than or equal to the expected frame-to-frame variation of the scale factors and B-factors. You can hurt scaling by setting them too tight, but you can improve scaling by giving them a reasonable estimate. On home sources I use SCALE RESTRAIN 0.02 and B RESTRAIN 0.1. At synchrotrons, where things are more variable, I use values perhaps twice as large for the SCALE restraint - it depends a bit on beam strength variability and how fast you expect your crystal to die in the beam. At CHESS, in particular, the beam intensity can vary rapidly with frame number (70 minute beam lifetimes at CHESS last time I was there vs 12 hours at Brookhaven).
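Schematically (exact sign and normalisation conventions aside - this is the idea, not Scalepack's documented formula), the model being fit is that the intensity measured on frame j is related to the true intensity by

$$ I_j(hkl) \;\approx\; s_j \, e^{-2B_j \sin^2\theta/\lambda^2}\, I_{\mathrm{true}}(hkl) $$

so s_j tracks how strong frame j is (beam intensity, illuminated crystal volume) while B_j soaks up the resolution-dependent fall-off that radiation damage produces; the restraints simply say that s and B should not jump wildly between adjacent frames.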
It can be important to monitor radiation damage at synchrotrons to set the appropriate exposure time for your crystal. Generally we do not like the per-frame B-factor correction to get much above 5 A2 for high resolution datasets (2.5 Angstrom or better) and 10 A2 for low resolution datasets. For MAD datasets these values should probably be lower still. Per-frame B-factors can model other things (like your crystal rolling out of the beam) so treat them with caution - they are not a good guide to radiation damage, but during data collection they're one of the few indicators you have.
Scalepack also does post-refinement, in which it optimizes the unit cell and crystal orientation (aka "missetting angles") based on knowledge of the observed locations of all the reflections in the entire dataset. Typically this improves the quality of the partial reflections by providing an accurate estimate of their "partiality". Scalepack uses partials by adding them across one or more adjacent frames. For mosaic crystals partials can be the majority of the reflections measured (as opposed to fulls). Post-refined unit cell dimensions are probably the closest to the "true" ones, since the locally-refined values estimated by Denzo tend to be only a local approximation to the unit cell. Scalepack can also post-refine mosaicity, which can give you an improved estimate for re-integration - generally speaking you should reintegrate if the mosaicity used in integration differs by more than 10% from the post-refined value.
It's not always a good idea to post-refine mosaicity - with some crystals the mosaicity estimate will blow up and Scalepack will discard most of your data. This most often happens with crystals that have smeary spot profiles - in these cases it pays to try a variety of different mosaicities to see what gives the best data. At other times you lie about (systematically underestimate) your mosaicity to reduce spot overlap problems. In these cases comment out the mosaicity line in the post-refinement block. At all other times, one should post-refine mosaicity, and re-integrate the data if the post-refined value and the original value differ by more than 10%.
Post-refinement is especially powerful in adding partials. Scalepack can add partial reflections split across several frames to reconstitute an intact reflection intensity. Since sometimes partials make up close to 100% of your data, this is a rather nice feature, but it does require that you integrate your data carefully so that the partiality estimates are as correct as possible - this is where post-refinement is important since this usually provides the most accurate values of the geometric parameters.
Scalepack throws out "rejections", which are reflections that differ excessively from the estimated mean value of that unique reflection and are thus possibly erroneously measured. For good data only a percent or so of observations are discarded. Once the percentage gets above 5% this is a pretty good indication of systematic errors in your data, and you should examine your integration carefully in such cases. It often pays to read these rejections back in as exclusions for the next round of scaling, thus improving the scaling parameters for the subsequent round(s).
Scalepack is /programs2/hkl/HKLsgi_1.97.2/scalepack on the SGIs, and /programs2/denzo/scalepack on Helium.
A typical scalepack script file is as follows:
NUMBER OF ZONES 10
ESTIMATED ERROR                 [ per-shell error estimates ]
 0.02 0.03 0.03 0.03 0.03 0.04 0.04 0.04 0.04 0.04
RESOLUTION 4.0
IGNORE OVERLOADS                [ throw out overloaded reflections ]
REJECTION PROBABILITY 0.001     [ parameter controls # of reflections rejected ]
WRITE REJECTION FILE 0.5
ERROR SCALE FACTOR 1.500        [ overall scalar for error estimates ]
ADD PARTIALS 1 TO 180           [ frames over which partials should be added ]
SPACE GROUP P6122
[SCALE RESTRAIN 0.05]           [ if scaling is problematic, restrain k and B ]
[B RESTRAIN 1.5]
POSTREFINE 10                   [ only do postrefinement once you have >5 frames ]
FIT CRYSTAL A* 1 TO 180         [ parameters to include in postrefinement ]
FIT CRYSTAL C* 1 TO 180
FIT BATCH ROTX 1 TO 180
FIT BATCH ROTY 1 TO 180
FIT CRYSTAL MOSAICITY 1 TO 180
END FIT
@reject
OUTPUT FILE 'ctmp7.hkl'
REFERENCE FILM 1
FORMAT DENZO_IP
SECTOR 1 TO 180
FILE 1 'ctmp7_1_###.x'

When adding partials it's sometimes advantageous to not add them across "a break" in data collection. For instance if you collect frames 1-180 across two beam-fills at CHESS at frames 80 and 140 you might change the ADD PARTIALS line above to:
ADD PARTIALS 1 TO 79, 80 TO 139, 140 TO 180

at the cost of losing partials that bridge the gaps between frames 79/80 and 139/140. Scalepack recognizes these breaks and also does not apply any SCALE RESTRAIN or B RESTRAIN restraints across them.
However you should not break up the post-refinement of CRYSTAL parameters like the unit cell or the mosaicity into blocks, because these should have a single value for the crystal that does not vary by frame (exception: sometimes mosaicity varies with rotation angle, but refining per-frame mosaicities is ripe for abuse even if it is the default in HKL2000). A sketch of this is shown below.
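To make the distinction concrete, here is a minimal sketch of how the fitting block might look when partials are added in chunks (as in the ADD PARTIALS example above) - the CRYSTAL parameters still span the full frame range; the frame numbers are just the example values used above:

ADD PARTIALS 1 TO 79, 80 TO 139, 140 TO 180
POSTREFINE 10
FIT CRYSTAL A* 1 TO 180                  [ one value for the whole crystal - do not split the range ]
FIT CRYSTAL C* 1 TO 180
FIT CRYSTAL MOSAICITY 1 TO 180
FIT BATCH ROTX 1 TO 180                  [ BATCH parameters are per-frame anyway ]
FIT BATCH ROTY 1 TO 180
END FIT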
Scalepack makes a big deal of the error model. The basic dogma is that the error model should be adjusted until the chi**2 for the data is approximately unity. The error model is thus a mixture of the theoretical (the mix of errors and their approximate expected distribution) and the empirical (the match between what is expected and what is observed). The estimated error in zones is one way to adjust it, although I don't change this much; the error scale factor is the main way. If chi**2 is >> 1.0, increase the error scale factor; if chi**2 is << 1.0, decrease it. Personal experience suggests that the error scale factor is ~1.5 for good crystals at synchrotrons, and ~2.0 at home. For badly behaved crystals this value will generally be higher, but if your error scale factor is 3 you should question whether you are collecting data or noise. Higher error scale factors reflect the reality that there is more error in your data than would be expected from its strength - usually systematic error from split/multiple crystals giving rise to poor spot profiles. Such data can be used, but be aware that it contains more error than data from well-behaved crystals of comparable strength.
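In practice this just means editing one line of the script and rescaling; for example (a sketch, assuming the overall chi**2 came out around 1.5 with the value used in the script above):

ERROR SCALE FACTOR 1.800                 [ raised from 1.500 because the overall chi**2 was ~1.5; rescale and check again ]

Since chi**2 scales roughly as the inverse square of the error scale factor, an increase from 1.5 to ~1.8 should bring a chi**2 of ~1.5 back close to 1.0.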
Selected parts of a Scalepack log file are as follows:
Number of rejected reflections      11567

Scalepack uses a file, typically called "reject", to store the reflections rejected as outliers. If you run Scalepack multiple times it reads this reject file from the previous run and leaves those reflections out of the scaling. Therefore you must run Scalepack several times in succession to achieve convergence and actually reject the outlier data. Each time you add more frames or change the resolution you should remove the reject file and run Scalepack several times until the number of rejected reflections remains stable (i.e. scaling reaches convergence with no new rejections). Scalepack lists rejected reflections for each frame as the frames are read in, so you can see where all your data is going:
reading from a file: ctmp7_1_001.x
  0.0   2   0 121  0   -4.7  -10.4  1.51  13.0  0.970  137.1 1254.8  0.505  10.4
  0.0   2   0 118  1   65.6   60.6  1.52  12.6  0.972  160.4 1252.7  0.493  18.5
  0.0   0  -1  99  0   69.4   78.8  1.48   9.7  0.980  291.5 1036.8  0.421  21.5
  0.0   2   1  82  0  285.2  270.7  1.51   9.4  0.985  442.7 1292.1  0.337  20.0
  0.0  -2  -2  57  0  207.0  209.5  1.27   5.6  0.991  590.8  811.1  0.254  67.2
  0.0   5   3  60  1   14.1   21.0  1.87   6.3  0.983  632.7 1614.3  0.232  26.2
etc.
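In practice the convergence loop described above is just a matter of re-running the same command file a few times from the shell - a minimal sketch, assuming your command file is called scale.com and that you run Scalepack by feeding it the command file on standard input:

rm reject ; touch reject                     # start with an empty reject file (the @reject line expects it to exist)
scalepack < scale.com > scale_pass1.log
scalepack < scale.com > scale_pass2.log      # this pass excludes the outliers written to "reject" by the first
scalepack < scale.com > scale_pass3.log      # repeat until the number of rejected reflections stops changing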
This is the output from post-refinement showing the refined parameters:
Hkl's refined: 53162   N.Chi**2: 2.648   Decrease: 0.004 * 53162 = 213.6
 Film #      a       b        c      alpha    beta   gamma    crysz     crysy    crysx  mosaicity
     1    57.725  57.725  518.853   90.000  90.000  120.000  -84.352  -147.227  -5.776    0.370
     2    57.725  57.725  518.853   90.000  90.000  120.000  -84.348  -147.223  -5.781    0.370
     3    57.725  57.725  518.853   90.000  90.000  120.000  -84.353  -147.212  -5.774    0.370
     4    57.725  57.725  518.853   90.000  90.000  120.000  -84.361  -147.204  -5.772    0.370
     5    57.725  57.725  518.853   90.000  90.000  120.000  -84.352  -147.204  -5.776    0.370
     6    57.725  57.725  518.853   90.000  90.000  120.000  -84.359  -147.193  -5.775    0.370
     7    57.725  57.725  518.853   90.000  90.000  120.000  -84.357  -147.185  -5.773    0.370
     8    57.725  57.725  518.853   90.000  90.000  120.000  -84.362  -147.180  -5.765    0.370
     9    57.725  57.725  518.853   90.000  90.000  120.000  -84.360  -147.169  -5.752    0.370
    10    57.725  57.725  518.853   90.000  90.000  120.000  -84.361  -147.170  -5.756    0.370

Notice that the cell dimensions are post-refined CRYSTAL, which means they are the same for all frames, whereas CrysX, CrysY, CrysZ are post-refined BATCH so that if the crystal slips they can absorb the error - in this case crysx/y/z are very similar, which indicates a stable integration and also that the crystal was not slipping. If CrysX/Y/Z move too much it may reflect locally inaccurate estimates of the cell dimensions (Scalepack can get better global estimates via post-refinement) or a crystal that is slipping.
This next table shows the scale factor and B-factor for each frame (the reference frame has values of 1.0 and 0.0 respectively) along with the per-frame rejection counts and other factors. Large numbers of reflections getting thrown out in columns 2 and 3 might indicate problems.
 1 - count of observations deleted manually
 2 - count of observations deleted due to zero sigma or profile test
 3 - count of non-complete profiles (e.g. overloaded) observations
 4 - count of observations deleted due to sigma cutoff
 5 - count of observations deleted below low resolution limit
 6 - count of observations deleted above high resolution limit
 7 - count of partial observations
 8 - count of fully recorded observations used in scaling

                                           1    2    3    4    5    6    7    8
 IP fitted, no o    1   1.0000   0.00      0    0   10   14    0   86  530   69
 IP fitted, no o    2   1.0251   0.03      0    0    8   22    0   94  561   70
 IP fitted, no o    3   1.0010   3.47      0    0    3   14    0   97  558   72
 IP fitted, no o    4   0.9552  -0.97      0    1    3   16    0   98  559   73
 IP fitted, no o    5   0.9540   1.44      0    0    5   10    0   93  583   71
 IP fitted, no o    6   0.8678  -0.99      0    0    4   11    0   94  572   71
 IP fitted, no o    7   0.8164  -1.36      0    0    6   17    0  101  558   73
 IP fitted, no o    8   0.7598   1.63      0    0    7   14    0   94  541   76
 IP fitted, no o    9   0.7580   1.14      0    0    6   11    0   85  542   77
 IP fitted, no o   10   0.7767   0.99      0   30    5    8    0  100  529   67
 IP fitted, no o   11   0.6735  -6.04      0    0    4   10    0   91  544   77
 IP fitted, no o   12   0.7765   1.46      0    0    2   20    0   93  598   86
 IP fitted, no o   13   0.7682   1.22      0    0    3   14    0   87  592   74
 IP fitted, no o   14   0.7554   2.79      0    0    4   18    0   71  548   70
 IP fitted, no o   15   0.7511   1.59      0    1    9   12    0   94  538   85
 IP fitted, no o   16   0.7673  -1.46      0    0    4   13    0   95  512   73
 IP fitted, no o   17   0.7224   0.54      0    1    6   14    0   82  506   56
 IP fitted, no o   18   0.7446   2.19      0    1    3   15    0   81  605   75
 IP fitted, no o   19   0.6930  -0.18      0   10    4   11    0   89  640  101
 IP fitted, no o   20   0.7186   1.99      0    8    1    8    0   76  615   74
 IP fitted, no o   21   0.7007   2.12      0    1    1   15    0   88  571   77

Notice that this table immediately suggests something non-optimal about the scaling - the per-frame B-factors and scale factors vary too greatly between frames (e.g. the B-factor swings from -6.04 to 1.46 on successive frames 11 and 12). One should use SCALE RESTRAIN and B RESTRAIN in this case and rescale the data. I typically use SCALE RESTRAIN 0.1 (or 0.05) and B RESTRAIN 0.1 in cases where the scaling is not so well-behaved.
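To restrain the scaling you simply un-comment (and adjust) the restraint lines shown in the script above and rescale; the values here are the ones suggested above:

SCALE RESTRAIN 0.1        [ restrain frame-to-frame variation of the scale factor ]
B RESTRAIN 0.1            [ restrain frame-to-frame variation of the B-factor ]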
Scalepack usually gives very verbose listings of the reflections rejected on each pass - most of us don't look hard at those listings, but they can be useful in pathological cases. Of particular interest, though, is this table, which gives a frame-by-frame breakdown of data quality (chi**2 and R-factor), data strength, etc.
Summary of reflection intensities and R-factors by batch number
                                        All data               Linear
  Batch     # obs   # obs > 1   <I/sigma>    N. Chi**2          R-fac
      1       150        147        9.9        1.048            0.059
      2       245        243       10.3        1.081            0.071
      3       255        255       10.5        1.279            0.102
      4       268        268       10.6        1.084            0.072
      5       256        256        8.8        1.196            0.089
      6       267        267       10.6        1.082            0.077
      7       254        254        9.2        1.202            0.096
      8       269        269        9.6        1.113            0.081
      9       259        258        9.5        1.130            0.075
     10       246        239        8.8        1.136            0.089
     11       257        247        7.3        1.097            0.102
     12       277        266        8.1        1.154            0.100
      .         .          .          .            .                .
      .         .          .          .            .                .
    130        76         64        5.0        3.113            0.288
    131        76         70        4.4        4.853            0.523
    132        65         59        5.9        1.629            0.281
    133        63         58        5.4        2.546            0.198
    134        55         48        4.7        1.970            0.252
    135        70         65        4.6        3.052            0.304
    136        77         61        5.7        8.570            0.351
    137        83         81        3.9       20.434            0.748
    138        67         55        4.2        7.657            0.407
 All films  30407      30112        6.7        1.458            0.177

The first thing to notice is that there should be approximately the same number of observations (# obs) on each frame. In this example there are missing observations on the last frames due to severe spot profile overlaps; such phenomena should always be checked carefully in case you can recover the lost data. The chi**2 and R-factor for these latter frames are also pretty bad, indicating problems, and the data in these frames appears weak. Again, time to check that you've done the integration correctly.
This table tells us a lot about our data and should be checked carefully. It can also be used as a primitive strategy tool for data completeness: you are only adding new unique data when the #obs value is much greater than the #obs>1 value; if not, you are mostly just adding redundancy. Extra redundancy is not a bad thing if you want to collect those extra frames (in fact it modestly improves the quality of your dataset), but completeness is the more critical concern - datasets that are less than 85% complete overall are rarely useful for anything. So you can often tell whether you are just collecting more of the same data by looking for the discrepancy between those two columns: if #obs is substantially greater than #obs>1, you are still adding new unique data.
Finally, Scalepack prints out a whole raft of tables for our amusement, broken down by resolution. The overall redundancy here is 30407 (above table) divided by 4876 (below) so we have reasonably redundant data (6.2x). This is reflected in the first two tables where most of the data is observed a substantial number of times.
Shell Summary of observation redundancies by shells:
 Lower  Upper       No. of reflections with given No. of observations
 limit  limit     0     1     2     3     4   5-6   7-8  9-12 13-19   >19  total
 99.00   8.62   145    46    53    43    59   185    55    74     4     0    519
  8.62   6.84    36    55    46    27    51   167    51   114    12     0    523
  6.84   5.98    22    27    38    34    55   139    63   133     9     0    498
  5.98   5.43    20    38    46    52    36   133    49   140     9     0    503
  5.43   5.04     9    21    46    43    56   115    61   130    13     0    485
  5.04   4.74    28    32    36    27    49   110    69   131    14     0    468
  4.74   4.50    15    24    49    49    38   123    46   140     6     0    475
  4.50   4.31    26    20    53    50    45   112    62   131     7     0    480
  4.31   4.14    19    13    23    36    53   104    45   144    17     0    435
  4.14   4.00    30    19    53    52    49    89    70   148    10     0    490
 All hkl        350   295   443   413   491  1277   571  1285   101     0   4876

Shell Summary of observation redundancies:
 Lower  Upper       % of reflections with given No. of observations
 limit  limit     0     1     2     3     4   5-6   7-8  9-12 13-19   >19  total
 99.00   8.62  21.8   6.9   8.0   6.5   8.9  27.9   8.3  11.1   0.6   0.0   78.2
  8.62   6.84   6.4   9.8   8.2   4.8   9.1  29.9   9.1  20.4   2.1   0.0   93.6
  6.84   5.98   4.2   5.2   7.3   6.5  10.6  26.7  12.1  25.6   1.7   0.0   95.8
  5.98   5.43   3.8   7.3   8.8   9.9   6.9  25.4   9.4  26.8   1.7   0.0   96.2
  5.43   5.04   1.8   4.3   9.3   8.7  11.3  23.3  12.3  26.3   2.6   0.0   98.2
  5.04   4.74   5.6   6.5   7.3   5.4   9.9  22.2  13.9  26.4   2.8   0.0   94.4
  4.74   4.50   3.1   4.9  10.0  10.0   7.8  25.1   9.4  28.6   1.2   0.0   96.9
  4.50   4.31   5.1   4.0  10.5   9.9   8.9  22.1  12.3  25.9   1.4   0.0   94.9
  4.31   4.14   4.2   2.9   5.1   7.9  11.7  22.9   9.9  31.7   3.7   0.0   95.8
  4.14   4.00   5.8   3.7  10.2  10.0   9.4  17.1  13.5  28.5   1.9   0.0   94.2
 All hkl        6.7   5.6   8.5   7.9   9.4  24.4  10.9  24.6   1.9   0.0   93.3

The second table also shows the % completeness (the "total" column). Although most of the shells of data are complete in this example, the low resolution shell lacks completeness and this will affect the quality of the electron density map. This next table allows us to see how strong the data is as a function of resolution.
Shell I/Sigma in resolution shells:
 Lower  Upper       No. of reflections with I / Sigma less than
 limit  limit      0     1     2     3     5    10    20   >20  total
 99.00   8.62     16    25    33    49    69   103   172   347    519
  8.62   6.84     20    40    60    89   130   202   325   198    523
  6.84   5.98     17    39    60    91   143   250   393   105    498
  5.98   5.43     12    33    52    79   129   248   384   119    503
  5.43   5.04     12    26    45    73   119   222   360   125    485
  5.04   4.74      5    16    27    51    95   198   352   116    468
  4.74   4.50      5    13    28    41    91   194   346   129    475
  4.50   4.31     12    21    31    46    89   227   382    98    480
  4.31   4.14     17    34    57    84   140   272   390    45    435
  4.14   4.00     16    36    66   107   193   347   452    38    490
 All hkl         132   283   459   710  1198  2263  3556  1320   4876

The next table presents much the same data except that it also shows the data completeness. Data less than 80% complete overall is mostly worthless and there really is no excuse for collecting such data. Check this table carefully and aim for at least 90% completeness in all shells.
Shell I/Sigma in resolution shells:
 Lower  Upper       % of reflections with I / Sigma less than
 limit  limit      0     1     2     3     5    10    20   >20  total
 99.00   8.62    2.4   3.8   5.0   7.4  10.4  15.5  25.9  52.3   78.2
  8.62   6.84    3.6   7.2  10.7  15.9  23.3  36.1  58.1  35.4   93.6
  6.84   5.98    3.3   7.5  11.5  17.5  27.5  48.1  75.6  20.2   95.8
  5.98   5.43    2.3   6.3   9.9  15.1  24.7  47.4  73.4  22.8   96.2
  5.43   5.04    2.4   5.3   9.1  14.8  24.1  44.9  72.9  25.3   98.2
  5.04   4.74    1.0   3.2   5.4  10.3  19.2  39.9  71.0  23.4   94.4
  4.74   4.50    1.0   2.7   5.7   8.4  18.6  39.6  70.6  26.3   96.9
  4.50   4.31    2.4   4.2   6.1   9.1  17.6  44.9  75.5  19.4   94.9
  4.31   4.14    3.7   7.5  12.6  18.5  30.8  59.9  85.9   9.9   95.8
  4.14   4.00    3.1   6.9  12.7  20.6  37.1  66.7  86.9   7.3   94.2
 All hkl         2.5   5.4   8.8  13.6  22.9  43.3  68.0  25.3   93.3

Finally the summary table presents the mean I/sigmaI for each shell, the chi**2 (here a little above 1.0, so we should increase the error scale factor), and the R-factor. Although traditionally the most-quoted value, the R-factor is so dependent on redundancy that its actual utility for assessing data quality is questionable - the R-factor generally increases with increasing redundancy (implying higher error) while the actual quality of the data usually improves (lower error). If we were fastidious about adhering to the Scalepack error model, the overall chi**2 of 1.5 would induce us to increase the Error Scale Factor and rerun Scalepack.
Summary of reflections intensities and R-factors by shells
 R linear = SUM ( ABS(I - mean(I))) / SUM (I)
 R square = SUM ( (I - mean(I)) ** 2) / SUM (I ** 2)
 Chi**2   = SUM ( (I - mean(I)) ** 2) / (Error ** 2 * N / (N-1) ) )
 In all sums single measurements are excluded

 Shell  Lower  Upper    Average   Average   Norm.   Linear  Square
 limit     Angstrom        I       error    stat.   Chi**2   R-fac   R-fac
        99.00   8.62     617.4      22.7     20.8    2.357   0.086   0.435
         8.62   6.84     260.2      18.2     17.2    1.516   0.162   0.459
         6.84   5.98     162.3      13.8     13.3    1.324   0.250   0.771
         5.98   5.43     199.6      16.8     16.2    1.431   0.206   0.314
         5.43   5.04     212.0      14.9     14.2    1.479   0.171   0.196
         5.04   4.74     266.2      19.8     18.3    1.558   0.179   0.223
         4.74   4.50     303.9      21.0     19.2    1.532   0.172   0.204
         4.50   4.31     259.2      20.0     18.8    1.444   0.189   0.217
         4.31   4.14     157.0      16.7     16.2    1.378   0.263   0.304
         4.14   4.00     138.4      17.6     17.1    1.325   0.321   0.404
  All reflections        260.5      18.2     17.1    1.521   0.177   0.388

Finally, Scalepack lists the observed intensities and I/sigmaI for any systematic absences in the space group you specified. Even if the I/sigmaI looks high, you should compare it to the mean intensity for that resolution shell in the table above. For example (0,0,31) has an I/sigmaI of 2.9, but it is an 18 Angstrom resolution reflection and its intensity (24.7) is only 4% of the average intensity in that shell of data. So it's still pretty weak, and probably absent. Notably, however, I've been fooled by systematic absences in data. It's better to determine the space group definitively via the heavy atom substructure or the molecular replacement solution - these are sensitive tests involving all the data, not just the 10-100 or so reflections along the reciprocal space lattice axes.
Intensities of systematic absences
   h   k   l    Intensity   Sigma   I/Sigma
   0   0  26        6.6      2.8      2.4
   0   0  27       -5.9      3.3     -1.8
   0   0  28       -1.2      2.8     -0.4
   0   0  29       10.2      6.0      1.7
   0   0  31       24.7      8.5      2.9
   0   0  32        0.0      2.8      0.0
   0   0  33       -4.7      2.5     -1.9
The Denzo manual does a good job of showing different Scalepack input files to be used in a variety of different situations. Rather than re-invent the wheel here, just go look at the examples for Scalepack in this manual.
Figuring Out Your Space Group
Denzo indexes and integrates data based on the physical dimensions of the lattice it found during auto-indexing, but it doesn't compare the intensities of reflections during integration. Therefore one must use Scalepack to determine the point group symmetry during scaling. Most of the time you cannot uniquely determine which space group you have from Scalepack statistics alone, because several space groups belong to the same point group (e.g. P222, P2221, P21212 and P212121 all belong to point group 222). Systematic absences along the principal axes can sometimes help you distinguish them (e.g. P222 has no systematic absences, but P2221 has 00L absent for L=odd), but the best way to tell is to find the heavy atom substructure or molecular replacement solution in a specific space group. As ever, try all the possible space groups: if one is obviously better than the others, that's a good indication; if they all look the same, you probably haven't found any solution at all.

What follows is a description of how to tell what point group and space group you have. For some lattices there is only one possible point group (one line in the table above), but for certain others there is an inherent ambiguity.
I'm going to use primitive hexagonal as an example lattice here, but the concepts apply to all lattices. With lattices displaying a physical shape consistent with hexagonal, there are 4 possible point groups, sorted here in ascending symmetry:
3 < 32 (312 + 321) = 6 < 622
If you have a full or partial dataset, you can get a good handle on the point group by scaling in different space groups. For example, all hexagonal lattices are compatible with point group 3, so if you give Scalepack the space group P3 you can get some baseline statistics. If it doesn't scale in P3, either you have a real problem with the integration (check by scaling in P1) or it's really not a trigonal/hexagonal lattice at all. Try re-integrating with a different lattice if it fails to scale well in P3.
Point group 3 is always possible, but data belonging to point groups 32, 6 and 622 all scale perfectly well in point group 3 as well. So you should try each point group to see which ones your data scales well in. In Scalepack you do this by trying a representative space group from each, for example: P3, P321, P312, P6, P622. Since you know what a "reasonable" overall chi**2 is (1.0, or the value from the scaling in P3), it becomes very obvious when you've got the wrong point group. In a quirk, space groups P321 and P312 are incompatible with each other even though both belong to point group 32 - I think this is the only example of this, and I tend to treat point group 32 as really being two point groups, 312 and 321, to deal with this issue. If the data scales in P321 it's not going to scale in P312 (or vice versa) unless it's really point group 622. So you simply try all possible point groups and typically choose the highest-symmetry one that is consistent with your data. Note that your Rsymm will probably increase a little if you scale in P622 instead of P3, because your redundancy is typically four times higher in P622 than in P3. Point groups 622 and 32 (321 type) are common, 6 is less common, and 3 and 32 (312 type) border on rare.
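Concretely, this just means running the same scaling script several times, changing only the SPACE GROUP line each time and comparing the overall chi**2 (and Rsymm) of each run - a sketch of the candidate lines for a hexagonal lattice, used one at a time:

SPACE GROUP P3        [ baseline - every trigonal/hexagonal lattice scales in P3 ]
SPACE GROUP P321      [ one representative space group per candidate point group ]
SPACE GROUP P312
SPACE GROUP P6
SPACE GROUP P622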
Then you can try your luck at choosing a space group. The only way to do this within Scalepack is to inspect the intensities and I/sigmaI values of the potential systematic absences. In the table below I give the expected systematic absences for certain symmetry elements along certain crystallographic axes. If you have strong reflections in the absences list, then it's probably not that symmetry. However nearly all of us have been fooled by this at some point, since it is sensitive to the correct integration of just a few reflections out of tens of thousands - I have seen systematic absences in a P212121 space group that made it look a lot like P21212, over multiple datasets, despite the fact that the structure was really P212121. It's best not to assume too much about the space group, even though your estimate of the point group may be reliable.
Symmetry axis | Along | Condition | Notes |
---|---|---|---|
21 | A | H00 absent for H=odd | |
21 | B | 0K0 absent for K=odd | |
21 | C | 00L absent for L=odd | |
31 | C | 00L absent for L<>3n | |
32 | C | 00L absent for L<>3n | |
41 | C | 00L absent for L<>4n | |
42 | C | 00L absent for L<>2n | Like 21 |
43 | C | 00L absent for L<>4n | |
61 | C | 00L absent for L<>6n | |
62 | C | 00L absent for L<>3n | Like 32 |
63 | C | 00L absent for L<>2n | Like 21 |
64 | C | 00L absent for L<>3n | Like 32 |
65 | C | 00L absent for L<>6n |
Pure rotation axes (2, 3, 4, 6) NEVER give rise to systematic absences along H00, 0K0, 00L - only screw axes do. Note also that enantiomorphic space groups (e.g. P3121 and P3221; P41212 and P43212) cannot be distinguished by any method within Scalepack. You have to try both members of the enantiomorphic pair when solving the structure.
Note that non-primitive lattices like C, F and I have global systematic absences because their lattices are centered (e.g. for C-lattice space groups like C2, all reflections with h+k=odd are missing from the entire dataset). However Denzo doesn't even attempt to integrate these "missing" reflections, since they are not present, and therefore they never show up in the reflection lists.
Overlaps, Overloads and Other Glitches
Overlaps occur when one spot encroaches on the spot region of another. This happens mainly in three cases: your crystal-to-detector distance is set too short for the cell dimensions; the oscillation range for each frame is too large; or the mosaicity is large. You can't change the last one, but the first two are within your control. Move the detector further back and the spots separate more, at the cost of reduced maximum resolution at the edge of the detector. Alternatively, reduce the oscillation range per frame - we rarely use more than 1.0 degree per frame except for really small unit cells, and we have used as little as 0.4 degrees per frame. However you won't gain much if your mosaicity is much larger than the frame width; generally speaking you don't gain anything once your frame width is less than about 2/3 of the mosaicity.

Overloads are saturated pixels. CCDs at synchrotrons have a limited dynamic range. If you expose your crystal for a long time, the more intense low-resolution spots will typically saturate and be rejected, since they exceed the pixel overload value in the file (for a 16-bit data frame the overload value is ~65,000). There are two ways to avoid this: reduce your exposure time (your data will get weaker), or collect a second low-resolution pass with larger frame widths and shorter exposure times to recover the data that was overloaded the first time around. Always collect the high resolution pass first, since radiation damage affects high resolution reflections more than low resolution ones. You then process the second, low-resolution pass at a lower resolution than the high-resolution pass and merge the whole lot together in Scalepack (see the sketch below).
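A minimal sketch of what the merging step might look like in the Scalepack command file, reusing the SECTOR / FILE syntax from the scaling script earlier; the file names, frame ranges and FILE numbers here are hypothetical, and the low-resolution pass is assumed to have been integrated at lower resolution in Denzo:

SECTOR 1 TO 180 FILE 1   'hires_1_###.x'    [ high resolution pass, collected and processed first ]
SECTOR 1 TO 90  FILE 201 'lores_1_###.x'    [ low resolution pass with shorter exposures / wider frames ]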
Missing low resolution data typically occurs because the beam stop is too close. The closer the beam stop is to the crystal, the larger the beam stop shadow becomes and the more low resolution data is lost. Although it is often an advantage to keep the beam stop reasonably close, it's also good to keep an eye on this phenomenon. Note: for partial datasets it's normal for the data completeness in the higher resolution shells to be a little higher than in the lower resolution shells during data collection - this is a consequence of the curvature of the Ewald sphere.
Other glitches are myriad: beam instability at synchrotrons (keep an eye on those ion counters), mis-centered crystals (the frames start to appear weak, with low scale factors in Scalepack), split spots, twinned crystals, beam dumps, etc. The key point is that the more attention you pay to your data collection and processing, the better your data will be. The per-frame scale and B-factors reveal a lot about basic data collection performance; the values of chi**2 and Rsymm reveal a lot about the quality of your crystal. There are plenty of bad crystals around, but there's no excuse for collecting bad data on good crystals.
Processing MAD Data
Processing MAD data presents the same basic challenges as processing conventional data (it is, after all, still diffraction data). However the anomalous signal in MAD data is often very small, so a GREAT deal of care must be taken during data integration and scaling.

One important point is that the same crystal orientation angles should be used for all 3 wavelengths during integration, otherwise the anomalous signal could conceivably get mixed up (this shouldn't be an issue with Denzo, which should index the data in the same absolute hand each time). To do this, auto-index the first dataset in the conventional manner, but then use the auto-indexed values to integrate the subsequent datasets. To do this, create a known.dat file that looks like an index.dat file with a few modifications:
format ccd adsc unsupported-q4
box print 2.4 2.4
spot elliptical 0.5 0.60 0.0
background elliptical 0.6 0.7 0.0
overlap spot
wavelength 1.009315
monochromator 0.9
air absorption length 3450
overload value 55000
error density 0.15
error systematic 5.0 partiality 0.1 positional 0.015
Y scale -1.0 skew 0.000
film rotation 90.0
film length 187.8 width 187.8
top margin 0.4 left margin 0.4
resolution limits 20. 3.7
mosaicity 1.0
profile fitting radius 20.0
space group C222
distance 250.00
X beam 88.6 Y beam 97.2
oscillation start 0.0 step -1.00 range 1.00
[oscillation start 0.0 range -1.0]
[Note - oscillation step is negative for F1, F2; positive for A1]
TITLE 'on F1 q4 at chess'
raw data file '../mb150_1_###.img'
film output file 'mb150_1_###.x'
write predictions
unit cell 162.525 228.512 81.630 90.000 90.000 90.000
crystal rotx -91.623 roty -10.938 rotz -141.952
cassette rotx -0.01 roty -0.57 rotz 0.00
2 theta 0.00
distance 252.48
x beam 88.658 y beam 97.158
crossfire y -0.027 x 0.002 xy 0.014
oscillation start 0.
sector 1

First note that we have got rid of the peak search file 'peaks.file' line, since that is what causes Denzo to auto-index the data, which is what we want to avoid. Then we take some information from a ".x" file whose integration we trust. Minimally we must supply the unit cell and crystal rotx/y/z values. It's also a good idea to update the cassette rotx/y/z (detector skew), the distance and the direct beam position. Create a new known.dat file for each dataset, changing the file names and the wavelength. If you collect MAD data in the way we normally do, then things like distance and starting position will be the same for all the datasets.
We then use this to integrate as follows:
denzo
@known.dat
oscillation start 0.
sector 1 to 20
@integrate.dat
stop

For scaling MAD data it is often useful to be exceptionally careful about scaling parameters. We often use a method which scales all three wavelengths together in one scale.com, then uses these scale factors in subsequent scale files for each wavelength:
NUMBER OF ZONES 10
ESTIMATED ERROR 0.030 0.030 0.030 0.030 0.030 0.030 0.040 0.040 0.040 0.040
RESOLUTION 2.5
IGNORE OVERLOADS
REJECTION PROBABILITY 0.000001
WRITE REJECTION FILE 0.5
ERROR SCALE FACTOR 2.2
ADD PARTIALS 1 TO 60, 101 TO 160, 201 TO 260, 301 TO 360, 401 TO 460, 501 TO 560, 601 TO 625
SPACE GROUP P212121
[SCALE RESTRAIN 0.01]
[B RESTRAIN 0.1]
POSTREFINE 10
FIT CRYSTAL MOSXX 1 TO 60, 101 TO 160, 201 TO 260, 301 TO 360, 401 TO 460, 501 TO 560, 601 TO 625
FIT BATCH ROTX 1 TO 60, 101 TO 160, 201 TO 260, 301 TO 360, 401 TO 460, 501 TO 560, 601 TO 625
FIT BATCH ROTY 1 TO 60, 101 TO 160, 201 TO 260, 301 TO 360, 401 TO 460, 501 TO 560, 601 TO 625
FIT CRYSTAL A* 1 TO 60, 101 TO 160, 201 TO 260, 301 TO 360, 401 TO 460, 501 TO 560, 601 TO 625
FIT CRYSTAL B* 1 TO 60, 101 TO 160, 201 TO 260, 301 TO 360, 401 TO 460, 501 TO 560, 601 TO 625
FIT CRYSTAL C* 1 TO 60, 101 TO 160, 201 TO 260, 301 TO 360, 401 TO 460, 501 TO 560, 601 TO 625
END FIT
@reject
OUTPUT FILE 'scale_all.hkl'
REFERENCE FILM 1
FORMAT DENZO_IP
SECTOR 1 to 60 FILE 1 '../l1p/l1p_1_###.x'
SECTOR 1 to 60 FILE 101 '../l1n/l1n_1_###.x'
SECTOR 1 to 60 FILE 201 '../l2p/l2p_1_###.x'
SECTOR 1 to 60 FILE 301 '../l2n/l2n_1_###.x'
SECTOR 1 to 60 FILE 401 '../l3p/l3p_1_###.x'
SECTOR 1 to 60 FILE 501 '../l3n/l3n_1_###.x'
SECTOR 1 to 25 FILE 601 '../l4p/l4p_1_###.x'

which scales data from all three wavelengths (l1p/n, l2p/n, l3p/n) together. We then hack the refined scale and B-factors out of the Scalepack log file to create per-wavelength scaling files:
NUMBER OF ZONES 10
ESTIMATED ERROR 0.030 0.030 0.030 0.030 0.030 0.030 0.040 0.040 0.040 0.040
RESOLUTION 2.7
IGNORE OVERLOADS
REJECTION PROBABILITY 0.000001
WRITE REJECTION FILE 0.5
ERROR SCALE FACTOR 1.500
ADD PARTIALS 1 to 60, 101 to 160
SPACE GROUP p212121
INITIAL SCALE
   1 1.0000   2 1.0079   3 0.9966   4 1.0029   5 0.9920   6 0.9959   7 1.0082   8 1.0019   9 0.9992  10 1.0365
  11 1.0105  12 0.9950  13 1.0243  14 1.0335  15 1.0359  16 0.9671  17 0.9618  18 0.9776  19 0.9566  20 0.9814
  21 0.9874  22 0.9947  23 1.0099  24 1.0183  25 1.0447  26 1.0490  27 1.0816  28 1.0907  29 1.1033  30 1.1307
  31 1.0189  32 1.0490  33 1.0856  34 1.0513  35 1.1064  36 1.1247  37 1.1401  38 1.1623  39 1.1721  40 1.2152
  41 1.2312  42 1.2234  43 1.2600  44 1.2924  45 1.3380  46 1.2820  47 1.3188  48 1.3429  49 1.3778  50 1.4128
  51 1.4299  52 1.5028  53 1.4968  54 1.5213  55 1.5223  56 1.5417  57 1.5847  58 1.5760  59 1.5821  60 1.6183
 101 1.0130 102 1.0164 103 1.0041 104 1.0118 105 1.0064 106 0.9940 107 1.0103 108 0.9947 109 1.0230 110 1.0071
 111 1.0090 112 1.0181 113 1.0161 114 1.0291 115 1.0348 116 0.9611 117 0.9629 118 0.9729 119 0.9644 120 0.9955
 121 0.9918 122 0.9948 123 1.0200 124 1.0187 125 1.0491 126 1.0649 127 1.0775 128 1.1079 129 1.1104 130 1.1417
 131 1.0692 132 1.1063 133 1.1350 134 1.1436 135 1.1895 136 1.2045 137 1.2318 138 1.2695 139 1.2832 140 1.3234
 141 1.3504 142 1.3780 143 1.4327 144 1.4762 145 1.4917 146 1.4543 147 1.5094 148 1.5126 149 1.5599 150 1.5513
 151 1.5989 152 1.6041 153 1.6317 154 1.6185 155 1.6073 156 1.6294 157 1.6461 158 1.6244 159 1.6360 160 1.6160
INITIAL B FACTOR
   1 0.00   2 0.13   3 -0.07   4 -0.13   5 -0.18   6 -0.24   7 0.16   8 -0.41   9 -0.25  10 0.39
  11 -0.57  12 -0.89  13 -0.41  14 0.02  15 -0.35  16 0.56  17 0.00  18 0.94  19 -0.28  20 0.30
  21 -0.06  22 0.44  23 0.15  24 0.04  25 0.76  26 0.53  27 0.48  28 0.36  29 0.80  30 1.44
  31 1.41  32 1.48  33 2.25  34 1.47  35 2.47  36 2.15  37 2.41  38 2.61  39 2.32  40 3.09
  41 3.10  42 2.66  43 3.34  44 2.83  45 3.56  46 3.36  47 3.37  48 3.51  49 4.06  50 4.53
  51 3.61  52 4.50  53 4.30  54 4.00  55 4.25  56 4.02  57 4.77  58 4.69  59 4.76  60 5.20
 101 0.54 102 0.42 103 0.18 104 0.31 105 0.27 106 -0.16 107 0.36 108 -0.09 109 0.46 110 0.18
 111 -0.20 112 0.07 113 -0.18 114 0.19 115 -0.12 116 0.42 117 0.33 118 0.83 119 0.33 120 0.61
 121 0.00 122 0.35 123 0.39 124 0.13 125 0.78 126 0.83 127 0.75 128 0.86 129 0.58 130 1.32
 131 1.13 132 1.38 133 1.58 134 1.44 135 2.01 136 1.67 137 2.00 138 2.40 139 2.06 140 2.26
 141 2.49 142 2.86 143 2.97 144 3.46 145 3.62 146 3.72 147 4.38 148 3.89 149 4.81 150 4.72
 151 4.83 152 5.06 153 5.68 154 5.23 155 5.27 156 5.53 157 5.87 158 5.87 159 5.74 160 5.89
NUMBER OF ITERATIONS 0
POSTREFINE 10
FIT CRYSTAL MOSXX 1 to 60, 101 to 160
FIT BATCH ROTX 1 to 60, 101 to 160
FIT BATCH ROTY 1 to 60, 101 to 160
FIT CRYSTAL A* 1 to 60, 101 to 160
FIT CRYSTAL B* 1 to 60, 101 to 160
FIT CRYSTAL C* 1 to 60, 101 to 160
END FIT
@reject_l1pn
OUTPUT FILE 'l1pn_ano.hkl'
[REFERENCE FILM 1]
FORMAT DENZO_IP
SECTOR 1 to 60 FILE 1 '../l1p/l1p_1_###.x'
SECTOR 1 to 60 FILE 101 '../l1n/l1n_1_###.x'    [omit this in the first few passes, then include it later]
ANOMALOUS

Note that the NUMBER OF ITERATIONS 0 line prevents refinement of these scale/B-factors - you just cut and paste the relevant scale factors from the scale_all log file. (These files were from Woo Joo's successful MAD phasing of the p53:BP1 complex using the Zn signal, collected on X4A using data with embedded ice.)
In order to use the anomalous data for phasing within CCP4 or some other program you need to write out .hkl files with the anomalous data (I+ and I-) kept separate. The keyword ANOMALOUS just writes out the I+ and I- data separately in the output file, but treats I+ and I- as equivalent during scaling. This works fine for small anomalous scattering signals and is especially important for data with low redundancy. SCALE ANOMALOUS treats I+ and I- as distinct during scaling, and should not be used unless you have a lot of redundancy and/or a relatively large anomalous signal. For SOLVE, output the data with NO MERGE ORIGINAL INDEX specified in the Scalepack file.
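For instance, the tail of the command file might look like one of the following, depending on the downstream program - a sketch only; the output file names are hypothetical and you should check the Scalepack manual for the exact behaviour of these keywords:

OUTPUT FILE 'l1pn_ano.hkl'
ANOMALOUS                      [ I+ and I- treated as equivalent in scaling but written out separately - fine for CCP4 ]

OUTPUT FILE 'l1pn_unmerged.sca'
NO MERGE ORIGINAL INDEX        [ unmerged observations with original indices, as expected by SOLVE ]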