Originally created May 10th 2002
Revised Sept 16th 2002 to include MAD examples, another spot profile
Sundry small revisions Feb 2005, more in Nov 2006
Introduction
This is not meant to teach you anything about how to collect data on home or synchrotron sources. This is intended to teach you how to process the data once you have collected it. Collecting 180 degrees of data "just in case" is not a substitute for using your brain and experience to figure out how to collect it better.

It is a very good idea to process your data while you are collecting it. Frequently you can catch errors in data collection strategy or machine problems by doing this. If you wait until the end of the data collection you have no recourse - you are stuck with what you have, because your crystal has long since died from radiation damage.
There is an alternative data processing tutorial accessible on the Web. Or at least there was.
Where are my frames?
At the synchrotron: it's best to run a few test frames before you start and see where the beamline software is writing them. Beamline policies vary. At NSLS X25 and X29 the latest "upgrades" to CBASS do something asinine like write successive test images collected at 0 and 90 degrees into different directories. (Heaven knows what the software developer was thinking.) However, contiguous data collection wedges are usually written into one directory. This might not have the numbering you expect it to, so the best thing is to watch the status window and look for something like /img11/data1/pxuser/myname/ ...

At Princeton: that's a good question. They're usually wherever you put them when you FTP'd them off the host machine. If there's space, put them somewhere on Xray8 temporarily, say in /usr/people5/data/public, then back them up and remove them once you have processed the data. Xray8 is currently the fastest box that can do data processing, and it's a lot faster if the images are on the local disk because I/O over the network is s-l-o-w. With the upcoming data collection machine upgrade, the images will be accessible on the data collection computer itself.
I have recently made some disk space available in the directory /usr/people5/data/public for the purposes of storing frames for data processing. Here's what you do to copy your frames:
cd /usr/people5/data/public
find /collect -mtime -1 -type f -size +3000
cp /wherever/you/found/them/*.osc .

The first line should be pretty obvious. The second line gives you a list of all recently created (-1 = within the last day) large (+3000 = more than 3K blocks) files on /collect; your frames should be amongst those. The last line simply copies the data from the desired location reported by find into the current directory (i.e. /usr/people5/data/public).
Auto-Indexing Your Data
What follows is specific to the HKL suite of programs, namely xdisp, denzo and scalepack. There are alternative data processing programs (e.g. MOSFLM, XDS) but we do not use them, or at least we're too lazy to try most of the time. One would expect very similar results from HKL and MOSFLM (or we're all in for a lot of data re-processing). Some people have suggested that MOSFLM can give slightly more complete data in difficult situations - it's certainly worth a try. Some people have suggested that DENZO and SCALEPACK are the worst of these three options, but the jury is still out on that one and the difference is generally not large even if it exists. HKL has been repackaged with a graphical user interface (GUI) as HKL2000. Being a Luddite I react to HKL2000 with something akin to loathing, since it has some really ill-advised restrictions on how many times you can attempt to index your data that make it about 10x less convenient to process troublesome data than just hacking the command files. There are also now meta-processing programs like autoPROC and Xia2 that drive other data processing programs (usually XDS or MOSFLM). XDS and MOSFLM are better choices than HKL if you don't already have HKL.

The pattern of spots on the detector depends on basic detector geometry (how big it is, where it is with respect to your crystal), the X-ray wavelength, the direct beam position, the unit cell dimensions and the orientation of the crystal. Most of these things are known beforehand to some approximation, so the data processing software has to find a total of nine parameters (6 unit cell parameters, 3 crystal "missetting" angles) with which to best describe the diffraction pattern. A method called "auto-indexing" finds these parameters. The missetting angles tell the programs what angle your crystal is sitting at relative to some standard position. Often you know your cell dimensions already, which turns out to be a good sanity check for the results.
I'm assuming here that you know that a "spot" on the detector is nothing more than the detector intercepting a beam of scattered X-rays that is locally intense because of X-ray diffraction and the underlying scattering by the crystal. The "spot" in real space is related to a point (region) in diffraction space that is formed from the reciprocal space unit cell. Once the crystal is rotated such that the reciprocal lattice point is in diffraction condition, where the spot occurs is a matter of simple geometry. Since the relationship between the real space unit cell and reciprocal space unit cell is fixed, it's also a matter of simple geometry (and the Ewald sphere construction) to work out what spots are in diffraction condition.
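To make the "simple geometry" concrete: for the simple case of a flat detector normal to the beam (no 2theta swing) at crystal-to-detector distance D, a reflection with Bragg spacing d diffracts through an angle 2θ and lands at a radius r from the direct beam position, so

$$ \lambda = 2d\sin\theta, \qquad r = D\tan 2\theta \;\;\Longrightarrow\;\; d = \frac{\lambda}{2\sin\!\left(\tfrac{1}{2}\arctan(r/D)\right)} $$

This is also why XDISP can only quote sensible resolution numbers once it has been told the distance, wavelength and beam position (see below).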
The first step in figuring out what your unit cell and crystal missetting angles are is to pick spots to tell Denzo where the diffraction maxima are on the frame. Launch XDISP to view your frame with the following syntax:
Detector | Command |
---|---|
Raxis IIc | xdisp raxis name_of_file |
Raxis IV (e.g. home) | xdisp raxis4 100 name_of_file |
Quantum 4 CCD (e.g. CHESS) | xdisp ccd adsc unsupported-q4 name_of_file |
MAR 165 CCD (e.g. the old X9A) | xdisp ccd unsupported-m165 name_of_file |
ADSC Q315 (e.g. X29) | xdisp ccd unsupported-q315 binned name_of_file |
XDISP is /programs2/hkl/HKLsgi_1.97.2/xdisp on the SGIs, and /programs2/denzo/xdisp on Helium. If you've done source /labcshrc in your .cshrc, the command sethkl7 will create the appropriate aliases.
The screen on XDISP looks like this:

This is a synchrotron frame from NSLS X9A - the white horizontal band through the middle is the beamstop holder shadow. The white central disk is the beamstop shadow - synchrotrons typically have horizontal rotation axes and therefore horizontal beamstops. Atypically for a CCD detector the Mar-165 detector is round (they are usually square). At home the beamstop would be oriented vertically (vertical rotation axis) and the detector is square.
There are various options on this screen, not all of which are important here. You may prefer the color representation (color button) to the black-and-white representation.

Adjust the view of the frame to your taste using the "Dim" and "Bright" buttons. You can take a closer look at areas of the image using the zoom window and you select areas to be looked at using the middle mouse button.

Middle mouse works in both the main window and the zoom window. "Zoom In" and "Zoom Out" are obvious. If you zoom in far enough the actual numeric pixel values are displayed. For the purposes of auto-indexing, you want to press the "Peak Search" button and select "More Peaks", "Fewer Peaks" or "OK" depending on how many it finds. Generally a few hundred peaks is sufficient to get auto indexing to work OK. Red circles surround the peaks selected. It's OK to have peaks that are not on diffraction spots - Denzo is good at sorting out the real peaks from the bogus ones.

Leave the XDISP window open, and start Denzo. You may want to put XDISP into the background (via ^Z, bg or the & syntax) to run Denzo in the same terminal window. Note that the resolution estimates in XDISP are wrong until you do auto-indexing, since XDISP knows nothing about the experimental setup until Denzo tells it something.
Denzo needs to know quite a lot about your experiment to get started. Below is an annotated "index.dat" file that can be read into Denzo. Comments in Denzo are enclosed within square brackets []. Everything else is a command. Start indexing by the following procedure ($ represents a Unix command line prompt here):
$ denzo @index.dat

The "@" syntax tells Denzo to read command data from the file. Denzo is /programs2/hkl/HKLsgi_1.97.2/denzo on the SGIs, /programs2/denzo/denzo on Helium, and something else altogether at X9A/X29/X25/CHESS.
Here are the contents of index.dat for an Raxis IV:
format raxis4                        [ Tell Denzo what type of detector you are using ]
                                     [ This also sets default parameters for detector and ]
                                     [ goniostat geometry etc ]
monochromator filter                 [ This is the type of X-ray optics i.e. mirrors ]

[ Relatively constant values for a given machine ]
wavelength 1.5418                    [ Wavelength of X-rays. 1.5418 is CuKalpha ]
Y scale 1.000                        [ The non-squareness of pixels, in this case square ]
film rotation 180.0                  [ Relative orientation of the detector ]

[ Detector values that may vary for a particular experiment ]
distance 204.5                       [ Distance from the crystal to the detector ]
X beam 145.2                         [ Where the direct beam would strike the detector ]
Y beam 149.4                         [ These values may change with distance, 2theta ]

[ Things that need to be optimized for each crystal ]
spot elliptical 0.60 0.60 0.0        [ The size/shape of the actual spot ]
background elliptical 0.70 0.70 0.0  [ Size/shape of guard region around spot ]
box print 2.1 2.1                    [ Size/shape of background box ]
overlap spot                         [ Reject spots that overlap ]
profile fitting radius 30.0          [ Parameter that affects how spot profiles ]
                                     [ are calculated ]

[ Mosaicity is a per-crystal parameter that describes spot width in the ]
[ oscillation direction - set a reasonable value here and change it based ]
[ on the results of Scalepack after processing the first 10 frames ]
mosaicity 0.4

resolution limits 20.0 3.2           [ Not critical, but set to reasonable value ]
space group P422                     [ If you know this enter it here, otherwise use P1 ]

[ Indexing parameter tweaks only necessary in difficult cases - consult ]
[ the Denzo manual ]
[ longest vector 200 ]
[ weak level 5 ]

[ Where the frames are, what they are called, how wide they are ]
oscillation range 1.0                [ 1.0 is typical ]
oscillation start 0.0                [ Arbitrary - usually best to leave this at zero ]
                                     [ Some people prefer to change it to the actual phi ]
[ Denzo inserts integer sector numbers into ### ]
raw data file '/data2/ccd/mskcc/amoac12/l1p_1_###.img'
film output file 'l1p_1_###.x'
TITLE 'my data'
sector 1                             [ This is the frame number - usually we index the ]
                                     [ first frame of the dataset - must correspond to ]
                                     [ the frame number you picked spots from ]
peak search file 'peaks.file'        [ Tells Denzo to do auto-indexing ]
write predictions
[ Some commands to control the formatting of output ]
print statistics
print zones
go                                   [ Start the indexing ]
Change things like the direct beam position, the crystal to detector distance, frame width and name etc and run the script in Denzo. For parameters you don't know anything about, leave them at sensible defaults, and then go read the HKL manual as to what they mean. This particular indexing file works for the RaxisIV and RaxisIV++ detectors on our home sources. Indexing files for other locations (e.g. synchrotrons) differ somewhat, especially in detector type and geometry. This script file is the one I used for 1.97 Denzo. For versions of Denzo with HKL2000 it probably makes more sense to find the corresponding command file and edit that, since parameters might change with major version changes.
Auto-indexing is most sensitive to the oscillation range, direct beam position, distance and wavelength. Make sure you have these parameters correct. Synchrotrons often have partially-transparent beamstops so you can often "see" the direct beam position. On a home source the direct beam position is usually listed on the machine. Synchrotrons also have notoriously incorrect detector distances, so it is wise not to put too much stock in them. Unless you know what your space group or point group for this crystal is, set the space group to P1, at least initially.
Sometimes Denzo refuses to index. It gives you a candidate list of reasons why this happens, but often it's just user error - you tried to index a different frame than the one you picked spots from, or you have bad parameters in the index.dat file (wrong distance, wrong wavelength etc). If you did everything right, then try picking more or fewer spots and running indexing again. Or change the mosaicity. Notice that during indexing the spots that Denzo is using turn green in the XDISP display. If there are virtually no green circles shown on the frame, that's a pretty good indication of trouble.
There is an alternative indexing server at http://adder.lbl.gov/labelit/ which you might try if you are desperate. However it's MOSFLM oriented so you'll have to convert between MOSFLM and HKL conventions (there's a denzo2mosflm program on the web, but not the inverse program so you're going to have to do some hacking).
The output from Denzo is as follows (just to be awkward I used a different indexing file from the example above):
Oscillation data processing
Title: Mar CCD data
Wavelength (A) 1.0000
Raster size (mm) 7.93480E-02
Raster size (mm) 7.93480E-02
Film width (mm) 162.50 (default)
Film length (mm) 162.50 (default)
Record length (pixels) 2048 (default)
Number of records 2048 (default)
Top limit of useful data 0.00000E+00 (default)
Left limit of useful data 0.00000E+00 (default)
spots rejected when pixel overflow at value : 64000.0
pixels rejected at value: 0
Oscillation starts at 0.00000E+00
Oscillation range 1.0000
Lattice type: primitive
Orientation axis 1 (vertical plane) 0*h 1*k 0*l
Orientation axis 2 (spindle) 0*h 0*k 1*l
Mosaicity 0.25000
CrysZ (beam) axis 0.00000E+00 (default)
CrysY (vertical) axis 0.00000E+00 (default)
CrysX (spindle) axis 0.00000E+00 (default)
unit cell parameters not entered
Detector (mis)orientation angles:
CassZ (beam) axis 0.00000E+00 (default)
CassY (vertical) axis 0.00000E+00 (default)
CassX (spindle) axis 0.00000E+00 (default)
Detector 2 theta 0.00000E+00 (default)
Detector rotation -90.000
Flat detector (default)
Detector to crystal distance 350.00
X beam 81.500
Y beam 82.400
Beam polarization 0.98000
Detector absorption 100.00 (default)
Air absorption length 3450.0
Crossfire y 0.00000E+00 (default)
Crossfire x 0.00000E+00 (default)
Crossfire xy 0.00000E+00 (default)
Horizontal box size 2.3804
Vertical box size 2.3804
Overlap type : none
Raw data file /xtreme4/data2/raxis/X9A/ctmp4/ctmp4_2_001.img
Error increase due to pixel overflow 1.58696E-02 (default)
Error in measurments of optical density 0.15000
Minimum positional error 5.00000E-02
Error increase when too close to X axis 0.20000 (default)
Error of partiality 0.10000
Systematic error factor 5.0000
Resolution in the corner 3.153 edge 4.334 half corner 3.605
Highest resolution 2.8000
Lowest resolution 30.000
Spot too weak for refinement when below sigma * 3.0000
Beam spot not used in refinement (default)
Profile fitting radius 20.000
peak search file peaks.file has 192 peaks
Vector lengths in autoindexing from 25.0 to 667 Angstroms
Volume of the primitive cell 1560777.
Lattice                  Metric tensor      Best cell (symmetrized)
                         distortion index   Best cell (without symmetry restrains)

primitive cubic            95.80%
      58.11  58.38  515.01  90.22  89.67  63.31
     210.50 210.50  210.50  90.00  90.00  90.00
I centred cubic           130.58%
     517.95  61.13  518.09  87.11 169.02  92.83
     365.72 365.72  365.72  90.00  90.00  90.00
F centred cubic           128.39%
      61.13  99.16 1031.28  89.92  92.87  90.30
     397.19 397.19  397.19  90.00  90.00  90.00
primitive rhombohedral      2.07%
     518.07 517.95  515.01   6.44   6.78   6.46
     517.01 517.01  517.01   6.56   6.56   6.56
      59.75  59.75 1547.65  90.00  90.00 120.00
primitive hexagonal         1.61%
      58.11  61.13  515.01  90.53  89.67 121.43
      59.62  59.62  515.01  90.00  90.00 120.00
primitive tetragonal       11.85%
      58.11  58.38  515.01  90.22  89.67  63.31
      58.24  58.24  515.01  90.00  90.00  90.00
I centred tetragonal       11.93%
      58.11  58.38 1031.28  88.43  91.44  63.31
      58.24  58.24 1031.28  90.00  90.00  90.00
primitive orthorhombic     11.85%
      58.11  58.38  515.01  90.22  89.67  63.31
      58.11  58.38  515.01  90.00  90.00  90.00
C centred orthorhombic      0.25%
      61.13  99.16  515.01  89.94  89.47  90.30
      61.13  99.16  515.01  90.00  90.00  90.00
I centred orthorhombic     11.93%
      58.11  58.38 1031.28  88.43  91.44  63.31
      58.11  58.38 1031.28  90.00  90.00  90.00
F centred orthorhombic      1.19%
      61.13  99.16 1031.28  89.92  92.87  90.30
      61.13  99.16 1031.28  90.00  90.00  90.00
primitive monoclinic        0.22%
      58.11 515.01   58.38  90.22 116.69  90.33
      58.11 515.01   58.38  90.00 116.69  90.00
C centred monoclinic        0.13%
      61.13  99.16  515.01  90.06  90.53  90.30
      61.13  99.16  515.01  90.00  90.53  90.00
primitive triclinic         0.00%
      58.11  58.38  515.01  90.22  90.33 116.69

autoindex unit cell      57.25  57.25 514.99  90.00  90.00 120.00
crystal rotx, roty, rotz    -2.261  53.683  -91.468
Autoindex Xbeam, Ybeam   81.63  82.60
position 179 chi**2 x 3.26 y 8.38  pred. decrease: 0.000 * 179 = 0.0
partiality 179 chi**2 4.16  pred. decrease: 0.000 * 179 = 0.0
I've highlighted the "interesting" parts in red. The top of the output just reiterates what you've put in with index.dat, plus showing all the default values Denzo has set that you've not modified. If you are having real problems indexing look through that section carefully to check for weird parameter values. The most interesting part of the output usually is the lattice table. Primitive triclinic is always a valid indexing for a lattice. Denzo lists the primitive auto-indexing cell it finds under "primitive triclinic" at the bottom of the lattice table.
In the remainder of the table, Denzo tries to fit this primitive triclinic cell into the symmetry constraints in other lattice systems (e.g. in primitive hexagonal, a=b, alpha=beta=90, gamma=120). The metric tensor distortion index is a measure of how much Denzo has to mangle the primitive triclinic cell to do this. Low numbers mean that the indexed lattice is consistent with a lattice type. High numbers mean that it is not.
The actual values of "good" % numbers depend on how accurate your direct beam position etc. was, and how strong your data is. Generally anything less than 2% is a good candidate. Anything less than 3% should at least be considered. In this case the true space group is P6122 (primitive hexagonal) with a distortion value of 1.61% (a little high - that large unit cell dimension doesn't help).
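If you want a feel for what the distortion index is measuring, here is a toy sketch in the spirit of (but not identical to) Denzo's metric-tensor calculation: force the refined triclinic cell to obey a lattice's constraints and ask how large a fractional change that required. The function name and the particular "distortion" measure are my own inventions for illustration, so the number printed will not reproduce Denzo's 1.61%.

```python
# Toy illustration of the idea behind the distortion index: how much must the
# refined triclinic cell be perturbed to satisfy a lattice's constraints?
# This is NOT Denzo's actual metric-tensor formula - just the concept.
def hexagonal_distortion(cell):
    a, b, c, alpha, beta, gamma = cell
    ideal = ((a + b) / 2, (a + b) / 2, c, 90.0, 90.0, 120.0)   # symmetrized cell
    length_shifts = [abs(x - y) / y for x, y in zip((a, b, c), ideal[:3])]
    angle_shifts = [abs(x - y) / y for x, y in zip((alpha, beta, gamma), ideal[3:])]
    return 100.0 * max(length_shifts + angle_shifts)

# refined triclinic cell from the lattice table above
print("%.2f%%" % hexagonal_distortion((58.11, 58.38, 515.01, 90.22, 90.33, 116.69)))
```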
Note that Denzo only looks at the physical dimensions of your lattice. Since you haven't integrated the data it has no idea what the intensities are, so it cannot know which of the possible lattice types is the correct one for your data. You can only test data symmetry upon scaling later on with Scalepack. The best approach is to integrate 10-15 frames using the highest symmetry lattice that gives a low distortion index, scale it, and if this doesn't work then try something further down the list. Note that all data is compatible with P1, so if the data does not scale in P1 something has gone awry. For large cell dimensions as in the above example, an accurate direct beam position is essential.
Once you have selected what you think your lattice is, run auto-indexing again with a space group consistent with this lattice. It doesn't really make much difference which of the several possible space groups you choose within a given lattice, since Denzo only worries about lattice dimensions, not the symmetry of the data.
Lattice system | Space groups | Point Group(s) |
---|---|---|
Primitive cubic | P23, P213 | 23 |
 | P432, P4132, P4332 | 432 |
I centred cubic | I23, I213 | 23 |
 | I432, I4132 | 432 |
F centred cubic | F23 | 23 |
 | F432, F4132 | 432 |
Primitive rhombohedral | R3, R32 | 3, 32 |
Primitive hexagonal | P3, P31, P32 | 3 |
 | P321, P3121, P3221 | 32 |
 | P312, P3112, P3212 | 32 |
 | P6, P61, P62, P63, P64, P65 | 6 |
 | P622, P6122, P6222, P6322, P6422, P6522 | 622 |
Primitive tetragonal | P4, P41, P42, P43 | 4 |
 | P422, P4122, P4222, P4322, P4212, P41212, P42212, P43212 | 422 |
Primitive orthorhombic | P222, P2221, P21212, P212121 | 222 |
C centred orthorhombic | C222, C2221 | 222 |
I centred orthorhombic | I222, I212121 | 222 |
F centred orthorhombic | F222 | 222 |
Primitive monoclinic | P21, P2 | 2 |
C centred monoclinic | C2 | 2 |
Primitive triclinic | P1 | 1 |
Only space groups compatible with chiral molecules (proteins etc) are listed. Look at the number of potential space groups in primitive hexagonal! There are actually four point groups here: 3, 32 (i.e. 312 and 321), 6 and 622, each with its own set of space groups. We'll discuss how to distinguish them later, but the point is that it doesn't matter which one you pick, since the lattice indexing and geometry are identical between all possibilities - symmetry constraints on intensity only come into play in Scalepack.
The nine parameters that we were originally in search of are listed as the six from "autoindex unit cell" and three from "crystal rotx, roty, rotz". Check that the autoindex Xbeam and Ybeam correspond relatively closely to the ones you used in index.dat - usually if they differ considerably the autoindexing has failed.
Now we need to test that the lattice really is consistent with the lattice system we pick. We do this by improving the fit between the observed diffraction peak positions and the corresponding predicted ones. To do this we use a refinement procedure embodied in the file integrate.dat:
start refinement
refine partiality
use partials position
weak level 10.
print no profiles
resolution limits 20. 4.6
fix all
fit crystal rotx roty rotz
fit x beam y beam distance
go go go go go go
resolution limits 20. 4.6
go go go go go go
fix distance
fit cell
go go go go go
resolution limits 20.0 4.0
[fit cassette rotx roty - probably not necessary to refine this]
go go go
fix cell
fit distance
[fit Y scale - do not refine this ]
go go go go go go go go go go go
print profiles 1 1
go
calculate
go                [ this actually tells Denzo to integrate the frame ]
end of pack       [ this increments the file counter and oscillation range ]
                  [ in Denzo ready for the next frame ]
The syntax of this file should be pretty clear - we gradually increase the resolution of the data being refined against while increasing the number of parameters being fit. The position chi**2 ("chi squared") gives an idea of how good the fit is: if chi**2 is at or around 1.0, then the agreement between observed and calculated spot positions is as expected (this is good). If chi**2 is << 1.0 then the fit is "too good", which is unusual. If chi**2 is >> 1.0 then the fit is poor and you should look very closely to see if the lattice is indeed correctly predicted. The partiality chi**2 should also be close to unity, but this has to do with the width of the peak along the direction of rotation (i.e. mosaicity/beam divergence).
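For reference, the positional chi**2 is, in essence, the mean squared discrepancy between observed and predicted spot positions measured in units of the expected positional error (my paraphrase, not a quote from the manual):

$$ \chi^2_{x} \approx \frac{1}{N}\sum_{i=1}^{N}\frac{\left(x_i^{\mathrm{obs}}-x_i^{\mathrm{calc}}\right)^2}{\sigma_{x,i}^2} $$

and similarly for y and for partiality. That is why chi**2 of about 1 means "the errors are about as large as the program was told to expect", and why overly generous error estimates can make chi**2 look spuriously good.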
The progress of the fit can be followed in the Denzo output:
position 282 chi**2 x 1.71 y 5.15  pred. decrease: 0.000 * 282 = 0.0
partiality 282 chi**2 1.19  pred. decrease: 0.000 * 282 = 0.0
Highest resolution 4.6000 (input)
Lowest resolution 20.000 (input)
Spot too weak for refinement when below sigma * 10.000 (input)
position 261 chi**2 x 0.74 y 0.63  pred. decrease: 0.000 * 261 = 0.1
partiality 283 chi**2 1.40  pred. decrease: 0.001 * 283 = 0.2
CrysZ (beam)      -91.453  shift  0.020  error 0.017
CrysY (vertical)   54.814  shift -0.182  error 0.006
CrysX (spindle)    -2.448  shift -0.073  error 0.006
distance          349.689  shift -0.311  error 0.102
X beam             81.573  shift -0.022  error 0.016
Y beam             82.703  shift  0.105  error 0.016
Highest resolution 4.6000 (input)
Lowest resolution 20.000 (input)
position 263 chi**2 x 0.74 y 0.62  pred. decrease: 0.000 * 263 = 0.0
partiality 285 chi**2 1.45  pred. decrease: 0.000 * 285 = 0.0
CrysZ (beam)      -91.454  shift  0.000  error 0.017
CrysY (vertical)   54.813  shift -0.001  error 0.006
CrysX (spindle)    -2.451  shift -0.004  error 0.006
distance          349.686  shift -0.003  error 0.101
X beam             81.573  shift  0.000  error 0.016
Y beam             82.703  shift  0.000  error 0.016
position 265 chi**2 x 0.61 y 0.48  pred. decrease: 0.000 * 265 = 0.0
partiality 285 chi**2 1.40  pred. decrease: 0.000 * 285 = 0.0
CrysZ (beam)      -91.450  shift  0.002  error 0.017
CrysY (vertical)   54.815  shift  0.001  error 0.006
CrysX (spindle)    -2.446  shift  0.002  error 0.006
Cell, a 57.21 b 57.21 c 514.80 alpha 90.00 beta 90.00 gamma 120.00
      shifts 0.02 -0.30  errors 0.02 0.17
X beam             81.573  shift -0.001  error 0.016
Y beam             82.705  shift  0.002  error 0.016
Highest resolution 4.0000 (input)
Lowest resolution 20.000 (input)
position 280 chi**2 x 0.43 y 0.34  pred. decrease: 0.002 * 280 = 0.5
partiality 301 chi**2 1.42  pred. decrease: 0.000 * 301 = 0.1
CrysZ (beam)      -91.450  shift  0.000  error 0.016
CrysY (vertical)   54.814  shift -0.001  error 0.006
CrysX (spindle)    -2.444  shift  0.002  error 0.006
Cell, a 57.23 b 57.23 c 514.70 alpha 90.00 beta 90.00 gamma 120.00
      shifts 0.02 -0.10  errors 0.02 0.16
CassY (vertical)   -0.046  shift -0.046  error 0.133
CassX (spindle)     0.232  shift  0.232  error 0.118
X beam             81.594  shift  0.021  error 0.020
Y beam             82.704  shift -0.001  error 0.018
position 280 chi**2 x 0.44 y 0.32  pred. decrease: 0.000 * 280 = 0.0
partiality 305 chi**2 1.40  pred. decrease: 0.000 * 305 = 0.0
CrysZ (beam)      -91.452  shift -0.001  error 0.016
CrysY (vertical)   54.814  shift  0.000  error 0.006
CrysX (spindle)    -2.444  shift  0.002  error 0.006
CassY (vertical)   -0.043  shift  0.003  error 0.133
CassX (spindle)     0.242  shift  0.010  error 0.118
distance          349.629  shift -0.056  error 0.097
X beam             81.593  shift -0.001  error 0.020
Y beam             82.704  shift -0.001  error 0.018
position 280 chi**2 x 0.43 y 0.32  pred. decrease: 0.000 * 280 = 0.0
partiality 306 chi**2 1.40  pred. decrease: 0.000 * 306 = 0.0
CrysZ (beam)      -91.452  shift  0.000  error 0.016
CrysY (vertical)   54.814  shift  0.000  error 0.006
CrysX (spindle)    -2.444  shift  0.001  error 0.006
CassY (vertical)   -0.043  shift  0.000  error 0.133
CassX (spindle)     0.242  shift  0.000  error 0.118
distance          349.625  shift -0.004  error 0.097
X beam             81.593  shift -0.001  error 0.020
Y beam             82.703  shift  0.000  error 0.018
position 280 chi**2 x 0.43 y 0.32  pred. decrease: 0.000 * 280 = 0.0
partiality 305 chi**2 1.40  pred. decrease: 0.000 * 305 = 0.0

Notice that more reflections are added to the refinement as the fit improves because there are fewer extreme outliers. Denzo uses only strong spots for positional refinement based on your definition of weak level - you might consider reducing this from (e.g.) 7 to (e.g.) 3 for really weak data.
Really weak data often causes real problems in auto-indexing for all sorts of reasons.
The chi**2 on partiality should also be close to 1.0, and this is primarily affected by your estimate of mosaicity. However there are other good ways to estimate mosaicity besides monitoring chi**2, as discussed below.
You should always check the correspondence between predicted and observed spot locations in XDISP. If a pattern of green, red and yellow circles doesn't already show on the window, press "Update Predictions" until it does. Green circles are fully recorded reflections ("fulls") that pass completely through the diffraction condition during the frame. If your mosaicity is greater than the frame width then you will have no fulls. Yellow circles are partial reflections ("partials") that are clipped at one end of the frame or the other; Denzo can add their contributions from adjacent frames. Red circles are reflections with problems. Sometimes they are at or near the backstop. Sometimes they have background problems (e.g. reflections near zingers on CCD detectors). Sometimes they are very close to other spots (i.e. overlaps). Overlaps are bad - you've possibly mis-collected your data. Sometimes they are too intense (i.e. overloads). A few overloads are normal on CCD data, but almost unheard of on image plate data. If you have a lot of overloads you probably have a problem with your exposure time. If you have a lot of red circles then there is a problem with either your indexing or your data collection strategy. Note that until you've refined the crystal orientation using the integrate script above, many reflections may be flagged with red circles because the observed and predicted positions differ too much.
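A rough rule of thumb for the fulls/partials balance, assuming reflection centres are spread uniformly in phi: a reflection can only be fully recorded if its reflecting range η (mosaicity plus beam divergence) is smaller than the oscillation range Δφ, and then

$$ \text{fraction of fulls} \;\approx\; \frac{\Delta\varphi-\eta}{\Delta\varphi} \quad (\eta<\Delta\varphi), \qquad 0 \ \text{otherwise} $$

so with 1.0 degree frames and 0.4 degree mosaicity only around 60% of reflections can be fulls, and once the mosaicity exceeds the frame width everything is a partial - which is why Scalepack's handling of partials (below) matters so much.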
Press on the zoom window and take a close look at the predicted and observed locations of the diffraction data. If the fit is not precise, then you may need to re-index, or tweak the integration script. If there are more predicted reflections than observed on the frame, then you've over-estimated the mosaicity value: decrease it and try again. If there are more observed reflections than predicted, then you've under-estimated the mosaicity value: increase it and try again.
This is an example of a prediction based on an indexing without refinement. In the zoom window you can see discrepancies between the predicted and observed spot positions. There are some spots flagged (red circles) due to the bad fit:

This is an example of the same frame after refinement; you can see how the predicted and observed spots are in close agreement.

In this case some spots are still flagged red, mainly because they are too close together for this very long cell dimension (i.e. overlaps). Denzo also flags reflections that lie outside the active area of the detector due to excessively optimistic resolution limits in integration.
This next part of the frame illustrates what we call a lune - the elliptical arrangement of spots arises from a single lattice plane in reciprocal space cutting the Ewald sphere. A plane cutting a sphere generates an ellipse, but this ellipse is "fat" because the crystal rotates through the oscillation range (broadened further by the mosaicity) during the frame. The "full" reflections with green circles lie in the middle of the lune because they pass completely through the diffraction condition during the frame. The "partials" lie on the edge of the lune because they are either not fully in or not fully out of the diffraction condition at the start or end of the frame.
Here is the section without the predicted spots (i.e. just the observed data):

Here is the same section with the predictions overlayed:

Again check all those red-flagged reflections - in this case we have overlap problems but it's unusual to see too many "bad" reflections.
Selecting a Spot Profile and Mosaicity
The best way to optimize integration parameters is to do a small trial integration and look carefully at the output. In particular the initial estimates for the spot size and the mosaicity may be somewhat non-optimal, so I always check these after integrating several frames and adjust as necessary.

Denzo prints out the spot profiles for each of nine regions of the detector. The ones in the center of the detector (e.g. 2,2) should be pretty good, but sometimes the ones at the edge of the detector look crummy because you have little or very weak data out there. Spot profiles are constructed using strong reflections in each region. If you set the profile fitting radius too low, you can lose a lot of reflections because they have no viable spot profile - anisotropic data is prone to this since there may be no nearby strong spots to form the profile from. Make sure this parameter is set to at least 20, and probably more like 30. The second of the two spot profiles shown below is atypical and illustrates some of the problems you may run into.
Averaged spot profile in sector 1, 2 (x,y)   # of spots 139
Weighted position of the spots 56.631, 95.020 (x,y)

-2 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1   . . . . . . . . . . . . . . .
-2 -2 -1  0  0  1  1  1  1  1  0  0  0 -1 -1   . . . . . . . . . . . . . . .
-1 -1  0  1  2  3  4  5  5  4  3  2  1  0 -1   . . . . . . . . . . . . . . .
-1 -1  1  3  5  8 11 12 11  8  6  5  3  1  0   . . . . . - - - - - . . . . .
-1  0  2  4  8 14 21 23 19 13  9  7  5  2  0   . . . . - + + + + + - . . . .
-1  0  2  5 10 19 30 33 26 16 10  8  5  2  0   . . . - + + + + + + + - . . .
-1  0  2  5 11 22 38 45 34 18 10  7  4  1  0   . . . - + + + + + + + - . . .
 0  1  2  4 10 23 46 58 40 18  8  5  2  0 -1   . . . - + + + + + + + - . . .
 0  1  2  4  9 21 45 55 35 15  7  2  1  0 -1   . . . - + + + + + + + - . . .
 0  1  2  3  7 16 33 35 21 10  4  1  0  0 -1   . . . - + + + + + + + - . . .
 0  0  1  3  5 10 16 16 10  5  2  1  0  0 -1   . . . . - + + + + + - . . . .
 0  0  1  2  3  5  6  5  3  2  1  0 -1  0 -1   . . . . . - - - - - . . . . .
-1  0  0  1  1  2  2  1  1  0  0 -1 -1 -1 -1   . . . . . . . . . . . . . . .
-1 -1  0  0  0  0  0  0  0  0 -1 -1 -1 -1 -1   . . . . . . . . . . . . . . .
-1 -1  0 -1  0  0  0  0  0  0 -1 -1 -1 -1 -1   . . . . . . . . . . . . . . .

This is a fairly typical spot profile, although perhaps the size of the spot box is a little too small (its area should be 2-3x larger than the spot). But the subtracted background is relatively flat (near zero) and the spot itself is within the region marked with "+", indicating that it's actually going to be integrated.
Averaged spot profile in sector 2, 2 (x,y) # of spots 17 Weighted position of the spots 77.869, 91.588 (x,y) 0 0 0 1 0 0-1-1 0-1-1 0 1 0 0-1 0 0-1 0 1 1 0 0 0 1 0-1 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0-1 0 0 0-1-1 0 0-1-1 0 0 0-1 0 0-1-1 0 0 0 0 0 0 1 0-1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 0-1-1-1-1 0-1 0 0 0-1-1 0 0 0 0 0 0-1-1-1 0-1 0-1 0 0-1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 0-1-1-1-1-1 0 0 0-1-1-1-1-1 0 0 1 0-2-1-1-1 0 0 0 0-2-1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 0 0-1-1 0-1 0-1 0-1-1-1 0-1 0 0 0 0 0-1 0 0 0 0 0 0 0 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . -1 0 0-1 0 0-1-1 0 0-1 0 0 0-1-1 0 0 0 0 0 0 0-1-1-1-1 0 0-1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 1 0-1 0 0 0-1-1 0 0 0 0 1 0 0 0 0 1 1 0-1-1-1-1 0 0 0 0-1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . -1 0 0-1 0 0 0 0 0 0 1 0 0 1 1 1 1 0 0 0-1 0 0 0 0 1 1 0-1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0-1-1 0 0 1 0 0 0 0 0 0 0 1 1 1 2 1 0 0 0 1 1 1 0 0 0-1 0 1 . . . . . . . . . . . . . - - - - . . . . . . . . . . . . . 0 0-1 0 0 1 1 1 1 0 0 0 1 2 2 3 3 2 1 1 0 0 1 1 1 1 1 0 0 0 . . . . . . . . . . . - - - - - - - - . . . . . . . . . . . 1 1 0 0 1 2 3 3 3 1 1 1 2 3 6 7 5 3 2 1 0 0 1 2 2 1 1 0 0 0 . . . . . . . . . . - - - - - - - - - - . . . . . . . . . . 1 1 1 1 2 3 6 6 4 2 2 2 2 6141510 5 2 2 0 1 3 4 4 1 1 1 1 0 . . . . . . . . . - - - - - + + - - - - - . . . . . . . . . 2 1 1 1 2 61314 7 3 3 3 412283419 7 3 2 1 2 5 9 9 4 2 2 0 1 . . . . . . . . . - - - + + + + + + - - - . . . . . . . . . 3 2 2 2 310242512 4 4 3 61851633111 5 3 2 2 71615 6 2 2 1 2 . . . . . . . . - - - - + + + + + + - - - - . . . . . . . . 4 2 1 1 415374017 6 4 4 72574924413 6 3 2 3 92122 9 2 2 1 2 . . . . . . . . - - - + + + + + + + + - - - . . . . . . . . 4 1 1 2 416444519 6 3 4 72781994713 5 3 2 3102222 9 3 2 1 2 . . . . . . . . - - - + + + + + + + + - - - . . . . . . . . 3 1 1 2 415363414 4 3 3 62262723410 4 3 2 3 81616 7 2 1 1 2 . . . . . . . . - - - - + + + + + + - - - - . . . . . . . . 2 1 1 2 3102017 7 3 2 2 414313617 5 3 2 2 2 4 9 9 3 1 0 0 1 . . . . . . . . . - - - + + + + + + - - - . . . . . . . . . 1 0-1 0 2 4 7 7 3 1 1 1 2 71313 7 2 1 0 1 1 2 3 3 1 0 0-1 0 . . . . . . . . . - - - - - + + - - - - - . . . . . . . . . 0 0 0 0 1 2 3 2 1 1 1 0 1 3 4 4 3 1 1 0 1 1 0 1 1 1 0 0-1 0 . . . . . . . . . . - - - - - - - - - - . . . . . . . . . . 0 1 1 0 1 1 1 1 0 0 1 0 0 1 2 3 2 0 1 1 0 0 0 1 0 0 0-1 0 0 . . . . . . . . . . . - - - - - - - - . . . . . . . . . . . 0-1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 1-1 0 0 1 0 0 1 . . . . . . . . . . . . . - - - - . . . . . . . . . . . . . 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 1 1 0 0 0 0 0 0-1-1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0-1 0 0-1-1 0 1 1 0 0 0 0 0 0 1 1 1-1 0 1 0-1 0 1 0 0-1-1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0 0-1 0 0 0-1 0 1 0 0 0 0 0 0 0 0 0 0 0-1 0 0-1 0 0 0 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 0 0 0 0-1 0 0 0 0 0 0 0-1 0 1 0-1 0 0-1-1 0 0 0-1 0 0 0-1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . -1 0 0 0 0-1-1-1-1 0 0 0-1 0-1-1-1 0 0-1-1 0 0-1 0 0 1 0 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . -1 0 0-1 0 0 0 0 0 0 0 0 0-1 0 0-1-2 0 0-1 0 0-1-1-2-1-1 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0-1-1-1-1-1-1-1-1 0-1 0-1-1-1-1-1-1-1 0 0-1 0-1-2-1 0-1 0 0 . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . 0-1 0 0-1-1-1-1-1-1-1 0 0-1 0-1 0-1-1 0 1-1-1-1-1 0 0-1 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

This is an atypical spot profile illustrating some problems - the background is fairly flat (good) and most of the spot is inside the "+" region, i.e. it's going to be integrated, but there are positive values outside the spot. This might mean you have satellite reflections that are not on integral Miller indices - some sort of supercell/subcell issue - or you might have mis-indexed your data and one of your unit cell axes is actually smaller than it needs to be. In this specific case I was lying to the program about the spot size so I could integrate some data with a very long unit cell axis. Sometimes data processing is about finding the best compromise, but in the vast majority of cases you want the spots separated well enough that you don't have to introduce systematic errors in the data just to get it processed.
The numbers on each line show the averaged spot profile values, and the symbols alongside show the assignment of each pixel (. = background, - = guard region, + = spot).
The background region (".") is used to fit a least squares plane with which to subtract the background counts from the spot. If this has gone well all values in the background should be zero or close to it. In the second case there is intensity from neighboring spots in the background, but the subtraction still has worked since non-spot regions are still in the range -2 to +2. These neighboring spots do not contribute to the integrated intensity for this reflection because they are not in the spot region.
The guard region ("-") is simply an area near the spot that you want excluded from both the background and spot calculations (e.g. if your spot is smeary). The spot itself ("+") should fit the observed spot relatively closely, but should not clip it. In the second case the spot is defined somewhat too tightly, to try to stop neighboring spots from being thrown out as overlaps. Sometimes you play this trade-off to get better (more complete) data, but it's nearly always better to make the spot a little too large and let the learnt profile take out the excess pixels. Generally the guard region is 0.1 or 0.15 mm larger than the spot radius.
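To make the spot/guard/background geometry concrete, here is a small Python sketch that carves up a measurement box the way I read the spot elliptical / background elliptical / box keywords (sizes in mm, treated as elliptical radii; the pixel size, function name and details are my own assumptions for illustration, not Denzo's code):

```python
# Sketch: classify each pixel of a measurement box as spot, guard or background,
# given elliptical radii in mm (my reading of the Denzo keywords, not its code).
def classify_box(box=(2.1, 2.1), spot=(0.45, 0.45), guard=(0.55, 0.55), pixel=0.08):
    nx, ny = int(round(box[0] / pixel)), int(round(box[1] / pixel))
    cx, cy = (nx - 1) / 2.0, (ny - 1) / 2.0
    rows = []
    for j in range(ny):
        row = []
        for i in range(nx):
            dx, dy = (i - cx) * pixel, (j - cy) * pixel
            if (dx / spot[0]) ** 2 + (dy / spot[1]) ** 2 <= 1.0:
                row.append('+')   # spot region: these pixels are integrated
            elif (dx / guard[0]) ** 2 + (dy / guard[1]) ** 2 <= 1.0:
                row.append('-')   # guard region: excluded from spot and background
            else:
                row.append('.')   # background region: fitted by a plane and subtracted
        rows.append(' '.join(row))
    return rows

print('\n'.join(classify_box()))
```

Printing the result gives a little '.', '-', '+' map much like the symbol half of the Denzo profile printouts above.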
Spot parameters in the second case were:
ibox 30 30
spot elliptical 0.30 0.30
background elliptical 0.55 0.55

whereas typical synchrotron parameters would be:

ibox 21 21
spot elliptical 0.35 0.35
background elliptical 0.45 0.45

and those at home more like:

ibox 27 27
spot elliptical 0.45 0.45
background elliptical 0.55 0.55

although these parameters vary somewhat with the size of the beam, size of the crystal etc. The first spot profile is pretty good, although I would usually make the ibox larger and the spot a little larger (10-20%) to be on the safe side. What's confusing in Denzo is that "background" really means the size of the guard region, and "ibox" really defines the size of the area that's used to determine the background to be subtracted.
We typically use "overlap spot" in Denzo, where spots are only rejected if the actual spot region ("+") overlaps. This helps in cases where the spots are relatively close. There are more conservative overlap schemes, but overlap spot seems to work well enough.
For most well-behaved data the numerical spot size should match the observed spot size, with perhaps a small amount of extra space. Since Denzo does profile fitting, the spot profile is learnt from strong spots and assumed to apply to all spots local to it. However for badly-behaved data one may want to play certain tricks. Large unit cell dimensions often put the spots very close together on the detector - in this case you might systematically underestimate the spot size until you can integrate the data, so that reflections do not get thrown away as overlaps. Smeary spots often have a bright center with long trails on them due to crystal disorder/splitting - here also we (often substantially) underestimate the size of the true spot so as to integrate only the center of the smear, which often gives better integrated intensities than including all the disordered junk.
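For what it's worth, the reason a slightly-too-large spot is usually harmless is the profile fitting itself: schematically, the reported intensity is not a plain sum of pixels but a least-squares scaling of the learnt profile p_i onto the background-subtracted counts c_i,

$$ I_{\mathrm{prof}} \;\approx\; \frac{\sum_i w_i\, p_i\,(c_i-b_i)}{\sum_i w_i\, p_i^{2}}, \qquad w_i \approx 1/\sigma_i^{2} $$

so pixels where the profile is essentially zero contribute essentially nothing. (This is the standard profile-fitting estimator; I am not claiming it is Denzo's exact implementation.)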
Crystal mosaicity can vary widely, from 0.2 for well-behaved crystals at synchrotron sources, to greater than 1.5 for badly-behaved crystals at home sources. Bad cryo conditions and heavy atom soaking often increase crystal mosaicity. If you get high mosaicity for native data, you may want to explore other stabilisation solutions and cryo buffers. If your crystal is thin, high mosaicities may also indicate handling problems (e.g. thin plate crystals often bend/warp during manipulation).
There's nothing inherently wrong with high mosaicity - it smears the data out along the rotation direction a bit, making it a little weaker, and data with high mosaicity can be processed perfectly well provided you make a correct estimate of the true mosaicity. Mosaicity does cause more crowding on a frame, because each spot is spread over more frames, so high mosaicities can cause more overlap problems. HKL2000 refines mosaicity on a per-frame basis by default, which for problem crystals means that the mosaicity estimate can inappropriately blow up through modelling of spot smearing. In this case you should not refine mosaicity during data integration - this is another instance of the trade-off between "correctly" modeling a parameter and modeling it so that you can actually integrate the data. If the mosaicity is set too high, all the spots on the frame will overlap with those on adjacent frames, and all your data will be thrown out.
One of the easiest ways to estimate mosaicity is to process the data, scale it with Scalepack, and then reprocess it with the refined mosaicity value from Scalepack. However it's not a bad thing to have the mosaicity estimated approximately correctly from the start.
Denzo shows a partiality histogram for each frame:
zone           av. part.        -10...0..10..20..30..40..50..60..70..80..90.100.110.120.130.140
-0.125 - -0.112   1.364 +- 0.259  .*******************************************************
-0.112 - -0.100   1.194 +- 0.350  .************************************************
-0.100 - -0.087   0.644 +- 0.277  .**************************
-0.087 - -0.075   1.071 +- 0.214  .*******************************************
-0.075 - -0.062   0.874 +- 0.245  .***********************************
-0.062 - -0.050   1.525 +- 0.223  .*******************************************************
-0.050 - -0.037   0.970 +- 0.180  .***************************************
-0.037 - -0.025   0.629 +- 0.195  .*************************
-0.025 - -0.012   0.796 +- 0.253  .********************************
-0.012 -  0.000   0.961 +- 0.143  .**************************************
 0.000 -  0.013   0.431 +- 0.113  .*****************
 0.013 -  0.025   0.637 +- 0.097  .*************************
 0.025 -  0.038   0.321 +- 0.078  .*************
 0.038 -  0.050   0.237 +- 0.063  .*********
 0.050 -  0.063   0.186 +- 0.057  .*******
 0.063 -  0.075   0.239 +- 0.044  .**********
 0.075 -  0.088   0.035 +- 0.037  .*
 0.088 -  0.100   0.050 +- 0.025  .**
 0.100 -  0.113   0.129 +- 0.022  .*****
 0.113 -  0.125   0.082 +- 0.024  .***

This adopts an approximately sigmoidal shape, although it's frequently noisy, as here. If your mosaicity estimate is much too low, it looks more like:
zone           av. part.        -10...0..10..20..30..40..50..60..70..80..90.100.110.120.130.140
-0.050 - -0.045   0.403 +- 0.711  .****************
-0.045 - -0.040   1.698 +- 0.572  .*******************************************************
-0.040 - -0.035   0.478 +- 0.293  .*******************
-0.035 - -0.030   0.492 +- 0.382  .********************
-0.030 - -0.025   1.501 +- 0.360  .*******************************************************
-0.025 - -0.020   0.224 +- 0.472  .*********
-0.020 - -0.015   1.010 +- 0.342  .****************************************
-0.015 - -0.010   0.771 +- 0.285  .*******************************
-0.010 - -0.005   0.530 +- 0.312  .*********************
-0.005 -  0.000   0.927 +- 0.576  .*************************************
 0.000 -  0.005   0.605 +- 0.169  .************************
 0.005 -  0.010   0.931 +- 0.147  .*************************************
 0.010 -  0.015   0.592 +- 0.103  .************************
 0.015 -  0.020   0.284 +- 0.082  .***********
 0.020 -  0.025   0.388 +- 0.065  .****************
 0.025 -  0.030   0.109 +- 0.055  .****
 0.030 -  0.035   0.993 +- 0.059  .****************************************
 0.035 -  0.040   0.133 +- 0.041  .*****
 0.040 -  0.045   0.152 +- 0.046  .******
 0.045 -  0.050   0.145 +- 0.046  .******

i.e. fairly flat with not much of a trail, and if it is too high it looks more like:
zone           av. part.        -10...0..10..20..30..40..50..60..70..80..90.100.110.120.130.140
-0.300 - -0.270   1.045 +- 0.205  .******************************************
-0.270 - -0.240   1.092 +- 0.241  .********************************************
-0.240 - -0.210   0.654 +- 0.223  .**************************
-0.210 - -0.180   1.021 +- 0.179  .*****************************************
-0.180 - -0.150   0.877 +- 0.209  .***********************************
-0.150 - -0.120   0.921 +- 0.179  .*************************************
-0.120 - -0.090   1.331 +- 0.158  .*****************************************************
-0.090 - -0.060   1.077 +- 0.129  .*******************************************
-0.060 - -0.030   1.225 +- 0.115  .*************************************************
-0.030 -  0.000   0.732 +- 0.098  .*****************************
 0.000 -  0.030   0.445 +- 0.083  .******************
 0.030 -  0.060   0.245 +- 0.068  .**********
 0.060 -  0.090   0.205 +- 0.061  .********
 0.090 -  0.120   0.060 +- 0.042  .**
 0.120 -  0.150   0.098 +- 0.037  .****
 0.150 -  0.180   0.051 +- 0.023  .**
 0.180 -  0.210   0.015 +- 0.021  .*
 0.210 -  0.240   0.023 +- 0.017  .*
 0.240 -  0.270   0.018 +- 0.017  .*
 0.270 -  0.300   0.020 +- 0.016  .*

i.e. too quickly tapering off toward zero. Find a happy medium before you process the data.
Integrating Your Data
Most of the work has already been done - the auto-indexing procedure gives us initial estimates for the unit cell and the so-called "crystal missetting angles" (i.e. the orientation of your crystal in some reference frame), and the refinement procedure improves these estimates for the first frame. Once you've tweaked the spot size and the mosaicity, you are ready to integrate your entire dataset.
$ denzo @index.dat sector 1 to 100 @integrate.dat stop

assuming you have 100 frames to integrate. This will take a while, and the output is voluminous, but if you haven't processed data before it may be worth saving a log file and looking through it. Many labs use a different integrate.dat for refinement from the one used for bulk integration. A stripped down version of the above integrate.dat suitable for bulk integration is shown below - it assumes that the orientation and unit cell parameters have been refined after auto-indexing, since this script does only limited refinement:
start refinement
refine partiality
use partials position
weak level 10.
resolution limits 20.0 4.0
fit cassette rotx roty
go go go
fix cell
fit distance
go go go
fit all
go go go go go go go go
print profiles 1 1
go
calculate
go                [ this actually tells Denzo to integrate the frame ]
end of pack       [ this increments the file counter and oscillation range ]
                  [ in Denzo ready for the next frame ]
i.e. no extension from low resolution, but it's possible to get too fussy about this. The "fit all" line can sometimes cause problems with data that is weak or low resolution since parameters can be too highly correlated with each other and/or not well-defined.
What matters, critically, is that Denzo assigns the correct index to every reflection it integrates, and that the predicted and observed reflection centroids are close enough for Denzo to integrate the spots. Beyond that, it's not really important just how you achieve it.
The integrated data is written into the files defined by "film output file" in index.dat (aka the ".x" files in conventional lab nomenclature). These contain the integrated reflection data plus the current values of the parameters for each frame. Since Denzo refines only on a per-frame basis, the parameters are often in local minima rather than at their globally optimal values. Some drift in distance, cell dimensions etc is to be expected. Do:
$ grep cell *.x
$ grep dist *.x
$ grep beam *.x

to see how some parameters vary with frame. The ".x" files are ASCII, so you can view them using cat, more or your favorite editor.
Scaling Your Data
Assuming all Hell hasn't broken loose during integration, all you have to do is scale the data using Scalepack. Scaling is driven by comparing the intensities of symmetry-related reflections, whose intensities are expected to be identical within error.

For example: the unique reflection (h,k,l) in point group 222 should be identical in intensity to (h,-k,-l), (-h,k,-l), and (-h,-k,l) within experimental error. In the absence of measurable anomalous scattering, Friedel's Law applies, giving rise to a center of symmetry in the data, which means that another four reflections are equivalent: (-h,-k,-l), (-h,k,l), (h,-k,l) and (h,k,-l). These are often referred to as symmetry-related reflections, although they are also referred to as Bijvoet (pron. Boy-vert) pairs or Friedel (pron. Free-del) pairs. Scalepack ultimately merges all the symmetry-related reflections into a set of unique reflections (i.e. those which are not related to each other by symmetry).
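A minimal sketch of what "symmetry-related" means for the point group 222 example above (the function and its name are just for illustration; Scalepack's symmetry handling is of course completely general):

```python
# Generate the point-group 222 equivalents of (h,k,l) listed in the text,
# optionally adding the Friedel mates.  Illustration only.
def equivalents_222(h, k, l, friedel=True):
    ops = [(h, k, l), (h, -k, -l), (-h, k, -l), (-h, -k, l)]
    if friedel:
        # Friedel's law adds the inverse of each equivalent
        ops += [(-a, -b, -c) for (a, b, c) in ops]
    return sorted(set(ops))

print(equivalents_222(3, 5, 7))   # eight indices expected to agree in intensity
```

All eight indices printed should, within error, have the same measured intensity; that redundancy is what both the scaling and the Rsymm statistic below feed on.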
Technically, Friedel Pairs are pairs of reflections related by (h,k,l) to (-h,-k,-l) whereas Bijvoet Pairs are related by a combination of crystallographic symmetry and Friedel's Law, but of course the usage tends to be pretty equivalent. Also technically, point group 222 in the presence of Friedel's Law becomes point group mmm, since this reflects the effect of the centrosymmetric symmetry in the diffraction pattern (point group 2 becomes 2/m etc.). These designations are used somewhat flexibly in the non-technical literature.
One crude monitor of data quality is the extent to which observations of each unique reflection deviate from being equal to each other. The percentage deviation is termed Rsymm. As a precise monitor of data quality Rsymm is very flawed, but it is nevertheless widely used. Sometimes people use the term Rmerge when they mean Rsymm, but I prefer to use the former when merging datasets from multiple crystals and the latter when referring to scaling within the same crystal. Community-wide, there's no clear consensus on the usage of Rsymm vs Rmerge. Ironically Rsymm gets worse as the multiplicity/redundancy of your data increases, which is at odds with the fact that your data is getting better by virtue of this multiplicity. Two measures that attempt to compensate for this behavior are Rpim and Rmeas - both of which are discussed elsewhere and neither of which is reported by Scalepack.
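For the record, the usual definitions (summing over unique reflections hkl, each with n observations I_i and mean ⟨I⟩) are:

$$ R_{\mathrm{symm/merge}}=\frac{\sum_{hkl}\sum_i \left|I_i-\langle I\rangle\right|}{\sum_{hkl}\sum_i I_i},\qquad R_{\mathrm{meas}}=\frac{\sum_{hkl}\sqrt{\tfrac{n}{n-1}}\sum_i \left|I_i-\langle I\rangle\right|}{\sum_{hkl}\sum_i I_i},\qquad R_{\mathrm{pim}}=\frac{\sum_{hkl}\sqrt{\tfrac{1}{n-1}}\sum_i \left|I_i-\langle I\rangle\right|}{\sum_{hkl}\sum_i I_i} $$

The √(n/(n-1)) factor is what stops Rmeas punishing you for high multiplicity, and the √(1/(n-1)) factor makes Rpim report the precision of the merged average instead of the scatter of individual observations.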
Scalepack refines a number of parameters during scaling. There is a per-frame scale factor (k) that models things like beam decay at synchrotrons and the volume of the crystal in the beam. It is defined relative to the reference frame (normally frame #1): if it increases your data is getting stronger, if it decreases your data is getting weaker. It is therefore sensitive to things like beam intensity variation at synchrotrons, and particularly sensitive to crystal mis-centering. There is also a per-frame B-factor (B) that models radiation damage in crystals. The (relative) B-factor models the extent to which the fall-off of diffracted intensity with resolution varies during the dataset. As the B-factor increases, your high resolution data is getting weaker relative to your low resolution data - a classical hallmark of radiation damage as order in the crystal is lost. Radiation damage happens at synchrotrons (especially at places like CHESS and APS), but frozen crystals are essentially immortal at home sources over timeframes of several days. The k and B (and other) parameters are refined by minimising Rsymm (or a residual resembling it).

Scales (k) and B-factors can be poorly determined in the early stages of data collection when there are few symmetry-related reflections, so it's often useful to restrain them during scaling (the SCALE RESTRAIN and B RESTRAIN keywords in the example below). These values should be set greater than or equal to the expected frame-to-frame variation of the scale factors and B-factors. You can hurt scaling by setting them too tight, but you can improve scaling by giving them a reasonable estimate. On home sources I use SCALE RESTRAIN 0.02 and B RESTRAIN 0.1. At synchrotrons, where things are more variable, I use values perhaps twice as large for the SCALE restraint - it depends a bit on beam strength variability and how fast you expect your crystal to die in the beam. At CHESS, in particular, the beam intensity can vary rapidly with frame number (70 minute beam lifetimes at CHESS last time I was there vs 12 hours at Brookhaven).
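Schematically (exact sign and normalisation conventions aside - this is the idea, not Scalepack's documented formula), the model being fit is that the intensity measured on frame j is related to the true intensity by

$$ I_j(hkl) \;\approx\; s_j \, e^{-2B_j \sin^2\theta/\lambda^2}\, I_{\mathrm{true}}(hkl) $$

so s_j tracks how strong frame j is (beam intensity, illuminated crystal volume) while B_j soaks up the resolution-dependent fall-off that radiation damage produces; the restraints simply say that s and B should not jump wildly between adjacent frames.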
It can be important to monitor radiation damage at synchrotrons to set the appropriate exposure time for your crystal. Generally we do not like the per-frame B-factor correction to get much above 5 A2 for high resolution datasets (2.5 Angstrom or better) and 10 A2 for low resolution datasets. For MAD datasets these values should probably be lower still. Per-frame B-factors can model other things (like your crystal rolling out of the beam) so treat them with caution - they are not a good guide to radiation damage, but during data collection they're one of the few indicators you have.
Scalepack also does post-refinement, in which it optimizes the unit cell and crystal orientation (aka "missetting angles") based on knowledge of the observed locations of all the reflections in the entire dataset. Typically this improves the quality of the partial reflections by providing an accurate estimate of their "partiality". Scalepack uses partials by adding them across one or more adjacent frames. For mosaic crystals partials can be the majority of the reflections measured (as opposed to fulls). Post-refined unit cell dimensions are probably the closest to the "true" ones, since the locally-refined values estimated by Denzo tend to be only a local approximation to the unit cell. Scalepack can also post-refine mosaicity, which can give you an improved estimate for re-integration - generally speaking you should reintegrate if the mosaicity used in integration differs by more than 10% from the post-refined value.
It's not always a good idea to post-refine mosaicity - with some crystals the mosaicity estimate will blow up and Scalepack will discard most of your data. This most often happens with crystals that have smeary spot profiles - in these cases it pays to try a variety of different mosaicities to see what gives the best data. At other times you lie about (systematically underestimate) your mosaicity to reduce spot overlap problems. In these cases comment out the mosaicity line in the post-refinement block. At all other times, one should post-refine mosaicity, and re-integrate the data if the post-refined value and the original value differ by more than 10%.
Post-refinement is especially powerful in adding partials. Scalepack can add partial reflections split across several frames to reconstitute an intact reflection intensity. Since sometimes partials make up close to 100% of your data, this is a rather nice feature, but it does require that you integrate your data carefully so that the partiality estimates are as correct as possible - this is where post-refinement is important since this usually provides the most accurate values of the geometric parameters.
Scalepack throws out "rejections", which are reflections that differ excessively from the estimated mean value of that unique reflection and are thus possibly erroneously measured. For good data only a percent or so of observations are discarded. Once the percentage gets above 5% this is a pretty good indication of systematic errors in your data, and you should examine your integration carefully in such cases. It often pays to read these rejections back in as exclusions for the next round of scaling, thus improving the scaling parameters for the subsequent round(s).
Scalepack is /programs2/hkl/HKLsgi_1.97.2/scalepack on the SGIs, and /programs2/denzo/scalepack on Helium.
A typical scalepack script file is as follows:
NUMBER OF ZONES 10
ESTIMATED ERROR                 [ per-shell error estimates ]
 0.02 0.03 0.03 0.03 0.03 0.04 0.04 0.04 0.04 0.04
RESOLUTION 4.0
IGNORE OVERLOADS                [ throw out overloaded reflections ]
REJECTION PROBABILITY 0.001     [ parameter controls # of reflections rejected ]
WRITE REJECTION FILE 0.5
ERROR SCALE FACTOR 1.500        [ overall scalar for error estimates ]
ADD PARTIALS 1 TO 180           [ frames over which partials should be added ]
SPACE GROUP P6122
[SCALE RESTRAIN 0.05]           [ if scaling is problematic, restrain k and B ]
[B RESTRAIN 1.5]
POSTREFINE 10                   [ only do postrefinement once you have >5 frames ]
FIT CRYSTAL A* 1 TO 180         [ parameters to include in postrefinement ]
FIT CRYSTAL C* 1 TO 180
FIT BATCH ROTX 1 TO 180
FIT BATCH ROTY 1 TO 180
FIT CRYSTAL MOSAICITY 1 TO 180
END FIT
@reject
OUTPUT FILE 'ctmp7.hkl'
REFERENCE FILM 1
FORMAT DENZO_IP
SECTOR 1 TO 180
FILE 1 'ctmp7_1_###.x'

When adding partials it's sometimes advantageous to not add them across "a break" in data collection. For instance if you collect frames 1-180 across two beam-fills at CHESS at frames 80 and 140 you might change the ADD PARTIALS line above to:
ADD PARTIALS 1 TO 79, 80 TO 139, 140 TO 180

at the cost of losing partials that bridge the gaps between frames 79/80 and 139/140. Scalepack recognizes these breaks and also does not apply any SCALE RESTRAIN or B RESTRAIN restraints across them.
However you should not break up the post-refinement of CRYSTAL parameters like the unit cell or the mosaicity into blocks, because these should have a single value for the crystal that does not vary by frame (exception: sometimes mosaicity varies with rotation angle, but refining per-frame mosaicities is ripe for abuse even if it is the default in HKL2000). A sketch of this is shown below.
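To make the distinction concrete, here is a minimal sketch of how the fitting block might look when partials are added in chunks (as in the ADD PARTIALS example above) - the CRYSTAL parameters still span the full frame range; the frame numbers are just the example values used above:

ADD PARTIALS 1 TO 79, 80 TO 139, 140 TO 180
POSTREFINE 10
FIT CRYSTAL A* 1 TO 180                  [ one value for the whole crystal - do not split the range ]
FIT CRYSTAL C* 1 TO 180
FIT CRYSTAL MOSAICITY 1 TO 180
FIT BATCH ROTX 1 TO 180                  [ BATCH parameters are per-frame anyway ]
FIT BATCH ROTY 1 TO 180
END FIT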
Scalepack makes a big deal of the error model. The basic dogma is that the error model should be adjusted until the chi**2 for the data is approximately unity. The error model is thus a mixture of the theoretical (the mix of errors and their approximate expected distribution) and the empirical (the match between what is expected and what is observed). The estimated error in zones is one way to adjust it, although I don't change this much; the error scale factor is the main way. If chi**2 is >> 1.0, increase the error scale factor; if chi**2 is << 1.0, decrease it. Personal experience suggests that the error scale factor is ~1.5 for good crystals at synchrotrons, and ~2.0 at home. For badly behaved crystals this value will generally be higher, but if your error scale factor is 3 you should question whether you are collecting data or noise. Higher error scale factors reflect the reality that there is more error in your data than would be expected from its strength - usually systematic error from split/multiple crystals giving rise to poor spot profiles. Such data can be used, but be aware that it contains more error than data from well-behaved crystals of comparable strength.
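In practice this just means editing one line of the script and rescaling; for example (a sketch, assuming the overall chi**2 came out around 1.5 with the value used in the script above):

ERROR SCALE FACTOR 1.800                 [ raised from 1.500 because the overall chi**2 was ~1.5; rescale and check again ]

Since chi**2 scales roughly as the inverse square of the error scale factor, an increase from 1.5 to ~1.8 should bring a chi**2 of ~1.5 back close to 1.0.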
Selected parts of a Scalepack log file are as follows:
Number of rejected reflections      11567

Scalepack uses a file, typically called "reject", to store the reflections rejected as outliers. If you run Scalepack multiple times it reads this reject file from the previous run and leaves those reflections out of the scaling. Therefore you must run Scalepack several times in succession to achieve convergence and actually reject the outlier data. Each time you add more frames or change the resolution you should remove the reject file and run Scalepack several times until the number of rejected reflections remains stable (i.e. scaling reaches convergence with no new rejections). Scalepack lists rejected reflections for each frame as the frames are read in, so you can see where all your data is going:
reading from a file: ctmp7_1_001.x
  0.0   2   0 121  0   -4.7  -10.4  1.51  13.0  0.970  137.1 1254.8  0.505  10.4
  0.0   2   0 118  1   65.6   60.6  1.52  12.6  0.972  160.4 1252.7  0.493  18.5
  0.0   0  -1  99  0   69.4   78.8  1.48   9.7  0.980  291.5 1036.8  0.421  21.5
  0.0   2   1  82  0  285.2  270.7  1.51   9.4  0.985  442.7 1292.1  0.337  20.0
  0.0  -2  -2  57  0  207.0  209.5  1.27   5.6  0.991  590.8  811.1  0.254  67.2
  0.0   5   3  60  1   14.1   21.0  1.87   6.3  0.983  632.7 1614.3  0.232  26.2
etc.
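In practice the convergence loop described above is just a matter of re-running the same command file a few times from the shell - a minimal sketch, assuming your command file is called scale.com and that you run Scalepack by feeding it the command file on standard input:

rm reject ; touch reject                     # start with an empty reject file (the @reject line expects it to exist)
scalepack < scale.com > scale_pass1.log
scalepack < scale.com > scale_pass2.log      # this pass excludes the outliers written to "reject" by the first
scalepack < scale.com > scale_pass3.log      # repeat until the number of rejected reflections stops changing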
This is the output from post-refinement showing the refined parameters:
Hkl's refined: 53162   N.Chi**2: 2.648   Decrease: 0.004 * 53162 = 213.6
 Film #      a       b        c      alpha    beta   gamma    crysz     crysy    crysx  mosaicity
     1    57.725  57.725  518.853   90.000  90.000  120.000  -84.352  -147.227  -5.776    0.370
     2    57.725  57.725  518.853   90.000  90.000  120.000  -84.348  -147.223  -5.781    0.370
     3    57.725  57.725  518.853   90.000  90.000  120.000  -84.353  -147.212  -5.774    0.370
     4    57.725  57.725  518.853   90.000  90.000  120.000  -84.361  -147.204  -5.772    0.370
     5    57.725  57.725  518.853   90.000  90.000  120.000  -84.352  -147.204  -5.776    0.370
     6    57.725  57.725  518.853   90.000  90.000  120.000  -84.359  -147.193  -5.775    0.370
     7    57.725  57.725  518.853   90.000  90.000  120.000  -84.357  -147.185  -5.773    0.370
     8    57.725  57.725  518.853   90.000  90.000  120.000  -84.362  -147.180  -5.765    0.370
     9    57.725  57.725  518.853   90.000  90.000  120.000  -84.360  -147.169  -5.752    0.370
    10    57.725  57.725  518.853   90.000  90.000  120.000  -84.361  -147.170  -5.756    0.370

Notice that the cell dimensions are post-refined CRYSTAL, which means they are the same for all frames, whereas CrysX, CrysY, CrysZ are post-refined BATCH so that if the crystal slips they can absorb the error - in this case crysx/y/z are very similar, which indicates a stable integration and also that the crystal was not slipping. If CrysX/Y/Z move too much it may reflect locally inaccurate estimates of the cell dimensions (Scalepack can get better global estimates via post-refinement) or a crystal that is slipping.
This next table shows the scale factor and B-factor for each frame (the reference frame has values of 1.0 and 0.0 respectively) along with the per-frame rejection counts and other factors. Large numbers of reflections getting thrown out in columns 2 and 3 might indicate problems.
 1 - count of observations deleted manually
 2 - count of observations deleted due to zero sigma or profile test
 3 - count of non-complete profiles (e.g. overloaded) observations
 4 - count of observations deleted due to sigma cutoff
 5 - count of observations deleted below low resolution limit
 6 - count of observations deleted above high resolution limit
 7 - count of partial observations
 8 - count of fully recorded observations used in scaling

                                           1    2    3    4    5    6    7    8
 IP fitted, no o    1   1.0000   0.00      0    0   10   14    0   86  530   69
 IP fitted, no o    2   1.0251   0.03      0    0    8   22    0   94  561   70
 IP fitted, no o    3   1.0010   3.47      0    0    3   14    0   97  558   72
 IP fitted, no o    4   0.9552  -0.97      0    1    3   16    0   98  559   73
 IP fitted, no o    5   0.9540   1.44      0    0    5   10    0   93  583   71
 IP fitted, no o    6   0.8678  -0.99      0    0    4   11    0   94  572   71
 IP fitted, no o    7   0.8164  -1.36      0    0    6   17    0  101  558   73
 IP fitted, no o    8   0.7598   1.63      0    0    7   14    0   94  541   76
 IP fitted, no o    9   0.7580   1.14      0    0    6   11    0   85  542   77
 IP fitted, no o   10   0.7767   0.99      0   30    5    8    0  100  529   67
 IP fitted, no o   11   0.6735  -6.04      0    0    4   10    0   91  544   77
 IP fitted, no o   12   0.7765   1.46      0    0    2   20    0   93  598   86
 IP fitted, no o   13   0.7682   1.22      0    0    3   14    0   87  592   74
 IP fitted, no o   14   0.7554   2.79      0    0    4   18    0   71  548   70
 IP fitted, no o   15   0.7511   1.59      0    1    9   12    0   94  538   85
 IP fitted, no o   16   0.7673  -1.46      0    0    4   13    0   95  512   73
 IP fitted, no o   17   0.7224   0.54      0    1    6   14    0   82  506   56
 IP fitted, no o   18   0.7446   2.19      0    1    3   15    0   81  605   75
 IP fitted, no o   19   0.6930  -0.18      0   10    4   11    0   89  640  101
 IP fitted, no o   20   0.7186   1.99      0    8    1    8    0   76  615   74
 IP fitted, no o   21   0.7007   2.12      0    1    1   15    0   88  571   77

Notice that this table immediately suggests something non-optimal about the scaling - the per-frame B-factors and scale factors vary too greatly between frames (e.g. the B-factor swings from -6.04 to 1.46 on successive frames 11 and 12). One should use SCALE RESTRAIN and B RESTRAIN in this case and rescale the data. I typically use SCALE RESTRAIN 0.1 (or 0.05) and B RESTRAIN 0.1 in cases where the scaling is not so well-behaved.
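To restrain the scaling you simply un-comment (and adjust) the restraint lines shown in the script above and rescale; the values here are the ones suggested above:

SCALE RESTRAIN 0.1        [ restrain frame-to-frame variation of the scale factor ]
B RESTRAIN 0.1            [ restrain frame-to-frame variation of the B-factor ]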
Scalepack usually gives very verbose listings of the reflections rejected on each pass - most of us don't look hard at those listings, but they can be useful in pathological cases. Of particular interest, though, is this table, which gives a frame-by-frame breakdown of data quality (chi**2 and R-factor), data strength, etc.
Summary of reflection intensities and R-factors by batch number
                                        All data               Linear
  Batch     # obs   # obs > 1   <I/sigma>    N. Chi**2          R-fac
      1       150        147        9.9        1.048            0.059
      2       245        243       10.3        1.081            0.071
      3       255        255       10.5        1.279            0.102
      4       268        268       10.6        1.084            0.072
      5       256        256        8.8        1.196            0.089
      6       267        267       10.6        1.082            0.077
      7       254        254        9.2        1.202            0.096
      8       269        269        9.6        1.113            0.081
      9       259        258        9.5        1.130            0.075
     10       246        239        8.8        1.136            0.089
     11       257        247        7.3        1.097            0.102
     12       277        266        8.1        1.154            0.100
      .         .          .          .            .                .
      .         .          .          .            .                .
    130        76         64        5.0        3.113            0.288
    131        76         70        4.4        4.853            0.523
    132        65         59        5.9        1.629            0.281
    133        63         58        5.4        2.546            0.198
    134        55         48        4.7        1.970            0.252
    135        70         65        4.6        3.052            0.304
    136        77         61        5.7        8.570            0.351
    137        83         81        3.9       20.434            0.748
    138        67         55        4.2        7.657            0.407
 All films  30407      30112        6.7        1.458            0.177

The first thing to notice is that there should be approximately the same number of observations (# obs) on each frame. In this example there are missing observations on the last frames due to severe spot profile overlaps; such phenomena should always be checked carefully in case you can recover the lost data. The chi**2 and R-factor for these latter frames are also pretty bad, indicating problems, and the data in these frames appears weak. Again, time to check that you've done the integration correctly.
This table tells us a lot about our data and should be checked carefully. It can also be used as a primitive strategy tool for data completeness: you are only adding new unique data when the #obs value is much greater than the #obs>1 value; if not, you are mostly just adding redundancy. Extra redundancy is not a bad thing if you want to collect those extra frames (in fact it modestly improves the quality of your dataset), but completeness is the more critical concern - datasets that are less than 85% complete overall are rarely useful for anything. So you can often tell whether you are just collecting more of the same data by looking for the discrepancy between those two columns: if #obs is substantially greater than #obs>1, you are still adding new unique data.
Finally, Scalepack prints out a whole raft of tables for our amusement, broken down by resolution. The overall redundancy here is 30407 (above table) divided by 4876 (below) so we have reasonably redundant data (6.2x). This is reflected in the first two tables where most of the data is observed a substantial number of times.
Shell Summary of observation redundancies by shells:
 Lower  Upper       No. of reflections with given No. of observations
 limit  limit     0     1     2     3     4   5-6   7-8  9-12 13-19   >19  total
 99.00   8.62   145    46    53    43    59   185    55    74     4     0    519
  8.62   6.84    36    55    46    27    51   167    51   114    12     0    523
  6.84   5.98    22    27    38    34    55   139    63   133     9     0    498
  5.98   5.43    20    38    46    52    36   133    49   140     9     0    503
  5.43   5.04     9    21    46    43    56   115    61   130    13     0    485
  5.04   4.74    28    32    36    27    49   110    69   131    14     0    468
  4.74   4.50    15    24    49    49    38   123    46   140     6     0    475
  4.50   4.31    26    20    53    50    45   112    62   131     7     0    480
  4.31   4.14    19    13    23    36    53   104    45   144    17     0    435
  4.14   4.00    30    19    53    52    49    89    70   148    10     0    490
 All hkl        350   295   443   413   491  1277   571  1285   101     0   4876

Shell Summary of observation redundancies:
 Lower  Upper       % of reflections with given No. of observations
 limit  limit     0     1     2     3     4   5-6   7-8  9-12 13-19   >19  total
 99.00   8.62  21.8   6.9   8.0   6.5   8.9  27.9   8.3  11.1   0.6   0.0   78.2
  8.62   6.84   6.4   9.8   8.2   4.8   9.1  29.9   9.1  20.4   2.1   0.0   93.6
  6.84   5.98   4.2   5.2   7.3   6.5  10.6  26.7  12.1  25.6   1.7   0.0   95.8
  5.98   5.43   3.8   7.3   8.8   9.9   6.9  25.4   9.4  26.8   1.7   0.0   96.2
  5.43   5.04   1.8   4.3   9.3   8.7  11.3  23.3  12.3  26.3   2.6   0.0   98.2
  5.04   4.74   5.6   6.5   7.3   5.4   9.9  22.2  13.9  26.4   2.8   0.0   94.4
  4.74   4.50   3.1   4.9  10.0  10.0   7.8  25.1   9.4  28.6   1.2   0.0   96.9
  4.50   4.31   5.1   4.0  10.5   9.9   8.9  22.1  12.3  25.9   1.4   0.0   94.9
  4.31   4.14   4.2   2.9   5.1   7.9  11.7  22.9   9.9  31.7   3.7   0.0   95.8
  4.14   4.00   5.8   3.7  10.2  10.0   9.4  17.1  13.5  28.5   1.9   0.0   94.2
 All hkl        6.7   5.6   8.5   7.9   9.4  24.4  10.9  24.6   1.9   0.0   93.3

The second table also shows the % completeness (the "total" column). Although most of the shells of data are complete in this example, the low resolution shell lacks completeness and this will affect the quality of the electron density map. This next table allows us to see how strong the data is as a function of resolution.
Shell I/Sigma in resolution shells:
 Lower  Upper       No. of reflections with I / Sigma less than
 limit  limit      0     1     2     3     5    10    20   >20  total
 99.00   8.62     16    25    33    49    69   103   172   347    519
  8.62   6.84     20    40    60    89   130   202   325   198    523
  6.84   5.98     17    39    60    91   143   250   393   105    498
  5.98   5.43     12    33    52    79   129   248   384   119    503
  5.43   5.04     12    26    45    73   119   222   360   125    485
  5.04   4.74      5    16    27    51    95   198   352   116    468
  4.74   4.50      5    13    28    41    91   194   346   129    475
  4.50   4.31     12    21    31    46    89   227   382    98    480
  4.31   4.14     17    34    57    84   140   272   390    45    435
  4.14   4.00     16    36    66   107   193   347   452    38    490
 All hkl         132   283   459   710  1198  2263  3556  1320   4876

The next table presents much the same data except that it also shows the data completeness. Data less than 80% complete overall is mostly worthless and there really is no excuse for collecting such data. Check this table carefully and aim for at least 90% completeness in all shells.
Shell I/Sigma in resolution shells:
 Lower  Upper       % of reflections with I / Sigma less than
 limit  limit      0     1     2     3     5    10    20   >20  total
 99.00   8.62    2.4   3.8   5.0   7.4  10.4  15.5  25.9  52.3   78.2
  8.62   6.84    3.6   7.2  10.7  15.9  23.3  36.1  58.1  35.4   93.6
  6.84   5.98    3.3   7.5  11.5  17.5  27.5  48.1  75.6  20.2   95.8
  5.98   5.43    2.3   6.3   9.9  15.1  24.7  47.4  73.4  22.8   96.2
  5.43   5.04    2.4   5.3   9.1  14.8  24.1  44.9  72.9  25.3   98.2
  5.04   4.74    1.0   3.2   5.4  10.3  19.2  39.9  71.0  23.4   94.4
  4.74   4.50    1.0   2.7   5.7   8.4  18.6  39.6  70.6  26.3   96.9
  4.50   4.31    2.4   4.2   6.1   9.1  17.6  44.9  75.5  19.4   94.9
  4.31   4.14    3.7   7.5  12.6  18.5  30.8  59.9  85.9   9.9   95.8
  4.14   4.00    3.1   6.9  12.7  20.6  37.1  66.7  86.9   7.3   94.2
 All hkl         2.5   5.4   8.8  13.6  22.9  43.3  68.0  25.3   93.3

Finally the summary table presents the mean I/sigmaI for each shell, the chi**2 (here a little above 1.0, so we should increase the error scale factor), and the R-factor. Although traditionally the most-quoted value, the R-factor is so dependent on redundancy that its actual utility for assessing data quality is questionable - the R-factor generally increases with increasing redundancy (implying higher error) while the actual quality of the data usually improves (lower error). If we were fastidious about adhering to the Scalepack error model, the overall chi**2 of 1.5 would induce us to increase the Error Scale Factor and rerun Scalepack.
Summary of reflections intensities and R-factors by shells
 R linear = SUM ( ABS(I - mean(I))) / SUM (I)
 R square = SUM ( (I - mean(I)) ** 2) / SUM (I ** 2)
 Chi**2   = SUM ( (I - mean(I)) ** 2) / (Error ** 2 * N / (N-1) ) )
 In all sums single measurements are excluded

 Shell  Lower  Upper    Average   Average   Norm.   Linear  Square
 limit     Angstrom        I       error    stat.   Chi**2   R-fac   R-fac
        99.00   8.62     617.4      22.7     20.8    2.357   0.086   0.435
         8.62   6.84     260.2      18.2     17.2    1.516   0.162   0.459
         6.84   5.98     162.3      13.8     13.3    1.324   0.250   0.771
         5.98   5.43     199.6      16.8     16.2    1.431   0.206   0.314
         5.43   5.04     212.0      14.9     14.2    1.479   0.171   0.196
         5.04   4.74     266.2      19.8     18.3    1.558   0.179   0.223
         4.74   4.50     303.9      21.0     19.2    1.532   0.172   0.204
         4.50   4.31     259.2      20.0     18.8    1.444   0.189   0.217
         4.31   4.14     157.0      16.7     16.2    1.378   0.263   0.304
         4.14   4.00     138.4      17.6     17.1    1.325   0.321   0.404
  All reflections        260.5      18.2     17.1    1.521   0.177   0.388

Finally, Scalepack lists the observed intensities and I/sigmaI for any systematic absences in the space group you specified. Even if the I/sigmaI looks high, you should compare it to the mean intensity for that resolution shell in the table above. For example (0,0,31) has an I/sigmaI of 2.9, but it is an 18 Angstrom resolution reflection and its intensity (24.7) is only 4% of the average intensity in that shell of data. So it's still pretty weak, and probably absent. Notably, however, I've been fooled by systematic absences in data. It's better to determine the space group definitively via the heavy atom substructure or the molecular replacement solution - these are sensitive tests involving all the data, not just the 10-100 or so reflections along the reciprocal space lattice axes.
Intensities of systematic absences
   h   k   l    Intensity   Sigma   I/Sigma
   0   0  26        6.6      2.8      2.4
   0   0  27       -5.9      3.3     -1.8
   0   0  28       -1.2      2.8     -0.4
   0   0  29       10.2      6.0      1.7
   0   0  31       24.7      8.5      2.9
   0   0  32        0.0      2.8      0.0
   0   0  33       -4.7      2.5     -1.9
The Denzo manual does a good job of showing different Scalepack input files to be used in a variety of different situations. Rather than re-invent the wheel here, just go look at the examples for Scalepack in this manual.
Figuring Out Your Space Group
Denzo indexes and integrates data based on the physical dimensions of the lattice it found during auto-indexing, but it doesn't compare the intensities of reflections during integration. Therefore one must use Scalepack to determine the point group symmetry during scaling. Most of the time you cannot uniquely determine which space group you have from Scalepack statistics alone, because several space groups belong to the same point group (e.g. P222, P2221, P21212 and P212121 all belong to point group 222). Systematic absences along the principal axes can sometimes help you distinguish them (e.g. P222 has no systematic absences, but P2221 has 00L absent for L=odd), but the best way to tell is to find the heavy atom substructure or molecular replacement solution in a specific space group. As ever, try all the possible space groups: if one is obviously better than the others, that's a good indication; if they all look the same, you probably haven't found any solution at all.

What follows is a description of how to tell what point group and space group you have. For some lattices there is only one possible point group (one line in the table above), but for certain others there is an inherent ambiguity.
I'm going to use primitive hexagonal as an example lattice here, but the concepts apply to all lattices. With lattices displaying a physical shape consistent with hexagonal, there are 4 possible point groups, sorted here in ascending symmetry:
3 < 32 (312 + 321) = 6 < 622
If you have a full or partial dataset, you can get a good handle on the point group by scaling in different space groups. For example, all hexagonal lattices are compatible with point group 3, so if you give Scalepack the space group P3 you can get some baseline statistics. If it doesn't scale in P3, either you have a real problem with the integration (check by scaling in P1) or it's really not a trigonal/hexagonal lattice at all. Try re-integrating with a different lattice if it fails to scale well in P3.
Point group 3 is always possible, but data belonging to point groups 32, 6 and 622 all scale perfectly well in point group 3 as well. So you should try each point group to see which ones your data scales well in. In Scalepack you do this by trying a representative space group from each, for example: P3, P321, P312, P6, P622. Since you know what a "reasonable" overall chi**2 is (1.0, or the value from the scaling in P3), it becomes very obvious when you've got the wrong point group. In a quirk, space groups P321 and P312 are incompatible with each other even though both belong to point group 32 - I think this is the only example of this, and I tend to treat point group 32 as really being two point groups, 312 and 321, to deal with this issue. If the data scales in P321 it's not going to scale in P312 (or vice versa) unless it's really point group 622. So you simply try all possible point groups and typically choose the highest-symmetry one that is consistent with your data. Note that your Rsymm will probably increase a little if you scale in P622 instead of P3, because your redundancy is typically four times higher in P622 than in P3. Point groups 622 and 32 (321 type) are common, 6 is less common, and 3 and 32 (312 type) border on rare.
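Concretely, this just means running the same scaling script several times, changing only the SPACE GROUP line each time and comparing the overall chi**2 (and Rsymm) of each run - a sketch of the candidate lines for a hexagonal lattice, used one at a time:

SPACE GROUP P3        [ baseline - every trigonal/hexagonal lattice scales in P3 ]
SPACE GROUP P321      [ one representative space group per candidate point group ]
SPACE GROUP P312
SPACE GROUP P6
SPACE GROUP P622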
Then you can try your luck at choosing a space group. The only way to do this within Scalepack is to inspect the intensities and I/sigmaI values of the potential systematic absences. In the table below I give the expected systematic absences for certain symmetry elements along certain crystallographic axes. If you have strong reflections in the absences list, then it's probably not that symmetry. However nearly all of us have been fooled by this at some point, since it is sensitive to the correct integration of just a few reflections out of tens of thousands - I have seen systematic absences in a P212121 space group that made it look a lot like P21212, over multiple datasets, despite the fact that the structure was really P212121. It's best not to assume too much about the space group, even though your estimate of the point group may be reliable.
Symmetry axis | Along | Condition | Notes |
---|---|---|---|
21 | A | H00 absent for H=odd | |
21 | B | 0K0 absent for K=odd | |
21 | C | 00L absent for L=odd | |
31 | C | 00L absent for L<>3n | |
32 | C | 00L absent for L<>3n | |
41 | C | 00L absent for L<>4n | |
42 | C | 00L absent for L<>2n | Like 21 |
43 | C | 00L absent for L<>4n | |
61 | C | 00L absent for L<>6n | |
62 | C | 00L absent for L<>3n | Like 32 |
63 | C | 00L absent for L<>2n | Like 21 |
64 | C | 00L absent for L<>3n | Like 32 |
65 | C | 00L absent for L<>6n |
Pure rotation axes (2, 3, 4, 6) NEVER give rise to systematic absences along H00, 0K0, 00L - only screw axes do. Note also that enantiomorphic space groups (e.g. P3121 and P3221; P41212 and P43212) cannot be distinguished by any method within Scalepack. You have to try both members of the enantiomorphic pair when solving the structure.
Note that non-primitive lattices like C, F and I have global systematic absences because their lattices are centered (e.g. for C-lattice space groups like C2, all reflections with h+k=odd are missing from the entire dataset). However Denzo doesn't even attempt to integrate these "missing" reflections, since they are not present, and therefore they never show up in the reflection lists.
Overlaps, Overloads and Other Glitches
Overlaps occur when one spot encroaches on the spot region of another. This happens mainly in three cases: your crystal-to-detector distance is set too short for the cell dimensions; the oscillation range for each frame is too large; or the mosaicity is large. You can't change the last one, but the first two are within your control. Move the detector further back and the spots separate more, at the cost of reduced maximum resolution at the edge of the detector. Alternatively, reduce the oscillation range per frame - we rarely use more than 1.0 degree per frame except for really small unit cells, and we have used as little as 0.4 degrees per frame. However you won't gain much if your mosaicity is much larger than the frame width; generally speaking you don't gain anything once your frame width is less than about 2/3 of the mosaicity.

Overloads are saturated pixels. CCDs at synchrotrons have a limited dynamic range. If you expose your crystal for a long time, the more intense low-resolution spots will typically saturate and be rejected, since they exceed the pixel overload value in the file (for a 16-bit data frame the overload value is ~65,000). There are two ways to avoid this: reduce your exposure time (your data will get weaker), or collect a second low-resolution pass with larger frame widths and shorter exposure times to recover the data that was overloaded the first time around. Always collect the high resolution pass first, since radiation damage affects high resolution reflections more than low resolution ones. You then process the second, low-resolution pass at a lower resolution than the high-resolution pass and merge the whole lot together in Scalepack (see the sketch below).
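A minimal sketch of what the merging step might look like in the Scalepack command file, reusing the SECTOR / FILE syntax from the scaling script earlier; the file names, frame ranges and FILE numbers here are hypothetical, and the low-resolution pass is assumed to have been integrated at lower resolution in Denzo:

SECTOR 1 TO 180 FILE 1   'hires_1_###.x'    [ high resolution pass, collected and processed first ]
SECTOR 1 TO 90  FILE 201 'lores_1_###.x'    [ low resolution pass with shorter exposures / wider frames ]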
Missing low resolution data typically occurs because the beam stop is too close. The closer the beam stop is to the crystal, the larger the beam stop shadow becomes and the more low resolution data is lost. Although it is often an advantage to keep the beam stop reasonably close, it's also good to keep an eye on this phenomenon. Note: for partial datasets it's normal for the data completeness in the higher resolution shells to be a little higher than in the lower resolution shells during data collection - this is a consequence of the curvature of the Ewald sphere.
Other glitches are myriad: beam instability at synchrotrons (keep an eye on those ion counters), mis-centered crystals (the frames start to appear weak, with low scale factors in Scalepack), split spots, twinned crystals, beam dumps, etc. The key point is that the more attention you pay to your data collection and processing, the better your data will be. The per-frame scale and B-factors reveal a lot about basic data collection performance; the values of chi**2 and Rsymm reveal a lot about the quality of your crystal. There are plenty of bad crystals around, but there's no excuse for collecting bad data on good crystals.
Processing MAD Data
Processing MAD data presents the same basic challenges as processing conventional data (it is, after all, still diffraction data). However the anomalous signal in MAD data is often very small, so a GREAT deal of care must be taken during data integration and scaling.

One important point is that the same crystal orientation angles should be used for all 3 wavelengths during integration, otherwise the anomalous signal could conceivably get mixed up (this shouldn't be an issue with Denzo, which should index the data in the same absolute hand each time). To do this, auto-index the first dataset in the conventional manner, but then use the auto-indexed values to integrate the subsequent datasets. To do this, create a known.dat file that looks like an index.dat file with a few modifications:
format ccd adsc unsupported-q4
box print 2.4 2.4
spot elliptical 0.5 0.60 0.0
background elliptical 0.6 0.7 0.0
overlap spot
wavelength 1.009315
monochromator 0.9
air absorption length 3450
overload value 55000
error density 0.15
error systematic 5.0 partiality 0.1 positional 0.015
Y scale -1.0 skew 0.000
film rotation 90.0
film length 187.8 width 187.8
top margin 0.4 left margin 0.4
resolution limits 20. 3.7
mosaicity 1.0
profile fitting radius 20.0
space group C222
distance 250.00
X beam 88.6 Y beam 97.2
oscillation start 0.0 step -1.00 range 1.00
[oscillation start 0.0 range -1.0]
[Note - oscillation step is negative for F1, F2; positive for A1]
TITLE 'on F1 q4 at chess'
raw data file '../mb150_1_###.img'
film output file 'mb150_1_###.x'
write predictions
unit cell 162.525 228.512 81.630 90.000 90.000 90.000
crystal rotx -91.623 roty -10.938 rotz -141.952
cassette rotx -0.01 roty -0.57 rotz 0.00
2 theta 0.00
distance 252.48
x beam 88.658 y beam 97.158
crossfire y -0.027 x 0.002 xy 0.014
oscillation start 0.
sector 1

First note that we have got rid of the peak search file 'peaks.file' line, since that is what causes Denzo to auto-index the data, which is what we want to avoid. Then we take some information from a ".x" file whose integration we trust. Minimally we must supply the unit cell and crystal rotx/y/z values. It's also a good idea to update the cassette rotx/y/z (detector skew), the distance and the direct beam position. Create a new known.dat file for each dataset, changing the file names and the wavelength. If you collect MAD data in the way we normally do, then things like distance and starting position will be the same for all the datasets.
We then use this to integrate as follows:
denzo
@known.dat
oscillation start 0.
sector 1 to 20
@integrate.dat
stop

For scaling MAD data it is often useful to be exceptionally careful about scaling parameters. We often use a method which scales all three wavelengths together in one scale.com, then uses these scale factors in subsequent scale files for each wavelength:
NUMBER OF ZONES 10
ESTIMATED ERROR 0.030 0.030 0.030 0.030 0.030 0.030 0.040 0.040 0.040 0.040
RESOLUTION 2.5
IGNORE OVERLOADS
REJECTION PROBABILITY 0.000001
WRITE REJECTION FILE 0.5
ERROR SCALE FACTOR 2.2
ADD PARTIALS 1 TO 60, 101 TO 160, 201 TO 260, 301 TO 360, 401 TO 460, 501 TO 560, 601 TO 625
SPACE GROUP P212121
[SCALE RESTRAIN 0.01]
[B RESTRAIN 0.1]
POSTREFINE 10
FIT CRYSTAL MOSXX 1 TO 60, 101 TO 160, 201 TO 260, 301 TO 360, 401 TO 460, 501 TO 560, 601 TO 625
FIT BATCH ROTX 1 TO 60, 101 TO 160, 201 TO 260, 301 TO 360, 401 TO 460, 501 TO 560, 601 TO 625
FIT BATCH ROTY 1 TO 60, 101 TO 160, 201 TO 260, 301 TO 360, 401 TO 460, 501 TO 560, 601 TO 625
FIT CRYSTAL A* 1 TO 60, 101 TO 160, 201 TO 260, 301 TO 360, 401 TO 460, 501 TO 560, 601 TO 625
FIT CRYSTAL B* 1 TO 60, 101 TO 160, 201 TO 260, 301 TO 360, 401 TO 460, 501 TO 560, 601 TO 625
FIT CRYSTAL C* 1 TO 60, 101 TO 160, 201 TO 260, 301 TO 360, 401 TO 460, 501 TO 560, 601 TO 625
END FIT
@reject
OUTPUT FILE 'scale_all.hkl'
REFERENCE FILM 1
FORMAT DENZO_IP
SECTOR 1 to 60 FILE 1 '../l1p/l1p_1_###.x'
SECTOR 1 to 60 FILE 101 '../l1n/l1n_1_###.x'
SECTOR 1 to 60 FILE 201 '../l2p/l2p_1_###.x'
SECTOR 1 to 60 FILE 301 '../l2n/l2n_1_###.x'
SECTOR 1 to 60 FILE 401 '../l3p/l3p_1_###.x'
SECTOR 1 to 60 FILE 501 '../l3n/l3n_1_###.x'
SECTOR 1 to 25 FILE 601 '../l4p/l4p_1_###.x'

which scales data from all three wavelengths (l1p/n, l2p/n, l3p/n) together. We then hack the refined scale and B-factors out of the Scalepack log file to create per-wavelength scaling files:
NUMBER OF ZONES 10
ESTIMATED ERROR 0.030 0.030 0.030 0.030 0.030 0.030 0.040 0.040 0.040 0.040
RESOLUTION 2.7
IGNORE OVERLOADS
REJECTION PROBABILITY 0.000001
WRITE REJECTION FILE 0.5
ERROR SCALE FACTOR 1.500
ADD PARTIALS 1 to 60, 101 to 160
SPACE GROUP p212121
INITIAL SCALE
   1 1.0000   2 1.0079   3 0.9966   4 1.0029   5 0.9920   6 0.9959   7 1.0082   8 1.0019   9 0.9992  10 1.0365
  11 1.0105  12 0.9950  13 1.0243  14 1.0335  15 1.0359  16 0.9671  17 0.9618  18 0.9776  19 0.9566  20 0.9814
  21 0.9874  22 0.9947  23 1.0099  24 1.0183  25 1.0447  26 1.0490  27 1.0816  28 1.0907  29 1.1033  30 1.1307
  31 1.0189  32 1.0490  33 1.0856  34 1.0513  35 1.1064  36 1.1247  37 1.1401  38 1.1623  39 1.1721  40 1.2152
  41 1.2312  42 1.2234  43 1.2600  44 1.2924  45 1.3380  46 1.2820  47 1.3188  48 1.3429  49 1.3778  50 1.4128
  51 1.4299  52 1.5028  53 1.4968  54 1.5213  55 1.5223  56 1.5417  57 1.5847  58 1.5760  59 1.5821  60 1.6183
 101 1.0130 102 1.0164 103 1.0041 104 1.0118 105 1.0064 106 0.9940 107 1.0103 108 0.9947 109 1.0230 110 1.0071
 111 1.0090 112 1.0181 113 1.0161 114 1.0291 115 1.0348 116 0.9611 117 0.9629 118 0.9729 119 0.9644 120 0.9955
 121 0.9918 122 0.9948 123 1.0200 124 1.0187 125 1.0491 126 1.0649 127 1.0775 128 1.1079 129 1.1104 130 1.1417
 131 1.0692 132 1.1063 133 1.1350 134 1.1436 135 1.1895 136 1.2045 137 1.2318 138 1.2695 139 1.2832 140 1.3234
 141 1.3504 142 1.3780 143 1.4327 144 1.4762 145 1.4917 146 1.4543 147 1.5094 148 1.5126 149 1.5599 150 1.5513
 151 1.5989 152 1.6041 153 1.6317 154 1.6185 155 1.6073 156 1.6294 157 1.6461 158 1.6244 159 1.6360 160 1.6160
INITIAL B FACTOR
   1 0.00   2 0.13   3 -0.07   4 -0.13   5 -0.18   6 -0.24   7 0.16   8 -0.41   9 -0.25  10 0.39
  11 -0.57  12 -0.89  13 -0.41  14 0.02  15 -0.35  16 0.56  17 0.00  18 0.94  19 -0.28  20 0.30
  21 -0.06  22 0.44  23 0.15  24 0.04  25 0.76  26 0.53  27 0.48  28 0.36  29 0.80  30 1.44
  31 1.41  32 1.48  33 2.25  34 1.47  35 2.47  36 2.15  37 2.41  38 2.61  39 2.32  40 3.09
  41 3.10  42 2.66  43 3.34  44 2.83  45 3.56  46 3.36  47 3.37  48 3.51  49 4.06  50 4.53
  51 3.61  52 4.50  53 4.30  54 4.00  55 4.25  56 4.02  57 4.77  58 4.69  59 4.76  60 5.20
 101 0.54 102 0.42 103 0.18 104 0.31 105 0.27 106 -0.16 107 0.36 108 -0.09 109 0.46 110 0.18
 111 -0.20 112 0.07 113 -0.18 114 0.19 115 -0.12 116 0.42 117 0.33 118 0.83 119 0.33 120 0.61
 121 0.00 122 0.35 123 0.39 124 0.13 125 0.78 126 0.83 127 0.75 128 0.86 129 0.58 130 1.32
 131 1.13 132 1.38 133 1.58 134 1.44 135 2.01 136 1.67 137 2.00 138 2.40 139 2.06 140 2.26
 141 2.49 142 2.86 143 2.97 144 3.46 145 3.62 146 3.72 147 4.38 148 3.89 149 4.81 150 4.72
 151 4.83 152 5.06 153 5.68 154 5.23 155 5.27 156 5.53 157 5.87 158 5.87 159 5.74 160 5.89
NUMBER OF ITERATIONS 0
POSTREFINE 10
FIT CRYSTAL MOSXX 1 to 60, 101 to 160
FIT BATCH ROTX 1 to 60, 101 to 160
FIT BATCH ROTY 1 to 60, 101 to 160
FIT CRYSTAL A* 1 to 60, 101 to 160
FIT CRYSTAL B* 1 to 60, 101 to 160
FIT CRYSTAL C* 1 to 60, 101 to 160
END FIT
@reject_l1pn
OUTPUT FILE 'l1pn_ano.hkl'
[REFERENCE FILM 1]
FORMAT DENZO_IP
SECTOR 1 to 60 FILE 1 '../l1p/l1p_1_###.x'
SECTOR 1 to 60 FILE 101 '../l1n/l1n_1_###.x'    [omit this in the first few passes, then include it later]
ANOMALOUS

Note that the NUMBER OF ITERATIONS 0 line prevents refinement of these scale/B-factors - you just cut and paste the relevant scale factors from the scale_all log file. (These files were from Woo Joo's successful MAD phasing of the p53:BP1 complex using the Zn signal, collected on X4A using data with embedded ice.)
In order to use the anomalous data for phasing within CCP4 or some other program you need to write out .hkl files with the anomalous data (I+ and I-) kept separate. The keyword ANOMALOUS just writes out the I+ and I- data separately in the output file, but treats I+ and I- as equivalent during scaling. This works fine for small anomalous scattering signals and is especially important for data with low redundancy. SCALE ANOMALOUS treats I+ and I- as distinct during scaling, and should not be used unless you have a lot of redundancy and/or a relatively large anomalous signal. For SOLVE, output the data with NO MERGE ORIGINAL INDEX specified in the Scalepack file.
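For instance, the tail of the command file might look like one of the following, depending on the downstream program - a sketch only; the output file names are hypothetical and you should check the Scalepack manual for the exact behaviour of these keywords:

OUTPUT FILE 'l1pn_ano.hkl'
ANOMALOUS                      [ I+ and I- treated as equivalent in scaling but written out separately - fine for CCP4 ]

OUTPUT FILE 'l1pn_unmerged.sca'
NO MERGE ORIGINAL INDEX        [ unmerged observations with original indices, as expected by SOLVE ]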