Anonymous rare-cutter restriction fragments (ARRFs): a new source of polymorphisms

John H. McDonald

Department of Biological Sciences
University of Delaware
Newark, Delaware 19716 USA

Introduction

This web page describes anonymous rare-cutter restriction fragments (ARRFs), a new source of co-dominant, single-copy, nuclear polymorphisms, which may be useful for a variety of applications. If you need a few dozen polymorphisms and were considering microsatellites, RAPDs or AFLPs, the ARRF technique may be a worthwhile alternative.

The technique is similar in concept to AFLPs, but instead of PCR amplifying the restriction fragments, the ARRF technique involves fluorescently labeling restriction fragments and observing them on an automated DNA sequencer. The fluorescence intensity of ARRFs is proportional to their copy number, so homozygotes for the presence of a fragment can be distinguished from heterozygotes by the twofold difference in fluorescence, and multiple-copy and organellar fragments can be identified by their much greater fluorescence than single-copy nuclear fragments. This is an advantage over PCR-based techniques such as RAPDs and AFLPs, where the intensity of bands is a function of the relative efficiency of the amplification of each fragment in a complex, competitive reaction. ARRFs from parasites, gut contents, or laboratory contamination would probably have much lower fluorescence than fragments from the desired organism, eliminating another source of the artifacts that plague other techniques.

The lack of a PCR step in the ARRF technique also has disadvantages, because the DNA required for the technique must be greater in both quantity and quality than that required for PCR-based techniques. Small organisms, such as individual Drosophila, won't provide enough DNA to yield detectable bands. To give you a rough idea, the DNA from about 25 mg (wet weight) of mussel or oyster gill tissue is enough to do the ARRF technique once. The ARRF technique probably won't work well on organisms with unusually large genomes (I'm guessing that above 10 gigabases would be a problem) or on organisms with heavily methylated DNA.

How it works

The genomic DNA from the organism is first digested with two restriction enzymes (call them enzymes A and B) that cut rarely. Most of the physically sheared DNA molecules in the DNA prep are not cut at all, and some are cut only once, because the enzymes are rare-cutters whose restriction sites are hundreds of thousands of base pairs apart, on average. By chance, there are a few places in the genome where an enzyme A cut site is a few hundred base pairs from an enzyme B cut site. Those fragments are the ones we want. Finding a pair of enzymes that yield a useful number of fragments is an important part of the technique, requiring a mix of educated guesswork and trial and error.

Two adapters are ligated onto the restriction fragments. A biotin labeled adapter is ligated onto the cohesive end left by enzyme A, and a fluorescently labeled adapter is ligated onto the cohesive end left by enzyme B.

The fragments with biotin labels are attached to streptavidin-coated paramagnetic beads. The beads are attracted to the side of a tube with a strong magnet, and all of the fragments without biotin are washed away. This gets rid of most of the DNA (which might otherwise clog up the gel), fragments with just fluorescent labels, and unincorporated fluorescent adapter (which would otherwise cause overwhelming background fluorescence on the gel).

The fragments attached to the beads are denatured with sodium hydroxide solution. The strand labeled with biotin stays attached to the beads, while the other strand, with its fluorescent label, is transferred to a new tube.

The fluorescently labeled single-stranded DNA is precipitated, washed, dried, and run on an automated DNA sequencer. Size standards are included in each lane of the gel. For any one individual, most bands have either fluorescence intensity X or fluorescence intensity 2X, corresponding to presence/absence heterozygotes and presence/presence homozygotes, respectively. Presence/absence polymorphism of a particular fragment could result from polymorphism at either restriction site, or from indel polymorphism inside the fragment. Some bands may have fluorescence much greater than 2X; these are either from organelle genomes or are multiple-copy nuclear fragments, and would be ignored for most purposes. Some bands may have fluorescence much less than X; these are probably from parasites or gut contents, and would be ignored for most purposes.
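As a rough illustration of the scoring logic just described, here is a minimal Python sketch that sorts peaks into the four categories by their intensity relative to X. The cutoffs (0.5, 1.5 and 3 times X), the function name, and the example intensities are all made up for illustration; they are not part of the protocol, and real data would need cutoffs chosen by looking at the distribution of peak heights.

```python
def classify_peak(intensity, single_copy, weak_below=0.5, multi_above=3.0):
    """Label a peak by its intensity relative to the single-copy level X.

    The cutoffs are illustrative assumptions, not values from the protocol.
    """
    ratio = intensity / single_copy
    if ratio < weak_below:
        return "weak"          # much less than X: parasite, gut contents, contamination
    if ratio < 1.5:
        return "heterozygote"  # about X: presence/absence
    if ratio <= multi_above:
        return "homozygote"    # about 2X: presence/presence
    return "multi-copy"        # much more than 2X: organellar or repeated


# Invented intensities (arbitrary units), with X assumed to be 100
example_peaks = {"peak 1": 95, "peak 2": 210, "peak 3": 900, "peak 4": 20}
for name, height in example_peaks.items():
    print(name, classify_peak(height, single_copy=100))
```

In practice X itself has to be estimated from the data, for example from the lower mode of the bimodal distribution of peak heights.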

Preliminary steps

1. Choose some enzymes. I've compiled a list of commercially available six-, seven- and eight-cutters that could be used in the ARRF technique. You need to find a pair of enzymes that yield a reasonable number of fragments (one to a few dozen) in the size range you can resolve on your DNA sequencer (50 to 500 or 1000 nucleotides). The expected number of fragments a particular pair of enzymes will yield depends on the size and GC content of the organism's genome. There are web pages with listings of genome sizes for plants, animals, and a variety of organisms. Information on GC content is routinely determined for prokaryotes and is available in Bergey's Manual and other standard references. For eukaryotes, information on GC content is harder to find. If DNA sequences are available for your species, the GC content of non-coding regions (introns and flanking regions) is generally similar to the genomewide GC content.

A crude estimate of the expected number of fragments, F(E), is given by

F(E)=2*G*(GC/2)^A(GC)*((1-GC)/2)^A(AT)*R*(GC/2)^B(GC)*((1-GC)/2)^B(AT)

where G is the genome size (in basepairs), GC is the G+C content of the genome (as a decimal, not a percentage), A(GC) is the number of G+Cs in the restriction site for enzyme A (and so on for A(AT), B(GC), and B(AT)), and R is the size range of fragments that will be resolved (in nucleotides). As an example, for the enzymes Xma I (C/CCGGG) and Sgf I (GCGAT/CGC) in the American oyster, with a genome size of 700,000,000 bp and a GC content somewhere around 30%, on a sequencer that can resolve bands in the 50 to 500 nt size range, the expected number of fragments is about 10:

F(E)=2*700,000,000*(0.15)^6*(0.35)^0*450*(0.15)^6*(0.35)^2

This is a crude approximation; it assumes that the genome is entirely single-copy (repetitive DNA will reduce the number of fragments), it ignores the non-uniformity of GC content (GC-rich regions will increase the number of fragments for most rare-cutters), and it ignores the fact that some dinucleotide and higher combinations are more or less common than expected. For example, mammals have fewer CG dinucleotides than expected, which makes most of the rare-cutting restriction enzymes cut less often than you would expect.
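If you want to try many enzyme pairs, the formula above is easy to put in a few lines of code. This is only a sketch of the same crude estimate, with function names of my own invention; it takes the recognition sequences without the cut mark and inherits all of the caveats just listed.

```python
def site_probability(site, gc):
    """Chance that a random position matches `site`, given G+C fraction `gc`."""
    p = 1.0
    for base in site.upper():
        # Each G or C has probability GC/2, each A or T has probability (1-GC)/2
        p *= gc / 2 if base in "GC" else (1 - gc) / 2
    return p


def expected_fragments(genome_bp, gc, site_a, site_b, size_range_nt):
    """Crude expected number of fragments with an A site at one end and a B site at the other."""
    return (2 * genome_bp * site_probability(site_a, gc)
            * size_range_nt * site_probability(site_b, gc))


# Oyster example from the text: Xma I (CCCGGG) and Sgf I (GCGATCGC),
# 700 Mbp genome, 30% GC, 50 to 500 nt resolved (R = 450)
oyster = expected_fragments(700_000_000, 0.30, "CCCGGG", "GCGATCGC", 450)
print(round(oyster, 1))
```

Because the GC content is usually a guess, it is worth running the estimate over a range of GC values to see how sensitive the answer is.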

For almost anything except a microorganism, you'll want to make at least one of your two enzymes an eight-cutter. According to Rebase, there are six commercially available eight-cutters with cohesive ends:

Asc I (GG/CGCGCC)
AsiS I (isoschizomer Sgf I) (GCGAT/CGC)
Not I (isoschizomer CciN I) (GC/GGCCGC)
Sbf I (isoschizomers Sse8387 I , Sda I) (CCTGCA/GG)
Fse I (GGCCGG/CC)
Pac I (TTAAT/TAA)

Pac I sites are common in some mitochondrial genomes I've looked at, which would cause problems by "soaking up" most of your adapter.

For a eukaryote with an AT-rich genome, you can use one eight-cutter combined with a six-cutter with all GCs in its restriction site. For organisms that are not strongly AT-rich, you'll probably need to use two eight-cutters. For microorganisms, with their small genomes, you may be able to use two six-cutters.

The two enzymes you've chosen must create different cohesive ends. You can't use Not I and Eag I together, for example, because they leave the same cohesive end. You also need to make sure there is a buffer that both enzymes have good activity in; New England Biolabs has a useful chart of activities in different buffers for the enzymes they sell. (You could, I suppose, digest with one enzyme, change the buffer, then digest with the second enzyme, but why make extra work when there are plenty of pairs of enzymes that will digest in a single buffer?)
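Checking cohesive ends by eye is error-prone, so here is a small sketch that works out the overhang from a site written with its cut mark, as in the lists on this page. It assumes the site is palindromic, so that the cut on the bottom strand mirrors the cut on the top strand. The function names are mine, and the Eag I site (C/GGCCG) is not given in the text; I've supplied it from the standard catalogs.

```python
def cohesive_end(site_with_cut):
    """Return (overhang, kind) for a palindromic site written like 'GC/GGCCGC'."""
    cut = site_with_cut.index("/")           # top-strand cut position
    site = site_with_cut.replace("/", "").upper()
    mirror = len(site) - cut                 # bottom-strand cut, by symmetry
    if cut < mirror:
        return site[cut:mirror], "5' overhang"
    if cut > mirror:
        return site[mirror:cut], "3' overhang"
    return "", "blunt"


def different_ends(site_a, site_b):
    """True if the two enzymes leave different cohesive ends."""
    return cohesive_end(site_a) != cohesive_end(site_b)


print(cohesive_end("GC/GGCCGC"))                 # Not I
print(cohesive_end("C/GGCCG"))                   # Eag I
print(different_ends("GC/GGCCGC", "C/GGCCG"))    # Not I with Eag I
print(different_ends("C/CCGGG", "GCGAT/CGC"))    # Xma I with Sgf I
```

Comparing the overhang together with its kind matters: Xma I and Fse I both expose CCGG, but one is a 5' overhang and the other a 3' overhang, so they count as different ends.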

Once you've found a pair of enzymes that should give the right number of bands, don't just try that one combination. You'll need some trial and error to find the combination of enzymes that works well.

2. Buy some adapters. If you will be using one enzyme that cuts less commonly than the other, make the biotin-labeled adapter match the cohesive end of the less common cutter. This will reduce the amount of magnetic beads you have to use. Biotin and fluorescently labeled adapters are rather expensive, so try to design them so that you can buy one labeled oligo and several cheap unlabeled oligos that will yield several different cohesive ends. For example, this oligo:

label-5'-TGCAAAATCCCAAACCGG-3'

when combined with 5'-CGCGCCGGTTTGGGATTTTGCA-3' matches the cohesive end of Asc I and BssH II, when combined with 5'-GGCCCCGGTTTGGGATTTTGCA-3' matches the cohesive end of Not I, Eag I, and PspOM I, when combined with 5'-CCGGCCGGTTTGGGATTTTGCA-3' matches the cohesive end of Xma I and NgoM IV, and when combined with 5'-TTTGGGATTTTGCA-3' matches the cohesive end of Fse I.
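Before ordering oligos, it's worth checking that each pair really anneals to give the cohesive end you intend. The sketch below does that for the oligos listed above; the function names are my own, and it assumes the simple geometry used here, where the unlabeled oligo pairs with the 3' end of the labeled oligo and either extends past it (5' overhang) or stops short of it (3' overhang).

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


LABELED = "TGCAAAATCCCAAACCGG"  # labeled oligo from the text, 5' to 3'


def adapter_overhang(labeled, unlabeled):
    """Return (overhang, kind) of the adapter made by annealing the two oligos."""
    partner = revcomp(labeled)  # full-length complement of the labeled oligo
    if unlabeled.endswith(partner):
        # Unlabeled oligo is longer: its extra 5' bases are the overhang
        return unlabeled[: len(unlabeled) - len(partner)], "5' overhang"
    if partner.endswith(unlabeled):
        # Unlabeled oligo is shorter: the labeled oligo's 3' end is unpaired
        unpaired = len(partner) - len(unlabeled)
        return labeled[len(labeled) - unpaired:], "3' overhang"
    raise ValueError("oligos do not anneal as expected")


unlabeled_oligos = {
    "Asc I / BssH II": "CGCGCCGGTTTGGGATTTTGCA",
    "Not I / Eag I / PspOM I": "GGCCCCGGTTTGGGATTTTGCA",
    "Xma I / NgoM IV": "CCGGCCGGTTTGGGATTTTGCA",
    "Fse I": "TTTGGGATTTTGCA",
}
for enzymes, oligo in unlabeled_oligos.items():
    print(enzymes, adapter_overhang(LABELED, oligo))
```

The overhangs used here are all self-complementary, so the adapter overhang reads the same as the overhang left by the enzyme; for a non-palindromic overhang you would compare against its reverse complement instead.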

The fluorescent dye you use on the second adapter will depend on the kind of DNA sequencer you have. Fluorescein works well on ABI sequencers. Buying an oligo with fluorescein on the 5' end is relatively inexpensive; labeling with a second fluorescein at an internal T is more expensive, but it does double the fluorescence. Adding more than two fluoresceins to the adapter may cause quenching, and you could end up with much less fluorescence. If you use a Licor sequencer, you'll have to label the adapter with whatever dye you use for sequencing primers. You could even use a labeled sequencing primer paired with the appropriate complementary oligo as an adapter for those enzymes that leave a 5' cohesive end. I don't know whether internal labeling is available or desirable for Licor dyes.

3. Prepare adapter stocks. When your oligos arrive, resuspend them in TE to make a concentration of 250 µM or stronger. Store this frozen at -20° C. If you have a lot of oligo, divide it into several aliquots and freeze them, as repeated freeze-thaw cycles might be harmful. To make an adapter, combine equimolar amounts of one labeled oligo and one complementary unlabeled oligo in a microcentrifuge tube. Heat to 80° C for two minutes, then remove from heat and let it cool slowly to room temperature. Then add STE to make the desired final concentration. This adapter may be stored in a refrigerator for weeks.

The "desired final concentration" of an adapter will depend on how many restriction sites the adapter ligates to in the genome of your organism. To calculate this, start with the sensitivity of your sequencer. Older ABI machines (373, 310, 377) can detect a band with 100 attomoles of fluorescein, so let's say you're going to use 500 amol of genomes to be on the safe side. The number of restriction sites for enzyme A in a genome is crudely estimated by the equation

N=G*(GC/2)^A(GC)*((1-GC)/2)^A(AT),

so for my oysters (assuming 700 Mbp genome and 30% GC), I'd expect about 1000 Sgf I sites per genome. Multiplying this by the number of genomes and then multiplying by 2 (each restriction site generates two cohesive ends), I expect to need 1 pmol of the Sgf I adapter per reaction. If my guess about the GC content is low, there could be a lot more Sgf I sites than I calculated, so I'll use 10 pmol per reaction and make my stock solution 2.5 pmol per µl (or 2.5 µM) so I can use 4 µl per reaction. Xma I sites are more common, so similar calculations suggest that I use 4 µl of 20 µM adapter solution. The usual practice in molecular biology would be to avoid all this math and use a thousandfold excess of everything, but in this case that would be rather expensive.
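The arithmetic in the last two paragraphs can be sketched in code as follows. The function names and the `excess` parameter (standing in for my tenfold cushion) are just for illustration, and the numbers it prints are unrounded, whereas I rounded to convenient values in the text.

```python
AMOL_PER_PMOL = 1_000_000


def sites_per_genome(genome_bp, gc, site):
    """Crude expected number of sites for one enzyme (N in the text)."""
    n = float(genome_bp)
    for base in site.upper():
        n *= gc / 2 if base in "GC" else (1 - gc) / 2
    return n


def adapter_pmol(genome_bp, gc, site, genomes_amol=500.0, excess=1.0):
    """Adapter needed per reaction: sites x genomes x 2 ends, times a safety factor."""
    ends_amol = sites_per_genome(genome_bp, gc, site) * genomes_amol * 2
    return excess * ends_amol / AMOL_PER_PMOL


def stock_micromolar(pmol_per_reaction, volume_ul=4.0):
    """Stock concentration so `volume_ul` holds one reaction's worth (pmol/ul = uM)."""
    return pmol_per_reaction / volume_ul


# Oyster example: 700 Mbp, 30% GC, 500 amol of genomes, tenfold cushion
for name, site in [("Sgf I", "GCGATCGC"), ("Xma I", "CCCGGG")]:
    pmol = adapter_pmol(700_000_000, 0.30, site, excess=10)
    print(name, round(sites_per_genome(700_000_000, 0.30, site)),
          round(pmol, 1), round(stock_micromolar(pmol), 1))
```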

4. Make or buy a magnet stand. You'll need a magnet stand to attract the paramagnetic beads to the side of microcentrifuge tubes. You can buy one (Promega sells a 12-tube stand for $108, for example) or you can make a 16-tube stand for about $40.

5. Prepare DNA. You're probably used to DNA preps for PCR purposes that look something like this:

  1. Pull a leg off the critter and swish it in some buffer.
  2. Use a little of the buffer as PCR template.

That won't work here. You need large quantities of DNA (hundreds of attomoles of genomes; to put it another way, the DNA from hundreds of millions of cells, or for a typical multicellular eukaryote with a 1 Gbp genome, hundreds of µg of DNA). The DNA must also be of good quality; it can't contain any contaminants that will interfere with the restriction digest. The particular protocol you use, and the tissue you use it on, will depend on your organism. I've been using a CTAB DNA prep that works well on mussels and oysters, and I suspect it will work well on many other animals.
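To convert between attomoles of genomes, numbers of genome copies, and micrograms of DNA for your own organism, a sketch like the following will do. It assumes the usual rule of thumb of about 650 g/mol per base pair of double-stranded DNA, which is my assumption here rather than a measured value, and the function names are made up for illustration.

```python
AVOGADRO = 6.022e23
GRAMS_PER_MOL_PER_BP = 650.0  # rule-of-thumb average for double-stranded DNA (assumption)


def genome_copies(genomes_amol):
    """Number of genome copies in the given number of attomoles."""
    return genomes_amol * 1e-18 * AVOGADRO


def dna_micrograms(genomes_amol, genome_bp):
    """Mass of DNA, in micrograms, for that many attomoles of a genome of this size."""
    grams = genomes_amol * 1e-18 * genome_bp * GRAMS_PER_MOL_PER_BP
    return grams * 1e6


# 500 amol of a 1 Gbp genome, the case discussed in the text
print(f"{genome_copies(500):.2e} genome copies")
print(f"{dna_micrograms(500, 1_000_000_000):.0f} ug of DNA")
```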

ARRF protocol

1. Digestion. In a microcentrifuge tube, mix

20 µl of DNA (should be approximately 500 amol of genomes; for an organism with a 1 Gbp genome, that's about 300 µg of DNA)
5 µl of 10x restriction enzyme buffer (one that both enzymes work in)
5 units of enzyme A
5 units of enzyme B
(optional) 3 µg of E. coli DNA
water to make 50 µl

Mix by pumping through a pipette tip several times, then incubate at 37° C for several hours. You're digesting a lot of DNA, so the usual one-hour digest may not give you complete digestion. You may want to try an overnight digestion, although eventually some dephosphorylation of the digested fragments may occur.

The optional E. coli DNA may yield bands that are useful as size markers or that will help you troubleshoot the procedure.

2. Ligation. Add to the tube:

4 µl of biotin-labeled adapter, at the appropriate concentration (see above)
4 µl of fluorescent adapter, at the appropriate concentration
10 µl of 10x ligation buffer
200 units of DNA ligase
water to make 100 µl

Mix by pumping through a pipette tip several times, then incubate at room temperature in the dark for one hour.

3. Bead attachment. You will need enough paramagnetic beads to bind the amount of biotin in the reaction. As an example, if you're using Promega's MagneSphere beads, 1 mg of beads (1 ml of the original bead suspension) will bind 1 nmol of biotin. The example reaction, using 10 pmol of biotin adapter, would require a minimum of 10 µl of bead suspension. If there aren't enough beads, the unligated biotin adapter seems to bind preferentially to the beads, so I'd use 20 µl of bead suspension for this example.
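The bead arithmetic is simple enough to do in your head, but if you are scripting the rest of the setup, it might look like this. The default capacity is the MagneSphere figure quoted above (1 nmol of biotin per ml of suspension, or 1 pmol per µl); the twofold margin is just the one I used in the example, and both are parameters you should change for your own beads.

```python
def bead_suspension_ul(biotin_pmol, capacity_pmol_per_ul=1.0, margin=2.0):
    """Volume of bead suspension to use, in ul.

    capacity_pmol_per_ul: biotin bound per ul of suspension (1 nmol/ml = 1 pmol/ul).
    margin: safety factor over the bare minimum (an assumption; adjust to taste).
    """
    minimum_ul = biotin_pmol / capacity_pmol_per_ul
    return margin * minimum_ul


# Example reaction from the text: 10 pmol of biotin-labeled adapter
print(bead_suspension_ul(10))
```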

First, wash the beads. Put the appropriate volume of bead suspension in a microcentrifuge tube. Put the tube in the magnet stand. The large beads will make a visible pellet in a few seconds, but wait several minutes before removing the liquid, to give the tiny invisible beads a chance to migrate to the pellet. Remove and discard the liquid, then remove the tube from the magnet stand, add 400 µl of BW buffer, mix by flicking, put back in the magnet stand, then after a few minutes remove and discard the liquid. Repeat this wash procedure. Then resuspend the beads in BW buffer, and to each tube add:

100 µl of water
the appropriate amount of beads, resuspended in BW buffer
additional BW buffer, to make the total volume 400 µl

Mix by inverting several times. Incubate at 43° C in the dark for several hours, mixing by inverting once or twice every hour. Note that most bead attachment protocols require only 15 minutes at room temperature; the long time at elevated temperature is required here because the beads are in a viscous solution of large DNA fragments. Also, while resuspending the beads in the solution once or twice an hour seems to help, putting the tube in a tube inverter so that it is constantly mixed seems to shear the DNA.

4. Bead wash. Put the tubes in the magnet stand for 10 minutes. Remove and discard the liquid. Add 200 µl of BW buffer, mix by flicking, put in the magnet stand for 5 minutes, and remove and discard the liquid. Repeat this wash step.

5. Denaturation. After the second wash, add 50 µl of 0.1 N NaOH solution. Mix by flicking, put in the magnet stand for 5 minutes, then save the liquid in a new tube.

6. Precipitation. Add 5 µg of glycogen, 7.5 µl of 2 M NaOAc, and 150 µl of cold 100% ethanol. Precipitate at -20° C for at least an hour. Spin in a microcentrifuge for 15 minutes, then remove and discard the liquid. Wash with 1 ml of cold 70% ethanol, then air dry or dry in a speedvac. Note that because you're precipitating a small amount of DNA, the precipitation step is longer and colder than usually necessary, and a carrier such as glycogen is helpful.

7. Run on a sequencer. Resuspend in the appropriate loading solution for your sequencer, add some size marker (if you're not using E. coli ARRFs as a size marker), and run on your sequencer.

8. Analyze data. You should get data that look like this:

[Figure: ARRF chromatogram]

This is part of the chromatogram of the ARRF procedure run on an oyster with the enzymes Not I and BssH II. The two large peaks are from E. coli DNA included in the reaction, which for this pair of enzymes yields good size markers. The position of these peaks was used to estimate the size of the oyster fragments. The fragments at 177.3 and 196.9 nt are homozygotes, while the remaining peaks are heterozygotes.

9. Troubleshooting. Run some reactions with just E. coli DNA, and some with both your organism and E. coli DNA. If the E. coli peaks are equally strong in both, it means the problem is with your DNA. Either you don't have enough DNA, or it isn't cutting due to methylation, or it isn't producing ARRFs because there just aren't any cut sites for your two enzymes that are close enough together. If the E. coli peaks are strong when run by themselves, but weak when run with your organism, there are several possibilities. You might not have enough restriction enzyme to digest the DNA; you might not have enough adapters for the number of restriction sites; or something in your DNA prep might be interfering with the digestion or ligation. Try doing the digests separately for your organism and E. coli, then combining them for the ligation; if the E. coli peaks are now strong, the problem was with the digestion, not the amount of adapter. You can also do the digestion and ligation separately for your organism and E. coli, then combine them for the bead attachment; if that results in lower E. coli peaks, it suggests that you don't have enough beads, or that you aren't letting the bead attachment step go on long enough.

If you get fairly strong peaks, but they have many different heights rather than falling into a good bimodal distribution, it could be that you're getting partial digestion, and some sites are cutting more efficiently than others. Try using more restriction enzyme and letting the digest go longer. Uneven peak heights could also result from methylation of some of the restriction sites; if that's the case, there's not a lot you can do.

10. Variations. There are a number of possible variations on this technique. For microorganisms, which have small genomes, you can simplify the procedure by cutting with a single rare-cutter, ligating on a fluorescent adapter, and running on a gel. In my hands, the amount of background fluorescence, presumably from physically sheared fragments that contain one restriction site, is unacceptable. But with a little more care in the DNA prep, this might work well.

If all of the combinations of rare-cutters yield too many fragments, I can think of a couple of options. One would be to use three enzymes: the two ARRF enzymes, plus a common-cutting enzyme that leaves blunt ends. This way, most of the ARRFs would be cut in the middle and thus wouldn't appear, and only the few fragments without the common cutter site would be visible.

Another possibility would be to use an enzyme that leaves an "NNN" cohesive end, such as Dra III (CACNNN/GTG). By using an adapter that matches just one of the 64 possible NNN overhangs, you would in effect be using a nine-cutter enzyme (and you'd have 64 of these "nine-cutters" to choose from, by using 64 different adapters).

Recipes:

STE
10 mM Tris-HCl, pH 8.0
50 mM NaCl
1 mM EDTA

TE
10 mM Tris-HCl, pH 8.0
1 mM EDTA

CTAB
2.4 g tris
16.4 g NaCl
1.5 g EDTA
4 g CTAB
4 g PVP40
H2O to 200 ml

BW (bind and wash) buffer
10 mM Tris-HCl, pH 7.5
1 mM EDTA
2 M NaCl

Acknowledgments: I thank V.P. Edgcomb, A.M. Grasso, E. Mayo, and T.M. Onami for assistance in the lab, M.K. Duncan, K.W. Dunn, B.C. Verrelli and B.J. Wolpert for helpful discussions, and S.P. Kahneuf for suggesting the acronym. Funding was provided by NSF, the University of Delaware Research Foundation, and the Department of Biological Sciences.



Send comments to John H. McDonald (mcdonald@udel.edu).

This page was last revised July 23, 2002. Its URL is http://udel.edu/~mcdonald/arrf.html