One of the most important factors in successful automated DNA sequencing is proper primer design. This document describes the steps involved in this process and the major pitfalls to avoid.
**** Use a Computer to Design Primers ****
We highly recommend that a computer be used during primer design in order to check for certain fatal design flaws. Numerous programs are capable of performing this analysis. For example, look for ‘Primer3’ on the web.
Some Basic Concepts: If you are confused by the strands and primer orientation, read this.
Sequencing primers must be able to anneal to the target DNA in a predictable location and on a predictable strand. They furthermore must be capable of extension by Taq DNA Polymerase.
Some people are confused about how to examine a DNA sequence to choose an appropriate primer sequence. Here are a few things for novices to remember:
- Sequences are always written from 5′ to 3′. This includes the sequence of your template DNA (if known), the sequence of the vector DNA into which it is inserted, and the sequence of proposed primers. Don’t ever write a primer sequence reversed or you will only confuse yourself and others.
- Polymerase always extends the 3′ end of the primer, and the sequence you will read will be the same strand (sense or anti-sense) as the primer itself.
- Thus, if you choose a primer sequence that you can read in your source sequence (for example, in the vector), the sequence you will obtain will extend from the primer’s right (3′) end.
- Conversely, if you choose a primer from the strand opposite to what your ‘source’ sequence reads, the resulting sequence will read towards the left.
Here are a couple of examples:
Suppose you have a vector with the following sequence around the Multiple Cloning Site (the ‘MCS’):
If you cloned your DNA of interest between the BamHI and EcoRI sites, you could sequence using the primer ‘CTTGATGCTAGTACTACATC’ (remember – that’s written 5′ to 3′) and you’ll obtain the following sequence from the Core:
What if you wanted sequence from the other strand – Eco to Bam – instead? In that case, you need to select some sequence on the right and then reverse-complement it before requesting the oligo. Picking out some sequence from the figure above:
This is NOT the primer sequence – it is copied verbatim from the above sequence. In fact, if you used this sequence for a primer, sequencing would proceed towards the right, away from your insert. Instead, reverse-complement that sequence:
NOW this should produce the sequence of the opposite strand:
Some fine print: Only rarely does sequencing actually show the nucleotides immediately downstream from the primer. I’ve taken some didactic license in the examples above.
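The reverse-complement step is easy to get wrong by hand, so here is a minimal Python sketch of it. The helper name `reverse_complement` is our own invention for illustration, and the input is simply the forward primer quoted above, reused as sample data. The sketch assumes plain uppercase or lowercase A, C, G and T with no ambiguity codes.

```python
# Watson-Crick partners for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    # Walk the sequence from its 3' end to its 5' end and swap each base
    # for its partner, so the result is again written 5' to 3'.
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# Sample input only: the forward primer quoted in the text.
forward_primer = "CTTGATGCTAGTACTACATC"
print(reverse_complement(forward_primer))  # GATGTAGTACTAGCATCAAG
```

Applying the function twice returns the original sequence, which is a quick sanity check before ordering an oligo.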
More Advanced Concepts: How to Design a Primer that Works.
Generally, you are starting with some small amount of known sequence that you wish to extend. Here’s how to proceed:
- I. Design primers only from accurate sequence data.
- Automated sequencing (and in fact any sequencing) has a finite probability of producing errors. The sequence obtained too far away from the primer must be considered questionable. To determine what is ‘too far’, we strongly suggest that our clients read the memo Interpretation of Sequencing Chromatograms, which describes how to assess the validity of data obtained from the ABI sequencers. Select a region for primer placement where the possibility of sequence error is low.
- II. Restrict your search to regions that best reflect your goals.
You may be interested in maximizing the sequence data obtained, or you may only need to examine the sequence at a very specific location in the template. Such needs dictate very different primer placements.
- Maximize sequence obtained while minimizing the potential for errors:
Generally, you should design the primer as far toward the 3′ end of the known sequence as you can manage, so long as you have confidence in the accuracy of the sequence from which the primer is drawn. Primers on opposite strands should be staggered as much as possible.
- Targeted sequencing of a specific region:
Position the primer so the desired sequence falls in the most accurate region of the chromatogram. Sequence data is often most accurate about 80-150 nucleotides away from the primer. Do not count on seeing good sequence less than 50 nucleotides away from the primer or more than 300 nt away (although we often get sequence starting immediately after the primer, and we often return 700 nt of accurate sequence).
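The placement arithmetic above can be sketched in a few lines of Python. The function name `primer_end_window` and the coordinate convention are assumptions made for illustration: positions are counted along your source sequence as written, and the 80-150 nt distance is measured from the primer's 3′ end.

```python
def primer_end_window(target, strand="+", near=80, far=150):
    """Return (lowest, highest) coordinates for the primer's 3' end.

    target : position of the base you most want to read accurately
    strand : "+" for a primer that reads rightwards along the source
             sequence, "-" for a reverse-complemented primer that reads
             leftwards
    near, far : the most accurate read distances quoted in the text
    """
    if strand == "+":
        # A rightward-reading primer must sit to the left of the target.
        return target - far, target - near
    if strand == "-":
        # A leftward-reading primer must sit to the right of the target.
        return target + near, target + far
    raise ValueError("strand must be '+' or '-'")


print(primer_end_window(1000))       # (850, 920)
print(primer_end_window(1000, "-"))  # (1080, 1150)
```

The sketch does not check that the returned window lies within known, accurate sequence; that still has to be confirmed against your chromatogram.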
- III. Locate candidate primers:
Identify potential sequencing primers that produce stable base pairing with the template DNA under conditions appropriate for cycle sequencing. It is strongly suggested that you use a computer at this step. Suggested primer characteristics:
- Length should be between 18 and 30 nt, with 20-25 nt being optimal (although we have had some successes with primers longer than 30 nt and shorter than 18 nt).
- G-C content of 40-60% is desirable.
- The Tm should be between 55 °C and 75 °C. Warning: the old “4 degrees for each G-C, 2 degrees for each A-T” rule works poorly, especially for oligos shorter than 20 or longer than 25 nt. Instead, try:
Tm = 81.5 + 16.6 * log10[Na+] + 0.41 * (%GC) - 675/length - 0.65 * (%formamide) - (%mismatch)
Here [Na+] is the molar sodium ion concentration, the logarithm is base 10, and length is the primer length in nucleotides.
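For those who would rather not do this by hand, here is a rough Python sketch of the length, G-C content and Tm guidelines above. The function names are hypothetical, the logarithm is taken as base 10 with the sodium concentration in mol/L, and you must supply the salt concentration yourself because the text does not specify one.

```python
import math


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def melting_temp(primer, na_molar, formamide_pct=0.0, mismatch_pct=0.0):
    """Tm in degrees C, using the formula given in the text."""
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent(primer)
            - 675.0 / len(primer)
            - 0.65 * formamide_pct
            - mismatch_pct)


def meets_guidelines(primer, na_molar):
    """True if length, G-C content and Tm all fall in the suggested ranges."""
    return (18 <= len(primer) <= 30
            and 40.0 <= gc_percent(primer) <= 60.0
            and 55.0 <= melting_temp(primer, na_molar) <= 75.0)


# Sample input only; 0.05 M sodium is an assumed value, not a recommendation.
primer = "CTTGATGCTAGTACTACATC"
print(gc_percent(primer), round(melting_temp(primer, 0.05), 1))
```

The computed Tm depends strongly on the salt concentration you enter, so compare candidate primers at one fixed value rather than reading too much into the absolute number.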
- IV. Discard candidate primers that show undesirable self-hybridization.
Primers that can self-hybridize will be unavailable for hybridization to the template. Generally, avoid primers that can form 4 or more consecutive bonds with themselves, or 8 or more bonds total. Example of a marginally problematic primer:
This oligo forms a substantially stable dimer with itself, with four consecutive bonds at two places and a total of eight inter-strand bonds.
Primers with 3′ ends hybridizing even transiently will become extended due to polymerase action, thus ruining the primer and generating false bands. Be somewhat more stringent in avoiding 3′ dimers. For example, the following primer self-dimerizes with a perfect 3′ hybridization on itself:
The above oligo is pretty bad, and almost guaranteed to cause problems. Note that the polymerase will extend the 3′ end during the sequencing reaction, giving very strong sequence ACTATGC. These bands will appear at the start of your ‘real’ data as immense peaks, occluding the correct sequence. Most primer design programs will correctly spot such self-dimerizing primers and will warn you to avoid them.
Note however that no computer program or rule-of-thumb assessment can accurately predict either success or failure of a primer. A primer that seems marginal may perform well, while another that appears to be flawless may not work at all. Avoid obvious problems, design the best primers you can, but in a pinch if you have few options, just try a few candidate primers, regardless of potential flaws.
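To make the bond-counting rule concrete, here is a simplified Python sketch of the kind of check a primer design program performs. It slides one copy of the primer along a second, antiparallel copy and counts Watson-Crick pairs. It is an illustration only: it ignores thermodynamics and hairpins within a single molecule, the function names are hypothetical, and the 4-base window used for the 3′ end check is our own assumption.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def self_dimer_stats(primer):
    """Return (most bonds in any alignment, longest consecutive run).

    The two maxima may come from different alignments of the two copies.
    """
    top = primer.upper()
    bottom = top[::-1]  # second copy, written 3'->5' so it lies antiparallel
    n = len(top)
    best_total = best_run = 0
    for offset in range(-(n - 1), n):  # every possible overlap
        total = run = longest = 0
        for i in range(n):
            j = i - offset
            if 0 <= j < n and COMPLEMENT.get(top[i]) == bottom[j]:
                total += 1
                run += 1
                longest = max(longest, run)
            else:
                run = 0
        best_total = max(best_total, total)
        best_run = max(best_run, longest)
    return best_total, best_run


def three_prime_can_anneal(primer, k=4):
    """True if the last k bases can pair with a stretch of another copy."""
    primer = primer.upper()
    tail = primer[-k:]
    partner = "".join(COMPLEMENT[b] for b in reversed(tail))
    return partner in primer


def looks_self_complementary(primer):
    """Apply the rule of thumb from the text: 4+ consecutive or 8+ total."""
    total, run = self_dimer_stats(primer)
    return run >= 4 or total >= 8


print(self_dimer_stats("GAATTC"))  # (6, 6): a perfect palindrome
```

A primer flagged here is not guaranteed to fail, and one that passes is not guaranteed to work; treat the result as a prompt to look more closely.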
- V. Verify the site-specificity of the primer.
- Perform a sequence homology search (e.g. dot-plot homology comparison) through all known template sequence to check for alternative priming sites. Discard any primers that display ‘significant’ tendency to bind to such sites. We can provide only rough guidelines as to what is ‘significant’. Avoid primers where alternative sites are present with (1) more than 90% homology to the primary site or (2) more than 7 consecutive homologous nucleotides at the 3′ end or (3) abundance greater than 5-fold higher than the intended priming site.
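As a rough illustration of criteria (1) and (2), the Python sketch below slides the primer along both strands of the known template and reports windows that exceed either threshold. It is a gapless comparison, far cruder than a real homology search, and it cannot assess criterion (3), since abundance is not visible in the sequence alone. The function names and the tuple layout of the results are our own assumptions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence, written 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def candidate_sites(primer, template, min_identity=0.90, min_3prime_run=7):
    """List (strand, position, identity, 3' run) for possible priming sites.

    "+" positions index the template as given; "-" positions index its
    reverse complement. The intended site itself is also reported, so
    the caller should set it aside.
    """
    primer = primer.upper()
    n = len(primer)
    hits = []
    strands = (("+", template.upper()), ("-", reverse_complement(template)))
    for strand, seq in strands:
        for pos in range(len(seq) - n + 1):
            window = seq[pos:pos + n]
            # Fraction of positions where primer and window are identical.
            identity = sum(a == b for a, b in zip(primer, window)) / n
            # Count identical bases working back from the 3' end.
            run = 0
            for a, b in zip(reversed(primer), reversed(window)):
                if a != b:
                    break
                run += 1
            if identity > min_identity or run > min_3prime_run:
                hits.append((strand, pos, identity, run))
    return hits


# Made-up sequences for demonstration only.
primer = "GATTACAGATTACAGGCCTA"
template = "TTTTT" + primer + "TTTTT"
for hit in candidate_sites(primer, template):
    print(hit)
```

Any hit other than the intended site deserves a second look before you order the oligo.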
- VI. Choosing among candidate primers.
If at this point you have several candidate primers, you might select one or a few that are more A-T rich at the 3′ end. These tend to be slightly more specific in action, according to some investigators. You may want to use more than one primer, maximizing the likelihood of success.
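If you have a long list of survivors, this preference is easy to apply mechanically. The Python sketch below ranks candidates by how many A or T bases sit at the 3′ end. The 5-base window and the function names are assumptions of ours, since the text gives no exact definition of "A-T rich at the 3′ end".

```python
def at_count_3prime(primer, window=5):
    """Number of A or T bases among the last `window` bases."""
    tail = primer.upper()[-window:]
    return tail.count("A") + tail.count("T")


def rank_by_3prime_at(primers, window=5):
    """Return the candidates sorted with the most A-T rich 3' ends first."""
    return sorted(primers,
                  key=lambda p: at_count_3prime(p, window),
                  reverse=True)


# Made-up candidates for demonstration only.
candidates = ["AAAAAGGCGC", "GGGGGAATTA"]
print(rank_by_3prime_at(candidates))  # ['GGGGGAATTA', 'AAAAAGGCGC']
```

Treat the ranking as a tie-breaker among otherwise acceptable primers, not as a reason to discard a candidate.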
If you have no candidates that survived the criteria above, then you may be forced to relax the stringency of the selection requirements. Ultimately, the test of a good primer is only in its use, and cannot be accurately predicted by these simplistic rules-of-thumb.
With luck, though, you have plenty of options for primers. For a sequence assembly project, design more primers than you think you really need so that if the sequence isn’t as long as you hoped, you might still obtain sufficient overlapping data to assure you of a good sequence consensus. We recommend that you sequence both strands, for better confirmation. On one strand, space the primers 500 to 700 nt apart (shorter spacing is safer!). On the opposite strand, place the primers in staggered fashion away from the first strand primers, as depicted below: