Arabidopsis Genomic Sequencing: Sequences of cosmids g8261 and g17311 from Chromosome 1.

Howard M. Goodman (1,2)
Paul Gallant (1)
Steve Kieffer-Higgins (2)
Marc Rubenfield (2)
George M. Church (2)
(1) Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114.
(2) Department of Genetics, Harvard Medical School, Boston, MA 02115.

As part of an effort to clone the ga2 locus and to initiate genomic sequencing of Arabidopsis thaliana, we have sequenced the genomic cosmid clone g8261 and a portion of g17311. These overlapping cosmids are part of a series of approximately 17,000 cosmids which we have fingerprinted (1, 2) and some of which have been mapped as RFLPs (3) onto the five chromosomes of Arabidopsis. g8261 and g17311 map to the bottom of chromosome 1 at a position of 159 cM (4).

We employed the Multiplex DNA sequencing technique (5). Multiplexing allows many sequencing templates to be processed simultaneously: templates are pooled at the earliest stages of the procedure and resolved into individual sequences at the latest possible stage, which gives a high throughput of templates with fewer repetitive steps. Multiplex analysis can be described in four stages. First, the DNA to be sequenced is randomly cloned into a series of vectors (plex vectors) that differ only in the sequences immediately flanking the insert. These vector sequence differences, or tags, later allow the sequences obtained to be read individually. In the second stage, single clones from each of these libraries are pooled and grown en masse, and DNA is prepared for sequence analysis. At this stage, all DNA pools are transferred into a 96-well microtiter plate and processed together as a set of 96 pools. A single plate therefore holds 96 x N templates (N is the number of vectors used, which was 7 in our analysis of cosmids g8261 and g17311) that can be read individually in the final stage of multiplexing. In the microtiter format, the DNAs are sequenced by the chemical degradation method (6). Each microtiter plate produces material for 8 DNA sequencing gels. After the sequencing products are separated on the gels, the DNA ladders are transferred to nylon membranes. Hybridization of a membrane to a radiolabeled probe specific for an individual vector allows a single DNA sequence to be read from each of the pools. The probe is easily removed from the membrane, which can then be re-hybridized to another probe so that another set of DNA sequence ladders can be read.
Because each vector carries two tag sequences (allowing DNA sequence to be read from each end of the insert), each membrane can produce 12 x 2 x N DNA sequence reads (168 reads per membrane in this case, where 7 vectors were used; the number is larger if more vectors are used). Thus, in the final stage of Multiplex sequence analysis, 96 x 2 x N (96 x 2 x 7 = 1344 in this case) DNA sequence reads can be obtained from a single 96-well plate. The Replica gel reading and sequence editing program (7) and the FALCON sequence assembly program (8) were used to produce the consensus sequence from the overlapping gel reads.
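
The read counts quoted above follow from simple arithmetic, which can be sketched as below. This is an illustrative calculation only: the function name and parameter names are hypothetical, and the figure of 12 pools per gel is inferred from the text (96 pools per plate spread over 8 gels).

```python
def multiplex_reads(n_vectors, pools_per_plate=96, gels_per_plate=8, tags_per_vector=2):
    """Return (reads per membrane, reads per plate) for a multiplex run.

    Assumes each gel (and hence each membrane) carries an equal share of
    the plate's pools, and that every tag yields one readable ladder.
    """
    pools_per_gel = pools_per_plate // gels_per_plate  # 96 / 8 = 12
    reads_per_membrane = pools_per_gel * tags_per_vector * n_vectors
    reads_per_plate = pools_per_plate * tags_per_vector * n_vectors
    return reads_per_membrane, reads_per_plate


# Values for the analysis described here: 7 plex vectors.
per_membrane, per_plate = multiplex_reads(7)
print(per_membrane, per_plate)  # 168 1344
```

Increasing the number of vectors raises both figures linearly, since each additional vector contributes two more probes per membrane.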

Cosmid g8261 was sequenced to a sampling redundancy of more than 7-fold, but this did not produce complete coverage. The remaining small gaps, and regions where the sequence was not covered in both directions, were completed by directed dideoxy sequencing using oligonucleotide primers derived from the contig ends. The final sequence length of the cosmid is 37570 bp.
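
As a rough illustration of why random sampling at about 7-fold redundancy can still leave small gaps, the sketch below uses an idealized Poisson model of shotgun coverage. The model is an assumption introduced here for illustration, not a calculation reported for this project; it ignores cloning bias, read length effects and the stricter requirement of coverage in both directions. The function name is hypothetical.

```python
import math


def expected_uncovered_bases(length_bp, redundancy):
    """Expected number of bases sampled zero times.

    Idealized assumption: reads start uniformly at random, so the
    probability that a given base is never sampled is exp(-redundancy).
    """
    return length_bp * math.exp(-redundancy)


# 37570 bp is the final length of g8261; 7.0 is used as a lower bound
# on the redundancy stated in the text.
uncovered = expected_uncovered_bases(37570, 7.0)
print(f"{uncovered:.0f} bp expected to be unsampled")
```

Under these assumptions a few tens of bases would be expected to remain unsampled, which is consistent in scale with small gaps that directed sequencing can close.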

Assembly of the sequence of cosmid g17311 revealed that the cosmid had suffered a large deletion of the Arabidopsis insert DNA. The remaining insert was 11862 bp, sequenced with approximately 22-fold coverage.

The sequence of g17311 from base 1 to base 7505 is identical to bases 29557 to 37061 of g8261, with the exception of 3 differences at g8261 positions 29749 (G to C), 30309 (A to G) and 34866 (G to C). The unique region of g17311 from 7506 to 11862 would appear to be from the other end of the clone. The GenBank accession number for g8261 is U53501 and for g17311 is U53502.
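
The coordinate relationship between the two cosmids can be sketched as a fixed offset. The snippet below is illustrative only (the constant and function names are hypothetical); it assumes the three difference positions are given in g8261 coordinates, since they exceed the length of g17311.

```python
# Overlap stated in the text: g17311 bases 1..7505 match g8261 bases 29557..37061.
G8261_OVERLAP_START = 29557
G17311_OVERLAP_END = 7505
OFFSET = G8261_OVERLAP_START - 1  # g8261 position = g17311 position + OFFSET


def g17311_to_g8261(pos):
    """Map a g17311 position inside the overlap to the g8261 coordinate."""
    if not 1 <= pos <= G17311_OVERLAP_END:
        raise ValueError("position lies outside the overlapping region")
    return pos + OFFSET


def g8261_to_g17311(pos):
    """Map a g8261 position inside the overlap to the g17311 coordinate."""
    mapped = pos - OFFSET
    if not 1 <= mapped <= G17311_OVERLAP_END:
        raise ValueError("position lies outside the overlapping region")
    return mapped


# The overlap endpoints are consistent with the stated coordinates.
print(g17311_to_g8261(1), g17311_to_g8261(7505))  # 29557 37061

# The three differences, converted to g17311 coordinates.
print([g8261_to_g17311(p) for p in (29749, 30309, 34866)])  # [193, 753, 5310]
```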

Sequence similarities:
The two cosmid sequences were examined for similarities to sequences in GenBank using BLASTX and BLASTN (9). Candidate proteins identified by the program Genefinder (10) were examined for sequence similarities using BLASTP. The non-overlapping region of g17311 had no regions of high similarity and will not be considered further. However, g8261 contained several regions with high similarity to previously characterized sequences. Sequences scoring matches with P values less than 1.0e-20 included:

Arabidopsis thaliana zinc finger protein
H(+)-ATPase
DNA (cytosine-5)-methyltransferase
Nucleoporin protein
S-receptor kinase

The above sequence similarities and their positions, along with additional DNA and protein matches, are available on the following page.


References and Notes

1. Hauge, B.M. and H.M. Goodman. 1992. In Methods in Arabidopsis Research (ed. C. Koncz, N.-H. Chua and J. Schell), pp. 191-223. World Scientific Publishing Co. Pte. Ltd., Singapore.

2. Hauge, B.M., S.M. Hanley, S. Cartinhour, J.M. Cherry, H.M. Goodman, M. Koornneef, P. Stam, C. Chang, S. Kempin and L. Medrano. 1993. An integrated genetic/RFLP map of the Arabidopsis thaliana genome. Plant J. 3:745-754.

3. Nam, H.G., J. Giraudat, B. Den Boer, F. Moonan, W.D.B. Loos, B.M. Hauge and H.M. Goodman. 1989. Restriction fragment length polymorphism linkage map of Arabidopsis thaliana. Plant Cell 1:699-705.

4. Lister, C. and C. Dean. 1993. Recombinant inbred lines for mapping RFLP and phenotypic markers in Arabidopsis thaliana. Plant J. 4:745-750.

5. Church, G.M. and S. Kieffer-Higgins. 1988. Multiplex DNA sequencing. Science 240:185-188.

6. Maxam, A.M. and W. Gilbert. 1980. Sequencing end-labeled DNA with base-specific cleavages. Methods in Enzymology 65:497-560.

7. Church, G.M., G. Gryan, N. Lakey, S. Kieffer-Higgins, L. Mintz, M. Temple, M. Rubenfield, L. Jaehn, H. Ghazizadeh, K. Robison and P. Richterich. 1994. In Automated DNA sequencing and analysis techniques (ed. M.D. Adams, C. Fields and J.C. Venter), pp. 11-16. Academic Press, San Diego.

8. G. Gryan and G.M. Church, personal communication. FALCON is available at URL <ftp://rascal.med.harvard.edu/gryan/falcon/>.

9. Altschul, S.F., W. Gish, W. Miller, E.W. Myers and D.J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.

10. C. Wilson and P. Green, personal communication. Contact Colin Wilson. Genefinder tables specific for Arabidopsis were from AAtDB.