EST Project

To help us locate genes that may be important for oil metabolism in maize, we are examining gene expression patterns in high-oil and low-oil lines. One way to quickly identify gene differences at the DNA level is to look for single nucleotide polymorphisms (SNPs) in gene products. We have chosen to start our search with genes defined by ESTs produced at various stages of development.

Because EST libraries are highly redundant, and to maximize the number of unique genes scored, we are building a set of non-redundant EST contigs, using the EST sequences from the Maize Gene Discovery project as our starting point. The EST sequences are downloaded from ZmDB (A Maize Genome Database), loaded into the Macintosh version of the Sequencher program (ver. 3.1, Gene Codes Corp., Ann Arbor, MI), and assembled using the "dirty data" parameter. This analysis parameter uses a rigorous set of algorithms in which ambiguous base calls are considered poor matches to exact base calls. It requires a minimum of 85% base match and a minimum overlap of 20 bases. Once assembled, contigs are manually checked and minimally edited. Consensus sequences are then searched against the "nr" and "est" databases using either blastn or blastx at the National Center for Biotechnology Information (NCBI) in Bethesda, Maryland, USA.
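The exact Sequencher algorithm is not described here. Purely to illustrate the two stated thresholds (at least 85% base match over at least 20 overlapping bases, with ambiguous calls scored as poor matches), the Python sketch below tests an ungapped overlap between the end of one read and the start of another. The function names, the ungapped simplification, and the rule that an ambiguous call (such as N) always counts as a mismatch are assumptions for illustration only.

```python
# Hypothetical sketch of the stated overlap criteria; NOT Sequencher's algorithm.
# Simplification: only ungapped end-of-a / start-of-b overlaps are considered.
EXACT_BASES = frozenset("ACGT")


def overlap_identity(x: str, y: str) -> float:
    """Fraction of aligned positions where both calls are exact (A/C/G/T) and equal.

    Ambiguous calls (N or other IUPAC codes) always count as mismatches,
    mirroring the idea that they are poor matches to exact base calls.
    """
    matches = sum(1 for p, q in zip(x, y) if p in EXACT_BASES and p == q)
    return matches / len(x)


def best_overlap(a: str, b: str, min_overlap: int = 20, min_identity: float = 0.85) -> int:
    """Length of the longest ungapped overlap (end of `a` onto start of `b`)
    meeting both thresholds, or 0 if none does."""
    a, b = a.upper(), b.upper()
    # Try the longest possible overlap first, down to min_overlap inclusive.
    for length in range(min(len(a), len(b)), max(min_overlap, 1) - 1, -1):
        if overlap_identity(a[-length:], b[:length]) >= min_identity:
            return length
    return 0


core = "ATGCGTACCGTTAGCATGGACT"  # 22 made-up bases shared by both reads
print(best_overlap("TTTTTTT" + core, core + "GGGG"))               # 22
print(best_overlap("TTTTTTT" + core, "NNN" + core[3:] + "GGGG"))   # 22 (19/22, ~86%)
print(best_overlap("TTTTTTT" + core, "NNNN" + core[4:] + "GGGG"))  # 0 (18/22, ~82%)
```

Three ambiguous calls in a 22-base overlap still pass the 85% cutoff, while a fourth drops it below, which shows how quickly ambiguous calls erode an otherwise exact overlap under this scoring.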

Currently we have:

10,510 sequences analyzed (through ZmDB release 052699)***

Number of singleton contigs ("unique" clones) = 3841
Number of multiclone contigs (2 or more members) = 1747, which consist of 6669 sequences
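These counts are internally consistent: the 3841 singletons plus the 6669 sequences in multiclone contigs account for all 10,510 analyzed sequences. The short Python check below is our own illustration, not part of the assembly pipeline; it makes that arithmetic explicit and derives the size of the non-redundant set, assuming each contig (singleton or multiclone) counts as one non-redundant unit.

```python
# Consistency check of the counts reported for ZmDB release 052699.
def summarize(singletons: int, multi_contigs: int, multi_members: int) -> dict:
    """Derive totals from contig counts; each singleton contig holds one sequence."""
    total = singletons + multi_members          # every analyzed sequence is in exactly one contig
    nonredundant = singletons + multi_contigs   # assumption: one unit per contig
    return {
        "total_sequences": total,
        "nonredundant_contigs": nonredundant,
        "mean_multiclone_size": multi_members / multi_contigs,
        "fraction_collapsed": 1 - nonredundant / total,
    }


s = summarize(singletons=3841, multi_contigs=1747, multi_members=6669)
print(s["total_sequences"])                # 10510, matching the 10,510 analyzed
print(s["nonredundant_contigs"])           # 5588
print(f"{s['mean_multiclone_size']:.2f}")  # 3.82
print(f"{s['fraction_collapsed']:.1%}")    # 46.8%
```

Under that assumption, assembly reduces 10,510 reads to 5588 contigs, and a multiclone contig holds about 3.8 sequences on average.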

Some sequences released by ZmDB have been removed from this analysis. One set, 148 ESTs, consists of sequences that were originally released and subsequently deleted from ZmDB. A second set, 75 ESTs, consists of sequences we have labeled as problematic ("BAD"). These sequences consist mainly of long polynucleotide stretches (runs of a single base) and tend to assemble into contigs erroneously. They appear to result from polymerase stutter or dye-terminator "blobs", though we would need to see the actual chromatograms to be certain. The removal of this set is based entirely on our own usage and assessment and should in no way be taken as a statement that these sequences are invalid.
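Our selection of the "BAD" set was based on our own assessment rather than a fixed rule. Purely as an illustration, a screen for reads dominated by long runs of a single base could look like the Python sketch below. The names `homopolymer_fraction` and `looks_problematic`, and both thresholds (runs of 10 or more bases covering more than 30% of the read), are hypothetical choices, not the criteria we applied.

```python
from itertools import groupby


def homopolymer_fraction(seq: str, min_run: int = 10) -> float:
    """Fraction of bases lying in single-base runs at least `min_run` long."""
    seq = seq.upper()
    if not seq:
        return 0.0
    # groupby yields one group per run of identical consecutive characters.
    run_lengths = (sum(1 for _ in group) for _, group in groupby(seq))
    return sum(n for n in run_lengths if n >= min_run) / len(seq)


def looks_problematic(seq: str, min_run: int = 10, max_fraction: float = 0.3) -> bool:
    """Hypothetical screen: flag reads dominated by long homopolymer runs."""
    return homopolymer_fraction(seq, min_run) > max_fraction


print(looks_problematic("C" * 40 + "ACGT" * 5))  # True  (40 of 60 bases in one run)
print(looks_problematic("ACGT" * 15))            # False (no run reaches 10)
```

A rule like this would only nominate candidates; as noted above, confirming the cause would still require the chromatograms.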

A list of the multi-sequence contigs and their members is provided, and the same data are also arranged by GenBank accession number (ACC#) and by NID number. Single-clone contigs are not listed, so if you cannot find a sequence in these lists, it is either one of the "singleton" contigs or has been removed from the analysis for the reasons stated above.
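Finding a sequence in these lists amounts to inverting the contig-to-members table into an accession-to-contig index. The Python sketch below assumes the lists have already been parsed into a dictionary; the contig IDs and accessions are hypothetical placeholders, and the fallback to "singleton" holds only for sequences that were part of the analyzed releases.

```python
from typing import Dict, List, Set


def build_member_index(contigs: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert contig -> members into accession -> contig for quick lookup."""
    return {acc: contig_id for contig_id, members in contigs.items() for acc in members}


def locate(accession: str, member_index: Dict[str, str], removed: Set[str]) -> str:
    """Classify an accession the way the multi-sequence lists are meant to be read."""
    if accession in member_index:
        return f"member of {member_index[accession]}"
    if accession in removed:
        return "removed from analysis"
    # Assumes the accession was among the sequences analyzed.
    return "singleton contig"


# Hypothetical identifiers, for illustration only.
contigs = {"contig_001": ["ACC_A", "ACC_B"], "contig_002": ["ACC_C", "ACC_D", "ACC_E"]}
index = build_member_index(contigs)
print(locate("ACC_D", index, removed={"ACC_X"}))  # member of contig_002
print(locate("ACC_X", index, removed={"ACC_X"}))  # removed from analysis
print(locate("ACC_Z", index, removed={"ACC_X"}))  # singleton contig
```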

At present, the contig consensus sequences and BLAST results are not provided, but we will include them in a future update to this site. We also plan to provide information on SNPs found within these EST sequences, as well as the primers used to find them and any mapping data we obtain.

***NOTE: Sequences through release 060799 have been analyzed, but the data have not yet been reformatted for display. We hope to post the updated data soon, so please check back.