Szostak Lab

We apply the principles of Darwinian evolution to populations of molecules in the laboratory. In practice, we generate large numbers of DNA molecules with different sequences and then impose selective pressure on this mixed population so that sequences with particular properties increase in abundance. We can select directly for DNAs with desirable properties, or select indirectly for DNAs that encode RNAs or proteins with particular binding or catalytic activities. This approach, often called directed evolution, can be applied to diverse problems in basic research and also has many practical applications.
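The select-and-amplify cycle described above can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not a model of any particular experiment: the RNA alphabet, the "binding" criterion (presence of the made-up motif GGAUC), and the retention efficiency, background leak-through, and mutation rate are all arbitrary illustrative choices, and every function name is hypothetical. What it shows is qualitative: a rare functional sequence comes to dominate the pool after a few rounds when selection retains it preferentially.

```python
import random

ALPHABET = "ACGU"  # RNA alphabet assumed for this toy pool


def random_pool(size: int, length: int, rng: random.Random) -> list[str]:
    """Generate a completely random starting pool."""
    return ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(size)]


def select(pool, is_functional, efficiency, background, rng):
    """Partition step: functional sequences are retained with probability
    `efficiency`; nonfunctional ones leak through with probability `background`."""
    return [
        seq for seq in pool
        if rng.random() < (efficiency if is_functional(seq) else background)
    ]


def amplify(survivors, pool_size, mutation_rate, rng):
    """Amplification step: resample survivors back up to `pool_size` copies.
    Each base is redrawn at random with probability `mutation_rate`,
    standing in for copying errors."""
    offspring = []
    for _ in range(pool_size):
        parent = rng.choice(survivors)
        child = "".join(
            rng.choice(ALPHABET) if rng.random() < mutation_rate else base
            for base in parent
        )
        offspring.append(child)
    return offspring


def functional_fraction(pool, is_functional):
    return sum(1 for seq in pool if is_functional(seq)) / len(pool)


def evolve(pool, is_functional, rounds, rng,
           efficiency=0.9, background=0.02, mutation_rate=0.005):
    """Run repeated select/amplify cycles. Returns the final pool and the
    functional fraction before the first round and after each round."""
    history = [functional_fraction(pool, is_functional)]
    for _ in range(rounds):
        survivors = select(pool, is_functional, efficiency, background, rng)
        if not survivors:  # selection left nothing to amplify
            break
        pool = amplify(survivors, len(pool), mutation_rate, rng)
        history.append(functional_fraction(pool, is_functional))
    return pool, history


if __name__ == "__main__":
    rng = random.Random(0)
    has_motif = lambda seq: "GGAUC" in seq  # hypothetical "binding" criterion
    start = random_pool(1000, 20, rng)
    _, history = evolve(start, has_motif, rounds=4, rng=rng)
    print([round(f, 3) for f in history])
```

Lowering `background` (a cleaner partition step) raises the enrichment achieved per round, while raising `mutation_rate` trades some enrichment for exploration of sequence variants.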

Starting with a completely random pool allows us to sample a vast region of sequence space, so that a variety of different and independent solutions to a given problem can be recovered. In these experiments, we can sample more than 10^15 nucleic acid sequences and, after a few cycles of selection and amplification, recover the descendants of a single functional molecule from the initial population. In vitro selection for sequences that fold into highly specific binding sites has been used to isolate many nucleic acids, called aptamers, that bind a wide range of small biomolecules, including nucleotides, amino acids, antibiotics, and cofactors. The capacity to generate so many diverse structures from polymers containing only four chemically similar subunits is surprising, and it raises questions about the minimal chemical requirements for forming stable structures. Recently we have found that ATP aptamers can be selected from pools containing only three of the four nucleotides, but not from pools containing only two. We are using in vitro selection to examine the ability of RNA to evolve new or enhanced binding specificities, starting from defined initial binding structures. These experiments will test the evolvability, or evolutionary flexibility, of RNA structures.
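The scale arguments in this paragraph can be checked with a little arithmetic. The sketch below, with hypothetical helper names, computes how much of sequence space a pool of about 10^15 molecules can cover for a random region of a given length and alphabet size, and how many selection rounds would bring a one-in-10^15 sequence to half of the pool under a constant per-round enrichment factor. That enrichment factor is an assumed input; in real selections enrichment varies from round to round and from one target to another.

```python
import math


def sequence_space(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length


def coverage(pool_size: float, alphabet_size: int, length: int) -> float:
    """Upper bound on the fraction of sequence space present in a pool,
    assuming (optimistically) that every molecule has a distinct sequence."""
    return min(1.0, pool_size / sequence_space(alphabet_size, length))


def max_fully_sampled_length(pool_size: float, alphabet_size: int) -> int:
    """Longest random region for which the pool could contain every sequence."""
    length = 0
    while sequence_space(alphabet_size, length + 1) <= pool_size:
        length += 1
    return length


def rounds_to_dominate(initial_fraction: float, enrichment: float,
                       target: float = 0.5) -> int:
    """Selection rounds needed for a functional species to reach `target`
    fraction of the pool, assuming each round multiplies its odds
    (functional : nonfunctional) by a constant `enrichment` factor."""
    if not 0 < initial_fraction < 1 or not 0 < target < 1:
        raise ValueError("fractions must lie strictly between 0 and 1")
    if enrichment <= 1:
        raise ValueError("enrichment must exceed 1 for the fraction to grow")
    fraction, rounds = initial_fraction, 0
    while fraction < target:
        odds = fraction / (1 - fraction) * enrichment
        fraction = odds / (1 + odds)
        rounds += 1
    return rounds
```

For example, `max_fully_sampled_length(1e15, 4)` returns 24 and `max_fully_sampled_length(1e15, 3)` returns 31, so complete coverage is possible only for short random regions, and longer random regions are sampled sparsely. With an assumed 100-fold enrichment per round, `rounds_to_dominate(1e-15, 100)` returns 8.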

Another fundamental question that we are attempting to address through aptamer selections is the relationship between information content and biochemical function. It seems intuitively obvious that more information should be required to specify, or encode, a structure that performs some function better, such as binding a target molecule more tightly. However, it is difficult to turn this conjecture into a quantitative hypothesis, and even more difficult to obtain experimental evidence for a specific quantitative relationship between the information content of a sequence and the function that it encodes. We are approaching this problem by examining the structures of a large number of aptamers selected for binding to a particular target, but with affinities ranging from quite weak to very strong. Our preliminary results suggest that high-affinity aptamers are much more structurally complex than low-affinity aptamers, consistent with the intuitive view outlined above.
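One way to make the information-function relationship quantitative, assumed here purely for illustration and not necessarily the measure used in these experiments, is to define the information needed for a function as -log2 of the fraction of random sequences that reach a chosen activity threshold. If a structure requires specific nucleotides at certain positions, that fraction can be estimated from the number of nucleotides tolerated at each constrained position, assuming the positions are independent. The function names below are hypothetical.

```python
import math


def information_from_fraction(fraction: float) -> float:
    """Bits needed to specify a function, taken here as -log2 of the fraction
    of random sequences that meet a chosen activity threshold."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return -math.log2(fraction)


def information_from_positions(tolerated, alphabet_size=4):
    """Bits contributed by constrained positions. Each entry of `tolerated` is
    the number of nucleotides allowed at one position (1 = invariant).
    Positions are treated as independent; unconstrained ones add 0 bits."""
    bits = 0.0
    for k in tolerated:
        if not 1 <= k <= alphabet_size:
            raise ValueError("each position must tolerate 1..alphabet_size bases")
        bits += math.log2(alphabet_size / k)
    return bits
```

Under these assumptions, ten invariant positions correspond to 20 bits, a position that accepts either purine contributes 1 bit, and `information_from_fraction(1e-15)` is about 49.8 bits, roughly the most that a single selection from a pool of 10^15 molecules could be expected to reveal.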

Catalysts of chemical transformations must be able to bind the transition state of a reaction and to distinguish it from the ground states, i.e., the substrates and products. Because this requires discriminating between very similar structures, catalysts might be expected to be rarer than simple binding sites. Nevertheless, in vitro selection has been used to isolate many RNA and DNA molecules that catalyze a wide range of reactions. Ribozymes that act as ligases, kinases, and nucleases, as well as ribozymes that catalyze a variety of alkyl and acyl transfer reactions, have been obtained from pools of random sequences. These results support the idea that ribozymes could have played an important role in the evolution of metabolism in early cells, before the evolution of protein synthesis and protein enzymes. Our current work in this area focuses on evolving ribozymes that catalyze RNA replication and ribozymes with enhanced sequence-specific RNase activity.

Applying the principles of in vitro selection and directed evolution to proteins could provide a powerful tool for investigating the origins of protein function and structure and for examining protein-ligand interactions on a genomic scale. Our approach to linking genotype and phenotype, called mRNA display, involves covalently attaching a translated protein to its own mRNA. We do this by linking puromycin, an antibiotic that mimics an aminoacylated tRNA, to the 3' end of a synthetic mRNA through a DNA linker. When the ribosome reaches the end of the open reading frame, it stalls at the DNA linker, allowing the puromycin to accept the nascent peptide chain. We have used mRNA display to select ATP-binding proteins from a large library of random-sequence polypeptides. One of these has been optimized for improved folding and binding by subsequent rounds of directed evolution. Analysis of the overexpressed and purified protein showed that it is a zinc metalloprotein, although it has no significant sequence homology to naturally occurring zinc metalloproteins. We have started to evolve other binding domains, enzymes, and ribonucleoproteins. An exciting future application will be the ability to conduct side-by-side comparisons of RNA and protein evolution.
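The genotype-phenotype link in mRNA display can be illustrated with a toy model in which each fusion is simply a record pairing an mRNA with the peptide translated from it, so that selecting on the peptide also recovers the sequence that encodes it. This is a sketch, not a laboratory protocol: the ribosome, DNA linker, and puromycin chemistry are reduced to a rule that translation must run through the whole open reading frame, and an in-frame stop codon is treated as a failed fusion because release at a stop codon would keep the chain from reaching the puromycin. The standard genetic code is assumed, and `Fusion`, `translate_to_linker`, `select_fusions`, and the binding test are hypothetical names.

```python
from dataclasses import dataclass

BASES = "UCAG"
# Standard genetic code; codons enumerated in UCAG order, first base slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


@dataclass(frozen=True)
class Fusion:
    """Toy mRNA-protein fusion: the genotype carries its own phenotype."""
    mrna: str
    peptide: str


def translate_to_linker(orf: str) -> Fusion:
    """Translate an open reading frame codon by codon. An in-frame stop codon
    is treated as a failed fusion, since the chain would be released before
    reaching the linker-attached puromycin."""
    orf = orf.upper().replace("T", "U")
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of 3")
    residues = []
    for pos in range(0, len(orf), 3):
        codon = orf[pos:pos + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None:
            raise ValueError(f"invalid codon {codon!r}")
        if aa == "*":
            raise ValueError(f"in-frame stop codon at nucleotide {pos + 1}")
        residues.append(aa)
    return Fusion(mrna=orf, peptide="".join(residues))


def select_fusions(fusions, binds):
    """Keep fusions whose peptide passes a (hypothetical) binding test and
    return the attached mRNAs, which identify the selected sequences."""
    return [f.mrna for f in fusions if binds(f.peptide)]
```

In a real selection, the mRNA of the surviving fusions would be reverse-transcribed and amplified to make the next round's library; here that step is represented only by returning the mRNA strings.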

The technology of mRNA display can also be used to prepare libraries of RNA-protein fusions from cellular sequences. We are attempting to use DNA microarrays to decipher the results of our genomic in vitro selection experiments in a rapid and potentially high-throughput manner. This combination of mRNA display and microarray analysis may be useful as a tool for the genome-wide analysis of protein function.
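One plausible way to read out such an experiment on a DNA microarray, assumed here for illustration rather than taken from the lab's protocol, is to compare, spot by spot, the normalized signal from the selected library with that from the unselected input library, scoring each gene by a log2 enrichment ratio. The function names, the normalization to total signal, and the `floor` used to avoid taking the logarithm of zero are all illustrative choices.

```python
import math


def enrichment_scores(input_signal, selected_signal, floor=1e-6):
    """Per-gene log2 ratio of normalized signal after vs. before selection.
    Each channel is normalized to its own total so that overall differences
    in labeling or scanning intensity cancel out; `floor` avoids log(0)."""
    input_total = sum(input_signal.values())
    selected_total = sum(selected_signal.values())
    scores = {}
    for gene, raw_input in input_signal.items():
        before = max(raw_input / input_total, floor)
        after = max(selected_signal.get(gene, 0.0) / selected_total, floor)
        scores[gene] = math.log2(after / before)
    return scores


def enriched_genes(scores, min_log2_ratio=1.0):
    """Genes enriched at least 2**min_log2_ratio-fold, strongest first."""
    hits = [(score, gene) for gene, score in scores.items() if score >= min_log2_ratio]
    return [gene for score, gene in sorted(hits, reverse=True)]
```

A gene whose share of the signal rises from 10% in the input to 50% after selection scores log2(5), about 2.32, and passes the default two-fold threshold.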

This work was supported in part by grants from the National Institutes of Health and the NASA Astrobiology Institute.