WO2005012550A2 - Screening methods and libraries of trace amounts of dna from uncultivated microorganisms

Info

Publication number: WO2005012550A2
Application number: PCT/US2004/024954
Authority: WO
Inventors: Jay Short
Original assignee: Diversa Corporation; Keller, Martin; Wyborski, Denise; Chang, Hwai; Abulencia, Carl
Priority date: 2003-07-31
Filing date: 2004-07-30
Publication date: 2005-02-10
Also published as: WO2005012550A3

Abstract

The invention provides methods for making a gene library from trace amounts of DNA derived from a plurality of species of organisms comprising obtaining trace amounts of cDNA, gDNA, or genomic DNA fragments from a plurality of species of organisms, amplifying the DNA so obtained, and ligating the DNA to a DNA vector to generate a library of constructs in which genes are contained in the DNA. The invention also provides methods for screening clones having DNA recovered from trace amounts of DNA derived from a plurality of species of uncultivated organisms. The invention also provides methods for identifying and enriching for a polynucleotide encoding an activity of interest.

Description

SCREENING METHODS AND LIBRARIES OF TRACE AMOUNTS OF DNA FROM UNCULTIVATED MICROORGANISMS

RELATED APPLICATIONS

[001] This application claims priority to U.S. Patent Application Serial No. 60/573,473, filed May 21, 2004, and U.S. Patent Application Serial No. 10/633,248, filed July 31, 2003, which is a continuation-in-part (CIP) of U.S. Patent Application Serial No. 09/875,412, filed on June 6, 2001, now pending, which is a continuation of U.S. Patent Application Serial No. 08/988,224, filed December 10, 1997, issued as U.S. Patent No. 6,280,926, which is a divisional of U.S. Patent Application Serial No. 08/657,409, filed June 3, 1996, issued as U.S. Patent No. 5,958,672, which is a CIP of U.S. Patent Application Serial No. 08/568,994, filed December 7, 1995, now abandoned, which is a CIP of U.S. Patent Application Serial No. 08/503,606, filed July 17, 1995, issued as U.S. Patent No. 6,004,788. This application is also a CIP of U.S. Patent Application Serial No. 10/121,145, filed April 9, 2002, now pending, which is a continuation of U.S. Patent Application Serial No. 09/421,970, filed May 12, 1998, issued as U.S. Patent No. 6,368,798, which is a divisional of U.S. Patent Application Serial No. 08/944,795, filed October 6, 1997, issued as U.S. Patent No. 6,030,779, which is a CIP of U.S. Patent Application Serial No. 08/692,002, filed August 2, 1996, issued as U.S. Patent No. 6,054,267, which is a CIP of U.S. Patent Application Serial No. 08/657,409, filed June 3, 1996, issued as U.S. Patent No. 5,958,672. This application is also a CIP of U.S. Patent Application Serial No. 09/975,036, filed October 10, 2001, now pending, which is a CIP of U.S. Patent Application Serial No. 09/738,871, filed December 14, 2000, now pending, which is a CIP of U.S. Patent Application Serial No. 09/685,432, filed October 10, 2000, now pending, which is a CIP of U.S. Patent Application Serial No. 09/444,112, filed November 22, 1999, now pending, which is a CIP of U.S. Patent Application Serial No. 09/098,206, filed June 16, 1998, issued as U.S. Patent No. 6,174,673, which is a CIP of U.S. Patent Application Serial No. 08/876,276, filed June 16, 1997, now pending. Each of the aforementioned applications is explicitly incorporated herein by reference in its entirety and for all purposes.

FIELD OF THE INVENTION

[002] This invention relates to the field of preparing and screening libraries of clones containing DNA derived from trace amounts of microbially derived DNA.

BACKGROUND OF THE INVENTION

[003] There is a critical need in the chemical industry for efficient catalysts for the practical synthesis of optically pure materials; enzymes can provide the optimal solution. All classes of molecules and compounds that are utilized in established and emerging chemical, pharmaceutical, textile, food and feed, and detergent markets must meet stringent economic and environmental standards. The synthesis of polymers, pharmaceuticals, natural products and agrochemicals is often hampered by expensive processes which produce harmful byproducts and which suffer from low enantioselectivity. Enzymes have a number of remarkable advantages that can overcome these problems in catalysis: they act on single functional groups, they distinguish between similar functional groups on a single molecule, and they distinguish between enantiomers. Moreover, they are biodegradable and function at very low mole fractions in reaction mixtures. Because of their chemo-, regio- and stereospecificity, enzymes present a unique opportunity to optimally achieve desired selective transformations. These are often extremely difficult to duplicate chemically, especially in single-step reactions. The elimination of the need for protection groups, selectivity, the ability to carry out multi-step transformations in a single reaction vessel, along with the concomitant reduction in environmental burden, has led to the increased demand for enzymes in chemical and pharmaceutical industries. Enzyme-based processes have been gradually replacing many conventional chemical-based methods. A current limitation to more widespread industrial use is primarily due to the relatively small number of commercially available enzymes. Only ~300 enzymes (excluding DNA modifying enzymes) are at present commercially available from the >3000 non DNA-modifying enzyme activities thus far described.

[004] The use of enzymes for technological applications also may require performance under demanding industrial conditions. This includes activities in environments or on substrates for which the currently known arsenal of enzymes was not evolutionarily selected. Enzymes have evolved by selective pressure to perform very specific biological functions within the milieu of a living organism, under conditions of mild temperature, pH and salt concentration. For the most part, the non-DNA modifying enzyme activities thus far described have been isolated from mesophilic organisms, which represent a very small fraction of the available phylogenetic diversity. The dynamic field of biocatalysis takes on a new dimension with the help of enzymes isolated from microorganisms that thrive in extreme environments. Such enzymes must function at temperatures above 100°C in terrestrial hot springs and deep sea thermal vents, at temperatures below 0°C in arctic waters, in the saturated salt environment of the Dead Sea, at pH values around 0 in coal deposits and geothermal sulfur-rich springs, or at pH values greater than 11 in sewage sludge. Enzymes obtained from these extremophilic organisms open a new field in biocatalysis.

[005] In addition to the need for new enzymes for industrial use, there has been a dramatic increase in the need for bioactive compounds with novel activities. This demand has arisen largely from changes in worldwide demographics coupled with the clear and increasing trend in the number of pathogenic organisms that are resistant to currently available antibiotics. For example, while there has been a surge in demand for antibacterial drugs in emerging nations with young populations, countries with aging populations, such as the US, require a growing repertoire of drugs against cancer, diabetes, arthritis and other debilitating conditions. The death rate from infectious diseases increased 58% between 1980 and 1992, and it has been estimated that the emergence of antibiotic resistant microbes has added in excess of $30 billion annually to the cost of health care in the US alone. (Adams et al., Chemical and Engineering News, 1995; Amann et al., Microbiological Reviews, 59, 1995). As a response to this trend, pharmaceutical companies have significantly increased their screening of microbial diversity for compounds with unique activities or specificities.

[006] There are several common sources of lead compounds (drug candidates), including natural product collections, synthetic chemical collections, and synthetic combinatorial chemical libraries, such as nucleotides, peptides, or other polymeric molecules. Each of these sources has advantages and disadvantages. The success of programs to screen these candidates depends largely on the number of compounds entering the programs, and pharmaceutical companies have to date screened hundreds of thousands of synthetic and natural compounds in search of lead compounds. Unfortunately, the ratio of novel to previously discovered compounds has diminished with time. The discovery rate of novel lead compounds has not kept pace with demand despite the best efforts of pharmaceutical companies. There exists a strong need for accessing new sources of potential drug candidates.

[007] The majority of bioactive compounds currently in use are derived from soil microorganisms. Many microbes inhabiting soils and other complex ecological communities produce a variety of compounds that increase their ability to survive and proliferate. These compounds are generally thought to be nonessential for growth of the organism and are synthesized with the aid of genes involved in intermediary metabolism, hence their name, "secondary metabolites". Secondary metabolites that influence the growth or survival of other organisms are known as "bioactive" compounds and serve as key components of the chemical defense arsenal of both micro- and macroorganisms. Humans have exploited these compounds for use as antibiotics, anti-infective and other bioactive compounds with activity against a broad range of prokaryotic and eukaryotic pathogens. Approximately 6,000 bioactive compounds of microbial origin have been characterized, with more than 60% produced by the gram-positive soil bacteria of the genus Streptomyces. (Barnes et al., Proc. Nat. Acad. Sci. U.S.A., 91, 1994). Of these, at least 70 are currently used for biomedical and agricultural applications. The largest class of bioactive compounds, the polyketides, includes a broad range of antibiotics, immunosuppressants and anticancer agents which together account for sales of over $5 billion per year.

[008] Despite the seemingly large number of available bioactive compounds, it is clear that one of the greatest challenges facing modern biomedical science is the proliferation of antibiotic resistant pathogens. Because of their short generation time and ability to readily exchange genetic information, pathogenic microbes have rapidly evolved and disseminated resistance mechanisms against virtually all classes of antibiotic compounds. For example, there are virulent strains of the human pathogens Staphylococcus and Streptococcus that can now be treated with but a single antibiotic, vancomycin, and resistance to this compound will require only the transfer of a single gene, vanA, from resistant Enterococcus species for this to occur. (Bateson et al., System. Appl. Microbiol, 12, 1989). When this crucial need for novel antibacterial compounds is superimposed on the growing demand for enzyme inhibitors, immunosuppressants and anti-cancer agents it becomes readily apparent why pharmaceutical companies have stepped up their screening of microbial diversity for bioactive compounds with novel properties.

[009] It has been estimated that to date less than one percent of the world's organisms have been cultured. It has been suggested that a large fraction of this diversity has thus far gone unrecognized due to difficulties in enriching and isolating microorganisms in pure culture. Therefore, it has been difficult or impossible to identify or isolate valuable proteins from these samples. These limitations suggest the need for alternative approaches to obtain genomic DNA and characterize the physiological and metabolic potential, i.e., activities of interest, of as-yet uncultivated microorganisms, which to date have been characterized solely by analyses of PCR amplified rRNA gene fragments, clonally recovered from mixed assemblage nucleic acids.

[0010] Current methods of PCR amplification involve the use of two primers which hybridize to the regions flanking a nucleic acid sequence of interest such that DNA replication initiated at the primers will replicate the nucleic acid sequence of interest. By separating the replicated strands from the template strand with a denaturation step, another round of replication using the same primers can lead to geometric amplification of the nucleic acid sequence of interest. A variant of PCR amplification, termed whole genome PCR, involves the use of random or partially random primers to amplify the entire genome of an organism in the same PCR reaction. This technique relies on having a sufficient number of primers of random or partially random sequence such that pairs of primers will hybridize throughout the genomic DNA at moderate intervals. Replication initiated at the primers can then result in replicated strands overlapping sites where another primer can hybridize. By subjecting the genomic sample to multiple amplification cycles, the genomic sequences will be amplified.
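The geometric amplification described in this paragraph can be made concrete with a small calculation. The following is a minimal sketch that assumes an idealized reaction in which every template is copied with a fixed per-cycle efficiency; the function name `pcr_copies` and the `efficiency` parameter are illustrative and do not come from the application.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after a given number of PCR cycles.

    Assumption: each cycle multiplies the template count by (1 + efficiency),
    where efficiency = 1.0 means perfect doubling. Real reactions plateau as
    reagents are consumed; this sketch ignores that.
    """
    if initial_copies < 0 or cycles < 0:
        raise ValueError("initial_copies and cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Ten template molecules carried through 30 cycles of perfect doubling.
    print(pcr_copies(10, 30))
    # The same reaction at a hypothetical 80% per-cycle efficiency.
    print(pcr_copies(10, 30, efficiency=0.8))
```

Under these assumptions, the copy number grows as a power of the cycle count, which is why a denaturation step between rounds is needed to make each newly replicated strand available as a template.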

[0011] However, PCR amplification has the disadvantage that the amplification reaction cannot proceed continuously and must be carried out by subjecting the nucleic acid sample to multiple cycles in a series of reaction conditions. These reaction conditions often rely on cycling at high temperatures, which may cause degradation of long pieces of DNA. The multiple random amplification cycles, as used in whole genome PCR, can also be a disadvantage because of potential amplification of the products made in previous cycles, instead of randomly amplifying the original sequence. Further, enzymes currently used in PCR amplification cannot proceed along long genomic pieces of DNA (i.e., 40kb and larger). Thus, amplification of entire genomes for use in large insert libraries is not possible using standard techniques.

[0012] Recent developments provide new methods of amplification of target nucleic acid sequences and whole genomes or other highly complex nucleic acid samples. U.S. Patent No. 6,124,120, herein incorporated by reference, teaches Whole Genome Strand Displacement Amplification, in which a set of primers having random or partially random nucleotide sequences is used to randomly prime a sample of genomic nucleic acid. By choosing a sufficiently large set of primers of random or mostly random sequence, the primers in the set will be collectively, and randomly, complementary to nucleic acid sequences distributed throughout nucleic acid in the sample. Amplification proceeds by replication with a processive polymerase initiated at each primer and continuing until spontaneous termination. Similarly, U.S. Patent No. 5,001,050, herein incorporated by reference, teaches amplification methods of very large fragments of DNA using Rolling Circle Amplification for circular templates. However, the teachings of both inventions disclose methods of amplifying nucleic acid from a single organism.
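The reason a sufficiently large set of random primers ends up complementary to sites throughout a sample can be illustrated with a rough estimate. The sketch below assumes a template whose bases are uniformly random and independent, which real genomes are not, and uses hexamers purely as an example primer length; the function names are hypothetical.

```python
def expected_sites_per_primer(template_length: int, primer_length: int) -> float:
    """Expected exact-match sites for one fixed primer on one strand.

    Assumption: each template base is A, C, G or T with probability 1/4,
    so a given k-mer matches any one position with probability 4**-k.
    """
    if primer_length <= 0 or template_length < primer_length:
        raise ValueError("need 0 < primer_length <= template_length")
    positions = template_length - primer_length + 1
    return positions / 4 ** primer_length


def expected_spacing(primer_length: int, distinct_primers: int) -> float:
    """Average distance, in bases, between priming sites on one strand
    when `distinct_primers` different primer sequences are present."""
    if not 1 <= distinct_primers <= 4 ** primer_length:
        raise ValueError("distinct_primers must be between 1 and 4**primer_length")
    return 4 ** primer_length / distinct_primers


if __name__ == "__main__":
    # A single hexamer is expected to match about once every 4096 bases.
    print(expected_spacing(6, 1))
    # A complete set of all hexamers can, in principle, prime at every position.
    print(expected_spacing(6, 4 ** 6))
```

The estimate only covers where primers can anneal; how far replication proceeds from each site depends on the processivity of the polymerase, as the paragraph above notes.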

[0013] Previously, whole genome amplification from the gDNA of an isolate has been performed on Xylella fastidiosa using rolling circle amplification (RCA) on 1000 cells. (See Detter, et al., Isothermal Strand-Displacement Amplification Applications for High-Throughput Genomics, Genomics, Vol. 80, No. 6 (December 2002), incorporated by reference herein in its entirety.)

[0014] Methods for isothermal amplification of whole genomes have previously been described. (See Lage, et al., Whole Genome Analysis of Genetic Alterations in Small DNA Samples Using Hyperbranched Strand Displacement Amplification and Array-CGH, Genome Research, 13:294-307 (2003), herein incorporated by reference in its entirety.)

[0015] Therefore, the need exists for alternative approaches to obtain and amplify trace amounts of whole genomic DNA derived from at least one organism, and characterize the physiological and metabolic potential, i.e. activities of interest of as-yet uncultivated microorganisms from extreme and/or contaminated environments, clonally recovered from mixed assemblage nucleic acids.

SUMMARY OF THE INVENTION

[0016] The present invention provides a novel approach to obtain and amplify trace amounts of whole genomic DNA derived from a plurality of organisms. In accordance with one aspect of the present invention, environmental samples that do not contain enough DNA for analysis by traditional methods are subjected to multiple displacement amplification to enable the recovery of substantially the whole genomic DNA represented and its characterization as to physiological and metabolic potential.

[0017] More particularly, one aspect of the invention provides a process for making a gene library from trace amounts of DNA derived from a plurality of species of organisms comprising obtaining trace amounts of cDNA, gDNA, or genomic DNA fragments from a plurality of species of organisms, amplifying the cDNA, gDNA, or genomic DNA fragments, and ligating the cDNA, gDNA, or genomic DNA fragments to a DNA vector to generate a library of constructs in which genes are contained in the cDNA, gDNA, or genomic DNA fragments.

[0018] The organisms are uncultured organisms from environmental samples. The environmental sample may contain contaminated soil wherein only trace amounts of DNA exist. The organisms may be extremophiles such as thermophiles, hyperthermophiles, psychrophiles, psychrotrophs, halophiles, alkalophiles, and acidophiles. In one aspect of this invention, the organisms comprise a mixture of terrestrial microorganisms or marine microorganisms, or a mixture of terrestrial microorganisms and marine microorganisms.

[0019] Another aspect of the invention provides a process of screening clones having DNA recovered from a plurality of species of uncultivated organisms having trace amounts of DNA for a specified protein, e.g. enzyme, activity which process comprises: screening for a specified protein, e.g. enzyme, activity in a library of clones prepared by: (i) recovering trace amounts of DNA from a DNA population derived from a plurality of species of uncultivated microorganisms; (ii) amplifying the trace amounts of DNA; and (iii) transforming a host with DNA to produce a library of clones which are screened for the specified protein, e.g. enzyme, activity.

[0020] The library is produced from DNA that is recovered without culturing of an organism, particularly where the DNA is recovered from an environmental sample containing organisms that are not or cannot be cultured and having trace amounts of DNA.

[0021] Preferably, the trace amounts of DNA are recovered without culturing of an organism, and are recovered from extreme and/or contaminated environmental samples containing organisms which are not or cannot be cultured.

[0022] In a preferred embodiment DNA is ligated into a vector, particularly wherein the vector further comprises expression regulatory sequences that can control and regulate the production of a detectable protein, e.g. enzyme, activity from the ligated DNA.

[0023] The f-factor (or fertility factor) in E. coli is a plasmid which effects high frequency transfer of itself during conjugation and less frequent transfer of the bacterial chromosome itself. To achieve and stably propagate large DNA fragments from mixed microbial samples, a particularly preferred embodiment is to use a cloning vector containing an f-factor origin of replication to generate genomic libraries that can be replicated with a high degree of fidelity. When integrated with DNA from a mixed uncultured environmental sample, this makes it possible to achieve large genomic fragments in the form of a stable "environmental DNA library."

[0024] In another preferred embodiment, double stranded DNA obtained from the uncultivated DNA population is selected by: converting the double stranded genomic DNA into single stranded DNA; recovering from the converted single stranded DNA single stranded DNA which specifically binds, such as by hybridization, to a probe DNA sequence; and converting recovered single stranded DNA to double stranded DNA.

[0025] The probe may be directly or indirectly bound to a solid phase by which it is separated from single stranded DNA which is not hybridized or otherwise specifically bound to the probe.

[0026] The process can also include releasing single stranded DNA from said probe after recovering said hybridized or otherwise bound single stranded DNA and amplifying the single stranded DNA so released prior to converting it to double stranded DNA.
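The selection steps in the three preceding paragraphs (denature, capture on a probe, convert back to double stranded DNA) can be sketched in a few lines. This is a toy model on short strings, not the laboratory procedure: it assumes capture requires a perfect complementary match, whereas real hybridization tolerates mismatches depending on stringency. All function names are illustrative.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def denature(duplex_top_strands: List[str]) -> List[str]:
    """Convert each duplex, given by its top strand, into two single strands."""
    strands = []
    for top in duplex_top_strands:
        top = top.upper()
        strands.append(top)
        strands.append(reverse_complement(top))
    return strands


def capture(strands: List[str], probe: str) -> List[str]:
    """Keep single strands containing a site perfectly complementary to the probe.

    Assumption: exact matching stands in for hybridization.
    """
    target = reverse_complement(probe)
    return [s for s in strands if target in s]


def to_duplex(strand: str) -> Tuple[str, str]:
    """Pair a recovered single strand with its complement (second-strand synthesis)."""
    return strand, reverse_complement(strand)


if __name__ == "__main__":
    fragments = ["AAGGCCTTAC", "TTTTTTTTTT"]  # hypothetical fragments
    bound = capture(denature(fragments), probe="GTAAGG")
    print([to_duplex(s) for s in bound])
```

Only the strand complementary to the probe is retained, which mirrors the separation of probe-bound DNA from unhybridized single stranded DNA on a solid phase.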

[0027] The invention also provides a process of screening clones having DNA from uncultivated microorganisms for a specified protein, e.g. enzyme, activity which comprises screening for a specified gene cluster protein product activity in the library of clones prepared by: (i) recovering DNA from a DNA population derived from a plurality of uncultivated microorganisms; (ii) amplifying the recovered DNA; and (iii) transforming a host with recovered DNA to produce a library of clones which are screened for the specified protein, e.g. enzyme, activity. In one aspect of this invention, trace amounts of DNA are recovered from the microorganisms. In another aspect, very few cells of the microorganisms are available within the environmental sample.

[0028] The library is produced from gene cluster DNA that is recovered without culturing of an organism, particularly where the DNA gene clusters are recovered from an environmental sample containing organisms that are not or cannot be cultured and having trace amounts of DNA.

[0029] Preferably, the trace amounts of DNA are recovered without culturing of an organism, and are recovered from extreme and/or contaminated environmental samples containing organisms that are not or cannot be cultured.

[0030] Alternatively, double-stranded gene cluster DNA obtained from the uncultivated DNA population is selected by converting the double-stranded genomic gene cluster DNA into single-stranded DNA; recovering from the converted single-stranded gene cluster polycistron DNA, single-stranded DNA which specifically binds, such as by hybridization, to a polynucleotide probe sequence; and converting recovered single-stranded gene cluster DNA to double-stranded DNA.

[0031] In one aspect, the present invention provides a method for amplifying a DNA template from trace amounts of DNA derived from a plurality of species of organisms comprising: obtaining trace amounts of cDNA, gDNA, or genomic DNA fragments from a plurality of species of organisms; preparing a template from said cDNA, gDNA, or genomic DNA fragments; and amplifying the template.

[0032] In another aspect, the invention provides a method for amplifying a DNA template from trace amounts of DNA derived from a plurality of species of organisms comprising: obtaining trace amounts of cDNA, gDNA, or genomic DNA fragments from a plurality of species of organisms; preparing a circular template from said cDNA, gDNA, or genomic DNA fragments; and amplifying the template.

[0033] In another aspect, the invention provides a method for making a DNA template from trace amounts of DNA isolated from trace amounts of DNA from a mixed population of uncultivated cells comprising: encapsulating individually, in a microenvironment, a plurality of cells from a mixed population of uncultivated cells; creating a template from said cDNA, gDNA, or genomic DNA fragments; and amplifying the template.

[0034] The methods of the present invention also find use for DNA, including ancient DNA, forensic DNA, and pre-fragmented or degraded DNA (UV, chemical, oxygen, peroxide, and photochemical exposure, among others).

[0035] In one aspect, the invention provides a method for making a gene library from trace amounts of DNA derived from a plurality of species of organisms comprising: (a) amplifying a substantial portion of the cDNA, gDNA, or genomic DNA fragments, wherein said amplifying is by multiple strand displacement amplification (MDA); and (b) ligating the cDNA, gDNA, or genomic DNA fragments to a DNA vector to generate a library of constructs in which genes are contained in the cDNA, gDNA, or genomic DNA fragments.

[0036] In another aspect, the organisms comprise uncultured organisms. Alternatively, the organisms are derived from an environmental sample. In another aspect, the organisms are derived from a contaminated environmental sample. In an additional aspect, the organisms comprise a mixture of terrestrial microorganisms or marine microorganisms, or a mixture of terrestrial microorganisms and marine microorganisms. In another aspect, the organisms are extremophiles. In one aspect, the extremophiles comprise one or more organisms selected from the group consisting of thermophiles, hyperthermophiles, psychrophiles, psychrotrophs, halophiles, alkalophiles, and acidophiles.

[0037] In one aspect of the invention, the cDNA or genomic fragments comprise at least an operon, or portions thereof, of the donor microorganisms. In another aspect, the operon encodes a complete or partial metabolic pathway.

[0038] In another aspect, the invention provides a method of screening clones having DNA recovered from trace amounts of DNA derived from a plurality of species of uncultivated organisms, for a specified protein activity, which method comprises: (a) amplifying the trace amounts of DNA by multiple strand displacement amplification (MDA); (b) transforming a host cell with the DNA of (a) to produce a library of clones which is screened for the specified protein activity; and (c) screening for a specified protein activity in a library of clones prepared by recovering trace amounts of DNA from a DNA population derived from a plurality of species of organisms. In one aspect, the DNA is ligated into a vector prior to transforming the host cell. In another aspect, the vector comprises at least one DNA sequence capable of regulating production of a detectable enzyme activity from said DNA. In another aspect, the vector into which the DNA has been ligated is used to transform a host cell. The organisms can be derived from an environmental sample or derived from a contaminated environmental sample. In one aspect, the organisms of the invention are extremophiles. In another aspect, the extremophiles comprise one or more organisms selected from the group consisting of thermophiles, hyperthermophiles, psychrophiles, psychrotrophs, halophiles, alkalophiles, and acidophiles.

[0039] In another aspect, a gene library can be made from trace amounts of DNA isolated from trace amounts of DNA from a mixed population of uncultivated cells comprising: a) encapsulating individually, in a microenvironment, a plurality of cells from a mixed population of uncultivated cells; b) placing the encapsulated cells in a growth column; c) incubating the encapsulated cells in the growth column under conditions allowing the encapsulated cells to grow into microcolonies containing trace amounts of DNA; d) sorting the encapsulated microcolonies; e) amplifying the trace amounts of DNA by multiple strand displacement amplification; and f) ligating the amplified DNA to a DNA vector to generate a library of constructs in which genes are contained in the DNA.

[0040] In one aspect, the cells are derived from an environmental sample. In another aspect, the cells are derived from a contaminated environmental sample. In one aspect, the cells are extremophiles. The extremophiles comprise one or more organisms selected from the group consisting of thermophiles, hyperthermophiles, psychrophiles, psychrotrophs, halophiles, alkalophiles, and acidophiles.

[0041] In another embodiment, the invention provides a method for amplifying a DNA template from trace amounts of DNA derived from a plurality of species of organisms comprising: a) preparing a template from said cDNA, gDNA, or genomic DNA fragments; wherein trace amounts of cDNA, gDNA, or genomic DNA fragments are obtained from a plurality of species of organism; and b) amplifying a substantial portion of said template from step a) by multiple strand displacement amplification (MDA) to provide sufficient amounts of cDNA, gDNA or genomic DNA fragments for detection.

[0042] In one aspect, the method further comprises fragmenting the template. In one aspect, trace amounts of cDNA, gDNA, or genomic DNA fragments are partially or completely digested. In another aspect, the template fragmentation is achieved by enzymatic, chemical, photometric, mechanical or any means that provides segments. In one embodiment, the enzymatic fragmentation comprises use of a DNase or a restriction enzyme. In an alternate embodiment, mechanical means comprises use of a shearing means. In another aspect, the method further comprises filling DNA ends by polymerase extension. In one aspect, the template is diluted to a degree sufficient to obtain substantially self-ligated products in the presence of ligase and ligase buffer. In one embodiment, the template is circular. In another aspect, substantially self-ligated products are used in said amplifying step. In a preferred embodiment, a phi29 polymerase is used in the amplifying step. In one aspect, the organisms comprise uncultured organisms. The organism can be derived from an environmental sample or from a contaminated environmental sample.
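Enzymatic fragmentation with a restriction enzyme can be pictured as cutting a sequence at every occurrence of a recognition site. The sketch below is a toy model of a complete digest on the top strand only. The application does not name a particular enzyme; EcoRI, which recognizes GAATTC and cuts after the first G, is used here purely as a familiar example, and the function name `digest` is hypothetical.

```python
from typing import List


def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> List[str]:
    """Cut the top strand at every occurrence of a recognition site.

    `cut_offset` is the number of bases of the site left on the upstream
    fragment. The defaults model EcoRI (G^AATTC) as an assumed example.
    Sticky ends and partial digestion are not modeled.
    """
    site = site.upper()
    if not site or not 0 <= cut_offset <= len(site):
        raise ValueError("cut_offset must lie within the recognition site")
    sequence = sequence.upper()
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    template = "AAGAATTCTTGAATTCCC"  # hypothetical template with two sites
    pieces = digest(template)
    print(pieces)
    # Rejoining the pieces in order recovers the original template.
    print("".join(pieces) == template)
```

Each resulting fragment could then be end-filled and self-ligated under dilute conditions to give the circular templates this paragraph describes; those steps are not modeled here.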

[0043] In one aspect, the organisms comprise a mixture of terrestrial microorganisms or marine microorganisms, or a mixture of terrestrial microorganisms and marine microorganisms. In another embodiment, the organism is an extremophile. In another aspect, the extremophile comprises one or more organisms selected from the group consisting of thermophiles, hyperthermophiles, psychrophiles, psychrotrophs, halophiles, alkalophiles, and acidophiles.

[0044] In one aspect, the cDNA or genomic fragments comprise at least an operon, or portions thereof, of the donor microorganisms. In a preferred aspect, the operon encodes a complete or partial metabolic pathway.

[0045] In one aspect, the method of the present invention provides repeating the amplifying step. This can be done in an iterative manner.

[0046] In another aspect, the invention provides a method for amplifying a DNA template from trace amounts of DNA derived from a plurality of species of organism comprising: a) preparing a circular template from said cDNA, gDNA, or genomic DNA fragments; c) amplifying the template of step b) by multiple strand displacement amplification (MDA) to provide sufficient DNA to detect; and d) ligating the amplified DNA of step c) to a DNA vector to generate a library of constructs in which genes are contained in the DNA.

[0047] In another aspect, the invention provides a method for amplifying one or more DNA templates contained in a DNA sample derived from a plurality of species of organism, wherein at least one DNA template is in trace amounts, comprising: a) preparing a template from said cDNA, gDNA, or genomic DNA fragments; wherein trace amounts of cDNA, gDNA, or genomic DNA fragments are obtained from a plurality of species of organism; and b) amplifying a substantial portion of said template from step a) by multiple strand displacement amplification (MDA) to provide sufficient amounts of cDNA, gDNA or genomic DNA fragments for detection.

[0048] In another aspect, the invention provides a method for making a DNA template from trace amounts of DNA isolated from trace amounts of DNA from a mixed population of uncultivated cells comprising: a) encapsulating each of a plurality of cells from a mixed population of uncultivated cells, in a microenvironment, wherein said cells contain cDNA, gDNA or genomic DNA fragments; b) preparing a template from said cDNA, gDNA, or genomic DNA fragments; c) amplifying the DNA of step b) by multiple strand displacement amplification (MDA); and d) ligating the amplified DNA of step c) to a DNA vector to generate a library of constructs in which genes are contained in the DNA.

[0049] In one aspect, the template is fragmented. In another aspect, the fragments are partially or completely digested. In one aspect, the template fragmentation is achieved by enzymatic, chemical, photometric, mechanical or any means that provides segments. In another aspect, the enzymatic fragmentation comprises use of a DNAse or a restriction enzyme. In an alternate aspect, mechanical fragmentation comprises use of a shearing means. The DNA ends can be filled by polymerase extension. In another aspect, the template is diluted to a degree sufficient to obtain substantially self-ligated products in the presence of ligase and ligase buffer. The substantially self-ligated products are used in said amplifying step. In a preferred embodiment, phi29 polymerase is used in said amplifying step. In one aspect, the cells are derived from an environmental sample or from a contaminated environmental sample. In one aspect, the cells are an extremophile. The extremophile comprises one or more organisms selected from the group consisting of thermophiles, hyperthermophiles, psychrophiles, psychrotrophs, halophiles, alkalophiles, and acidophiles. In one aspect, the microenvironment has trace amounts of cells from at least one species of organism.

[0050] In one aspect, the amplifying step is performed by polymerase amplification. In another aspect, the invention provides a method for amplifying a DNA template from trace amounts of DNA derived from a plurality of species of organism comprising: a) preparing a template from said cDNA, gDNA, or genomic DNA fragments, wherein the cDNA, gDNA, or genomic DNA fragments are trace amounts from a plurality of species of organism; b) amplifying a substantial portion of said template from step a) by multiple strand displacement amplification (MDA) to provide sufficient amounts of cDNA, gDNA or genomic DNA fragments for detection; and c) ligating the amplified DNA of step b) to a DNA vector to generate a library of constructs in which genes are contained in the DNA.

[0051] In another aspect of the present invention, biopanning, normalizing, ligating into a vector, directly transforming a host cell, mutagenizing, expression screening, making a library, selection screening, sequencing, and/or any combination thereof may be performed on the amplified nucleic acid. In one aspect, the sequencing is shotgun sequencing. In another aspect, the clones for sequencing are selected without prior probing or screening.

[0052] In other aspects, the method of the invention further comprises biopanning, normalizing the amplified nucleic acid. In another aspect, the invention further comprises obtaining a sequence or a plurality of sequences, assembling two or more sequences to form a more competent sequence or genome or fragment thereof. Another aspect of the present invention provides searching the sequence in a database.

[0049] The methods of the present invention apply these techniques to samples of large strands of DNA from a plurality of species, a setting that invites potential under-representation of the genomes present in a sample. The sample may contain mixed populations of cultured or uncultured organisms from the environment.

[0050] These and other aspects of the present invention are described with respect to particular preferred embodiments and will be apparent to those skilled in the art from the teachings herein.

BRIEF DESCRIPTION OF THE FIGURES

[0051] The following drawings are illustrative of embodiments of the invention and are not meant to limit the scope of the invention as encompassed by the claims.

[0052] Figure 1 illustrates the protocol used in the cell sorting method of the invention to screen for a polynucleotide of interest, in this case using a library excised into E. coli. The clones of interest are isolated by sorting.

[0053] Figure 2 shows a microtiter plate where clones or cells are sorted in accordance with the invention. Typically one cell or cells grown within a microdroplet are dispersed per well and grown up as clones.

[0054] Figure 3 depicts a co-encapsulation assay. Cells containing library clones are co-encapsulated with a substrate or labeled oligonucleotide. Encapsulation can occur in a variety of means, including GMDs, liposomes, and ghost cells. Cells are screened via high throughput screening on a fluorescence analyzer.

[0055] Figure 4 depicts a side scatter versus forward scatter graph of FACS sorted gel-microdroplets (GMDs) containing a species of Streptomyces which forms unicells. Empty gel-microdroplets are also distinguished from free cells and debris.

[0056] Figure 5 is a depiction of a FACS/Biopanning method described herein and described in Example 3, below.

[0057] Figure 6A shows an example of dimensions of a capillary array of the invention. Figure 6B illustrates an array of capillary arrays.

[0058] Figure 7 shows a top cross-sectional view of a capillary array.

[0059] Figure 8 is a schematic depicting the excitation of and emission from a sample within the capillary lumen according to one aspect of the invention.

[0060] Figure 9 is a schematic depicting the filtering of excitation and emission light to and from a sample within the capillary lumen according to an alternative aspect of the invention.

[0061] Figure 10 illustrates an aspect of the invention in which a capillary array is wicked by contacting a sample containing cells, and humidified in a humidified incubator followed by imaging and recovery of cells in the capillary array.

[0062] Figure 11 illustrates a method for incubating a sample in a capillary tube by an evaporative and capillary wicking cycle.

[0063] Figure 12A shows a portion of a surface of a capillary array on which condensation has formed. Figure 12B shows the portion of the surface of the capillary array, depicted in Figure 12A, in which the surface is coated with a hydrophobic layer to inhibit condensation near an end of individual capillaries.

[0064] Figures 13A, 13B and 13C depict a method of retaining at least two components within a capillary.

[0065] Figure 14A depicts capillary tubes containing paramagnetic beads and cells. Figure 14B depicts the use of the paramagnetic beads to stir a sample in a capillary tube.

[0066] Figure 15 depicts an excitation apparatus for a detection system according to an aspect of the invention.

[0067] Figure 16 illustrates a system for screening samples using a capillary array according to an aspect of the invention.

[0068] Figure 17A illustrates one example of a recovery technique useful for recovering a sample from a capillary array. In this depiction a needle is contacted with a capillary containing a sample to be obtained. A vacuum is created to evacuate the sample from the capillary tube and onto a filter. Figure 17B illustrates one sample recovery method in which the recovery device has an outer diameter greater than the inner diameter of the capillary from which a sample is being recovered. Figure 17C illustrates another sample recovery method in which the recovery device has an outer diameter approximately equal to or less than the inner diameter of the capillary. Figure 17D shows the further processing of the sample once evacuated from the capillary.

[0069] Figure 18 is a schematic showing high throughput enrichment of low copy gene targets.

[0070] Figure 19 is a schematic of FACS-Biopanning using high throughput culturing. Polyketide synthase sequences from environmental samples are shown in the alignment.

[0071] Figure 20 shows whole cell hybridization for biopanning.

[0072] Figure 21 is a schematic showing co-encapsulation of a eukaryotic cell and a bacterial cell.

[0073] Figure 22 illustrates a whole cell hybridization schematic for biopanning and FACS sorting.

[0074] Figure 23 shows a schematic of T7 RNA Polymerase Expression system.

[0075] Figure 24 is a schematic summarizing an exemplary protocol to determine the optimal growth medium for a broad diversity of organisms, as described in detail in Example 18, below.

[0076] Figure 25 is an illustration of a light scattering signature of microcolonies as detected and separated by flow cytometry, as described in detail in Example 18, below.

[0077] Figures 26a, 26b and 26c are schematic drawings summarizing the characterization of clones (microcolonies) from organisms found and isolated by a method of the invention and analyzed by 16S rRNA gene sequence analysis, as described in detail in Example 18, below. Figure 26d is an illustration of a picture of a culture designated as strain GMDJE10E6, as described in detail in Example 18, below.

[0078] Figure 27 is a schematic drawing for a recombinant clone which has been characterized in Tier 1 as hydrolase and in Tier 2 as amide, which may then be tested in Tier 3 for various specificities.

[0079] Figures 28 and 29 are schematic drawings for a recombinant clone which has been characterized in Tier 1 as hydrolase and in Tier 2 as ester, which may then be tested in Tier 3 for various specificities.

[0080] Figure 30 is a schematic drawing for a recombinant clone which has been characterized in Tier 1 as hydrolase and in Tier 2 as acetal which may then be tested in Tier 3 for various specificities.

[0081] Figure 31 is a schematic diagram of the procedure used to amplify trace amounts of environmental gDNA.

[0082] Figure 32 is a table showing the results of testing the lower limit of template concentration by serial dilution, using extracted gDNA as template. The MDA reaction gave no product yield below 10,000 cells (genomes). Using the Cut/Ligate method of template preparation, there was MDA reaction product from as little as 2 cells (genomes). Using the Re-amplification method, it was shown that there was substantial product yield from straight, extracted gDNA from 1000 cells (genomes).

[0083] Like reference symbols in the various drawings indicate like elements.

DETAILED DESCRIPTION OF THE INVENTION

[0084] The methods of the present invention provide a novel approach to obtain and amplify trace amounts of whole genomic DNA derived from a plurality of organisms. In accordance with one aspect of the present invention, environmental samples that do not contain enough DNA for analysis by traditional methods are subject to multiple displacement amplification to enable the whole genomic DNA to be recovered and characterized as to physiological and metabolic potential.

[0085] This invention differs from multiple displacement amplification (MDA) and rolling circle amplification (RCA), as normally performed, in several aspects. Previously, MDA and RCA have been employed to expedite and simplify amplification of nucleic acid derived from single organisms. The DNA molecule is annealed with a primer molecule able to hybridize to it. The annealed mixture is incubated in a vessel containing four different deoxynucleoside triphosphates, a DNA polymerase, and one or more DNA synthesis terminating agents, which terminate DNA synthesis at a specific nucleotide base. The DNA products are then separated according to size. The DNA polymerase catalyzes primer extension and strand displacement in a processive strand displacement polymerization reaction. Use of a strand displacing DNA polymerase allows the reaction to proceed as long as desired in an isothermal reaction, while generating molecules of up to 60,000 nucleotides or larger.

[0086] In one aspect of the invention, the gDNA from E. coli was diluted to five cells (approximately 25 picograms), then amplified using the method of the present invention. The five cell amplification product showed greater yield than the no-DNA negative control by agarose gel electrophoresis (1% agarose).
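The relationship between a number of genome copies and the mass of DNA they represent can be estimated with a short calculation. The following is a minimal sketch, not part of the disclosed method: it assumes an average molar mass of about 650 g/mol per base pair and an E. coli genome of about 4.6 million base pairs, both of which are common approximations rather than values taken from this specification, and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
BP_MOLAR_MASS = 650.0      # g/mol per base pair (assumed average, not from the text)
ECOLI_GENOME_BP = 4.6e6    # approximate E. coli genome size in bp (assumption)


def genome_mass_grams(genome_bp, copies=1):
    """Estimate the mass in grams of `copies` double-stranded genomes."""
    return copies * genome_bp * BP_MOLAR_MASS / AVOGADRO


def copies_for_mass(mass_grams, genome_bp):
    """Estimate how many genome copies a given DNA mass corresponds to."""
    return mass_grams / genome_mass_grams(genome_bp)


if __name__ == "__main__":
    five_cells = genome_mass_grams(ECOLI_GENOME_BP, copies=5)
    print(f"5 genome copies: {five_cells * 1e15:.1f} fg")
```

Under these assumptions a single genome copy weighs on the order of femtograms, so five copies come to roughly 25 femtograms; the unit of the mass quoted in the paragraph above may be worth double-checking against this estimate.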

[0087] In accordance with another aspect of the present invention, novel high throughput cultivation methods based on the combination of a single cell encapsulation procedure with flow cytometry that enables cells to grow with nutrients that are present at environmental concentrations are combined with the novel amplification methods to provide access to trace amounts of DNA within microcolonies for further analysis.

[0088] In a preferred embodiment, prior to amplification the gDNA is fragmented and then ligated to form self-ligated products. The DNA fragmentation can be achieved by enzymatic, chemical, photometric, mechanical (shearing) or any means that provides segments. Any enzymes used for fragmentation are then heat-inactivated. The DNA ends may be filled in using a DNA polymerase. The fragmented DNA is diluted to a degree sufficient to obtain substantially self-ligated products in the presence of ligase and ligase buffer. Any enzymes used for ligation are then heat-inactivated. The ligated products are added as template to the amplification reaction. At any step, the gDNA, fragmented DNA, or ligated DNA may be cleaned utilizing techniques known in the art.

[0089] Using extracted gDNA as template, the template concentration lower limit was tested by serial dilutions. The MDA reaction gave no product yield below 10,000 cells (genomes). Using the Cut/Ligate method of template preparation, there was MDA reaction product from as little as 2 cells (genomes) (Figure 32).

[0090] Amplification of nucleic acid from multiple organisms can be performed by mixing a set of random or partially random primers with a genomic sample from a mixed population of organisms to produce a primer-target sample mixture in a buffer solution. The mixture is incubated under conditions that promote hybridization between the primers and the genomic DNA in the primer-target sample mixture. The random or partially random primers may be modified by techniques known in the art. A DNA polymerase is then added to produce a polymerase-target sample mixture, and incubated under conditions that promote replication of the genomic DNA. Strand displacement replication is preferably accomplished by using a strand displacing DNA polymerase or a DNA polymerase in combination with a compatible strand displacement factor.
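The priming step described above can be illustrated with a small in silico sketch. It is offered only as an illustration under stated assumptions: the primer length of six nucleotides, the requirement of perfect complementarity, and all function names are hypothetical choices and are not specified in the text. The sketch generates random primers and lists the positions on a template strand where each could anneal.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def random_primers(n, length=6, seed=0):
    """Generate n random primers; the length of 6 is an assumed default."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n)]


def annealing_sites(primer, template):
    """Return 0-based positions where the primer can anneal to the template.

    A primer anneals wherever the template strand contains the reverse
    complement of the primer (perfect match only, a simplifying assumption).
    """
    target = reverse_complement(primer)
    sites = []
    start = template.find(target)
    while start != -1:
        sites.append(start)
        start = template.find(target, start + 1)
    return sites


if __name__ == "__main__":
    template = "GGGAAACCCGGGAAA"
    for primer in random_primers(3) + ["TTTCCC"]:
        print(primer, annealing_sites(primer, template))
```

Because the primers are random, many priming events are expected to occur across every genome in a mixed sample, which is what allows strand displacement synthesis to start from multiple sites at once.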

[0091] In one embodiment of the present invention, the percent of DNA amplified comprises at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the genome from the sample.

[0092] In another aspect of the invention, the amplification step may be repeated ("re-amplification" method) one or more times to achieve higher product yield. This is accomplished by using the reaction product as template for subsequent reactions. Some or all of the reaction is added together with additional reaction components and incubated for one or more hours. The addition of some or all of the reaction to additional reaction components, and incubation for one or more hours, may be done one or more times.

[0093] Using the re-amplification method, it was shown that there was substantial product yield from straight, extracted gDNA from 1000 cells (genomes). The considerable amount of product from 1000 cells shows that it should be possible to use the re-amplification method on lower template concentrations (Figure 32).
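The template amounts reported above for the three approaches can be collected into a small lookup, which may help when deciding which template preparation to consider for a given sample. This is only a sketch: the dictionary keys and the function name are hypothetical labels, and the figure for re-amplification is the amount reported to give substantial product, not a tested lower limit.

```python
# Smallest template amounts (in cells/genomes) reported in the text to give
# MDA product. The labels are illustrative names chosen for this sketch.
REPORTED_MIN_CELLS = {
    "extracted gDNA, single MDA": 10_000,
    "Cut/Ligate template, MDA": 2,
    "extracted gDNA, re-amplification": 1_000,  # reported yield, not a tested limit
}


def methods_reported_to_yield(cells):
    """List the approaches for which product was reported at this cell count."""
    return sorted(
        name for name, minimum in REPORTED_MIN_CELLS.items() if cells >= minimum
    )


if __name__ == "__main__":
    for cells in (1, 5, 1_000, 10_000):
        print(cells, methods_reported_to_yield(cells))
```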

[0094] Preferred strand displacing DNA polymerases are large fragment Bst DNA polymerase (Exo(-)Bst), exo(-)Bca DNA polymerase, the DNA polymerase of the bacteriophage Φ29 and Sequenase.

[0095] In a preferred embodiment, the amplification buffer comprises: 10 mM MgCl₂, 5-10 mM (NH₄)₂SO₄, 30-50 mM Tris-HCl; or 30-50 mM Tris-acetate, 50-70 mM Potassium-acetate, and 10 mM Mg-acetate. Optionally, the amplification buffer may also include any combination of 50 mM KCl, 4 mM Dithiothreitol, 1 unit/mL yeast pyrophosphatase, and 0.1 mg/mL BSA.
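When assembling such a buffer from concentrated stocks, the volume of each stock follows from the dilution relation C1 x V1 = C2 x V2. The sketch below is illustrative only: the 1 M stock concentrations, the 50 microliter reaction volume, the choice of 10 mM and 40 mM within the stated ranges, and the function name are all assumptions that do not come from the text.

```python
def stock_volumes_ul(final_mM, stock_mM, reaction_ul):
    """Compute the volume of each stock (in microliters) for one reaction.

    Uses C1 * V1 = C2 * V2 for each component; whatever volume remains is
    reported as water plus other components (template, primers, enzyme).
    """
    volumes = {
        name: c_final * reaction_ul / stock_mM[name]
        for name, c_final in final_mM.items()
    }
    used = sum(volumes.values())
    if used > reaction_ul:
        raise ValueError("stock volumes exceed the reaction volume")
    volumes["water_and_other_components"] = reaction_ul - used
    return volumes


if __name__ == "__main__":
    # Final concentrations chosen from within the ranges given in the text.
    final = {"MgCl2": 10.0, "(NH4)2SO4": 10.0, "Tris-HCl": 40.0}
    # Hypothetical 1 M stocks for every component.
    stocks = {"MgCl2": 1000.0, "(NH4)2SO4": 1000.0, "Tris-HCl": 1000.0}
    for name, volume in stock_volumes_ul(final, stocks, reaction_ul=50.0).items():
        print(f"{name}: {volume:.2f} uL")
```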

[0096] The present invention provides a method for rapid sorting and screening of libraries derived from trace amounts of DNA derived from a mixed population of organisms from, for example, an environmental sample or an uncultivated population of organisms. In one aspect, gene libraries are generated, clones are either exposed to a substrate or substrate(s) of interest, or hybridized to a fluorescence labeled probe having a sequence corresponding to a sequence of interest and positive clones are identified and isolated via fluorescence activated cell sorting. Cells can be viable or non-viable during the process or at the end of the process, as nucleic acids encoding a positive activity can be isolated and cloned utilizing techniques well known in the art.

[0097] This invention differs from fluorescence activated cell sorting, as normally performed, in several aspects. Previously, FACS machines have been employed in studies focused on the analyses of eukaryotic and prokaryotic cell lines and cell culture processes. FACS has also been utilized to monitor production of foreign proteins in both eukaryotes and prokaryotes to study, for example, differential gene expression. The detection and counting capabilities of the FACS system have been applied in these examples. However, FACS has never previously been employed in a discovery process to screen for and recover bioactivities in prokaryotes. In addition, non-optical methods have not been used to identify or discover novel bioactivities or biomolecules. Furthermore, the present invention does not require cells to survive, as do previously described technologies, since the desired nucleic acid (recombinant clones) can be obtained from alive or dead cells. For example, the cells only need to be viable long enough to contain, carry or synthesize a complementary nucleic acid sequence to be detected, and can thereafter be either viable or non-viable cells so long as the complementary sequence remains intact. The present invention also solves problems that would have been associated with detection and sorting of E. coli expressing recombinant enzymes, and recovering encoding nucleic acids. The invention includes within its aspects apparatus capable of detecting a molecule or marker that is indicative of a bioactivity or biomolecule of interest, including optical and non-optical apparatus.

[0098] In one aspect, the present invention includes within its aspects any apparatus capable of detecting fluorescent wavelengths associated with biological material; such apparatuses are defined herein as fluorescent analyzers (one example of which is a FACS apparatus).

[0099] In the methods of the invention, use of a culture-independent approach to directly clone genes encoding novel enzymes from, for example, an environmental sample containing trace amounts of DNA derived from a mixed population of organisms allows one to access untapped resources of biodiversity. In one aspect, the invention is based on the construction of "mixed population libraries" which represent the collective genomes of naturally occurring organisms archived in cloning vectors that can be propagated in suitable prokaryotic hosts. Because the cloned DNA is initially extracted directly from environmental samples, the libraries are not limited to the small fraction of prokaryotes that can be grown in pure culture. Additionally, a normalization of the DNA present in these samples could allow more equal representation of the DNA from all of the species present in the original sample. This can increase the efficiency of finding interesting genes from minor constituents of the sample which may be under-represented by several orders of magnitude compared to the dominant species.

[00100] Prior to the present invention, the evaluation of complex mixed population expression libraries was rate limiting. The present invention allows the rapid screening of complex mixed population libraries, containing, for example, genes from thousands of different organisms. The benefits of the present invention can be seen, for example, in screening a complex mixed population sample. Screening of a complex sample previously required one to use labor intensive methods to screen several million clones to cover the genomic biodiversity. The invention represents an extremely high-throughput screening method which allows one to assess this enormous number of clones. The method disclosed herein allows the screening of anywhere from about 30 million to about 200 million clones per hour for a desired nucleic acid sequence or biological activity. This allows the thorough screening of mixed population libraries for clones expressing novel biomolecules.
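The stated throughput range translates directly into an estimate of how long a library of a given size would take to screen. The following sketch uses the 30 million and 200 million clones per hour figures from the paragraph above; the library size of one billion clones in the usage example and the function name are hypothetical.

```python
LOW_RATE = 30e6    # clones per hour, lower figure stated in the text
HIGH_RATE = 200e6  # clones per hour, upper figure stated in the text


def screening_time_range_hours(num_clones, low_rate=LOW_RATE, high_rate=HIGH_RATE):
    """Return (fastest, slowest) screening time in hours for a library."""
    if num_clones < 0:
        raise ValueError("num_clones must be non-negative")
    return num_clones / high_rate, num_clones / low_rate


if __name__ == "__main__":
    # Hypothetical library of one billion clones.
    fastest, slowest = screening_time_range_hours(1e9)
    print(f"between {fastest:.1f} and {slowest:.1f} hours")
```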

[00101] The invention provides methods and compositions whereby one can screen, sort or identify a polynucleotide sequence, polypeptide, or molecule of interest from a mixed population of organisms (e.g., organisms present in a mixed population sample) based on polynucleotide sequences present in the sample. Thus, the invention provides methods and compositions useful in screening organisms for a desired biological activity or biological sequence and to assist in obtaining sequences of interest that can further be used in directed evolution, molecular biology, biotechnology and industrial applications. By screening and identifying the nucleic acid sequences present in the sample, the invention increases the repertoire of available sequences that can be used for the development of diagnostics, therapeutics or molecules for industrial applications. Accordingly, the methods of the invention can identify novel nucleic acid sequences encoding proteins or polypeptides having a desired biological activity.

[00102] In one aspect, the invention provides a method for high throughput culturing of organisms. In another aspect, the organisms are a mixed population of organisms. In another aspect, organisms comprise a minute amount of cells. In another aspect, trace amounts of DNA are derived from the mixed population of organisms. In another aspect, the organisms include host cells of a library containing nucleic acids. For example, such libraries include nucleic acid obtained from various isolates of organisms, which are then pooled; nucleic acid obtained from isolate libraries, which are then pooled; or nucleic acids derived directly from a mixed population of organisms. Generally, a sample containing the organisms is mixed with a composition that can form a microenvironment, as described herein, e.g., a gel microdroplet or a liposome, among others. In one aspect, a mixed population of microorganisms is mixed with the encapsulation material in such a way that preferably fewer than 5 microorganisms are encapsulated. Preferably, only one microorganism is encapsulated in each microenvironment system.
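One common way to reason about how dilute a cell suspension should be so that most occupied microenvironments hold a single cell is to model encapsulation as a Poisson process. This model is an assumption introduced here for illustration and is not stated in the text; the function names and the mean loading of 0.1 cells per microenvironment in the usage example are likewise hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k cells in a microenvironment at mean loading lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def occupancy_summary(lam, max_cells=5):
    """Summarize occupancy under an assumed Poisson loading model.

    Returns the fraction of empty microenvironments, the fraction with
    exactly one cell, the fraction with fewer than `max_cells` cells, and
    the fraction of occupied microenvironments that hold exactly one cell.
    """
    if lam <= 0:
        raise ValueError("mean loading must be positive")
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    fewer_than_max = sum(poisson_pmf(k, lam) for k in range(max_cells))
    return {
        "empty": empty,
        "single": single,
        "fewer_than_max": fewer_than_max,
        "single_given_occupied": single / (1.0 - empty),
    }


if __name__ == "__main__":
    for name, value in occupancy_summary(0.1).items():
        print(f"{name}: {value:.4f}")
```

Under this model, lowering the mean loading raises the share of occupied microenvironments that contain a single cell, at the cost of producing more empty ones that must later be sorted out.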

[00103] Once encapsulated, the cells are cultured in a manner which allows growth of the organisms, e.g., host cells of a library. For example, Example 9 provides growth of the encapsulated organisms in a chromatography column which allows a flow of growth medium providing nutrients for growth and for removal of waste products from cells. Over a period of time (20 minutes to several weeks or months), a clonal population (i.e., microcolony) of the preferably one organism grows within the microenvironment.

[00104] After a desired period of time, microenvironments, e.g., gel microdroplets, may be sorted to eliminate "empty" microenvironments and to sort for the occupied microenvironments. The nucleic acid from organisms in the sorted microenvironments can be studied directly, for example, by treating with a PCR mixture and amplifying immediately after sorting. In one Example described herein, 16S rRNA genes from individual cells were studied and organisms assessed for phylogenetic diversity from the samples. If only trace amounts of DNA are derived from the microcolony, the nucleic acid is amplified by multiple displacement amplification.

[00105] In another aspect, the high throughput culturing methods of the invention allow culturing of organisms and enrichment of low copy gene targets. For example, a library of nucleic acid obtained from various isolates of organisms, which are then pooled; nucleic acid obtained from isolate libraries, which are then pooled; or nucleic acids derived directly from a mixed population of organisms, for example, are encapsulated, e.g., in a gel microdroplet or other microenvironment, and grown under conditions which allow clonal expansion of each organism in the microenvironment. In one aspect, the cells of the microcolony are lysed and treated with proteinases to yield nucleic acid (see Figures) (e.g., the microcolonies are de-proteinized by incubating gel microdroplets in lysis solution containing proteinase K at 37 degrees C for 30 minutes). In order to denature and neutralize nucleic acid entrapped in the microenvironments, they are denatured with alkaline denaturing solution (0.5M NaOH) and neutralized (e.g., with Tris pH8). In one particular example, nucleic acid entrapped in the microenvironment is hybridized with Digoxigenin (DIG)-labeled oligonucleotides (30-50 nt) in Dig Easy Hyb (available from Roche) overnight at 37 degrees C, followed by washing with 0.3xSSC and 0.1xSSC at 38-50 degrees C to achieve desired stringency. One of skill in the art will appreciate that this is merely an example and not meant to limit the invention in any way. For example, other labels commonly used in the art, e.g., fluorescent labels such as GFP or chemiluminescent labels, may be utilized in the invention methods.

[00106] The nucleic acid is hybridized with a probe which is preferably labeled. A signal can be amplified with a secondary label (e.g., fluorescent) and the nucleic acid sorted for fluorescent microenvironments, e.g., gel microdroplets. Nucleic acid that is fluorescent can be isolated and further studied or cloned into a host cell for further manipulation. In one particular example, signals are amplified with the Tyramide Signal Amplification™ (TSA) kit commercially available from Molecular Probes. TSA is an enzyme-mediated signal amplification method that utilizes horseradish peroxidase (HRP) to deposit fluorogenic tyramide molecules and generate high-density labeling of a target nucleic acid sequence in situ. The signal amplification is conferred by the turnover of multiple tyramide substrates per HRP molecule, and increases in signal strength of over 1,000-fold have been reported. The procedure involves incubating GMDs with anti-DIG conjugated horseradish peroxidase (anti-DIG-HRP) (Roche, IN) for 3 hours at room temperature. Then the tyramide substrate solution is added and incubated for 30 minutes at room temperature (RT).

[00107] In one aspect, this high throughput culturing method followed by sorting (e.g., FACS) screening (e.g., biopanning), allows for identification of gene targets. It may be desirable to screen for nucleic acids encoding virtually any protein or any bioactivity and to compare such nucleic acids among various species of organisms in a sample (e.g., study polyketide sequences from a mixed population). In another aspect, nucleic acid derived from high throughput culturing of organisms can be obtained for further study or for generation of a library. Such nucleic acid can be pooled and a library created, or alternatively, individual libraries from clonal populations (i.e., microcolonies) of organisms can be generated and then nucleic acid pooled from those libraries to generate a more complex library. The libraries generated as described herein can be utilized for the discovery of biomolecules (e.g., nucleic acid or bioactivities) or for evolving nucleic acid molecules identified by the high throughput culturing methods described in the present invention.

[00108] Such evolution methods are known in the art or described herein, such as, shuffling, cassette mutagenesis, recursive ensemble mutagenesis, sexual PCR, directed evolution, exonuclease-mediated reassembly, codon site-saturation mutagenesis, amino acid site-saturation mutagenesis, gene site saturation mutagenesis, introduction of mutations by non-stochastic polynucleotide reassembly methods, synthetic ligation polynucleotide reassembly, gene reassembly, oligonucleotide-directed saturation mutagenesis, in vivo reassortment of polynucleotide sequences having partial homology, naturally occurring recombination processes which reduce sequence complexity, and any combination thereof.

[00109] Flow cytometry has been used in cloning and selection of variants from existing cell clones. This selection, however, has required stains that diffuse through cells passively, rapidly and irreversibly, with no toxic effects or other influences on metabolic or physiological processes. Since, typically, flow sorting has been used to study animal cell culture performance, physiological state of cells, and the cell cycle, one goal of cell sorting has been to keep the cells viable during and after sorting.

[00110] There currently are no reports in the literature of screening and discovery of polynucleotide sequence in libraries by cell sorting based on fluorescence (e.g. fluorescent activated cell sorting), or non-optical markers (e.g., magnetic fields and the like). Furthermore there are no reports of recovering DNA encoding bioactivities screened by FACS or non-optical techniques and additionally screening for a bioactivity of interest. The present invention provides these methods to allow the extremely rapid screening of viable or non-viable cells to recover desirable activities and the nucleic acid encoding those activities.

[00111] Different types of encapsulation (e.g., gel microdroplet) strategies and compounds or polymers may be used with the present invention. For instance, a non-limiting example is a high temperature agarose, which may be employed for making microdroplets stable at high temperatures, allowing stable encapsulation of cells subsequent to heat-kill steps utilized to remove all background activities when screening for thermostable bioactivities. Encapsulation may be in beads, high temperature agaroses, gel microdroplets, cells, such as ghost red blood cells or macrophages, liposomes, or any other means of encapsulating and localizing molecules.

[00112] For example, methods of preparing liposomes have been described (i.e., U.S. Patent No.'s 5,653,996, 5,393,530 and 5,651,981), as well as the use of liposomes to encapsulate a variety of molecules (U.S. Patent No.'s 5,595,756, 5,605,703, 5,627,159, 5,652,225, 5,567,433, 4,235,871, 5,227,170). Entrapment of proteins, viruses, bacteria and DNA in erythrocytes during endocytosis has been described, as well (Journal of Applied Biochemistry 4, 418-435 (1982)). Erythrocytes employed as carriers in vitro or in vivo for substances entrapped during hypo-osmotic lysis or dielectric breakdown of the membrane have also been described (reviewed in Hiler, G. M. (1983) J. Pharm. Ther). These techniques are useful in the present invention to encapsulate samples for screening.

[00113] "Microenvironment", as used herein, is any molecular structure which provides an appropriate environment for facilitating the interactions necessary for the method of the invention. Environments suitable for facilitating molecular interactions include, for example, gel microdroplets, agarose noodles, ghost cells, macrophages, liposomes, or any other method known in the art for encapsulation. A microenvironment may also be a structure such as a microdroplet, wherein a cell is encapsulated inside the microdroplet and otherwise treated so as to mimic the cell's natural environment and shaped and designed for use in the methods of the invention.

[00114] Liposomes can be prepared from a variety of lipids including phospholipids, glycolipids, steroids, long-chain alkyl esters; e.g., alkyl phosphates, fatty acid esters; e.g., lecithin, fatty amines and the like. A mixture of fatty material may be employed, such as a combination of neutral steroid, a charged amphiphile and a phospholipid. Illustrative examples of phospholipids include lecithin, sphingomyelin and dipalmitoylphosphatidylcholine. Representative steroids include cholesterol, cholestanol and lanosterol. Representative charged amphiphilic compounds generally contain from 12-30 carbon atoms and include mono- or dialkyl phosphate esters, or alkyl amines; e.g., dicetyl phosphate, stearyl amine, hexadecyl amine, dilauryl phosphate, and the like.

[00115] The methods of the invention include a system and method for holding and screening samples. According to one aspect of the invention, a sample screening apparatus includes a plurality of capillaries formed into an array of adjacent capillaries, wherein each capillary comprises at least one wall defining a lumen for retaining a sample. The apparatus further includes interstitial material disposed between adjacent capillaries in the array, and one or more reference indicia formed within the interstitial material. (See co-pending U.S. patent applications serial nos. 09/687,219 and 09/894,956).

[00116] According to another aspect of the invention, a capillary for screening a sample, wherein the capillary is adapted for being bound in an array of capillaries, includes a first wall defining a lumen for retaining the sample, and a second wall formed of a filtering material, for filtering excitation energy provided to the lumen to excite the sample.

[00117] In another aspect of the invention, a method for incubating a bioactivity or biomolecule of interest includes the steps of introducing a first component into at least a portion of a capillary of a capillary array, wherein each capillary of the capillary array comprises at least one wall defining a lumen for retaining the first component, and introducing an air bubble into the capillary behind the first component. The method further includes the step of introducing a second component into the capillary, wherein the second component is separated from the first component by the air bubble.

[00118] In one aspect of the invention, a method of incubating a sample of interest includes introducing a first liquid labeled with a detectable particle into a capillary of a capillary array, wherein each capillary of the capillary array comprises at least one wall defining a lumen for retaining the first liquid and the detectable particle, and wherein the at least one wall is coated with a binding material for binding the detectable particle to the at least one wall. The method further includes removing the first liquid from the capillary tube, wherein the bound detectable particle is maintained within the capillary, and introducing a second liquid into the capillary tube.

[00119] Another aspect of the invention includes a recovery apparatus for a sample screening system, wherein the system includes a plurality of capillaries formed into an array. The recovery apparatus includes a recovery tool adapted to contact at least one capillary of the capillary array and recover a sample from the at least one capillary. The recovery apparatus further includes an ejector, connected with the recovery tool, for ejecting the recovered sample from the recovery tool.

Definitions

[00120] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention belongs. Although any methods, devices and materials similar or equivalent to those described herein can be used in the practice or testing of the invention, the methods, devices and materials are now described.

[00121] As used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a clone" includes a plurality of clones and reference to "the nucleic acid sequence" generally includes reference to one or more nucleic acid sequences and equivalents thereof known to those skilled in the art, and so forth.

[00122] An "amino acid" is a molecule having the structure wherein a central carbon atom (the α-carbon atom) is linked to a hydrogen atom, a carboxylic acid group (the carbon atom of which is referred to herein as a "carboxyl carbon atom"), an amino group (the nitrogen atom of which is referred to herein as an "amino nitrogen atom"), and a side chain group, R. When incorporated into a peptide, polypeptide, or protein, an amino acid loses one or more atoms of its amino and carboxylic groups in the dehydration reaction that links one amino acid to another. As a result, when incorporated into a protein, an amino acid is referred to as an "amino acid residue."

[00123] "Protein" or "polypeptide" refers to any polymer of two or more individual amino acids (whether or not naturally occurring) linked via a peptide bond, which occurs when the carboxyl carbon atom of the carboxylic acid group bonded to the α-carbon of one amino acid (or amino acid residue) becomes covalently bound to the amino nitrogen atom of the amino group bonded to the α-carbon of an adjacent amino acid. The term "protein" is understood to include the terms "polypeptide" and "peptide" (which, at times, may be used interchangeably herein) within its meaning. In addition, proteins comprising multiple polypeptide subunits (e.g., DNA polymerase III, RNA polymerase II) or other components (for example, an RNA molecule, as occurs in telomerase) will also be understood to be included within the meaning of "protein" as used herein. Similarly, fragments of proteins and polypeptides are also within the scope of the invention and may be referred to herein as "proteins."

[00124] A particular amino acid sequence of a given protein (i.e., the polypeptide's "primary structure," when written from the amino-terminus to carboxy-terminus) is determined by the nucleotide sequence of the coding portion of an mRNA, which is in turn specified by genetic information, typically genomic DNA (including organelle DNA, e.g., mitochondrial or chloroplast DNA). Thus, determining the sequence of a gene assists in predicting the primary sequence of a corresponding polypeptide and, more particularly, the role or activity of the polypeptide or proteins encoded by that gene or polynucleotide sequence.

[00125] The term "isolated" means altered "by the hand of man" from its natural state; i.e., if it occurs in nature, it has been changed or removed from its original environment, or both. For example, a naturally occurring polynucleotide or a polypeptide naturally present in a living animal, a biological sample or an environmental sample in its natural state is not "isolated", but the same polynucleotide or polypeptide separated from the coexisting materials of its natural state is "isolated", as the term is employed herein. Such polynucleotides, when introduced into host cells in culture or in whole organisms, still would be isolated, as the term is used herein, because they would not be in their naturally occurring form or environment. Similarly, the polynucleotides and polypeptides may occur in a composition, such as a media formulation (solutions for introduction of polynucleotides or polypeptides, for example, into cells or compositions or solutions for chemical or enzymatic reactions).

[00126] An "uncultured," "uncultivatable," "non-cultured," or "non-culturable" organism, as used herein, means an organism that is known to exist but has not been isolated in pure culture or isolated as a defined co-culture (syntrophic cultures).

[00127] "Polynucleotide" or "nucleic acid sequence" refers to a polymeric form of nucleotides. In some instances a polynucleotide refers to a sequence that is not immediately contiguous with either of the coding sequences with which it is immediately contiguous (one on the 5' end and one on the 3' end) in the naturally occurring genome of the organism from which it is derived. The term therefore includes, for example, a recombinant DNA which is incorporated into a vector; into an autonomously replicating plasmid or virus; or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., a cDNA) independent of other sequences. The nucleotides of the invention can be ribonucleotides, deoxyribonucleotides, or modified forms of either nucleotide. A polynucleotide as used herein refers to, among others, single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded regions, single- and double-stranded RNA, and RNA that is a mixture of single- and double-stranded regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or a mixture of single- and double-stranded regions. In addition, polynucleotide as used herein refers to triple-stranded regions comprising RNA or DNA or both RNA and DNA. The strands in such regions may be from the same molecule or from different molecules. The regions may include all of one or more of the molecules, but more typically involve only a region of some of the molecules. One of the molecules of a triple-helical region often is an oligonucleotide. The term polynucleotide encompasses genomic DNA or RNA (depending upon the organism, i.e., RNA genome of viruses), as well as mRNA encoded by the genomic DNA, and cDNA.

[00128] The term "trace" means an extremely small but detectable quantity. When used in conjunction with DNA (e.g., "trace amount of DNA"), it is meant to describe DNA in quantities not suitable for analysis by traditional methods such as by sequencing and/or library construction. For example, two E. coli cells weigh approximately 10 picograms, while 1000 E. coli cells weigh approximately 5 nanograms. When used in conjunction with cells (e.g., "trace amount of cells"), it is meant to describe approximately 1-1000 cells, which may also be called a "microcolony" if the cells were cultured from a single cell. Trace amounts of DNA or cells may also describe the amount of at least one species in the environmental sample or the environmental sample as a whole.

[00129] In one embodiment, the methods of the present invention are suitable for use in environmental samples where 1, 2, 3, 4, less than 5, less than 10, less than 100, or less than 1000 cells of any one species are present in the sample.

[00130] In another embodiment, the methods of the present invention may be used when there is 0.1 - 200 million femtograms of any one organism present in an environmental sample. One skilled in the art would understand that the complexity of an organism's genome as compared to E. coli, for example, would require more DNA to obtain a full representation of the organism's genome.
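The dependence of the required DNA mass on genome size can be illustrated with a short calculation. The following is only a sketch and not part of the described methods: it assumes an average mass of about 650 g/mol per double-stranded base pair and an E. coli genome of about 4.6 million base pairs (both figures are assumptions introduced here, not values from this description), and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
BP_MASS_G_PER_MOL = 650.0     # assumed average mass of one double-stranded base pair
FG_PER_G = 1e15               # femtograms per gram


def genome_mass_fg(genome_size_bp: float) -> float:
    """Approximate mass, in femtograms, of one copy of a double-stranded genome."""
    return genome_size_bp * BP_MASS_G_PER_MOL / AVOGADRO * FG_PER_G


def genome_equivalents(dna_mass_fg: float, genome_size_bp: float) -> float:
    """Number of genome copies represented by a given mass of DNA."""
    return dna_mass_fg / genome_mass_fg(genome_size_bp)


if __name__ == "__main__":
    e_coli_bp = 4.6e6         # assumed E. coli genome size
    larger_bp = 4.6e7         # hypothetical genome ten times larger
    print(f"{genome_mass_fg(e_coli_bp):.2f} fg per E. coli genome copy")
    print(f"{genome_equivalents(1000.0, e_coli_bp):.0f} copies in 1000 fg (E. coli)")
    print(f"{genome_equivalents(1000.0, larger_bp):.0f} copies in 1000 fg (larger genome)")
```

Under these assumptions, a fixed mass of DNA holds proportionally fewer copies of a larger genome, which is consistent with the statement that a more complex genome requires more DNA for full representation.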

[00131] The term "fragment," "fragments," and the grammatical equivalents thereof as used herein means a segment of sufficient size to allow ligation of a nucleic acid sequence into a circle by any method known in the art.

[00132] By rapidly screening for polynucleotides encoding polypeptides of interest, the invention provides not only a source of materials for the development of biologics, therapeutics, and enzymes for industrial applications, but also new materials for further processing by, for example, directed evolution and mutagenesis to develop molecules or polypeptides modified for particular activity or conditions.

[00133] The invention is used to obtain and identify polynucleotides and related sequence specific information from, for example, infectious microorganisms present in the environment such as, for example, in the gut of various macroorganisms.

[00134] In another aspect, the methods and compositions of the invention provide for the identification of lead drug compounds present in an environmental sample. The methods of the invention provide the ability to mine the environment for novel drugs or identify related drugs contained in different microorganisms. There are several common sources of lead compounds (drug candidates), including natural product collections, synthetic chemical collections, and synthetic combinatorial chemical libraries, such as nucleotides, peptides, or other polymeric molecules that have been identified or developed as a result of environmental mining. Each of these sources has advantages and disadvantages. The success of programs to screen these candidates depends largely on the number of compounds entering the programs, and pharmaceutical companies have to date screened hundreds of thousands of synthetic and natural compounds in search of lead compounds. Unfortunately, the ratio of novel to previously discovered compounds has diminished with time. The discovery rate of novel lead compounds has not kept pace with demand despite the best efforts of pharmaceutical companies. There exists a strong need for accessing new sources of potential drug candidates. Accordingly, the invention provides a rapid and efficient method to identify and characterize environmental samples that may contain novel drug compounds.

[00135] The invention provides methods of identifying a nucleic acid sequence encoding a polypeptide having either known or unknown function. For example, much of the diversity in microbial genomes results from the rearrangement of gene clusters in the genome of microorganisms. These gene clusters can be present across species or phylogenetically related with other organisms.

[00136] For example, bacteria and many eukaryotes have a coordinated mechanism for regulating genes whose products are involved in related processes. The genes are clustered, in structures referred to as "gene clusters," on a single chromosome and are transcribed together under the control of a single regulatory sequence, including a single promoter which initiates transcription of the entire cluster. The gene cluster, the promoter, and additional sequences that function in regulation altogether are referred to as an "operon" and can include up to 20 or more genes, usually from 2 to 6 genes. Thus, a gene cluster is a group of adjacent genes that are either identical or related, usually as to their function. Gene clusters are generally 15 kb to greater than 120 kb in length.

[00137] Some gene families consist of identical members. Clustering is a prerequisite for maintaining identity between genes, although clustered genes are not necessarily identical. Gene clusters range from extremes where a duplication is generated to adjacent related genes to cases where hundreds of identical genes lie in a tandem array. Sometimes no significance is discernible in a repetition of a particular gene. A principal example of this is the expressed duplicate insulin genes in some species, whereas a single insulin gene is adequate in other mammalian species.

[00138] Further, gene clusters undergo continual reorganization and, thus, the ability to create heterogeneous libraries of gene clusters from, for example, bacterial or other prokaryote sources is valuable in determining sources of novel proteins, particularly including enzymes such as, for example, the polyketide synthases that are responsible for the synthesis of polyketides having a vast array of useful activities. Other types of proteins that are the product(s) of gene clusters are also contemplated, including, for example, antibiotics, antivirals, antitumor agents and regulatory proteins, such as insulin.

[00139] As an example, polyketide synthase enzymes fall in a gene cluster. Polyketides are molecules which are an extremely rich source of bioactivities, including antibiotics (such as tetracyclines and erythromycin), anti-cancer agents (daunomycin), immunosuppressants (FK506 and rapamycin), and veterinary products (monensin). Many polyketides (produced by polyketide synthases) are valuable as therapeutic agents. Polyketide synthases are multifunctional enzymes that catalyze the biosynthesis of a huge variety of carbon chains differing in length and patterns of functionality and cyclization. Polyketide synthase genes fall into gene clusters and at least one type (designated type I) of polyketide synthase has large size genes and enzymes, complicating genetic manipulation and in vitro studies of these genes/proteins.

[00140] The ability to select and combine desired components from a library of polyketides and postpolyketide biosynthesis genes for generation of novel polyketides for study is appealing. The method(s) of the present invention make it possible to, and facilitate the cloning of, novel polyketide synthases, since one can generate gene banks with clones containing large inserts (especially when using the f-factor based vectors), which facilitates cloning of gene clusters.

[00141] Other biosynthetic genes include NRPS, glycosyl transferases and p450s. For example, a gene cluster can be ligated into a vector containing expression regulatory sequences which can control and regulate the production of a detectable protein or protein-related array activity from the ligated gene clusters. Vectors which have an exceptionally large capacity for exogenous nucleic acid introduction are particularly appropriate for use with such gene clusters and are described by way of example herein to include artificial chromosome vectors, cosmids, and the f-factor (or fertility factor) of E. coli. For example, the f-factor of E. coli is a plasmid which effects high-frequency transfer of itself during conjugation and is ideal to achieve and stably propagate large nucleic acid fragments, such as gene clusters from samples of mixed populations of organisms.

[00142] The trace amounts of DNA isolated or derived from these microorganisms can preferably be amplified then inserted into a vector prior to probing for selected DNA. Such vectors are preferably those containing expression regulatory sequences, including promoters, enhancers and the like. Such polynucleotides can be part of a vector and/or a composition and still be isolated, in that such vector or composition is not part of its natural environment. Particularly preferred phages or plasmids, and methods for introduction and packaging into them, are described in detail in the protocol set forth herein.

[00143] The invention provides novel systems to clone and screen mixed populations of organisms present, for example, in environmental samples, for polynucleotides of interest, enzymatic activities and bioactivities of interest in vitro. The method(s) of the invention allow the cloning and discovery of novel bioactive molecules in vitro, and in particular novel bioactive molecules derived from uncultivated or cultivated samples. Large size gene clusters, genes and gene fragments can be cloned, sequenced and screened using the method(s) of the invention. Unlike previous strategies, the method(s) of the invention allow one to clone, screen and identify polynucleotides and the polypeptides encoded by these polynucleotides in vitro from a wide range of mixed population samples.

[00144] The invention allows one to screen for and identify polynucleotide sequences from complex mixed population samples. DNA libraries obtained from trace amounts of DNA from these samples may be created from cell free samples, so long as the sample contains nucleic acid sequences, or from samples containing cellular organisms or viral particles. The organisms from which the libraries may be prepared include prokaryotic microorganisms, such as Eubacteria and Archaebacteria, lower eukaryotic microorganisms such as fungi, algae and protozoa, as well as plants, plant spores and pollen. The organisms may be cultured organisms or uncultured organisms obtained from mixed population environmental samples, including extremophiles, such as thermophiles, hyperthermophiles, psychrophiles, psychrotrophs, halophiles, alkalophiles, and acidophiles.

[00145] Nucleic acids used to construct a DNA library can be obtained from mixed population samples, such as, but not limited to, microbial samples obtained from Arctic and Antarctic ice, water or permafrost sources, materials of volcanic origin, materials from soil or plant sources in tropical areas, droppings from various organisms including mammals, invertebrates, dead and decaying matter, contaminated soil samples such as from radioactive waste sites and toxic spill sites, etc. Thus, for example, nucleic acids may be recovered from either a cultured or non-cultured organism and used to produce an appropriate DNA library (e.g., a recombinant expression library) for subsequent determination of the identity of the particular polynucleotide sequence or screening for bioactivity.

[00146] The following outlines a general procedure for producing libraries from both culturable and non-culturable organisms as well as mixed population of organisms, which libraries can be probed, sequenced or screened to select therefrom nucleic acid sequences having an identified, desired or predicted biological activity (e.g., an enzymatic activity or a small molecule).

[00147] As used herein a mixed population sample is any sample containing organisms or polynucleotides or a combination thereof, which can be obtained from any number of sources (as described above), including, for example, insect feces, soil, water, etc. Any source of nucleic acids in purified or non-purified form can be utilized as starting material. Thus, the nucleic acids may be obtained from any source which is contaminated by an organism or from any sample containing cells. The mixed population sample can be an extract from any bodily sample such as blood, urine, spinal fluid, tissue, vaginal swab, stool, amniotic fluid or buccal mouthwash from any mammalian organism. For non-mammalian (e.g., invertebrates) organisms the sample can be a tissue sample, salivary sample, fecal material or material in the digestive tract of the organism. An environmental sample also includes samples obtained from extreme environments including, for example, hot sulfur pools, volcanic vents, and frozen tundra. In addition, the sample can come from a variety of sources. For example, in horticulture and agricultural testing the sample can be a plant, fertilizer, soil, liquid or other horticultural or agricultural product; in food testing the sample can be fresh food or processed food (for example infant formula, seafood, fresh produce and packaged food); and in environmental testing the sample can be liquid, soil, sewage treatment, sludge and any other sample in the environment which is considered or suspected of containing an organism or polynucleotides.

[00148] When the sample is a mixture of material (e.g., a mixed population of organisms), for example, blood, soil and sludge, it can be treated with an appropriate reagent which is effective to open the cells and expose or separate the strands of nucleic acids. Mixed populations can comprise pools of cultured organisms or samples. For example, samples of organisms can be cultured prior to analysis in order to purify a particular population and thus obtain a purer sample. Organisms, such as actinomycetes or myxobacteria, known to produce bioactivities of interest can be enriched for via culturing. Culturing of organisms in the sample can include culturing the organisms in microdroplets and separating the cultured microdroplets with a cell sorter into individual wells of a multi-well tissue culture plate from which further processing may be performed.

[00149] The sample may comprise nucleic acids from, for example, a diverse and mixed population of organisms (e.g., microorganisms present in the gut of an insect). When present in trace amounts, the DNA is subjected to multiple displacement amplification. Nucleic acids may then be isolated from the sample using any number of methods for DNA and RNA isolation. Such nucleic acid isolation methods are commonly performed in the art. Where the nucleic acid is RNA, the RNA can be reverse transcribed to DNA using primers known in the art. Where the DNA is genomic DNA, the DNA can be sheared using, for example, a 25 gauge needle, or by other such mechanical methods of shearing known in the art.

[00150] The nucleic acids can be cloned into a vector. Cloning techniques are known in the art or can be developed by one skilled in the art, without undue experimentation. Vectors used in the present invention include: plasmids, phages, cosmids, phagemids, viruses (e.g., retroviruses, parainfluenzavirus, herpesviruses, reoviruses, paramyxoviruses, and the like), artificial chromosomes, or selected portions thereof (e.g., coat protein, spike glycoprotein, capsid protein). For example, cosmids and phagemids are typically used where the specific nucleic acid sequence to be analyzed or modified is large because these vectors are able to stably propagate large polynucleotides.

[00151] The vector containing the cloned DNA sequence may then be amplified by plating (i.e., clonal amplification) or transfecting a suitable host cell with the vector (e.g., a phage on an E. coli host). Alternatively (or subsequently to amplification), the cloned DNA sequence is used to prepare a library for screening by transforming a suitable organism. Hosts known in the art are transformed by artificial introduction of the vectors containing the target nucleic acid by inoculation under conditions conducive for such transformation. One could transform with double stranded circular or linear nucleic acid or there may also be instances where one would transform with single stranded circular or linear nucleic acid sequences. By transform or transformation is meant a permanent or transient genetic change induced in a cell following incorporation of new DNA (i.e., DNA exogenous to the cell). Where the cell is a mammalian cell, a permanent genetic change is generally achieved by introduction of the DNA into the genome of the cell. A transformed cell or host cell generally refers to a cell (e.g., prokaryotic or eukaryotic) into which (or into an ancestor of which) has been introduced, by means of recombinant DNA techniques, a DNA molecule not normally present in the host organism.

[00152] A particularly preferred type of vector for use in the invention contains an f-factor origin of replication. The f-factor (or fertility factor) in E. coli is a plasmid which effects high frequency transfer of itself during conjugation and less frequent transfer of the bacterial chromosome itself. In a particular aspect cloning vectors referred to as "fosmids" or bacterial artificial chromosome (BAC) vectors are used. These are derived from the E. coli f-factor, which is able to stably integrate large segments of DNA. When integrated with DNA from an uncultured mixed population sample, this makes it possible to achieve large genomic fragments in the form of a stable "mixed population nucleic acid library."

[00153] The nucleic acids derived from a mixed population or sample may be inserted into the vector by a variety of procedures. In general, the nucleic acid sequence is inserted into an appropriate restriction endonuclease site(s) by procedures known in the art. Such procedures and others are deemed to be within the scope of those skilled in the art. A typical cloning scenario may have the DNA "blunted" with an appropriate nuclease (e.g., Mung Bean Nuclease), methylated with, for example, EcoR I Methylase and ligated to EcoR I linkers. The linkers are then digested with an EcoR I Restriction Endonuclease and the DNA size fractionated (e.g., using a sucrose gradient). The resulting size fractionated DNA is then ligated into a suitable vector for sequencing, screening or expression (e.g., a lambda vector and packaged using an in vitro lambda packaging extract).
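In a scenario of this kind, methylation generally serves to protect EcoR I sites inside the insert from being cut when the linkers are digested. As an illustration only, and not as part of the protocol, the sketch below locates EcoR I recognition sites (GAATTC) in a sequence so that internal sites needing protection can be seen; the function name and the example sequence are hypothetical.

```python
from typing import List

ECORI_SITE = "GAATTC"  # EcoR I recognition sequence (palindromic)


def find_sites(sequence: str, site: str = ECORI_SITE) -> List[int]:
    """Return the 0-based start position of every occurrence of `site`."""
    seq = sequence.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        # advance by one so overlapping occurrences are not missed
        start = seq.find(site, start + 1)
    return positions


if __name__ == "__main__":
    insert = "ATGGAATTCCGTTAGGAATTCAA"  # made-up example insert
    for position in find_sites(insert):
        print(f"internal EcoR I site at position {position}")
```

An insert for which the function returns an empty list has no internal sites and would not be fragmented by the linker digestion in any case.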

[00154] Transformation of a host cell with recombinant DNA may be carried out by conventional techniques as are well known to those skilled in the art. Where the host is prokaryotic, such as E. coli, competent cells which are capable of DNA uptake can be prepared from cells harvested after exponential growth phase and subsequently treated by the CaCl₂ method by procedures well known in the art. Alternatively, MgCl₂ or RbCl can be used. Transformation can also be performed after forming a protoplast of the host cell or by electroporation. Transformation of Pseudomonas fluorescens and yeast host cells can be achieved by electroporation, using techniques described herein.

[00155] When the host is a eukaryote, methods of transfection or transformation with DNA that may be used include conjugation, calcium phosphate co-precipitates, conventional mechanical procedures such as microinjection, electroporation, insertion of a plasmid encased in liposomes, or virus vectors, as well as others known in the art. Eukaryotic cells can also be cotransfected with a second foreign DNA molecule encoding a selectable marker, such as the herpes simplex thymidine kinase gene. Another method is to use a eukaryotic viral vector, such as simian virus 40 (SV40) or bovine papilloma virus, to transiently infect or transform eukaryotic cells and express the protein (Eukaryotic Viral Vectors, Cold Spring Harbor Laboratory, Gluzman ed., 1982). The eukaryotic cell may be a yeast cell (e.g., Saccharomyces cerevisiae), an insect cell (e.g., Drosophila sp.) or may be a mammalian cell, including a human cell.

[00156] Eukaryotic systems, and mammalian expression systems, allow for post- translational modifications of expressed mammalian proteins to occur. Eukaryotic cells which possess the cellular machinery for processing of the primary transcript, glycosylation, phosphorylation, and, advantageously secretion of the gene product should be used. Such host cell lines may include, but are not limited to, CHO, VERO, BHK, HeLa, COS, MDCK, Jurkat, HEK-293, and WI38.

[00157] After the gene libraries have been generated one can perform "biopanning" of the libraries prior to expression screening. The "biopanning" procedure refers to a process for identifying clones having a specified biological activity by screening for sequence homology in the library of clones, using at least one probe DNA comprising at least a portion of a DNA sequence encoding a polypeptide having the specified biological activity; and detecting interactions with the probe DNA to a substantially complementary sequence in a clone. Clones (either viable or non-viable) are then separated by an analyzer (e.g., a FACS apparatus or an apparatus that detects non-optical markers).

[00158] The probe DNA used to probe for the target DNA of interest contained in clones prepared from polynucleotides in a mixed population of organisms can be a full- length coding region sequence or a partial coding region sequence of DNA for a known bioactivity. The sequence of the probe can be generated by synthetic or recombinant means and can be based upon computer based sequencing programs or biological sequences present in a clone. The DNA library can be probed using mixtures of probes comprising at least a portion of the DNA sequence encoding a known bioactivity having a desired activity. These probes or probe libraries are preferably single-stranded. The probes that are particularly suitable are those derived from DNA encoding bioactivities having an activity similar or identical to the specified bioactivity which is to be screened.

[00159] In another aspect, a nucleic acid library from a mixed population of organisms is screened for a sequence of interest by transfecting a host cell containing the library with at least one labeled nucleic acid sequence which is all or a portion of a DNA sequence encoding a bioactivity having a desirable activity and separating the library clones containing the desirable sequence by optical- or non-optical-based analysis.

[00160] In another aspect, in vivo biopanning may be performed utilizing a FACS-based machine. Complex gene libraries are constructed with vectors which contain elements which stabilize transcribed RNA. For example, the inclusion of sequences which result in secondary structures such as hairpins which are designed to flank the transcribed regions of the RNA would serve to enhance their stability, thus increasing their half life within the cell. The probe molecules used in the biopanning process consist of oligonucleotides labeled with reporter molecules that only fluoresce upon binding of the probe to a target molecule. Various dyes or stains well known in the art, for example those described in "Practical Flow Cytometry", 1995 Wiley-Liss, Inc., Howard M. Shapiro, M.D., can be used to intercalate or associate with nucleic acid in order to "label" the oligonucleotides. These probes are introduced into the recombinant cells of the library using one of several transformation methods. The probe molecules interact or hybridize to the transcribed target mRNA or DNA resulting in DNA/RNA heteroduplex molecules or DNA/DNA duplex molecules. Binding of the probe to a target will yield a fluorescent signal which is detected and sorted by the FACS machine during the screening process.
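The hybridization step relies on base pairing between the probe and its target. As a rough illustration only, the sketch below checks whether a DNA oligonucleotide probe is perfectly complementary to some stretch of a transcribed mRNA. Real hybridization tolerates mismatches and depends on the conditions used, so this is a simplification; the function names and the sequences are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def probe_binding_site(probe_dna: str, mrna: str) -> int:
    """Return the 0-based position on the mRNA where the probe pairs perfectly, or -1."""
    # write the mRNA in DNA letters so both strands share one alphabet
    target_as_dna = mrna.upper().replace("U", "T")
    return target_as_dna.find(reverse_complement(probe_dna))


if __name__ == "__main__":
    transcript = "AUGGCUAGCUUAG"   # made-up mRNA fragment
    probe = "CTAGCC"               # made-up DNA oligonucleotide
    site = probe_binding_site(probe, transcript)
    print("no perfect match" if site == -1 else f"probe pairs at position {site}")
```

A probe is searched as its reverse complement because the two strands of a DNA/RNA heteroduplex run antiparallel.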

[00161] The probe DNA may be at least about 10 bases, or, at least 15 bases. Other size ranges for probe DNA are at least about 15 bases to about 100 bases, at least about 100 bases to about 500 bases, at least about 500 bases to about 1,000 bases, at least about 1,000 bases to about 5,000 bases and at least about 5,000 bases to about 10,000 bases. In one aspect, an entire coding region of one part of a pathway may be employed as a probe. Where the probe is hybridized to the target DNA in an in vitro system, conditions for the hybridization in which target DNA is selectively isolated by the use of at least one DNA probe will be designed to provide a hybridization stringency of at least about 50% sequence identity, more particularly a stringency providing for a sequence identity of at least about 70%. Hybridization techniques for probing a microbial DNA library to isolate target DNA of potential interest are well known in the art and any of those which are described in the literature are suitable for use herein. Prior to fluorescence sorting the clones may be viable or non-viable. For example, in one aspect, the cells are fixed with paraformaldehyde prior to sorting.

[00162] Once viable or non-viable clones containing a sequence substantially complementary to the probe DNA are separated by a fluorescence analyzer, polynucleotides present in the separated clones may be further manipulated. In some instances, it may be desirable to perform an amplification of the target DNA that has been isolated. In this aspect, the target DNA is separated from the probe DNA after isolation. In one aspect, the clone can be grown to expand the clonal population. Alternatively, the host cell is lysed and the target DNA is amplified before being used to transform a new host (e.g., subcloning). Long PCR (Barnes, W M, Proc. Natl. Acad. Sci. USA, Mar. 15, 1994) can be used to amplify large DNA fragments (e.g., 35 kb). Numerous amplification methodologies are now well known in the art.

[00163] Where the target DNA is identified in vitro, the selected DNA is then used for preparing a library for further processing and screening by transforming a suitable organism. Hosts can be transformed by artificial introduction of a vector containing a target DNA by inoculation under conditions conducive for such transformation.

[00164] The resultant libraries (enriched for a polynucleotide of interest) can then be screened for clones which display an activity of interest. Clones can be shuttled in alternative hosts for expression of active compounds, or screened using methods described herein.

[00165] Having prepared a multiplicity of clones from DNA selectively isolated via hybridization technologies described herein, such clones are screened for a specific activity to identify clones having a specified characteristic.

[00166] The screening for activity may be effected on individual expression clones or may be initially effected on a mixture of expression clones to ascertain whether or not the mixture has one or more specified activities. If the mixture has a specified activity, then the individual clones may be re-screened for such activity or for a more specific activity.

[00167] Prior to, subsequent to, or as an alternative to the in vivo biopanning described above, an encapsulation technique such as GMDs or microcapsules may be employed to localize at least one clone in one location for growth or screening by a fluorescent analyzer (e.g. FACS). The separated at least one clone contained in the GMD or microcapsule may then be cultured to expand the number of clones or screened on a FACS machine to identify clones containing a sequence of interest as described above, which can then be broken out into individual clones to be screened again on a FACS machine to identify positive individual clones. Screening in this manner using a FACS machine is described in patent application Ser. No. 08/876,276, filed June 16, 1997. Thus, for example, if a clone has a desirable activity, then the individual clones may be recovered and re-screened utilizing a FACS machine to determine which of such clones has the specified desirable activity.

[00168] Further, it is possible to combine some or all of the above aspects such that a normalization step is performed prior to generation of the expression library, the expression library is then generated, the expression library so generated is then biopanned, and the biopanned expression library is then screened using a high throughput cell sorting and screening instrument. Thus there are a variety of options, including: (i) generating the library and then screening it; (ii) normalize the target DNA, generate the expression library and screen it; (iii) normalize, generate the library, biopan and screen; or (iv) generate, biopan and screen the library.
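By way of illustration only, options (i) through (iv) can be viewed as two optional steps (normalization and biopanning) placed around two steps that are always performed (library generation and screening). The following minimal sketch enumerates them; the step names and the `workflow` and `all_workflows` functions are hypothetical labels introduced here and are not part of the described methods.

```python
from itertools import product
from typing import List


def workflow(normalize: bool, biopan: bool) -> List[str]:
    """Return the ordered steps for one combination of the optional stages."""
    steps = []
    if normalize:
        steps.append("normalize_target_dna")       # optional, before library generation
    steps.append("generate_expression_library")    # always performed
    if biopan:
        steps.append("biopan_library")             # optional, after library generation
    steps.append("screen_library")                 # always performed last
    return steps


def all_workflows() -> List[List[str]]:
    """Enumerate every on/off combination of the two optional stages."""
    return [workflow(n, b) for n, b in product([False, True], repeat=2)]


if __name__ == "__main__":
    for steps in all_workflows():
        print(" -> ".join(steps))
```

Two independent yes/no choices give the four combinations listed in the text.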

[00169] The library may, for example, be screened for a specified enzyme activity. For example, the enzyme activity screened for may be one or more of the six IUB classes; oxidoreductases, transferases, hydrolases, lyases, isomerases and ligases. The recombinant enzymes which are determined to be positive for one or more of the IUB classes may then be rescreened for a more specific enzyme activity.

[00170] Alternatively, the library may be screened for a more specialized protein, e.g. enzyme, activity. For example, instead of generically screening for hydrolase activity, the library may be screened for a more specialized activity, i.e. the type of bond on which the hydrolase acts. Thus, for example, the library may be screened to ascertain those hydrolases which act on one or more specified chemical functionalities, such as: (a) amide (peptide bonds), i.e. proteases; (b) ester bonds, i.e. esterases and lipases; (c) acetals, i.e., glycosidases etc.

[00171] As described with respect to one of the above aspects, the invention provides a process for activity screening of clones containing trace amounts of DNA derived from a mixed population of organisms or more than one organism.

[00172] This process includes biopanning polynucleotides from a mixed population of organisms by separating the clones or polynucleotides positive for a sequence of interest with a fluorescent analyzer that detects fluorescence, to select polynucleotides or clones containing polynucleotides positive for a sequence of interest, and screening the selected clones or polynucleotides for a specified bioactivity. In one aspect, the polynucleotides are contained in clones having been prepared by recovering trace amounts of DNA of a plurality of microorganisms, which DNA is selected by hybridization to at least one DNA sequence which is all or a portion of a DNA sequence encoding a bioactivity having a desirable activity.

[00173] In another aspect, a DNA library derived from a plurality of microorganisms is subjected to a selection procedure to select therefrom DNA which hybridizes to one or more probe DNA sequences which is all or a portion of a DNA sequence encoding an activity having a desirable activity by contacting a DNA library with a fluorescent labeled DNA probe under conditions permissive of hybridization so as to produce a double-stranded complex of probe and members of the DNA library.

[00174] The present invention offers the ability to screen for many types of bioactivities. For instance, the ability to select and combine desired components from a library of polyketides and postpolyketide biosynthesis genes for generation of novel polyketides for study is appealing. The method(s) of the present invention make possible, and facilitate, the cloning of novel polyketide synthase genes and/or gene pathways, and other relevant pathways or genes encoding commercially relevant secondary metabolites, since one can generate gene banks with clones containing large inserts (especially when using vectors which can accept large inserts, such as the F-factor based vectors), which facilitates cloning of gene clusters.

[00175] The biopanning approach described above may be used to create libraries enriched with clones carrying sequences substantially homologous to a given probe sequence. Using this approach libraries containing clones with inserts of up to 40 kbp or larger can be enriched approximately 1,000 fold after each round of panning. This enables one to reduce the number of clones to be screened after 1 round of biopanning enrichment. This approach can be applied to create libraries enriched for clones carrying sequence of interest related to a bioactivity of interest, for example, polyketide sequences.
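As a worked illustration of the enrichment figure given above, the following minimal sketch computes how the fraction of positive clones changes with each round of panning. It assumes an idealized, constant 1,000-fold enrichment per round with the fraction capped at 1.0; the function names and the starting frequency of one positive clone per million are hypothetical values chosen only for the example.

```python
def positive_fraction(initial_fraction: float, rounds: int, fold: float = 1000.0) -> float:
    """Fraction of positive clones after `rounds` of panning, capped at 1.0."""
    if not 0.0 < initial_fraction <= 1.0:
        raise ValueError("initial_fraction must be in (0, 1]")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return min(1.0, initial_fraction * fold ** rounds)


def clones_per_positive(initial_fraction: float, rounds: int, fold: float = 1000.0) -> float:
    """Expected number of clones that must be examined to find one positive."""
    return 1.0 / positive_fraction(initial_fraction, rounds, fold)


if __name__ == "__main__":
    start = 1e-6  # hypothetical: one positive clone per million
    for r in range(3):
        print(r, positive_fraction(start, r), clones_per_positive(start, r))
```

Under these assumptions, a single round reduces the expected screening burden from about one million clones per positive to about one thousand.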

[00176] Hybridization screening using high density filters or biopanning has proven an efficient approach to detect homologues of pathways containing genes of interest to discover novel bioactive molecules that may have no known counterparts. Once a polynucleotide of interest is enriched in a library of clones it may be desirable to screen for an activity. For example, it may be desirable to screen for the expression of small molecule ring structures or "backbones". Because the genes encoding these polycyclic structures can often be expressed in E. coli, the small molecule backbone can be manufactured, even if in an inactive form. Bioactivity is conferred upon transferring the molecule or pathway to an appropriate host that expresses the requisite glycosylation and methylation genes that can modify or "decorate" the structure to its active form. Thus, even inactive ring compounds recombinantly expressed in E. coli can be detected to identify clones, which are then shuttled to a metabolically rich host, such as Streptomyces (e.g., Streptomyces diversae or venezuelae), for subsequent production of the bioactive molecule. It should be understood that E. coli can produce active small molecules, and in certain instances it may be desirable, but is not required, to shuttle clones to a metabolically rich host for "decoration" of the structure. The use of high throughput robotic systems allows the screening of hundreds of thousands of clones in multiplexed arrays in microtiter dishes.

[00177] One approach to detect and enrich for clones carrying these structures is to use FACS screening, a procedure described and exemplified in U.S. Ser. No. 08/876,276, filed June 16, 1997. Polycyclic ring compounds typically have characteristic fluorescent spectra when excited by ultraviolet light. Thus, clones expressing these structures can be distinguished from background using a sufficiently sensitive detection method. High throughput FACS screening can be utilized to screen for small molecule backbones in, for example, E. coli libraries. Commercially available FACS machines are capable of screening up to 100,000 clones per second for UV active molecules. These clones can be sorted for further FACS screening or the resident plasmids can be extracted and shuttled to Streptomyces for activity screening.
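The throughput figure above implies a simple lower bound on instrument time. The following minimal sketch computes it, assuming the stated maximum of 100,000 clones per second is sustained throughout the run; the function name and the example library size are hypothetical.

```python
def sort_time_seconds(n_clones: int, clones_per_second: float = 100_000.0) -> float:
    """Minimum time needed to pass n_clones through the sorter at a constant rate."""
    if clones_per_second <= 0:
        raise ValueError("clones_per_second must be positive")
    if n_clones < 0:
        raise ValueError("n_clones must be non-negative")
    return n_clones / clones_per_second


if __name__ == "__main__":
    seconds = sort_time_seconds(1_000_000_000)  # hypothetical library of 10^9 clones
    print(f"{seconds:.0f} s, about {seconds / 3600:.1f} h")
```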

[00178] In another aspect, a bioactivity or biomolecule or compound is detected by using various electromagnetic detection devices, including, for example, optical, magnetic and thermal detection associated with a flow cytometer. Flow cytometers typically use an optical method of detection (fluorescence, scatter, and the like) to discriminate individual cells or particles from within a large population. There are several non-optical technologies that could be used alone or in conjunction with the optical methods to enable new discrimination/screening paradigms.

[00179] Magnetic field sensing is one such technique that can be used as an alternative to, or in conjunction with, for example, fluorescence based methods. Hall-Effect Sensors are one example of sensors that can be employed. Superconducting Quantum Interference Devices ("SQUIDS") are the most sensitive sensors for magnetic flux and magnetic fields so far developed. A standard criterion for the sensitivity of a SQUID is its energy resolution. This is defined as the smallest change in energy that the SQUID can detect in one second (or in a bandwidth of 1 Hz). Typical values are 10^-33 J/Hz. The utility of SQUIDS can be found in the presence of magnetosomes in certain types of bacteria that contain chains of permanent single magnetic domain particles of magnetite (Fe₃O₄) or greigite (Fe₃S₄). The magnetic field (or residual magnetic field) of a cell that contains a magnetosome is detected by positioning a SQUID in close proximity to the flow stream of a flow cytometer. Using this method, cells or cells containing, for example, magnetic probes can be isolated based on their magnetic properties. As another example, changes in the synthetic pathway of magnetosome containing bacteria can be measured using a similar technique. Such techniques can be used to identify agents which modulate the synthetic pathway of magnetosomes.

[00180] Measuring dynamic charge properties is another technique that can be used as an alternative to, or in conjunction with, for example, fluorescence based methods. Multipole Coupling Spectroscopy ("MCS") directly measures the dynamic charge properties of systems without the need for labeling. Structural changes that occur when molecules interact result in representative changes in charge distribution, and these produce a dielectric based spectra or "signature" that reveals the affinity, specificity and functionality of each interaction. Similar changes in charge distribution occur in cellular systems. By observing the changes in these signatures, the dynamics of molecular pathways and cellular function can be resolved in their native conditions. MCS utilizes a small microwave (500 MHz to 50 GHz) transceiver that could be positioned in close proximity to the flow stream of a flow cytometer. Because of the short measurement times (e.g., microseconds) required, a complete MCS signature for each cell within the stream of a flow cytometer can be generated and analyzed. Certain cells can then be sorted and/or isolated based on either spectral features that are known a priori or based on some statistical variation from a general population. Examples of uses for this technique include selection of expression mutants, small molecule pre-screening, and the like.
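Sorting "based on some statistical variation from a general population" could, for example, take the form of a simple outlier gate. The following minimal sketch is one such possibility and is not drawn from the text: it assumes each cell's signature has already been reduced to a single scalar value, and it flags cells whose value lies more than a chosen number of sample standard deviations from the population mean. The function name, the scalar reduction, and the cutoff of 3.0 are all assumptions.

```python
from statistics import mean, stdev
from typing import List, Sequence


def outlier_gate(signals: Sequence[float], z_cutoff: float = 3.0) -> List[int]:
    """Return indices of cells whose signal deviates from the population mean
    by more than z_cutoff sample standard deviations."""
    if len(signals) < 2:
        raise ValueError("need at least two measurements")
    mu = mean(signals)
    sigma = stdev(signals)
    if sigma == 0:
        return []  # a perfectly uniform population has no outliers
    return [i for i, s in enumerate(signals) if abs(s - mu) / sigma > z_cutoff]


if __name__ == "__main__":
    population = [1.0] * 20 + [10.0]  # hypothetical scalar signatures
    print(outlier_gate(population))
```

A real signature would be a spectrum rather than a scalar, so a practical gate would likely operate on several spectral features at once.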

[00181] In one screening approach, biomolecules from candidate clones can be tested for bioactivity by susceptibility screening against test organisms such as Staphylococcus aureus, Micrococcus luteus, E. coli, or Saccharomyces cerevisiae. FACS screening can be used in this approach by co-encapsulating clones with the test organism.

[00182] An alternative to the above-mentioned screening methods provided by the present invention is an approach termed "mixed extract" screening. The "mixed extract" screening approach takes advantage of the fact that the accessory genes needed to confer activity upon the polycyclic backbones are expressed in metabolically rich hosts, such as Streptomyces, and that the enzymes can be extracted and combined with the backbones extracted from E. coli clones to produce the bioactive compound in vitro. Enzyme extract preparations from metabolically rich hosts, such as Streptomyces strains, at various growth stages are combined with pools of organic extracts from E. coli libraries and then evaluated for bioactivity. Another approach to detect activity in the E. coli clones is to screen for genes that can convert bioactive compounds to different forms. For example, a recombinant enzyme was recently discovered that can convert the low value daunomycin to the higher value doxorubicin. Similar enzyme pathways are being sought to convert penicillins to cephalosporins.

[00183] Screening may be carried out to detect a specified enzyme activity by procedures known in the art. For example, enzyme activity may be screened for one or more of the six IUB classes; oxidoreductases, transferases, hydrolases, lyases, isomerases and ligases. The recombinant enzymes which are determined to be positive for one or more of the IUB classes may then be rescreened for a more specific enzyme activity. Alternatively, the library may be screened for a more specialized enzyme activity. For example, instead of generically screening for hydrolase activity, the library may be screened for a more specialized activity, i.e. the type of bond on which the hydrolase acts. Thus, for example, the library may be screened to ascertain those hydrolases which act on one or more specified chemical functionalities, such as: (a) amide (peptide bonds), i.e. proteases; (b) ester bonds, i.e. esterases and lipases; (c) acetals, i.e., glycosidases.
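The two-tier screening described above (a first pass by IUB class, followed by a more specific rescreen) can be represented as a small lookup structure. The following minimal sketch is illustrative only. The class names and bond types are taken from the text, while the `rescreen_plan` function and the dictionary layout are hypothetical. Only the hydrolase sub-screens are populated, because those are the only ones the text enumerates.

```python
from typing import Dict, Iterable, List

# The six IUB enzyme classes named in the text.
IUB_CLASSES = (
    "oxidoreductase", "transferase", "hydrolase",
    "lyase", "isomerase", "ligase",
)

# Bond type acted upon -> enzyme families, as listed in the text for hydrolases.
HYDROLASE_SUBSCREENS: Dict[str, List[str]] = {
    "amide (peptide) bond": ["protease"],
    "ester bond": ["esterase", "lipase"],
    "acetal": ["glycosidase"],
}


def rescreen_plan(positive_classes: Iterable[str]) -> Dict[str, List[str]]:
    """Map each first-pass positive class to its follow-up bond-type screens."""
    plan: Dict[str, List[str]] = {}
    for cls in positive_classes:
        if cls not in IUB_CLASSES:
            raise ValueError(f"unknown IUB class: {cls}")
        # Only hydrolase follow-ups are enumerated in the text; others stay empty.
        plan[cls] = sorted(HYDROLASE_SUBSCREENS) if cls == "hydrolase" else []
    return plan


if __name__ == "__main__":
    print(rescreen_plan(["hydrolase", "ligase"]))
```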

[00184] FACS screening can also be used to detect expression of UV fluorescent molecules in any host, including metabolically rich hosts, such as Streptomyces. For example, recombinant oxytetracycline retains its diagnostic red fluorescence when produced heterologously in S. lividans TK24. Pathway clones, which can be sorted by FACS, can thus be screened for polycyclic molecules in a high throughput fashion.

[00185] Recombinant bioactive compounds can also be screened in vivo using "two-hybrid" systems, which can detect enhancers and inhibitors of protein-protein or other interactions such as those between transcription factors and their activators, or receptors and their cognate targets. In this aspect, both the small molecule pathway and the reporter construct are co-expressed. Clones altered in reporter expression can then be sorted by FACS and the pathway clone isolated for characterization.

[00186] As indicated, common approaches to drug discovery involve screening assays in which disease targets (macromolecules implicated in causing a disease) are exposed to potential drug candidates which are tested for therapeutic activity. In other approaches, whole cells or organisms that are representative of the causative agent of the disease, such as bacteria or tumor cell lines, are exposed to the potential candidates for screening purposes. Any of these approaches can be employed with the present invention.

[00187] The present invention also allows for the transfer of cloned pathways derived from uncultivated samples into metabolically rich hosts for heterologous expression and downstream screening for bioactive compounds of interest using a variety of screening approaches briefly described above.

Recovering Desirable Bioactivities

[00188] In one aspect, after viable or non-viable cells, each containing a different expression clone from the gene library, are screened, and positive clones are recovered, DNA can be isolated from positive clones utilizing techniques well known in the art. The DNA can then be amplified either in vivo or in vitro by utilizing any of the various amplification techniques known in the art. In vivo amplification would include transformation of the clone(s) or subclone(s) into a viable host, followed by growth of the host. In vitro amplification can be performed using techniques such as the polymerase chain reaction. Once amplified the identified sequences can be "evolved" or sequenced.

Evolution

[00189] In one aspect, the present invention manipulates the identified polynucleotides to generate and select for encoded variants with altered activity or specificity. Clones found to have the bioactivity for which the screen was performed can be subjected to directed mutagenesis to develop new bioactivities with desired properties or to develop modified bioactivities with particularly desired properties that are absent or less pronounced in the wild-type activity, such as stability to heat or organic solvents. Any of the known techniques for directed mutagenesis are applicable to the invention. For example, mutagenesis techniques for use in accordance with the invention include those described below.

[00190] Alternatively, it may be desirable to variegate a polynucleotide sequence obtained, identified or cloned as described herein. Such variegation can modify the polynucleotide sequence in order to modify (e.g., increase or decrease) the encoded polypeptide's activity, specificity, affinity, function, etc. Such evolution methods are known in the art or described herein, such as, shuffling, cassette mutagenesis, recursive ensemble mutagenesis, sexual PCR, directed evolution, exonuclease-mediated reassembly, codon site-saturation mutagenesis, amino acid site-saturation mutagenesis, gene site saturation mutagenesis, introduction of mutations by non-stochastic polynucleotide reassembly methods, synthetic ligation polynucleotide reassembly, gene reassembly, oligonucleotide-directed saturation mutagenesis, in vivo reassortment of polynucleotide sequences having partial homology, naturally occurring recombination processes which reduce sequence complexity, and any combination thereof.

[00191] The clones enriched for a desired polynucleotide sequence, which are identified as described above, may be sequenced to identify the DNA sequence(s) present in the clone, which sequence information can be used to screen a database for similar sequences or functional characteristics. Thus, in accordance with the present invention it is possible to: (i) isolate and identify DNA having a sequence of interest (e.g., a sequence encoding an enzyme having a specified enzyme activity), (ii) associate the sequence with known or unknown sequence in a database (e.g., database sequence associated with an enzyme having an activity (including the amino acid sequence thereof)), and (iii) produce recombinant enzymes having such activity.

[00192] Sequencing may be performed by high through-put sequencing techniques. The exact method of sequencing is not a limiting factor of the invention. Any method useful in identifying the sequence of a particular cloned DNA sequence can be used. For example, genome sequencing can be determined by a random whole-genome shotgun method as described by Nelson. (See for example, Nelson et al., Nature 399, 323 (1999), hereby incorporated by reference in its entirety.) In general, sequencing is an adaptation of the natural process of DNA replication. Therefore, a template (e.g., the vector) and primer sequences are used. One general template preparation and sequencing protocol begins with automated picking of bacterial colonies, each of which contains a separate DNA clone which will function as a template for the sequencing reaction. The selected clones are placed into media, and grown overnight. The DNA templates are then purified from the cells and suspended in water. After DNA quantification, high-throughput sequencing is performed using a sequencer, such as Applied Biosystems, Inc., Prism 377 DNA Sequencers. The resulting sequence data can then be used in additional methods, including searching a database or databases.

Database Searches and Alignment Algorithms

[00193] A number of source databases are available that contain either a nucleic acid sequence and/or a deduced amino acid sequence for use with the invention in identifying or determining the activity encoded by a particular polynucleotide sequence. All or a representative portion of the sequences (e.g., about 100 individual clones) to be tested are used to search a sequence database (e.g., GenBank, PFAM or ProDom), either simultaneously or individually. A number of different methods of performing such sequence searches are known in the art. The databases can be specific for a particular organism or a collection of organisms. For example, there are databases for C. elegans, Arabidopsis sp., M. genitalium, M. jannaschii, E. coli, H. influenzae, S. cerevisiae and others. The sequence data of the clone is then aligned to the sequences in the database or databases using algorithms designed to measure homology between two or more sequences.

[00194] Such sequence alignment methods include, for example, BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), and FASTA (Pearson & Lipman, 1988). The probe sequence (e.g., the sequence data from the clone) can be any length, and will be recognized as homologous based upon a threshold homology value. The threshold value may be predetermined, although this is not required. The threshold value can be based upon the particular polynucleotide length. To align sequences a number of different procedures can be used. Typically, Smith-Waterman or Needleman-Wunsch algorithms are used. However, as discussed, faster procedures such as BLAST, FASTA, and PSI-BLAST can be used.

[00195] For example, optimal alignment of sequences for aligning a comparison window may be conducted by the local homology algorithm of Smith (Smith and Waterman, Adv Appl Math, 1981; Smith and Waterman, J Theor Biol, 1981; Smith and Waterman, J Mol Biol, 1981; Smith et al, J Mol Evol, 1981), by the homology alignment algorithm of Needleman (Needleman and Wunsch, 1970), by the search of similarity method of Pearson (Pearson and Lipman, 1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575 Science Dr., Madison, WI, or the Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin, Madison, WI), or by inspection, and the best alignment (i.e., resulting in the highest percentage of homology over the comparison window) generated by the various methods is selected. The similarity of the two sequences (i.e., the probe sequence and the database sequence) can then be predicted.
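For orientation, the local homology approach of Smith and Waterman can be sketched as a dynamic-programming recurrence in which each cell holds the best score of any local alignment ending at that pair of positions, floored at zero. The following minimal example computes only the best local score and does not recover the alignment itself. The scoring values (match +2, mismatch -1, linear gap -2) and the function name are illustrative assumptions, not values taken from the text or from any particular software package.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b under a linear gap penalty."""
    cols = len(b) + 1
    prev = [0] * cols          # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols      # current row; column 0 stays 0
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(0, diag, up, left)  # floor at 0 makes the alignment local
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGTCC"))
```

Only two rows of the matrix are kept in memory because the score alone is needed; recovering the aligned residues would require the full matrix and a traceback.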

[00196] Such software matches similar sequences by assigning degrees of homology to various deletions, substitutions and other modifications. The terms "homology" and "identity" in the context of two or more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same when compared and aligned for maximum correspondence over a comparison window or designated region as measured using any number of sequence comparison algorithms or by manual alignment and visual inspection.

[00197] For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.

[00198] A "comparison window", as used herein, includes reference to a segment of any one of the number of contiguous positions selected from the group consisting of from 20 to 600, usually about 50 to about 200, more usually about 100 to about 150 in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned.

[00199] Examples of algorithms used in the methods of the invention are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al., J. Mol. Biol. 215:403-410 (1990) and Altschul et al., Nuc. Acids Res. 25:3389-3402 (1997), respectively. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=-4 and a comparison of both strands.
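The seed-and-extend idea described above can be illustrated with a simplified sketch. It is not the BLAST implementation. It assumes exact word matches only (no neighborhood threshold T), ungapped extension, and that X acts as a drop-off limit, so that extension stops once the running score falls more than X below the best score seen. The text names X without defining it, so this reading of X and its value of 20 are assumptions, as are the function names. The defaults W=11, M=5 and N=-4 are those stated for BLASTN.

```python
from typing import Dict, List, Tuple


def word_seeds(query: str, subject: str, w: int = 11) -> List[Tuple[int, int]]:
    """Return (query_pos, subject_pos) for every exact shared word of length w."""
    index: Dict[str, List[int]] = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def _extend(query: str, subject: str, i: int, j: int, step: int,
            m: int, n: int, x: int) -> int:
    """Best score gained by ungapped extension from (i, j) in direction step."""
    score = best = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += m if query[i] == subject[j] else n
        if score > best:
            best = score
        elif best - score > x:   # assumed X drop-off rule
            break
        i += step
        j += step
    return best


def hsp_score(query: str, subject: str, i: int, j: int, w: int = 11,
              m: int = 5, n: int = -4, x: int = 20) -> int:
    """Score of the segment pair grown in both directions from the seed at (i, j)."""
    left = _extend(query, subject, i - 1, j - 1, -1, m, n, x)
    right = _extend(query, subject, i + w, j + w, +1, m, n, x)
    return w * m + left + right


if __name__ == "__main__":
    q = "ACGTTGCAAGCTTAGGCTA"
    s = "ACGTTGCAAGCTTAGTCTA"  # one mismatch relative to q
    for qi, sj in word_seeds(q, s):
        print(qi, sj, hsp_score(q, s, qi, sj))
```

A smaller X ends extension sooner and therefore runs faster, at the cost of missing segment pairs that recover after a short run of mismatches, which is one way the parameters trade sensitivity against speed.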

[00200] The BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Natl. Acad. Sci. USA 90:5873 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001.
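The three probability thresholds above can be applied as a simple tiered classification. The following minimal sketch treats the "about" cutoffs as exact values for the sake of the example; the function name and the tier labels are hypothetical.

```python
def similarity_tier(p_n: float) -> str:
    """Classify a smallest sum probability P(N) against the stated cutoffs."""
    if not 0.0 <= p_n <= 1.0:
        raise ValueError("P(N) must lie between 0 and 1")
    if p_n < 0.001:
        return "most preferred"
    if p_n < 0.01:
        return "more preferred"
    if p_n < 0.2:
        return "similar"
    return "not similar"


if __name__ == "__main__":
    for p in (0.5, 0.1, 0.005, 0.0001):
        print(p, similarity_tier(p))
```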

[00201] Sequence homology means that two polynucleotide sequences are homologous (i.e., on a nucleotide-by-nucleotide basis) over the window of comparison. A percentage of sequence identity or homology is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, U, or I) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence homology. This substantial homology denotes a characteristic of a polynucleotide sequence, wherein the polynucleotide comprises a sequence having at least 60 percent sequence homology, typically at least 70 percent homology, often 80 to 90 percent sequence homology, and most commonly at least 99 percent sequence homology as compared to a reference sequence of a comparison window of at least 25-50 nucleotides, wherein the percentage of sequence homology is calculated by comparing the reference sequence to the polynucleotide sequence which may include deletions or additions which total 20 percent or less of the reference sequence over the window of comparison.
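The percentage calculation described above (matched positions divided by the window size, multiplied by 100) can be written out directly. The following minimal sketch assumes the two sequences have already been optimally aligned and padded to equal length, with gaps represented by a hyphen; the function name and the gap symbol are assumptions introduced for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over a comparison window of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # A position counts as matched only if both sequences carry the same base.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

Positions containing a gap in either sequence remain in the denominator, so deletions or additions lower the percentage, consistent with dividing by the window size.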

[00202] Sequences having sufficient homology can then be further identified by any annotations contained in the database, including, for example, species and activity information. Accordingly, in a typical mixed population sample, a plurality of nucleic acid sequences will be obtained, cloned, sequenced and corresponding homologous sequences from a database identified. This information provides a profile of the polynucleotides present in the sample, including one or more features associated with the polynucleotide including the organism and activity associated with that sequence or any polypeptide encoded by that sequence based on the database information. As used herein "fingerprint" or "profile" refers to the fact that each sample will have associated with it a set of polynucleotides characteristic of the sample and the environment from which it was derived. Such a profile can include the amount and type of sequences present in the sample, as well as information regarding the potential activities encoded by the polynucleotides and the organisms from which polynucleotides were derived. This unique pattern is each sample's profile or fingerprint.
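A sample profile of the kind described above can be represented as counts of the database annotations attached to each homologous hit. The following minimal sketch assumes each sequenced clone has already been reduced to an (organism, activity) annotation pair; the function name, the dictionary layout, and the example annotations are hypothetical and do not represent real data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def sample_profile(hits: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Summarize annotated hits into counts per organism and per activity."""
    hits = list(hits)
    return {
        "n_sequences": len(hits),
        "organisms": Counter(org for org, _ in hits),
        "activities": Counter(act for _, act in hits),
    }


if __name__ == "__main__":
    # Hypothetical annotations, for illustration only.
    example = [
        ("organism A", "hydrolase"),
        ("organism A", "ligase"),
        ("organism B", "hydrolase"),
    ]
    print(sample_profile(example))
```

Two samples can then be compared by comparing their counters, which is one simple way to treat the profile as a fingerprint of the sample.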

[00203] In some instances it may be desirable to express a particular cloned polynucleotide sequence once its identity or activity is determined or a demonstrated identity or activity is associated with the polynucleotide. In such instances the desired clone, if not already cloned into an expression vector, is ligated downstream of a regulatory control element (e.g., a promoter or enhancer) and cloned into a suitable host cell. Expression vectors are commercially available along with corresponding host cells for use in the invention.

[00204] As representative examples of expression vectors which may be used there may be mentioned viral particles, baculovirus, phage, plasmids, phagemids, cosmids, fosmids, bacterial artificial chromosomes, viral nucleic acid (e.g., vaccinia, adenovirus, fowl pox virus, pseudorabies and derivatives of SV40), P1-based artificial chromosomes, yeast plasmids, yeast artificial chromosomes, and any other vectors specific for specific hosts of interest (such as Bacillus, Aspergillus, yeast, etc.). Thus, for example, the DNA may be included in any one of a variety of expression vectors for expressing a polypeptide. Such vectors include chromosomal, nonchromosomal and synthetic DNA sequences. Large numbers of suitable vectors are known to those of skill in the art, and are commercially available. The following vectors are provided by way of example: ZAP Express, Lambda ZAP®-CMV, Lambda ZAP® II, Lambda gt10, Lambda gt11, pMyr, pSos, pCMV-Script, pCMV-Script XR, pBK Phagemid, pBK-CMV, pBK-RSV, pBluescript II Phagemid, pBluescript II KS +, pBluescript II SK +, pBluescript II SK -, Lambda FIX II, Lambda DASH II, Lambda EMBL3 and EMBL4, SuperCos I and pWE15, pPCR-Script Amp, pPCR-Script Cam, pCMV-Script, pBC KS +, pBC KS -, pBC SK +, pBC SK -, psiX174, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene); PT7BLUE, pSTBlue, pCITE, pET, ptriEx, pForce (Novagen); pIND-E, pIND Vector, pIND/Hygro, pIND(SP1)/Hygro, pIND/GFP, pIND(SP1)/GFP, pIND/V5-His and pIND(SP1)/V5-His Tag, pIND TOPO TA, pShooter™ Targeting Vectors, pTracer™ GFP Reporter Vectors, pcDNA© Vector Collection, EBV Vectors, Voyager™ VP22 Vectors, pVAX1 - DNA vaccine vector, pcDNA4/His-Max, pBC1 Mouse Milk System (Invitrogen); pQE70, pQE60, pQE-9, pQE-16, pQE-30/pQE-80, pQE-31/pQE-81, pQE-32/pQE-82, pQE-40, pQE-100 Double Tag (Qiagen); pTRC99a, pKK223-3, pKK233-3, pDR540, pRIT5, pWLNEO, pSV2CAT, pOG44, pXT1, pSG (Stratagene), pSVK3, pBPV, pMSG, pSVL (Pharmacia). However, any other plasmid or vector may be used as long as it is replicable and viable in the host.

[00205] The nucleic acid sequence in the expression vector is operatively linked to an appropriate expression control sequence(s) (promoter) to direct mRNA synthesis. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL, SP6, trp, lacUV5, PBAD, araBAD, araB, trc, proV, p-D-HSP, HSP, GAL4 UAS/E1b, TK, GAL1, CMV/TetO₂ Hybrid, EF-1α CMV, EF, EF-1α, ubiquitin C, rsv-ltr, rsv, β-lactamase, nmt1, and gal10. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art. The expression vector also contains a ribosome binding site for translation initiation and a transcription terminator. The vector may also include appropriate sequences for amplifying expression. Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable markers.

[00206] In addition, the expression vectors can contain one or more selectable marker genes to provide a phenotypic trait for selection of transformed host cells such as dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as tetracycline or ampicillin resistance in E. coli.

[00207] The nucleic acid sequence(s) selected, cloned and sequenced as hereinabove described can additionally be introduced into a suitable host to prepare a library which is screened for the desired enzyme activity. The selected nucleic acid is preferably already in a vector which includes appropriate control sequences whereby a selected nucleic acid encoding an enzyme may be expressed, for detection of the desired activity. The host cell can be a higher eukaryotic cell, such as a mammalian cell, or a lower eukaryotic cell, such as a yeast cell, or the host cell can be a prokaryotic cell, such as a bacterial cell. The selection of an appropriate host is deemed to be within the scope of those skilled in the art from the teachings herein.

[00208] In some instances it may be desirable to perform an amplification of the nucleic acid sequence present in a sample or a particular clone that has been isolated. In this aspect the nucleic acid sequence is amplified by PCR, multiple displacement amplification, or a similar reaction known to those of skill in the art. Kits are commercially available to carry out such amplification reactions.

[00209] In addition, it is important to recognize that the alignment algorithms and searchable database can be implemented in computer hardware, software or a combination thereof. Accordingly, the isolation, processing and identification of nucleic acid sequences and the corresponding polypeptides encoded by those sequences can be implemented in an automated system.

[00210] Naked Biopanning involves the direct screening or enrichment for a gene or gene cluster from environmental genomic DNA. The enrichment for or isolation of the desired genomic DNA is performed prior to any cloning, gene-specific PCR or any other procedure that may introduce unwanted bias affecting downstream processing and applications due to toxicity or other issues. Several methodologies can be described for this type of sequence based discovery. These generally include the use of nucleic acid probe(s) that is(are) partially or completely homologous to the target sequence in conjunction with the binding of the probe-target complex to a solid phase support. The probe(s) may be polynucleotide or modified nucleic acid, such as peptide nucleic acid (PNA) and may be used with other facilitating elements such as proteins or additional nucleic acids in the capture of target DNA. An amplification step which does not introduce sequence bias may be used to ensure adequate yield for downstream applications.

[00211] An example of a Naked Biopanning approach can be found in the use of RecA protein and a complement-stabilized D-loop (csD-loop) structure (Jayasena & Johnston, 1993; Sena and Zarling, 1993) to target genomic DNA of interest. It does not involve complete denaturation of the target DNA and therefore is of particular interest when one is attempting to capture large genomic fragments. The following method incorporates the ClonCapture™ cDNA selection procedure (CLONTECH Laboratories, Inc.), with some modification, to take advantage of csD-loop formation, a stable structure which may be used to capture genomic DNA containing an internal target sequence.

[00212] Environmental genomic DNA is cleaved into fragments (fragment size depends upon type of target and desired downstream insert size if making a pre-enriched library) using mechanical shearing or restriction digest. Fragments are size selected according to desired length and purified. A biotinylated dsDNA probe is produced, based upon existing knowledge of conserved regions within the target, by PCR from a positive clone or by synthetic means. The probe can be internally labeled (e.g., by incorporation of biotin-21-dCTP) or end labeled with biotin. It must be purified to remove any unincorporated biotin. The probe is heat denatured (5 min. at 95°C) and placed immediately on ice. The denatured probe is then reacted with RecA and an ATP mix containing ATP and a nonhydrolyzable analog (15 min. at 37°C). The target DNA is added and incubated with the RecA/biotinylated probe nucleofilaments to form the csD-loop structure (20 min. at 37°C). The RecA is then removed by treatment with proteinase K and SDS. After inactivating the proteinase K with PMSF, washed and blocked (with sonicated salmon sperm DNA) streptavidin paramagnetic beads are transferred to the reaction and incubated to bind the csD-loop complex to the support (rotate 30 min. at room temp.). The unbound DNA is removed and may be saved for use as target for a different probe. The beads are thoroughly washed and the enriched population is eluted using an alkaline buffer and transferred off. The enriched DNA is then ethanol precipitated and is ready for ligation and pre-enriched library preparation.

[00213] Other stable complexes may be used instead of the RecA/csD-loop structure for the capture of genomic DNA. For instance, PNAs may be used, either as "openers" to allow insertion of a probe into dsDNA (Bukanov et al., 1998), or as tandem probes themselves (Lohse et al., 1999). In the first case, PNAs bind to two short tracts of homopurines that are in close proximity to each other. They form P-loop structures, which displace the unbound strand and make it available for binding by a probe, which can then be used to capture the target using an affinity capture method involving a solid phase. Likewise, PNAs may be used in a "double-duplex invasion" to form a stable complex and allow target recovery.

[00214] Simpler methods may be used in the retrieval of targets from environmental genomic DNA that involve complete denaturation of the DNA fragments. After cutting genomic DNA into fragments of the desired length via mechanical shearing or through the use of restriction enzymes, the target DNA may be bound to a solid phase using a direct hybridization affinity capture scheme. A nucleic acid probe is covalently bound to a solid phase such as a glass slide, paramagnetic bead, or any type of matrix in a column, and the denatured target DNA is allowed to hybridize to it. The unbound fraction may be collected and re-hybridized to the same probe to ensure a more complete recovery, or to a host of different probes, as a part of a cascade scenario, where a population of environmental genomic DNA is subsequently panned for a number of different genes or gene clusters.

[00215] Linkers containing restriction sites and sites for common primers may be added to the ends of the genomic fragments using sticky-ended or blunt-ended ligations (depending upon the method used for cutting the genomic DNA). These enable one to amplify the size-selected inserted fragment population by PCR without significant sequence bias. Thus, after using any of the abovementioned techniques for isolation or enrichment, one may help to ensure adequate recovery for downstream processing. Furthermore, the recovered population is ready for cutting and ligation into a suitable vector as well as containing the priming sites for sequencing at any time.

[00216] A variation of the above scheme involves including a tag from a combinatorial synthesis of polynucleotide tags (Brenner et al., 1999) within the linker that is attached onto the ends of the genomic fragments. This allows each fragment within the starting population to have its own unique tag. Therefore, when amplified with common primers, each of these uniquely tagged fragments give rise to a multitude of in vitro clones which are then bound to the paramagnetic bead containing millions of copies of the complementary, covalently bound anti-tag. A fluorescently labeled, target specific probe may be subsequently hybridized to the target-containing beads. The beads may be sorted using FACS, where the positives may be sequenced directly from the beads and the insert may be cut out and ligated into the desired vector for further processing. The negative population may be hybridized with other probes and resorted as part of the cascade scenario previously described.

[00217] Transposon technology may allow the insertion of environmental genomic DNA into a host genome through the use of transposomes (Goryshin & Reznikoff, 1998) to avoid bias resulting from expression of toxic genes. The host cells are then cultured to provide more copies of target DNA for discovery, isolation, and downstream processes.

[00218] Host cells may be genetically engineered (transduced or transformed or transfected) with the vectors. The engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying genes. The culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression, and will be apparent to the ordinarily skilled artisan.

[00219] The clones which are identified as having the specified protein, e.g. enzyme, activity may then be sequenced to identify the DNA sequence encoding a protein, e.g. enzyme, having the specified activity. Thus, in accordance with the present invention it is possible to isolate and identify: (i) DNA encoding a protein, e.g. enzyme, having a specified protein, e.g. enzyme, activity, (ii) proteins, e.g. enzymes, having such activity (including the amino acid sequence thereof) and (iii) produce recombinant proteins, e.g. enzymes, having such activity.

[00220] The present invention may be employed for example, to identify uncultured microorganisms with proteins, e.g. enzymes, having, for example, the following activities which may be employed for the following uses:

1. Lipase/Esterase
   a. Enantioselective hydrolysis of esters (lipids)/thioesters
      1) Resolution of racemic mixtures
      2) Synthesis of optically active acids or alcohols from meso-diesters
   b. Selective syntheses
      1) Regiospecific hydrolysis of carbohydrate esters
      2) Selective hydrolysis of cyclic secondary alcohols
   c. Synthesis of optically active esters, lactones, acids, alcohols
      1) Transesterification of activated/nonactivated esters
      2) Interesterification
      3) Optically active lactones from hydroxyesters
      4) Regio- and enantioselective ring opening of anhydrides
   d. Detergents
   e. Fat/Oil conversion
   f. Cheese ripening

2. Protease
   a. Ester/amide synthesis
   b. Peptide synthesis
   c. Resolution of racemic mixtures of amino acid esters
   d. Synthesis of non-natural amino acids
   e. Detergents/protein hydrolysis

3. Glycosidase/Glycosyl transferase
   a. Sugar/polymer synthesis
   b. Cleavage of glycosidic linkages to form mono-, di- and oligosaccharides
   c. Synthesis of complex oligosaccharides
   d. Glycoside synthesis using UDP-galactosyl transferase
   e. Transglycosylation of disaccharides, glycosyl fluorides, aryl galactosides
   f. Glycosyl transfer in oligosaccharide synthesis
   g. Diastereoselective cleavage of β-glucosylsulfoxides
   h. Asymmetric glycosylations
   i. Food processing
   j. Paper processing

4. Phosphatase/Kinase
   a. Synthesis/hydrolysis of phosphate esters
      1) Regio-, enantioselective phosphorylation
      2) Introduction of phosphate esters
      3) Synthesize phospholipid precursors
      4) Controlled polynucleotide synthesis
   b. Activate biological molecule
   c. Selective phosphate bond formation without protecting groups

5. Mono/Dioxygenase
   a. Direct oxyfunctionalization of unactivated organic substrates
   b. Hydroxylation of alkanes, aromatics, steroids
   c. Epoxidation of alkenes
   d. Enantioselective sulphoxidation
   e. Regio- and stereoselective Baeyer-Villiger oxidation

6. Haloperoxidase
   a. Oxidative addition of halide ion to nucleophilic sites
   b. Addition of hypohalous acids to olefinic bonds
   c. Ring cleavage of cyclopropanes
   d. Activated aromatic substrates converted to ortho and para derivatives
   e. 1,3-diketones converted to 2-halo-derivatives
   f. Heteroatom oxidation of sulfur and nitrogen containing substrates
   g. Oxidation of enol acetates, alkynes and activated aromatic rings

7. Lignin peroxidase/Diarylpropane peroxidase
   a. Oxidative cleavage of C-C bonds
   b. Oxidation of benzylic alcohols to aldehydes
   c. Hydroxylation of benzylic carbons
   d. Phenol dimerization
   e. Hydroxylation of double bonds to form diols
   f. Cleavage of lignin aldehydes

8. Epoxide hydrolase
   a. Synthesis of enantiomerically pure bioactive compounds
   b. Regio- and enantioselective hydrolysis of epoxides; aromatic and olefinic epoxidation by monooxygenases to form epoxides
   c. Resolution of racemic epoxides
   d. Hydrolysis of steroid epoxides

9. Nitrile hydratase/nitrilase
   a. Hydrolysis of aliphatic nitriles to carboxamides
   b. Hydrolysis of aromatic, heterocyclic, unsaturated aliphatic nitriles to corresponding acids
   c. Hydrolysis of acrylonitrile
   d. Production of aromatic and carboxamides, carboxylic acids (nicotinamide, picolinamide, isonicotinamide)
   e. Regioselective hydrolysis of acrylic dinitrile
   f. α-amino acids from α-hydroxynitriles

10. Transaminase
   a. Transfer of amino groups into oxo-acids

11. Amidase/Acylase
   a. Hydrolysis of amides, amidines, and other C-N bonds
   b. Non-natural amino acid resolution and synthesis

EXAMPLES

EXAMPLE 1: DNA Isolation and Library Construction

[00221] The following outlines the procedures used to generate a gene library from a mixed population of organisms.

[00222] DNA isolation. DNA is isolated using the IsoQuick Procedure as per manufacturer's instructions (Orca Research Inc., Bothell, WA). DNA can be normalized according to Example 2 below. Upon isolation the DNA is sheared by pushing and pulling the DNA through a 25G double-hub needle and a 1-cc syringe about 500 times. A small amount is run on a 0.8% agarose gel to make sure the majority of the DNA is in the desired size range (about 3-6 kb).

[00223] Blunt-ending DNA. The DNA is blunt-ended by mixing 45 ul of 10X Mung Bean Buffer, 2.0 ul Mung Bean Nuclease (150 u/ul) and water to a final volume of 405 ul. The mixture is incubated at 37°C for 15 minutes. The mixture is phenol/chloroform extracted followed by an additional chloroform extraction. One ml of ice cold ethanol is added to the final extract to precipitate the DNA. The DNA is precipitated for 10 minutes on ice. The DNA is removed by centrifugation in a microcentrifuge for 30 minutes. The pellet is washed with 1 ml of 70% ethanol and repelleted in the microcentrifuge. Following centrifugation the DNA is dried and gently resuspended in 26 ul of TE buffer.

[00224] Methylation of DNA. The DNA is methylated by mixing 4 ul of 10X EcoR I Methylase Buffer, 0.5 ul SAM (32 mM), 5.0 ul EcoR I Methylase (40 u/ul) and incubating at 37°C for 1 hour. In order to ensure blunt ends, add to the methylation reaction: 5.0 ul of 100 mM MgCl₂, 8.0 ul of dNTP mix (2.5 mM of each dGTP, dATP, dTTP, dCTP), 4.0 ul of Klenow (5 u/ul) and incubate at 12°C for 30 minutes.

[00225] After 30 minutes add 450 ul 1X STE. The mixture is phenol/chloroform extracted once followed by an additional chloroform extraction. One ml of ice cold ethanol is added to the final extract to precipitate the DNA. The DNA is precipitated for 10 minutes on ice. The DNA is removed by centrifugation in a microcentrifuge for 30 minutes. The pellet is washed with 1 ml of 70% ethanol, repelleted in the microcentrifuge and allowed to dry for 10 minutes.

[00226] Ligation. The DNA is ligated by gently resuspending the DNA in 8 ul EcoR I adaptors (from Stratagene's cDNA Synthesis Kit), 1.0 ul of 10X Ligation Buffer, 1.0 ul of 10 mM rATP, 1.0 ul of T4 DNA Ligase (4 Wu/ul) and incubating at 4°C for 2 days. The ligation reaction is terminated by heating for 30 minutes at 70°C.

[00227] Phosphorylation of adaptors. The adaptor ends are phosphorylated by mixing the ligation reaction with 1.0 ul of 10X Ligation Buffer, 2.0 ul of 10 mM rATP, 6.0 ul of H2O, 1.0 ul of polynucleotide kinase (PNK) and incubating at 37°C for 30 minutes. After 30 minutes 31 ul H2O and 5 ul 10X STE are added to the reaction and the sample is size fractionated on a Sephacryl S-500 spin column. The pooled fractions (1-3) are phenol/chloroform extracted once followed by an additional chloroform extraction. The DNA is precipitated by the addition of ice cold ethanol on ice for 10 minutes. The precipitate is pelleted by centrifugation in a microfuge at high speed for 30 minutes. The resulting pellet is washed with 1 ml 70% ethanol, repelleted by centrifugation and allowed to dry for 10 minutes. The sample is resuspended in 10.5 ul TE buffer. Do not plate. Instead, ligate directly to lambda arms as above except use 2.5 ul of DNA and no water.

[00228] Sucrose Gradient (2.2 ml) Size Fractionation. Stop ligation by heating the sample to 65°C for 10 minutes. Gently load sample on 2.2 ml sucrose gradient and centrifuge in mini-ultracentrifuge at 45K, 20°C for 4 hours (no brake). Collect fractions by puncturing the bottom of the gradient tube with a 20G needle and allowing the sucrose to flow through the needle. Collect the first 20 drops in a Falcon 2059 tube then collect 10 1-drop fractions (labeled 1-10). Each drop is about 60 ul in volume. Run 5 ul of each fraction on a 0.8% agarose gel to check the size. Pool fractions 1-4 (about 10-1.5 kb) and, in a separate tube, pool fractions 5-7 (about 5-0.5 kb). Add 1 ml ice cold ethanol to precipitate and place on ice for 10 minutes. Pellet the precipitate by centrifugation in a microfuge at high speed for 30 minutes. Wash the pellets by resuspending them in 1 ml 70% ethanol and repelleting them by centrifugation in a microfuge at high speed for 10 minutes and dry. Resuspend each pellet in 10 ul of TE buffer.

[00229] Test Ligation to Lambda Arms. Plate assay by spotting 0.5 ul of the sample on agarose containing ethidium bromide along with standards (DNA samples of known concentration) to get an approximate concentration. View the samples using UV light and estimate concentration compared to the standards. Fraction 1-4 = >1.0 ug/ul. Fraction 5-7 = 500 ng/ul.

[00230] Prepare the following ligation reactions (5 μl reactions) and incubate at 4°C overnight:

[00231] Test Package and Plate. Package the ligation reactions following manufacturer's protocol. Stop packaging reactions with 500 ul SM buffer and pool packaging that came from the same ligation. Titer 1.0 ul of each pooled reaction on appropriate host (OD₆₀₀ = 1.0) [XL1-Blue MRF']. Add 200 ul host (in 10 mM MgSO₄) to Falcon 2059 tubes, inoculate with 1 ul packaged phage and incubate at 37°C for 15 minutes. Add about 3 ml 48°C top agar [50 ml stock containing 150 ul IPTG (0.5M) and 300 ul X-GAL (350 mg/ml)] and plate on 100 mm plates. Incubate the plates at 37°C, overnight.

[00232] Amplification of Libraries (5.0 x 10⁵ recombinants from each library). Add 3.0 ml host cells (OD₆₀₀ = 1.0) to two 50 ml conical tubes and inoculate with 2.5 x 10⁵ pfu of phage per conical tube. Incubate at 37°C for 20 minutes. Add top agar to each tube to a final volume of 45 ml. Plate each tube across five 150 mm plates. Incubate the plates at 37°C for 6-8 hours or until plaques are about pin-head in size. Overlay the plates with 8-10 ml SM Buffer and place at 4°C overnight (with gentle rocking if possible).

[00233] Harvest Phage. Recover phage suspension by pouring the SM buffer off each plate into a 50-ml conical tube. Add 3 ml of chloroform, shake vigorously and incubate at room temperature for 15 minutes. Centrifuge the tubes at 2K rpm for 10 minutes to remove cell debris. Pour supernatant into a sterile flask, add 500 ul chloroform and store at 4°C.

[00234] Titer Amplified Library. Make serial dilutions of the harvested phage (for example, 10⁻³ = 1 ul amplified phage in 1 ml SM Buffer; 10⁻⁶ = 1 ul of the 10⁻³ dilution in 1 ml SM Buffer). Add 200 ul host (in 10 mM MgSO₄) to two tubes. Inoculate one tube with 10 ul of the 10⁻⁶ dilution (10⁻⁵). Inoculate the other tube with 1 ul of the 10⁻⁶ dilution (10⁻⁶). Incubate at 37°C for 15 minutes. Add about 3 ml 48°C top agar [50 ml stock containing 150 ul IPTG (0.5M) and 375 ul X-GAL (350 mg/ml)] to each tube and plate on 100 mm plates. Incubate the plates at 37°C, overnight. Excise the ZAP II library to create the pBLUESCRIPT library according to manufacturer's protocols (Stratagene).
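The titer arithmetic behind the serial dilutions in paragraph [00234] can be sketched as follows. This is an illustration only: it treats a 1 ul into 1 ml transfer as a 10⁻³ dilution, and the function name `titer_pfu_per_ml` and the plaque count used in the example are hypothetical, not values from the protocol.

```python
def titer_pfu_per_ml(plaques: int, dilution: float, plated_volume_ul: float) -> float:
    """Back-calculate phage titer (pfu/ml) of the undiluted stock.

    plaques          -- plaques counted on the plate
    dilution         -- cumulative dilution of the plated sample (e.g. 1e-6)
    plated_volume_ul -- volume of diluted phage used to inoculate the host, in ul
    """
    if plaques < 0:
        raise ValueError("plaque count cannot be negative")
    if dilution <= 0 or plated_volume_ul <= 0:
        raise ValueError("dilution and plated volume must be positive")
    plated_volume_ml = plated_volume_ul / 1000.0
    # Volume of undiluted stock actually represented on the plate, in ml.
    stock_equivalent_ml = dilution * plated_volume_ml
    return plaques / stock_equivalent_ml


if __name__ == "__main__":
    # Hypothetical count: 50 plaques from 1 ul of the 1e-6 dilution.
    print(f"{titer_pfu_per_ml(50, 1e-6, 1.0):.2e} pfu/ml")
```

Plating 10 ul of the 10⁻⁶ dilution represents the same amount of stock as 1 ul of a 10⁻⁵ dilution, which is why the two tubes in the protocol differ tenfold in expected plaque count.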

EXAMPLE 2: Enzymatic Activity Assay

[00235] The following is a representative example of a procedure for screening an expression library prepared in accordance with Example 1. In the following, the chemical characteristic Tiers are as follows:

Tier 1: Hydrolase

Tier 2: Amide, Ester and Acetal

Tier 3: Divisions and subdivisions are based upon the differences between individual substrates that are covalently attached to the functionality of Tier 2 undergoing reaction, as well as substrate specificity.

Tier 4: The two possible enantiomeric products which the protein, e.g. enzyme, may produce from a substrate.

[00236] Although the following example is specifically directed to the above-mentioned tiers, the general procedures for testing for various chemical characteristics are applicable to substrates other than those specifically referred to in this Example.

Screening for Tier 1-hydrolase; Tier 2-amide.

[00237] Plates of the library prepared as described in Example 1 are used to multiply inoculate a single plate containing 200 μl of LB Amp/Meth, glycerol in each well. This step is performed using the High Density Replicating Tool (HDRT) of the Beckman Biomek with a 1% bleach, water, isopropanol, air-dry sterilization cycle between each inoculation. The single plate is grown for 2h at 37°C and is then used to inoculate two white 96-well Dynatech microtiter daughter plates containing 250 μl of LB Amp/Meth, glycerol in each well. The original single plate is incubated at 37°C for 18h, then stored at -80°C. The two condensed daughter plates are incubated at 37°C also for 18h. The condensed daughter plates are then heated at 70°C for 45 min. to kill the cells and inactivate the host E. coli proteins, e.g. enzymes. A stock solution of 5 mg/mL morphourea phenylalanyl-7-amino-4-trifluoromethyl coumarin (MuPheAFC, the 'substrate') in DMSO is diluted to 600 μM with 50 mM pH 7.5 Hepes buffer containing 0.6 mg/ml of the detergent dodecyl maltoside.

MuPheAFC

[00238] Fifty μl of the 600 μM MuPheAFC solution is added to each of the wells of the white condensed plates with one 100 μl mix cycle using the Biomek to yield a final concentration of substrate of ~100 μM. The fluorescence values are recorded (excitation = 400 nm, emission = 505 nm) on a plate reading fluorometer immediately after addition of the substrate (t=0). The plate is incubated at 70°C for 100 min, then allowed to cool to ambient temperature for 15 additional minutes. The fluorescence values are recorded again (t=100). The values at t=0 are subtracted from the values at t=100 to determine if an active clone is present.
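The dilution and background subtraction described in paragraph [00238] can be sketched as below. The sketch assumes each well holds 250 μl of culture before substrate is added; the function names and the fluorescence cutoff passed as `threshold` are hypothetical, since the text does not state how large a fluorescence increase counts as active.

```python
from typing import Dict, List


def final_concentration_um(stock_um: float, added_ul: float, well_ul: float) -> float:
    """Substrate concentration after adding `added_ul` of stock to `well_ul` of culture."""
    return stock_um * added_ul / (added_ul + well_ul)


def active_wells(
    t0: Dict[str, float], t100: Dict[str, float], threshold: float
) -> List[str]:
    """Wells whose fluorescence increase (t=100 minus t=0) exceeds `threshold`.

    `threshold` is an assumed, user-chosen cutoff in the fluorometer's units.
    """
    return [well for well in t0 if t100[well] - t0[well] > threshold]


if __name__ == "__main__":
    # 50 ul of 600 uM stock into a 250 ul well.
    print(final_concentration_um(600.0, 50.0, 250.0))

    # Hypothetical readings for three wells.
    before = {"A1": 120.0, "A2": 115.0, "A3": 130.0}
    after = {"A1": 125.0, "A2": 980.0, "A3": 133.0}
    print(active_wells(before, after, threshold=100.0))
```

With these inputs the final substrate concentration works out to 100 μM, matching the approximate figure given in the text, and only the well with a large fluorescence increase is flagged for follow-up.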

[00239] The data will indicate whether one of the clones in a particular well is hydrolyzing the substrate. In order to determine the individual clone which carries the activity, the source library plates are thawed and the individual clones are used to singly inoculate a new plate containing LB Amp/Meth, glycerol. As above, the plate is incubated at 37°C to grow the cells, heated at 70°C to inactivate the host proteins, e.g. enzymes, and 50 μl of 600 μM MuPheAFC is added using the Biomek. Additionally three other substrates are tested. They are methyl umbelliferone heptanoate, the CBZ-arginine rhodamine derivative, and fluorescein-conjugated casein (~3.2 mol fluorescein per mol of casein).

methyl umbelliferone heptanoate

(CBZ - arginine)₂ rhodamine 110

[00240] The umbelliferone and rhodamine are added as 600 μM stock solutions in 50 μl of Hepes buffer. The fluorescein-conjugated casein is also added in 50 μl at a stock concentration of 20 and 200 mg/ml. After addition of the substrates the t=0 fluorescence values are recorded, the plate is incubated at 70°C, and the t=100 min. values are recorded as above.

[00241] These data indicate which plate the active clone is in. Where the arginine rhodamine derivative is also turned over by this activity, but the lipase substrate, methyl umbelliferone heptanoate, and the protein substrate, fluorescein-conjugated casein, do not function as substrates, the Tier 1 classification is 'hydrolase' and the Tier 2 classification is amide bond. No cross reactivity should be seen with the Tier 2-ester classification.
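The decision rule in paragraph [00241] can be expressed as a small sketch. It encodes only the case stated in the text (both amide substrates turned over, neither the ester nor the casein substrate turned over); the function name, dictionary keys, and the boolean "turned over" representation are assumptions made for illustration.

```python
from typing import Dict

# Substrates named in the screening example; the string keys are illustrative labels.
AMIDE_SUBSTRATES = ("MuPheAFC", "CBZ-arginine rhodamine")
NON_AMIDE_SUBSTRATES = (
    "methyl umbelliferone heptanoate",  # lipase/esterase substrate
    "fluorescein-conjugated casein",    # protein substrate
)


def tier2_is_amide(turned_over: Dict[str, bool]) -> bool:
    """True when the clone fits the Tier 1 hydrolase / Tier 2 amide pattern.

    A substrate missing from `turned_over` is treated as not turned over.
    """
    amide_hit = all(turned_over.get(s, False) for s in AMIDE_SUBSTRATES)
    other_hit = any(turned_over.get(s, False) for s in NON_AMIDE_SUBSTRATES)
    return amide_hit and not other_hit


if __name__ == "__main__":
    clone = {
        "MuPheAFC": True,
        "CBZ-arginine rhodamine": True,
        "methyl umbelliferone heptanoate": False,
        "fluorescein-conjugated casein": False,
    }
    print(tier2_is_amide(clone))
```

A clone that also turns over the heptanoate ester would fail this test, reflecting the statement that no cross reactivity should be seen with the Tier 2-ester classification.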

[00242] As shown in Figure 27, a recombinant clone from the library which has been characterized in Tier 1 as hydrolase and in Tier 2 as amide may then be tested in Tier 3 for various specificities. In Figure 1, the various classes of Tier 3 are followed by a parenthetical code which identifies the substrates of Table 1 which are used in identifying such specificities of Tier 3.

[00243] As shown in Figures 28 and 29, a recombinant clone from the library which has been characterized in Tier 1 as hydrolase and in Tier 2 as ester may then be tested in Tier 3 for various specificities. In Figures 2 and 3, the various classes of Tier 3 are followed by a parenthetical code which identifies the substrates of Tables 3 and 4 which are used in identifying such specificities of Tier 3. In Figures 2 and 3, R₂ represents the alcohol portion of the ester and R₁ represents the acid portion of the ester.

[00244] As shown in Figure 30, a recombinant clone from the library which has been characterized in Tier 1 as hydrolase and in Tier 2 as acetal may then be tested in Tier 3 for various specificities. In Figure 29, the various classes of Tier 3 are followed by a parenthetical code which identifies the substrates of Table 5 which are used in identifying such specificities of Tier 3.

[00245] Proteins, e.g. enzymes, may be classified in Tier 4 for the chirality of the product(s) produced by the enzyme. For example, chiral amino esters may be determined using at least the following substrates:

[00246] For each substrate which is turned over, the enantioselectivity value, E, is determined according to the equation below:

$$E = \frac{\ln[1 - c(1 + ee_p)]}{\ln[1 - c(1 - ee_p)]}$$

where ee_p = the enantiomeric excess (ee) of the hydrolyzed product and c = the percent conversion of the reaction. See Wong and Whitesides, Enzymes in Synthetic Organic Chemistry, 1994, Elsevier, Tarrytown, New York, pp. 9-12.

[00247] The enantiomeric excess is determined by either chiral high performance liquid chromatography (HPLC) or chiral capillary electrophoresis (CE). Assays are performed as follows: two hundred μl of the appropriate buffer is added to each well of a 96-well white microtiter plate, followed by 50 μl of partially or completely purified protein, e.g. enzyme, solution; 50 μl of substrate is added and the increase in fluorescence monitored versus time until 50% of the substrate is consumed or the reaction stops, whichever comes first.
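The enantioselectivity equation of paragraph [00246] can be evaluated with a short sketch. It assumes that both the conversion c and the enantiomeric excess ee_p are supplied as fractions between 0 and 1 (so 40% conversion is entered as 0.4); the function name `enantioselectivity` and the example inputs are illustrative only.

```python
import math


def enantioselectivity(c: float, ee_p: float) -> float:
    """E = ln[1 - c(1 + ee_p)] / ln[1 - c(1 - ee_p)].

    c    -- conversion as a fraction, 0 < c < 1 (assumed convention)
    ee_p -- enantiomeric excess of the product as a fraction, 0 <= ee_p < 1
    """
    if not 0.0 < c < 1.0:
        raise ValueError("c must be a fraction strictly between 0 and 1")
    if not 0.0 <= ee_p < 1.0:
        raise ValueError("ee_p must satisfy 0 <= ee_p < 1")
    numerator_arg = 1.0 - c * (1.0 + ee_p)
    if numerator_arg <= 0.0:
        # c(1 + ee_p) >= 1 is not physically consistent for this expression.
        raise ValueError("c * (1 + ee_p) must be less than 1")
    denominator_arg = 1.0 - c * (1.0 - ee_p)
    return math.log(numerator_arg) / math.log(denominator_arg)


if __name__ == "__main__":
    # Hypothetical measurement: 40% conversion, product ee of 90%.
    print(round(enantioselectivity(0.40, 0.90), 1))
```

A racemic product (ee_p = 0) gives E = 1, meaning no selectivity, and E rises steeply as ee_p approaches 1 at a fixed conversion.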

EXAMPLE 3: Construction of a Stable, Large Insert Picoplankton Genomic DNA Library

[00248] Figure 5 shows an overview of the procedures used to construct an environmental library from a mixed picoplankton sample. A stable, large insert DNA library representing picoplankton genomic DNA was prepared as follows.

Cell collection and preparation of DNA.

[00249] Agarose plugs containing concentrated picoplankton cells were prepared from samples collected on an oceanographic cruise from Newport, Oregon to Honolulu, Hawaii. Seawater (30 liters) was collected in Niskin bottles, screened through 10 μm Nitex, and concentrated by hollow fiber filtration (Amicon DC10) through 30,000 MW cutoff polysulfone filters. The concentrated bacterioplankton cells were collected on a 0.22 μm, 47 mm Durapore filter, and resuspended in 1 ml of 2X STE buffer (1M NaCl, 0.1M EDTA, 10 mM Tris, pH 8.0) to a final density of approximately 1 x 10¹⁰ cells per ml. The cell suspension was mixed with one volume of 1% molten Seaplaque LMP agarose (FMC) cooled to 40°C, and then immediately drawn into a 1 ml syringe. The syringe was sealed with parafilm and placed on ice for 10 min. The cell-containing agarose plug was extruded into 10 ml of Lysis Buffer (10 mM Tris pH 8.0, 50 mM NaCl, 0.1M EDTA, 1% Sarkosyl, 0.2% sodium deoxycholate, 1 mg/ml lysozyme) and incubated at 37°C for one hour. The agarose plug was then transferred to 40 ml of ESP Buffer (1% Sarkosyl, 1 mg/ml proteinase K, in 0.5M EDTA), and incubated at 55°C for 16 hours. The solution was decanted and replaced with fresh ESP Buffer, and incubated at 55°C for an additional hour. The agarose plugs were then placed in 50 mM EDTA and stored at 4°C shipboard for the duration of the oceanographic cruise.

[00250] One slice of an agarose plug (72 μl) prepared from a sample collected off the Oregon coast was dialyzed overnight at 4°C against 1 ml of buffer A (100 mM NaCl, 10 mM Bis Tris Propane-HCl, 100 μg/ml acetylated BSA; pH 7.0 @ 25°C) in a 2 ml microcentrifuge tube. The solution was replaced with 250 μl of fresh buffer A containing 10 mM MgCl₂ and 1 mM DTT and incubated on a rocking platform for 1 hr at room temperature. The solution was then changed to 250 μl of the same buffer containing 4U of Sau3AI (NEB), equilibrated to 37°C in a water bath, and then incubated on a rocking platform in a 37°C incubator for 45 min. The plug was transferred to a 1.5 ml microcentrifuge tube and incubated at 68°C for 30 min to inactivate the protein, e.g. enzyme, and to melt the agarose. The agarose was digested and the DNA dephosphorylated using Gelase and HK-phosphatase (Epicentre), respectively, according to the manufacturer's recommendations. Protein was removed by gentle phenol/chloroform extraction and the DNA was ethanol precipitated, pelleted, and then washed with 70% ethanol. This partially digested DNA was resuspended in sterile H₂O to a concentration of 2.5 ng/μl for ligation to the pFOS1 vector.

[00251] PCR amplification results from several of the agarose plugs (data not shown) indicated the presence of significant amounts of archaeal DNA. Quantitative hybridization experiments using rRNA extracted from one sample, collected at 200 m of depth off the Oregon Coast, indicated that planktonic archaea in this assemblage comprised approximately 4.7% of the total picoplankton biomass (this sample corresponds to "PAQ"-200 m in Table 1 of DeLong et al., High abundance of Archaea in Antarctic marine picoplankton, Nature, 377:695-698, 1994). Results from archaeal-biased rDNA PCR amplification performed on agarose plug lysates confirmed the presence of relatively large amounts of archaeal DNA in this sample. Agarose plugs prepared from this picoplankton sample were chosen for subsequent fosmid library preparation. Each 1 ml agarose plug from this site contained approximately 7.5 x 10⁵ cells, therefore approximately 5.4 x 10⁵ cells were present in the 72 μl slice used in the preparation of the partially digested DNA.

[00252] Vector arms were prepared from pFOS1 as described (Kim et al., Stable propagation of cosmid sized human DNA inserts in an F-factor based vector, Nucl. Acids Res., 20:10832-10835, 1992). Briefly, the plasmid was completely digested with AatII, dephosphorylated with HK phosphatase, and then digested with BamHI to generate two arms, each of which contained a cos site in the proper orientation for cloning and packaging ligated DNA between 35-45 kbp. The partially digested picoplankton DNA, isolated by pulsed field gel electrophoresis (PFGE), was ligated overnight to the pFOS1 arms in a 15 μl ligation reaction containing 25 ng each of vector and insert and 1U of T4 DNA ligase (Boehringer-Mannheim). The ligated DNA in four microliters of this reaction was in vitro packaged using the Gigapack XL packaging system (Stratagene), the fosmid particles transfected to E. coli strain DH10B (BRL), and the cells spread onto LB_cm15 plates. The resultant fosmid clones were picked into 96-well microtiter dishes containing LB_cm15 supplemented with 7% glycerol. Recombinant fosmids, each containing ca. 40 kb of picoplankton DNA insert, yielded a library of 3,552 fosmid clones, containing approximately 1.4 x 10⁸ base pairs of cloned DNA. All of the clones examined contained inserts ranging from 38 to 42 kbp. This library was stored frozen at -80°C for later analysis.
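The stated library size can be checked with a short calculation. The following is a minimal sketch, assuming every clone carries the nominal 40 kb insert; the function name is illustrative and not part of the described method.

```python
def library_size_bp(n_clones: int, mean_insert_kb: float) -> float:
    """Total cloned DNA in base pairs for a library of n_clones,
    assuming each clone carries an insert of mean_insert_kb kilobases."""
    return n_clones * mean_insert_kb * 1_000


if __name__ == "__main__":
    # Values taken from the text: 3,552 fosmid clones, ca. 40 kb each.
    total = library_size_bp(3552, 40)
    print(f"Cloned DNA: {total:.2e} bp")
```

With the figures from the text this gives about 1.42 x 10⁸ bp, in line with the approximately 1.4 x 10⁸ base pairs reported.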

[00253] Numerous modifications and variations of the present invention are possible in light of the above teachings; therefore, within the scope of the claims, the invention may be practiced other than as particularly described.

EXAMPLE 4: CsCl-Bisbenzimide Gradients

Gradient visualization by UV:

[00254] Visualize gradient by using the UV handlamp in the dark room and mark bandings of the standard which will show the upper and lower limit of GC-contents.

Harvesting of the gradients:

1. Connect Pharmacia-pump LKB PI with fraction collector (BIO-RAD model 2128).

2. Set program: rack 3, 5 drops (about 100 ul), all samples.

3. Use 3 microtiter-dishes (Costar, 96 well cell culture cluster).

4. Push yellow needle into bottom of the centrifuge tube.

5. Start program and collect gradient. Don't collect first and last 1-2 ml depending on where your markers are.

Dialysis:

1. Follow microdialyzer instruction manual and use Spectra/Por CE Membrane MWCO 25,000 (wash membrane with ddH2O before usage).

2. Transfer samples from the microtiter dish into microdialyzer (Spectra/Por MicroDialyzer) with multipipette.

3. Fill dialyzer completely with TE, get rid of any air bubbles, and transfer samples very fast to avoid new air bubbles.

4. Dialyze against TE for 1 hr on a plate stirrer.

DNA estimation with PICOGREEN™:

1. Transfer samples (volume after dialysis should be increased 1.5 - 2 times) with multipipette back into microtiter dish.

2. Transfer 100 ul of the sample into Polytektronix plates.

3. Add 100 ul Picogreen-solution (5 ul Picogreen-stock-solution + 995 ul TE buffer) to each sample.

4. Use WPR-plate-reader.

5. Estimate DNA concentration.

EXAMPLE 5: Bis-Benzimide Separation of Genomic DNA [00255] A sample composed of genomic DNA from Clostridium perfringens (27% G+C), Escherichia coli (49% G+C) and Micrococcus lysodeikticus (72% G+C) was purified on a cesium-chloride gradient. The cesium chloride (Rf = 1.3980) solution was filtered through a 0.2 μm filter and 15 ml were loaded into a 35 ml OptiSeal tube (Beckman). The DNA was added and thoroughly mixed. Ten micrograms of bis-benzimide (Sigma; Hoechst 33258) were added and mixed thoroughly. The tube was then filled with the filtered cesium chloride solution and spun in a VTi50 rotor in a Beckman L8-70 Ultracentrifuge at 33,000 rpm for 72 hours. Following centrifugation, a syringe pump and fractionator (Brandel Model 186) were used to drive the gradient through an ISCO UA-5 UV absorbance detector set to 280 nm. Three peaks representing the DNA from the three organisms were obtained. PCR amplification of DNA encoding rRNA from a 10-fold dilution of the E. coli peak was performed with the following primers to amplify eubacterial sequences:

Forward primer: (27F) 5'-AGAGTTTGATCCTGGCTCAG-3' (SEQ ID NO:1)

Reverse primer: (1492R) 5'-GGTTACCTTGTTACGACTT-3' (SEQ ID NO:2)
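Because the gradient in this example separates DNA by G+C content, it can be useful to compute that fraction for a given sequence. The sketch below is an illustration only: it applies a simple G+C count to the two primer sequences given above, and the helper name is hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


# Primer sequences as given in the text (SEQ ID NO:1 and SEQ ID NO:2).
PRIMERS = {
    "27F": "AGAGTTTGATCCTGGCTCAG",
    "1492R": "GGTTACCTTGTTACGACTT",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, {100 * gc_fraction(seq):.1f}% G+C")
```

The same function could be applied to any sequenced fragment recovered from a gradient fraction to compare it with the G+C range expected for that fraction.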

EXAMPLE 6: FACS/Biopanning

Infection of library lysates into Exp503 E. coli strain.

[00256] A 25 ml LB + Tet culture of Exp503 was cultured overnight at 37°C. The next day the culture was centrifuged at 4000 rpm for 10 minutes and the supernatant decanted. 20 ml of 10 mM MgSO₄ was added and the OD₆₀₀ checked. Dilute to OD 1.0.

[00257] In order to obtain a good representation of the library, at least 2-fold (and preferably 5-fold) of the library lysate titer was used. For example: Titer of library lysate is 2 x 10⁶ cfu/ml. Need to plate at least 4 x 10⁶ cfu. Can plate approx. 500,000 microcolonies per 150 mm LB-Kan plate. Need 8 plates. Can plate 1 ml of reaction per plate, so need 8 ml of cells + lysate.
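The plating arithmetic above can be sketched as a small helper. This is an illustrative calculation under the assumptions stated in the text (a fold-coverage applied to the titer, a fixed colony capacity per plate, and 1 ml of reaction per plate); the function and parameter names are hypothetical.

```python
import math


def plating_plan(titer_cfu_per_ml: float, fold: float,
                 cfu_per_plate: float, ml_per_plate: float = 1.0) -> dict:
    """Return cfu to plate, number of plates, and reaction volumes."""
    cfu_needed = fold * titer_cfu_per_ml          # e.g. 2-fold of 2e6 = 4e6 cfu
    plates = math.ceil(cfu_needed / cfu_per_plate)
    total_ml = plates * ml_per_plate              # cells + lysate
    lysate_ml = cfu_needed / titer_cfu_per_ml     # lysate volume supplying the cfu
    return {
        "cfu_needed": cfu_needed,
        "plates": plates,
        "total_ml": total_ml,
        "lysate_ml": lysate_ml,
        "cells_ml": total_ml - lysate_ml,         # remainder made up with host cells
    }


if __name__ == "__main__":
    print(plating_plan(2e6, 2, 5e5))
```

For the worked example in the text (2 x 10⁶ cfu/ml, 2-fold, 500,000 colonies per plate) this gives 8 plates and 8 ml of reaction, made up of 2 ml lysate and 6 ml cells.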

[00258] 2-fold (e.g., 2 ml) of library lysate was mixed with the appropriate amount (e.g., 6 ml) of OD 1.0 Exp503. The sample was incubated at 37°C for at least 1 hour. Plated 1 ml reaction on 150 mm LB-Kan plate x 8 plates and incubated overnight at 30°C.

Harvesting, induction, and fixing of library in Exp503 cells.

Scrape all cells from plates into 20 ml LB using a rubber policeman. Dilute cells approx. 1:100 (200 ul cells/20 ml LB) and incubate at 37°C until culture is OD 0.3. Add 1:50 dilution of 20% sterile glucose and incubate at 37°C until culture is OD 1.0. Add 1:100 dilution of 1 M MgSO₄.

Transfer 5 ml of culture to a fresh tube and the remaining culture can be used as an uninduced control if desired or discarded. Add MOI 5 of CE6 bacteriophage to the remaining 5 ml of culture. (CE6 codes for T7 RNA Polymerase) (e.g., OD 1 = 8 x 10⁸ cells/ml x 5 ml = 4 x 10⁹ cells x MOI 5 = 2 x 10¹⁰ bacteriophage needed). Incubate culture + CE6 for 2 hr at 37°C. Cool on ice and centrifuge cells at 4000 rpm for 10 min.
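The phage requirement in the worked example follows from cell number times multiplicity of infection. A minimal sketch, assuming the conversion of 8 x 10⁸ cells/ml per OD unit given in the text (the function name is illustrative):

```python
def phage_needed(od600: float, volume_ml: float, moi: float,
                 cells_per_ml_per_od: float = 8e8) -> float:
    """Number of phage particles needed to infect a culture at the given MOI.

    cells_per_ml_per_od is the OD-to-cell-density conversion quoted in the
    text; it is strain- and instrument-dependent in practice.
    """
    cells = od600 * cells_per_ml_per_od * volume_ml
    return cells * moi


if __name__ == "__main__":
    # 5 ml of an OD 1.0 culture infected at MOI 5, as in the text.
    print(f"{phage_needed(1.0, 5.0, 5.0):.1e} phage")
```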

Wash with 10 ml PBS. Fix cells in 600 ul PBS + 1.8 ml fresh, filtered 4% paraformaldehyde. Incubate on ice for 2 hrs. (4% Paraformaldehyde: Heat 8.25 ml PBS in flask at 65°C. Add 100 ul 1 M NaOH and 0.5 g paraformaldehyde (stored at 4°C). Mix until dissolved. Add 4.15 ml PBS. Cool to 0°C. Adjust pH to 7.2 with 0.5 M NaH₂PO₄. Cool to 0°C. Syringe filter. Use within 24 hrs). After fixing, centrifuge at 4000 rpm for 10 min. Resuspend in 1.8 ml PBS and 200 ul 0.1% NP40. Store at 4°C overnight.
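The fixative concentrations implied by the recipe above can be verified as follows. This is a sketch that ignores the small volume of NaH₂PO₄ added for pH adjustment (an assumption); the function names are illustrative.

```python
def percent_w_v(mass_g: float, volume_ml: float) -> float:
    """Weight/volume percentage: grams of solute per 100 ml of solution."""
    return 100.0 * mass_g / volume_ml


def diluted_percent(stock_percent: float, stock_ml: float, diluent_ml: float) -> float:
    """Concentration after mixing stock_ml of stock with diluent_ml of diluent."""
    return stock_percent * stock_ml / (stock_ml + diluent_ml)


if __name__ == "__main__":
    # 0.5 g paraformaldehyde in 8.25 ml PBS + 0.1 ml NaOH + 4.15 ml PBS.
    stock = percent_w_v(0.5, 8.25 + 0.1 + 4.15)
    # 1.8 ml of that stock mixed with 600 ul of cells in PBS.
    final = diluted_percent(stock, 1.8, 0.6)
    print(f"stock: {stock:.2f}% w/v, during fixation: {final:.2f}% w/v")
```

Under these assumptions the stock works out to about 4% w/v and the cells are fixed at about 3% w/v.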

Hybridization of fixed cells.

[00259] Centrifuge fixed cells at 4000 rpm for 10 min. Resuspend in 1 ml 40 mM Tris pH 7.6/0.2% NP40. Transfer 100 ul fixed cells to an Eppendorf tube. Centrifuge for 1 min and remove supernatant. Resuspend each reaction in 50 ul Hybridization buffer (0.9 M NaCl; 20 mM Tris pH 7.4; 0.01% SDS; 25% formamide; can be made in advance and stored at -20°C). Add 0.5 nmol fluorescein-labeled primer to the appropriate reactions. Incubate with rocking at 46°C for 2 hr. (Hybridization temperature may depend on sequence of primer and template.) Add 1 ml wash buffer to each reaction, rinse briefly and centrifuge for 1 min. Discard supernatant. (Wash buffer: 0.9 M NaCl; 20 mM Tris pH 7.4; 0.01% SDS). Add another 1 ml of wash buffer to each reaction, and incubate at 48°C with rocking for 30 min. Centrifuge and remove supernatant. Visualize cells under microscope using WIB filter.

FACS sorting.

[00260] Dilute cells in 1 ml PBS. If cells are clumping, sonicate for 20 seconds at 1.5 power. FACS sort the most highly fluorescent single cells and collect in 0.5 ml PCR strip tubes (approximately one 96-well plate per library). PCR single cells with vector specific primers to amplify the insert in each cell. Electrophorese all samples on an agarose gel and select samples with single inserts. These can be re-amplified with Biotin-labeled primers, hybridized to insert-specific primers, and examined in an ELISA assay. Positive clones can then be sequenced. Alternatively, the selected samples can be re-amplified with various combinations of insert-specific primers, or sequenced directly.

EXAMPLE 7: Large Insert FACS Biopanning Protocol

1. Encapsulate 1 vial of 3% home-made SeaPlaque gel. Each vial of gel can make 10⁶ GMD. Take 100 ul of melted frozen fosmid pMF21/DH10B library, OD600 = 0.4, to encapsulate, and centrifuge down to 10 ul. Melt agarose gel, add 100 ul FBS (fetal bovine serum) and vortex. Place in 50°C water in a beaker. Add 10 ul culture, vortex and add to 17 ml mineral oil. Shake about 30 times, place on the One Cell machine. Blend at 2600 rpm for 1 min at room temperature and 2600 rpm for 9 minutes on ice. Wash with PBS twice. Resuspend in 10 ml LB + Apr⁵⁰, shake at 37°C for 4 hours at 230 rpm. Check microscopically to see the growth and size of microcolonies.

2. Centrifuge at 1500 rpm for 6 min. GMDs are resuspended in 5 ml of 2xSSC and can be saved at 4°C for several days. Take 200 ul GMD in 2xSSC for each reaction.

3. Resuspend in 10 ml 2xSSC/5% SDS. Incubate 10 min at RT shaking or rotating. Centrifuge.

4. Resuspend in 5 ml lysis solution containing proteinase K. Incubate 30 min at 37°C shaking or rotating. Centrifuge.

Lysis Solution:
50 mM Tris pH 8: 0.75 ml 1 M Tris
50 mM EDTA: 1.5 ml 0.5 M EDTA
100 mM NaCl: 300 ul 5 M NaCl
1% Sarkosyl: 0.75 ml 20% Sarkosyl
250 ug/ml Proteinase K: 375 ul proteinase K stock (10 mg/ml)
dH2O: 11.325 ml

5. Resuspend in 5 ml denaturing solution. Incubate 30 min at RT shaking or rotating. Centrifuge at 1500 rpm for 5 min.

Denaturing Solution: 0.5M NaOH/1.5M NaCl

6. Resuspend in 5 ml neutralizing solution. Incubate 30 min at RT shaking or rotating. Centrifuge.

Neutralizing Solution: 0.5M Tris pH8/1.5M NaCl

7. Wash in 2XSSC briefly.

8. Aliquot 200 ul/RXN into microcentrifuge tubes, microcentrifuge and take out the 2XSSC. Add 130 ul "DIG EASY HYB" to prehyb for 45 minutes at 37°C. Do prehyb and hyb in Personal Hyb Oven.

9. Aliquot oligo probe and denature at 85°C for 5 minutes, place on ice immediately. Add appropriate amount of probe (0.5-1 nmol/RXN) and return to rotating hyb. oven for O/N.

10. Prepare a 1% (10 mg/ml) solution of Blocking Reagent in PBS. Store at 4°C for the day's use.

11. Wash GMDs with 0.8 ml of 2XSSC/0.1% SDS at RT for 15 min, rotating. In the meantime, prewarm the next wash solution.

12. Wash GMDs with 0.8 ml of 0.5XSSC/0.1% SDS 2 x 15 min at the appropriate temperature, rotating. If more stringency is required, the 2nd wash can be done in 0.1XSSC/0.1% SDS.

13. Wash with 0.8 ml/RXN 2XSSC briefly.

14. Block the reaction w/130 ul 1% Blocking Reagent in PBS at RT for 30 minutes.

15. Add 1.4ul anti-DIG-POD (so 1:100) and incubate at RT for 3 hours.

16. Wash GMDs w/ 0.8 ml PBS/RXN 3x 7 minutes at 37°C.

17. Prepare a tyramide working solution by diluting the tyramide stock solution 1:85 in Amplification buffer/0.0015% H₂O₂. Apply 130 ul tyramide working solution at RT and incubate in the dark at RT for 30 minutes.

18. Wash 3X for 7 min. in 0.8ml PBS buffer @37°C.

19. Visualize by microscope and FACS sort.
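The lysis solution used in step 4 is a standard dilution of stocks, so its volumes can be cross-checked with C1·V1 = C2·V2. The sketch below assumes a 15 ml batch, which is the sum of the volumes listed in the recipe rather than a figure stated in the text; all names are illustrative.

```python
FINAL_VOLUME_ML = 15.0  # assumed batch size: sum of the listed volumes

# component: (final concentration, stock concentration), same units per pair
COMPONENTS = {
    "Tris pH 8 (mM)": (50.0, 1000.0),
    "EDTA (mM)": (50.0, 500.0),
    "NaCl (mM)": (100.0, 5000.0),
    "Sarkosyl (%)": (1.0, 20.0),
    "Proteinase K (mg/ml)": (0.25, 10.0),
}


def stock_volume_ml(final_conc: float, stock_conc: float, final_volume_ml: float) -> float:
    """Volume of stock needed, from C1*V1 = C2*V2."""
    return final_conc * final_volume_ml / stock_conc


def recipe(components: dict, final_volume_ml: float) -> dict:
    """Stock volumes for each component plus water to reach the final volume."""
    volumes = {name: stock_volume_ml(final, stock, final_volume_ml)
               for name, (final, stock) in components.items()}
    volumes["dH2O"] = final_volume_ml - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, ml in recipe(COMPONENTS, FINAL_VOLUME_ML).items():
        print(f"{name:>22}: {ml:.3f} ml")
```

Under that assumption the computed volumes (0.75, 1.5, 0.3, 0.75 and 0.375 ml of stocks, with 11.325 ml water) agree with the recipe as listed.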

EXAMPLE 8: Biopanning Protocol

Preparing Insert DNA from the Lambda DNA

[00261] PCR amplify inserts using vector specific primers CA98 and CA103.

CA98: ACTTCCGGCTCGTATATTGTGTGG (SEQ ID NO: 4)

CA103: ACGACTCACTATAGGGCGAATTGGG (SEQ ID NO: 5)

These primers match perfectly to lambda ZAP Express clones (pBKCMV).

Reagents:
Lambda DNA prepared from the libraries to be panned (Librarians)
Roche Expand Long Template PCR System #1-759-060
Pharmacia dNTP mix #27-2094-01 or Roche PCR Nucleotide Mix (10 mM) #1-581-295 or Roche dNTP's - PCR grade #1-969-064

Make the insert amplification mix:
X μl dH₂O (final 50 μl)
5 μl 10x Expand Buffer #2 (22.5 mM MgCl₂)
0.5 or 0.625 μl dNTP mix (20 mM each dNTP)
10 ng (approx) lambda DNA per library (usually 1 μl or 1 μl of a 1:10 dilution)
1-2 μl CA98 (100 ng/μl or 15 μM)
1-2 μl CA103 (100 ng/μl or 15 μM)
0.5 μl Expand Long polymerase mix

PCR amplify: Robocycler

Analyze 5 μl of reaction product on a gel.

Note: The reaction product should be a strong smear of products usually ranging from 0.5-5 kb in size and centered around 1.5-2 kb.

Prepare Biotinylated Hook

Reagents:
PCR reagents
Biotin-14-dCTP (BRL #19518-018)
Individual dNTP stock solutions (Roche dNTP's #1-969-064)
Gene specific template and primers
PCR purification kit (Roche #1732668 or Qiagen Qiaquick #28106)

1. Make 10x biotin dNTP mix:
150 μl biotin-14-dCTP
3 μl 100 mM dATP
3 μl 100 mM dGTP
3 μl 100 mM dTTP
1.5 μl 100 mM dCTP

2. Make PCR mix:
74 μl water
10 μl 10x Expand Buffer #1
10 μl 10x biotin dNTP mix (step #1)
2 μl Primer #1 (100 ng/μl)
2 μl Primer #2 (100 ng/μl)
1 μl template (gene specific) (100 ng/μl)
1 μl Expand Long polymerase mix

3. PCR amplify: Robocycler

* Use an annealing temperature appropriate for your primers.
** Allow 1 minute/kb of target length.

4. Clean up the reaction product using a PCR purification kit. Elute in 50 μl 5T.1E or Qiagen's EB buffer (10 mM Tris pH 8.5).

5. Check 5 μl on an agarose gel.

Note: The product may be slightly larger than expected due to the incorporation of biotin.

Biopanning

Reagents:
Streptavidin-conjugated paramagnetic beads (CPG MPG-Streptavidin 10 mg/ml #MSTR0502) (Dynal Dynabeads M-280 Streptavidin)
Sonicated, denatured salmon sperm DNA (heated to 95°C, 5 min) (Stratagene #201190)
PCR reagents
dNTP mix
Magnetic particle separator
Topo-TA cloning kit with Top10F' comp cells (Invitrogen #K4550-40)
High Salt Buffer: 5 M NaCl, 10 mM EDTA, 10 mM Tris pH 7.3

1. Make the following reaction mix for each library/hook combination:
5 μg insert DNA (PCR amplified lambda DNA)
100 ng Biotinylated hook (100 ng total if using more than one hook)
4.5 μl 20x SSC for a 3x final concentration (or High Salt buffer)
X μl dH₂O for a final volume of 30 μl

2. Denature by heating to 95°C for 10 min. (Robocycler works well for this step).

3. Hybridize at 70°C for 90 min. (Robocycler)

4. Prepare 100 μl of MPG beads for each sample:
Wash 100 μl beads two times with 1 ml 3x SSC
Resuspend in:
50 μl 3x SSC (or High Salt buffer)
10 μl Sonicated, denatured salmon sperm DNA (10 mg/ml) to block (or 100 ng total)
(Do not ice)

5. Add the hybridized DNA to the washed and blocked beads.

6. Incubate at room temp for 30 min, agitating gently in the hybridization oven.

7. Wash twice at room temp with 1 ml 0.1x SSC/0.1% SDS (or high salt buffer) using magnetic particle separator.

8. Wash twice at 42°C with 1 ml 0.1x SSC/0.1% SDS (or high salt buffer) for 10 min each. (magnet)

9. Wash once at room temp with 1 ml 3x SSC. (magnet)

10. Elute DNA by resuspending the beads in 50 μl dH₂O and heating the beads to 70°C for 30 min or 85°C for 10 min in the hyb oven (or thermomixer at 500 rpm). Separate using magnet, and discard the beads.

11. PCR amplify 1-5 μl of the panned DNA using the same protocol as Preparing Insert DNA from the Lambda DNA above.

12. Check 5 μl on agarose gel.

Note: The reaction product should be a strong smear of products usually ranging from 0.5-5 kb in size and centered around 1.5-2 kb.

13. Clone 1-4 μl into pCR2.1-TopoTA cloning vector.

14. Transform 2 x 3 μl into Top10F' chemically comp cells. Plate each transformation on 2 x 150 mm LB-kan plates. Incubate at 30°C overnight. (Ideal density is ~3000 colonies per plate). Repeat transformation if necessary to get a representative number of colonies per library. Archive the Biopanned DNA.

15. Transfer plates to Hybridization group, along with appropriate templates and a single primer for run off PCR ³²P-labeling reactions.

Analysis of Results

1. Filter lifts from plates will be performed, and hybridized to the appropriate probe. Resultant films will be given to the Biopanned.

2. Align films to original colony plates. Colonies corresponding to positive "dots-on-film" should be toothpicked, patched onto an LB-Kan plate, and inoculated in 4 ml TB-Kan. For automation, inoculate 1 ml TB-kan in a 96-well plate and incubate 18 hrs. at 37°C.

3. Overnight cultures are mini-prepped (Biomek if possible). Digest with EcoRI to determine insert size.
2 μl DNA
0.5 μl EcoRI
1 μl 10x EcoRI buffer
6.5 μl dH₂O
Incubate at 37°C for 1 hr. Check insert size on agarose gel. Large insert clones (>500 bp) are then PCR confirmed if possible with gene specific primers.

4. Putative positive clones are then sequenced.

5. Glycerol stocks should be made of all interesting clones (>500 bp).

EXAMPLE 9: High Throughput Cultivation Of Marine Microbes From Sea Sample

1. Preparation of cell suspension [00262] Cells were obtained after filtering 110 L of surface water through a 0.22 μm membrane. The cell pellet was then resuspended with seawater and a volume of 100 μL was used for cell encapsulation. This provided cell numbers of approximately 10⁷ cells per mL.

2. Cell encapsulation into GMDs [00263] The following reagents were used: CelMix™ Emulsion Matrix and CelGel™ Encapsulation Matrix (One Cell Systems, Inc., Cambridge, MA), Pluronic F-68 solution and Dulbecco's Phosphate Buffered Saline (PBS, without Ca²⁺ and Mg²⁺). Scintillation vials each containing 15 ml of CelMix™ emulsion matrix were placed in a 40°C water bath and were equilibrated to 40°C for a minimum of 30 minutes. 30 ul of Pluronic Solution F-68 (10%) was added to each of 6 vials of melted CelGel™ agarose. The agarose mixture was incubated at 40°C for a minimum of 3 minutes. 100 ul of cells (resuspended in PBS) were added per 6 vials of the CelGel™ bottles and the resulting mixture was incubated at 40°C for 3 minutes. Using a 1 ml pipette and avoiding air bubbles, the CelGel™-cell mixture was added dropwise to the warmed CelMix™ in the scintillation vial. This mixture was then emulsified using the CellSys100™ MicroDrop maker as follows: 2200 rpm for 1 minute at room temperature (RT), then 2200 rpm for 1 minute on ice, then 1100 rpm for 6 minutes on ice, resulting in an encapsulation mixture comprised of microdrops that were approximately 10-20 microns in diameter. The encapsulation mixture was then divided into two 15 ml conical tubes and in each tube, the emulsion was overlayed with 5 ml of PBS. The tubes were then centrifuged at 1800 rpm in a bench top centrifuge for 10 minutes at RT, resulting in a visible Gel MicroDrop (GMD) pellet. The oil phase was then removed with a pipette and disposed of in an oil waste container. The remaining aqueous supernatant was aspirated and each pellet was resuspended in 2 ml of PBS. Each resuspended pellet was then overlayed with 10 ml of PBS. The GMD suspension was then centrifuged at 1500 rpm for 5 minutes at RT. The overlaying process was repeated and the GMD suspension was centrifuged again to remove all free-living bacteria. The supernatant was then removed and the pellet was resuspended in 1 ml of seawater. 10 ul of the GMD suspension was then examined under the microscope in order to check for uniform GMD size and containment of the encapsulated organism in the GMD. This protocol resulted in 1 to 4 cells encapsulated in each GMD.

3. Sorting of GMDs containing single cells for identification by 16S rRNA gene sequence [00264] On the first day of cultivation we sorted occupied GMDs that contained one to 4 cells, although most had only single cells. The sorting was done in a Mo-Flo instrument (Cytomation) by staining the cells inside the GMDs with Syto9 and then selecting green fluorescence (from the stain) and side-scatter as parameters for sorting gates. The staining was necessary since the cells are much smaller than E. coli and therefore show very low light-scatter signals. The target GMDs were sorted into a 96-well plate containing a PCR mixture, ready to be amplified immediately after sorting. We used a Hotstart enzyme (Qiagen) so that no reaction would occur before boiling for 15 min, which allowed us to work at room temperature before amplification. Before starting the PCR it was necessary to irradiate the PCR mixture with a Stratalinker (Stratagene) at full power for 14 min to cross-link any potential genomic DNA present in the mixture before sorting. The primers used include the pairs 27F and 1392R, and 27F and 1522R, named according to the positions in the E. coli gene sequence. The primers were obtained from IDT-DNA Technologies and were purified by HPLC. The primer concentration used in the reactions was 0.2 μM. We used a "touchdown" program consisting of 3 stages: a) boiling 15 min, b) 15 cycles decreasing the annealing temperature from 62 to 55°C by 0.5 degrees per cycle, c) a series of cycles (20-40) increasing the annealing time 1 sec per cycle starting with 30 sec but keeping the temperature constant at 55°C. All the other stages of the PCR were as recommended by the manufacturer. This protocol allowed the amplification of the 16S rRNA gene from individual encapsulated cells or small consortia of cells. The PCR products were then cloned into TOPO-TA (Invitrogen) cloning vectors and sequenced by dye-termination cycle sequencing (Perkin-Elmer ABI).
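The annealing schedule of the touchdown program described above can be written out explicitly. The sketch below only generates the per-cycle annealing temperatures for stage b and annealing times for stage c; it is not thermocycler control code, the function names are illustrative, and the choice of 20 cycles for stage c in the demo is one value from the stated 20-40 range.

```python
def touchdown_temperatures(start_c: float = 62.0, end_c: float = 55.0,
                           step_c: float = 0.5) -> list[float]:
    """Annealing temperature for each stage-b cycle, from start_c down to end_c."""
    n_cycles = round((start_c - end_c) / step_c) + 1
    return [start_c - i * step_c for i in range(n_cycles)]


def extended_annealing_times(n_cycles: int, start_s: int = 30,
                             increment_s: int = 1) -> list[int]:
    """Annealing time in seconds for each stage-c cycle (temperature held at 55 C)."""
    return [start_s + i * increment_s for i in range(n_cycles)]


if __name__ == "__main__":
    temps = touchdown_temperatures()
    print(f"stage b: {len(temps)} cycles, {temps[0]} C down to {temps[-1]} C")
    times = extended_annealing_times(20)
    print(f"stage c: {len(times)} cycles, {times[0]} s up to {times[-1]} s")
```

Stepping from 62°C to 55°C in 0.5 degree decrements gives exactly the 15 cycles stated for stage b.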

Cell growth of encapsulated cells inside GMDs [00265] The encapsulated GMDs were placed into chromatography columns that allowed the flow of culture media, providing nutrients for growth and also washing out waste products from cells. The experiment consisted of 4 treatments including the use of seawater, and amendments (inorganic nutrients including trace metals and vitamins, amino acids including trace metals and vitamins, and diluted rich organic marine media). This different set of nutrients provided a gradient to bias different microbial populations. The seawater used as base for the media was filter sterilized through a 1000 kDa and a 0.22 μm filter membrane prior to amendment and introduction to the columns. The cells were then incubated for a period of 17 weeks and cell growth was monitored by phase contrast microscopy. Cell identification was done by 16S rRNA gene sequence of grown colonies.

4. Sorting of GMDs containing colonies consisting of one or more cell types [00266] To identify the diversity and the community composition of the different treatments we performed a "bulk sorting" of the GMDs. This was done by taking a subsample of the GMDs from each column and running them through the flow cytometer. We selected forward- and side-scatter as gating criteria, since occupied GMDs with a colony of 10 or more cells, with individual cell sizes ranging from 0.5 to 5 μm, were easy to discriminate from empty GMDs. We verified each time by phase contrast microscopy that we selected the correct gate for sorting. We then sorted a total of 300 GMDs per individual PCR reaction (prepared as above) and ran the reaction in a thermocycler for a total of 50 to 60 cycles to have enough PCR product to be visualized by gel electrophoresis. The resulting PCR reactions from the same column were combined (2 to 4 replicates), cloned and sequenced as above to assess the phylogenetic diversity from each column and observe the bias effect resulting from the use of different nutrient regimes.

Gene sequencing and phylogenetic analyses [00267] The gene sequences were aligned and compared to our 16S rRNA database with the ARB phylogenetic program. Maximum Parsimony and neighbor joining trees were constructed using the amplified gene sequences (approximately 1400 bp).

EXAMPLE 10: Microextraction Procedure [00268] Single copies of Streptomyces-containing clones from a mixed population are FACS-sorted onto agar, allowed to develop into individual colonies, and bioassayed as individual clones.

Construction Of A Clone Expressing A Bioactive Metabolite

[00269] A genomic library of Streptomyces murayamaensis is constructed in pJO436 (Bierman et al., Gene 1991 116:43-49) vector and hybridized with probes for polyketide synthase. A clone (IB) which hybridized was chosen and shuttled into Streptomyces venezuelae ATCC 10712 strain. The vector pMF17 was also introduced into S. diversa as a negative control. When bioassayed on solid media, clone IB expressed strong bioactivity towards Micrococcus luteus, demonstrating that the insert present in clone IB encoded a bioactive polyketide molecule.

EXAMPLE 11: FACS-sorting of S. venezuelae clones

[00270] The S. venezuelae exconjugant spores containing clone IB, as well as pJO436 vector, are FACS-sorted in 48-well, 96-well, and 384-well format into corresponding plates containing MYM agar + Apramycin 50ug/ml. The single spore clones were allowed to germinate, grow and sporulate for 4-5 days.

Natural product extraction procedure:

[00271] After the clones were fully grown and sporulated for 4-5 days, the following volumes of methanol were added to each well containing the clones.

48 well format: 0.8 ml

96 well format: 0.100 ml

384 well format: 0.06 ml

[00272] The plates were incubated at room temperature overnight.

[00273] The next day, the following volumes were recovered from the wells containing the clones.

48 well format: 0.3 ml

96 well format: 0.060 ml

384 well format: 0.030 ml
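The fraction of solvent recovered in each plate format follows directly from the added and recovered volumes listed above. A small illustrative calculation (the dictionary and function names are hypothetical):

```python
# Methanol volumes per well, in ml, keyed by plate format (values from the text).
ADDED_ML = {48: 0.8, 96: 0.100, 384: 0.06}
RECOVERED_ML = {48: 0.3, 96: 0.060, 384: 0.030}


def recovery_fraction(plate_format: int) -> float:
    """Recovered volume divided by added volume for the given plate format."""
    return RECOVERED_ML[plate_format] / ADDED_ML[plate_format]


if __name__ == "__main__":
    for fmt in sorted(ADDED_ML):
        print(f"{fmt}-well: {100 * recovery_fraction(fmt):.1f}% of solvent recovered")
```

Only part of the solvent is recovered in every format, which is worth keeping in mind when extracts from several wells are combined.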

[00274] The extracts were assayed from a single well, and after combining extracts from 2, 4 and 10 wells. The methanol extract was dried and resuspended in 40 ul of methanol:water, 20 ul of which was assayed against M. luteus as the indicator strain.

[00275] A single colony of S. venezuelae containing clone IB produced enough bioactive molecule, in 48-well, 96-well as well as 384-well format, to be extracted by the microextraction procedure and to be detected by bioassay.

EXAMPLE 12: Expression of actinorhodin pathway in S. venezuelae 10712 [00276] When a Sau3A pIJ2303 library constructed in pJO436 was introduced into S. venezuelae, one exconjugant which appeared blue-grey in color was spotted. This exconjugant showed blue pigment on R2-S agar, demonstrating the successful expression of a heterologous (actinorhodin) pathway in S. venezuelae.

Segregational stability of S. venezuelae 10712 (pJO436::actinorhodin) [00277] Since Streptomyces clones for small molecule production are grown in the absence of antibiotic selection, it was important to determine how stable the S. venezuelae pJO436 recombinant clones are. The S. venezuelae 10712 (pJO436::actinorhodin) clone was used as an example.

[00278] The act clone was grown in R2-S liquid cultures with and without apramycin and total cell count was done by plating on R2-S agar with and without apramycin. The act clone gave 100% and 96% apramycin resistant colonies when grown with and without apramycin, respectively. This demonstrates that S. venezuelae pJO436 clones are quite stable segregationally.

Expression stability of S. venezuelae 10712 (pJO436::actinorhodin) [00279] Expression of the actinorhodin gene cluster in S. venezuelae 10712 has been demonstrated. However, when this clone was grown in liquid cultures it failed to produce actinorhodin, as determined by the absence of its blue color. Nonetheless, when mycelia from such cultures were plated on solid media, actinorhodin producing colonies were clearly evident. The majority of the colonies produced a faint blue color while a few colonies produced abundant actinorhodin. These colonies which produce actinorhodin abundantly have been named HBC (hyper blue clones).

[00280] These observations suggest that in HBC clones a host mutation has occurred which allows very efficient actinorhodin expression. Mutations which could lead to efficient actinorhodin expression could involve a variety of targets, such as elimination of negative regulators like cutRS, over-expression of positive regulators, or efficient expression of pathways which provide precursors for actinorhodin. The hyper production of actinorhodin by the HBC clones thus indicates that it is possible to construct a strain which is more optimized for heterologous expression of small molecules, by random mutagenesis or by specific cutRS knockout mutagenesis.

Construction of a jadomycin blocked mutant of S. venezuelae

[00281] Orf1 of the jadomycin biosynthetic gene cluster was chosen as a target. Primers were designed so as to amplify jad-L and jad-R fragments with proper restriction sites for future subcloning. S. venezuelae is reasonably sensitive to hygromycin and therefore the hygromycin resistance gene will be used to disrupt the orf-1 gene. The strategy used for disrupting the jadomycin orf-1 is described in the attached figure. The hyg-disrupted copy of the orf-1 gene will then be placed on pKC1218 and used for gene replacement in the S. venezuelae 10712, as well as VS153, chromosome.

Expression of the yellow clone in S. venezuelae [00282] The single arm rescue technique to recover the yellow clone insert from S. lividans clone 525Sm575 was described. The recovered clone #3 was mated into S. venezuelae 10712 as well as VS153. Yellow color was evident after several days on both 10712 as well as VS153 plates but absent in the pJO436 vector alone controls. Three 10712 yellow clones were grown in liquid R2-S medium and all three produced yellow color profusely. This experiment has validated S. venezuelae as a host and pJO436 as the vector for heterologous expression for the second time, the first time being with the actinorhodin gene cluster. This yellow clone insert could now be used in validation of different strains in our strain improvement program.

Development of a mating protocol in a microtiter plate format. [00283] In order to have the individual E. coli donor clones archived, we are attempting to develop a mating protocol in a microtiter plate format. According to this protocol, we plan to sort the E. coli library into a 96-well microtiter plate. The matings with S. diversa would then be done on an R2-S agar plate in an array format corresponding to the 96-well microtiter plate containing the E. coli clones. The bioassays can either be conducted on the mating R2-S plate, or the clones can first be replica plated onto another suitable agar plate and then bioassayed. This approach will allow us to go back to the E. coli clones once we detect a bioactive clone among the S. diversa exconjugant library. The E. coli clone can then be mated back into S. diversa for re-transformation and confirmation of the bioactivity.

[00284] In a preliminary experiment, matings were done by spotting S. diversa spores together with E. coli donor cells on an R2-S agar plate (rather than spreading). After about 8 hours the plate was overlayed as usual with apramycin and nalidixic acid. The exconjugants appeared only on those spots where E. coli donor was added, but not on those spots containing S. diversa spores alone. These initial data are very promising, although some more standardization needs to be done to develop this technique fully.

EXAMPLE 13: Production of single cells or fragmented mycelia [00285] In order to produce single cells or fragmented mycelia, 25 ml MYM media (see recipe below) in a 250 ml baffled flask was inoculated with 100 µl of Streptomyces 10712 spore suspension and incubated overnight at 30°C and 250 rpm. After 24 hours of incubation, 10 ml was transferred to a 50 ml conical polypropylene centrifuge tube and centrifuged at 4,000 rpm for 10 minutes at 25°C. The supernatant was decanted and the pellet was resuspended in 10 ml 0.05 M TES buffer. The cells were sorted onto MYM agar plates (1 cell per drop, 5 cells per drop, or 10 cells per drop) and the plates were incubated at 30°C.

[00286] MYM media (Stuttard, 1982, J. Gen. Microbiol. 128:115-121) contains: 4 g maltose, 10 g malt extract, 4 g yeast extract, 20 g agar, pH 7.3, water to 1 L.
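As a worked example of scaling this per-litre recipe to the 25 ml culture volume used above, the following sketch computes component masses for an arbitrary volume. The function name and the option to leave out agar for a liquid culture are assumptions made for illustration; the recipe as stated lists agar among the components.

```python
# Per-litre MYM recipe as listed above (grams per 1 L of water).
MYM_G_PER_L = {
    "maltose": 4.0,
    "malt extract": 10.0,
    "yeast extract": 4.0,
    "agar": 20.0,
}


def scale_mym(volume_ml: float, include_agar: bool = True) -> dict:
    """Return grams of each component needed for `volume_ml` of MYM.

    `include_agar=False` is an assumed convenience for liquid cultures.
    """
    if volume_ml <= 0:
        raise ValueError("volume_ml must be positive")
    return {
        name: grams * volume_ml / 1000.0
        for name, grams in MYM_G_PER_L.items()
        if include_agar or name != "agar"
    }


for name, grams in scale_mym(25, include_agar=False).items():
    print(f"{name}: {grams:.2f} g")
```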

EXAMPLE 14: An exemplary method for the discovery of novel enzymes [00287] The following describes a method for the discovery of novel enzymes requiring large substrates (e.g., cellulases, amylases, xylanases) using the ultra high throughput capacity of the flow cytometer. As these substrates are too large to get into a bacterial cell, a strategy other than single intracellular detection must be employed in order to use the flow cytometer.

[00288] For this purpose, we have adapted the gel microdrop (GMD) technology (One Cell Systems, Inc.). Specifically, the enzyme substrate is captured within the GMD and the enzyme is allowed to hydrolyze the substrate within this microenvironment. However, this method is not limited to any particular gel microdrop technology. Any microdrop-forming material that can be derivatized with a capture molecule can be used.

[00289] The basic experimental design is as follows: Individual bacteria containing DNA libraries are encapsulated within the GMDs and allowed to grow into colonies of hundreds to thousands of cells each. The GMDs are made with agarose derivatized with biotin, which is commercially available (One Cell Systems). After appropriate colony growth, streptavidin is added to serve as a bridge between a biotinylated substrate and the biotin-labeled agarose. Finally, the biotinylated substrate is added to the GMD and captured within the GMD through the biotin-streptavidin-biotin bridge.

[00290] The bacterial cells will be lysed and the enzyme released from the cells. The enzyme will catalyze the hydrolysis of the substrate, thereby increasing the fluorescence of the substrate within the GMD. The fluorescent substrate will be retained within the GMD through the biotin-streptavidin-biotin bridge and thus will allow isolation of the GMD based on fluorescence using the flow cytometer. The entire microdrop will be sorted and the DNA from the bacterial colony recovered using PCR techniques. This technique can be applied to the discovery of any enzyme that hydrolyzes a substrate with the result of an increased fluorescence. Examples include but are not limited to glycosidases, proteases, lipases, ferulic acid esterases, secondary amidases, and the like.

[00291] One system uses a biotin capture system to retain secreted antibodies within the GMD. The system is designed to isolate hybridomas that secrete high levels of a desired antibody. The basic design is to form a biotin-streptavidin-biotin sandwich using the biotinylated agarose, streptavidin, and a biotinylated capture antibody that recognizes the secreted antibody. The "captured" antibody is detected by a fluoresceinated reporter antibody. The flow cytometer is then used to isolate the microdrop based on increased fluorescence intensity. The potentially unique aspect of the method described here is the use of large fluorogenic substrates for the determination of enzyme activity within the GMD. Additionally, this example uses bacterial cells containing DNA libraries instead of eukaryotic cells and is not confined to secreted proteins, as the bacterial cells will be lysed to allow access to the enzymes.

[00292] The fluorogenic substrates can be easily tailored to the particular enzyme of interest. Described below is a specific example of the chemical synthesis of an esterase substrate. Additionally, two examples are given which describe the different possible chemical combinations that can be used to make a wide variety of substrates.

Example of Reaction Sequence Leading to GMD-Attachable Substrate

[00293] In the first step, 1-amino-11-azido-3,6,9-trioxaundecane [Reference 3], an asymmetric spacer, is attached to the N-hydroxysuccinimide ester of 5-carboxyfluorescein (Molecular Probes). After reduction of the azide functional group on the end of the attached spacer (step 2), activated biotin (Molecular Probes) is attached to the amine terminus (step 3), and the sequence is completed by esterification of the phenolic groups of the fluorescein moiety (step 4). The resulting compound can be used as a substrate in screens for esterase activity.

Design of GMD-Attachable Fluorogenic Substrates

[00294] Fluor - core fluorophore structure, capable of forming fluorogenic derivatives, e.g. coumarins, resorufins, xanthenes, and others.

[00295] Spacer - a chemically inert moiety providing connection between the biotin moiety and the fluorophore. Examples include alkanes and oligoethyleneglycols. The choice of the type and length of the spacer will affect synthetic routes to the desired products, physical properties of the products (such as solubility in various solvents), and the ability of biotin to bind to deep pockets in avidin.

[00296] C1, C2, C3, C4 - connector units, providing covalent links between the core fluorophore structure and other moieties. C1 and C2 affect the specificity of the substrates towards different enzymes. C3 and C4 determine stability of the desired product and synthetic routes to it. Examples include ether, amine, amide, ester, urea, thiourea, and other moieties.

[00297] R1 and R2 - functional groups, attachment of which provides for quenching of fluorescence of the fluorophore. These groups determine the specificity of substrates towards different enzymes. Examples include straight and branched alkanes, mono- and oligosaccharides, unsaturated hydrocarbons and aromatic groups.

Design of GMD-Attachable Fluorescence Resonance Energy Transfer Substrates

[00255] Fluor - A fluorophore. Examples include acridines, coumarins, fluorescein, rhodamine, BODIPY, resorufin, porphyrins, etc.

[00256] Quencher - A moiety that is capable of quenching fluorescence of the fluorophore when located at a close enough distance. The quencher can be the same moiety as the fluorophore or a different one.

[00257] Polymer is a moiety consisting of several blocks, a bond between which can be cleaved by an enzyme. Examples include amines, ethers, esters, amides, peptides, and oligosaccharides. C1 and C2 are equivalent to C3 and C4 in the previous design.

[00258] Spacer is equivalent to Spacer in the previous design.

References:
[1] Gray, F., Kenney, J.S., Dunne, J.F. Secretion capture and report web: use of affinity derivatized agarose microdroplets for the selection of hybridoma cells. J. Immunol. Meth. 1995, 182, 155-163.
[2] Powell, K.T. and Weaver, J.C. Gel microdroplets and flow cytometry: Rapid determination of antibody secretion by individual cells within a cell population. Bio/Technology 1990, 8, 333-337.
[3] Schwabacher, A.W.; Lane, J.W.; Schiesher, M.W.; Leigh, K.M.; Johnson, C.W. J. Org. Chem. 1998, 63, 1727-1729.

EXAMPLE 15: An exemplary ultra high throughput screen: a recombinant approach [00259] This example demonstrates an ultra high throughput screen for the discovery of novel anticancer agents. This method uses a recombinant approach to the discovery of bioactive molecules. The examples use complex DNA libraries from a mixed population of uncultured microorganisms that provide a vast source of natural products through recombinant expression from whole gene pathways. The two objectives of this Example are:

[00298] 1) Engineering of mammalian cell lines as reporter cells for cancer targets to be used in an ultra high throughput assay system.

[00299] 2) Detection of novel anticancer agents using an ultra high throughput FACS-based screening format.

[00260] The present invention provides a new paradigm for screening technologies that brings the small molecule libraries and target together in a three dimensional ultra high throughput screen using the flow cytometer. In this format, it is possible to achieve screening rates of up to 10⁸ per day. The feasibility of this system is tested using assays focused on the discovery of novel anti-cancer agents in the areas of signal transduction and apoptosis. Development of a validated assay should have a profound impact on the rate of discovery of novel lead compounds.

Experimental Design and Methods

1. Development of cell lines [00261] The goal of this example is to develop an ultra high throughput screening format that can be used to discover novel chemotherapeutic agents active against a range of molecular targets known to be important in cancers. The feasibility of this approach will be tested using mammalian cell lines that respond to activation of the epidermal growth factor receptor (EGFR) with induction of expression of a reporter protein. The EGFR-responsive cells will be brought together with our microbial expression host within a microdrop (see Example 13 and co-pending U.S. patent 6,280,926, and U.S. application Serial No. 09/894,956, both herein incorporated by reference). These expression hosts will be Streptomyces or E. coli and will contain libraries derived from a mixed population of organisms, i.e. high molecular weight environmental DNA (10-100 kb fragments) cloned into the appropriate vectors and transferred to the host. These large DNA fragments will contain biosynthetic operons, which consist of the genes necessary to produce a bioactive small molecule. A bioactive molecule from the microbial host will elicit a biological response in the mammalian cell, which will induce expression of a fluorescent reporter. The entire microdrop will be individually sorted on the flow cytometer based on fluorescence and the DNA from the host recovered. The mixed population libraries may contain from 10⁴-10¹⁰ clones, including 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, or any multiple thereof.

[00262] An assay based on the EGF receptor was chosen because of its possible role in the pathogenesis of several human cancers. The EGF-mediated signal transduction pathway is very well characterized and several inhibitors of the EGF receptor have been found from natural sources (21,22). The EGFR is one of the early oncogenes discovered (erbB) from the avian erythroblastosis retrovirus and, due to a deletion of nearly all of the extracellular domain, is constitutively active (23). Similar types of mutations have been found in 20-30% of cases of glioblastoma multiforme, a major human brain tumor (24). Overexpression of EGFR correlates with a poor prognosis in bladder cancer (25), breast cancer (26,27), and glioblastoma multiforme (28). Most of these cancers occur in an EGF-secreting background, demonstrating an autocrine growth mechanism in these cancers. Additionally, EGFR is over-expressed in 40-80% of non-small cell lung cancers and EGF is over-expressed in half of primary lung cancers, with patient prognosis significantly reduced in cases with concurrent expression of EGFR and EGF (29,30). For these reasons, inhibitors of the EGF receptor are potentially useful as chemotherapeutic agents for the treatment of these cancers.

[00263] The goal of this experiment is to create mammalian cell lines that serve as reporter cells for anticancer agents. HeLa cells endogenously express the EGFR as confirmed by FACS analysis using the anti-EGFR antibody, Ab-1 (Calbiochem). In contrast, CHO cells have little or no expression of the EGFR. The gene encoding EGFR was obtained from Dr. Gordon Gill (University of California, San Diego) and cloned into the pcDNA3/hygro vector. The resulting vector was transfected into CHO cells and stable transformants were selected with hygromycin. Enrichment of high EGFR-expressing CHO cells was performed through two rounds of FACS sorting using the anti-EGFR antibody. For detection of the activated pathway, a parallel approach is being taken utilizing both the PathDetect system from Stratagene (San Diego, CA) and the Mercury Profiling system from Clontech (San Diego, CA). The PathDetect system has been validated by researchers as a means of detecting mitogenic stimuli (31,32).

[00264] The EGFR is a tyrosine kinase receptor that functions through the MAP-kinase pathway to activate the transcription factor Elk-1 (33). The PathDetect product includes a fusion trans-activator plasmid (pFA-Elk1) that encodes a fusion protein containing the activation domain of the Elk-1 transcription activator and the DNA binding domain of the yeast GAL4. A second plasmid contains a synthetic promoter with five tandem repeats of the yeast GAL4 binding sites that control expression of the Photinus pyralis luciferase gene. The luciferase gene was removed and replaced with the gene encoding the destabilized version of the enhanced green fluorescent protein (EGFP) (plasmid designated pFR-d2EGFP). The two plasmids were transfected together into the EGFR/CHO and HeLa cells at a ratio of 10:1 (pFR-EGFP:pFA-Elk1) and stable transformants were selected using the neomycin resistance gene located on the pFA-Elk1 plasmid. Thus, ligand binding to the EGFR will initiate a signal transduction cascade that results in activation of the Elk1 portion of the fusion protein, allowing the DNA binding domain of the yeast GAL4 to bind to its promoter and turn on expression of EGFP.

[00265] Stimulation in the presence of serum is not surprising as this signal transduction pathway is common to most growth factors and it is likely that many growth factors including EGF are present in the serum. After 24 hours of significant serum starvation, this response is greatly reduced (Figure 2A). The next step will be to selectively stimulate these cells with recombinant EGF (Calbiochem) and isolate the highly responsive single clones using the flow cytometer. These clones will be selected by sorting simultaneously for high levels of GFP and the EGFR. The EGFR will be detected using an anti-EGFR antibody with a secondary antibody labeled with phycoerythrin. This system has the advantage that use of the yeast GAL4 promoter in these cells should keep background or spurious induction of EGFP to a minimum.

[00266] The second group of cell lines uses the Mercury Profiling system to assay the same EGFR pathway. This system responds to activation of the pathway with an increase in the expression of human placental secreted alkaline phosphatase (SEAP). A fluorescent signal will be obtained by the addition of the phosphatase substrate ELF-97-phosphate (Molecular Probes), which yields a bright fluorescent precipitate upon cleavage. The advantage of this approach over the PathDetect system is the ability to amplify the signal through enzyme catalysis for low-level activation of the pathway. This parallel approach will increase the probability of success in finding bioactive compounds. In the Mercury Profiling system, a vector containing the cis-acting enhancer element SRE and the TATA box from the thymidine kinase promoter is used to drive expression of alkaline phosphatase (pTA-SEAP). This system relies on the endogenous transactivators present in the cell, such as Elk-1, to bind the SRE element on the vector and drive expression of SEAP upon stimulation of EGFR. The pTA-SEAP vector was transfected into the EGFR/CHO and HeLa cells and stable transformants were selected using neomycin. Again, stimulation of the pathway occurred in the presence of serum factors in the media. Upon serum starvation, this response was greatly reduced (Figure 2B). Single high expressing clones will be isolated following stimulation with EGF and sorting using a flow cytometer.

Development of ultra high throughput FACS assay [00267] Complex mixed population libraries (>10⁶ primary clones/library) were generated that provide access to the untapped biodiversity that exists in the more than 99% of microorganisms that are uncultivable. These novel libraries require the development of ultra high throughput screening methods to obtain complete coverage of the library. We propose developing an assay using the flow cytometer that allows detection of up to 10⁸ clones/day.
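To put the stated figures in perspective, the following sketch converts a daily screening rate into a sustained event rate and estimates the time needed to screen a library. It assumes the instrument runs continuously for 24 hours and that each clone is examined a chosen number of times (the `coverage` parameter); both are illustrative assumptions and not values taken from the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_second(clones_per_day: float) -> float:
    """Sustained sort rate implied by a daily throughput (continuous running assumed)."""
    return clones_per_day / SECONDS_PER_DAY


def days_to_screen(library_size: float, clones_per_day: float,
                   coverage: float = 1.0) -> float:
    """Days needed to examine every clone `coverage` times on average."""
    if clones_per_day <= 0:
        raise ValueError("clones_per_day must be positive")
    return library_size * coverage / clones_per_day


rate = events_per_second(1e8)            # 10^8 clones/day from the text
days = days_to_screen(1e6, 1e8, coverage=10)
print(f"{rate:.0f} events/s, {days:.2f} days for 10x coverage of 10^6 clones")
```

Under these assumptions a rate of 10⁸ clones per day corresponds to roughly 1,157 events per second, so a library of 10⁶ primary clones could in principle be covered many times over within a single day.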

[00268] In this assay format (Figure 1), an expression host (Streptomyces, E. coli) and a mammalian reporter cell will be co-encapsulated together within a microdrop. The microdrop holds the cells in close proximity to each other and provides a microenvironment that facilitates the exchange of biomolecules between the two cell types. The reporter cell will have a fluorescent readout and the entire microdrop will be run through the flow cytometer for clonal isolation. The DNA from the genes or pathway of interest will subsequently be recovered using in vitro molecular techniques. This assay format will be validated for the discovery of both EGFR inhibitors and small molecules that induce apoptosis. With validation of this format, we will progress to the ultra high throughput screening phase designed to discover novel chemotherapeutic agents active against these important molecular mechanisms underlying tumorigenesis.

[00269] The feasibility of this approach will be analyzed initially using the engineered cell lines described above that respond to activation by EGF with increased expression of a reporter protein (i.e. EGFP or alkaline phosphatase). Additionally, this initial study will use an E. coli host that over-expresses human EGF as a secreted protein directed to the bacterial periplasm (34). This approach will allow us to validate the assay format prior to screening for inhibitors of the EGFR pathway using our E. coli and Streptomyces expression libraries. For this experiment, the engineered cell lines will be co-encapsulated together with the E. coli host at a ratio of one to one. The EGF-expressing bacteria will be allowed to grow and form a colony within the microdrop. Due to the vastly higher growth rate of bacteria, a bacterial colony will form before the eukaryotic cell has divided more than minimally, if at all. This colony will then provide a significantly increased concentration of the bioactive molecule. The bacterial colony will be selectively lysed using the antibiotic polymyxin at a concentration that allows cell survival (35). This antibiotic acts to perforate bacterial cell walls and should result in the release of EGF from these cells without affecting the eukaryotic cell. In the final discovery assays, this lysis treatment should not be necessary as the small molecule products will likely be able to freely diffuse out of the cell. The EGF will activate the signal transduction pathway in the eukaryotic cell and turn on expression of the reporter protein.

[00270] The microdrops will be run through the flow cytometer and those microdrops exhibiting an increased fluorescence will be sorted. The DNA from the sorted microdrops will be recovered using PCR amplification of the insert encoding EGF. For the reporter cells expressing secreted alkaline phosphatase, a couple of additional steps are required to achieve a fluorescent readout. As the enzyme is secreted from the cell, it is possible to prevent the diffusion of the protein from the microdrop by selectively capturing it within the matrix of the microdrop. This can be accomplished by using microdrops made with agarose derivatized with biotin. By forming a sandwich with streptavidin and a biotinylated anti-alkaline phosphatase antibody, it is possible to capture alkaline phosphatase where it can catalyze the conversion of the ELF-97 phosphate substrate within the microdrop (Figure 3A). This technique was successfully developed by One Cell Systems for the isolation of high expressing hybridomas (36, 37). In our hands, with the encapsulation of the SEAP-expressing cells, we have shown that upon addition of the ELF-97 phosphate substrate, a fluorescent precipitate forms within the microdrop (Figures 3B and 3C).

[00300] Initial experiments demonstrate the feasibility of co-encapsulating E. coli and mammalian cells (e.g., CHO) within microdrops. Microdrops were formed using 3% agarose dropped in oil and blended at 2600 rpm. The E. coli and CHO cells were encapsulated at a ratio of 1:1 (Figure 4A). After 6 hours, the single bacterial cell grew into a colony containing thousands of cells (Figure 4B). The cells within the microdrops were stained with propidium iodide to determine viability, and approximately 70-85% of the CHO cells remained viable after 24 hours. Subsequent steps include determining the response of encapsulated clonal EGF-responsive mammalian cells to varying concentrations of EGF in the presence and absence of EGFR inhibitors such as Tyrphostin A46 or Tyrphostin A48 (Calbiochem). In addition, E. coli clones producing high levels of secreted EGF will be isolated using the Quantikine human EGF immunoassay (R&D Systems). Finally, these two cell types will be brought together within the microdrop and a change in fluorescence of the eukaryotic cell will be analyzed on the flow cytometer in the presence and absence of the EGFR inhibitors. A positive result in this experiment would be an increase in fluorescence that can be blocked by the EGFR inhibitors.
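The text specifies only the 1:1 cell ratio, not how cells distribute among microdrops. As a rough sketch, if one assumes that each cell type is loaded independently and that the number of cells per microdrop follows a Poisson distribution (a common modeling assumption for random encapsulation, not something stated here), the fraction of usable microdrops can be estimated as follows. The mean occupancies used are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k cells in a drop under Poisson loading."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def p_one_of_each(mean_bacteria: float, mean_reporter: float) -> float:
    """Exactly one bacterium and exactly one reporter cell in the same drop."""
    return poisson_pmf(1, mean_bacteria) * poisson_pmf(1, mean_reporter)


def p_clonal_with_reporter(mean_bacteria: float, mean_reporter: float) -> float:
    """Exactly one bacterium (clonal colony) and at least one reporter cell."""
    return poisson_pmf(1, mean_bacteria) * (1.0 - poisson_pmf(0, mean_reporter))


# Hypothetical loading: on average one cell of each type per microdrop.
print(f"one of each:          {p_one_of_each(1.0, 1.0):.3f}")
print(f"clonal with reporter: {p_clonal_with_reporter(1.0, 1.0):.3f}")
```

Under this model only a minority of microdrops contain exactly one cell of each type even at a 1:1 input ratio, which is one reason occupancy is typically checked by microscopy, as in Figure 4A.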

[00301] The next step will be to mix the EGF-expressing E. coli with non-expressing cells at varying ratios from 1:1,000 to 1:1,000,000 to mimic the conditions of a mixed population library discovery screen. The bacterial mixtures and the mammalian cells will be co-encapsulated as described above. The highly fluorescent microdrops will be individually sorted by the flow cytometer. To confirm a positive hit, the DNA will be recovered by PCR amplification using primers directed against the EGF gene. To improve the signal to noise ratio, it will likely be necessary to perform several rounds of enrichment before isolation of positive EGF-expressing clones, especially for the higher mixture ratios.
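The arithmetic behind these spiking experiments can be sketched as below. The sketch assumes one bacterial cell per microdrop and a constant fold-enrichment per sorting round; the enrichment factor and target purity are hypothetical parameters chosen for illustration and are not reported in the text.

```python
def positive_fraction(ratio_denominator: int) -> float:
    """Fraction of expressing cells in a 1:R mixture (1 expressing to R non-expressing)."""
    return 1.0 / (ratio_denominator + 1)


def expected_positive_drops(n_drops: int, ratio_denominator: int) -> float:
    """Expected number of positive microdrops, assuming one bacterium per drop."""
    return n_drops * positive_fraction(ratio_denominator)


def p_at_least_one_positive(n_drops: int, ratio_denominator: int) -> float:
    """Probability that n_drops microdrops include at least one positive."""
    return 1.0 - (1.0 - positive_fraction(ratio_denominator)) ** n_drops


def rounds_to_enrich(ratio_denominator: int, fold_per_round: float,
                     target_fraction: float = 0.9) -> int:
    """Sorting rounds needed if each round multiplies the odds by fold_per_round."""
    if fold_per_round <= 1 or not 0 < target_fraction < 1:
        raise ValueError("need fold_per_round > 1 and 0 < target_fraction < 1")
    odds = 1.0 / ratio_denominator
    target_odds = target_fraction / (1.0 - target_fraction)
    rounds = 0
    while odds < target_odds:
        odds *= fold_per_round
        rounds += 1
    return rounds


for r in (1_000, 1_000_000):
    print(r, rounds_to_enrich(r, fold_per_round=100))
```

With an assumed 100-fold enrichment per round, the 1:1,000 mixture would need two rounds and the 1:1,000,000 mixture four rounds to exceed 90% positives, which is consistent with the expectation that the higher mixture ratios need more rounds.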

[00302] In this case, the microdrops will first be sorted in bulk, the microdrop material removed with GELase (Epicentre Technologies) and the bacteria allowed to grow. The encapsulation protocol will be repeated with fresh eukaryotic cells until a highly enriched population is observed. At this point, single microdrops will be isolated and recovery of the EGF-expressing clone confirmed by PCR. With validation of this assay, the goal will be to screen for inhibitors of the EGFR using our mixed population libraries expressed in optimized E. coli and Streptomyces hosts. This assay will be done in the presence of EGF and the assay endpoint will be a decrease in fluorescence. This format is not limited to EGFR inhibitors, as any protein within this pathway could be inhibited and would appear positive in this screen. Likewise, this screen can also be adapted to the multitude of anti-cancer targets that are known to regulate gene expression. In fact, using the present system, with the addition of the appropriate receptors, it would be possible to screen for inhibitors of other growth factors such as PDGF and VEGF.

[00303] If an increase in fluorescence is not observed with co-encapsulation of the EGF-expressing cells and the mammalian reporter cell, there could be several reasons. First, it is possible that the EGF diffuses out of the cell too quickly to elicit a response. In this case, it will be necessary to modify the microdrops to limit diffusion and concentrate the bioactive molecule at the site of the reporter cell. It is also possible that in the specific case of the EGF assay, the cells will not continue to produce EGF after polymyxin treatment and thus, the incubation time of the reporter cells with EGF will be minimal. This is unlikely as the polymyxin treatment used will be at concentrations well below that which produces decreased cell viability. However, if EGF is not continually expressed in this system, other permeabilization methods will be explored that do not significantly affect cell metabolism, such as the bacteriocin release protein (BRP) system (Display Systems Biotech). The BRP opens the inner and outer membranes of E. coli in a controlled manner enabling protein release into the culture medium. This system can be used for large-scale protein production in a continuous culture and thus should be compatible with cell survival.

[00304] Apoptosis, or programmed cell death, is the process by which the cell undergoes genetically determined death in a predictable and reproducible sequence. This process is associated with distinct morphological and biochemical changes that distinguish apoptosis from necrosis. The malfunctioning of this essential process can often lead to cancer by allowing cells to proliferate when they should either self-destruct or stop dividing. Thus, the mechanisms underlying apoptosis are currently under intense scrutiny from the research community and the search for agents that induce apoptosis is a very active area of discovery.

[00305] The present invention provides an assay for the discovery of apoptotic molecules using our ultra high throughput encapsulation technology. The source of these small molecules will be our extremely complex mixed population libraries expressed in Streptomyces and E. coli host strains. These host strains will be co-encapsulated together with a eukaryotic reporter cell; the small molecule will be produced in the bacterial strain and will act on the mammalian reporter cell, which will respond by induction of apoptosis. Apoptosis will be detected using a fluorescent marker, the entire microdrop sorted using the flow cytometer, and the DNA of interest recovered. The feasibility of this assay will be determined using our optimized Streptomyces host strain, S. diversa, co-encapsulated with an apoptotic reporter cell derived from human T cell leukemia (e.g., Jurkat cells). The pathway controlling production of the anti-tumor antibiotic, bleomycin, will be cloned into S. diversa as the source of an apoptosis-inducing agent. The readout for induction of apoptosis in Jurkat cells will be obtained using the fluorescent marker, Alexa 488-annexin V™.

[00306] The bleomycin group of compounds are anti-tumor antibiotics that are currently being used clinically in the treatment of several types of tumors, notably squamous cell carcinomas and malignant lymphomas. However, widespread use of bleomycin congeners has been limited due to early drug resistance and the pulmonary toxicity that develops concurrent with administration of this drug. Thus, there is continuing effort to find novel small molecules with better clinical efficacy and lower toxicity. Bleomycin congeners are peptide/polyketide metabolites that function by binding to sequence selective regions of DNA and creating single and double stranded DNA breaks. Several in vitro and in vivo assays have shown that bleomycin induces apoptosis in eukaryotic cells (43-45). The biosynthetic gene cluster encoding the production of bleomycin has recently been cloned from Streptomyces verticillus and is encoded on a contiguous 85 kb fragment (46). We propose to clone this pathway into a BAC vector to use as a source of apoptotic agents in eukaryotic cells. A library will be made from the S. verticillus ATCC 15003 strain and cloned into the BAC vector, pBlumate2. As the sequence for this pathway is known, probes will be designed against sequences from the 5' and 3' ends of the pathway. The library will be introduced into E. coli and screened using colony hybridization with the probe generated against one end of the pathway. Positive clones will subsequently be screened with the second probe to identify which clones contain the entire pathway. Clones containing the complete pathway will be transferred into our optimized expression host S. diversa by mating. Expression of bleomycin will be detected using whole cell bioassays with Bacillus subtilis.
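The two-probe screening logic described above amounts to keeping only clones that hybridize to both end probes. A minimal sketch follows; the clone identifiers and function names are hypothetical and serve only to illustrate the bookkeeping.

```python
def full_pathway_candidates(five_prime_hits, three_prime_hits):
    """Clones positive for both end probes, i.e. candidates spanning the whole pathway."""
    return sorted(set(five_prime_hits) & set(three_prime_hits))


def partial_clones(five_prime_hits, three_prime_hits):
    """Clones positive for only one end probe (insert covers just part of the pathway)."""
    return sorted(set(five_prime_hits) ^ set(three_prime_hits))


# Hypothetical colony hybridization results (clone IDs are made up).
hits_5p = {"BAC-003", "BAC-017", "BAC-042"}
hits_3p = {"BAC-017", "BAC-042", "BAC-101"}

print(full_pathway_candidates(hits_5p, hits_3p))
print(partial_clones(hits_5p, hits_3p))
```

A clone positive for both probes is only a candidate: hybridization at both ends does not by itself rule out internal rearrangements, so expression testing, such as the bioassay mentioned above, remains the confirmation step.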

[00307] Jurkat cells are the classic human cell line used for studies of apoptosis. The fluorescent Alexa 488 conjugate of annexin V (Molecular Probes) will be used as the marker of apoptosis in these cells. Annexin V binds to phosphatidylserine molecules normally located on the internal portion of the membrane in healthy cells. During early apoptosis, this molecule flips to the outer leaf of the membrane and can be detected on the cell surface using fluorescent markers such as the annexin V conjugates. The bleomycin-induced apoptotic response in Jurkat cells will initially be characterized by varying both the concentration of the exogenously administered drug and the incubation time with the drug. Alexa 488-annexin V will then be added to the cells and the level of fluorescence analyzed on the flow cytometer. Necrotic cell death will be determined using propidium iodide and the apoptotic population will be normalized to this value.

[00308] Co-encapsulation of S. diversa with CHO cells within microdrops produced very similar results to the E. coli co-encapsulation. S. diversa grew well in the eukaryotic media and the CHO cell survival rate was high after 24 hours. In this experiment, the S. diversa clone expressing bleomycin will be co-encapsulated with the Jurkat cell line. S. diversa will be allowed to grow into a colony within the microdrop and begin production of bleomycin. The microdrops will be periodically analyzed over time for induction of apoptosis using the Alexa 488-annexin V conjugate on the microscope and flow cytometer. After noting the time for induction of apoptosis, a mixing experiment similar to that described for the EGF experiment will be performed. Bleomycin-expressing and non-expressing cells will be mixed together at ratios of 1:1,000 to 1:1,000,000. Co-encapsulation of the mixtures with Jurkat cells will be performed and the appropriate incubation time maintained. These microdrops will then be stained with Alexa 488-annexin V and sorted on the flow cytometer. Confirmation of a positive bleomycin-expressing sorted clone will be performed by PCR amplification of a portion of the pathway. Again, it is likely that enrichment of these mixtures will be necessary using a few rounds of bulk sorting on the flow cytometer.

[00309] If no apoptosis is observed in the initial assay, confirmation of bleomycin production will be performed by sorting of the encapsulated S. diversa clone into 1536 well plates. After a predetermined incubation period, the supernatant will be removed and spotted on filter disks for whole cell bioassays using the susceptible strain B. subtilis. Use of the 1536 well plates will hopefully avoid significant dilution of the antibiotic in the media. As cloning of the bleomycin pathway is quite recent, it has not yet been heterologously expressed from the complete pathway. However, Du et al demonstrated the heterologous bioconversion of the inactive aglycones into active bleomycin congeners by cloning a portion of the pathway into a S. lividans host (46). If bleomycin expression is not detectable in our assay, we will employ a similar strategy using our host strain S. diversa. If little bleomycin production is detected under these conditions, it will be necessary to optimize the culture conditions for S. diversa to induce pathway expression within the microdrop. On the other hand, if bleomycin is produced but apoptosis is not observed, it is possible that the molecule is diffusing away from the microdrop too quickly and it will be necessary to optimize the microdrop technology to concentrate the metabolite at the site of the reporter cell.

Optimization of S. diversa secondary metabolite expression in microdrops

[00310] Induction of pathway expression is an issue that is not limited to the bleomycin example. Bioactive small molecules within microorganisms are often produced to increase the host's ability to survive and proliferate. These compounds are generally thought to be nonessential for growth of the organism and are synthesized with the aid of genes involved in intermediary metabolism, hence the name "secondary metabolites." Thus, the pathways controlling expression of these secondary metabolites are often regulated under non-optimal conditions such as stress or nutrient limitation. As our system relies on use of the endogenous promoters and regulators, it might be necessary to optimize conditions for maximal pathway expression.

[00311] There are several methods that can be used to optimize for increased pathway expression within the microdrops. For easy detection of maximal expression, we will construct a transposon containing a promoter-less GFP. The enhanced GFP optimized for eukaryotes will be used, as it has a codon bias for high GC organisms. Transposition into a known pathway (e.g., actinorhodin) will be done in vitro and the vector containing the pathway purified. The transposants will be introduced into an E. coli host, screened for clones that express GFP, and positive clones isolated on the flow cytometer. With the transfer of the promoter-less gene for GFP into the pathway, increased fluorescence within the cells would demonstrate transcription of the pathway using the endogenous promoters located within the pathway. This clone will be used as a tool for quick detection of upregulation in pathway expression due to changes in the experimental conditions.

[00312] The S. diversa clone containing GFP and the actinorhodin pathway will be encapsulated in the microdrops and several different growth conditions will be tested, e.g., conditioned media, nutrient limiting media, known inducing factors, varying incubation times, etc. The microdrops will be analyzed under the microscope and on the flow cytometer to determine which conditions produce optimal expression of the pathway. These conditions will be verified for viability in eukaryotic cells as well. These optimized growth conditions will be confirmed using the bleomycin pathway to assess production of the secondary metabolite. Additionally, whole cell optimization of S. diversa is ongoing with production of strains that are missing different pleiotropic regulators that often negatively impact secondary metabolite production. As these strains are developed, they will be analyzed in the microdrops for enhanced pathway expression.

[00313] The proximity of the two cell types within the microdrop should result in a high concentration of the bioactive molecule at the site of the reporting cell. However, if rapid diffusion of the molecule from the microdrop prevents detection of the desired signal, it will be necessary to optimize the microdrop protocol or develop a new encapsulation technology. Concentration of the molecule at the site of the reporter cell could be achieved by a reduction in the microdrop pore size. Pore size reduction can be accomplished by one or a combination of the following approaches:

(i) "plugging" the holes with particles of an appropriate size, which are held in the pores by non-covalent or covalent interactions;
(ii) cross-linking of the microdrop-forming polymer with low molecular weight agents;
(iii) creation of an external shell around the microdrop with pores of smaller size than those in the current microdrop.

Plugging the pores can be accomplished using polydisperse latexes with particles sized to fit within the pores of the microdrop. Latex particles may be modified on their surface such that they are attracted to the microdrop-forming polymer. For example, agarose-based microdrops carry a negative electrostatic charge on the surface. Thus, amidine-modified polystyrene latex particles (Interfacial Dynamics Corporation) will be attracted to the microdrop surface, and the latex particles will effectively plug the microdrop pores provided that the charge density on the latex particles and the microdrop surface is high enough to sustain strong electrostatic bonds.

Cross-linking of agarose beads can be achieved by treating them with various reagents according to known procedures (47). For our purposes, the cross-linking needs to occur only on the surface of the microdrop. Thus, it may be advantageous to use polymers carrying reactive groups for cross-linking of agarose, such that permeation of the cross-linking agent inside the microdrop is prevented.

Formation of classical (48) or polymerizable liposomes (49,50) around microdrops would provide a shell that could be an effective barrier even to small molecules. A wide variety of precursors for such liposomes as well as methods for their preparation have been reported (48-50) and most of them are applicable for our purposes. One of the possible limitations in choice of precursors stems from the intended use of microdrops for eventual screening by the flow cytometer. Thus, the liposomes should not absorb in the visible part of the spectrum.

[00314] It might also be necessary to use alternative methods and materials for preparation of the microdrops. Encapsulation of cells in polyacrylamide, alginate, fibrin, and other gel-forming polymers has been described (51). Another plausible candidate for encapsulation material is silica gel, which can be formed under physiological conditions with the assistance of enzymes (silicateins) (52) or enzyme mimetics (53). Additionally, various polymers may be used as the material for microdrop construction. Microdrops may be formed either upon polymerization of monomers (i.e. water-soluble acrylates or methacrylates) or upon gelation and/or cross-linking of preformed polymers (polyacrylates, polymethacrylates, polyvinyl alcohol). Since the formation of microdrops occurs simultaneously with encapsulation of living cells, such formation has to proceed under conditions compatible with cell survival. Thus, the precursors for microdrops (monomers or non-gelated polymers) should be soluble in aqueous media at physiological conditions and capable of the transformation into the microdrop material without any significant participation and/or emission of toxic compounds.

Example 16: Identification of a Novel Bioactivity or Biomolecule of Interest by Mass Spectroscopic Screening

[00315] An integrated method for the high throughput identification of novel compounds derived from large insert libraries by Liquid Chromatography - Mass Spectrometry was performed as described below.

[00316] A library from a mixed population of organisms was prepared. An extract of the library was collected. Extracts from the libraries were either pooled or kept separate. Control extracts, without a bioactivity or biomolecule of interest, were also prepared.

[00317] Rapid chromatography was used with each extract, or combination of extracts, to aid the ionization of the compounds in the extract. Mass spectra were generated for the natural product expression host (e.g., S. venezuelae) and vector alone (e.g., pJO436) system. Mass spectra were also generated for the host cells containing the library extracts, alone or pooled. The spectra generated from multiple runs of either the background samples or the library samples were combined within each set to create a composite spectrum. Composite spectra may be generated by using a percentage occurrence of an average intensity of each binned mass per time period or by using multiple aligned single mass spectra over a time period. By using a redundant sampling method where each sample was measured several times in the presence of other extracts, the novel signals that consistently occurred within a sample extract but not within the background spectra were determined.
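The percentage-occurrence route to a composite spectrum can be sketched as follows. This is a minimal illustration in Python rather than the Matlab of the listings further below. The function names are hypothetical. The starting mass of 109.8, the one-mass-unit bins, and the 40 percent cutoff are borrowed from the Summary.m listing, and each run is assumed to be a simple list of (m/z, intensity) pairs.

```python
from collections import Counter


def bin_masses(peaks, start_mass=109.8, width=1.0):
    """Return the set of bin indices that hold a positive signal in one run."""
    bins = set()
    for mz, intensity in peaks:
        if mz >= start_mass and intensity > 0:
            bins.add(int((mz - start_mass) // width))
    return bins


def composite_by_occurrence(runs, cutoff_percent=40.0, start_mass=109.8, width=1.0):
    """Map the lower edge of each mass bin to its percent occurrence across runs.

    Only bins seen in at least cutoff_percent of the runs are kept
    (assumed cutoff, taken from CutoffPercent in the listing).
    """
    counts = Counter()
    for peaks in runs:
        counts.update(bin_masses(peaks, start_mass, width))
    composite = {}
    for idx in sorted(counts):
        percent = 100.0 * counts[idx] / len(runs)
        if percent >= cutoff_percent:
            composite[round(start_mass + idx * width, 1)] = percent
    return composite


if __name__ == "__main__":
    # Three hypothetical runs of the same extract.
    runs = [
        [(200.1, 50), (300.2, 100)],
        [(200.3, 80), (300.4, 10)],
        [(200.2, 30)],
    ]
    print(composite_by_occurrence(runs))
```

Counting each bin at most once per run keeps the composite insensitive to a single unusually intense scan, which is the point of scoring occurrence rather than summed intensity.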

[00318] The host-vector background spectrum was compared to the mass spectra obtained from large insert library clone extracts. Extra peaks observed in the large insert library clone extracts were considered as novel compounds and the cultures responsible for the extracts were selected for scale culture so the compound can be isolated and identified.
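The comparison step above amounts to a set difference between occupied mass bins. The sketch below is illustrative only: `extra_peaks` is a hypothetical name, the masses are nominal integers, and the optional tolerance for mass drift is an assumption that does not appear in the original description.

```python
def extra_peaks(sample_bins, background_bins, tolerance=0):
    """Return sample mass bins with no background bin within +/- tolerance."""
    background = set(background_bins)
    extras = []
    for mass in sorted(set(sample_bins)):
        neighbours = range(mass - tolerance, mass + tolerance + 1)
        if not any(m in background for m in neighbours):
            extras.append(mass)
    return extras


if __name__ == "__main__":
    background = [200, 300]      # hypothetical host-vector composite
    clone = [200, 301, 455]      # hypothetical library clone composite
    print(extra_peaks(clone, background))
    print(extra_peaks(clone, background, tolerance=1))
```

Clones that return a non-empty list would be the candidates carried forward for larger-scale culture.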

Novel metabolite identification by mass spectroscopic screening.

[00319] An integrated method for the high throughput identification of novel compounds derived from large insert libraries by LC-MS is described below. Liquid chromatography-mass spectrometry is used to determine the background mass spectra of the natural product expression host (e.g., S. diversa DS10 or DS4) and vector alone (e.g., pmΩJ) system. This host-vector background spectrum is compared to the mass spectra obtained from large insert library clone extracts. Extra peaks observed in the large insert library clone extracts are considered as novel compounds and the cultures responsible for the extracts are selected for scale culture so the compound can be isolated and identified.

[00320] In order to create the background and sample spectra, rapid chromatography is used to aid the ionization of the compounds in the extract. The spectra generated from multiple runs of either the background samples or the library samples are combined within each set to create a composite spectrum. Composite spectra may be generated by using a percentage occurrence of an average intensity of each binned mass per time period or by using multiple aligned single mass spectra over a time period. Using a redundant sampling method, whereby each sample is measured several times in the presence of other extracts, the novel signals that consistently occur within a sample extract but are not present in the background spectra can be determined. The purpose of this invention is to identify novel compounds produced by recombinant genes encoding biosynthetic pathways without relying on the compounds having bioactivity. This detection method is expected to be more universal than bioactivity for identifying novel compounds.
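The second route, combining multiple aligned single mass spectra, can be sketched as follows, loosely following the steps listed in the comments of the Aline1.m listing: find the first maximum of each profile, shift each profile to the average maximum position, then take the point-by-point mean and standard deviation. This is a hedged Python illustration and not the authors' program. The function names are hypothetical, each profile is assumed to be an equal-length list of intensities for one mass over time, and zero-filling the shifted ends is an assumption.

```python
from statistics import mean, pstdev


def first_maximum(profile):
    """Index of the first point higher than both neighbours, or None.

    Edges are padded as in the listing: 0 before the first point,
    the current value after the last point.
    """
    for i, cur in enumerate(profile):
        prev = profile[i - 1] if i > 0 else 0.0
        nxt = profile[i + 1] if i + 1 < len(profile) else cur
        if prev < cur and cur > nxt:
            return i
    return None


def align_and_average(profiles):
    """Shift profiles so their first maxima coincide, then average them."""
    maxima = [first_maximum(p) for p in profiles]
    found = [m for m in maxima if m is not None]
    if not found:
        return None
    target = round(mean(found))          # average maximum position
    width = len(profiles[0])
    shifted = []
    for profile, peak in zip(profiles, maxima):
        shift = 0 if peak is None else target - peak
        row = [0.0] * width              # zero-fill positions shifted in
        for i, value in enumerate(profile):
            j = i + shift
            if 0 <= j < width:
                row[j] = value
        shifted.append(row)
    mean_profile = [mean(col) for col in zip(*shifted)]
    sd_profile = [pstdev(col) for col in zip(*shifted)]
    return target, mean_profile, sd_profile


if __name__ == "__main__":
    runs = [[0, 1, 5, 1, 0], [0, 0, 1, 5, 1], [1, 5, 1, 0, 0]]
    print(align_and_average(runs))
```

A measured sample profile could then be compared against the mean profile within some multiple of the standard deviation; the appropriate multiple is not specified in the text and would have to be chosen empirically.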

[00321] Currently there is a similar method of examining culture mixtures by LC-MS that uses long chromatographic times (30-60 min) to bring compounds to a fairly high level of purity. This method relies on molecular weight searches for de-replication of known compounds. This slow method would also work to identify novel compounds in S. diversa libraries; however, the throughput would be inadequate for the number of samples we need to screen. There is also a pair of publications describing rapid direct-infusion analysis of samples to identify fermentation conditions that improve the biosynthetic productivity of strains. That method does not identify specific compounds; it only correlates greater, more complex production with different culture conditions. Shown below are the following:

1. Chromatographic gradient and mass spec conditions
• HPLC and MS setting for Mass Spec Screening.TXT
2. Pooling of samples sheet
• Sampling Strategy.htm
3. Sample flow using average method
• Mass Spec Screening Flow chart.doc
4. Matlab code for original average background
• Mass Spec Screening Summaryβ Matlab code.txt
5. Matlab code under development for new single aligned peaks background determination for more accurate data analysis
• Mass Spec Screening 2nd Data Analysis Program.txt

[00322] The method is best practiced with a set of control extracts and sample extracts. Mixing of the compounds in pools prior to analysis and deconvolution of the mixed extract pools will provide high throughput while maintaining the ability to measure each extract several times.
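One way to picture the pooling and deconvolution described above is a design in which every extract is placed in several pools, and a signal is attributed to an extract only when it appears in every pool containing that extract and is absent from the background. The Python sketch below is an illustration under those assumptions. The pooling layout, the function name `deconvolve`, and the masses are hypothetical, and the actual sampling sheet referenced in the list is not reproduced here.

```python
def deconvolve(pool_members, pool_signals, background=frozenset()):
    """Attribute signals to extracts from pooled measurements.

    pool_members: pool name -> set of extract names mixed into that pool
    pool_signals: pool name -> set of nominal masses observed in that pool
    background:   masses seen in the host-vector control
    """
    extracts = set().union(*pool_members.values())
    assigned = {}
    for extract in sorted(extracts):
        pools = [p for p, members in pool_members.items() if extract in members]
        common = set.intersection(*(set(pool_signals[p]) for p in pools))
        assigned[extract] = sorted(common - set(background))
    return assigned


if __name__ == "__main__":
    # Hypothetical 2 x 2 grid: each extract sits in one row pool and one column pool.
    members = {
        "row1": {"A", "B"}, "row2": {"C", "D"},
        "col1": {"A", "C"}, "col2": {"B", "D"},
    }
    signals = {
        "row1": {200, 455}, "row2": {200},
        "col1": {200, 455}, "col2": {200},
    }
    print(deconvolve(members, signals, background={200}))
```

In the grid above each pair of extracts shares at most one pool, so a signal present in both pools of one extract cannot be explained by any single other extract. Larger designs would need the same property to deconvolve cleanly.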

[00323] A secondary screen may be required to eliminate false positives.

[00324] This method is more specific for identifying potential novel compounds by molecular ion than current methods. It uses a different data analysis strategy than the de-replication methods for the identification of specific peaks for new compounds in extracts. Using the molecular ion as a signal to collect on, this method may be coupled to mass-based collection methods for the rapid isolation of compounds.

Related references:

"Rapid Method to Estimate the Presence of Secondary Metabolites in Microbial", Higgs, R.E.; Zahn, et al., Appl. Environ. Microbiol. 67:371-376.

"Use of direct-infusion electrospray mass spectrometry to guide empirical development of improved conditions for expression of secondary metabolites from Actinomycetes", Zahn, et al., Appl. Environ. Microbiol. 67:377-386.

"A general method for the de-replication of flavonoid glycosides utilizing high performance liquid chromatography mass spectrometric analysis", Constant, et al., Phytochemical Analysis, 1997, 8:176-180.

Method Information
Gradient column analysis of crude extracts by positive ion mode.

1100 Quaternary Pump 1

Control
Column Flow : 1.000 ml/min
Stoptime : 4.00 min
Posttime : Off

Solvents
Solvent A : 98.0 % (Water)
Solvent B : 0.0 % (MeOH)
Solvent C : 2.0 % (AcCN)
Solvent D : 0.0 % (iPrOH)

PressureLimits
Minimum Pressure : 0 bar
Maximum Pressure : 400 bar

Auxiliary
Maximal Flow Ramp : 100.00 ml/min^2
Primary Channel : Auto
Compressibility : 100*10^-6/bar
Minimal Stroke : Auto

Store Parameters
Store Ratio A : Yes
Store Ratio B : Yes
Store Ratio C : Yes
Store Ratio D : Yes
Store Flow : Yes
Store Pressure : Yes

Agilent 1100 Contacts Option
Contact 1 : Open
Contact 2 : Open
Contact 3 : Open
Contact 4 : Open

Timetable
Time    Solv.B   Solv.C   Solv.D   Flow    Pressure
0.00    0.0      2.0      0.0      1.000
0.01    0.0      2.0      0.0
0.30    0.0      95.0     0.0
1.50    0.0      95.0     0.0
1.60    0.0      2.0      0.0
4.00    0.0      2.0      0.0

Agilent 1100 Contacts Option Timetable
Timetable is empty

Agilent 1100 Diode Array Detector 1

Signals
Signal   Store   Signal, Bw [nm]   Reference, Bw [nm]
A:       Yes     215   4           450   100
B:       No      254   4           450   100
C:       No      280   4           450   100
D:       No      250   16          Off
E:       No      280   16          Off

Spectrum
Store Spectra : Apex + Baselines
Range from : 190 nm
Range to : 600 nm
Range step : 2.00 nm
Threshold : 1.00 mAU

Time
Stoptime : As pump
Posttime : Off

Required Lamps
UV lamp required : Yes
Vis lamp required : Yes

Autobalance
Prerun balancing : Yes
Postrun balancing : No
Margin for negative Absorbance : 100 mAU

Peakwidth : > 0.1 min

Slit : 4 nm

Analog Outputs
Zero offset ana. out. 1 : 5 %
Zero offset ana. out. 2 : 5 %
Attenuation ana. out. 1 : 1000 mAU
Attenuation ana. out. 2 : 1000 mAU

Mass Spectrometer Detector

General Information

Use MSD Enabled

Ionization Mode APCI

Tune File atunes.tun

StopTime asPump

Time Filter Enabled

Data Storage Condensed

Peakwidth 0.15 min

Scan Speed Override Disabled

Signals [Signal 1]

Polarity : Positive

Fragmentor Ramp : Disabled

Polarity : Positive

Fragmentor Ramp : Disabled

[MSZones]
Gas Temp 350 C maximum 350 C
Vaporizer 375 C maximum 500 C
DryingGas 3.0 l/min maximum 13.0 l/min
Neb Pres 60 psig maximum 60 psig
VCap (Positive) 3000 V
VCap (Negative) 3000 V
Corona (Positive) 4.0 μA
Corona (Negative) 15 μA

FIA Series
FIA Series in this Method : Disabled

Time Setting
Time between Injections : 1.00 min

Agilent 1100 Column Thermostat 1

Temperature settings
Left temperature 35.0°C
Right temperature Same as left
Enable analysis When Temp. is within setpoint +/- 0.8°C
Store left temperature Yes
Store right temperature No

Time
Stoptime : As pump
Posttime : Off

Column Switching Valve : Column 2
Timetable is empty

[00325] During the process, create a background file by looking for a certain percentage signal occurrence per mass unit. Use the Summary.m program to create this background spectra for use later in step 5 below.

clear dir
CompressCount=1;
TestFileData=[12 34 45 56 67]
MasterDir='C:\HPCHEM\1\DATA\MS20FEBA\IND4TST'; % User inputed directory containing other directories with files
cd(MasterDir);
MasterDirFiles = dir % Load all files in master directory to one variable.
TotalFiles = size(MasterDirFiles)
Original_Files='Original Files';
X=990099

% Loop to create compressed directory listing containing only directories.
for ExtractDir=1:TotalFiles(1,1) % Look through find directories in master directory
    if MasterDirFiles(ExtractDir).isdir==1 % Test each dir item to see if it is a directory
        Is_Original_Files=strcmp(MasterDirFiles(ExtractDir).name, Original_Files);
        if not(Is_Original_Files)
            CompressedDirList(CompressCount).name = MasterDirFiles(ExtractDir).name; % assign new directories.
            CompressCount=CompressCount+1; % Increment count compressed directories
        end
    end
end
CompressCount
TotalDirectories=size(CompressedDirList);
CompressCount=1;
for CompressCount=3:TotalDirectories(1,2) % Main loop for moving in and out of directories.
    CurrentDirectory = CompressedDirList(CompressCount).name;
    cd(CurrentDirectory);
    FileNameStub=char(pwd)
    % Loop to replace backslash in directory names to dash so directory names can be labels
    i=0;
    FileNameLength= size(FileNameStub)
    for i=1:FileNameLength(1,2)
        if FileNameStub(1,i)=='\'
            FileNameStub(1,i)='-'
        end
    end
    ListOfCsvFiles=dir('*.csv')
    PrintHistograms=0; % 1 means print histogram, 0 means no print.
    % Whether they are printed or not the files will be saved.
    spectra=[]; % Clear spectra
    mass=109.8 % Initial starting mass.
    CutoffPercent=40; % Cutoff percent to check if peak is consistently present
    spectra=dlmread(ListOfCsvFiles(1).name); % Loads first item in dir call into spectra
    sizespectra=size(spectra); % Determines size of first spectra loaded.
    master=[]; d=1; SignalOne=[]; SignalTwo=[];
    endspectra=0;
    format compact % Output form for any variables displayed during run.
    BiggestSpectra=0; % Initialize the biggest spectra in batch
    BiggestObsMass=0; % Initialize the Biggest Observed mass in any spectra
    FileNameRoot=('-Names.csv'); % Suffix for the file-name list, as in Aline1.m

    % Routine to sort filenames into alphabetical order - should correspond to chronological order for
    % individual mass spectra.
    SizeDirList = size(ListOfCsvFiles);
    for FileNameOrder = 1:SizeDirList(1,1)
        DataFileName(FileNameOrder,:) = ListOfCsvFiles(FileNameOrder).name
    end
    SortedDataFileName = sortrows(DataFileName)

    % Routine to prepare NameFile.Csv file for writing
    FileNames=strcat(FileNameStub,FileNameRoot); % Create full filename as a variable.
    NameFile=fopen(FileNames,'a+') % Open file to record filenames used to create master matrix
    NameOut=char('Mass');
    fprintf(NameFile,NameOut);
    fprintf(NameFile,'\n'); % Prints headerline of name file
    % loop to determine largest measured mass and to write filenames in output files
    % to allow matching filenames and columns from directory lists imported into Summary1
    for testlength=1:SizeDirList(1,1)
        spectra=dlmread(SortedDataFileName(testlength,:));
        sizespectra=size(spectra);
        if sizespectra(1,1)>BiggestSpectra
            BiggestSpectra=sizespectra(1,1);
        end
        if spectra(sizespectra(1,1),1)>BiggestObsMass
            BiggestObsMass=spectra(sizespectra(1,1),1);
        end
        OddCol=((testlength*2)+1);
        EvenCol=testlength*2;
        Name(OddCol)=cellstr('X');
        Name(EvenCol)=cellstr(SortedDataFileName(testlength,:));
        NameOut=char(Name(EvenCol))
        Spacer=char(Name(OddCol))
        fprintf(NameFile,NameOut);
        fprintf(NameFile,'\n'); % Writes even rows filenames, with linebreak between.
        fprintf(NameFile,Spacer);
        fprintf(NameFile,'\n'); % Writes odd row with the spacer, with a linebreak between.
    end
    fclose(NameFile); % Close the file with the file names.
    Name(1)=cellstr('Mass');
    for i=1:(BiggestObsMass - 100) % loop to fill master matrix from 100 to high mass value
        master(i,1)=mass; % fills in the first column of master with mass units
        mass=mass+1;
    end
    for d=1:SizeDirList(1,1) % loop to bin spectral intensities into master matrix
        spectra=dlmread(SortedDataFileName(d,:)); % reads current file in to variable spectra
        mass=109.8; % Re initialize starting point
        sizemaster=size(master);
        mcol=d*2;
        sizespectra=size(spectra);
        % Print current index and current filename being operated on
        d
        FileNameStub
        SortedDataFileName(d,:)
        PreviousMass=0;
        PreviousIntensity=0;
        MaxColmIntensity(1,mcol)=0; % Sets column intensity to zero so a comparison can be made.
        MaxColmIntensity(1,mcol+1)=0; % Sets column intensity to zero so a comparison can be made.
        for i=1:sizemaster(1,1) % loop that goes through every row of master, adding columns as spectral data is read
            j=i;
            endspectra=0;
            while spectra(j,1) < (mass+1) & endspectra==0 % loop that checks if there is a data point at a mass
                intensity=spectra(j,2); % Mass signal intensity is in column 2 of Masstab files
                smass=spectra(j,1); % m/z value for each mass is in column 1 of Masstab files.
                % InBin = Logical variable to determine if the current mass is in a bin
                InBin=((smass>=mass) & (smass < (mass+1)) & (intensity >0));
                % InSameBin = Logical variable to determine if there is a second signal at the same mass as the previous one
                InSameBin=(PreviousMass>=mass & PreviousMass < (mass+1)) & (PreviousIntensity>0);
                if InBin & ~InSameBin % see the mass for the first time - generates SignalOne
                    master(i,mcol)=spectra(j,2);
                    if intensity > MaxColmIntensity(1,mcol) % determine largest value per column
                        MaxColmIntensity(1,mcol)=intensity; % and store it in MaxColmIntensity for later use.
                    end
                end
                if InSameBin & InBin % see the mass for the second time.
                    master(i,(mcol+1))=spectra(j,2); % assign mass to master matrix in second signal column
                    if intensity > MaxColmIntensity(1,mcol+1) % determine largest value per second signal column
                        MaxColmIntensity(1,mcol+1)=intensity; % and store it in MaxColmIntensity for later use.
                    end
                end
                j=j+1; % this may not be working as I had hoped - should be comparing mass units.
                if j>sizespectra(1,1) % Do not look for more masses once the position in master has been reached
                    endspectra=1;
                    j=j-2;
                    if j==0 % prevents j from being set to zero and putting spectra out of range
                        j=i;
                    end
                end
                PreviousMass=smass;
                PreviousIntensity=intensity;
            end
            mass=mass+1;
        end
    end
    mass
    OutputRoot=char('-output.csv');
    Output_File=strcat(FileNameStub,OutputRoot);
    dlmwrite(Output_File,master); % Write master matrix to file.
    sizemaster=size(master);
    SignalOne(1,1)=0;
    SignalTwo(1,1)=0;
    Even='Even';
    Odd='Odd';
    SignalOneNormalizedExists=0;
    SignalTwoNormalizedExists=0;
    % Loop to sort out the two signals into the SignalOne and SignalTwo matrices.
    % Will also create the relative intensity matrices SignalOnePercent and SignalTwoPercent
    % so that the signals can be analyzed on a relative intensity basis.
    for d=1:sizemaster(1,2) % Go through full length of the master matrix.
        d;
        for i=1:(BiggestObsMass - 100) % Go through all the masses.
            i;
            Halfd=d/2;
            master(i,d);
            % Put in the mass labels down the first column of the separate signal files.
            SignalOne(i,1)=master(i,1);
            SignalTwo(i,1)=master(i,1);
            SignalOnePercent(i,1)=master(i,1);
            SignalTwoPercent(i,1)=master(i,1);
            if Halfd==round(Halfd) % Put the even rows in SignalOne
                Comprsd_even_d=(d/2)+1;
                SignalOne(i,Comprsd_even_d)=master(i,d);
                if MaxColmIntensity(1,d)~=0 % Determine relative intensities of first signal.
                    SignalOnePercent(i,Comprsd_even_d)=master(i,d)/MaxColmIntensity(1,d)*100;
                    SignalOneNormalizedExists=1; % Flag to prevent SignalOnePercent save if empty
                end % Even
            end
            if Halfd~=round(Halfd) % Puts the odd rows in SignalTwo
                Comprsd_odd_d=round(Halfd);
                % size_signal_2=size(SignalTwo);
                if d <= sizemaster(1,2) % prevents out of range in master because of missing signal 2 column
                    SignalTwo(i,Comprsd_odd_d)=master(i,d);
                    if MaxColmIntensity(1,d)~=0 % Determine relative intensities of second signal.
                        SignalTwoPercent(i,Comprsd_odd_d)=master(i,d)/MaxColmIntensity(1,d)*100;
                        SignalTwoNormalizedExists=1; % Flag to prevent SignalTwoPercent save if empty
                    end % Odd
                end
            end
        end % i =
    end % d=
    Signal1Root=char('-SignalOne-output.csv');
    Signal_1_File=strcat(FileNameStub,Signal1Root);
    dlmwrite(Signal_1_File,SignalOne); % Write first signal data file.
    Signal2Root=char('-SignalTwo-output.csv');
    Signal_2_File=strcat(FileNameStub,Signal2Root);
    dlmwrite(Signal_2_File,SignalTwo); % Write second signal data file.
    if SignalOneNormalizedExists
        Normal1Root=char('-Normal-SignalOne-output.csv');
        Normal_1_File=strcat(FileNameStub,Normal1Root);
        dlmwrite(Normal_1_File,SignalOnePercent); % Write first signal relative (normalized) data file.
    end
    if SignalTwoNormalizedExists
        Normal2Root=char('-Normal-SignalTwo-output.csv')
        Normal_2_File=strcat(FileNameStub,Normal2Root);
        dlmwrite(Normal_2_File,SignalTwoPercent); % Write second signal relative (normalized) data file.
    end
    % Procedure to create percentage occurrence summaries and to send out histograms of backgrounds.
    size_signal_1=size(SignalOne);
    size_signal_2=size(SignalTwo);
    ZeroPercent=0;
    TwoFivePercent=2.5;
    FivePercent=5;
    for row=1:size_signal_1(1,1) % Main loop to create counts at certain frequencies.
        row
        FileNameStub
        GreaterThanZero=0; % Initialize each counter per row.
        GreaterThanTwoFive=0;
        GreaterThanFive=0;
        for colm=2:size_signal_1(1,2)
            %colm
            % Count number of times a signal intensity occurs per mass unit.
            if SignalOnePercent(row,colm) > ZeroPercent
                GreaterThanZero=GreaterThanZero+1;
            end
            if SignalOnePercent(row,colm) > TwoFivePercent
                GreaterThanTwoFive=GreaterThanTwoFive+1;
            end
            if SignalOnePercent(row,colm) > FivePercent
                GreaterThanFive=GreaterThanFive+1;
            end
        end % end column for loop
        % Determine percent times there is a signal per mass
        % First column of Summary = mass index.
        % Columns 2-4 of Summary = percent occurrence of intensity.
        % Columns 5-7 of Summary = Greater than PercentCutoff Occurrence of signals per run.
        if SignalOneNormalizedExists
            Summary1(row,1)=master(row,1);
            Summary1(row,2)=GreaterThanZero/(size_signal_1(1,2)-1)*100;
            Summary1(row,3)=GreaterThanTwoFive/(size_signal_1(1,2)-1)*100;
            Summary1(row,4)=GreaterThanFive/(size_signal_1(1,2)-1)*100;
            TwoColSummary(row,1)=master(row,1);
            if Summary1(row,2)>=CutoffPercent
                Summary1(row,5)=1;
                TwoColSummary(row,2)=1;
            else
                Summary1(row,5)=0;
                TwoColSummary(row,2)=0.01;
            end
            if Summary1(row,3)>=CutoffPercent
                Summary1(row,6)=1;
            else
                Summary1(row,6)=0;
            end
            if Summary1(row,4)>=CutoffPercent
                Summary1(row,7)=1;
            else
                Summary1(row,7)=0;
            end
        end % of if statement
    end % end row for loop.
    % Routine to write 6 col and 2 col summary file of peak occurrence.
    if SignalOneNormalizedExists
        SummaryRoot=char('-SignalOne-Summary.csv');
        SummaryFile=strcat(FileNameStub,SummaryRoot);
        dlmwrite(SummaryFile,Summary1);
        TwoColSummaryRoot=char('-SignalOne-TwoColSummary.csv');
        TwoColSummaryFile=strcat(FileNameStub,TwoColSummaryRoot);
        % Use fprintf file save method to enter zeros into csv files.
        TwoColSummaryFileOpen = fopen(TwoColSummaryFile, 'a+')
        TwoColLength = size(TwoColSummary);
        i=0;
        for i=1:TwoColLength(1,1)
            fprintf(TwoColSummaryFileOpen,'%f %c %f\r', TwoColSummary(i,1),',',TwoColSummary(i,2));
        end
        %fprintf(TwoColSummaryFileOpen,'\n')
        fclose(TwoColSummaryFileOpen);
        %dlmwrite(TwoColSummaryFile,TwoColSummary);
    end
    % Create histograms showing binning of percentage occurrence, in 5 percent divisions.
    if SignalOneNormalizedExists
        figure(1); hist(Summary1(:,2),20);
        OverZero='Occurrence over 0% -- ';
        FigureTitle=char('- 0% histogram');
        TitleWord(1,:)=cellstr(OverZero);
        TitleWord(2,:)=cellstr(FileNameStub);
        xlabel('Percent Occurrence'); ylabel('Counts'); title(TitleWord);
        if PrintHistograms==1
            print
        end
        FileName=strcat(FileNameStub,FigureTitle);
        print('-djpeg','-r200',FileName)

        figure(2); hist(Summary1(:,3),20);
        OverTwoFive='Occurrence over 2.5% intensity -- ';
        FigureTitle=char('- 2.5% histogram');
        TitleWord(1,:)=cellstr(OverTwoFive)
        TitleWord(2,:)=cellstr(FileNameStub);
        xlabel('Percent Occurrence'); ylabel('Counts'); title(TitleWord);
        if PrintHistograms==1
            print
        end
        FileName=strcat(FileNameStub,FigureTitle);
        print('-djpeg','-r200',FileName)

        figure(3); hist(Summary1(:,4),20);
        OverFive='Occurrence over 5% intensity -- ';
        FigureTitle=char('- 5% histogram');
        TitleWord(1,:)=cellstr(OverFive)
        TitleWord(2,:)=cellstr(FileNameStub);
        xlabel('Percent Occurrence'); ylabel('Counts'); title(TitleWord);
        if PrintHistograms==1
            print
        end
        FileName=strcat(FileNameStub,FigureTitle);
        print('-djpeg','-r200',FileName)

        % Create bar graphs showing positions observed more than 50% of the time
        figure(4); bar(Summary1(:,1),Summary1(:,5));
        OverZero2='Greater than 50% occurrence of signal over 0% -- ';
        FigureTitle=char('- 50% - 0% intensity');
        TitleWord(1,:)=cellstr(OverZero2)
        TitleWord(2,:)=cellstr(FileNameStub);
        xlabel('Mass'); ylabel('Percent Occurrence'); title(TitleWord);
        if PrintHistograms==1
            print
        end
        FileName=strcat(FileNameStub,FigureTitle);
        print('-djpeg','-r200',FileName)

        figure(5); bar(Summary1(:,1),Summary1(:,6));
        OverTwoFive2='Greater than 50% occurrence of signal over 2.5% -- ';
        FigureTitle=char('- 50% - 2.5% intensity');
        TitleWord(1,:)=cellstr(OverTwoFive2)
        TitleWord(2,:)=cellstr(FileNameStub);
        xlabel('Mass'); ylabel('Percent Occurrence'); title(TitleWord);
        if PrintHistograms==1
            print
        end
        FileName=strcat(FileNameStub,FigureTitle);
        print('-djpeg','-r200',FileName)

        figure(6); bar(Summary1(:,1),Summary1(:,7));
        OverFive2='Greater than 50% occurrence of signal over 5% -- ';
        FigureTitle=char('- 50% - 5% intensity');
        TitleWord(1,:)=cellstr(OverFive2)
        TitleWord(2,:)=cellstr(FileNameStub);
        xlabel('Mass'); ylabel('Percent Occurrence'); title(TitleWord);
        if PrintHistograms==1
            print
        end
        FileName=strcat(FileNameStub,FigureTitle);
        print('-djpeg','-r200',FileName)

        % Create percent occurrence vs mass bar graph across all masses.
        figure(7); bar(Summary1(:,1),Summary1(:,2));
        OverZero3='Percentage occurrence of signal over 0% -- ';
        FigureTitle=char('- occur per mass at 0 percent');
        TitleWord(1,:)=cellstr(OverZero3)
        TitleWord(2,:)=cellstr(FileNameStub);
        xlabel('Mass'); ylabel('Percent Occurrence'); title(TitleWord);
        if PrintHistograms==1
            print
        end
        FileName=strcat(FileNameStub,FigureTitle);
        print('-djpeg','-r200',FileName)

        figure(8); bar(Summary1(:,1),Summary1(:,3));
        OverTwoFive3='Percentage occurrence of signal over 2.5% -- ';
        FigureTitle=char('- occur per mass at 2.5 percent');
        TitleWord(1,:)=cellstr(OverTwoFive3)
        TitleWord(2,:)=cellstr(FileNameStub);
        xlabel('Mass'); ylabel('Percent Occurrence'); title(TitleWord);
        if PrintHistograms==1
            print
        end
        FileName=strcat(FileNameStub,FigureTitle);
        print('-djpeg','-r200',FileName)

        figure(9); bar(Summary1(:,1),Summary1(:,4));
        OverFive3='Percentage occurrence of signal over 5% -- ';
        FigureTitle=char('- occur per mass at 5 percent');
        TitleWord(1,:)=cellstr(OverFive3)
        TitleWord(2,:)=cellstr(FileNameStub);
        xlabel('Mass'); ylabel('Percent Occurrence'); title(TitleWord);
        if PrintHistograms==1
            print
        end
        FileName=strcat(FileNameStub,FigureTitle);
        print('-djpeg','-r200',FileName)
    end % of if SignalOneNormalizedExists statement.
    % Return to matlab directory
    %cd C:\matlabr11\work
    %to_ds
    %pwd
    dlmwrite('FILE.txt',TestFileData)
    cd ..
    X % prints after while
end % Main loop for moving in and out of directories.
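The summary table built at the end of the program above (mass, percent occurrence above three relative-intensity thresholds, and a flag for each threshold) can be restated compactly. The following Python sketch is an illustration only and not a translation of the full program. The function name is hypothetical, and the input is assumed to be a matrix of relative intensities in percent of the base peak, one row per mass bin and one column per run.

```python
def occurrence_summary(masses, rel_intensity,
                       thresholds=(0.0, 2.5, 5.0), cutoff_percent=40.0):
    """Build rows of [mass, pct>t1, pct>t2, pct>t3, flag1, flag2, flag3].

    pct>t is the percentage of runs whose relative intensity exceeds t;
    each flag is 1 when that percentage reaches cutoff_percent.
    """
    rows = []
    for mass, row in zip(masses, rel_intensity):
        percents = [100.0 * sum(1 for v in row if v > t) / len(row)
                    for t in thresholds]
        flags = [1 if p >= cutoff_percent else 0 for p in percents]
        rows.append([mass] + percents + flags)
    return rows


if __name__ == "__main__":
    masses = [109.8, 110.8]                      # lower edges of two mass bins
    rel = [[100, 3, 0, 0], [1, 1, 1, 0]]         # hypothetical values, 4 runs
    for line in occurrence_summary(masses, rel):
        print(line)
```

The thresholds of 0, 2.5 and 5 percent and the 40 percent cutoff mirror the constants in the listing; other values may suit other instruments or extract types.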

% Aline1.m
%
% The program determines the average background value looking at the entire peak shape of the spectra.
% Will need another program to take the measured spectra of true samples and compare them to the average
% values of the average spectra determined here and then see if they fall within a certain percentage of the
% RMSD values to see if they are correct.
clear dir
CompressCount=1;
TestFileData=[12 34 45 56 67] % Test data for file written as test of program - remove later

MasterDir='C:\MATLABR11\work\TestData'; % User inputed directory containing other directories with files
cd(MasterDir);
MasterDirFiles = dir % Load all files in master directory to one variable.
TotalFiles = size(MasterDirFiles)
Original_Files='Original Files';
X=99099 % Value used to show completion of loop.
% Loop to create compressed directory listing containing only directories.
for ExtractDir=1:TotalFiles(1,1) % Look through find directories in master directory
    if MasterDirFiles(ExtractDir).isdir==1 % Test each dir item to see if it is a directory
        Is_Original_Files=strcmp(MasterDirFiles(ExtractDir).name, Original_Files);
        if not(Is_Original_Files)
            CompressedDirList(CompressCount).name = MasterDirFiles(ExtractDir).name; % assign new directories.
            CompressCount=CompressCount+1; % Increment count compressed directories
        end
    end
end
TotalDirectories=size(CompressedDirList);
CompressCount=1;
for CompressCount=3:TotalDirectories(1,2) % Main loop for moving in and out of directories.
    CurrentDirectory = CompressedDirList(CompressCount).name;
    cd(CurrentDirectory);
    FileNameStub=char(pwd)
    % Loop to replace backslash in directory names to dash so directory names can be labels
    i=0;
    FileNameLength= size(FileNameStub)
    for i=1:FileNameLength(1,2)
        if FileNameStub(1,i)=='\'
            FileNameStub(1,i)='-'
        end
    end
    ListOfCsvFiles=dir('*.csv')
    Spectra=[]; % Clear Spectra
    mass=109.8 % Initial starting mass.
    Spectra=dlmread(ListOfCsvFiles(1).name); % Loads first item in dir call into Spectra
    sizespectra=size(Spectra); % Determines size of first Spectra loaded.
    % master=[];d=1;SignalOne=[]; SignalTwo=[]; % Clear master, SignalOne, SignalTwo
    endspectra=0;
    format compact % Output form for any variables displayed during run.
    BiggestSpectra=0; % Initialize the biggest spectra in batch
    BiggestObsMass=0; % Initialize the Biggest Observed mass in any spectra
    FileNameRoot=('-Names.csv');
    % Routine to sort filenames into alphabetical order - should correspond to chronological order for
    % individual mass spectra.
    SizeDirList = size(ListOfCsvFiles);
    for FileNameOrder = 1:SizeDirList(1,1)
        DataFileName(FileNameOrder,:) = ListOfCsvFiles(FileNameOrder).name
    end
    SortedDataFileName = sortrows(DataFileName)
    % Routine to prepare NameFile.Csv file for writing
    FileNames=strcat(FileNameStub,FileNameRoot); % Create full filename as a variable.
    NameFile=fopen(FileNames,'a+') % Open file to record filenames used to create master matrix
    NameOut=char('Mass');
    fprintf(NameFile,NameOut);
    fprintf(NameFile,'\n'); % Prints headerline of name file
    % loop to determine largest measured mass and to write filenames in output files
    % to allow matching filenames and columns from directory lists imported into Aline
    for testlength=1:SizeDirList(1,1)
        Spectra=dlmread(SortedDataFileName(testlength,:));
        sizespectra=size(Spectra);
        if sizespectra(1,1)>BiggestSpectra
            BiggestSpectra=sizespectra(1,1);
        end
        if Spectra(sizespectra(1,1),1)>BiggestObsMass
            BiggestObsMass=Spectra(sizespectra(1,1),1);
        end
        OddCol=((testlength*2)+1);
        EvenCol=testlength*2;
        Name(OddCol)=cellstr('X');
        Name(EvenCol)=cellstr(SortedDataFileName(testlength,:));
        NameOut=char(Name(EvenCol))
        Spacer=char(Name(OddCol))
        fprintf(NameFile,NameOut);
        fprintf(NameFile,'\n'); % Writes even rows filenames, with linebreak between.
        fprintf(NameFile,Spacer);
        fprintf(NameFile,'\n'); % Writes odd row with the spacer, with a linebreak between.
    end
    fclose(NameFile); % Close the file with the file names.
    Name(1)=cellstr('Mass');
    % loop to fill first column of matrices from 100 to high mass value with the mass labels.
for i=1:(BiggestObsMass - 100)
    MaxPositionMaster(i,1)=mass;
    AverageMaxPos(i,1)=mass;
    TruncAverageMaxPos(i,1)=mass;
    MaxPosDifference(i,1)=mass;
    MasterMeanShiftedSpectra(i,1)=mass;
    MasterStDevShiftedSpectra(i,1)=mass;
    mass=mass+1;
end

%%%%%%%%%%%%%%%%%%%%%% MAIN LOOP TO ORGANIZE ROWS OF MASSES FROM DIFFERENT FILES %%%%%%%%%%%%%%%%%%
% Main loop to:
% 1) Read data row by row into master matrix
% 2) Determine first maxima of each peak
% 3) Determine average max position for each mass
% 4) Determine amount to shift each spectra
% 5) Shift each spectra the appropriate amount to align the maxima
% 6) Determine the mean spectra by averaging intensity at each point.
% 7) Determine the standard deviation between the measured spectra and the average.
% 8) Record the row by row averages and RMSD's into a master matrix for saving to files at the end.
for MassPosition = 1:(BiggestObsMass-100)
    % Loop to open each file and read values into MasterMassRowMatrix
    % Item 1 above
    for FileNumber = 1:SizeDirList(1,1)
        Spectra=[]; % Clear spectra for new values from next file.
        Spectra = dlmread(SortedDataFileName(FileNumber,:)); % Read spectra sequentially for MasterMassPerRow
        % Need a line here to test that we are not past the end of the file - test at start with constant width files.
        SizeCurrentSpectra = size(Spectra);
        if MassPosition <= SizeCurrentSpectra(1,1)
            MasterMassPerRow(FileNumber,:) = Spectra(MassPosition,2:SizeCurrentSpectra(1,2)); % transfer row to master matrix
        else
            MasterMassPerRow(FileNumber,:) = 0;
        end % FileNumber else
    end
    %%%%%%%%%%%%%%%%%
    %%%%% May have to insert a routine to generate a zerofilled rectangular matrix for later manipulations.
    %%%%%%%%%%%%%%%%%%
    SizeMasterMassPerRow = size(MasterMassPerRow);
    % Find position of first maxima in the current files.
    % Item 2 of above
    for CurrentFile = 1 : SizeMasterMassPerRow(1,1) % go through rows one by one.
        NoPeak = 1; % Set marker for no maxima
        PosMarker = 2 % Start Current colm position after the mass labels.
        % Item 1 from top of loop
        while NoPeak % loop continues until the first max is found in each row
            YesPeak = 0 % Set YesPeak to negative at start of scan.
            CurrentPosValue = MasterMassPerRow(CurrentFile,PosMarker); % set the current position as the center value
            if PosMarker > 2
                PreviousPosValue = MasterMassPerRow(CurrentFile,PosMarker-1); % Get previous position value during scan.
            else
                PreviousPosValue = 0; % if at beginning of row let every signal start with a zero value
            end % end if PosMarker > 2
            if PosMarker == SizeMasterMassPerRow(1,2)
                NextPosValue = MasterMassPerRow(CurrentFile,PosMarker) % if at end of row set next value to current value
                NoPeak=0; % Jump out if at the end of the row.
            else
                NextPosValue = MasterMassPerRow(CurrentFile,PosMarker+1);
            end % End of if PosMarker at end
            % Determine if these three points describe a peak.
            % YesPeak = logical variable to see if CurrentPos is top of peak.
            YesPeak = (PreviousPosValue < CurrentPosValue) & (CurrentPosValue > NextPosValue);
            if YesPeak
                % Record position of maximum in Master MaxPos Matrix
                % Rows are masses; columns are FileNumber positions
                % Offset CurrentFile by 1 b/c first col'm is the mass label.
                MaxPositionMaster(MassPosition,CurrentFile+1) = PosMarker;
                NoPeak = 0; % Set NoPeak so while loop can end and can check next row.
            end % of if YesPeak
            PosMarker = PosMarker+1; % Increment PosMarker to next position.
            if PosMarker > SizeMasterMassPerRow(1,2)
                NoPeak = 0;
            end % if PosMarker
        end % While NoPeak.
    end % CurrentFile for loop
    % Item 3 - Determine the average position of maxima for each mass
    SumMaxPos=0;
    for AveIndex = 2:(SizeMasterMassPerRow(1,1)+1)
        SumMaxPos = SumMaxPos+MaxPositionMaster(MassPosition,AveIndex);
    end % for AveIndex
    TruncAverageMaxPos(MassPosition,2) = fix(SumMaxPos/SizeMasterMassPerRow(1,1));
    % Item 4 from top of the MassPosition loop
    % If a peak is forward (smaller pos #) of the average maxima then the shift is positive,
    % if the peak is behind the average maxima then the shift is negative.
    for AveIndex = 2:(SizeMasterMassPerRow(1,1)+1)
        MaxPosDifference(MassPosition,AveIndex)=MaxPositionMaster(MassPosition,AveIndex)-TruncAverageMaxPos(MassPosition,2);
    end % for AveIndex 2nd time.
    % Determine the largest positive and negative shift that needs to be made
    % Continuation of item 4.
    SizeMaxPositionMaster=size(MaxPositionMaster);
    LargestPositiveShift=0;
    LargestNegativeShift=0;
    for i = 2:SizeMaxPositionMaster(1,2)
        if MaxPosDifference(MassPosition,i) > LargestPositiveShift
            LargestPositiveShift = MaxPosDifference(MassPosition,i)
        end
        if MaxPosDifference(MassPosition,i) < LargestNegativeShift
            LargestNegativeShift = MaxPosDifference(MassPosition,i)
        end
    end % for i loop.
    % Item 5 - Shift the spectra depending on the position of their maxima.
    % Fill the ShiftedSpectra matrix with the appropriately shifted spectra from MasterMassPerRow.
    ShiftedMatrixWidth = LargestPositiveShift+abs(LargestNegativeShift)+SizeMasterMassPerRow(1,2);
    ShiftedSpectra = zeros(SizeMasterMassPerRow(1,1),ShiftedMatrixWidth); % zero fill new shifted spectra matrix
    SizeMaxPosDifference = size(MaxPosDifference);
    for Shift = 2:SizeMaxPosDifference(1,2);
        StartIndex = 1+LargestPositiveShift-MaxPosDifference(MassPosition,Shift);
        FinalPosition = StartIndex+SizeMasterMassPerRow(1,2)-1;
        FileNumber=Shift-1;
        MasterMassIndex = 1;
        for Index = StartIndex:FinalPosition
            ShiftedSpectra(FileNumber,Index)=MasterMassPerRow(FileNumber,MasterMassIndex);
            MasterMassIndex=MasterMassIndex+1;
        end % Index loop
    end % Shift loop
    % Item 6 - Create average intensity spectra for each row.
    SizeShiftedSpectra=size(ShiftedSpectra);
    MeanShiftedSpectra=mean(ShiftedSpectra);
    % Item 7 - Determine Standard Deviation for each column of aligned spectra
    StDevShiftedSpectra=std(ShiftedSpectra);
    % Item 8 - Record the average shifted spectra per mass and the standard dev per position.
    MasterDim = size(ShiftedSpectra);
    MasterColWidth = MasterDim(1,2)+1;
    MasterMeanShiftedSpectra(MassPosition,2:MasterColWidth)=MeanShiftedSpectra(1,:);
    MasterStDevShiftedSpectra(MassPosition,2:MasterColWidth) = StDevShiftedSpectra(:,:);
    dlmwrite('MasterMeanShiftedSpectra.csv',MasterMeanShiftedSpectra);
    dlmwrite('MasterStDevShiftedSpectra.csv',MasterStDevShiftedSpectra);
end % MassPosition loop
dlmwrite('FILE.txt',TestFileData)
cd ..
X
end % Compress Count

EXAMPLE 17: Plasmid DNA transformation protocol for Pseudomonas

Preparation of electroporation competent cells
[00326] 1 ml of overnight culture is inoculated into 100 ml LB, and the bacteria are incubated in a 30°C shaker until the OD600 reading reaches 0.5-0.7. The bacteria are harvested by spinning at 3000 rpm for 10 minutes at 4°C.

[00327] The resulting cell pellet is washed with 100 ml ice-cold ddH2O and spun at 3000 rpm for 10 minutes at 4°C to collect the cells. The washing is repeated. The cells are then washed once with 50 ml ice-cold 10% glycerol (in ddH2O) and collected by spinning at 3000 rpm for 10 minutes at 4°C. The cell pellet is resuspended in 2 ml ice-cold 10% glycerol (in ddH2O), and 50 μl or 100 μl is aliquotted into each tube and stored at -80°C.
Electroporation
[00328] 1 μl plasmid DNA is mixed with 50 μl competent cells and kept on ice for 5 minutes. The mixture is transferred to a pre-chilled cuvette (0.2 cm gap, Bio-Rad). The DNA is transformed into the bacteria by electroporation with a Bio-Rad machine (settings: voltage 2.25 kV; time 5 ms; capacitance 25 μF).

[00329] 300 μl SOC medium is added to the cell mixture, and the bacteria are incubated in a 30°C shaker for one hour. An aliquot of the culture is spread on LA plates with antibiotics, and the plates are incubated at 30°C.

EXAMPLE 18: Transformation of Yeast Cells by Electroporation
[00330] One day before the experiment, 10 ml of YPD medium is inoculated with a single yeast colony of the strain to be transformed. It is grown overnight to saturation at 30°C. On the day of competent cell preparation, the total volume of yeast overnight culture is transferred to a 2 L baffled flask containing 500 ml YPD medium. The culture is grown with vigorous shaking at 30°C to an OD600 reading of 0.8-1.0.

[00331] 500 ml of culture is harvested by centrifuging at 4000 x g, 4°C, for 5 min in autoclaved bottles. The supernatant is subsequently discarded. The cell pellet is washed in 250 ml cold sterile water. Washing is repeated twice. The supernatant is discarded.

[00332] The pellet is resuspended in 30 ml of ice-cold 1 M sorbitol. The suspension is transferred into a sterile 50 ml conical tube. The mixture is centrifuged in a GP-8 centrifuge at 2000 rpm, 4°C, for 10 min. The supernatant is discarded. The pellet is resuspended in 50 μl of ice-cold 1 M sorbitol. The final volume of resuspended yeast should be 1.0 to 1.5 ml and the final OD600 should be ~200.

[00333] In a sterile, ice-cold 1.5-ml microcentrifuge tube, 40 μl concentrated yeast cells are mixed with 1 μg of DNA contained in ~5 μl. The mixture is transferred to an ice-cold 0.2-cm-gap disposable electroporation cuvette and pulsed at 1.5 kV, 25 μF, 200 Ω. It should be noted that the time constant reported by the Gene Pulser will vary from 4.2 to 4.9 msec. Times <4 msec or the presence of a current arc (evidenced by a spark and smoke) indicate that the conductance of the yeast/DNA mixture is too high.
[00334] 400 μl ice-cold 1 M sorbitol is added to the cuvette and the yeast is recovered with gentle mixing. 200 μl aliquots of the yeast suspension should be spread directly on sorbitol selection plates and incubated for 3 to 6 days at 30°C until colonies appear.
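The pulse settings of paragraph [00333] can be related to the reported time constant with a small calculation. The Python sketch below assumes an idealised exponential-decay pulse in which the 200 Ω setting acts as a resistor in parallel with the sample, so that the time constant is the product of the total resistance and the capacitance. That circuit model and the function names are assumptions introduced for illustration and are not stated in the protocol.

```python
def field_strength_kv_per_cm(voltage_kv, gap_cm):
    """Nominal field strength across the cuvette gap."""
    return voltage_kv / gap_cm


def time_constant_ms(resistance_ohm, capacitance_uf):
    """RC time constant; ohm * microfarad gives microseconds, hence / 1000."""
    return resistance_ohm * capacitance_uf / 1000.0


def sample_resistance_ohm(tau_ms, capacitance_uf, parallel_ohm):
    """Sample resistance implied by an observed time constant.

    Assumes the sample sits in parallel with the instrument resistor:
    1/R_total = 1/R_sample + 1/R_parallel, and tau = R_total * C.
    """
    total = tau_ms * 1000.0 / capacitance_uf
    return 1.0 / (1.0 / total - 1.0 / parallel_ohm)


if __name__ == "__main__":
    print(field_strength_kv_per_cm(1.5, 0.2))   # kV/cm for the yeast settings
    print(time_constant_ms(200, 25))            # upper bound with no sample load
    for tau in (4.2, 4.9):
        print(tau, sample_resistance_ohm(tau, 25, 200))
```

Under this model the settings give a nominal field of 7.5 kV/cm and a time constant that cannot exceed 5 ms. The reported range of 4.2 to 4.9 msec then corresponds to sample resistances of roughly 1050 Ω to 9800 Ω, and a shorter time constant means a more conductive sample, which is consistent with the warning about times below 4 msec.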

Literature Cited
1. Gibbs, J.B., Mechanism-Based Target Identification and Drug Discovery in Cancer Research. Science 2000, 287, 1969-73

2. Garret, M.D., Workman, P. Discovering Novel Chemotherapeutic Drugs for the Third Millennium. Eur. J. Cancer 1999, 35, 2010-30

3. Hanahan, et al., The Hallmarks of Cancer. Cell 2000, 100, 57-70

4. Druker, et al., Lessons learned from the development of an Abl tyrosine kinase inhibitor for chronic myelogenous leukemia. J. Clin. Invest. 2000, 105, 3-7

5. Sikic, B.I., New Approaches in cancer treatment. Ann. Oncol. 1999, 10, S149-S153

6. Gibbs, J.B., Anticancer drug targets: growth factors and growth factor signaling. J. Clin. Invest. 2000, 105, 9-13

7. Drews, J., Drug Discovery: A historical perspective. Science 2000, 287, 1960-64

8. Harvey, A.L., Medicines from nature: are natural products still relevant to drug discovery? Trends Pharmacol. Sci. 1999, 20, 196-197

9. Cragg, G.M., Newman, D.J., Snader, K.M. Natural products in drug discovery and development. J. Nat. Prod. 1997, 60, 52-60

10. Verdine, G.L., The combinatorial chemistry of nature. Nature 1996, 384, 11-13

11. Demain, A.L., and J.E. Davies. Manual of industrial Microbiology and biotechnology; ASM Press: Washington D.C., 1999

12. McDaniel, R., et al., Rational design of aromatic polyketide natural products by recombinant assembly of enzymatic subunits. Nature 1995, 375, 549-554

13. Jacobsen, J.R., D.E. Cane, and C. Khosla, Spontaneous priming of a downstream module in 6-deoxyerythronolide B synthase leads to polyketide biosynthesis. Biochem. 1998, 37, 4928-4934

14. Donadio, S., McAlpine, J.B., Sheldon, P.J., Jackson, M., and Katz, L., An erythromycin analog produced by reprogramming of polyketide synthesis. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 7119-23

15. Cortes, J., et al., Repositioning of a domain in a modular polyketide synthase to promote specific chain cleavage. Science 1995, 268, 1487-89

16. Amann, R.I., Ludwig, W., Schleifer, K.H., Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol. Rev. 1995, 59, 143-169

17. Robertson, D.E., et al. The discovery of new biocatalysts from microbial diversity. SIM News 1996, 46, 3-8

18. Stein, J.L., et al., Characterization of uncultivated prokaryotes: isolation and analysis of a 40-kilobase-pair genome fragment from a planktonic marine Archaeon. J. Bacteriol. 1996, 178, 591-599

19. Short, J.M., Recombinant approaches for accessing biodiversity. Nat. Biotechnol. 1997, 15, 1322-23

20. Sundberg, S.A., High-throughput and ultra-high-throughput screening: solution- and cell-based approaches. Curr. Opin. Biotech. 2000, 11, 47-53

21. Alvi, K.A., Pu, H., Asterriquinones produced by Aspergillus candidus inhibit binding of the Grb-2 adapter to phosphorylated EGF receptor tyrosine kinase. J. Antibiotics 1999, 52, 215-223

22. Levitzki, A., Gazit, A., Tyrosine Kinase inhibition: an approach to drug development. Science 1995, 267, 1782-88

23. Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K., and J.D. Watson, Molecular biology of the cell; Garland Publishing, Inc.: New York, 1994

24. Kolibaba, K.S., Druker, B.J., Protein tyrosine kinases and cancer. Biochim. Biophys. Acta 1997, 1333, F217-F248

25. Neal, D.E., Sharples, L., Smith, K., Fennelly, J., Hall, R.R., Harris, A.L., The epidermal growth factor receptor and the prognosis of bladder cancer. Cancer 1990, 65, 1619-25

26. Nicholson, S., Richard, J., Sainsbury, C., Halcrow, P., Kelly, P., Angus, B., Wright, C., Henry, J., Farndon, J., Harris, A., Epidermal growth factor receptor (EGFr) status associated with failure of primary endocrine therapy in elderly postmenopausal patients with breast cancer. Br. J. Cancer 1991, 63, 146-150

27. Klijn, J.G.M., Berns, P.M.J.J., Schmitz, P.I.M., Foekens, J.A., The clinical significance of epidermal growth factor receptor (EGF-R) in human breast cancer: a review on 5232 patients. Endocr. Rev. 1992, 12, 3-17

28. Hiesiger, E., Hayes, R., Pierz, D., Budzilovich, G., Prognostic relevance of epidermal growth factor receptor (EGF-R) and c-neu/erbB2 expression in glioblastomas (GBMs). Neurooncol. 1993, 16, 93-104

29. Tateishi, M., Ishida, T., Mitsudomi, T., Kaneko, S., Sugimachi, K., Immunohistochemical evidence of autocrine growth factors in adenocarcinoma of the human lung. Cancer Res. 1990, 50, 7077-80

30. Gorgoulis, V., Aninos, D., Mikou, P., Kanavaros, P., Karameris, A., Joardanoglu, J., Rasidakis, A., Veslemes, M., Ozanne, B., Spandidos, D.A., Expression of EGF, TGF-alpha and EGFR in squamous cell lung carcinomas. Anticancer Res. 1992, 12, 1183-87

31. Sharif, T.R., Sharif, M., A high throughput system for the evaluation of protein kinase C inhibitors based on Elk1 transcriptional activation in human astrocytoma cells. Int. J. Oncol. 1999, 14, 327-335

32. Li, Q., Vaingankar, S.M., Green, H.M., Green, M.M., Activation of the 9E3/cCAF chemokine by phorbol esters occurs via multiple signal transduction pathways that converge to MEK1/ERK2 and activate the Elk1 transcription factor. J. Biol. Chem. 1999, 274, 15454

33. Treisman, R., Regulation of transcription by MAP kinase cascades. Curr. Opin. Cell Biol. 1996, 8, 205-215

34. Engler, D.A., Matsunami, R.K., Campion, S.R., Stringer, C.D., Stevens, A., Niyogi, S., Cloning of authentic human epidermal growth factor as a bacterial secretory protein and its initial structure-function analysis by site-directed mutagenesis. J. Biol. Chem. 1988, 263, 12384-390

35. Salmelin, C., Hovinen, J., Vilpo, J., Polymyxin permeabilization as a tool to investigate cytotoxicity of therapeutic aromatic alkylators in DNA repair-deficient Escherichia coli strains. Mut. Res. 2000, 467, 129-138

36. Gray, F., Kenney, J.S., Dunne, J.F., Secretion capture and report web: use of affinity derivatized agarose microdroplets for the selection of hybridoma cells. J. Immunol. Methods 1995, 182, 155-163

37. Powell, K.T., Weaver, J.C., Gel microdroplets and flow cytometry: rapid determination of antibody secretion by individual cells within a cell population. Bio/Technology 1990, 8, 333-337

38. Jan van der Wal, F., Luirink, J., Oudega, B., Bacteriocin release proteins: mode of action, structure, and biotechnological application. FEMS Microbiol. Rev. 1995, 17, 381-399

39. Majno, G., Joris, I., Apoptosis, oncosis, and necrosis: an overview of cell death. Am. J. Pathol. 1995, 146, 3-15

40. Wyllie, A.H., Kerr, J.F.R., Currie, A.R., Cell death; the significance of apoptosis. Int. Rev. Cytol. 1980, 68, 251-356

41. Sikic, B.I., Rozencweig, M., Carter, S.K., Eds. Bleomycin chemotherapy; Academic Press: Orlando, FL, 1985

42. Deng, JL., Newman, D.J., Hecht, S.M., Use of COMPARE analysis to discover functional analogues of bleomycin. J. Nat. Prod. 2000, 63, 1269-72

43. Ortiz, L.A., Moroz, K., Liu, JY., Hoyle, G.W., Hammond, T., Hamilton, R., Holian, A., Banks, W., Brody, A.R., Friedman, M., Alveolar macrophage apoptosis and TNF-α, but not p53, expression correlate with murine response to bleomycin. Am. J. Physiol. 1998, 275, L1208-L1218

44. Kumagai, T., Sugiyama, M., Protection of mammalian cells from the toxicity of bleomycin by expression of a bleomycin-binding protein gene from Streptomyces verticillus. J. Biochem. 1998, 124, 835-841

45. Benitez-Bribiesca, L., Sanchez-Suarez, P., Oxidative damage, bleomycin, and gamma radiation induce different types of DNA strand breaks in normal lymphocytes and thymocytes. Ann. NY Academy Sci. 1999, 887, 133-149

46. Du, L., Sanchez, C., Chen, M., Edwards, D.J., Shen, B., The biosynthetic gene cluster for the antitumor drug bleomycin from Streptomyces verticillus ATCC 15003 supporting functional interactions between nonribosomal peptide synthetases and a polyketide synthase. Chem. & Biol. 2000, 7, 623-642

49. Guiseley, K. B. US Patent 3,956,273, Modified Agarose and Agar and Methods of Making Same. May 11, 1976.

50. Phospholipids Handbook; Cevc, G., Ed.; Marcel Dekker: New York, 1993.

51. Ringsdorf, H.; Schlarb, B.; Venzmer, J. Molecular Architecture and Function of Polymeric Oriented Systems: Models for Study of Organization, Surface Recognition, and Dynamics of Biomembranes. Angew. Chem., Int. Ed. Engl. 1988, 27, 113 - 158 and references cited therein.

52. O'Brien, D. F.; Ramaswami, V. Polymerized Vesicles. Encycl. Polym. Sci. Eng. 1989, 17, 108 - 135.

53. Nilsson, K.; Brodelius, P.; Mosbach, K. Entrapment of Microbial and Plant Cells in Beaded Polymers. Methods in Enzymology, 1987, 135, 222 - 230 and references cited therein.

54. Kroger, N.; Deutzmann, R.; Sumper, M. Polycationic Peptides from Diatom Biosilica That Direct Silica Nanosphere Formation. Science 1999, 286, 1129-1132.

55. Cha, et al., Biomimetic Synthesis of Ordered Silica Structures Mediated by Block Copolypeptides. Nature 2000, 403, 289 - 292.

56. Bukanov, N. O., Demidov, V. V., Nielsen, P. E. & Frank-Kamenetskii, M. D. (1998). PD-loop: A complex of duplex DNA with an oligonucleotide. PNAS, 95 (10), 5516-5520.

57. Brenner, S., Williams, S. R., Vermaas, E.H., Storck, T., Moon, K., McCollum, C., Mao, J., Luo, S., Kirchner, J. J., Eletr, S., DuBridge, R. B., Burcham, T. & Albrecht, G. (1999). In vitro cloning of complex mixtures of DNA on microbeads: Physical separation of differentially expressed cDNAs. PNAS, 97 (4), 1665-1670.

58. Goryshin, I. Y., & Reznikoff, W. S. (1998). Tn5 in vitro transposition. J. Biol. Chem., 273, 7367-7374.

59. Jayasena, V. K. & Johnston, B. H. (1993). Complement-stabilized D-loop: RecA-catalyzed stable pairing of linear DNA molecules at internal sites. J. Mol. Biol., 230, 1015-1024.

60. Lohse, J., Dahl, O. & Nielsen, P. E. (1999). Double duplex invasion by peptide nucleic acid: A general principle for sequence-specific targeting of double-stranded DNA. PNAS, 96 (21), 11804-11808.

61. Sena, E. P. & Zarling, D. A. (1993). Targeting in linear DNA duplexes with two complementary probe strands for hybrid stability. Nature Genetics.

EXAMPLE 19: An Exemplary Novel High Throughput Cultivation Method
[00335] An aspect of the invention provides a novel high throughput cultivation method based on the combination of a single cell encapsulation procedure with flow cytometry that enables cells to grow with nutrients that are present at environmental concentrations. The resulting microcolonies can then be amplified by multiple displacement amplification for subsequent analysis.
[00336] Seawater was collected from sites located in the Sargasso Sea. Individual cells were concentrated from this seawater by tangential flow filtration and encapsulated in gel microdroplets (GMD). Similar GMDs have been used previously to grow bacteria¹² and for screening purposes¹³⁻¹⁵. Single encapsulated cells (see Methods) were transferred into chromatography columns (referred to henceforth as growth columns). Different culture media selective for aerobic, nonphototrophic organisms were pumped through the growth columns containing 10 million GMDs (Figure 24). The pore size of the GMDs allows the free exchange of nutrients. The encapsulated microorganisms were able to divide and form microcolonies of approximately 20 to 100 cells within the GMDs. Based on their distinctive light scattering signature, these microcolonies were detected and separated by flow cytometry at a rate of 5,000 GMDs per second. The increase in forward and side scatter was shown by microscopy to be directly proportional to the size of the microcolony grown within the GMD. This property enabled discrimination between unencapsulated single cells, empty or singly occupied GMDs, and GMDs containing a microcolony (Figure 25).
[00337] To determine the optimal growth medium for a broad diversity of organisms, four media were tested in the growth columns: organic rich medium diluted in seawater (marine medium); seawater amended with a mixture of amino acids; seawater amended with inorganic nutrients; and sterile filtered seawater (Figure 24).
After five weeks of incubation, 1200 GMDs, each containing a microcolony, were collected by flow cytometry from each of the four growth columns. A 16S rRNA gene clone library was generated from each group of 1200 microcolonies and analysed. In diluted marine medium, only four bacterial species were identified, belonging to the genera Vibrio, Marinobacter or Cytophaga, all common sea water bacteria that have been cultivated previously³'⁹. The media containing amino acids or inorganic minerals revealed slightly more diversity.

Analysis of 50 clones derived from each medium yielded twelve different bacterial species from the amino acid supplemented medium, and eleven species from the inorganic medium. Filtered seawater alone (taken from the original sampling site) yielded the highest biodiversity (39 species out of 50 clones analysed), with many different phylogenetic groups represented. These results demonstrated that organisms capable of rapid growth outgrew their more fastidious neighbours in the presence of organic rich medium.
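The comparison of media reported above reduces to the number of distinct species found per clone analysed. The following Python sketch simply restates those reported counts for the three media where both numbers are given explicitly; the dictionary keys and the function name are labels chosen for this illustration only.

```python
def species_per_clone(species, clones):
    """Fraction of analysed clones that represent distinct species."""
    return species / clones


# (species identified, clones analysed) as reported for each medium
observed = {
    "amino acids": (12, 50),
    "inorganic nutrients": (11, 50),
    "filtered seawater": (39, 50),
}

ratios = {medium: species_per_clone(s, c) for medium, (s, c) in observed.items()}
richest = max(ratios, key=ratios.get)

if __name__ == "__main__":
    for medium, ratio in ratios.items():
        print(f"{medium}: {ratio:.2f} species per clone")
    print("highest diversity:", richest)
```

Filtered seawater gives 0.78 species per clone against 0.24 and 0.22 for the amended media, which is the quantitative basis for the statement that nutrient-poor medium yielded the highest biodiversity.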

[00338] Growth columns were next inoculated with GMDs again generated from samples obtained from the Sargasso Sea, but now using only filtered seawater as growth medium. From each of two growth columns, 500 GMDs containing microcolonies were sorted, and the 16S rRNA genes contained therein were amplified by PCR. A 16S rRNA gene library was also constructed from the original environmental sample from which the microorganisms were obtained for encapsulation. Most of the environmental 16S rRNA sequences derived from this latter sample fell within the nine common bacterioplankton groups³'¹¹. In contrast, many of the 150 16S rRNA gene sequences obtained from the microcolonies fell into clades which contain no previously cultivated representatives (see supplementary information). Three of the most notable examples, described in more detail below, were clades affiliated with the Planctomycetes and relatives, the Cytophaga-Flavobacterium-Bacteroides and relatives, and the alpha subclass of Proteobacteria (Figure 26). None of these groups were detected within the environmental 16S rRNA gene clone library (167 clones analysed).

[00339] Five microcolony 16S rRNA gene sequences were related to the Planctomycetales, one of the main phylogenetic branches of the domain Bacteria³ (Figure 26a). Sequencing of cloned rRNA genes from marine environments had previously revealed several new, apparently uncultivated phylotypes within the Planctomycetales¹⁶⁻¹⁸. Many of these new phylotypes fall within a single, highly diverse monophyletic clade that, prior to this study, contained no cultivated representatives. The five Planctomycetales-related microcolonies identified in this study form two separate lineages within this deep branching Planctomycetales clade (Figure 26a). One lineage, represented by sequences GMD21C08, GMD14H10, and GMD14H07 (Figure 26a), was most closely related to 16S rRNA gene clone sequences recovered from bacteria associated with marine corals (84.9-89.2% similar)¹⁷. The second lineage, represented by GMD16E07 and GMD15D02 (Figure 26a), forms a unique line of descent within this clade, and its members are <84% similar to all previously published 16S rRNA gene sequences.

[00340] Two microcolony 16S rRNA gene sequences fell within the Cytophaga-Flavobacterium-Bacteroides and their relatives. These two closely related sequences form a lineage within a cluster of gene clone sequences from predominantly marine and hypersaline environments¹⁹⁻²¹. This cluster occupies one of the deepest phylogenetic branches of the Cytophaga-Flavobacterium-Bacteroides and relatives group; only the Rhodothermus/Salinibacter lineage is deeper²⁰. Within this cluster, the two microcolony gene sequences were nearly identical (>99% similar) to environmental 16S rRNA gene clone sequences obtained from seawater collected off the Atlantic coast of the United States²¹ (Figure 26b). Analysis of Phase II cultures (see later) obtained from these sorted microcolonies (Figure 24) revealed a culture (strain GMDJE10E6) with an identical 16S rRNA gene sequence that reached an optical density (OD600nm) of 0.3 (Figure 26d).

[00341] A cluster of six microcolonies was recovered that was phylogenetically affiliated with a previously uncultivated lineage of 16S rRNA gene clone sequences within the alpha subclass of the Proteobacteria (Figure 26c). The microcolony sequences formed two subclusters; one was closely related to two 16S rRNA gene clone sequences recovered from marine samples taken from a coral reef (95.1-98.6% similar) (GenBank U87483 and U87512); the second was moderately related to the same coral reef-associated environmental gene clones (87.9-95.7% similar).

[00342] Thus, the application of this novel high throughput cultivation method resulted in the growth and isolation of several bacteria representing previously uncultured phylotypes (see supplementary information). This reflects the ability of GMDs to permit the simultaneous and non-competitive growth of both slow and fast growing microorganisms in media with very low substrate concentrations. The physical separation of cells (contained in the GMDs within the growth columns), combined with flow cytometry isolation of microcolonies at different times of incubation, enabled the cultivation of a broad range of bacteria, and prevented over-growth by the fast growing microorganisms (the "microbial weeds")⁹.

[00343] To test if this novel high throughput cultivation method is applicable to different environments, we applied the technology to an alkaline lake sediment (Lake Bogoria, Kenya, data not shown) and to a soil sample (Ghana). Microorganisms from the soil sample were separated from the soil matrix, encapsulated and incubated in the growth column under aerobic conditions in the dark. Diluted soil extract, obtained from the same sample, was used as growth medium. The microcolonies were analysed by 16S rRNA gene sequencing. To cater for bacteria with disparate growth rates, microcolonies were separated from the growth column by flow cytometry at different time points. 16S rRNA gene sequence analysis revealed that many phylogenetically different microorganisms could be cultivated within the GMDs in Phase I (Figure 24) (see supplementary information). This approach can be extended to many other physiological and environmental conditions. For example, it was demonstrated that encapsulated cells of Methanococcus thermolithotrophicus can grow and form microcolonies within GMDs when incubated under strictly anaerobic conditions.

[00344] Physiological studies, natural product screening or studies of cell-cell interaction require the ability to grow microorganisms to a certain cell mass. Therefore we designed experiments to determine if these microcolonies are able to serve as inocula for larger scale microbial cultures (Figure 24, Phase II). Encouragingly, earlier microscopic analysis had revealed that encapsulated bacteria could indeed grow out of GMDs when provided with a rich supply of nutrients. GMDs were obtained from a soil sample (Ghana), as described above. After growth in diluted soil extract medium, microcolonies were sorted into organic rich medium (Figure 24, Phase II). A total of 960 GMDs containing microcolonies, each derived from a single organism, were sorted into 96 well microtitre plates filled with organic rich medium (1 GMD per well). The 960 cultures were analysed for growth by measuring optical densities (OD600nm). After one week of incubation, 67% of the cultures showed turbidity above OD 0.1, corresponding to at least 10⁷ cells per millilitre. Cell densities were high enough to permit the detection of antifungal activity among some of the cultures (data not shown). To analyse the diversity within these cultures in more detail, 100 randomly picked cultures were analysed by 16S rRNA gene sequencing, revealing many different species (see supplementary information). The remaining 33% of the cultures, which did not grow to measurable densities (fewer than 10⁶ cells per millilitre), showed bacterial growth when assessed microscopically. This is consistent with recent reports indicating that certain bacteria do not grow to cell densities greater than 10⁶ cells per millilitre¹¹.

[00345] In order to maintain and access microcolonies for physiological studies, we evaluated the minimal number of cells required for passaging by re-encapsulation and detection by flow cytometry. Flow cytometry analysis of 1000 and 100 individually encapsulated cells resulted in the detection of 360 and 15 microcolonies, respectively. Even when using cultures comprising just 10 bacterial cells, this method allowed recovery of, on average, one viable bacterial culture. This experiment demonstrates that it is possible to transfer, and therefore maintain, a culture of 100 cells derived directly from a microcolony.
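The passaging results above can be expressed as recovery fractions, that is, microcolonies detected per encapsulated cell. The Python sketch below only restates the reported counts; treating "on average, one viable culture" from 10 cells as a count of 1 is a simplifying assumption, and the variable names are illustrative.

```python
def recovery_fraction(microcolonies, cells):
    """Microcolonies detected per individually encapsulated cell."""
    return microcolonies / cells


# encapsulated cells -> microcolonies detected, as reported in the text
passages = {1000: 360, 100: 15, 10: 1}

fractions = {cells: recovery_fraction(found, cells)
             for cells, found in passages.items()}

if __name__ == "__main__":
    for cells, fraction in fractions.items():
        print(f"{cells} cells: {fraction:.0%} recovered")
```

The recovery fraction falls from 36% at 1000 cells to 15% at 100 cells and about 10% at 10 cells, so recovery remains possible at very low inoculum, although with only a handful of microcolonies expected per passage.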

[00346] GMDs separate microorganisms from each other, while still allowing the free flow of signalling molecules between different microcolonies. Therefore, this method might be applicable for the analysis of interactions between different organisms under in situ conditions, for example by inserting the encapsulated cells back into the environment (e.g. the open ocean). The simultaneous encapsulation of more than one cell (prokaryotic as well as eukaryotic) into one GMD might also be used to mimic conditions found in nature, allowing analysis of cell-cell interactions. Another advantage of this technology is the very sensitive detection of growth. This high throughput cultivation method allows the detection of microcolonies containing as few as 20 to 100 cells. Nutrient sparse media, such as seawater, were sufficient to support growth, and yet their carbon content was low enough to prevent "microbial weeds" from overgrowing slow growing microorganisms. We have demonstrated that this technology can be used to culture thus far uncultivated microorganisms. The microcolonies obtained can then be used as inocula for further cultivation.

[00347] In combination with rRNA analysis and mixed organism recombinant screening approaches²²'²³, this technology will permit a more complete understanding of unexplored microbial communities. It will find applications in environmental microbiology, whole cell optimisation, and drug discovery. The combination of cultivation with direct DNA amplification from microcolonies will undoubtedly contribute to a broader understanding of microbial ecology by linking microbial diversity with metabolic potential.
Methods:
Sample collection

[00348] Water samples were collected in the Sargasso Sea (31°50' N 64°10'W and 32°05' N 64°30'W) at depths of 3 m and 300 m. For each sample, a volume of 130 l was concentrated by tangential flow filtration. Soil samples were collected from tropical forest (05°56'N 00°03') and chaparral (05°55'N 00°03'W) in Ghana and combined in equal amounts. Cells were separated from the soil matrix by repeated shearing cycles followed by density gradient centrifugation²⁴.

Cell encapsulation and growth conditions

[00349] Concentrated cell suspensions were used for encapsulation. Singly occupied gel microdroplets (GMDs) were generated by using a CellSys 100™ microdrop maker (OneCell System) according to the manufacturer's instructions. Encapsulation of single cells was monitored by microscopy. The GMDs were dispensed into sterile chromatography columns XK-16 (Pharmacia Biotec) containing 25 ml of media. Columns were equipped with two sets of filter membranes (0.1 μm at the inlet of the column and 8 μm at the outlet). The filters prevented free-living cells from contaminating the media reservoir and retained GMDs in the column while allowing free-living cells to be washed out.

[00350] Media were pumped through the column at a flow rate of 13 ml/h. Media used for incubation of marine samples were: Sargasso Sea water filter sterilized (SSW); SSW amended with NaNO₃ (4.25 g/l), K₂HPO₄ (0.016 g/l), NH₄Cl (0.27 g/l), trace metals and vitamins²⁵; SSW amended with amino acids at concentrations between 6 and 30 nM²⁶; and marine medium (R2A, Difco) diluted in SSW (1:100, vol/vol). Soil extracts were prepared as previously described²⁷ and added to the media at final concentrations of 25 to 40 ml/l in 0.85% NaCl (vol/vol). GMDs were incubated in the columns for a period of at least 5 weeks. Microcolonies that were sorted individually into 96 well microtitre plates were grown with marine medium (R2A, Difco) in SSW or with soil extracts amended with glucose, peptone, and yeast extract (1 g/l) and humic acids extract 0.001% (vol/vol).

Flow cytometry

[00351] GMDs containing colonies were separated from free-living cells and empty GMDs by using a flow cytometer (MoFlo, Cytomation). Precise sorting was confirmed by microscopy. For the re-encapsulation experiment, a series of 1000, 100 and 10 Escherichia coli cells (expressing a green fluorescent protein, ZsGreen, Clontech), were individually encapsulated and incubated for three hours to form microcolonies within the GMDs. GMDs were analysed by flow cytometry and sorted.

Phylogenetic analysis

[00352] Ribosomal RNA genes from environmental samples, microcolonies and cultures were amplified by PCR using general oligonucleotide primers (27F and 1392R) for the domain Bacteria. To avoid nonspecific amplification, PCR reactions were irradiated with a UV Stratalinker (Stratagene) at maximum intensity prior to template addition. After cloning (TOPO-TA, Invitrogen), inserts were screened by their restriction pattern obtained with AvaI, BamHI, EcoRI, HindIII, KpnI, and XbaI. Nearly full length 16S rRNA gene sequences were obtained and added to an aligned database of over 12,000 homologous 16S rRNA primary structures maintained with the ARB software package²⁸. Phylogenetic relationships were evaluated using evolutionary distance, parsimony, and maximum likelihood methods, and were tested with a wide range of bacterial phyla as outgroups²⁹. Hypervariable regions were masked from the alignment. The phylogenetic trees shown in Figure 26 demonstrate the most robust relationships observed, and were determined using evolutionary distances calculated with the Kimura 2-parameter model for nucleotide change and neighbour-joining. Bootstrap proportions from 1000 resamplings were determined using both evolutionary distance and parsimony methods. Short reference sequences were added to the phylogenetic trees with the parsimony insertion tool of ARB, and are indicated by dotted lines.
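The Kimura 2-parameter distance named above can be illustrated with a short Python sketch. This is not the ARB implementation; it is a minimal example of the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transition and transversion differences between two aligned sequences. The function name and the handling of gaps (sites containing anything other than A, C, G or T are skipped) are assumptions made for illustration only.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration; sites with gaps or ambiguous
    bases are ignored (an assumption, not stated in the text).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    arg = (1 - 2 * p - q) * math.sqrt(max(1 - 2 * q, 0.0))
    if arg <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(arg)


# One transition in ten sites
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

A matrix of such pairwise distances is the usual input to neighbour-joining; the masking of hypervariable regions described above would correspond to dropping those alignment columns before calling the function.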

References

1. Pace, N. R. A molecular view of microbial diversity and the biosphere. Science 276, 734-740 (1997).

2. Amann, R. I., Ludwig, W. & Schleifer, K.-H. Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol Rev 59, 143-169 (1995).

3. Giovannoni, S. J. & Rappe, M. in Microbial Ecology of the Ocean (ed. Kirchman, D. L.) 47-84 (Wiley-Liss Inc., 2000).

4. Fuhrman, J. A., McCallum, K. & Davis, A. A. Phylogenetic diversity of subsurface marine microbial communities from the Atlantic and Pacific Oceans. Appl Environ Microbiol 59, 1294-1302 (1993).

5. Kaeberlein, T., Lewis, K. & Epstein, S. S. Isolating "uncultivable" microorganisms in pure culture in a simulated natural environment. Science 296, 1127-1129 (2002).

6. Beja, O. et al. Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289, 1902-1906 (2000).

7. Beja, O. et al. Unsuspected diversity among marine aerobic anoxygenic phototrophs. Nature 415, 630-633 (2002).

8. Ferguson, R. L., Buckley, E. N. & Palumbo, A. V. Response of marine bacterioplankton to differential filtration and confinement. Appl Environ Microbiol 47, 49-55 (1984).

9. Eilers, H., Pernthaler, J., Glöckner, F. O. & Amann, R. Culturability and in situ abundance of pelagic bacteria from the North Sea. Appl Environ Microbiol 66, 3044-3051 (2000).

10. Xu, H. S. et al. Survival and viability of nonculturable Escherichia coli and Vibrio cholerae in the estuarine and marine environment. Microb Ecol 8, 313-323 (1982).

11. Rappe, M. S., Connon, S. A., Vergin, K. L. & Giovannoni, S. J. Cultivation of the ubiquitous SAR11 marine bacterioplankton clade. Nature, in press (2002).

12. Manome, A. et al. Application of gel microdroplet and flow cytometry techniques to selective enrichment of non-growing bacterial cells. FEMS Microbiol Lett 197, 29-33 (2001).

13. Short, J. M. & Keller, M. High throughput screening for novel enzymes. U.S. Patent No. 6,174,673B1 (2001).

14. Powell, K. T. & Weaver, J. C. Gel microdroplets and flow cytometry: rapid determination of antibody secretion by individual cells within a cell population. Bio/Technology 8, 333-337 (1990).

15. Ryan, C., Nguyen, B. T. & Sullivan, S. J. Rapid assay for mycobacterial growth and antibiotic susceptibility using gel microdrop encapsulation. J Clin Microbiol 33, 1720-1726 (1995).

16. Bowman, J. P., Rea, S. M., McCammon, S. A. & McMeekin, T. A. Diversity and community structure within anoxic sediment from marine salinity meromictic lakes and a coastal meromictic marine basin, Vestfold Hills, Eastern Antarctica. Environ Microbiol 2, 227-237 (2000).

17. Frias-Lopez, J., Zerkle, A. L., Bonheyo, G. T. & Fouke, B. W. Partitioning of bacterial communities between seawater and healthy, black band diseased, and dead coral surfaces. Appl Environ Microbiol 68, 2214-2228 (2002).

18. Ravenschlag, K., Sahm, K., Pernthaler, J. & Amann, R. High bacterial diversity in permanently cold marine sediments. Appl Environ Microbiol 65, 3982-3989 (1999).

19. Tanner, M. A., Everett, C. L., Coleman, W. J., Yang, M. M. & Youvan, D. C. Complex microbial communities inhabiting sulfide-rich black mud from marine coastal environments. Biotechnology et alia 8, 1-16 (2000).

20. de Souza, M. P. et al. Identification and characterization of bacteria in a selenium-contaminated hypersaline evaporation pond. Appl Environ Microbiol 67, 3785-3794 (2001).

21. Kelly, K. M. & Chistoserdov, A. Y. Phylogenetic analysis of the succession of bacterial communities in the Great South Bay (Long Island). FEMS Microbiol Ecol 35, 85-95 (2001).

22. Short, J. M. Recombinant approaches for accessing biodiversity. Nature Biotechnology 15, 1322-1323 (1997).

23. Robertson, D. E., Mathur, E. J., Swanson, R. V., Marrs, B. L. & Short, J. M. The discovery of new biocatalysts from microbial diversity. SIM News 46, 3-8 (1996).

24. Fægri, A., Torsvik, V. L. & Goksøyr, J. Bacterial and fungal activities in soil: separation of bacteria and fungi by a rapid fractionated centrifugation technique. Soil Biol Biochem 9, 105-112 (1977).

25. Widdel, F. & Bak, F. in The Prokaryotes (eds. Balows, A., Trüper, H. G., Dworkin, M., Harder, W. & Schleifer, K.-H.) 3352-3392 (Springer-Verlag, New York, 1992).

26. Ouverney, C. C. & Fuhrman, J. A. Marine planktonic archaea take up amino acids. Appl Environ Microbiol 66, 4829-4833 (2000).

27. Vobis, G. in The Prokaryotes (eds. Balows, A., Trüper, H. G., Dworkin, M., Harder, W. & Schleifer, K.-H.) 1029-1060 (Springer-Verlag, New York, 1992).

28. Strunk, O. & Ludwig, W. in http://www.mikro.biologie.tu-muenchen.de (Department of Microbiology, Technische Universität München, Munich, Germany, 1998).

29. Ludwig, W. et al. Detection and in situ identification of representatives of a widely distributed new bacterial phylum. FEMS Microbiol Lett 153, 181-190 (1997).

EXAMPLE 20: Amplification of Trace Amounts of Environmental gDNA

[00353] Figure 31 shows a schematic diagram of the procedure used to amplify trace amounts of environmental gDNA. The amplification proceeded as follows.

Template Preparation.

[00354] Trace amounts of environmental, large fragment gDNA were encased in agarose. The agarose gel piece was then equilibrated by adding agarase buffer and incubating at room temperature for 1 hour. After removing the buffer, the agarose was melted by incubating at 70°C for 15 minutes. The melted agarose was then digested with agarase by incubating at 40°C overnight. Approximately 1 μl (or 1-100 ng) of this solution was used as the template for the amplification reaction. The solution can also be concentrated by ethanol or isopropanol precipitation, then used as the template for the amplification reaction.

Amplification.

[00355] 1-100 ng of the template was added to random primers (random 7-mers with an additional two nitroindole residues at the 5' end and a phosphorothioate linkage at the 3' end; GC-rich random hexamers can be added when the template is GC-rich) at 100 μM final concentration in 1x Buffer Y+/Tango™ (3.3 mM Tris-acetate (pH 7.9 at 37°C), 1 mM magnesium acetate, 6.6 mM potassium acetate, 10 μg/ml BSA) (MBI Fermentas) plus Tween (0.12% final concentration). The template was denatured by incubating the solution at 95°C for 3 minutes followed by cooling on ice. After cooling, deoxynucleoside triphosphates (dNTP) (100 μM final concentration) and Phi29 polymerase (Molecular Staging (1 μL in a 50 μL reaction), Amersham (1 μL in a 20 μL reaction)) in 1x Buffer Y+/Tango™ (3.3 mM Tris-acetate (pH 7.9 at 37°C), 1 mM magnesium acetate, 6.6 mM potassium acetate, 10 μg/ml BSA) (MBI Fermentas) plus Tween (0.12% final concentration) were added. The entire solution was incubated at 30°C for 3-16 hours. Partway through the incubation period, extra dNTP, primers, and/or buffer may be added to increase the size of the product. Following amplification, the enzyme was heat inactivated at 65°C for 10 minutes.
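As an illustration of how the final concentrations stated above translate into pipetting volumes, the following Python sketch applies C1·V1 = C2·V2 to a single 50 μL reaction. Only the final concentrations (100 μM primers, 100 μM dNTP, 1x buffer, 0.12% Tween) and the 1 μL of polymerase per 50 μL come from the text; every stock concentration, the 1 μL template volume, and all function names are hypothetical values chosen for the example. The sketch reports total volumes only and does not model the two-step addition (primers first, then dNTP and polymerase after denaturation).

```python
def stock_volume_ul(final_conc, stock_conc, reaction_volume_ul):
    """Volume of stock needed to reach final_conc (C1*V1 = C2*V2)."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be at least as concentrated as the target")
    return final_conc * reaction_volume_ul / stock_conc


def plan_reaction(reaction_volume_ul, template_ul, enzyme_ul, components):
    """Return a dict of component volumes (uL), with water making up the rest."""
    volumes = {
        name: stock_volume_ul(final, stock, reaction_volume_ul)
        for name, (final, stock) in components.items()
    }
    volumes["template"] = template_ul
    volumes["Phi29 polymerase"] = enzyme_ul
    water = reaction_volume_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("components exceed the reaction volume")
    volumes["water"] = water
    return volumes


# (final concentration, stock concentration); stocks are ASSUMED values
COMPONENTS = {
    "random primers (uM)": (100.0, 1000.0),
    "dNTP (uM)": (100.0, 10000.0),
    "Buffer Y+/Tango (x)": (1.0, 10.0),
    "Tween (%)": (0.12, 10.0),
}

plan = plan_reaction(50.0, template_ul=1.0, enzyme_ul=1.0, components=COMPONENTS)
for name, vol in plan.items():
    print(f"{name}: {vol:.2f} uL")
```

With different stock concentrations only the `COMPONENTS` table needs to change; the water volume is recomputed so that the total stays at the chosen reaction volume.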

[00356] Numerous modifications and variations of the present invention are possible in light of the above teachings; therefore, within the scope of the claims, the invention may be practiced other than as particularly described.

EXAMPLE 21: Amplification of Trace Amounts of Environmental gDNA

A) Cut and Ligate Method:

Template Preparation.

[00357] Trace amounts of whole E. coli cells were encased in an agarose noodle and treated with lysozyme and proteinase K, and the agarose was melted and digested with agarase. Preparation of the restriction digest may be done by any means known to those skilled in the art. The method used here to prepare the restriction digest was to mix 5 uL of the template DNA, 1 uL EcoRI Buffer (commercially available from New England BioLabs), 0.5 uL EcoRI (commercially available from New England BioLabs), and 3.5 uL H₂O. The sample was incubated at 37°C for 1-16 hours. The restriction enzyme was heat-inactivated at 65°C for 20 minutes. 1 uL T4 DNA Ligase (commercially available from New England BioLabs) and 0.56 uL 20 mM ATP (commercially available from Sigma) were added directly to the reaction. The sample was incubated at room temperature for 1-16 hours. Because the template DNA is very dilute, the DNA fragments preferentially form self-ligated products (circles). The ligase was heat-inactivated at 65°C for 10 minutes. Approximately 2 uL was used directly as template for amplification. Figure 32 shows the number of cells detectable as template resulting from this experiment.
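The volumes given for the digest and the in-tube ligation can be checked with a few lines of Python. The sketch below only restates the arithmetic of the paragraph above: the digest components sum to 10 uL, and adding ligase and ATP brings the reaction to 11.56 uL with roughly 1 mM ATP. Variable and function names are illustrative, and the calculation assumes the volumes are simply additive.

```python
# Volumes in uL, taken from the template preparation paragraph
DIGEST = {
    "template DNA": 5.0,
    "EcoRI buffer": 1.0,
    "EcoRI": 0.5,
    "H2O": 3.5,
}
LIGATION_ADDITIONS = {
    "T4 DNA ligase": 1.0,
    "20 mM ATP": 0.56,
}
ATP_STOCK_MM = 20.0


def total_volume(components):
    """Sum of component volumes, assuming volumes are additive."""
    return sum(components.values())


digest_volume = total_volume(DIGEST)
ligation_volume = digest_volume + total_volume(LIGATION_ADDITIONS)

# Final ATP concentration after adding ligase and ATP to the digest
atp_final_mm = ATP_STOCK_MM * LIGATION_ADDITIONS["20 mM ATP"] / ligation_volume

print(f"digest volume:   {digest_volume:.2f} uL")
print(f"ligation volume: {ligation_volume:.2f} uL")
print(f"final ATP:       {atp_final_mm:.2f} mM")
```

The result (about 0.97 mM ATP) is close to the 1 mM ATP commonly supplied in T4 DNA ligase buffers, which is consistent with the ligase being added directly to the heat-inactivated digest without a separate ligase buffer.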

Amplification.

[00358] Approximately 2 uL of the template was added to random primers (random 7-mers with an additional two nitroindole residues at the 5' end and a phosphorothioate linkage at the 3' end; GC-rich random hexamers can be added when the template is GC-rich) at 100 μM final concentration in 1x Buffer Y+/Tango™ (3.3 mM Tris-acetate (pH 7.9 at 37°C), 1 mM magnesium acetate, 6.6 mM potassium acetate, 10 μg/ml BSA) (MBI Fermentas) plus Tween (0.12% final concentration). The template was denatured by incubating the solution at 95°C for 3 minutes followed by cooling on ice. After cooling, deoxynucleoside triphosphates (dNTP) (100 μM final concentration) and Phi29 polymerase (Molecular Staging (1 μL in a 50 μL reaction), Amersham (1 μL in a 20 μL reaction)) in 1x Buffer Y+/Tango™ (3.3 mM Tris-acetate (pH 7.9 at 37°C), 1 mM magnesium acetate, 6.6 mM potassium acetate, 10 μg/ml BSA) (MBI Fermentas) plus Tween (0.12% final concentration) were added. The entire solution was incubated at 30°C for 3-16 hours. Partway through the incubation period, extra dNTP, primers, and/or buffer may be added to increase the yield of the product. Following amplification, the enzyme was heat inactivated at 65°C for 10 minutes.

[00359] Samples were evaluated using GeneChip® E. coli Antisense Genome Array technology (commercially available from Affymetrix).

[00360] Numerous modifications and variations of the present invention are possible in light of the above teachings; therefore, within the scope of the claims, the invention may be practiced other than as particularly described.

References:

1) Lage, et al., Whole Genome Analysis of Genetic Alterations in Small DNA Samples Using Hyperbranched Strand Displacement Amplification and Array-CGH, Genome Research, 13:294-307 (2003).

2) Detter, et al., Isothermal Strand-Displacement Amplification Applications for High-Throughput Genomics, Genomics, Vol. 80, No. 6 (December 2002).

B) Shear and Ligate Method:

Template Preparation.

[00361] Trace amounts of environmental whole cells are encased in an agarose noodle and treated with lysozyme and proteinase K, and the agarose is melted and digested with agarase. The template DNA will be sheared by a shearing means known by those skilled in the art (e.g., a shearing machine (GeneMachines Hydroshear) or a 25 gauge needle, among others). The DNA ends will be filled in with a DNA polymerase. The DNA will be blunt ligated with T4 DNA Ligase. The ligated DNA will be used as the template for amplification.

Amplification.

[00362] 1-50 uL of the template is added to random primers (random 7-mers with an additional two nitroindole residues at the 5' end and a phosphorothioate linkage at the 3' end; GC-rich random hexamers can be added when the template is GC-rich) at 100 μM final concentration in 1x Buffer Y+/Tango™ (3.3 mM Tris-acetate (pH 7.9 at 37°C), 1 mM magnesium acetate, 6.6 mM potassium acetate, 10 μg/ml BSA) (MBI Fermentas) plus Tween (0.12% final concentration). The template is denatured by incubating the solution at 95°C for 3 minutes followed by cooling on ice. After cooling, deoxynucleoside triphosphates (dNTP) (100 μM final concentration) and Phi29 polymerase (Molecular Staging (1 μL in a 50 μL reaction), Amersham (1 μL in a 20 μL reaction)) in 1x Buffer Y+/Tango™ (3.3 mM Tris-acetate (pH 7.9 at 37°C), 1 mM magnesium acetate, 6.6 mM potassium acetate, 10 μg/ml BSA) (MBI Fermentas) plus Tween (0.12% final concentration) will be added. The entire solution will be incubated at 30°C for 3-16 hours. Partway through the incubation period, extra dNTP, primers, and/or buffer may be added to increase the yield of the product. Following amplification, the enzyme will be heat inactivated at 65°C for 10 minutes.

[00363] Samples will be evaluated using GeneChip® E. coli Antisense Genome Array technology (commercially available from Affymetrix).

[00364] Numerous modifications and variations of the present invention are possible in light of the above teachings; therefore, within the scope of the claims, the invention may be practiced other than as particularly described.

C) Re-amplification Method:

[00365] In another aspect, the amplification process presented above may be performed iteratively on the whole amplification product from the previous amplification step. The template DNA may be prepared by any technique known by those skilled in the art.

Amplification.

[00366] 50 picograms - 5 ng of the E. coli DNA template was added to random primers (random 7-mers with an additional two nitroindole residues at the 5' end and a phosphorothioate linkage at the 3' end; GC-rich random hexamers can be added when the template is GC-rich) at 100 μM final concentration in 1x Buffer Y+/Tango™ (3.3 mM Tris-acetate (pH 7.9 at 37°C), 1 mM magnesium acetate, 6.6 mM potassium acetate, 10 μg/ml BSA) (MBI Fermentas) plus Tween (0.12% final concentration). The template was denatured by incubating the solution at 95°C for 3 minutes followed by cooling on ice. After cooling, deoxynucleoside triphosphates (dNTP) (100 μM final concentration) and Phi29 polymerase (Molecular Staging (1 μL in a 50 μL reaction), Amersham (1 μL in a 20 μL reaction)) in 1x Buffer Y+/Tango™ (3.3 mM Tris-acetate (pH 7.9 at 37°C), 1 mM magnesium acetate, 6.6 mM potassium acetate, 10 μg/ml BSA) (MBI Fermentas) plus Tween (0.12% final concentration) were added. The entire solution was incubated at 30°C.

[00367] After 3 hours, the reaction components (minus additional template) were added again to the solution and incubated for an additional 3 hours. After this additional incubation of at least 1 hour, the reaction components (minus additional template) were added again to the solution and incubated for an additional 3 hours. The additional components and additional incubations allowed otherwise unamplifiable samples to be amplified.

[00368] Samples will be evaluated using GeneChip® E. coli Antisense Genome Array technology (commercially available from Affymetrix).

EXAMPLE 22: LARGE INSERT FACS BIOPANNING PROTOCOL

Procedures:

Grow overnight culture of controls:

[00325] Inoculate from a single colony on a plate into 3 ml growth media. Make 1:10, 1:100, 1:1000 and 1:10,000 dilutions of your starting culture to get a log-phase culture (OD = 0.4-1.0) after growth overnight (O/N). Grow overnight in a 30°C shaker at 200-220 rpm.

Encapsulation and growth of colonies inside MiCs:

[00326] Encapsulate 1 vial of 2.5% (2-3%) home-made SeaPlaque gel (BioWhittaker Molecular Applications, Cat #: 50110, premade in PBS, pH 7.2, and aliquoted at 1 ml in cryovials). Warm up 17 ml mineral oil (One Cell System) in a 42°C water bath. (It takes 40 minutes for the inner part of the oil to reach 42°C.)

[00327] Take 100 ul of thawed frozen fosmid E. coli library, adjust to OD600 = 0.5 (0.5-1.0) for encapsulation, and centrifuge down to 25 ul. Melt the agarose gel by putting it in a 75°C water bath for 5 minutes, vortex, add 190 ul FBS (fetal bovine serum, heat inactivated, stored at -80°C), vortex well and place in the 45°C water bath for 3 minutes. Vortex the culture well and add to the agarose, vortex and add to 17 ml mineral oil. Shake about 30 times, then place on the One Cell machine. Blend at 2600 rpm for 1 min at room temperature and at 2600 rpm for 9 minutes on ice. Wash with PBS twice (centrifuge at 2500 rpm for 10 minutes each time).

[00328] Re-suspend in 10 ml LB+Apr¹⁰⁰. Incubate statically at 23°C (room temperature) on your bench in deep-well petri dishes for 16 hrs. Check the colony size after the overnight incubation. Grow the colonies to maximal size, with most of them not yet bursting out.

[00329] Centrifuge at 2500 rpm for 10 min. Re-suspend the MiCs in 10 ml of 2xSSC. Filter with 40 um cell strainers. They can be saved at 4°C for several days (up to a week). Count MiCs by diluting 10 fold.

Formula: Avg = (N1 + N2 + N3 + N4)/4.

MiC concentration (# of MiC/ml) = Avg × 10 × 10⁴

Use 1 × 10⁷ MiCs in each hybridization reaction.

1. De-membrane with SDS: Resuspend in 8 ml 2xSSC/10% SDS. Incubate 45 (15-60) min at room temperature, rotating. Centrifuge at 2500 rpm for 10 min at room temperature. (Look under the microscope. They are semi-lysed and ghost-looking.)

2. Lysis: Resuspend in 4 ml lysis solution containing proteinase K. Incubate 30 minutes (30 min - 1 hour) at 37°C, rotating. Look under the microscope to be sure of lysis. Centrifuge at 2500 rpm for 10 min. Lysis Solution (save at -20°C for up to two months):

Component (final concentration): first volume column; second volume column (the listed volumes sum to 15 ml and 30 ml, respectively)

50 mM Tris pH 8: 0.75 ml 1 M Tris; 1.5 ml 1 M Tris

50 mM EDTA: 1.5 ml 0.5 M EDTA; 3 ml 0.5 M EDTA

100 mM NaCl: 300 ul 5 M NaCl; 600 ul 5 M NaCl

1% Sarkosyl: 0.75 ml 20% Sarkosyl; 1.5 ml 20% Sarkosyl

250 ug/ml Proteinase K: 375 ul proteinase K stock; 750 ul proteinase K stock (10 mg/ml in H₂O; Sigma, Catalog P2308, specific activity of stock solution 30 units/mg; make aliquots of the 10 mg/ml stock solution)

dH₂O: 11.325 ml; 22.65 ml

Note: 37°C is endonuclease active temp. So do not incubate longer than 45minutes for endo+ strains.

3. Denature: Resuspend in 4ml denaturing solution. Incubate 30 min at RT shaking or rotating.

Centrifuge at 2500rpm for 10 min. Denaturing Solution: 0.5M NaOH/1.5M NaCl

4. Neutralize: Resuspend in 4 ml neutralizing solution. Incubate 30 min at RT shaking or rotating.

Centrifuge at 2500 rpm for 10 min. Neutralizing Solution:

5. Wash in 4 ml 2XSSC briefly.

6. Pre-hybridization: Centrifuge and take out the 2XSSC. Add 1.2 ml (1-1.5 ml) "DIG EASY HYB" and prehyb for 45 minutes at 37°C. Do prehyb and hyb in Personal Hyb Oven.

7. Hybridization: During the last 15-20 minutes of prehybridization, aliquot the oligo probe (21mer; DIG-labeled probe from IDT, 19-30mer; to dissolve the probe, add PCR H₂O to a conc. of 0.5 nmol/ul). Use 8 ul of each probe per reaction (RN), combine the double probes, denature at 85°C for 5 minutes, and place on ice immediately. Add 16 ul of probe combination and return to the rotating hybridization oven overnight (O/N) at 37°C.

8. Wash: Wash MiCs with 10 ml of 2XSSC/0.1% SDS at RT for 15 min, rotating.

9. Prepare a 1% (10 mg/ml) solution of Blocking Reagent (supplied with the Molecular Probe TSA kit, see below) in PBS.

10. Stringent washes: Wash MiCs with 10 ml of 0.1XSSC/0.1% SDS 2 times, each for 20 min at the appropriate temperature (37 - 54°C), rotating.

11. Wash with 10 ml/reaction 2XSSC briefly.

12. Block: Block the reaction with 2.4 ml Blocking Reagent in PBS at RT for 30 minutes.

13. Amplify with peroxidase: Add 24 ul anti-DIG-POD (so 1:100; Roche Diagnostics, Catalog: 1207733) and incubate at RT for 1 hour.

14. Wash: Wash MiCs with 10 ml PBS per reaction (RN) 3 times for 10 minutes at 37°C.

15. Add tyramide substrate: Prepare a tyramide working solution by diluting the tyramide stock solution (Molecular Probe, Catalog: T20932; make sure it is dissolved in the manufacturer-supplied DMSO; the first time, aliquot at 25 ul/tube; try not to use the leftover) 1:100 in Amplification buffer/0.0015% H₂O₂ (the instructions can be found in the Molecular Probe Manual). Apply 1.2 ml tyramide working solution at RT and incubate in the dark at RT for 20 minutes.

16. Wash: Wash 3 times for 10 min in 10 ml PBS buffer at 37°C.

17. Leave in PBS for microscopy and FACS sort.

18. Recover with RCA/Lambda packaging scheme. (See the protocol.)

19. After getting colonies on a plate, send the plates to the Hybridization group as soon as possible for lifting.

20. Do colony PCR with 5', internal, and 3' end primers for the hits on the film to see if the hits are real and, if so, whether they are partial or full-length.
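The MiC counting formula given earlier in this protocol (the average of four counts, multiplied by the 10-fold dilution and by 10⁴) can be sketched in Python as follows. Reading the 10⁴ term as a counting-chamber conversion factor to MiCs per ml is an assumption, as are the function names and the example counts; only the formula itself and the target of 1 × 10⁷ MiCs per hybridization reaction come from the protocol.

```python
def mic_concentration_per_ml(counts, dilution_factor=10, chamber_factor=1e4):
    """MiCs per ml from replicate counts: Avg x dilution x 10^4.

    chamber_factor is ASSUMED to be a counting-chamber conversion factor.
    """
    if not counts:
        raise ValueError("at least one count is required")
    avg = sum(counts) / len(counts)
    return avg * dilution_factor * chamber_factor


def volume_for_mics_ml(target_mics, concentration_per_ml):
    """Volume of MiC suspension (ml) that contains target_mics."""
    if concentration_per_ml <= 0:
        raise ValueError("concentration must be positive")
    return target_mics / concentration_per_ml


# Hypothetical counts N1..N4 from a 10-fold diluted suspension
counts = [48, 52, 50, 50]
conc = mic_concentration_per_ml(counts)
volume = volume_for_mics_ml(1e7, conc)
print(f"{conc:.2e} MiCs/ml; use {volume:.2f} ml per hybridization reaction")
```

With these example counts the suspension holds 5 × 10⁶ MiCs/ml, so 2 ml would supply the 1 × 10⁷ MiCs called for in each hybridization reaction.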

EXAMPLE 23: Large Insert FACS Biopanning Recovery Using MS RCA kit/Lambda Packaging

Materials Needed:

Repli-g 625S (Rolling Circle Amplification) Kit (Molecular Staging, order from Fisher, Cat. #: 625S, store at -80°C)

Max Plax Packaging Extracts

NZY Medium

5X NZY Medium

1 M MgSO₄

10 mM MgSO₄

Microseal™ A Film (MJ Research, Inc. Part # MSA-5001)

Applied Biosystems GeneAmp PCR System 9700 (or other thermocycler)

Eppendorf thermomixer

Spike (1 M MgSO₄ and 20% Maltose)

Procedure:

1. Aliquot 10 ul H₂O into PCR strip tubes/wells for sorting.

2. FACS 10 (2 - 20) MiCs into H₂O, based on occupancy and fluorescence. Keep the sort plate sterile (and on ice as much as possible). Be sure to include no-sort wells for use as RCA controls.

3. Add appropriate RCA controls to no-sort wells.

4. Seal reactions with PCR caps and denature for RCA at 95°C for 3 minutes in a thermocycler. Ramp down to 4°C and transfer reactions to ice.

5. Prepare master reaction mix (on ice). Per reaction (1 RN): 4.8 ul H₂O, 5 ul 4X Mix, 0.2 ul of DNA polymerase; 10 ul total/reaction.

6. Aliquot 10 ul reaction mix/reaction. Reseal wells with fresh caps. Mix gently.

7. Incubate at 30°C for 16-18 hours (overnight) in a Perkin-Elmer thermocycler.

8. Heat inactivate at 65°C for 10 minutes in the thermocycler.

9. Transfer to ice.

10. Remove 5 ul/reaction for QC. Add 1 ul loading dye and run out on a 0.5% agarose gel.

11. Store the remainder of the reactions at -20°C for packaging.

12. Inoculate 50 ml NZY supplemented with 500 ul 1 M MgSO₄ (Cf = 10 mM) from an XL1-Blue MR plate. Incubate at 37°C, 250 rpm overnight.

13. Add 5 ml from the overnight culture to 44 ml NZY and 1 ml spike (0.5 ml of 1 M MgSO₄ and 0.5 ml of 20% Maltose, so Cf = 10 mM MgSO₄ and 0.2% Maltose).

14. Read absorbance at 600 nm. Incubate at 37°C, 250 rpm until OD₆₀₀ = 1.0.

15. Spin down the culture at 4000 rpm for 10 minutes at 4°C. Remove the supernatant.

16. Resuspend in the appropriate volume of 10 mM MgSO₄ to adjust to OD₆₀₀ = 1.43.

17. Store cells at 4°C.

18. Thaw one tube (50 ul) of packaging extract for every two reactions at room temperature and transfer to ice.

19. Aliquot 2 ul (if 10 MiCs are used) of each RCA reaction into fresh labeled Eppendorf tubes and add 8 ul of 5T.1E to bring the volume to 10 ul. If 20 MiCs are sorted, use 1 ul of each RCA reaction and add 9 ul of 5T.1E.

20. Add 25 ul thawed extract to each aliquot and mix by gentle pipetting (avoid bubble creation).

21. Incubate at 30°C for 90 minutes without shaking in the Thermomixer.

22. Add an additional 25 ul thawed extract to each reaction and mix by gentle pipetting (avoid bubble creation).

23. Incubate at 30°C for 90 minutes without shaking in the Thermomixer.

24. Remove reactions from the Thermomixer. Set to 39°C.

25. Add 140 ul XL1-Blue MR cells (1.43 OD₆₀₀) to each reaction for a final OD₆₀₀ = 1.0 and a Vf = 200 ul.

26. Incubate at 39°C for 15 minutes without shaking in the Thermomixer.

27. Add 50 ul 5X NZY to each reaction (Vf = 250 ul, Cf = 1X NZY).

28. Incubate at 37°C for 45 minutes in the Thermomixer at 850 rpm.

29. Plate out each entire reaction on 150 mm LB+Apr¹⁰⁰ plates overnight at 37°C.

30. Confirm the presence of the target insert by plate hybridization.

31. Further confirm the insert by PCR after the film is returned.
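Several steps of the procedure above state a final concentration (Cf) or final volume (Vf) that follows from simple dilution arithmetic. The Python sketch below reproduces those numbers as a consistency check. All input values are taken from the steps themselves; the function names are illustrative, and the resuspension estimate assumes that OD₆₀₀ scales linearly with cell density and that no cells are lost during centrifugation.

```python
def final_concentration(stock_conc, stock_volume, total_volume):
    """Concentration after diluting stock_volume of stock into total_volume."""
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * stock_volume / total_volume


def resuspension_volume_ml(culture_od, culture_volume_ml, target_od):
    """Volume needed to resuspend a pellet at target_od.

    ASSUMES OD600 is proportional to cell density and no cells are lost.
    """
    return culture_od * culture_volume_ml / target_od


# Step 13: 5 ml culture + 44 ml NZY + 1 ml spike = 50 ml
mgso4_mm = final_concentration(1000.0, 0.5, 50.0)    # 1 M MgSO4 stock
maltose_pct = final_concentration(20.0, 0.5, 50.0)   # 20% maltose stock

# Step 25: 10 ul diluted RCA + 2 x 25 ul extract + 140 ul cells = 200 ul
packaging_volume_ul = 10 + 25 + 25 + 140
final_od = final_concentration(1.43, 140.0, packaging_volume_ul)

# Step 27: 50 ul of 5X NZY in 250 ul total
nzy_x = final_concentration(5.0, 50.0, packaging_volume_ul + 50)

# Step 16: example for a 50 ml culture harvested at OD600 = 1.0
resuspend_ml = resuspension_volume_ml(1.0, 50.0, 1.43)

print(f"MgSO4 {mgso4_mm:.1f} mM, maltose {maltose_pct:.2f}%")
print(f"final OD600 {final_od:.3f} in {packaging_volume_ul} ul, NZY {nzy_x:.1f}X")
print(f"resuspend pellet in about {resuspend_ml:.1f} ml of 10 mM MgSO4")
```

The check shows why the cells are adjusted to OD₆₀₀ = 1.43 in step 16: 140 ul of that suspension in a 200 ul reaction gives the stated final OD₆₀₀ of 1.0.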

[00369] Numerous modifications and variations of the present invention are possible in light of the above teachings; therefore, within the scope of the claims, the invention may be practiced other than as particularly described.

TABLE 1

A2

Fluorescein conjugated casein (3.2 mol fluorescein/mol casein)
CBZ-Ala-AMC
t-BOC-Ala-Ala-Asp-AMC
succinyl-Ala-Gly-Leu-AMC
CBZ-Arg-AMC
CBZ-Met-AMC
morphourea-Phe-AMC

t-BOC = t-butoxy carbonyl, CBZ = carbonyl benzyloxy, AMC = 7-amino-4-methyl coumarin

AD3

Fluorescein conjugated casein
t-BOC-Ala-Ala-Asp-AFC
CBZ-Ala-Ala-Lys-AFC
succinyl-Ala-Ala-Phe-AFC
succinyl-Ala-Gly-Leu-AFC

AFC = 7-amino-4-trifluoromethyl coumarin

AE3

Fluorescein conjugated casein

AF3

t-BOC-Ala-Ala-Asp-AFC
CBZ-Asp-AFC

AG3

CBZ-Ala-Ala-Lys-AFC
CBZ-Arg-AFC

AH3

succinyl-Ala-Ala-Phe-AFC
CBZ-Phe-AFC
CBZ-Trp-AFC

AI3

succinyl-Ala-Gly-Leu-AFC
CBZ-Ala-AFC
CBZ-Ser-AFC

TABLE 2 L2

And all of L2

TABLE 3

TABLE 4

4-methyl umbelliferone wherein R =

G2 β-D-galactose β -D-glucose β -D-glucuronide

GB3 β -D-cellotrioside β -D-cellobiopyranoside

GC3 β -D-galactose α-D-galactose

GD3 β -D-glucose α-D-glucose

GE3 β -D-glucuronide

GI3 β -D-N,N-diacetylchitobiose

GJ3 β -D-fucose α-L-fucose β -L-fucose

GK3 β -D-mannose α-D-mannose

Non-Umbelliferyl substrates

GA3 amylose [polyglucan α 1,4 linkages], amylopectin [polyglucan branching α 1,6 linkages]

GF3 xylan [poly 1,4-D-xylan]

GG3 amylopectin, pullulan

GH3 sucrose, fructofuranoside

Claims

WHAT IS CLAIMED IS:

1. A method for making a gene library from trace amounts of DNA derived from a plurality of species of organisms comprising: (a) amplifying a substantial portion of the cDNA, gDNA, or genomic DNA fragments, wherein said amplifying is by multiple strand displacement amplification (MDA); and (b) ligating the cDNA, gDNA, or genomic DNA fragments to a DNA vector to generate a library of constructs in which genes are contained in the cDNA, gDNA, or genomic DNA fragments.

2. The method of claim 1, wherein the organisms comprise uncultured organisms.

3. The method of claim 1, wherein the organisms are derived from an environmental sample.

4. The method of claim 1, wherein the organisms are derived from a contaminated environmental sample.

5. The method of claim 1, wherein the organisms comprise a mixture of terrestrial microorganisms or marine microorganisms, or a mixture of terrestrial microorganisms and marine microorganisms.

6. The method of claim 1, wherein the organisms are extremophiles.

7. The method of claim 6, wherein the extremophiles comprise one or more organisms selected from the group consisting of thermophiles, hyperthermophiles, psychrophiles, psychrotrophs, halophiles, alkalophiles, and acidophiles.

8. The method of claim 1, wherein the cDNA or genomic fragments comprise at least an operon, or portions thereof, of the donor microorganisms.

9. The method of claim 8, wherein the operon encodes a complete or partial metabolic pathway.

10. A method of screening clones having DNA recovered from trace amounts of DNA derived from a plurality of species of uncultivated organisms, for a specified protein activity, which method comprises: (a) amplifying the trace amounts of DNA by multiple strand displacement amplification (MDA); (b) transforming a host cell with the DNA of (a) to produce a library of clones which is screened for the specified protein activity; and (c) screening for a specified protein activity in a library of clones prepared by recovering trace amounts of DNA from a DNA population derived from a plurality of species of organisms.

11. The method of claim 10, wherein the DNA is ligated into a vector prior to transforming the host cell.

12. The method of claim 11, wherein the vector comprises at least one DNA sequence capable of regulating production of a detectable enzyme activity from said DNA.

13. The method of claim 11, wherein the vector into which the DNA has been ligated is used to transform a host cell.

14. The method of claim 10, wherein the organisms are derived from an environmental sample.

15. The method of claim 10, wherein the organisms are derived from a contaminated environmental sample.

16. The method of claim 10, wherein the organisms are extremophiles.

17. The method of claim 16, wherein the extremophiles comprise one or more organisms selected from the group consisting of thermophiles, hyperthermophiles, psychrophiles, psychrotrophs, halophiles, alkalophiles, and acidophiles.

18. A method for making a gene library from trace amounts of DNA isolated from trace amounts of DNA from a mixed population of uncultivated cells comprising: (a) encapsulating individually, in a microenvironment, a plurality of cells from a mixed population of uncultivated cells; (b) placing the encapsulated cells in a growth column; (c) incubating the encapsulated cells in the growth column under conditions allowing the encapsulated cells to grow into microcolonies containing trace amounts of DNA; (d) sorting the encapsulated microcolonies; (e) amplifying the trace amounts of DNA by multiple strand displacement amplification; and (f) ligating the amplified DNA to a DNA vector to generate a library of constructs in which genes are contained in the DNA.

19. The method of claim 18, wherein the cells are derived from an environmental sample.

20. The method of claim 18, wherein the cells are derived from a contaminated environmental sample.

21. The method of claim 18, wherein the cells are extremophiles.

22. The method of claim 21, wherein the extremophiles comprise one or more organisms selected from the group consisting of thermophiles, hyperthermophiles, psychrophiles, psychrotrophs, halophiles, alkalophiles, and acidophiles.

23. A method for amplifying a DNA template from trace amounts of DNA derived from a plurality of species of organisms comprising: a) preparing a template from said cDNA, gDNA, or genomic DNA fragments, wherein trace amounts of cDNA, gDNA, or genomic DNA fragments are obtained from a plurality of species of organism; and b) amplifying a substantial portion of said template from step a) by multiple strand displacement amplification (MDA) to provide sufficient amounts of cDNA, gDNA or genomic DNA fragments for detection.

24. The method of claim 23, further comprising fragmenting the template.

25. The method of claim 23, wherein said trace amounts of cDNA, gDNA, or genomic DNA fragments are partially or completely digested.

26. The method of claim 24, wherein the template fragmentation is achieved by enzymatic, chemical, photometric, mechanical or any means that provides segments.

27. The method of claim 26, wherein the enzymatic fragmentation comprises use of a DNase or a restriction enzyme.

28. The method of claim 27, further comprising filling DNA ends by polymerase extension.

29. The method of claim 23, wherein said template is diluted to a degree sufficient to obtain substantially self-ligated products in the presence of ligase and ligase buffer.

30. The method of claim 23, wherein said template of step a) is circular.

31. The method of claim 29, wherein said substantially self-ligated products are used in said amplifying step.

32. The method of claim 23, wherein a phi29 polymerase is used in the amplifying step b).

33. The method of claim 26, wherein the mechanical means comprises use of a shearing means.

34. The method of claim 23, wherein the organisms comprise uncultured organisms.

35. The method of claim 23, wherein the at least one organism is derived from an environmental sample.

36. The method of claim 23, wherein the at least one of said organisms is derived from a contaminated environmental sample.

37. The method of claim 23, wherein the organisms comprise a mixture of terrestrial microorganisms or marine microorganisms, or a mixture of terrestrial microorganisms and marine microorganisms.

38. The method of claim 23, wherein the organism is an extremophile.

39. The method of claim 38, wherein the extremophile comprises one or more organisms selected from the group consisting of thermophiles, hyperthermophiles, psychrophiles, psychrotrophs, halophiles, alkalophiles, and acidophiles.

40. The method of claim 23, wherein the cDNA or genomic fragments comprise at least an operon, or portions thereof, of the donor microorganisms.

41. The method of claim 40, wherein the operon encodes a complete or partial metabolic pathway.

42. The method of claim 23, wherein said amplifying step is repeated.

43. A method for amplifying a DNA template from trace amounts of DNA derived from a plurality of species of organism comprising: (a) preparing a circular template from said cDNA, gDNA, or genomic DNA fragments; (b) amplifying the template of step (a) by multiple strand displacement amplification (MDA) to provide sufficient DNA to detect; and (c) ligating the amplified DNA of step (b) to a DNA vector to generate a library of constructs in which genes are contained in the DNA.

44. A method for amplifying one or more DNA templates contained in a DNA sample derived from a plurality of species of organism, wherein at least one DNA template is in trace amounts, comprising: (a) preparing a template from said cDNA, gDNA, or genomic DNA fragments, wherein trace amounts of cDNA, gDNA, or genomic DNA fragments are obtained from a plurality of species of organism; and (b) amplifying a substantial portion of said template from step (a) by multiple strand displacement amplification (MDA) to provide sufficient amounts of cDNA, gDNA or genomic DNA fragments for detection.

45. A method for making a DNA template from trace amounts of DNA isolated from trace amounts of DNA from a mixed population of uncultivated cells comprising: a) encapsulating each of a plurality of cells from a mixed population of uncultivated cells, in a microenvironment, wherein said cells contain cDNA, gDNA or genomic DNA fragments; b) preparing a template from said cDNA, gDNA, or genomic DNA fragments; c) amplifying the DNA of step b) by multiple strand displacement amplification (MDA); and d) ligating the amplified DNA of step c) to a DNA vector to generate a library of constructs in which genes are contained in the DNA.

46. The method of claim 44 or 45, further comprising fragmenting said template.

47. The method of claim 46, wherein said fragments are partially or completely digested.

48. The method of claim 46, wherein the template fragmentation is achieved by enzymatic, chemical, photometric, mechanical or any means that provides segments.

49. The method of claim 48, wherein the enzymatic fragmentation comprises use of a DNase or a restriction enzyme.

50. The method of claim 49, further comprising filling DNA ends by polymerase extension.

51. The method of claim 44, wherein said template is diluted to a degree sufficient to obtain substantially self-ligated products in the presence of ligase and ligase buffer.

52. The method of claim 51, wherein said substantially self-ligated products are used in said amplifying step.

53. The method of claims 18, 23, 44 or 45, wherein phi29 polymerase is used in said amplifying step.

54. The method of claim 48, wherein the mechanical means comprises use of a shearing means.

55. The method of claims 44 or 45, wherein the cells are derived from an environmental sample.

56. The method of claims 44 or 45, wherein the cells are derived from a contaminated environmental sample.

57. The method of claims 44 or 45, wherein the cells are an extremophile.

58. The method of claim 57, wherein the extremophile comprises one or more organisms selected from the group consisting of thermophiles, hyperthermophiles, psychrophiles, psychrotrophs, halophiles, alkalophiles, and acidophiles.

59. The method of claim 45, wherein said microenvironment has trace amounts of cells from at least one species of organism.

60. The method of claims 18, 23, 44, or 45, wherein said amplifying step is performed by polymerase amplification.

61. A method for amplifying a DNA template from trace amounts of DNA derived from a plurality of species of organism comprising: a) preparing a template from said cDNA, gDNA, or genomic DNA fragments, wherein the cDNA, gDNA, or genomic DNA fragments are trace amounts from a plurality of species of organism; b) amplifying a substantial portion of said template from step a) by multiple strand displacement amplification (MDA) to provide sufficient amounts of cDNA, gDNA or genomic DNA fragments for detection; and c) ligating the amplified DNA of step b) to a DNA vector to generate a library of constructs in which genes are contained in the DNA.

62. The method of any one of claims 1, 18, 23, 44, or 45, further comprising biopanning, normalizing, ligating into a vector, directly transforming host cell, mutagenizing, expression screening, making a library, selection screen, sequencing, and/or any combination thereof of the amplified nucleic acid.

63. The method of claim 62, wherein sequencing is shotgun sequencing.

64. The method of claim 63, wherein the clones for sequencing are selected without prior probing or screening.

65. The method of any one of claims 1, 18, 23, 44, or 45, further comprising biopanning the amplified nucleic acid.

66. The method of any one of claims 1, 18, 23, 44, or 45, further comprising normalizing the amplified nucleic acid.

67. The method of any one of claims 1, 18, 23, 44, or 45, further comprising biopanning the amplified nucleic acid.

68. The method of any one of claims 1, 18, 23, 44, or 45, further comprising obtaining a sequence from said amplified nucleic acid.

69. The method of any one of claims 1, 18, 23, 44, or 45, further comprising obtaining a plurality of sequences, and assembling two or more sequences to form a more competent sequence or genome or fragment thereof.

70. The method of claim 69, further comprising searching the sequence in a database.