WO1999055886A1 - Function-based gene discovery - Google Patents

Function-based gene discovery

Info

Publication number
WO1999055886A1
WO1999055886A1 PCT/US1999/008823 US9908823W WO9955886A1 WO 1999055886 A1 WO1999055886 A1 WO 1999055886A1 US 9908823 W US9908823 W US 9908823W WO 9955886 A1 WO9955886 A1 WO 9955886A1
Authority
WO
WIPO (PCT)
Prior art keywords
signaling pathway
population
readout
bar
genes
Prior art date
Application number
PCT/US1999/008823
Other languages
French (fr)
Inventor
Hui Cen
Shaojian Sun
Original Assignee
Genova Pharmaceuticals Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Genova Pharmaceuticals Corporation
Priority to AU35727/99A
Publication of WO1999055886A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1086 Preparation or screening of expression libraries, e.g. reporter assays

Definitions

  • the present invention relates generally to the field of genomics. More particularly, the present invention relates to methods for function-based gene discovery. Genes are identified as having or being associated with a specific function, as participating in a specific functional pathway, or as being a member of a specific functional group, by functional expression in one or more biological readout assays. This invention is based, at least in part, on the recognition that the signal-to-noise ratio of a readout assay used to screen a cDNA library can be significantly enhanced by methods which localize multiple molecular copies of each unique clone into discrete regions or compartments prior to functional expression.
  • this invention provides methods for in situ transfection of a sorted library in a "bar-coded" vector to carry out expression of genes from libraries being screened in readout cells. It is the ability to detect a biological readout in a readout cell line which enables the user to identify genes having specific functions.
  • the methods set forth herein are suitable for application in a high throughput format for identification of genes and their functions simultaneously.
  • drugs are known to work on only about 417 molecular targets in the human body and fewer than 80 molecular targets in bacteria, viruses and parasites (see Drews, 1996, Genomic sciences and the medicine of tomorrow, Nature Biotechnology 14, 1516-1518).
  • drugs achieve their desired effects by binding to specific cellular targets (e.g. receptors, ion channels, enzymes, and other proteins or molecules).
  • 5 breakthrough drugs for hypertension, depression, migraine, schizophrenia, and ulcers all act via specific receptors (Drews, 1996, Id.).
  • Protein drugs have been made possible by genetic engineering which has enabled industrial-scale protein production.
  • examples of individuals who benefit from protein drugs include individuals who have had heart attacks and receive clot-dissolvers, individuals with renal failure who receive erythropoietin for anemia, and individuals with diabetes who receive recombinant human insulin.
  • Gene therapy may be defined as the introduction of genetic material into an individual for therapeutic benefit. Gene therapy may be used, for example, to correct detrimental genetic changes that occur in tumor cells, or to direct an individual's cells to produce a specific protein having therapeutic value. Although gene therapy still faces many technical hurdles, it offers promise for treatment of many disorders.
  • Such disorders include those associated with single genes, such as hemophilia, sickle cell anemia, thalassemia, Gaucher's Disease, Huntington's chorea, and many others. More complex, polygenic diseases, such as diabetes and Alzheimer's disease, are also likely to benefit from gene therapy.
  • gene-based diagnostics include tests for hemophilia A and B, phenylketonuria, retinoblastoma, and sickle-cell anemia.
  • New, gene-based diagnostics may also be used to enhance the success rate of an existing therapeutic by identifying specific individuals within an affected group who respond well to a specific drug therapy.
  • diagnostics may help in development of new therapeutics through enhancement of understanding of differences among people in response to various medicines.
  • EST databases are not, by themselves, sufficient to determine biological function. EST databases only suggest functional information to the extent that an EST encodes a domain of known function. Such databases do not provide any functional information for completely novel genes (i.e. genes not encoding any known domains or motifs).
  • the invention described herein provides high throughput methods which combine the simultaneous isolation of gene structure with identification of gene function and/or functional gene group.
  • the method is able to directly screen mammalian cDNA libraries (average size 10^6 clones) using mammalian cell systems for biological functions with specific cellular markers.
  • this strategy also yields all genes in the human genome which are involved in a particular biological functional process of interest.
  • this strategy makes it possible to automate the gene function screening process in a high throughput fashion. Accordingly, this invention provides a much more efficient way to characterize the function of the human genome, as described in detail in Section 5 below.
  • Array technology, which represents the first attempt to go beyond single-gene methods of genome analysis, remains limited to characterization of gene expression as opposed to characterization of gene function. For example, one may use array technology to determine differential gene expression patterns in disease, thereby narrowing disease-gene candidates to a subset of genes.
  • the speed of gene function discovery is not likely to increase significantly. This is so since such an approach would identify not only genes which may contribute to the cause of a disease process but also genes having altered expression as a consequence of a disease process. Further, the number of genes in the latter category is likely to vastly outnumber those in the first category. Analyzing potentially hundreds of genes that may be implicated in a given disease by an expression analysis using a single-gene approach would quickly become an overwhelming task.
  • “Expression cloning method” European Patent Application Pub. No. 0 534 619 A2 employs antibodies or ligands to screen expression libraries. As a general proposition, however, these methods have often been designed for very specific purposes, i.e. for identification of a single gene, and therefore lack general utility. For example, one method utilized a transient COS cell expression assay and monoclonal antibody binding to identify CD28 (Aruffo and Seed, 1987, Proc. Natl. Acad. Sci. U.S.A. 84, 8573-8577).
  • This invention provides methods for function-based gene discovery. Genes are identified as having or being associated with a specific function, as participating in a specific functional pathway, or as being a member of a specific functional group, by functional expression in one or more biological readout assays. This invention is based, at least in part, on the recognition that the signal-to-noise ratio of a readout assay used to screen a cDNA library can be significantly enhanced by methods which localize multiple molecular copies of each unique clone into discrete regions or compartments prior to heterologous expression. In one embodiment, this invention provides methods for in situ transfection of a sorted library in a "bar-coded" vector to carry out expression of genes from libraries being screened in heterologous readout cells. It is the ability to detect a biological readout in heterologous cells which enables the user to identify genes having specific functions. The methods set forth herein are suitable for application in a high throughput format for identification of genes and their functions simultaneously.
  • This invention provides a method for enhancing the signal-to-noise ratio of a biological readout assay used to screen a bar-coded cDNA library comprising: (a) sorting
  • the nucleic acid array is a biological array or a gene chip
  • the biological array comprises a vector carrying a plurality of complementary bar codes.
  • the plurality of complementary bar codes is immobilized on a support.
  • the support is nitrocellulose or nylon.
  • transfecting in situ is carried out using a chemical transfectant or electroporation.
  • the readout cell line is NIH 3T3 cells carrying a reporter gene under the control of a response element or promoter. Selection of the response element or promoter is guided by the particular readout assay selected.
  • the reporter gene is
  • the response element or promoter is selected from the group consisting of an NFKB response element, an NFAT response element, a cyclic adenosine monophosphate response element, a STAT-inducible promoter, a LEF-1-inducible promoter and a p53-inducible promoter.
  • the cDNA library is tetracycline-inducible or estrogen inducible.
  • the biological readout assay detects genes in a pathway selected from the group consisting of a mitogenic signaling pathway, a STAT signaling pathway, an NFKB signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling pathway, a proliferation signaling pathway and an anti-proliferation signaling pathway.
  • This invention provides a method for conducting a biological readout assay used to screen a bar-coded cDNA library comprising: (a) sorting the bar-coded cDNA library using a nucleic acid array; (b) transfecting the library sorted in step (a) into a readout cell line in situ; and (c) conducting the biological readout assay.
  • the nucleic acid array is a biological array or a gene chip.
  • the biological array comprises a vector carrying a plurality of complementary bar codes immobilized on a support.
  • the plurality of complementary bar codes consists of from 10^2 to 10^8 complementary bar codes.
  • the support is nitrocellulose or nylon.
  • transfecting in situ is carried out using a chemical transfectant or electroporation.
  • the readout cell line is NIH 3T3 cells carrying a reporter gene under the control of a response element or promoter.
  • the reporter gene is selected from the group consisting of β-galactosidase, luciferase and chloramphenicol acetyltransferase.
  • the response element or promoter is selected from the group consisting of an NFKB response element, an NFAT response element, a cyclic adenosine monophosphate response element, a STAT-inducible promoter, a LEF-1-inducible promoter and a p53-inducible promoter.
  • the bar-coded cDNA library is tetracycline inducible or estrogen inducible.
  • the biological readout assay is capable of detecting genes in a pathway selected from the group consisting of a mitogenic signaling pathway, a STAT signaling pathway, an NFKB signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling pathway, a proliferation signaling pathway and an anti-proliferation signaling pathway.
  • This invention provides a method for conducting a biological readout assay used to screen a bar-coded cDNA library comprising: (a) sorting the bar-coded cDNA library using a nucleic acid array having a plurality of concave loci; (b) expressing the bar-coded cDNA library sorted in step (a) using in vitro transcription and translation to produce a population of proteins; and (c) screening the population of proteins produced in step (b) for a biochemical activity-of-interest, so as to conduct the biological readout assay.
  • the biochemical activity-of-interest screened in step (c) is selected from the group consisting of a receptor-binding activity, a ligand-binding activity and a growth factor activity.
  • screening is carried out by immobilizing the population of proteins on a solid support for use in a binding assay.
  • the solid support is nitrocellulose or nylon.
  • screening is carried out by placing the population of proteins in contact with readout cells for use in a biological activity assay.
  • This invention provides a method for identifying one or more genes-of-interest in a pre-sorted cDNA library comprising: (a) transfecting the pre-sorted cDNA library into a population of readout cells; and (b) screening the population of readout cells transfected in a biological readout assay, to identify one or more genes-of-interest.
  • the pre-sorted cDNA library comprises a bar-coded cDNA library hybridized to a nucleic acid array.
  • transfecting is carried out using chemical transfectants or electroporation.
  • the biological readout assay identifies one or more genes-of-interest in a pathway selected from the group consisting of a mitogenic signaling pathway, a STAT signaling pathway, an NFKB signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling pathway, a proliferation signaling pathway and an anti-proliferation signaling pathway.
  • This invention provides a method of expression cloning one or more genes-of-interest in a cDNA library comprising: (a) sorting the cDNA library; (b) transfecting the sorted library into a readout cell line; and (c) identifying a positive signal from the transfected library in a biological readout assay, so as to expression clone one or more genes-of-interest in the cDNA library.
  • sorting the cDNA library is carried out using a nucleic acid array.
  • transfecting the sorted library is carried out using chemical transfectants or electroporation.
  • the positive signal is identified by immunocytochemistry.
  • the biological readout assay identifies one or more genes-of-interest in a pathway selected from the group consisting of a mitogenic signaling pathway, a STAT signaling pathway, an NFKB signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling pathway, a proliferation signaling pathway and an anti-proliferation signaling pathway.
  • This invention provides a method of sorting a cDNA library for use in an expression cloning assay comprising: (a) cloning a population of cDNA inserts into a population of bar-coded vectors; (b) preparing the population of bar-coded vectors for hybridization to a DNA array by exposing only the bar code region in single-stranded form; and (c) hybridizing the population of bar-coded vectors to a nucleic acid array to sort the cDNA library.
  • the nucleic acid array is selected from the group consisting of a gene chip and a biological array.
  • preparing the population of bar-coded vectors for hybridization to a DNA array by exposing only the bar code region in single-stranded form in step (b) is carried out using the following steps in the order stated: (a) digesting the population with a restriction endonuclease to linearize the population; (b) binding a DNA-binding protein to at least two sites on the population; and (c) digesting the population bound in step (b) to expose the single-stranded bar code region.
  • the DNA-binding protein is selected from the group consisting of a lactose repressor protein, a tetracycline repressor protein, E2F, AP1, SP1 and p53.
  • the restriction endonuclease is selected from the group consisting of NotI, SfiI and EcoRI.
  • digesting the vector population in step (c) is carried out using an enzyme selected from the group consisting of exonuclease III, T4 DNA polymerase, Klenow fragment, T7 DNA polymerase, Vent DNA polymerase and Pfu DNA polymerase.
  • FIG. 1 A phagemid vector for making a bar-coded cDNA library.
  • FIG. 2A-2B 2A. Preparation of a bar-coded vector. 2B. Preparation of a bar-coded cDNA library.
  • FIG. 3 Sorting a bar-coded cDNA library.
  • FIG. 4 Flow chart of gene identification methods from the step of in situ transfection to the step of cDNA retrieval.
  • FIG. 5 Illustration of a gene chip with a plurality of concave loci.
  • This invention provides methods for function-based gene discovery. Genes are identified as having or being associated with a specific function, as participating in a specific functional pathway, or as being a member of a specific functional group, by expression in one or more biological readout assays. This invention provides expression cloning methods enabling high-throughput library screening for determination of gene function. The invention is based, at least in part, on the recognition that the signal-to-noise ratio of a readout assay used to screen a cDNA expression library can be significantly enhanced by localizing multiple molecular copies of each unique clone into discrete regions or compartments.
  • a major advantage of the invention is to provide methods for assaying all genes in a cDNA expression library simultaneously, instead of one-at-a-time, under conditions in which the readout signal-to-noise ratio is significantly enhanced. Moreover, a rational basis for characterization of functional gene groups is provided where more than one gene is identified in any given readout assay.
  • this invention provides methods for in situ transfection of a sorted library in a "bar-coded" vector to carry out expression of genes from libraries being screened in heterologous readout cells.
  • the vector "bar code" is an oligonucleotide
  • each unique bar code can serve as a specific primer for PCR and/or sequencing of a desired clone in a library.
  • a major advantage of the invention is to provide methods for assaying all genes in a cDNA library simultaneously for the ability to modify a specific biological function associated with a specific readout assay.
  • the pattern of gene activity in any given readout assay also provides a rational method for identification of functional gene groups.
  • the methods set forth are suitable for application in a high throughput format for rapid identification of genes and their functions. For example, such a high throughput format may easily screen at least 10^4, or from 10^4 to 10^6, independent recombinants for functional activity at one time.
  • a complementary DNA (cDNA) library is prepared from messenger RNA (mRNA) obtained from a cell population of interest (e.g. a cell population may be derived from a specific tissue, disease, or biological state).
  • a cDNA library may also be purchased commercially.
  • the cDNA is operably linked to an expression vector suitable for use with the invention.
  • Constructs are prepared and purified using standard recombinant DNA techniques as described in, e.g., Sambrook et al. (1989, Molecular
  • the library technology of the invention has, inter alia, the following three features: (a) inducibility of library gene expression; (b) suitability for use with sense and antisense libraries; and (c) suitability for use with libraries from various disease and non-disease tissues and/or cells, and/or various stages of development of interest.
  • the user will note that, when a bar-coded vector is used, virtually any cDNA library vector is suitable once it is modified to comprise a bar code as described herein. Such vectors do not require an inducible promoter because these cDNA libraries are directly transfected into readout cells without having to be propagated in virus-producing cells.
  • the method is suitable for use with a microscope-based, in situ approach for detection of various readouts.
  • Such readouts may include, but are not limited to: target protein expression; target mRNA expression; cellular localization changes; and/or cellular morphology changes.
  • Such single-format, microscope-based detection may be easily automated. Because screening of libraries and/or sub-libraries requires only a few days, the possibility of appearance of mutated genes during prolonged growth in cell culture, as with prior art methods, is largely eliminated.
  • the methods of the invention are suitable for use in high throughput screening of a large number of functional (i.e. readout) assays of interest.
  • functional assays include cell culture-based assays (i.e. cellular readout assays, see Section 5.5) that rely upon expression of genes in a library and detection of a functional effect of expression.
  • This accelerated time scale provides a two-fold advantage in that it (a) requires a reduced workload relative to procedures requiring longer assay time; and (b) vastly reduces the appearance of mutated genes arising from prolonged cell culture time.
  • the functional assay technology that can be used includes all existing immunostaining assays and biochemical assays. See Sections 5.5-5.7 below for a description of assays and Section 6 below for examples. Such assays are designed to identify genes involved in major disease categories as well as genes that regulate various cellular physiological functions.
  • Any mRNA source may be used.
  • cells suitable as sources of mRNA from which a cDNA library may be constructed include, but are not limited to, mammalian cells, bacterial cells, yeast cells, insect cells and amphibian cells.
  • the methods of the invention are preferred for use in screening mammalian cDNA libraries.
  • Suitable mammalian mRNA sources include tissues and cell lines.
  • Mammalian tissues that may be used include normal and disease tissues (e.g. carcinomas, lymphomas).
  • Mammalian cell lines that may be used include any of the cell lines available from the American Type Culture Collection (ATCC). Exemplary mammalian cell lines include Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK) cells, monkey kidney cells (e.g. COS), human hepatocellular carcinoma cells, human embryonic kidney cells (e.g. HEK 293), mouse Sertoli cells, canine kidney cells (e.g. MDCK), buffalo rat liver cells, human lung cells, human liver cells and mouse mammary tumor cells.
  • Sense or antisense cDNA libraries may be generated by any method known in the art. Many such methods exist and examples may be found in Sambrook et al. and Ausubel et al., both of which are incorporated by reference herein in their entireties (Sambrook et al.,
  • the library may be an antisense library such that antisense polynucleotides are
  • antisense polynucleotides may, for example, provide a source of inhibition of a detectable cellular event in a functional assay.
  • an antisense expression library will identify one or more genes required for operation of a specific pathway by "knocking out" (i.e. rendering inoperative) such a pathway.
  • a library may be divided into subpools for screening. For example, from 100 to 1,000 subpools may be generated, each subpool comprising a cDNA diversity of from about 10,000 to about 1,000 clones, respectively, thus representing a cDNA diversity in all pools combined of about 1,000,000. Each library subpool may be individually (i.e. separately) expressed in a heterologous cell population.
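  • The subpool figures above follow from dividing a library of about 1,000,000 clones evenly. The short Python sketch below only reproduces that arithmetic; the function name `clones_per_subpool` and the assumption of a perfectly even split are illustrative and are not taken from the source.

```python
def clones_per_subpool(total_diversity: int, n_subpools: int) -> int:
    """Return the cDNA diversity of each subpool for an even split.

    Assumes (for illustration only) that clones are distributed
    evenly, so the total must be divisible by the subpool count.
    """
    if n_subpools <= 0:
        raise ValueError("n_subpools must be positive")
    if total_diversity % n_subpools != 0:
        raise ValueError("total diversity is not evenly divisible")
    return total_diversity // n_subpools


# Library diversity and subpool counts quoted in the text.
LIBRARY_DIVERSITY = 1_000_000
plans = {n: clones_per_subpool(LIBRARY_DIVERSITY, n) for n in (100, 1_000)}

for n_subpools, per_pool in plans.items():
    print(f"{n_subpools} subpools x {per_pool} clones = {n_subpools * per_pool}")
```

  • In both cases the product of subpool count and per-subpool diversity returns the stated total of about 1,000,000.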
  • a library may be a normalized cDNA library. Any cDNA library normalization technique known to one skilled in the art may be used with the methods of the invention. For example, see “Normalization and subtraction: two approaches to facilitate gene discovery,” Genome Research 6, 791-806 (1996).
  • a "genetic bar code" is an oligonucleotide tag or label having a specific sequence.
  • This invention provides a method of constructing a cDNA library in a vector containing a plurality of genetic bar codes at a diversity equal to or larger than the diversity of the cDNA library.
  • This invention provides methods for sorting and transfecting such a library. The methods employ a unique genetic bar code linked to each clone in a library for various uses
  • the human genome is believed to encode about 100,000 genes, and any given human cell or tissue may express from about 10,000 to about 50,000 of these genes.
  • a vector having about 10^6 unique genetic bar codes is preferred, since a unique genetic bar code is preferably associated with each library clone.
  • a bar code having ten nucleotides and using all four possible bases at each position is capable of generating a set of genetic bar codes having a diversity of 4^10, or 1.048 x 10^6.
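  • The diversity figure is simply the number of distinct sequences of a given length over a four-base alphabet. The Python sketch below computes it for the bar code lengths mentioned in this section (10, 15 and 20 nucleotides) and finds the shortest length whose diversity covers a library of a given size. The function names are hypothetical, and the sketch counts sequences only; it does not model hybridization specificity.

```python
def bar_code_diversity(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct bar codes of the given length (4 bases by default)."""
    return alphabet_size ** length


def min_length_for(library_size: int, alphabet_size: int = 4) -> int:
    """Shortest bar code length whose diversity is >= library_size."""
    length = 1
    while bar_code_diversity(length, alphabet_size) < library_size:
        length += 1
    return length


diversities = {n: bar_code_diversity(n) for n in (10, 15, 20)}
for n, d in diversities.items():
    print(f"{n}-mer: 4^{n} = {d:,} (~{d:.3e})")

# Shortest length that can give every clone of a 10^6 library its own code.
print("minimum length for 10^6 clones:", min_length_for(10**6))
```

  • A ten-nucleotide code just exceeds a library diversity of 10^6, which is consistent with the figure quoted above.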
  • oligos for specific and efficient hybridization to complementary sequences may be chosen by the user.
  • oligos 15 to 20 nucleotides long, having a diversity of 4^15 to 4^20 (i.e., about 10^9 to 10^12), may be used to cover a library of 10^6 diversity to ensure that any two genetic bar codes are
  • the vector employed for use with a bar-coded cDNA library may be any vector incorporating a genetic bar code.
  • a suitable vector comprises the genetic bar code and a eukaryotic promoter.
  • a suitable vector comprises the genetic bar code, a eukaryotic promoter (e.g. a CMV promoter), a cDNA insert, an f1 origin, an antibiotic resistance gene (e.g. ampicillin resistance gene), an SV40 origin and a ColE origin (see e.g. FIG. 1).
  • sites 1 and 2 may be used for inserting the genetic bar code
  • sites 3 and 4 may be used for inserting cDNA
  • the f1 origin may be used for making single-stranded DNA
  • the antibiotic resistance gene provides for growth selection of the phagemid vector in E. coli.
  • the ColE and SV40 origins may be used to provide high copy number amplification of phagemid DNA in bacteria and eukaryotic cells, respectively (see FIG. 1).
  • a bar-coded library vector may be constructed as illustrated in FIG. 2A.
  • the vector and the bar code mixture are each digested with enzymes 1 and 2 (which cut at sites 1 and 2, respectively), and then ligated to produce the bar-coded library vector (FIG. 2A).
  • a bar-coded library may be constructed as illustrated in FIG. 2B.
  • Messenger RNA (mRNA) is reverse transcribed using an oligo-dT primer containing restriction site 3 using methods well known to those skilled in the art (see
  • As shown in FIG. 2B, following conversion into double-stranded cDNA, an adapter or linker containing restriction site 4 is ligated to the 5' end (relative to the sense strand). The resulting double-stranded cDNA bears site 4 at its 5' end and site 3 at its 3' end.
  • This cDNA and the bar-coded vector are each digested with enzymes 3 and 4 and ligated together to produce the double-stranded, bar-coded cDNA library. It is preferred that any library amplification be performed on plates, as opposed to in solution, to ensure equal amplification of all clones represented.
  • a population of double-stranded genetic bar codes is designed in dephosphorylated form and having a staggered restriction enzyme site at both ends of the bar code (e.g. EcoRI).
  • An enzyme e.g. alkaline phosphatase
  • This method ensures that, after annealing of the two bar code strands, a double-stranded bar code is formed which lacks the phosphorylation which would be necessary for the formation of a bar code dimer.
  • zero background cloning system (e.g., such a system is commercially available from Invitrogen)
  • a zero background cloning system is a positive selection system for prokaryotic cloning which works by direct selection of inserts via disruption of the lethal gene ccdB (control of cell death). In such a system, only bacteria transformed with a genetic bar code inserted into the vector will survive and be propagated. In this way, the vector population generated will not contain any individual vectors lacking a bar code.
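  • The selection logic described above can be pictured with a toy Python model. This is a sketch under simplifying assumptions, not a description of any commercial kit: the record fields `id` and `bar_code` and the function `transformant_survives` are hypothetical, and the model assumes that an inserted bar code always disrupts ccdB while an empty vector is always lethal.

```python
from typing import Optional, TypedDict


class Vector(TypedDict):
    id: int
    bar_code: Optional[str]  # None means the vector carries no insert


def transformant_survives(vector: Vector) -> bool:
    """Toy model of positive selection via ccdB disruption.

    An intact ccdB gene (no bar code inserted) kills the host cell;
    an inserted bar code is assumed to disrupt ccdB, so the cell lives.
    """
    return vector["bar_code"] is not None


# Hypothetical ligation products: two with a bar code, one empty vector.
ligation_products: list[Vector] = [
    {"id": 1, "bar_code": "ACGTACGTAC"},
    {"id": 2, "bar_code": None},
    {"id": 3, "bar_code": "TTGACCAGTA"},
]

survivors = [v for v in ligation_products if transformant_survives(v)]
print([v["id"] for v in survivors])
```

  • Under these assumptions the propagated vector population contains only bar-coded vectors, which is the property the text relies on.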
  • Sorting of a bar-coded library may be carried out using supports having bound thereto oligonucleotide sequences that are complementary to the genetic bar codes of the cDNA library.
  • a DNA sequence complementary to each genetic bar code in a bar-coded library is affixed (e.g. deposited or synthesized) at discrete locations of a nucleic acid array.
  • Natural or modified nucleotides can be used for synthesis. Use of certain modified nucleotides may promote formation of stronger bonds with the complementary bar code of a vector. In this regard, the bonding properties of common modified nucleotides such as phosphorothioates have been well described.
  • An array may be a commercially available gene chip (e.g. Affymetrix, Incyte) or may be manufactured using methods known in the art. Many such methods have been described (for a brief review, see Ramsay, January 1998, Nature Biotechnol. 16, 40-44).
  • the array is hybridized with the single stranded genetic bar code region of a
  • FIG. 3 a gene chip having a plurality of complementary genetic bar codes is shown. Each area of the chip (labeled A, B, C, and N) contains multiple copies of each unique complementary bar code. Following hybridization, each unique cDNA corresponding to each unique bar code is sorted into a discrete area on the chip (see bottom panel of FIG 7.)
  • Optimum hybridization conditions are used to ensure accurate base pairing between the various genetic bar codes of a library and their corresponding complements in a nucleic acid array so as to prevent or minimize mismatches.
  • the hybridization (a) separates library vector molecules encoding distinct recombinants from each other and (b) sorts all library vector molecules encoding the same recombinant to discrete, known locations. This separation and sorting operation results in an equal amount of DNA at each location. Abundant genes of a library will be represented at multiple locations on a chip while rare genes will be represented at only one or a few locations. Equivalent amounts of DNA are hybridized at each location.
  • a "biological array" may be manufactured and used to sort a bar-coded expression library.
  • a biological array is created from a library of sequences which are complementary to the bar codes of the expression vector.
  • the biological array is segregated into discrete locations using the physiology of the microbe (e.g. bacteria or yeast), as follows. Since each microbe which takes up a plasmid DNA containing a complementary bar code will only retain a single type of plasmid, each complementary genetic bar code is automatically separated from all others at this step.
  • the vector chosen to produce the complementary bar code array is different from the expression vector used to create the library so as to preclude any hybridization between the two vectors.
  • microbial colonies carrying a plasmid library encoding complementary bar codes are used to construct a biological array.
  • a biological array can be easily reproduced by replica plating.
  • the DNA of the array is easily immobilized on a solid support (e.g. nitrocellulose, nylon, etc.) by well known methods.
  • the sequence of the complementary genetic bar code at each location of the array may be determined by standard sequencing reactions. This information may then be stored in a computer for later retrieval as needed.
  • a biological array is different from that of a gene chip array as follows. Instead of having each complementary bar code present as a homogeneous nucleic acid at each location of the array, each is present in a background of other DNA (i.e. from the plasmid carrying the complementary bar code and the microbial genome).
  • Analysis of a sorted library by functional expression provides a means to screen a large number of genes simultaneously in order to identify genes having the biological function specified by the chosen readout assay. Any of the following methods may be used for in situ transfection of a sorted library. Following all of the transfection procedures described below, readout cells are rinsed in a physiologic buffer and cultured for a period of time to allow expression of the transfected genes.
  • the period of time to allow expression may be from 12 hours to 12 days, from 1 day to 6 days, or from 2 days to 3 days. In a preferred embodiment, the period of time is from 1 day to 4 days.
  • in situ transfection may be performed using a gene chip having a plurality of concave loci (i.e. U-shaped areas), each locus having an oligonucleotide complementary to a genetic bar code attached thereto. Following library sorting by hybridization, such a gene chip will have an individual cDNA recombinant at each locus.
  • a gene chip having a plurality of concave loci (i.e. U-shaped areas), each locus having an oligonucleotide complementary to a genetic bar code attached thereto.
  • In situ transfection is performed by contacting the chip to a readout cell culture in the presence of a solution which facilitates the release of the hybridized recombinants.
  • a solution may be, e.g., phosphate-buffered saline or tissue culture medium without serum supplement.
  • any low-salt solution e.g. 150 mM NaCl or lower
  • Chemical transfectants may also be included in the solution to facilitate uptake of the released DNA into the readout cells.
  • Such transfectants may be any transfectant known in the art, for example calcium phosphate, DEAE-dextran, polybrene, or a lipid-based transfectant such as LT1 (Panvera) or Lipofectamine.
  • in situ transfection may be performed using a biological array or a gene chip having a flat surface.
  • in situ transfection may be performed by contacting the biological array or the gene chip to a readout cell line in the presence of a solution as described above.
  • a chemical transfectant may also be used as described above.
  • a micro-compartmentalization grid device may also be used to
  • micro-compartmentalization grid device may be as illustrated in FIG. 3 of copending U.S. Patent Application No. 09/065,776, filed April 24, 1998, entitled “MICRO-COMPARTMENTALIZATION DEVICE AND USES THEREOF,” by Cen and Sun (Attorney Docket No. 9557-003), which is incorporated herein by reference in its entirety.
  • in situ transfection may be performed by electroporation.
  • Electroporation may be performed using, e.g., a cell culture device as described in U.S. Patent No. 5,134,070, which is incorporated by reference herein in its entirety.
  • the readout cell line used for electroporation may be any readout cell line which will attach to and proliferate on a solid support.
  • Such a cell line may be grown in a monolayer on the bottom of a cell culture device which is electrically conductive (see e.g. U.S. Patent No. 5,134,070).
  • a gene chip or biological array having a sorted cDNA library attached thereto is contacted with the cell line in the presence of a suitable electroporation solution (e.g.
  • in situ transfection may be performed using enzymes to facilitate attachment to and release from a gene chip or a biological array.
  • one may covalently attach a sorted library to a nucleic acid array to ensure tight binding using T4 ligase, or an enzyme having a similar activity, to ligate the oligonucleotides encoding the complementary bar codes to the bar-coded cDNA library following hybridization.
  • Such a covalently-bound bar-coded cDNA library may be released at will by including a restriction endonuclease in the transfection solution (e.g. if an EcoRI site is used as site 2, see FIG. 1, then one may cut with EcoRI).
  • enzymes which cut mismatched nucleotides may be used to eliminate cross-hybridization following the completion of the hybridization process.
  • FIG. 4 An overview of the in situ transfection (gene transfer) methods as they relate to the overall process of gene identification and retrieval, is illustrated schematically in FIG. 4.
  • biochemical analysis of a sorted library may also be performed.
  • a solid support having a plurality of concave loci may be used, as described above and depicted in FIG. 5.
  • each individual protein encoded by the sorted library is first expressed using the well-known techniques of in vitro transcription and translation. Since each individual protein expressed is compartmentalized at a discrete locus, each may be screened for any biochemical activity of interest (e.g. receptor-binding activity, ligand-binding activity, growth factor activity). See Section 5.6 below for a description of various biochemical readout assays.
  • Individual proteins may be subsequently immobilized on a solid support (e.g. nitrocellulose or nylon) for use in binding or other assays.
  • a solid support e.g. nitrocellulose or nylon
  • individual proteins may be left free within U-shaped wells for subsequent assay of activity.
  • in vitro translation products may be placed in contact with readout cells using mild centrifugation to transfer the contents of each U-shaped locus onto a readout cell grid.
  • the unique genetic bar code situated in the vector next to the cDNA insert may be used to isolate the clone-of-interest. For example, localized releasing of DNA hybridized on a nucleic acid array may be carried out by competition using an oligonucleotide identical to the bar code of interest. Further, isolation of the clone-of-interest may be carried out by polymerase chain reaction (PCR) to amplify only insert cDNA linked to the identified genetic bar code. Under this approach, for example, the bar code sequence may be used as a specific primer together with a suitable vector primer and total library DNA as template.
  • PCR polymerase chain reaction
  • the primer of the genetic bar code can also be used for isolating the specific plasmid by a procedure referred to as "gene trapping" (Gibco BRL).
  • Gene trapping is a method for rapid isolation of cDNA clones from single stranded DNA prepared from a library. This method is based on isolating cDNA clones which hybridize with a biotinylated oligonucleotide complementary to a cDNA of interest (see Le et al., 1995, Focus 17, 45).
  • isolation of the clone-of-interest may be carried out by physical "picking" since the location of each unique bar code sequence within an array is generally known.
  • the genetic bar code technology not only enables performing cellular functional assays on all genes represented in a cDNA library simultaneously, but is also amenable to automation. Such automation allows many functional assays to be performed in a single format, as described further in Section 5.8 below.
  • all oligonucleotides i.e. genetic bar codes
  • Tm melting temperature
  • a bar code in a double-stranded vector can be exposed in single-stranded form using an enzyme having 3' to 5' exonuclease activity, such as T4 DNA polymerase, and a bar code population having one nucleotide omitted from all bar code sequences.
  • an exonuclease activity is capable of cleaving nucleotides from a 3' recessed end, and such enzymes will cease exonuclease activity under certain conditions.
  • T4 polymerase may be used in the presence of G and the linearized double stranded vector to expose the bar code in single stranded form.
  • T4 polymerase may be used in the presence of T and the linearized double stranded vector to expose the bar code in single stranded form.
  • NTPs nucleotide triphosphates
  • T4 DNA polymerase and a bar code population lacking one of the four nucleotides in its sequence may be used to make the bar code single-stranded and protruding from the end of a linearized, double-stranded vector.
  • such an enzymatic exonuclease may be used when all four nucleotides are present in the bar code so long as the timing of the reaction is closely controlled to expose only the bar code in single stranded form.
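The stopping behavior described above can be pictured with a simplified string model. The sketch below trims bases from the 3' end of a strand until the 3'-terminal base matches the single nucleotide supplied in the reaction. It is an illustration under stated assumptions, not a model of enzyme kinetics; the function name `resect_3prime` and the example sequence (a hypothetical flank, an EcoRI site, and the complement of a C-free bar code) are invented for the example.

```python
def resect_3prime(strand: str, supplied: str) -> str:
    """Trim bases from the 3' end of `strand` (written 5'->3') until the
    3'-terminal base equals `supplied`, the one nucleotide present in the reaction.

    Simplified assumption: digestion halts at the first position where the
    supplied nucleotide can be re-incorporated.
    """
    end = len(strand)
    while end > 0 and strand[end - 1] != supplied:
        end -= 1
    return strand[:end]

# Hypothetical strand being digested, 5'->3':
#   "AC" flank + "GAATTC" (EcoRI site) + "TCCTCA" (complement of bar code TGAGGA).
# The bar code omits C, so its complement contains no G and is fully removed.
digested_strand = "AC" + "GAATTC" + "TCCTCA"
print(resect_3prime(digested_strand, "G"))  # prints: ACG
```

In this toy case digestion stops at the G of the EcoRI site, leaving the bar code on the opposite strand unpaired.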
  • a genetic bar code pool may be designed using a set of nucleotide dimers as building blocks for synthesizing the pool.
  • one nucleotide dimer set consists of TG, AG, GA and GT. It is notable that this set has a minimum of one nucleotide difference between any two given dimers.
  • Such a set is referred to herein as a minimal mismatch set (MMS).
  • MMS minimal mismatch set
  • the average pair-wise mismatch for the set of TG, AG, GA and GT is 1.7 nucleotides, or 83%.
  • the average pair-wise mismatch is computed by adding all the possible pair-wise nucleotide differences and dividing by the total number of pairs.
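The averaging rule just described can be checked directly for the dimer set TG, AG, GA and GT. This is a minimal sketch; the helper name `hamming` is an illustrative choice.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

dimers = ["TG", "AG", "GA", "GT"]
pairs = list(combinations(dimers, 2))  # the 6 unordered pairs

# Sum of all pair-wise differences divided by the number of pairs.
avg = sum(hamming(a, b) for a, b in pairs) / len(pairs)
pct = 100 * avg / len(dimers[0])

print(round(avg, 1), round(pct))  # prints: 1.7 83
```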
  • the omitted nucleotide in the MMS listed above is C
  • the omitted nucleotide may be any of the four nucleotides when designing such an MMS.
  • Nucleotide dimers are chosen such that the Tm for each dimer remains constant. In this way, the Tm for each genetic bar code of a pool will be the same. Methods for computing Tm are well known to one skilled in the art. The omission of a nucleotide in design of a bar code pool may be used to allow formation of a protruding end encoding the bar codes, as described above.
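One simple way to see why these dimers give a constant Tm is the Wallace rule, a common approximation for short oligonucleotides (about 2 °C per A or T and 4 °C per G or C). Using this rule here is an assumption made for illustration; the description does not say which Tm method is intended, and the approximation is rough for longer oligonucleotides.

```python
def wallace_tm(seq: str) -> int:
    """Approximate Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

dimers = ["TG", "AG", "GA", "GT"]

# Every dimer holds one G and one A or T, so each contributes the same amount.
print({d: wallace_tm(d) for d in dimers})

# Any 20-mer assembled from ten such dimers therefore gets the same estimate.
example_20mer = "".join(dimers * 2 + ["TG", "AG"])
print(len(example_20mer), wallace_tm(example_20mer))  # prints: 20 60
```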
  • a pool of 20-mers generated through random synthesis using the above-listed set of nucleotide dimers will produce a pool having a diversity of $4^{10}$, or $1.05 \times 10^{6}$, genetic bar codes.
  • the minimum percentage of pair-wise mismatches within this pool of genetic bar code 20-mers is 1/20 or 5%, while, as noted above, the average pair-wise mismatch between any two genetic bar codes is 83%.
  • genetic bar codes may be designed using a set of nucleotide trimers, each trimer having a minimum of two nucleotides different from any other trimer and each trimer having one G.
  • Such an example set, which omits C, consists of AGT, TGA,
  • the minimum percentage of pair-wise mismatch within this pool of genetic bar codes is 2/24 or 8.3%, while the average pair-wise mismatch between any two genetic bar codes in this set is 80%.
  • an MMS may consist of GATA, GTAT, AGAA, TGTT, AAGT, TTGA, ATTG and TAAG.
  • a pool of genetic bar codes constructed using seven 4-mer subunits will produce a bar code length of 28 nucleotides and a bar code diversity on the order of $10^{6}$ ($8^{7}$, or about $2.1 \times 10^{6}$).
  • the minimum percentage of pair-wise mismatches within this genetic bar code pool is 3/28 or 10.7%.
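The mismatch property of the eight tetramers listed above can be verified by brute force over all pairs. This is a minimal sketch; the helper name `hamming` and the variable names are illustrative.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

tetramers = ["GATA", "GTAT", "AGAA", "TGTT", "AAGT", "TTGA", "ATTG", "TAAG"]

# Smallest difference between any two tetramers in the set.
min_mismatch = min(hamming(a, b) for a, b in combinations(tetramers, 2))

# Two 28-mers that differ in a single subunit differ by at least this many bases.
bar_code_length = 7 * 4
print(min_mismatch, round(100 * min_mismatch / bar_code_length, 1))  # prints: 3 10.7
```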
  • N-mer oligonucleotide building block having a number of G nucleotides in the N-mer equal to k and the remaining nucleotides (i.e., N-k) consisting of A or T, or an A plus T mixture
  • the bar code diversity is equal to $2^{(N-k)}\,C^{N}_{k}$.
  • N nucleotides
  • N-k the remaining nucleotides
  • a number of MMS exist among these $2^{(N-k)}\,C^{N}_{k}$ total possible constructs for a given minimal mismatch cut-off.
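The count of possible constructs can be cross-checked by enumeration: choose the k positions that hold G, then fill each remaining position with A or T. The sketch below compares the closed form with a brute-force count; the function names are illustrative.

```python
from itertools import product
from math import comb

def diversity(n: int, k: int) -> int:
    """Closed form: 2^(n-k) ways to fill the A/T positions times C(n, k) placements of G."""
    return 2 ** (n - k) * comb(n, k)

def enumerate_constructs(n: int, k: int) -> list[str]:
    """Brute force: every n-mer over {G, A, T} containing exactly k G nucleotides."""
    return ["".join(p) for p in product("GAT", repeat=n) if p.count("G") == k]

print(diversity(5, 2), diversity(9, 4))  # prints: 80 4032
print(len(enumerate_constructs(5, 2)) == diversity(5, 2))  # prints: True
```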
  • the number of sequences in different MMS is not the same. For example, in the case of a 5-mer with two G nucleotides in each sequence and three nucleotides of A, T, or an A plus T mixture, there are 80 sets of MMS.
  • Some MMS have 8 sequences and some have 12 sequences for a mismatch cut-off of 3. It is also generally true that the larger the N for a given minimal mismatch cut-off, the larger the number of sequences in any set of MMS. It is also true that for a given diversity number in a library of genetic bar codes constructed from N-mer nucleotide subunits (such as the 8 tetrameric MMS in the 4-mer example listed above), the larger the N, the higher the minimal mismatch percentage in the library.
  • Another way of constructing a pool of genetic bar codes is to use a certain number (e.g. 100) of oligonucleotides as building blocks selected from all possible combinations of a fixed length (e.g. 9-mer) with a certain minimum number (e.g. four) of nucleotides different among them.
  • a certain number e.g. 100
  • a fixed length e.g. 9-mer
  • a certain minimum number e.g. four
  • the bar codes are composed of three subunits of 9-mers. Further, the minimum percentage difference between any two bar codes within this pool is 4/27 or 14.8%.
  • the average pair-wise sequence difference in this pool of 1,000,000 genetic bar codes is 65%, or 17.5 nucleotides.
  • the number of G nucleotides in an MMS nucleotide subunit ranges from 45% to 50%. The number of G nucleotides is the same in all bar codes.
  • a pool of 36-mer genetic bar codes, each constructed from four 9-mers of this MMS, has a diversity of one hundred million. The minimal pair-wise sequence difference mismatch between any two 36-mers in this pool of
  • the following example lists one hundred 9-mers of a minimal mismatch set, each having four G nucleotides and five nucleotides selected from the group consisting of A, T, and an A plus T mixture, and further having a pair-wise mismatch of at least 4 nucleotides between any two 9-mers.
  • 4,032 members in this MMS of 9-mers (i.e. $2^{5}\,C^{9}_{4}$).
  • the typical number of sequences in each of these 4,032 sets ranges from 80 to 101.
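One way to build a set of 9-mers with these properties is a greedy scan over the 4,032 candidates, keeping a candidate only if it differs from every sequence already kept by at least four nucleotides. The greedy strategy is an assumption chosen for illustration; the description does not state how its sets were generated, and the size of the set this sketch returns is not guaranteed to fall in the range quoted above.

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def greedy_mms(n: int = 9, k: int = 4, min_mismatch: int = 4) -> list[str]:
    """Greedily pick n-mers over {A, G, T} with exactly k G nucleotides such
    that every pair of picked sequences differs in >= min_mismatch positions."""
    candidates = (
        "".join(p) for p in product("AGT", repeat=n) if p.count("G") == k
    )
    chosen: list[str] = []
    for cand in candidates:
        if all(hamming(cand, s) >= min_mismatch for s in chosen):
            chosen.append(cand)
    return chosen

mms = greedy_mms()
print(len(mms), mms[:3])
```

A different scan order generally yields a different set, which is consistent with the statement that many distinct MMS exist for a given cut-off.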
  • a pool of 27-mer genetic bar codes, each constructed from three 9-mers of this MMS, has a diversity of one million.
  • the minimum pair-wise sequence mismatch between any two 27-mers in this pool of genetic bar codes is 14.8%, while the average pair-wise sequence mismatch is 65%.
  • a population of 36-mer genetic bar codes, each constructed from four 9-mers of this MMS has a diversity of one hundred million.
  • the minimal pair-wise sequence mismatch between any two 36-mers in this pool is 14.8%, while the average pair-wise sequence mismatch is 65%, or 23.4 nucleotides.
  • if a restriction endonuclease or other mechanism is used to generate a single-stranded region in a bar-coded vector, then all four nucleotides may be used for design and synthesis of the genetic bar code.
  • restriction endonucleases such as BbvI, BbsI, BsaI, BsmAI, BsmFI, BspMI, FokI, HgaI and SfaNI may be used to generate an end having four or five protruding nucleotides.
  • restriction endonuclease for linearizing a bar-coded library and further exposing the genetic bar codes in single stranded form is Bgl II which may be used at site 1
  • T4 DNA polymerase is then used in the presence of GTP (and the absence of other NTPs) to digest the complementary strand of the genetic bar code. If an enzyme such as EcoR I is used for site 2 (FIG. 1), T4 DNA polymerase will stop degradation at the EcoR I site when it encounters nucleotide G. T4 DNA polymerase is then inactivated by heat.
  • the bar-coded cDNA library having protruding single-stranded bar codes may then be purified using standard phenol/chloroform extraction and ethanol precipitation.
  • a Bgl II site plus two additional nucleotides at its 5' end is synthesized as a standard component of the vector preceding the first nucleotide of the genetic bar code.
  • an EcoR I site plus one additional nucleotide at its 3' end is synthesized as a standard component of the vector after the last nucleotide of the genetic bar code.
  • Hybridization of a bar-coded cDNA library with a genetic bar code population is carried out several hours to overnight at an optimal temperature, preferably 5 to 10 °C lower than the Tm or 2 to 5°C below Td of the genetic bar code population.
  • the prehybridization buffer may be as follows: 6 x SSC
  • Hybridization buffer may be 3.0 M TMA Cl or 2.4 M TEA Cl, 0.01 M sodium phosphate
  • a hybridized bar-coded cDNA library may be washed with 6 x SSC solution and 2 x SSC solution as needed.
  • a sorted, single-stranded cDNA library or genomic library can be used to create a
  • a bar-coded library having a protruding single-stranded genetic bar code region is hybridized with a gene chip. This is followed by incubation with a DNA ligase, such as T4 ligase, to covalently bond the complementary strand of cDNA to the gene chip.
  • the sense strand of cDNA may be removed by, for example, denaturing (e.g. 100 °C incubation) under high stringency or through using an enzyme such as Exo III and the like (Hoheisel, 1993, Anal. Biochem. 209:238), so as to convert a double stranded library into a single stranded library.
  • the genetic bar code region is placed in front of a CMV promoter and the same strand as sense cDNA.
  • the genetic bar code region is placed at the end of the cDNA and the same strand as antisense cDNA.
  • such a library array can be constructed at a density as high as an oligonucleotide array on a gene chip; 2) only a single sample is needed, instead of the million samples required for individual spotting in current DNA array technology (i.e. since the cDNA library can be sorted using the genetic bar code method, only one sample containing the whole cDNA library is prepared, instead of the $10^{6}$ individual samples that would otherwise be required); and 3) many different cDNA library arrays can be easily prepared (i.e. library arrays can be made from a plurality
  • beads may be used instead of a gene chip or biological array, for sorting a cDNA library.
  • each individual bead carries multiple copies of one unique complementary bar code. Therefore, each individual bead will hybridize to multiple copies of a single recombinant, thereby sorting and concentrating individual members of the library to discrete loci.
  • spherical or spheroid As spherical or spheroid
  • beads may provide enhanced hybridization efficiency compared to gene chips or biological arrays. Following hybridization, each bead represents a single, easily manipulable recombinant which may be assayed under a high-throughput pharmaceutical screening format (e.g. one bead per well) to determine biological functions of the encoded cDNAs. For example, one can place each unique bead, with a bar-coded vector DNA hybridized thereto, into a well of an assay plate or into a micro-compartment of a micro-compartmentalization device (e.g. a 96-well plate, a 384-well plate, or a plate having a larger number of wells or micro-compartments).
  • a micro-compartmentalization device e.g. a 96-well plate, a 384-well plate, or a plate having a larger number of wells or micro-compartments.
  • Such a micro-compartmentalization device may be as described in FIG. 1 through FIG. 4 of copending U.S. Patent Application No.
  • Bead placement may be performed robotically.
  • the presence of a low salt solution in each well permits dissociation of vector DNA from the beads.
  • the resulting solution in each well, now containing DNA of a single recombinant, may be mixed with a chemical transfectant and transfected into readout cells, or may be electroporated into readout cells.
  • a removable micro-compartmentalization device as described herein may be used during the transfection or electroporation procedure.
  • the grid of the device may be removed from the readout cell culture following gene transfer so as to facilitate processing (i.e., rinsing, culturing, and assaying for biological function). Any recombinant producing a positive signal in a readout assay may be recovered, for example, by sampling the positive cell population and performing PCR using primers flanking the cDNA insert of the vector.
  • a protein binding site may be installed in the vector.
  • a DNA binding protein which recognizes the protein binding site in the vector can then be used to sterically hinder (i.e. block or prevent) the 3' to 5' exonuclease progression beyond the bar code region of the vector.
  • DNA binding proteins may include prokaryotic proteins such as a lactose repressor which binds to a lactose operator sequence, a tetracycline (tet) repressor which binds to a tet operator sequence, etc., or eukaryotic proteins such as a eukaryotic transcription factor (e.g. E2F, AP1, SP1, p53, etc.).
  • Any chosen protein binding site may be installed outside of the genetic bar code region defined by site 1 and site 2 using
  • DNA binding proteins may be applied to occupy two sets of DNA binding sites.
  • a 3' to 5' single strand exonuclease e.g. exonuclease III, T4 DNA polymerase, Klenow fragment, T7 DNA polymerase, Vent DNA polymerase, Pfu DNA polymerase, etc.
  • exonuclease activity is stopped at the DNA binding sites by steric hindrance from the bound DNA binding proteins.
  • exonuclease III is used as the 3' to 5' exonuclease
  • the protection of the non genetic bar code end from 3' to 5' exonuclease activity can also be achieved by generating a 3' overhang which is resistant to exonuclease III.
  • This approach may be used as an alternative to the DNA-protein complex formation described above which sterically hinders (i.e., blocks or prevents) exonuclease progression along the DNA.
  • a 3' overhang for this purpose can be obtained by installing a restriction enzyme site which produces such an overhang on the right side of site 1 (see FIG. 1).
  • Suitable enzymes include Hae II, Kpn I, Nsi I, Pst I, Sac I, etc.
  • all four nucleotides may be included in construction of the pool of unique genetic bar codes (as opposed to using only three nucleotides as described herein).
  • the number of unique sequences in any minimal mismatch set (MMS) for a given mismatch cutoff will be larger.
  • MMS minimal mismatch set
  • one benefit to using all four nucleotides is to achieve a reduction of potential cross hybridizations among similar but non-identical bar codes.
  • this benefit must be weighed against the benefits of using bar codes consisting of three nucleotides in a given situation.
  • Computer algorithms may be used to facilitate the design of unique genetic bar code sets having oligos of a given length, minimal cross hybridization, and the same or similar melting temperature (Tm) (see e.g. U.S. Patent Nos. 5,635,400 and 5,654,413 by Brenner).
  • Non-natural nucleotides i.e. modified nucleotides or nucleotide derivatives
  • any DNA polymerase having 3' to 5' exonuclease activity may be used.
  • Such polymerases include Klenow, T7, Vent, Pfu, and T4 DNA polymerases.
  • Methods are provided to systematically screen expressed genes of the human genome for specific functions using a large number of functional assays. Such technology provides a very rapid system for gene identification in which at least some functional information may be inferred. Readout assays may be cell-based or biochemical-based. In
  • changes in cellular morphology, immunostaining, or reporter gene expression can be detected within 1-2 days after library gene expression.
  • changes in ligand binding, growth factor activity, or enzymatic activity can be detected within 1-2 days after library gene in vitro transcription and translation. Using either approach, full functional screening of a library having a diversity of 10 can be completed within a few days.
  • the methods of the invention provide the advantages of minimizing workload and reducing the occurrence of gene mutations which can arise in screening assays employing long-term culture. It is estimated that one of ordinary skill in the art can easily screen one library per week by the methods of the invention without using
  • functional assays of use together with the methods of the invention will measure changes (i.e. induction or reduction) of target gene expression, changes of cellular localization of a specific antigen, changes in cellular behaviors (e.g. growth factor secretion, apoptosis factor secretion, differentiation factor secretion), and changes in cellular morphology.
  • the signals a cell receives, whether from outside or inside the cell, are generally transmitted through a cascade of molecular interactions, including protein-protein interactions.
  • the overall process is generally termed signal transduction.
  • the signaling pathways which may be assayed for identification of associated genes include, but are not limited to, a mitogenic signaling pathway, a STAT signaling pathway, an NFKB signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling pathway, a proliferation signaling pathway and an anti-proliferation signaling pathway.
  • BRDU incorporation or PCNA induction can be the cellular event detected.
  • stress signaling p53 induction, Jun induction, or nuclear HSF3 aggregates can be the detectable cellular event.
  • p53 induction, Jun induction, or nuclear HSF3 aggregates can be the detectable cellular event.
  • apoptosis signaling detection by ApoAlert™ staining (Gavrieli et al., 1992, J. Cell. Biol. 119, 493) or annexin staining can be the detectable cellular event.
  • anti-proliferation signaling detection of p21, p27, p57, p15, p16, p18, or p19 induction can be the detectable cellular event.
  • detection of β-catenin induction or β-catenin re-localization can be the detectable cellular event.
  • STAT signaling detection of induction of a reporter gene under the control of a STAT1, 2, 3, 4, 5, 6, or 7 promoter can be the detectable cellular event.
  • AP-1 signaling detection of c-fos induction can be the detectable cellular event.
  • CREB signaling CREB phosphorylation, or induction of a reporter gene under the control of the CREB promoter, can be the detectable cellular event.
  • NFKB signaling NFKB re-localization, or induction of a reporter gene under NFKB promoter control, can be the detectable cellular event.
  • IL-2 mediated proliferation can be the detectable cellular event.
  • Other signaling pathways can include Hedgehog signaling (detectable by GLI-1, GLI-2, and GLI-3 induction); nuclear receptor signaling (detectable by induction or reduction of a reporter gene under estrogen, retinoic acid, vitamin D3 or thyroid hormone responsive promoters); antiviral signaling (detectable by induction of interferon alpha or beta); myc-max signaling (detectable by induction of a reporter gene under a myc-Max responsive promoter); BMP signaling (detectable by nuclear translocation of Smad); and insulin signaling (detectable by Glut1 or Glut4 re-localization).
  • Specific diseases of interest include, but are not limited to, cancer, inflammation, atherosclerosis, autoimmune diseases, diabetes, infection, diseases of metabolism (e.g. obesity), and neurodegenerative diseases (e.g. Alzheimer's disease and Parkinson's disease).
  • Readout assays involving detection of changes (i.e. increases or decreases) in the levels of the following targets may identify genes associated with the indicated specific disease.
  • Assays that may detect genes involved in cancer include assays for detection of:
  • HLA for immune surveillance
  • OSM for anti-cancer growth
  • GADD45 and GADD153 for tumor suppression
  • nm23 for anti-metastasis
  • vEGFA, vEGFB, vEGFC, PIGF, and FGF2 for angiogenesis
  • MDR for drug resistance: CASP100 for apoptosis
  • Assays that may detect genes involved in inflammation include detection of Cox2, IL-l ⁇ , IL-6, TNF ⁇ , IL-13, E-selectin, VCAMI, ICAM 1 and 2,
  • NFKB NFKB
  • c-Rel RelB
  • I ⁇ B ⁇ I ⁇ B ⁇
  • Bcl3 Changes in the level of expression of the following targets may be assayed immunologically in response to expression of a heterologous cDNA in order to detect genes which may be involved in a given disease.
  • the potential targets that can be used for detecting genes involved in atherosclerosis include Egr-I.
  • the potential targets that can be used for detecting genes involved in autoimmunity include Fas and Fas ligand.
  • the potential targets that can be used for detecting genes involved in diabetes include insulin.
  • chemokines MlP-l , MlP-l ⁇ , MIP-2, RANTES, MCP-1, MCP-2, GRO , GRO ⁇ , GRO ⁇ ,
  • ENA-78, 1309, and IP10 various cytokines (e.g. IL-2, IL-13, GM-CSF, G-CSF, and
  • the potential targets that can be used for detecting genes involved in obesity include leptin and the leptin receptor.
  • the potential targets that can be used for detecting genes involved in Alzheimer's disease include Tau, CRF, CRF receptor, CRF-BP,
  • Urocortin and neuronal growth factors (e.g. BDNF, NT3, NT4, NT5, CNTF, and GDNF).
  • BDNF neuronal growth factors
  • NT3, NT4, NT5 neuronal growth factors
  • CNTF neuronal growth factors
  • GDNF neuronal growth factors
  • the potential targets that can be used for detecting genes involved in Parkinson's disease include TH and α-synuclein.
  • an assay may be employed which can detect changes in such functions as cell growth, apoptosis, senescence, differentiation, adhesion, binding of a cell to a specific molecule, binding of a cell to another cell, cellular organization, organogenesis, intracellular transport, transport facilitation, protein synthesis, transcription, energy conversion, metabolism, myogenesis, neurogenesis, or hematopoiesis.
  • Examples of such cellular physiological functions and assays for detecting changes in them include, but are not limited to: cholesterol transport, detectable by detecting intracellular cholesterol accumulation; myogenesis, detectable by detecting induction of MyoD or MEF-2; neurogenesis, detectable by detecting induction of neuro D; and vasodilation and neurotransmission, detectable by induction of inducible nitric oxide synthase (iNOS).
  • Bromodeoxyuridine (BRDU) incorporation may be used as an assay to identify genes involved in proliferation.
  • the BRDU assay identifies a cell population undergoing DNA synthesis by incorporation of BRDU into newly-synthesized DNA. Newly-synthesized DNA may then be detected using an anti-BRDU antibody (see Hoshino et al., 1986, Int. J. Cancer 38, 369; Campana et al., 1988, J. Immunol. Meth. 107, 79).
  • PCNA proliferating cell nuclear antigen
  • p53 is an important modulator of the stress response. p53-dependent transcriptional activation may therefore be used to identify genes involved in a stress signaling pathway.
  • a readout cell population containing a reporter gene under the control of a p53-inducible promoter may be used for the assay.
  • Suitable reporter genes include, but are not limited to, β-galactosidase (β-gal), chloramphenicol acetyltransferase (CAT), and luciferase.
  • Positive cells may be identified by blue color in a β-gal reporter gene assay (see e.g. Komarova et al, 1997, EMBO J. 16, 1391-1400) or by immunostaining for the reporter gene product.
  • a p53 induction assay may also be used to identify genes involved in a stress signaling pathway. p53 induction (i.e. increases in cellular p53 protein expression) may be identified by immunostaining using a specific anti-p53 antibody (Anker et al, 1993, Int. J. Cancer 55, 982; Weiss et al, 1993, Int. J. Cancer 54, 693).
  • a heat shock transcription factor 3 (HSF3) aggregation assay may also be used to identify genes in a stress signaling pathway.
  • the HSF3 aggregation assay measures HSF3
  • c-Jun kinase assay may be used to identify genes in a stress signaling pathway.
  • c-Jun kinase JNK
  • p-JNK phosphorylation
  • Many stress signals result in activation of c-Jun kinase by phosphorylation (Derijard et al, 1994, Cell 76, 1025).
  • the availability of p-JNK specific antibodies allows in situ detection of cells in which JNK is activated by heterologous library genes.
  • Invasion inhibition assays may be used to identify genes involved in cancer.
  • One such assay measures induction of E-cadherin-mediated cell-cell adhesion.
  • the induction of E-cadherin-mediated adhesion can result in phenotypic reversion and loss of invasiveness of epithelial cells.
  • This assay measures increased expression of E-cadherin at the cell junction through immunostaining using a specific anti-E-cadherin antibody (Hordijk et al, 1997, Science 278, 1464).
  • Another such assay measures loss of hepatocyte growth factor (HGF)-induced cell scattering.
  • HGF-induced cell scattering is correlated with loss of invasiveness of epithelial cells such as Madin-Darby canine kidney (MDCK) cells.
  • TUNEL terminal deoxynucleotidyl transferase-mediated dUTP nick-end-labeling
  • p15 is a member of a family of specific inhibitors of Cdk4 and
  • p15 is positively regulated by transforming growth factor-β (Reynisdottir et al, 1997, Genes & Dev. 11, 492).
  • p15 induction may be identified by immunostaining using a specific anti-p15 antibody available commercially (e.g. Santa Cruz).
  • p21 induction assay Another assay useful for gene identification in an anti-proliferation signaling pathway is the p21 induction assay. Increased levels of p21 expression in cells result in Cdk inhibition, thus resulting in delayed entry into G1 of the cell cycle (Harper et al, 1993, Cell 75, 805; Li et al, 1996, Current Biology 6, 189). For example, p21 expression can be elevated by p53 and transforming growth factor-β activities. p21 induction may be identified by immunostaining using a specific anti-p21 antibody available commercially (e.g. Santa Cruz).
  • Cdk inhibitor family of proteins The expression of p27 is increased upon mitogen withdrawal or contact inhibition (Polyak et al, 1994, Cell 78, 59). p27 induction may be identified by immunostaining using a specific anti-p27 antibody available commercially (e.g. Santa Cruz).
  • One assay for detection of genes which modulate the Wnt signaling pathway is a β-catenin induction and/or translocation assay.
  • the activation of the Wnt signaling pathway results in an increased expression of β-catenin and the translocation of β-catenin from the cytoplasmic compartment to the nucleus (Kuhl et al, 1997, BioEssays 19, 101).
  • This assay is used to identify cells and/or cell populations in which the expression of β-catenin is increased compared to background levels, and/or in which a change of β-catenin localization occurs, in response to expression of a heterologous gene. Changes in β-catenin expression or localization are detected using a specific anti-β-catenin antibody
  • Another assay for detection of genes which modulate the Wnt signaling pathway is a LEF-1 inducible promoter induction assay.
  • β-catenin activates downstream targets in the Wnt signaling pathway by binding to a transcription factor known as LEF-1, thus resulting
  • a readout cell line containing a reporter gene, such as β-gal, under a LEF-1 inducible promoter is used for the assay.
  • When β-gal is used as a reporter gene, positive cells are the darker blue cells.
  • the STAT (signal transducers and activators of transcription) signaling pathway is activated by many growth factors and cytokines and plays essential roles in cell differentiation, cell cycle control, and development. There are six known members of the
  • the assay employs a readout cell line containing a reporter gene, such as β-gal, under the control of any of these known STAT-inducible promoters (White et al, 1996, Cytokine Growth Factor Rev. 7, 303). Positive cells stain dark blue when β-gal is used as the reporter gene.
  • This assay may be used to identify genes in a STAT1 signaling pathway, a STAT3 signaling pathway, a STAT4 signaling pathway, and/or a STAT5/STAT6 signaling pathway. Since STAT5 and STAT6 share the same DNA recognition site, the assay does not distinguish between these two STAT pathways. Readout cells expressing a gene which activates a particular STAT transcription factor will produce a positive signal. Accordingly, the genes identified reside
  • MAP kinase signaling pathway genes may be identified using a p-ERK assay.
  • The activation of this signal transduction pathway by certain growth factors, hormones and neurotransmitters is mediated through two closely-related MAP kinases, p44 and p42, also known as ERK1 and ERK2.
  • ERK proteins are activated by dual phosphorylation at specific tyrosine and threonine sites.
  • the p-ERK assay is used to identify genes by immunostaining readout cells with an antibody which specifically detects the presence of phosphorylated
  • ERK p-ERK
  • ERK2 may be obtained commercially (e.g. Santa Cruz). See Boulton et al, 1991, Cell 65,
  • Genes in an AP-1 signaling pathway may be identified using a c-fos induction readout assay.
  • the AP-1 signaling pathway is involved in cell proliferation, cell survival and cell stress. Activation of the AP-1 signaling pathway results in an increased expression of genes under the control of an AP-1 promoter sequence such as the c-fos gene (see e.g.
  • the c-fos induction assay identifies genes expressed in cell populations in which the level of endogenous c-fos protein is increased by immunostaining c-fos using a specific anti-c-fos antibody (Telford et al, 1996, J. Comp.
  • genes in a cyclic adenosine monophosphate response element binding protein (CREB) signaling pathway may be identified using a phosphorylated CREB (p-CREB) readout assay.
  • CREB is activated by phosphorylation following an increase in the intracellular concentration of cAMP or Ca2+.
  • An antibody which specifically recognizes phosphorylated CREB allows detection of an activated CREB pathway in readout cells (Ginty et al, 1994, Cell 77, 713).
  • genes in a CREB signaling pathway may be identified using a cyclic adenosine monophosphate response element (CRE) reporter gene assay.
  • a readout cell containing a reporter gene e.g. β-gal, CAT or luciferase
  • Positive cells may be identified by, e.g., blue staining in a β-gal assay (Himmler et al, 1993, J. Recept. Res. 13, 79; Kruger et al, 1997,
  • an NFκB translocation assay may be used to identify genes in an NFκB signaling pathway. Activation of the NFκB signaling pathway results in translocation of NFκB from the cytoplasm to the nucleus.
  • the NFκB translocation assay identifies cells with NFκB translocated to the nucleus by immunostaining for NFκB using a specific anti-NFκB antibody (Han et al, 1997, J. Biol. Chem. 272, 9825; Janssen et al, 1995, Adv. Cancer Res. 151, 389).
  • an NFκB reporter gene assay may be used to identify genes in an NFκB signaling pathway.
  • a readout cell containing a reporter gene e.g. β-gal, CAT or luciferase
  • Positive cells may be identified, e.g., by blue staining in a β-gal assay or by immunostaining for the reporter gene product (Rothe et al, 1995, Science 269, 1424).
  • Genes in an NFAT signaling pathway may be identified using an NFAT reporter gene assay.
  • a readout cell expressing a reporter gene e.g. β-gal, CAT or luciferase
  • Positive clones may be identified by blue staining in a β-gal assay (see e.g. Burres et al, 1995, J. Antibiot. 48, 380) or by immunostaining for the reporter gene product.
  • Genes in the insulin signaling pathway may be identified using a GLUT4 translocation assay.
  • Insulin stimulation of adipocytes results in translocation of the GLUT4 glucose transporter to the plasma membrane.
  • This assay identifies cells in which the insulin signaling pathway is activated by immunostaining GLUT4 protein localized at the plasma membrane (Martin et al, 1996, J. Biol. Chem. 271, 17605).
  • MDR gene regulation pathway Genes in the multiple drug resistance (MDR) gene regulation pathway may be identified using an MDR reporter gene assay. MDR gene expression is often greatly increased in cancer cells resistant to chemotherapy.
  • a readout cell containing a reporter gene e.g. β-gal, CAT or luciferase
  • Positive cells may be identified by blue staining in a β-gal assay
  • Genes important in a cholesterol transport pathway may be identified using an intracellular cholesterol accumulation assay. For example, mutations of the Niemann-Pick
  • NP-C Niemann-Pick type C
  • LDL low density lipoprotein
  • the filipin staining assay may be used to identify cells with cholesterol accumulation due to the expression of an exogenous sense or anti-sense cDNA (see Carstea et al, 1997, Science 277, 228).
  • biochemical readout assays may be used to identify genes modifying specific activities following in vitro transcription and translation.
  • biochemical readout assays include, but are not limited to, enzymatic and receptor-based assays.
  • assays for enzymatic activities and receptor-binding activities which may be adapted for use in identification of new genes upon screening a library of interest, as further exemplified in this Section below.
  • Biochemical readout assays may include, e.g., detection of: GABA receptor activity, glutamate receptor activity, monoamine oxidase activity, nitric oxide synthetase activity, opiate receptor activity, serotonin receptor activity, adenosine A1 agonist and antagonist activity, adrenergic α1, α2, β agonist and antagonist activity, calcium channel blocker activity, inflammatory mediator activity, such as the interleukins (e.g. IL-1, IL-6), tumor necrosis factor activity, arachidonic acid activity and phosphatase activity (e.g. tyrosine phosphatase).
  • biochemical readout assays may include, for example, binding to a protein domain or subdomain, such as a PDZ domain, a PH domain, an SH2 domain, or an SH3 domain. Still further, biochemical readout assays may include binding to a molecule, for example, phosphotyrosine or phosphorylated inositol.
  • a functional assignment given to a particular gene may be derived from results obtained in more than one assay. Indeed, it is preferred that a functional assignment be derived from results
  • assays based on enzymes or receptors include the following: acetylcholinesterase; aldol-reductase; angiotensin converting enzyme (ACE); cyclooxygenases; DNA repair; β-glucuronidase; lipoxygenases; monoamine oxidases; phospholipase A2; platelet activating factor (PAF); potassium channel assays; prostaglandin synthetase; serotonin re-uptake activity; and steroid receptors.
  • Additional assays may include: ATPase inhibition, benzopyrene hydroxylase inhibition, HMG-CoA reductase inhibition, phosphodiesterase inhibition, protease inhibition, and tyrosine kinase inhibition.
  • the methods of the invention are not limited to use with the readout assays described herein. Such readout assays merely serve to exemplify a few of the myriad possibilities suitable for use with the invention.
  • the readout assay is a cellular readout assay, virtually any cell line identified as suitable by one skilled in the art may be used. Further, virtually any reporter gene, or endogenous gene functioning as a reporter gene, identified as suitable by one skilled in the art may be used. It will be well noted by one skilled in the art that the methods of the invention are suitable for use with any known readout assay, whether the assay be cellular or biochemical.
  • the skilled practitioner will recognize that it is the particular readout assay, whether chosen from the literature or designed by the user, which determines the type (i.e. function) of genes identified. For example, if one wishes to identify genes associated with cancer, one may choose to screen the library of interest using the p53 and/or MDR assays described above. Often, the user will provide the most appropriate readout assay to be employed for identification of particular genes-of-interest.
  • automation technology be applied throughout the entire functional gene identification process. Many steps in the overall process are amenable to such automation. For example, robotic colony picking may be used for building a library of 10^6 clones from plates containing well-isolated colonies. Robots suitable for this purpose are available commercially from, e.g., Qiagen, Gentix, etc. Similarly, transfection of retroviral vectors into producer cells, and in situ transfection of bar-coded, sorted libraries into readout cells, are repetitive operations suitable for robotic automation. Further, the system is suitable for automated immunostaining of the co-culture, and for automated microscopic viewing of the immunostained result. Only one population of bar codes is needed for all screenings and the same nucleic acid array can be used repeatedly. Automation can be applied to hybridization to an array such that the same hybridization conditions are used for various libraries. Automation can also be applied to in situ transfections and in situ bioassays.
  • genes identified will depend on the type of library screened. For example, if a sense cDNA library is screened, genes associated with proliferation will be identified. By contrast, if an antisense cDNA library is screened, genes associated with anti-proliferation will be identified.
  • PCNA immunostaining is performed using readout cells that are starved for at least 24 hours in low serum (e.g. 0.5%) medium prior to tetracycline induction. After 12-24 hours of tetracycline treatment, the cells are fixed in cold methanol
  • FITC fluorescein isothiocyanate
  • the readout cells are fixed (e.g. absolute methanol for 10 minutes at 4°C) and air dried. Cells are next rehydrated (e.g. PBS for 3 minutes), DNA is denatured (e.g. 2 M HCl for 60 minutes at 37°C), the preparation is neutralized (e.g. 0.1 M borate buffer, pH 8.5, 2X for 5 minutes) and washed (e.g. PBS 3X over 10 minutes).
  • the readout cell preparation is incubated with anti-BRDU antibody (e.g. 50 µg/ml for 60 minutes at room temperature; Boehringer Mannheim), washed (e.g. PBS 3X over 10 minutes), and counterstained with Harris-modified hematoxylin. The preparation is then dehydrated and mounted for observation (i.e. visualization of readout cells staining positive with anti-BRDU antibody).
  • An assay for p53 induction may be used to identify genes associated with p53 expression.
  • One or more p53 inducer genes may be identified if a sense cDNA library is
  • one or more p53 inhibitor genes may be identified if an antisense library is used.
  • the p53 assays may be conducted by measuring endogenous p53 levels or by measuring levels of a reporter gene operably linked to a p53 promoter, as described below.
  • readout cells are treated with tetracycline for 12-24 hours, to induce library gene expression, prior to anti-p53 antibody staining for endogenous p53 expression.
  • Cells are fixed with 1:1 acetone:methanol at -20°C for 10 minutes and air dried. This is followed by blocking with
  • FITC conjugated anti-mouse IgG antibody (Cappel) may be used for detection.
  • a p53 promoter operably linked to a β-gal reporter gene may be used.
  • Readout cells containing the β-gal gene under the direction of the p53 promoter can be obtained, e.g., from transgenic mice (see Komarova et al, 1997, EMBO J. 16, 1391-1400) or by establishing a stable cell line expressing a reporter gene under control of the p53 promoter.
  • Such readout cells are induced with tetracycline, fixed with 1% glutaraldehyde in PBS, washed three times with PBS, and stained in 0.2% X-gal, 3.3 mM K4Fe(CN)6, 3.3 mM K3Fe(CN)6 and 1 mM MgCl2 for one or more hours.
  • Positive cells are detected by the characteristic blue color which develops from the β-gal staining.
  • a heat shock transcription factor (HSF) intracellular translocation assay may be used to identify genes which are associated with transport of HSF from the cytoplasm to the nucleus.
  • One or more HSF transport inducer genes may be identified if a sense library is screened.
  • one or more genes associated with inhibition of HSF transport may be identified if an antisense library is screened.
  • Readout cells may be fixed with either absolute methanol or 4% paraformaldehyde. After blocking with 10% normal goat serum in PBS, cells are incubated with a 1:300 dilution of anti-HSF3 in 10% normal goat serum in
  • the preparation may be washed and mounted as described above.
  • a filipin staining assay may be used to identify genes associated with blocking cholesterol transport when screening a sense cDNA library or to identify genes associated with facilitating cholesterol transport when screening an antisense cDNA library.
  • the assay is based on the principle that filipin can specifically stain unesterified cholesterol located inside of cells (Carstea et al, 1997, Science 277, 228). The presence of large amounts of unesterified cholesterol inside of cells indicates breakdown of the cholesterol transport pathway.
  • filipin staining solution is prepared by dissolving 2.5 mg of filipin in 1 ml of DMSO, which is then added to 50 ml of Dulbecco's PBS. The stained cells are washed three times with Dulbecco's PBS and mounted with glycerol/gelatin containing 1% phenol.
  • This invention provides a high throughput method for structure-function analysis of a particular protein using random mutagenesis of a single gene of interest and the functional screening methods described herein.
  • mutagenized recombinants from a gene are obtained using, e.g., PCR with two primers framing the DNA region to be mutagenized under conditions of reduced Taq polymerase fidelity (see e.g. Rice et al, 1992, Proc. Natl. Acad. Sci. U.S.A. 89:5467; Leung et al, 1989, Technique 1:11).
  • the mutagenized library may also be a deletion library which can be obtained through inverse PCR using 5'-truncated primers (Pues et al, 1997, Nucl. Acids
  • cDNA library plasmid is pBR322 and contains an ampicillin resistance gene
  • suitable vectors for construction of biological arrays include plasmid M13 or pACYC
  • yeast vector such as Yep24, Yip5, etc. (available from New England BioLabs) may be used.
  • this invention provides a method for manually sorting cDNAs and oligonucleotides for screening using the in situ transfection procedures and cellular readout assays described herein.
  • manual sorting has the advantage of not requiring a bar-coded vector.
  • Manual sorting may be carried out by mechanical spotting of individual cDNAs onto a solid support (e.g. nitrocellulose or nylon). Such a manually-sorted cDNA population can be considered to be another form of a nucleic acid array.
  • Such an array can also be used in the in situ transfection and cellular readout assays described herein, so long as the cDNA is cloned into an expression vector which is capable of expressing either sense or antisense cDNA (as desired by the user) when transfected into readout cells.
  • a manually-sorted nucleic acid array can be used to analyze a collection of full-length genes-of-interest from any given source.
  • a manually-sorted single-stranded oligonucleotide array can also be constructed and used for the in situ transfection procedures and cellular readout assays described herein to identify a particular oligonucleotide which is most effective in manifesting a biological function-of-interest (e.g. antisense inhibition of oncogene expression).
  • a manually-sorted oligonucleotide array may be obtained through mechanical spotting of individual oligonucleotides onto a solid support (e.g. nitrocellulose or nylon).
  • Such an approach may be an effective way for identifying an optimal antisense oligonucleotide from among a population of antisense oligonucleotides which is effective in altering the expression of a particular target gene, such as the ras oncogene.
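The interpretation rule that recurs in the assay descriptions above (the class of genes identified depends on both the readout assay and whether a sense or antisense library is screened) can be sketched as a small lookup table. This is only an illustrative sketch: the dictionary layout, the assay keys, and the function name `genes_identified` are hypothetical names chosen here, and the entries merely restate the examples given in the text.

```python
# Hypothetical lookup restating the sense/antisense examples given in the text.
# Keys and names are illustrative assumptions, not part of the described method.
GENE_CLASS = {
    ("proliferation", "sense"): "genes associated with proliferation",
    ("proliferation", "antisense"): "genes associated with anti-proliferation",
    ("p53 induction", "sense"): "p53 inducer genes",
    ("p53 induction", "antisense"): "p53 inhibitor genes",
    ("HSF translocation", "sense"): "HSF transport inducer genes",
    ("HSF translocation", "antisense"): "genes associated with inhibition of HSF transport",
    ("filipin staining", "sense"): "genes associated with blocking cholesterol transport",
    ("filipin staining", "antisense"): "genes associated with facilitating cholesterol transport",
}


def genes_identified(assay: str, library: str) -> str:
    """Return the class of genes a positive readout points to for this pairing."""
    try:
        return GENE_CLASS[(assay, library)]
    except KeyError:
        # Only pairings stated in the text are covered; anything else is rejected.
        raise ValueError(
            f"no example in the text for {assay!r} with a {library} library"
        ) from None


print(genes_identified("p53 induction", "antisense"))
```

Keeping the pairings in a table, rather than in branching logic, makes it plain that the assay and the library orientation jointly determine what a positive signal means.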

Abstract

The present invention relates generally to the field of genomics. More particularly, the present invention relates to methods for function-based gene discovery. Genes are identified as having or being associated with a specific function, as participating in a specific functional pathway, or as being a member of a specific functional group, by functional expression in one or more biological readout assays. This invention is based, at least in part, on the recognition that the signal-to-noise ratio of a readout assay used to screen a cDNA library can be significantly enhanced by methods which localize multiple molecular copies of each unique clone into discrete regions or compartments prior to functional expression. In one embodiment, this invention provides methods for in situ transfection of a sorted library in a 'bar-coded' vector to carry out expression of genes from libraries being screened in readout cells. It is the ability to detect a biological readout in a readout cell line which enables the user to identify genes having specific functions. The methods set forth herein are suitable for application in a high throughput format for identification of genes and their functions simultaneously.

Description

FUNCTION-BASED GENE DISCOVERY
1. FIELD OF THE INVENTION
The present invention relates generally to the field of genomics. More particularly, the present invention relates to methods for function-based gene discovery. Genes are identified as having or being associated with a specific function, as participating in a specific functional pathway, or as being a member of a specific functional group, by functional expression in one or more biological readout assays. This invention is based, at least in part, on the recognition that the signal-to-noise ratio of a readout assay used to screen a cDNA library can be significantly enhanced by methods which localize multiple molecular copies of each unique clone into discrete regions or compartments prior to functional expression. In one embodiment, this invention provides methods for in situ transfection of a sorted library in a "bar-coded" vector to carry out expression of genes from libraries being screened in readout cells. It is the ability to detect a biological readout in a readout cell line which enables the user to identify genes having specific functions. The methods set forth herein are suitable for application in a high throughput format for identification of genes and their functions simultaneously.
2. BACKGROUND OF THE INVENTION
In the past 25 years, approximately 5,000 human genes have been cloned in full length and characterized by specific biological functions through various assay systems. This represents only about 5% of an estimated 100,000 different genes in the human genome. A state-of-the-art method for determining gene function is to first individually clone a full-length cDNA encoding a protein-of-interest and then to perform assays in an attempt to determine biological function of the cloned gene. This approach can be quite expensive and inefficient due to the high cost of labor and materials for such work and because the entire process must be repeated from the beginning for each new gene to be characterized. In this regard, it generally takes several years for a skilled researcher to identify a new gene and characterize its corresponding function using methods focused on individual genes. With the advent of nucleic acid array technology to determine differential mRNA expression, the ability exists to analyze more than one gene at a time. For example, one might employ this method to select a subset of the 95,000 remaining genes to be characterized in a given tissue of interest. With this improvement alone, however, the speed of gene function discovery would remain painfully slow. One reason is that differential mRNA expression analysis does not address the question of gene function in any depth. Instead, the technique may simply mark a gene as an interesting target worthy of further study. By contrast, functional expression in a heterologous background permits a gene to display a detectable function.
Some attempts have been made to develop systems for mammalian genetic functional screening (described further in Section 2.2 below). However, all of these attempts have involved detecting a positive signal through conferring a growth advantage. Such systems require weeks to grow cells under selective pressure, and the use of selection tends to result in the cloning of mutated genes. Of course, these systems are limited to functional identification of growth-related genes, and will not identify genes, for example, that are associated with cell death, that are toxic to a cell, or that cause other morphological changes. Accordingly, there is an urgent need for increased efficiency in the process of gene identification and functional characterization such that it not only takes less time but also yields more information.
2.1. GENOMIC SCIENCES AND DRUG DISCOVERY
There are about 40,000 prescription drug products currently available on the market, including over 6,700 brand names and 1,600 FDA-approved drugs. Despite this large number, drugs are known to work on only about 417 molecular targets in the human body and fewer than 80 molecular targets in bacteria, viruses and parasites (see Drews, 1996, Genomic sciences and the medicine of tomorrow, Nature Biotechnology 14, 1516-1518). Of course, drugs achieve their desired effects by binding to specific cellular targets (e.g. receptors, ion channels, enzymes, and other proteins or molecules). For example, breakthrough drugs for hypertension, depression, migraine, schizophrenia, and ulcers all act via specific receptors (Drews, 1996, Id.).
There are perhaps 100 to 150 major diseases in need of development of new treatments (Drews, 1996, Id.). If the number of genes contributing to each of these complex disease phenotypes is five to ten, and if each gene product interacts with from three to ten other gene products, then the number of genes associated with these conditions is perhaps from 3,000 to 10,000 (Drews, 1996, Id.). All disease-associated genes may be considered to be potential drug intervention targets and/or diagnostic markers. By contrast to the 420 known human molecular targets on which currently-available drugs are believed to work, the majority of the 3,000 to 10,000 human disease-associated genes have not yet been identified. From these considerations it is readily apparent that isolation and characterization of remaining disease-associated genes will dramatically broaden the horizon for development of new and/or improved treatments for most human diseases.
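The disease-gene estimate above is a simple product of three factors, and a short calculation makes the arithmetic explicit. The sketch below is illustrative only: the function and variable names are hypothetical, and the particular combination shown (100 diseases, ten genes per disease, three to ten interacting gene products) is an assumption about how the quoted bounds arise, chosen because it reproduces the stated range of 3,000 to 10,000.

```python
# Back-of-envelope version of the estimate attributed to Drews (1996).
# Names are illustrative; the inputs are taken from the ranges quoted in the text.
def disease_gene_estimate(diseases: int, genes_per_disease: int, partners_per_gene: int) -> int:
    """Multiply the three factors to get a rough count of disease-associated genes."""
    return diseases * genes_per_disease * partners_per_gene


# Assumed combination: 100 diseases, 10 genes each, 3 to 10 interacting gene products.
low = disease_gene_estimate(100, 10, 3)
high = disease_gene_estimate(100, 10, 10)
print(low, high)
```

Other combinations drawn from the quoted ranges (for example 150 diseases or five genes per disease) give different totals, so the 3,000 to 10,000 figure is best read as an order-of-magnitude estimate.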
Identification and functional characterization of previously-unknown genes provides proteins which may be useful as drugs. Historically, such drugs have been useful when the body makes too little of an important protein, or when the presence of supplemental amounts of a protein can arrest or reverse a disease process. Protein drugs have been made possible by genetic engineering which has enabled industrial-scale protein production. Among the current beneficiaries of protein drugs are, e.g., individuals who have had heart attacks and receive clot-dissolvers, individuals with renal failure who receive erythropoietin for anemia, and individuals with diabetes who receive recombinant human insulin.
Identification and functional characterization of new genes also provides reagents for potential use in gene therapy. Gene therapy may be defined as the introduction of genetic material into an individual for therapeutic benefit. Gene therapy may be used, for example, to correct detrimental genetic changes that occur in tumor cells, or to direct an individual's cells to produce a specific protein having therapeutic value. Although gene therapy still faces many technical hurdles, it offers promise for treatment of many disorders.
Such disorders include those associated with single genes, such as hemophilia, sickle cell anemia, thalassemia, Gaucher's Disease, Huntington's chorea, and many others. More complex, polygenic diseases, such as diabetes and Alzheimer's disease, are also likely to benefit from gene therapy.
In more complex conditions, such as dementia and severe obesity, several distinct diseases may actually exist concurrently. In such cases, a condition may be mistakenly classified as a single disease simply because medical science lacks the information and tools necessary to distinguish among the underlying disease processes. If functional information were available for most genes in the genome, it might become possible to accurately identify each specific disease and a corresponding optimal therapeutic intervention within such complex conditions.
Discovery of new genes and their functions permits development of diagnostics for early detection of diseases. Such diagnostics, in turn, permit timely use of drugs or other therapies for preventing irreversible damage. For example, current commercially-available gene-based diagnostics include tests for hemophilia A and B, phenylketonuria, retinoblastoma, and sickle-cell anemia. New, gene-based diagnostics may also be used to enhance the success rate of an existing therapeutic by identifying specific individuals within an affected group who respond well to a specific drug therapy. Similarly, diagnostics may help in development of new therapeutics through enhancement of understanding of differences among people in response to various medicines.
Existing expressed sequence tag (EST) databases are not, by themselves, sufficient to determine biological function. EST databases only suggest functional information to the extent that an EST encodes a domain of known function. Such databases do not provide any functional information for completely novel genes (i.e. genes not encoding any known domains or motifs).
As mentioned above, the state-of-the-art for determination of gene function has been to first clone a full length cDNA and then pursue functional characterization on an individual gene basis. The time consuming nature of the so-called single-gene approach can be illustrated by examination of the progress made. By 1995, the rate of functional characterization of newly-discovered genes reached a plateau of about 2,000 genes per year. If this rate continues, it would take another 46 years to identify the function of the genes remaining to be characterized in the human genome. The invention set forth herein provides methods to accelerate this schedule considerably.
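The arithmetic behind the 46-year estimate can be checked with a short sketch. The rate and the time span are the figures quoted above; the backlog is not stated in the text and is inferred here as their product, and the function name is purely illustrative.

```python
def years_to_characterize(genes_remaining: int, genes_per_year: int) -> float:
    """Years needed to work through a backlog at a constant annual rate."""
    if genes_per_year <= 0:
        raise ValueError("genes_per_year must be positive")
    return genes_remaining / genes_per_year


RATE_PER_YEAR = 2_000  # plateau rate quoted in the text
YEARS_QUOTED = 46      # estimate quoted in the text

# Backlog implied by the two quoted figures (an inference, not a stated number).
implied_backlog = RATE_PER_YEAR * YEARS_QUOTED

print(implied_backlog)
print(years_to_characterize(implied_backlog, RATE_PER_YEAR))
```

Under these assumptions the quoted figures imply that roughly 92,000 genes remained to be characterized, which is in line with the estimate of about 100,000 human genes used elsewhere in this document.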
It is believed the most efficient way to accomplish this characterization is to combine information from total genome sequencing with a database on gene expression patterns and another database on biological function, so that most of the estimated 100,000 genes encoded by the human genome can be grouped into a much smaller number of multi-component, core processes of known biochemical functions. Following this approach, gene groups, and then genes having strong medical relevance, would be prioritized for further, more thorough biological studies.
In contrast to the single-gene approach employed by previously available technologies, the invention described herein provides high throughput methods which combine the simultaneous isolation of gene structure with identification of gene function and/or functional gene group. By doing so, the method is able to directly screen mammalian cDNA libraries (average size 10^6 clones) using mammalian cell systems for biological functions with specific cellular markers. With high resolution bioassay technology, this strategy also yields all genes in the human genome which are involved in a particular biological functional process of interest. In addition, this strategy makes it possible to automate the gene function screening process in a high throughput fashion. Accordingly, this invention provides a much more efficient way to characterize the function of the human genome, as described in detail in Section 5 below.
A brief overview of functional genomics approaches currently under way serves to illustrate the state of the art. With regard to EST databases, Human Genome Sciences, Inc. (HGS) and Incyte Pharmaceuticals, Inc. (Incyte), have produced proprietary EST databases comprising partial sequences of perhaps more than 70% of all human genes. Despite the fact that these EST databases are not yet linked in a meaningful way to functional information, seventeen of the largest pharmaceutical companies have spent more than $482 million to subscribe, according to a 1996 report (Friedrich, 1996, Nature Biotechnol. 14, 1234). Additional organizations have chosen positional cloning strategies for linking gene structure with function. Still other organizations are applying nucleic acid array technologies for analysis of expressed genes in a given tissue or cell (e.g. Affymetrix, Incyte).
Array technology, which represents the first attempt to go beyond single-gene methods of genome analysis, remains limited to characterization of gene expression as opposed to characterization of gene function. For example, one may use array technology to determine differential gene expression patterns in disease, thereby narrowing disease-gene candidates to a subset of genes. However, even under this approach, the speed of gene function discovery is not likely to increase significantly. This is because such an approach would identify not only genes which may contribute to the cause of a disease process but also genes having altered expression as a consequence of a disease process. Further, the number of genes in the latter category is likely to vastly outnumber those in the first category. Analyzing potentially hundreds of genes that may be implicated in a given disease by an expression analysis using a single-gene approach would quickly become an overwhelming task. This is particularly evident when one considers that a given organization generally has a limited number of biological assays available in-house, i.e. far from enough for beginning to determine the biological function of new genes en masse. It is clear that a genetic screen capable of identifying all genes associated with a specific biological function would be the most efficient way of linking gene structure to function. Although genetic screening approaches have been widely used for such organisms as Drosophila and C. elegans, there is no such approach widely applicable to mammalian systems. This is primarily due to the large size of mammalian genomes (i.e. 10^5 genes) and a lack of sensitive assay systems for detecting positive signals over background noise.
Nevertheless, limited attempts have been made at developing mammalian genetic functional screening systems. For example, such systems have been described by Deiss and Kimchi (1991, Science 252, 117-120) and by Cohen (1996, Cell 85, 319-329). However, these systems are slow, labor-intensive, restricted to cloning growth-related genes, and have a tendency to isolate mutated genes. This latter tendency arises from a requirement for relatively long-term culture (i.e. two or more weeks) under selective pressure for identification of a growth phenotype.
Accordingly, a great need exists for a large-scale (i.e. genome-wide) mammalian genetic functional screening method which may be employed over a time period of days instead of weeks and which provides an automated, general format for use instead of a manual, specific format that must be tailored to each functional readout assay. This invention provides such a method, as described in detail in Section 5 below.
2.2. EXPRESSION CLONING
Many methods have been described for cloning genes by functional expression. One method by Clarke et al. (June 23, 1987, "Method for identification and isolation of DNA encoding a desired protein," U.S. Patent No. 4,675,285) provides a ten-step approach for selection of cDNAs expressed from sub-pools of a library which includes testing media from cultured cells in which sub-pools are expressed so as to identify a cDNA encoding a desired protein. Another method by King et al. (August 5, 1997, "Method of expression cloning," U.S. Pat. No. 5,654,150) provides an improvement which employs pools of about 100 individual bacterial colonies. Yet another method by Sang (March 31, 1993, "Expression cloning method," European Patent Application Pub. No. 0 534 619 A2) employs antibodies or ligands to screen expression libraries. As a general proposition, however, these methods have often been designed for very specific purposes, i.e. for identification of a single gene, and therefore lack general utility. For example, one method utilized a transient COS cell expression assay and monoclonal antibody binding to identify CD28 (Aruffo and Seed, 1987, Proc. Natl. Acad. Sci. U.S.A. 84, 8573-8577).
3. SUMMARY OF THE INVENTION
This invention provides methods for function-based gene discovery. Genes are identified as having or being associated with a specific function, as participating in a specific functional pathway, or as being a member of a specific functional group, by functional expression in one or more biological readout assays. This invention is based, at least in part, on the recognition that the signal-to-noise ratio of a readout assay used to screen a cDNA library can be significantly enhanced by methods which localize multiple molecular copies of each unique clone into discrete regions or compartments prior to heterologous expression. In one embodiment, this invention provides methods for in situ transfection of a sorted library in a "bar-coded" vector to carry out expression of genes from libraries being screened in heterologous readout cells. It is the ability to detect a biological readout in heterologous cells which enables the user to identify genes having specific functions. The methods set forth herein are suitable for application in a high throughput format for identification of genes and their functions simultaneously.
This invention provides a method for enhancing the signal-to-noise ratio of a biological readout assay used to screen a bar-coded cDNA library comprising: (a) sorting the bar-coded cDNA library using a nucleic acid array; and (b) transfecting the library sorted in step (a) into a readout cell line in situ. In one embodiment, the nucleic acid array is a biological array or a gene chip. In another embodiment, the biological array comprises a vector carrying a plurality of complementary bar codes. In still another embodiment, the plurality of complementary bar codes is immobilized on a support. In a preferred embodiment, the support is nitrocellulose or nylon. In another embodiment, transfecting in situ is carried out using a chemical transfectant or electroporation. In another embodiment, the readout cell line is NIH 3T3 cells carrying a reporter gene under the control of a response element or promoter. Selection of the response element or promoter is guided by the particular readout assay selected. In still another embodiment, the reporter gene is selected from the group consisting of β-galactosidase, luciferase and chloramphenicol acetyltransferase. In a preferred embodiment, the response element or promoter is selected from the group consisting of an NFKB response element, an NFAT response element, a cyclic adenosine monophosphate response element, a STAT-inducible promoter, a LEF-1-inducible promoter and a p53-inducible promoter. In another preferred embodiment, the cDNA library is tetracycline-inducible or estrogen-inducible. In still another preferred embodiment, the biological readout assay detects genes in a pathway selected from the group consisting of a mitogenic signaling pathway, a STAT signaling pathway, an NFKB signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling pathway, a proliferation signaling pathway and an anti-proliferation signaling pathway.
This invention provides a method for conducting a biological readout assay used to screen a bar-coded cDNA library comprising: (a) sorting the bar-coded cDNA library using a nucleic acid array; (b) transfecting the library sorted in step (a) into a readout cell line in situ; and (c) conducting the biological readout assay. In another embodiment, the nucleic acid array is a biological array or a gene chip. In another embodiment, the biological array comprises a vector carrying a plurality of complementary bar codes immobilized on a support. In another embodiment, the plurality of complementary bar codes consists of from 10^2 to 10^8 complementary bar codes. In another embodiment, the support is nitrocellulose or nylon. In another embodiment, transfecting in situ is carried out using a chemical transfectant or electroporation. In another embodiment, the readout cell line is NIH 3T3 cells carrying a reporter gene under the control of a response element or promoter. In another embodiment, the reporter gene is selected from the group consisting of β-galactosidase, luciferase and chloramphenicol acetyltransferase. In another embodiment, the response element or promoter is selected from the group consisting of an NFKB response element, an NFAT response element, a cyclic adenosine monophosphate response element, a STAT-inducible promoter, a LEF-1-inducible promoter and a p53-inducible promoter. In another embodiment, the bar-coded cDNA library is tetracycline-inducible or estrogen-inducible. In another embodiment, the biological readout assay is capable of detecting genes in a pathway selected from the group consisting of a mitogenic signaling pathway, a STAT signaling pathway, an NFKB signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling pathway, a proliferation signaling pathway and an anti-proliferation signaling pathway.
This invention provides a method for conducting a biological readout assay used to screen a bar-coded cDNA library comprising: (a) sorting the bar-coded cDNA library using a nucleic acid array having a plurality of concave loci; (b) expressing the bar-coded cDNA library sorted in step (a) using in vitro transcription and translation to produce a population of proteins; and (c) screening the population of proteins produced in step (b) for a biochemical activity-of-interest, so as to conduct the biological readout assay. In one embodiment, the biochemical activity-of-interest screened in step (c) is selected from the group consisting of a receptor-binding activity, a ligand-binding activity and a growth factor activity. In another embodiment, screening is carried out by immobilizing the population of proteins on a solid support for use in a binding assay. In another embodiment, the solid support is nitrocellulose or nylon. In another embodiment, screening is carried out by placing the population of proteins in contact with readout cells for use in a biological activity assay.
This invention provides a method for identifying one or more genes-of-interest in a pre-sorted cDNA library comprising: (a) transfecting the pre-sorted cDNA library into a population of readout cells; and (b) screening the population of readout cells transfected in a biological readout assay, to identify one or more genes-of-interest. In one embodiment, the pre-sorted cDNA library comprises a bar-coded cDNA library hybridized to a nucleic acid array. In another embodiment, transfecting is carried out using chemical transfectants or electroporation. In another embodiment, the biological readout assay identifies one or more genes-of-interest in a pathway selected from the group consisting of a mitogenic signaling pathway, a STAT signaling pathway, an NFKB signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling pathway, a proliferation signaling pathway and an anti-proliferation signaling pathway.
This invention provides a method of expression cloning one or more genes-of-interest in a cDNA library comprising: (a) sorting the cDNA library; (b) transfecting the sorted library into a readout cell line; and (c) identifying a positive signal from the transfected library in a biological readout assay, so as to expression clone one or more genes-of-interest in the cDNA library. In one embodiment, sorting the cDNA library is carried out using a nucleic acid array. In another embodiment, transfecting the sorted library is carried out using chemical transfectants or electroporation. In another embodiment, the positive signal is identified by immunocytochemistry. In another embodiment, the biological readout assay identifies one or more genes-of-interest in a pathway selected from the group consisting of a mitogenic signaling pathway, a STAT signaling pathway, an NFKB signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling pathway, a proliferation signaling pathway and an anti-proliferation signaling pathway.
This invention provides a method of sorting a cDNA library for use in an expression cloning assay comprising: (a) cloning a population of cDNA inserts into a population of bar-coded vectors; (b) preparing the population of bar-coded vectors for hybridization to a DNA array by exposing only the bar code region in single-stranded form; and (c) hybridizing the population of bar-coded vectors to a nucleic acid array to sort the cDNA library. In one embodiment, the nucleic acid array is selected from the group consisting of a gene chip and a biological array. In another embodiment, preparing the population of bar-coded vectors for hybridization to a DNA array by exposing only the bar code region in single-stranded form in step (b) is carried out using the following steps in the order stated: (a) digesting the population with a restriction endonuclease to linearize the population; (b) binding a DNA-binding protein to at least two sites on the population; and (c) digesting the population bound in step (b) to expose the single-stranded bar code region. In another embodiment, the DNA-binding protein is selected from the group consisting of a lactose repressor protein, a tetracycline repressor protein, E2F, AP1, SP1 and p53. In another embodiment, the restriction endonuclease is selected from the group consisting of NotI, SfiI and EcoRI. In another embodiment, digesting the vector population in step (c) is carried out using an enzyme selected from the group consisting of exonuclease III, T4 DNA polymerase, Klenow fragment, T7 DNA polymerase, Vent DNA polymerase and Pfu DNA polymerase.
4. BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1. A phagemid vector for making a bar-coded cDNA library.
FIG. 2A-2B. 2A. Preparation of a bar-coded vector. 2B. Preparation of a bar-coded cDNA library.
FIG. 3. Sorting a bar-coded cDNA library.
FIG. 4. Flow chart of gene identification methods from the step of in situ transfection to the step of cDNA retrieval.
FIG. 5. Illustration of a gene chip with a plurality of concave loci.
5. DETAILED DESCRIPTION OF THE INVENTION
This invention provides methods for function-based gene discovery. Genes are identified as having or being associated with a specific function, as participating in a specific functional pathway, or as being a member of a specific functional group, by expression in one or more biological readout assays. This invention provides expression cloning methods enabling high-throughput library screening for determination of gene function. The invention is based, at least in part, on the recognition that the signal-to-noise ratio of a readout assay used to screen a cDNA expression library can be significantly enhanced by localizing multiple molecular copies of each unique clone into discrete regions or compartments. It is the ability to detect a biological readout in heterologous cells which enables the user to identify genes having specific functions. A major advantage of the invention is to provide methods for assaying all genes in a cDNA expression library simultaneously, instead of one-at-a-time, under conditions in which the readout signal-to-noise ratio is significantly enhanced. Moreover, a rational basis for characterization of functional gene groups is provided where more than one gene is identified in any given readout assay.
In one embodiment, this invention provides methods for in situ transfection of a sorted library in a "bar-coded" vector to carry out expression of genes from libraries being screened in heterologous readout cells. The vector "bar code" is an oligonucleotide sequence within the vector which is unique to each individual clone of a library. The bar code enables sorting of the library in physical space by hybridization to nucleic acid arrays which are complementary to library bar code sequences. The bar code unique to each clone, together with the unique position of each complementary bar code in a nucleic acid array, provides a method for direct retrieval of a gene having a function of interest identified in any given readout assay. Moreover, each unique bar code can serve as a specific primer for PCR and/or sequencing of a desired clone in a library.
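The retrieval idea described above, going from a positive array position back to the bar code of the clone sorted there, can be sketched as a simple lookup. This is a minimal illustration only: the array layout, the probe sequences and the function names are hypothetical, and the sketch assumes each array position carries a probe that is the exact reverse complement of one library bar code.

```python
from typing import Dict, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_array_index(
    array_layout: Dict[Tuple[int, int], str]
) -> Dict[Tuple[int, int], str]:
    """Map each array position to the library bar code that hybridizes there."""
    return {pos: reverse_complement(probe) for pos, probe in array_layout.items()}


# Hypothetical layout: (row, column) -> immobilized complementary bar code.
layout = {
    (0, 0): "ACGTACGTAC",
    (0, 1): "TTGACCGATA",
}
index = build_array_index(layout)

# Suppose the readout assay scored position (0, 1) as positive.
positive_position = (0, 1)
clone_bar_code = index[positive_position]
print(clone_bar_code)
```

The recovered bar code sequence identifies the clone and could then be used to design a clone-specific primer, as the text suggests; primer design itself is outside the scope of this sketch.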
5.1. GENERAL CONSIDERATIONS
In both above-mentioned embodiments of the invention, it is the ability to detect a biological readout upon heterologous expression which enables the user to identify genes having specific functions. These embodiments are described in detail in Section 5.2 below. A major advantage of the invention is to provide methods for assaying all genes in a cDNA library simultaneously for the ability to modify a specific biological function associated with a specific readout assay. The pattern of gene activity in any given readout assay also provides a rational method for identification of functional gene groups. The methods set forth are suitable for application in a high throughput format for rapid identification of genes and their functions. For example, such a high throughput format may easily screen at least 10^4, or from 10^4 to 10^6, independent recombinants for functional activity at one time.
To practice the invention, a complementary DNA (cDNA) library is prepared from messenger RNA (mRNA) obtained from a cell population of interest (e.g. a cell population may be derived from a specific tissue, disease, or biological state). A cDNA library may also be purchased commercially. The cDNA is operably linked to an expression vector suitable for use with the invention. Constructs are prepared and purified using standard recombinant DNA techniques as described in, e.g., Sambrook et al. (1989, Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York). Expression vectors suitable for use with the invention are available commercially, or can be specially designed by the user or as described herein.
5.1.1. OVERVIEW
The library technology of the invention has, inter alia, the following three features: (a) inducibility of library gene expression; (b) suitability for use with sense and antisense libraries; and (c) suitability for use with libraries from various disease and non-disease tissues and/or cells, and/or various stages of development of interest. The user will note that when a bar-coded vector is used, virtually any cDNA library vector is made suitable when modified to comprise a bar code as described herein. Such vectors do not require an inducible promoter because these cDNA libraries are directly transfected into readout cells without having to be propagated in virus-producing cells. The method is suitable for use with a microscope-based, in situ approach for detection of various readouts. Such readouts may include, but are not limited to: target protein expression; target mRNA expression; cellular localization changes; and/or cellular morphology changes. Such single-format, microscope-based detection may be easily automated. Because screening of libraries and/or sub-libraries requires only a few days, the possibility of appearance of mutated genes during prolonged growth in cell culture, as with prior art methods, is largely eliminated.
The methods of the invention are suitable for use in high throughput screening of a large number of functional (i.e. readout) assays of interest. Such functional assays include cell culture-based assays (i.e. cellular readout assays, see Section 5.5) that rely upon expression of genes in a library and detection of a functional effect of expression. This accelerated time scale provides a two-fold advantage in that it (a) requires a reduced workload relative to procedures requiring longer assay time; and (b) vastly reduces the appearance of mutated genes arising from prolonged cell culture time. The functional assay technology that can be used includes all existing immunostaining assays and biochemical assays. See Sections 5.5-5.7 below for a description of assays and Section 6 below for examples. Such assays are designed to identify genes involved in major disease categories as well as genes that regulate various cellular physiological functions.
5.1.2. mRNA SOURCES
There are no special considerations when choosing a messenger RNA (mRNA) source for construction of a cDNA library for use with the methods of the invention. Any mRNA source may be used. Accordingly, cells suitable as sources of mRNA from which a cDNA library may be constructed include, but are not limited to, mammalian cells, bacterial cells, yeast cells, insect cells and amphibian cells. However, because of (a) the relative absence of genetic functional screening systems available for mammalian organisms compared to, say, flies or yeast, and (b) the relative complexity of the mammalian genome, the methods of the invention are preferred for use in screening mammalian cDNA libraries. Suitable mammalian mRNA sources include tissues and cell lines. Mammalian tissues that may be used include normal and disease tissues (e.g. carcinomas, lymphomas). Mammalian cell lines that may be used include any of the cell lines available from the American Type Culture Collection (ATCC). Exemplary mammalian cell lines include Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK) cells, monkey kidney cells (e.g. COS), human hepatocellular carcinoma cells (e.g. Hep G2), human embryonic kidney cells (e.g. HEK 293), mouse sertoli cells, canine kidney cells (e.g. MDCK), buffalo rat liver cells, human lung cells, human liver cells and mouse mammary tumor cells.
5.1.3. cDNA LIBRARIES
Sense or antisense cDNA libraries may be generated by any method known in the art. Many such methods exist and examples may be found in Sambrook et al. and Ausubel et al., both of which are incorporated by reference herein in their entireties (Sambrook et al., 1989, Molecular Cloning, A Laboratory Manual, 2d Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York; Ausubel et al., eds., in the Current Protocols in Molecular Biology series of laboratory technique manuals, © 1987-1997 Current Protocols, © 1994-1997 John Wiley and Sons, Inc.). Many references are available which describe antisense cDNA library construction (see e.g., Spann et al., 1996, Proc. Natl. Acad. Sci. U.S.A. 93, 5003-5007; and Deiss and Kimchi, 1991, Science 252, 117-120).
The library may be an antisense library such that antisense polynucleotides are generated upon expression of the library. Such antisense polynucleotides may, for example, provide a source of inhibition of a detectable cellular event in a functional assay. In this way, an antisense expression library will identify one or more genes required for operation of a specific pathway by "knocking out" (i.e. rendering inoperative) such a pathway.
A library may be divided into subpools for screening. For example, from 100 to 1,000 subpools may be generated, each subpool comprising a cDNA diversity of from about 10,000 to about 1,000 clones, thus representing a cDNA diversity in all pools combined of about 1,000,000. Each library subpool may be individually (i.e. separately) expressed in a heterologous cell population.
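The subpool figures above follow from dividing a library of about 1,000,000 clones evenly. A brief sketch of that arithmetic follows; the function name is illustrative and an even split is an assumption made here for simplicity.

```python
def clones_per_subpool(total_diversity: int, n_subpools: int) -> int:
    """Clones per subpool when a library is split evenly (rounded up)."""
    if n_subpools <= 0:
        raise ValueError("n_subpools must be positive")
    return -(-total_diversity // n_subpools)  # ceiling division


LIBRARY_DIVERSITY = 1_000_000  # combined diversity quoted in the text

for n in (100, 1_000):
    print(n, clones_per_subpool(LIBRARY_DIVERSITY, n))
```

With 100 subpools each pool holds about 10,000 clones, and with 1,000 subpools each holds about 1,000, matching the range given in the text.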
A library may be a normalized cDNA library. Any cDNA library normalization technique known to one skilled in the art may be used with the methods of the invention. For example, see "Normalization and subtraction: two approaches to facilitate gene discovery," Genome Research 6, 791-806 (1996).
5.2. BAR-CODED VECTORS
A "genetic bar code" is an oligonucleotide tag or label having a specific sequence. This invention provides a method of constructing a cDNA library in a vector containing a plurality of genetic bar codes at a diversity equal to or larger than the diversity of the cDNA library. This invention provides methods for sorting and transfecting such a library. The methods employ a unique genetic bar code linked to each clone in a library for various uses (e.g. sorting, retrieval of insert).
The human genome is believed to encode about 100,000 genes, and any given human cell or tissue may express from about 10,000 to about 50,000 of these genes. Therefore, in order to cover every expressed gene (including rare genes) during preparation of a human cDNA library from messenger RNA, it is preferred that about 10^6 independent clones be used. Accordingly, a vector having about 10^6 unique genetic bar codes is preferred since it is preferred that a unique genetic bar code be associated with each library clone.
A bar code having ten nucleotides and using all four possible bases at each position is capable of generating a set of genetic bar codes having a diversity of 4^10 or 1.048 x 10^6. The optimum length and base composition of oligonucleotides (oligos) for specific and efficient hybridization to complementary sequences may be chosen by the user. For example, oligos 15 to 20 nucleotides long having a diversity of 4^15 to 4^20 (i.e., 10^9 to 10^12) may be used to cover a library of 10^6 diversity to ensure that any two genetic bar codes are different by several nucleotides. Various approaches known in the art may also be used to reduce cross-hybridization among different bar codes (see e.g. Shoemaker et al., 1996, Nature Genetics 14, 450-456).
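The diversity figures above can be reproduced directly, and the requirement that any two bar codes differ by several nucleotides can be expressed as a minimum pairwise Hamming distance. The sketch below is illustrative only: the function names and the toy bar code set are assumptions, and Hamming distance is used here as a simple stand-in for the cross-hybridization criteria discussed in the cited literature.

```python
from itertools import combinations
from typing import Sequence


def barcode_diversity(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct bar codes of a given length (four bases by default)."""
    return alphabet_size ** length


def min_length_for(library_size: int, alphabet_size: int = 4) -> int:
    """Smallest bar code length whose diversity is at least library_size."""
    length = 0
    while alphabet_size ** length < library_size:
        length += 1
    return length


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: Sequence[str]) -> int:
    """Smallest Hamming distance between any two bar codes in a set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


print(barcode_diversity(10))   # ten-nucleotide bar codes
print(barcode_diversity(15))   # lower end of the 15 to 20 nucleotide range
print(barcode_diversity(20))   # upper end of the 15 to 20 nucleotide range
print(min_length_for(10**6))   # shortest length covering a 10^6 library
print(min_pairwise_distance(["AAAA", "AATT", "TTTT"]))  # toy set
```

A ten-nucleotide bar code only just covers a library of 10^6 clones, which is why longer oligos leave room to choose a subset whose members are well separated from one another.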
5.2.1. CONSTRUCTING A BAR-CODED VECTOR
The vector employed for use with a bar-coded cDNA library may be any vector incorporating a genetic bar code. In one embodiment, a suitable vector comprises the genetic bar code and a eukaryotic promoter. In another embodiment, a suitable vector comprises the genetic bar code, a eukaryotic promoter (e.g. a CMV promoter), a cDNA insert, an f1 origin, an antibiotic resistance gene (e.g. ampicillin resistance gene), an SV40 origin and a ColE origin (see e.g. FIG. 1). For the phagemid vector illustrated in FIG. 1, sites 1 and 2 may be used for inserting the genetic bar code, sites 3 and 4 may be used for inserting cDNA, the f1 origin may be used for making single-stranded DNA, and the antibiotic resistance gene provides for growth selection of the phagemid vector in E. coli. The ColE and SV40 origins may be used to provide high copy number amplification of phagemid DNA in bacteria and eukaryotic cells, respectively (see FIG. 1).
By way of example and not limitation, a bar-coded library vector may be constructed as illustrated in FIG. 2A. Here, the vector and the bar code mixture are each digested with enzymes 1 and 2 (which cut at sites 1 and 2, respectively), and then ligated to produce the bar-coded library vector (FIG. 2A). A bar-coded library may be constructed as illustrated in FIG. 2B. Messenger RNA (mRNA) is reverse transcribed using an oligo-dT primer containing restriction site 3 using methods well known to those skilled in the art (see FIG. 2B). Following conversion into double-stranded cDNA, an adapter or linker containing restriction site 4 is ligated to the 5' end (relative to the sense strand). The resulting double-stranded cDNA bears site 4 at its 5' end and site 3 at its 3' end. This cDNA and the bar-coded vector are each digested with enzymes 3 and 4 and ligated together to produce the double-stranded, bar-coded cDNA library. It is preferred that any library amplification be performed on plates, as opposed to in solution, to ensure equal amplification of all clones represented.
One skilled in the art will readily recognize and appreciate that the various features of a suitable vector need not be precisely as illustrated. For example, the location of the bar code sequence can be other than at the illustrated location within the vector.
To ensure that each bar-coded vector carries only one genetic bar code, a population of double-stranded genetic bar codes is designed in dephosphorylated form and having a staggered restriction enzyme site at both ends of the bar code (e.g. EcoRI). For example, chemically-synthesized oligonucleotides are already dephosphorylated following chemical synthesis. An enzyme (e.g. alkaline phosphatase) can also be used to ensure the dephosphorylated state. This method ensures that, after annealing of the two bar code strands, a double-stranded bar code is formed which lacks the phosphorylation which would be necessary for the formation of a bar code dimer.
Further, one can apply a "zero background" cloning system (e.g., such a system is commercially available from Invitrogen) to clone a bar code population into the chosen vector. A zero background cloning system is a positive selection system for prokaryotic cloning which works by direct selection of inserts via disruption of the lethal gene ccdB (control of cell death). In such a system, only bacteria transformed with a genetic bar code inserted into the vector will survive and be propagated. In this way, the vector population generated will not contain any individual vectors lacking a bar code.
5.2.2. SORTING A BAR-CODED LIBRARY
Sorting of a bar-coded library may be carried out using supports having bound thereto oligonucleotide sequences that are complementary to the genetic bar codes of the cDNA library. A DNA sequence complementary to each genetic bar code in a bar-coded library is affixed (e.g. deposited or synthesized) at discrete locations of a nucleic acid array. Natural or modified nucleotides can be used for synthesis. Use of certain modified nucleotides may promote formation of stronger bonds with the complementary bar code of a vector. In this regard, the bonding properties of common modified nucleotides such as phosphorothioates have been well described.
An array may be a commercially available gene chip (e.g. Affymetrix, Incyte) or may be manufactured using methods known in the art. Many such methods have been described (for a brief review, see Ramsay, January 1998, Nature Biotechnol. 16, 40-44).
For example, light-directed, solid-phase synthesis technology permits massive numbers of oligonucleotides to be synthesized on a support at precise positions (see e.g. Fodor et al., August 29, 1995, U.S. Pat. No. 5,445,934). To achieve gene separation and sorting using such an array, the array is hybridized with the single stranded genetic bar code region of a double stranded vector (see e.g. "ssGBC cDNA library" in FIG. 3, which illustrates such a single stranded genetic bar code region on a double stranded vector). In FIG. 3, a gene chip having a plurality of complementary genetic bar codes is shown. Each area of the chip (labeled A, B, C, and N) contains multiple copies of each unique complementary bar code. Following hybridization, each unique cDNA corresponding to each unique bar code is sorted into a discrete area on the chip (see bottom panel of FIG. 7).
Optimum hybridization conditions are used to ensure accurate base pairing between the various genetic bar codes of a library and their corresponding complements in a nucleic acid array so as to prevent or minimize mismatches. The hybridization (a) separates library vector molecules encoding distinct recombinants from each other and (b) sorts all library vector molecules encoding the same recombinant to discrete, known locations. This separation and sorting operation results in an equal amount of DNA at each location. Abundant genes of a library will be represented at multiple locations on a chip while rare genes will be represented at only one or a few locations. Equivalent amounts of DNA are hybridized at each location.
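The separation and sorting described above can be pictured with a small sketch. It is only an illustration under simplifying assumptions: hybridization is modeled as exact reverse-complement matching, and the six-nucleotide bar codes, gene names, and the helper names `reverse_complement` and `sort_library` are hypothetical and do not come from the specification.

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def sort_library(clones, array_probes):
    """Assign each (bar_code, insert) clone to the array location whose
    probe is the exact reverse complement of its bar code."""
    location_of = {reverse_complement(probe): loc
                   for loc, probe in array_probes.items()}
    sorted_array = defaultdict(list)
    for bar_code, insert in clones:
        if bar_code in location_of:  # perfect match only (idealized)
            sorted_array[location_of[bar_code]].append(insert)
    return dict(sorted_array)

# Hypothetical toy library: an abundant gene cloned three times, each clone
# with its own bar code, and a rare gene cloned once.
clones = [
    ("TGAGGA", "abundant_gene"),
    ("AGTGAT", "abundant_gene"),
    ("GATTGA", "abundant_gene"),
    ("GTAGTA", "rare_gene"),
]
array_probes = {loc: reverse_complement(bar_code)
                for loc, (bar_code, _) in zip("ABCD", clones)}
sorted_array = sort_library(clones, array_probes)
```

In this toy case the abundant gene occupies three locations and the rare gene one, which mirrors the statement that abundant genes are represented at multiple locations while each location holds a single recombinant.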
As an alternative to gene chip arrays, which may be expensive to manufacture, a "biological array" may be manufactured and used to sort a bar-coded expression library. Such a biological array is created from a library of sequences which are complementary to the bar codes of the expression vector. The biological array is segregated into discrete locations using the physiology of the microbe (e.g. bacteria or yeast), as follows. Since each microbe which takes up a plasmid DNA containing a complementary bar code will only retain a single type of plasmid, each complementary genetic bar code is automatically separated from all others at this step. The user will note that the vector chosen to produce the complementary bar code array is different from the expression vector used to create the library so as to preclude any hybridization between the two vectors.
In this way, microbial colonies carrying a plasmid library encoding complementary bar codes are used to construct a biological array. Such a biological array can be easily reproduced by replica plating. The DNA of the array is easily immobilized on a solid support (e.g. nitrocellulose, nylon, etc.) by well known methods. The sequence of the complementary genetic bar code at each location of the array may be determined by standard sequencing reactions. This information may then be stored in a computer for later retrieval as needed.
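As a minimal sketch of how the sequenced array could be stored and later queried, the record below maps array positions to complementary bar code sequences and round-trips through JSON. The position labels, sequences, and the function name `position_of` are hypothetical placeholders chosen for illustration.

```python
import json

# Hypothetical sequencing results: array position -> complementary bar code.
array_map = {"A1": "TCCTCA", "A2": "ATCACT"}

# Store the record as text and read it back, as one might with a file.
stored = json.dumps(array_map)
restored = json.loads(stored)

def position_of(probe_seq, mapping):
    """Return the array position holding a given complementary bar code,
    or None if the sequence is not on the array."""
    for position, seq in mapping.items():
        if seq == probe_seq:
            return position
    return None
```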
A biological array differs from a gene chip array as follows. Instead of having each complementary bar code present as a homogeneous nucleic acid at each location of the array, each is present in a background of other DNA (i.e. from the plasmid carrying the complementary bar code and the microbial genome).
5.2.3. IN SITU TRANSFECTION OF A SORTED LIBRARY
Analysis of a sorted library by functional expression provides a means to screen a large number of genes simultaneously in order to identify genes having the biological function specified by the chosen readout assay. Any of the following methods may be used for in situ transfection of a sorted library. Following all of the transfection procedures described below, readout cells are rinsed in a physiologic buffer and cultured for a period of time to allow expression of the transfected genes. The period of time to allow expression may be from 12 hours to 12 days, from 1 day to 6 days, or from 2 days to 3 days. In a preferred embodiment, the period of time is from 1 day to 4 days.
In one embodiment, in situ transfection may be performed using a gene chip having a plurality of concave loci (i.e. U-shaped areas), each locus having an oligonucleotide complementary to a genetic bar code attached thereto. Following library sorting by hybridization, such a gene chip will have an individual cDNA recombinant at each locus.
In situ transfection is performed by contacting the chip to a readout cell culture in the presence of a solution which facilitates the release of the hybridized recombinants. Such a solution may be, e.g., phosphate-buffered saline or tissue culture medium without serum supplement. Generally, any low-salt solution (e.g. 150 mM NaCl or lower) will result in the dissociation (i.e. release) of the hybridized recombinants from the gene chip. Chemical transfectants may also be included in the solution to facilitate uptake of the released DNA into the readout cells. Such transfectants may be any transfectant known in the art. For example, calcium phosphate, DEAE-dextran, polybrene or a lipid-based transfectant such as LT1 (Panvera) or Lipofectamine (GibcoBRL) may be used.
In another embodiment, in situ transfection may be performed using a biological array or a gene chip having a flat surface. Here, in situ transfection may be performed by contacting the biological array or the gene chip to a readout cell line in the presence of a solution as described above. A chemical transfectant may also be used as described above. In a preferred embodiment, a micro-compartmentalization grid device may also be used to restrict diffusion of each released recombinant. Such a micro-compartmentalization grid device may be as illustrated in FIG. 3 of copending U.S. Patent Application No. 09/065,776, filed April 24, 1998, entitled "MICRO-COMPARTMENTALIZATION DEVICE AND USES THEREOF," by Cen and Sun (Attorney Docket No. 9557-003), which is incorporated herein by reference in its entirety.
In yet another embodiment, in situ transfection may be performed by electroporation. Electroporation may be performed using, e.g., a cell culture device as described in U.S. Patent No. 5,134,070, which is incorporated by reference herein in its entirety. Here, the readout cell line used for electroporation may be any readout cell line which will attach to and proliferate on a solid support. Such a cell line may be grown in a monolayer on the bottom of a cell culture device which is electrically conductive (see e.g. U.S. Patent No. 5,134,070). A gene chip or biological array having a sorted cDNA library attached thereto is contacted with the cell line in the presence of a suitable electroporation solution (e.g. phosphate buffered saline or as described in U.S. Patent No. 5,134,070) such that it is between the electrically conductive upper and lower surfaces of the culture device. The contact of the upper surface of the culture device with the electroporation solution provides a continuous electric circuit for passing a current which mobilizes the DNA from the gene chip or biological array to the readout cells.
In yet still another embodiment, in situ transfection may be performed using enzymes to facilitate attachment to and release from a gene chip or a biological array. Here, one may covalently attach a sorted library to a nucleic acid array to ensure tight binding using T4 ligase, or an enzyme having a similar activity, to ligate the oligonucleotides encoding the complementary bar codes to the bar-coded cDNA library following hybridization. Such a covalently-bound bar-coded cDNA library may be released at will by including a restriction endonuclease in the transfection solution (e.g. if an EcoRI site is used as site 2, see FIG. 1, then one may cut with EcoRI).
Finally, to ensure specific hybridization of a bar-coded cDNA library to a nucleic acid array, enzymes which cut mismatched nucleotides (such as T4 Endonuclease VII, also called resolvase, see Youil et al., 1996, Genomics 32:431) may be used to eliminate cross-hybridization following the completion of the hybridization process.
An overview of the in situ transfection (gene transfer) methods as they relate to the overall process of gene identification and retrieval, is illustrated schematically in FIG. 4.
5.2.4. BIOCHEMICAL ANALYSIS OF A SORTED LIBRARY
In addition to expression of a sorted cDNA library in one or more cellular readout assays, biochemical analysis of a sorted library may also be performed. A solid support having a plurality of concave loci may be used, as described above and depicted in FIG. 5. For biochemical analysis, each individual protein encoded by the sorted library is first expressed using the well-known techniques of in vitro transcription and translation. Since each individual protein expressed is compartmentalized at a discrete locus, each may be screened for any biochemical activity of interest (e.g. receptor-binding activity, ligand-binding activity, growth factor activity). See Section 5.6 below for a description of various biochemical readout assays.
Individual proteins may be subsequently immobilized on a solid support (e.g. nitrocellulose or nylon) for use in binding or other assays. Alternatively, individual proteins may be left free within U-shaped wells for subsequent assay of activity. For example, for detection of growth factor activity, in vitro translation products may be placed in contact with readout cells using mild centrifugation to transfer the contents of each U-shaped locus onto a readout cell grid.
5.2.5. GENE RETRIEVAL AND MONITORING
Multiple methods are available for retrieval and monitoring of library inserts using genetic bar codes. Following identification of a specific clone-of-interest in a readout assay, the unique genetic bar code situated in the vector next to the cDNA insert may be used to isolate the clone-of-interest. For example, localized releasing of DNA hybridized on a nucleic acid array may be carried out by competition using an oligonucleotide identical to the bar code of interest. Further, isolation of the clone-of-interest may be carried out by polymerase chain reaction (PCR) to amplify only insert cDNA linked to the identified genetic bar code. Under this approach, for example, the bar code sequence may be used as a specific primer together with a suitable vector primer and total library DNA as template.
The primer of the genetic bar code can also be used for isolating the specific plasmid by a procedure referred to as "gene trapping" (Gibco BRL). Briefly, "gene trapping" is a method for rapid isolation of cDNA clones from single stranded DNA prepared from a library. This method is based on isolating cDNA clones which hybridize with a biotinylated oligonucleotide complementary to a cDNA of interest (see Le et al., 1995, Focus 17, 45).
Still further, isolation of the clone-of-interest may be carried out by physical "picking" since the location of each unique bar code sequence within an array is generally known.
The genetic bar code technology not only enables performing cellular functional assays on all genes represented in a cDNA library simultaneously, but is also amenable to automation. Such automation allows many functional assays to be performed in a single format, as described further in Section 5.8 below.
5.2.6. GENETIC BAR CODE DESIGN
To facilitate uniform hybridization between a bar coded library and its complementary sequences of a nucleic acid array, it is preferred that all oligonucleotides (i.e. genetic bar codes) have the same or nearly the same melting temperature (Tm). It is also preferred that conditions be provided in which only the genetic bar code of a bar-coded vector is in single stranded form, while the remainder of the vector remains in double stranded form, so as to minimize interactions among vectors carrying different but possibly related cDNA inserts. A bar code in a double-stranded vector can be exposed in single-stranded form using an enzyme having 3' to 5' exonuclease activity, such as T4 DNA polymerase, and a bar code population having one nucleotide omitted from all bar code sequences. Such an exonuclease activity is capable of cleaving nucleotides from a 3' recessed end, and such enzymes will cease exonuclease activity under certain conditions.
For example, if C is omitted from the bar code sequence, then T4 polymerase may be used in the presence of G and the linearized double stranded vector to expose the bar code in single stranded form. As a further example, if A is omitted from the bar code sequence, T4 polymerase may be used in the presence of T and the linearized double stranded vector to expose the bar code in single stranded form. In other words, during a T4 exonuclease digestion, all nucleotide triphosphates (NTPs) are omitted from solution except the NTP complementary to the nucleotide omitted from the bar code. In this way, T4 DNA polymerase and a bar code population lacking one of the four nucleotides in its sequence may be used to make the bar code single-stranded and protruding from the end of a linearized, double-stranded vector. Alternatively, such an enzymatic exonuclease may be used when all four nucleotides are present in the bar code so long as the timing of the reaction is closely controlled to expose only the bar code in single stranded form.
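The stopping behavior described above can be sketched as a simple string model. This is an idealization, not a kinetic model: the strand being digested is written 5' to 3', bases are removed from its 3' end, and digestion halts at the first base that matches the single nucleotide supplied. The bar code, the four-nucleotide flank "ACTT", and the function names are hypothetical; the EcoR I site (GAATTC) following the bar code follows the site 2 example in the text.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def chew_back(strand, nucleotide_present):
    """Idealized 3'->5' exonuclease digestion of `strand` (written 5'->3').
    Bases are trimmed from the 3' end until the terminal base equals the
    one nucleotide supplied in solution, where net digestion stops."""
    end = len(strand)
    while end > 0 and strand[end - 1] != nucleotide_present:
        end -= 1
    return strand[:end]

def exposed_region(retained_strand, nucleotide_present):
    """Return the part of the retained strand left single stranded after the
    complementary strand has been chewed back from the bar code end."""
    remaining = chew_back(reverse_complement(retained_strand), nucleotide_present)
    return retained_strand[:len(retained_strand) - len(remaining)]

# Hypothetical bar code lacking C, followed by an EcoR I site and a short flank.
bar_code = "TGAGGATGA"
retained = bar_code + "GAATTC" + "ACTT"
exposed = exposed_region(retained, "G")

# A bar code that contains C stalls digestion inside the bar code.
bad_bar_code = "TGACGATGA"
exposed_bad = exposed_region(bad_bar_code + "GAATTC" + "ACTT", "G")
```

With C omitted from the bar code, its complement contains no G, so digestion in the presence of G runs through the whole bar code and stops only at the first G beyond it. When the bar code contains a C, digestion halts early and the bar code is only partly exposed.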
In one embodiment, a genetic bar code pool may be designed using a set of nucleotide dimers as building blocks for synthesizing the pool. For example, one nucleotide dimer set consists of TG, AG, GA and GT. It is notable that this set has a minimum of one nucleotide difference between any two given dimers. Such a set is referred to herein as a minimal mismatch set (MMS). The average pair-wise mismatch for the set of TG, AG, GA and GT is 1.7 nucleotides, or 83%. The average pair-wise mismatch is computed by adding all the possible pair-wise nucleotide differences and dividing by the total number of pairs. In this case, it is 10/6, or about 1.7 nucleotides, which is (10/6)/2, or about 83%, of the dimer length. While the omitted nucleotide in the MMS listed above is C, the omitted nucleotide may be any of the four nucleotides when designing such an MMS. Nucleotide dimers are chosen such that the Tm for each dimer remains constant. In this way, the Tm for each genetic bar code of a pool will be the same. Methods for computing Tm are well known to one skilled in the art. The omission of a nucleotide in design of a bar code pool may be used to allow formation of a protruding end encoding the bar codes, as described above. A pool of 20-mers generated through random synthesis using the above-listed set of nucleotide dimers will produce a pool having a diversity of 4^10, or 1.05 x 10^6, genetic bar codes. The minimum percentage of pair-wise mismatches within this pool of genetic bar code 20-mers is 1/20 or 5%, while, as noted above, the average pair-wise mismatch between any two genetic bar codes is 83%.
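The pair-wise mismatch arithmetic for the dimer set can be reproduced with a short sketch. The function names `hamming` and `mismatch_stats` are illustrative choices, not terms from the specification.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def mismatch_stats(blocks):
    """Return (minimum, mean) pair-wise mismatches over all pairs of blocks."""
    distances = [hamming(a, b) for a, b in combinations(blocks, 2)]
    return min(distances), sum(distances) / len(distances)

dimers = ["TG", "AG", "GA", "GT"]
min_mm, mean_mm = mismatch_stats(dimers)   # 6 pairs, 10 mismatches in total
mean_fraction = mean_mm / len(dimers[0])   # fraction of the dimer length
pool_diversity = len(dimers) ** 10         # ten dimers per 20-mer
```

The six pairs contribute 1, 2, 2, 2, 2 and 1 mismatches, giving the total of 10 used in the text.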
In another embodiment, genetic bar codes may be designed using a set of nucleotide trimers, each trimer having a minimum of two nucleotides different from any other trimer and each trimer having one G. Such an example set, which omits C, consists of AGT, TGA, TAG, ATG, GTA and GAT. The average pair-wise mismatch between any two genetic bar codes produced using this MMS is 2.4 nucleotides, or 80%. An oligonucleotide pool of genetic bar codes constructed randomly from eight rounds of synthesis using the above-listed six trimers will have a diversity of 1.68 x 10^6, with each bar code having a length of 24 nucleotides. The minimum percentage of pair-wise mismatch within this pool of genetic bar codes is 2/24 or 8.3%, while the average pair-wise mismatch between any two genetic bar codes in this set is 80%.
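As a rough sketch of random pool construction from the trimer set, the code below concatenates eight randomly chosen trimers per bar code. The random number generator stands in for the rounds of synthesis, and the seed, sample size, and function name are arbitrary assumptions made for illustration.

```python
import random

TRIMERS = ["AGT", "TGA", "TAG", "ATG", "GTA", "GAT"]

def random_bar_code(blocks, rounds, rng):
    """Concatenate `rounds` randomly chosen building blocks into one bar code."""
    return "".join(rng.choice(blocks) for _ in range(rounds))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample_pool = [random_bar_code(TRIMERS, 8, rng) for _ in range(1000)]

# Number of distinct bar codes obtainable from eight rounds with six trimers.
diversity = len(TRIMERS) ** 8
```

Because every trimer contains exactly one G, one A and one T, every 24-mer built this way has the same base composition, which is consistent with the preference that all bar codes share the same or nearly the same Tm.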
If 4-mer oligonucleotides are used as building blocks, each differing from every other 4-mer at a minimum of three positions and each having one G, then there are eight building blocks in the MMS, as follows: GATT, TGAT, TAGA, TTTG, GTAA, AGTA, ATGT and AAAG (see also U.S. Patent No. 5,635,400 by Brenner). Alternatively, one can choose another MMS from a total of 32 possibilities (i.e. 2^3 x C(4,1), where C designates the combinatorial, here the number of ways of choosing 1 position out of 4) of 4-mers having one G nucleotide and 3 other nucleotides (T, A, or mixture of T and A). For example, an MMS may consist of GATA, GTAT, AGAA, TGTT, AAGT, TTGA, ATTG and TAAG. A pool of genetic bar codes constructed using seven 4-mer subunits will produce a bar code length of 28 nucleotides and a bar code diversity of 8^7, or about 2.1 x 10^6. The minimum percentage of pair-wise mismatches within this genetic bar code pool is 3/28 or 10.7%.
In general, if an N-mer oligonucleotide building block has a number of G nucleotides equal to k, with the remaining nucleotides (i.e., N-k) consisting of A or T, or an A plus T mixture, then the bar code diversity is equal to 2^(N-k) x C(N,k), where C(N,k) is the number of ways of choosing k positions out of N. Further, there exist 2^(N-k) x C(N,k) MMS among these total possible constructs for a given minimal mismatch cut-off. In general, the number of sequences in different MMS is not the same. For example, in the case of a 5-mer with two G in each sequence and three nucleotides of A or T or an A and T mixture, there are 80 sets of MMS. Some of these MMS have 8 sequences and some have 12 sequences for a mismatch cut-off of 3. It is generally true also that the larger the N for a given minimal mismatch cut-off, the larger the number of sequences in any set of MMS. It is also true that for a given diversity number in a library of genetic bar codes constructed from N-mer nucleotide subunits (such as those listed above, e.g. the 8 tetramers of the MMS in the 4-mer example), the larger the N, the higher the minimal mismatch percentage in the library.
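The count 2^(N-k) x C(N,k) can be checked by direct enumeration. The sketch below is illustrative only; `candidate_count` and `enumerate_candidates` are hypothetical names, and the enumeration simply lists every N-mer with exactly k G nucleotides and A or T elsewhere.

```python
from itertools import combinations, product
from math import comb

def candidate_count(n, k):
    """Closed-form count: choose k positions for G, then A or T elsewhere."""
    return 2 ** (n - k) * comb(n, k)

def enumerate_candidates(n, k):
    """List every n-mer with exactly k G's and only A or T at other positions."""
    sequences = []
    for g_positions in combinations(range(n), k):
        for filler in product("AT", repeat=n - k):
            filler_iter = iter(filler)
            sequences.append("".join(
                "G" if i in g_positions else next(filler_iter)
                for i in range(n)))
    return sequences

four_mers = enumerate_candidates(4, 1)
five_mers = enumerate_candidates(5, 2)
```

For N = 4, k = 1 this gives 32; for N = 5, k = 2 it gives 80; and for N = 9, k = 4 it gives 4,032, matching the figures quoted in the text.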
Accordingly, another way of constructing a pool of genetic bar codes is to use a certain number (e.g. 100) of oligonucleotides as building blocks selected from all possible combinations of a fixed length (e.g. 9-mer) with a certain minimum number (e.g. four) of nucleotides different among them. The diversity of genetic bar codes will be precisely 1,000,000 if the bar codes are composed of three subunits of 9-mers. Further, the minimum percentage difference between any two bar codes within this pool is 4/27 or 14.8%. The average pair-wise sequence difference in this pool of 1,000,000 genetic bar codes is 65%, or 17.5 nucleotides. In a preferred embodiment, to ensure hybridization stability, the proportion of G nucleotides in an MMS nucleotide subunit ranges from 45% to 50%. The number of G nucleotides is the same in all bar codes. A pool of 36-mer genetic bar codes, each constructed from four 9-mers of this MMS, has a diversity of one hundred million. The minimal pair-wise sequence mismatch between any two 36-mers in this pool of genetic bar codes is 4/36 or 11.1%, while the average pair-wise sequence mismatch is 65%, or 23.4 nucleotides.
The following example lists one hundred 9-mers of a minimal mismatch set, each having four G nucleotides and five nucleotides selected from the group consisting of A, T, and an A plus T mixture, and further having a pair-wise mismatch of at least 4 nucleotides between any two 9-mers. Calculated using the formula set forth above, there exist 4,032 such minimal mismatch sets of 9-mers (i.e. 2^5 x C(9,4)). The typical number of sequences in each of these 4,032 sets ranges from 80 to 101. A pool of 27-mer genetic bar codes, each constructed from three 9-mers of this MMS, has a diversity of one million. The minimum pair-wise sequence mismatch between any two 27-mers in this pool of genetic bar codes is 14.8%, while the average pair-wise sequence mismatch is 65%. Likewise, a population of 36-mer genetic bar codes, each constructed from four 9-mers of this MMS, has a diversity of one hundred million. The minimal pair-wise sequence mismatch between any two 36-mers in this pool is 4/36 or 11.1%, while the average pair-wise sequence mismatch is 65%, or 23.4 nucleotides.
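The pool figures for the 9-mer scheme follow from simple arithmetic, sketched below. The function names are hypothetical. The minimum mismatch fraction assumes the worst case in which two bar codes differ in a single subunit only, and the three sample 9-mers are copied from the list in this section.

```python
def pool_properties(n_blocks, block_len, blocks_per_code, min_block_mismatch):
    """Length, diversity and worst-case mismatch fraction of a pool built by
    concatenating `blocks_per_code` subunits drawn from `n_blocks` choices."""
    length = block_len * blocks_per_code
    return {
        "length": length,
        "diversity": n_blocks ** blocks_per_code,
        # Worst case: two bar codes that differ in exactly one subunit.
        "min_mismatch_fraction": min_block_mismatch / length,
    }

def is_valid_subunit(seq, length=9, g_count=4):
    """Check the stated composition: fixed length, four G's, only G, A or T."""
    return (len(seq) == length
            and seq.count("G") == g_count
            and set(seq) <= set("GAT"))

props_27 = pool_properties(n_blocks=100, block_len=9,
                           blocks_per_code=3, min_block_mismatch=4)
props_36 = pool_properties(n_blocks=100, block_len=9,
                           blocks_per_code=4, min_block_mismatch=4)
sample_subunits = ["GGGTATGAA", "GGGGAAAAT", "GGGGTATTA"]
```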
The following one hundred 9-mers each has four G. This list sets forth one of the 4032 minimal mismatch sets available under this scheme.
GGGTATGAA GGGGAAAAT GGGGTATTA GGGAGTATA GGGAAGTTT
GGGTTTAGT GGGATTTAG GGAGGTTAA GGAGAGATA GGTGTGTAT
GGAGTTGTT GGTGATTTG GGAAGGAAT GGTTGGTTA GGTAGAGAA
GGATGAAGA GGTAGTTGT GGAAGATTG GGTTGTAAG GGAATGTGA
GGATAGTAG GGTATGATG GGAAAAGGT GGTTTATGG GAGGGTTTT
GAGGAGTAA GTGGTGATT GTGGATAGA GTGTGGAAA GAGTGAGAT
GAGAGATGA GAGATGGTA GAGTAGATG GTGTTAGGA GTGAAAGAG
GAAGGAGTA GTTGGTGAT GATGGAAGT GTAGGAAAG GATGAGGTT
GTAGTGGAA GTAGAGTGT GAAGTGTTG GATGTTGGA GAAGATGAG
GTTGTAGTG GTATGGGTT GATAGGTAG GTAAGTGGA GAATGTTGG
GAATAGGGA GTTATGGGT GTATTGAGG GTTTATGGG AGGGTGAAA
AGGGATTGT TGGGTTATG AGGTGGATT TGGAGGTAA AGGAGTGAT
TGGTGAGTA TGGAGAAGT AGGTGATAG TGGTTGGAT TGGTAGAGA
TGGATTGGA AGGATAGTG TGAGGGTTT AGTGGAGTT AGTGGTAGA
AGAGAGGAT TGTGTGGTA TGTGAGAAG AGAGTAGGA AGTGTTGAG
TGAGAAGTG AGAAGGGTA TGATGTGGT TGTAGTGTG AGTTAGGTG
AGAAAGAGG ATGGGGTAT TAGGGGATA ATGGGTGTA AAGGGTAAG
TTGGGATTG TAGGTGTGT TAGGAAGGA ATGGTAAGG TTGTGTGAG
ATGAGTTGG AAGAAGGGT AAGTTTGGG AATGGGGAA TTTGGGTGA
ATTGGGATG AATGAGTGG TTAGTTGGG TATTGGAGG AAAAGAGGG
If a restriction endonuclease or other mechanism is used to generate a single-stranded region in a bar-coded vector, then all four nucleotides may be used for design and synthesis of the genetic bar code. For example, restriction endonucleases such as Bbv I, Bbs I, Bsa I, BsmA I, BsmF I, BspM I, Fok I, Hga I and SfaN I may be used to generate an end having four or five protruding nucleotides.
5.2.7. PRODUCTION OF SINGLE-STRANDED GENETIC BAR CODES IN DOUBLE STRANDED LIBRARY VECTORS
An example restriction endonuclease for linearizing a bar-coded library and further exposing the genetic bar codes in single stranded form is Bgl II, which may be used at site 1 in FIG. 5. Bgl II generates a 3' recessed end lacking G. T4 DNA polymerase is then used in the presence of GTP (and the absence of other NTPs) to digest the complementary strand of the genetic bar code. If an enzyme such as EcoR I is used for site 2 (FIG. 1), T4 DNA polymerase will stop degradation at the EcoR I site when it encounters nucleotide G. T4 DNA polymerase is then inactivated by heat. The bar-coded cDNA library having protruding single-stranded bar codes may then be purified using standard phenol/chloroform extraction and ethanol precipitation. For the convenience of cloning, a Bgl II site plus two additional nucleotides at its 5' end (for effective digestion with Bgl II) is synthesized as a standard component of the vector preceding the first nucleotide of the genetic bar code. Likewise, an EcoR I site plus one additional nucleotide at its 3' end (for effective digestion with EcoR I) is synthesized as a standard component of the vector after the last nucleotide of the genetic bar code.
5.2.8. HYBRIDIZATION OF A BAR-CODED LIBRARY WITH A GENE CHIP, WITH BAR-CODED BEADS OR WITH A BIOLOGICAL ARRAY
Hybridization of a bar-coded cDNA library with a genetic bar code population (whether on gene chips, beads or biological arrays) is carried out for several hours to overnight at an optimal temperature, preferably 5 to 10 °C lower than the Tm or 2 to 5 °C below the Td of the genetic bar code population. The prehybridization buffer may be as follows: 6 x SSC (or 6 x SSPE), 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 8.0), 0.5% SDS, 100 μg/ml denatured, fragmented salmon sperm DNA, and 0.1% nonfat dried milk. Hybridization buffer may be 3.0 M TMA Cl or 2.4 M TEA Cl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5% SDS, 100 μg/ml denatured, fragmented salmon sperm DNA, and 0.1% nonfat dried milk. A hybridized bar-coded cDNA library may be washed with 6 x SSC solution and 2 x SSC solution as needed.
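The temperature windows stated above can be written as two small helper functions. The third function is an assumption that does not appear in the specification: it applies the common rule of thumb of 2 °C per A or T and 4 °C per G or C, which is a rough estimate intended for short oligonucleotides in standard salt buffers and is used here only to illustrate that subunits of identical base composition receive identical estimates. All function names are hypothetical.

```python
def hybridization_window_from_tm(tm_celsius):
    """Preferred range: 5 to 10 degrees C below the Tm (low, high)."""
    return (tm_celsius - 10.0, tm_celsius - 5.0)

def hybridization_window_from_td(td_celsius):
    """Preferred range: 2 to 5 degrees C below the Td (low, high)."""
    return (td_celsius - 5.0, td_celsius - 2.0)

def rule_of_thumb_estimate(seq):
    """Rough estimate (assumption, not from the specification):
    2 degrees C per A/T plus 4 degrees C per G/C, for short oligos only."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

# Two 9-mers from the minimal mismatch set share the same composition
# (four G, five A/T), so the rule-of-thumb estimate is identical.
estimate_a = rule_of_thumb_estimate("GGGTATGAA")
estimate_b = rule_of_thumb_estimate("GGGGAAAAT")
```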
5.2.9. OTHER APPLICATIONS OF cDNA OR GENOMIC LIBRARIES SORTED USING GENETIC BAR CODES
A sorted, single-stranded cDNA library or genomic library can be used to create a "library array" for studying differential gene expression, as described below.
To make a single-stranded library array, a bar-coded library having a protruding single-stranded genetic bar code region is hybridized with a gene chip. This is followed by incubation with a DNA ligase, such as T4 ligase, to covalently bond the complementary strand of cDNA to the gene chip. The sense strand of cDNA may be removed by, for example, denaturing (e.g. 100 °C incubation) under high stringency or through using an enzyme such as Exo III and the like (Hoheisel, 1993, Anal. Biochem. 209:238), so as to convert a double stranded library into a single stranded library. For preparation of a single stranded antisense library, the genetic bar code region is placed in front of a CMV promoter and on the same strand as the sense cDNA. For preparation of a single stranded sense library, the genetic bar code region is placed at the end of the cDNA and on the same strand as the antisense cDNA.
Among the many advantages of making such a sorted library array using the genetic bar code technology of the invention are the following: 1) such a library array can be constructed at a density as high as an oligonucleotide array on a gene chip; 2) only a single sample is needed instead of a million samples for individual spotting for the current DNA array technology (i.e. since the cDNA library can be sorted using the genetic bar code method, only one sample containing the whole cDNA library is prepared, instead of preparing 10^6 individual samples as would be otherwise required); and 3) many different cDNA library arrays can be easily prepared (i.e. library arrays can be made from a plurality of bar coded cDNA libraries obtained from various sources without changing the format of the gene chip).
5.2.10. BEADS CAN REPLACE CHIPS WHEN USING GENETIC BAR CODES TO SORT A LIBRARY
In another embodiment of the invention, beads may be used instead of a gene chip or biological array for sorting a cDNA library. In this embodiment, each individual bead carries multiple copies of one unique complementary bar code. Therefore, each individual bead will hybridize to multiple copies of a single recombinant, thereby sorting and concentrating individual members of the library to discrete loci. As spherical or spheroid supports which can migrate in solution, beads may provide enhanced hybridization efficiency compared to gene chips or biological arrays. Following hybridization, each bead represents a single, easily manipulable recombinant which may be assayed under a high-throughput pharmaceutical screening format (e.g. one bead per well) to determine biological functions of the encoded cDNAs. For example, one can place each unique bead, with a bar-coded vector DNA hybridized thereto, into a well of an assay plate or into a micro-compartment of a micro-compartmentalization device (e.g. a 96-well plate, a 384-well plate, or a plate having a larger number of wells or micro-compartments). Such a micro-compartmentalization device may be as described in FIG. 1 through FIG. 4 of copending U.S. Patent Application No. 09/065,776, filed April 24, 1998, entitled "MICRO-COMPARTMENTALIZATION DEVICE AND USES THEREOF," by Cen and Sun (Attorney Docket No. 9557-003), which is incorporated herein by reference in its entirety. Bead placement may be performed robotically. The presence of a low salt solution in each well permits dissociation of vector DNA from the beads. The resulting solution in each well, now containing DNA of a single recombinant, may be mixed with a chemical transfectant or electroporated into readout cells. A removable micro-compartmentalization device as described herein may be used during the transfection or electroporation procedure. If such a device is used, then the grid of the device may be removed from the readout cell culture following gene transfer so as to facilitate processing (i.e., rinsing, culturing, and assaying for biological function). Any recombinant producing a positive signal in a readout assay may be recovered, for example, by sampling the positive cell population and performing PCR using primers flanking the cDNA insert of the vector.
5.2.11. AN ALTERNATIVE METHOD FOR GENERATING A SINGLE STRANDED GENETIC BAR CODE REGION ON A DOUBLE STRANDED VECTOR
To selectively expose single stranded DNA encoding the genetic bar code region of a double stranded vector, a protein binding site may be installed in the vector. A DNA binding protein which recognizes the protein binding site in the vector can then be used to sterically hinder (i.e. block or prevent) the 3' to 5' exonuclease progression beyond the bar code region of the vector. Such DNA binding proteins may include prokaryotic proteins such as a lactose repressor which binds to a lactose operator sequence, a tetracycline (tet) repressor which binds to a tet operator sequence, etc., or eukaryotic proteins such as a eukaryotic transcription factor (e.g. E2F, AP1, SP1, p53, etc.). Any chosen protein binding site may be installed outside of the genetic bar code region defined by site 1 and site 2 using standard recombinant DNA techniques (see FIG. 1). When cDNA library plasmids are linearized at site 1, DNA binding proteins may be applied to occupy two sets of DNA binding sites. Subsequently, a 3' to 5' single strand exonuclease (e.g. exonuclease III, T4 DNA polymerase, Klenow fragment, T7 DNA polymerase, Vent DNA polymerase, Pfu DNA polymerase, etc.) may be used to remove the complementary strand of the genetic bar code and site 2 for the genetic bar code end, and to remove site 1 for the non genetic bar code end. Exonuclease activity is stopped at the DNA binding sites by steric hindrance from the bound DNA binding proteins. Then the exonuclease is heat inactivated and both DNA binding proteins and exonuclease are removed from the DNA by phenol/chloroform extraction. When exonuclease III is used as the 3' to 5' exonuclease, the protection of the non genetic bar code end from 3' to 5' exonuclease activity can also be achieved by generating a 3' overhang which is resistant to exonuclease III. This approach may be used as an alternative to the DNA-protein complex formation described above which sterically hinders (i.e., blocks or prevents) exonuclease progression along the DNA. A 3' overhang for this purpose can be obtained by installing a restriction enzyme site which produces such an overhang on the right side of site 1 (see FIG. 1). Suitable enzymes include Hae II, Kpn I, Nsi I, Pst I, Sac I, etc.
If a single stranded genetic bar code region is generated in the manner described above, all four nucleotides may be included in construction of the pool of unique genetic bar codes (as opposed to using only three nucleotides as described herein). For a given nucleotide building block length (e.g. 3-mers to 10-mers or more), the number of unique sequences in any minimal mismatch set (MMS) for a given mismatch cutoff will be larger. In other words, it is possible to achieve a higher percentage of minimal pair-wise mismatch when using all four nucleotides. Accordingly, one benefit to using all four nucleotides is to achieve a reduction of potential cross hybridizations among similar but non-identical bar codes. Of course, this benefit must be weighed against the benefits of using bar codes consisting of three nucleotides in a given situation.
5.2.12. ALTERNATIVES FOR DESIGNING GENETIC BAR CODES NOT USING AN OLIGONUCLEOTIDE BUILDING BLOCK APPROACH
Computer algorithms may be used to facilitate the design of unique genetic bar code sets having oligos of a given length, minimal cross hybridization, and the same or similar melting temperature (Tm) (see e.g. U.S. Patent Nos. 5,635,400 and 5,654,413 by Brenner).
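By way of illustration only, the selection of oligos having the same or similar Tm may be sketched as follows. This sketch is not the algorithm of the cited Brenner patents. It assumes the Wallace rule approximation for short oligos (2 °C per A or T, 4 °C per G or C), and the function names, target Tm and tolerance are hypothetical.

```python
def wallace_tm(oligo):
    """Approximate Tm in degrees C by the Wallace rule:
    2 per A/T plus 4 per G/C (a rough estimate for short oligos)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def filter_by_tm(oligos, target_tm, tolerance):
    """Keep only oligos whose estimated Tm lies within `tolerance`
    degrees of `target_tm`."""
    return [o for o in oligos if abs(wallace_tm(o) - target_tm) <= tolerance]


candidates = ["AAAA", "ACGT", "GGCC"]
print(filter_by_tm(candidates, target_tm=12, tolerance=0))
```

A Tm filter of this kind could be combined with a mismatch cutoff so that the retained bar codes hybridize under a single set of conditions.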
Non-natural nucleotides (i.e. modified nucleotides or nucleotide derivatives) may be used when generating a complementary genetic bar code array to enhance the binding affinity between genetic bar codes and their complements.
Still further, when using the three nucleotide strategy to generate a protruding single stranded genetic bar code region from a double stranded vector, any DNA polymerase having 3' to 5' exonuclease activity may be used. Such polymerases include Klenow, T7, Vent, Pfu, and T4 DNA polymerases.
5.3. OUTPUT CONSIDERATIONS
Methods are provided to systematically screen expressed genes of the human genome for specific functions using a large number of functional assays. Such technology provides a very rapid system for gene identification in which at least some functional information may be inferred. Readout assays may be cell-based or biochemical-based. In examples of cell-based readout assays, changes in cellular morphology, immunostaining, or reporter gene expression can be detected within 1-2 days after library gene expression. In examples of biochemical-based readout assays, changes in ligand binding, growth factor activity, or enzymatic activity can be detected within 1-2 days after library gene in vitro transcription and translation. Using either approach, full functional screening of a library
having a diversity of 10^6 can be completed within a few days.
In this way, the methods of the invention provide the advantages of minimizing workload and reducing the occurrence of gene mutations which can arise in screening assays employing long-term culture. It is estimated that one of ordinary skill in the art can easily screen one library per week by the methods of the invention without using automation. Of course, if automation is used, multiple libraries per week may be screened.
Cell based immunostaining assays have been extensively used under the single-gene approach for detection of gene expression, subcellular localization and biological functions.
Two examples of established mammalian cell-based immunostaining assays are as follows. First, cytochemical detection of intracellular LDL-derived cholesterol accumulation has been used to demonstrate that cholesterol accumulates in fibroblasts of individuals having Niemann-Pick C1 disease (see Carstea et al., Science 277, 228-231). Second, cellular localization of heat shock transcription factor (HSF) has been used to demonstrate that HSF nuclear localization changes from a uniform distribution to a punctate distribution when staining for activated HSF after c-myb expression in 293T cells (see Kanei-Ishii et al., Science 277, 246-248).
Similar to these two examples, functional assays of use together with the methods of the invention will measure changes (i.e. induction or reduction) of target gene expression, changes of cellular localization of a specific antigen, changes in cellular behaviors (e.g. growth factor secretion, apoptosis factor secretion, differentiation factor secretion), and changes in cellular morphology.
There are many assays of gene function in existence, each having a particular readout. For example, induction or reduction of target gene mRNA or protein expression can be detected by means standard in the art, including nucleic acid hybridization and antibody detection of specific antigens.
There are at least three categories of readout assays for use with the methods of the invention: (a) assays for genes associated with signaling pathways; (b) assays for genes associated with specific diseases; and (c) assays for genes associated with cellular physiological functions. Each of these categories is further described below.
5.3.1. PATHWAYS
The signals a cell receives, whether from outside or inside the cell, are generally transmitted through a cascade of molecular interactions, including protein-protein interactions. The overall process is generally termed signal transduction. The signaling pathways which may be assayed for identification of associated genes include, but are not limited to, a mitogenic signaling pathway, a STAT signaling pathway, an NFκB signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling pathway, a proliferation signaling pathway and an anti-proliferation signaling pathway.
For proliferation signaling, BRDU incorporation or PCNA induction can be the cellular event detected. For stress signaling, p53 induction, Jun induction, or nuclear HSF3 aggregates can be the detectable cellular event. For apoptosis signaling, detection by Apo Alert™ staining (Gavrieli et al, 1992, J. Cell. Biol. 119, 493) or annexin staining can be the detectable cellular event. For anti-proliferation signaling, detection of p21, p27, p57, p15, p16, p18, or p19 induction can be the detectable cellular event. For Wnt signaling, detection of β-catenin induction or β-catenin re-localization can be the detectable cellular event. For STAT signaling, detection of induction of a reporter gene under the control of a STAT1, 2, 3, 4, 5, 6, or 7 promoter can be the detectable cellular event. For AP-1 signaling, detection of c-fos induction can be the detectable cellular event. For CREB signaling, CREB phosphorylation, or induction of a reporter gene under the control of the CREB promoter, can be the detectable cellular event. For NFκB signaling, NFκB re-localization, or induction of a reporter gene under NFκB promoter control, can be the detectable cellular event. For NFAT signaling, IL-2 mediated proliferation can be the detectable cellular event. Other signaling pathways can include Hedgehog signaling (detectable by GLI-1, GLI-2, and GLI-3 induction); nuclear receptor signaling (detectable by induction or reduction of a reporter gene under estrogen, retinoic acid, vitamin D3 or thyroid hormone responsive promoters); antiviral signaling (detectable by induction of interferon alpha or beta); myc-max signaling (detectable by induction of a reporter gene under a myc-Max responsive promoter); BMP signaling (detectable by nuclear translocation of Smad); and insulin signaling (detectable by Glut1 or Glut4 re-localization).
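For planning a screen, the pathway-to-readout pairings listed above may be held in a simple lookup table. The following is a minimal sketch only; the table name, the function name and the selection of entries are illustrative assumptions, and the entries merely restate a subset of the pairings given in the preceding paragraph.

```python
# Illustrative lookup of detectable cellular events by signaling pathway.
# Entries restate pairings from the text; the structure itself is hypothetical.
READOUTS = {
    "proliferation": ["BRDU incorporation", "PCNA induction"],
    "stress": ["p53 induction", "Jun induction", "nuclear HSF3 aggregates"],
    "apoptosis": ["Apo Alert staining", "annexin staining"],
    "wnt": ["β-catenin induction", "β-catenin re-localization"],
    "ap-1": ["c-fos induction"],
    "creb": ["CREB phosphorylation", "CREB reporter gene induction"],
    "nfκb": ["NFκB re-localization", "NFκB reporter gene induction"],
    "nfat": ["IL-2 mediated proliferation"],
}


def readouts_for(pathway):
    """Return the listed readouts for a pathway name (case-insensitive),
    or an empty list if the pathway is not in the table."""
    return list(READOUTS.get(pathway.strip().lower(), []))


print(readouts_for("Wnt"))
```

A table of this kind can be extended by the user with any further pathway and readout pairing found suitable.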
5.3.2. DISEASES
Specific diseases of interest include, but are not limited to, cancer, inflammation, atherosclerosis, autoimmune diseases, diabetes, infection, diseases of metabolism (e.g. obesity), and neurodegenerative diseases (e.g. Alzheimer's disease and Parkinson's disease).
Readout assays involving detection of changes (i.e. increases or decreases) in the levels of the following targets may identify genes associated with the indicated specific disease.
Assays that may detect genes involved in cancer include assays for detection of:
HLA for immune surveillance; OSM for anti-cancer growth; GADD45 and GADD153 for tumor suppression; nm23 for anti-metastasis; VEGFA, VEGFB, VEGFC, PlGF, and FGF2 for angiogenesis; MDR for drug resistance; CASP100 for apoptosis; and PDGFA, PDGFB, FGF1, 3, 4, 5, 6, 7, 8, and 9, IGF I, IGF II, cyclin A, B1, C, D1, D2, D3, E, F, G1, and H, c-myc and c-Jun for growth.
Assays that may detect genes involved in inflammation include detection of Cox2, IL-1β, IL-6, TNFα, IL-13, E-selectin, VCAM1, ICAM 1 and 2, NFκB, c-Rel, RelB, IκBα, IκBβ, and Bcl3.
Changes in the level of expression of the following targets may be assayed immunologically in response to expression of a heterologous cDNA in order to detect genes which may be involved in a given disease. For example, the potential targets that can be used for detecting genes involved in atherosclerosis include Egr-1. The potential targets that can be used for detecting genes involved in autoimmunity include Fas and Fas ligand. The potential targets that can be used for detecting genes involved in diabetes include insulin. The potential targets that can be used for detecting genes involved in infection include chemokines (MIP-1α, MIP-1β, MIP-2, RANTES, MCP-1, MCP-2, GROα, GROβ, GROγ, ENA-78, I-309, and IP10) and various cytokines (e.g. IL-2, IL-13, GM-CSF, G-CSF, and M-CSF). The potential targets that can be used for detecting genes involved in obesity include leptin and the leptin receptor. The potential targets that can be used for detecting genes involved in Alzheimer's disease include Tau, CRF, CRF receptor, CRF-BP, Urocortin, and neuronal growth factors (e.g. BDNF, NT3, NT4, NT5, CNTF, and GDNF). The potential targets that can be used for detecting genes involved in Parkinson's disease include TH and α-synuclein.
5.3.3. FUNCTIONS
Where identification of genes associated with various physiological functions is desired, an assay may be employed which can detect changes in such functions as cell growth, apoptosis, senescence, differentiation, adhesion, binding of a cell to a specific molecule, binding of a cell to another cell, cellular organization, organogenesis, intracellular transport, transport facilitation, protein synthesis, transcription, energy conversion, metabolism, myogenesis, neurogenesis, or hematopoiesis. Examples of such cellular physiological functions and assays for detecting changes in them include, but are not limited to: cholesterol transport, detectable by detecting intracellular cholesterol accumulation; myogenesis, detectable by detecting induction of MyoD or MEF-2; neurogenesis, detectable by detecting induction of NeuroD; and vasodilation and neurotransmission, detectable by induction of inducible nitric oxide synthase (iNOS).
A number of specific exemplary assays which may be used to identify genes in conjunction with the methods of the invention are set forth in detail below.
5.4. CELLULAR READOUT ASSAYS
5.4.1. PROLIFERATION PATHWAY
Bromodeoxyuridine (BRDU) incorporation may be used as an assay to identify genes involved in proliferation. The BRDU assay identifies a cell population undergoing DNA synthesis by incorporation of BRDU into newly-synthesized DNA. Newly-synthesized DNA may then be detected using an anti-BRDU antibody (see Hoshino et al., 1986, Int. J. Cancer 38, 369; Campana et al., 1988, J. Immunol. Meth. 107, 79).
A proliferating cell nuclear antigen (PCNA) assay may also be used to identify genes involved in cell proliferation. PCNA (a.k.a. cyclin or the polymerase δ associated protein) is a 36 kilodalton protein whose expression is elevated in proliferating cells. PCNA is synthesized in early G1 and S phases of the cell cycle and therefore serves as an excellent marker for proliferating cells. Positive cells are identified by immunostaining using an anti-PCNA antibody (see Li et al, 1996, Current Biology 6, 189; Vassilev et al, 1995, J. Cell Sci. 108, 1205).
5.4.2. STRESS SIGNALING PATHWAY
p53 is an important modulator of the stress response. p53-dependent transcriptional activation may therefore be used to identify genes involved in a stress signaling pathway. A readout cell population containing a reporter gene under the control of a p53-inducible promoter may be used for the assay. Suitable reporter genes include, but are not limited to, β-galactosidase (β-gal), chloramphenicol acetyltransferase (CAT), and luciferase. Positive cells may be identified by blue color in a β-gal reporter gene assay (see e.g. Komarova et al, 1997, EMBO J. 16, 1391-1400) or by immunostaining for the reporter gene product. A p53 induction assay may also be used to identify genes involved in a stress signaling pathway. p53 induction (i.e. increases in cellular p53 protein expression) may be identified by immunostaining using a specific anti-p53 antibody (Anker et al, 1993, Int. J. Cancer 55, 982; Weiss et al, 1993, Int. J. Cancer 54, 693).
A heat shock transcription factor 3 (HSF3) aggregation assay may also be used to identify genes in a stress signaling pathway. The HSF3 aggregation assay measures HSF3 aggregation in the nucleus induced by cellular stress signals through immunostaining using a specific anti-HSF3 antibody (Kanei-Ishii et al, 1997, Science 277, 246).
An activated c-Jun kinase assay may be used to identify genes in a stress signaling pathway. c-Jun kinase (JNK) is a protein kinase which is activated by phosphorylation (p-JNK). Many stress signals result in activation of c-Jun kinase by phosphorylation (Derijard et al, 1994, Cell 76, 1025). The availability of p-JNK specific antibodies (Santa Cruz) allows in situ detection of cells in which JNK is activated by heterologous library genes.
5.4.3. LOSS OF INVASIVENESS
Invasion inhibition assays may be used to identify genes involved in cancer. One such assay measures induction of E-cadherin-mediated cell-cell adhesion. The induction of E-cadherin-mediated adhesion can result in phenotypic reversion and loss of invasiveness of epithelial cells. This assay measures increased expression of E-cadherin at the cell junction through immunostaining using a specific anti-E-cadherin antibody (Hordijk et al, 1997, Science 278, 1464). Another such assay measures loss of hepatocyte growth factor (HGF)-induced cell scattering. Loss of HGF-induced cell scattering is correlated with loss of invasiveness of epithelial cells such as Madin-Darby canine kidney (MDCK) cells. This assay identifies a cell population which has lost cell scattering activity in response to HGF and therefore forms compact colonies (Hordijk et al, 1997, Science 278, 1464).
5.4.4. APOPTOSIS SIGNALING PATHWAY
One assay for apoptosis is the terminal deoxynucleotidyl transferase-mediated dUTP nick-end-labeling (TUNEL) assay. The TUNEL assay is used to measure nuclear DNA fragmentation, the hallmark of apoptosis in many cell types (see e.g. Lazebnik et al, 1994, Nature 371, 346), by following the incorporation of fluorescein-dUTP (Yonehara et al, 1989, J. Exp. Med. 169, 1747). These assay kits are commercially available through suppliers such as Clontech and Boehringer Mannheim.
5.4.5. ANTI-PROLIFERATION PATHWAY
One assay useful for gene identification in an anti-proliferation signaling pathway is the p15 induction assay. p15 is a member of a family of specific inhibitors of Cdk4 and Cdk6. The latter are essential for G1 progression into S phase of the cell cycle (Sherr et al, 1995, Genes & Dev. 9, 1149). The expression of p15 is positively regulated by transforming growth factor-β (Reynisdottir et al, 1997, Genes & Dev. 11, 492). p15 induction may be identified by immunostaining using a specific anti-p15 antibody available commercially (e.g. Santa Cruz).
Another assay useful for gene identification in an anti-proliferation signaling pathway is the p21 induction assay. Increased levels of p21 expression in cells result in Cdk inhibition, thus resulting in delayed entry into G1 of the cell cycle (Harper et al, 1993, Cell 75, 805; Li et al, 1996, Current Biology 6, 189). For example, p21 expression can be elevated by p53 and transforming growth factor-β activities. p21 induction may be identified by immunostaining using a specific anti-p21 antibody available commercially (e.g. Santa Cruz).
Yet another assay useful for gene identification in an anti-proliferation signaling pathway is the p27 induction assay. Like p15 and p21 above, p27 is a member of the Cdk inhibitor family of proteins. The expression of p27 is increased upon mitogen withdrawal or contact inhibition (Polyak et al, 1994, Cell 78, 59). p27 induction may be identified by immunostaining using a specific anti-p27 antibody available commercially (e.g. Santa Cruz).
5.4.6. WNT SIGNALING PATHWAY
One assay for detection of genes which modulate the Wnt signaling pathway is a β-catenin induction and/or translocation assay. The activation of the Wnt signaling pathway results in an increased expression of β-catenin and the translocation of β-catenin from the cytoplasmic compartment to the nucleus (Kuhl et al, 1997, BioEssays 19, 101).
This assay is used to identify cells and/or cell populations in which the expression of β-catenin is increased compared to background levels, and/or in which a change of β-catenin localization occurs, in response to expression of a heterologous gene. Changes in β-catenin expression or localization are detected using a specific anti-β-catenin antibody (e.g. Tao et al, 1996, J. Cell Biol. 134, 1271).
Another assay for detection of genes which modulate the Wnt signaling pathway is a LEF-1 inducible promoter induction assay. β-catenin activates downstream targets in the Wnt signaling pathway by binding to a transcription factor known as LEF-1, thus resulting in activation of a LEF-1 inducible promoter (Korinek et al, 1997, Science 275, 1785). A readout cell line containing a reporter gene, such as β-gal, under a LEF-1 inducible promoter is used for the assay. When β-gal is used as a reporter gene, positive cells are the darker blue cells.
5.4.7. STAT SIGNALING PATHWAY
The STAT (signal transducers and activators of transcription) signaling pathway is activated by many growth factors and cytokines and plays essential roles in cell differentiation, cell cycle control, and development. There are six known members of the STAT transcription factor family. Each STAT family member (except STAT2) is known to recognize a specific DNA binding sequence (Ihle, 1996, Cell 84, 331). The assay employs a readout cell line containing a reporter gene, such as β-gal, under the control of any of these known STAT-inducible promoters (White et al, 1996, Cytokine Growth Factor Rev. 7, 303). Positive cells stain dark blue when β-gal is used as the reporter gene. This assay may be used to identify genes in a STAT1 signaling pathway, a STAT3 signaling pathway, a STAT4 signaling pathway, and/or a STAT5/STAT6 signaling pathway. Since STAT5 and STAT6 share the same DNA recognition site, the assay does not distinguish between these two STAT pathways. Readout cells expressing a gene which activates a particular STAT transcription factor will produce a positive signal. Accordingly, the genes identified reside just upstream in the particular STAT pathway assayed.
5.4.8. MAP KINASE SIGNALING PATHWAY
MAP kinase signaling pathway genes may be identified using a p-ERK assay. The activation of this signal transduction pathway by certain growth factors, hormones and neurotransmitters is mediated through two closely-related MAP kinases, p44 and p42, also known as ERK1 and ERK2. ERK proteins are activated by dual phosphorylation at specific tyrosine and threonine sites. The p-ERK assay is used to identify genes by immunostaining readout cells with an antibody which specifically detects the presence of phosphorylated ERK (p-ERK). Such p-ERK antibodies, which only recognize phosphorylated ERK1 and ERK2, may be obtained commercially (e.g. Santa Cruz). See Boulton et al, 1991, Cell 65, 663.
5.4.9. AP-1 SIGNALING PATHWAY
Genes in an AP-1 signaling pathway may be identified using a c-fos induction readout assay. The AP-1 signaling pathway is involved in cell proliferation, cell survival and cell stress. Activation of the AP-1 signaling pathway results in an increased expression of genes under the control of an AP-1 promoter sequence such as the c-fos gene (see e.g. Karin et al, 1997, Curr. Opin. Cell Biol. 9, 240). The c-fos induction assay identifies genes expressed in cell populations in which the level of endogenous c-fos protein is increased by immunostaining c-fos using a specific anti-c-fos antibody (Telford et al, 1996, J. Comp. Neurol. 375, 601).
5.4.10. CREB SIGNALING PATHWAY
In one embodiment, genes in a cyclic adenosine monophosphate response element binding protein (CREB) signaling pathway may be identified using a phosphorylated CREB (p-CREB) readout assay. CREB is activated by phosphorylation following an increase in the intracellular concentration of cAMP or Ca2+. An antibody which specifically recognizes phosphorylated CREB allows detection of an activated CREB pathway in readout cells (Ginty et al, 1994, Cell 77, 713).
In another embodiment, genes in a CREB signaling pathway may be identified using a cyclic adenosine monophosphate response element (CRE) reporter gene assay. In this assay, a readout cell containing a reporter gene (e.g. β-gal, CAT or luciferase) under the control of the CRE is used. Positive cells may be identified by, e.g., blue staining in a β-gal assay (Himmler et al, 1993, J. Recept. Res. 13, 79; Kruger et al, 1997, Naunyn Schmiedebergs Arch. Pharmacol. 356, 433) or by immunostaining for the reporter gene product.
5.4.11. NFκB SIGNALING PATHWAY
In one embodiment, an NFκB translocation assay may be used to identify genes in an NFκB signaling pathway. Activation of the NFκB signaling pathway results in translocation of NFκB from the cytoplasm to the nucleus. The NFκB translocation assay identifies cells with NFκB translocated to the nucleus by immunostaining for NFκB using a specific anti-NFκB antibody (Han et al, 1997, J. Biol. Chem. 272, 9825; Janssen et al, 1995, Adv. Cancer Res. 151, 389).
In another embodiment, an NFκB reporter gene assay may be used to identify genes in an NFκB signaling pathway. In this assay, a readout cell containing a reporter gene (e.g. β-gal, CAT or luciferase) under the control of an NFκB response element is used. Positive cells may be identified, e.g., by blue staining in a β-gal assay or by immunostaining for the reporter gene product (Rothe et al, 1995, Science 269, 1424).
5.4.12. NFAT SIGNALING PATHWAY
Genes in an NFAT signaling pathway may be identified using an NFAT reporter gene assay. In this assay, a readout cell expressing a reporter gene (e.g. β-gal, CAT or luciferase) under the control of an NFAT response element is used. Positive clones may be identified by blue staining in a β-gal assay (see e.g. Burres et al, 1995, J. Antibiot. 48, 380) or by immunostaining for the reporter gene product.
5.4.13. INSULIN SIGNALING PATHWAY
Genes in the insulin signaling pathway may be identified using a GLUT4 translocation assay. Insulin stimulation of adipocytes results in translocation of the GLUT4 glucose transporter to the plasma membrane. This assay identifies cells in which the insulin signaling pathway is activated by immunostaining GLUT4 protein localized at the plasma membrane (Martin et al, 1996, J. Biol. Chem. 271, 17605).
5.4.14. MDR SIGNALING PATHWAY
Genes in the multiple drug resistance (MDR) gene regulation pathway may be identified using an MDR reporter gene assay. MDR gene expression is often greatly increased in cancer cells resistant to chemotherapy. In this assay, a readout cell containing a reporter gene (e.g. β-gal, CAT or luciferase) under the control of an MDR gene promoter may be used. Positive cells may be identified by blue staining in a β-gal assay (Walther et al, 1997, Gene Ther. 4, 544) or by immunostaining using an antibody specific for the reporter gene product.
5.4.15. CHOLESTEROL TRANSPORT PATHWAY
Genes important in a cholesterol transport pathway may be identified using an intracellular cholesterol accumulation assay. For example, mutations of the Niemann-Pick type C (NP-C) gene result in lysosomal accumulation of low density lipoprotein (LDL)-derived cholesterol. The accumulated cholesterol in the cytoplasm is detected by staining with filipin, a specific cytochemical marker of unesterified cholesterol. The filipin staining assay may be used to identify cells with cholesterol accumulation due to the expression of an exogenous sense or anti-sense cDNA (see Carstea et al, 1997, Science 277, 228).
5.5. BIOCHEMICAL READOUT ASSAYS
In the practice of this invention, biochemical readout assays may be used to identify genes modifying specific activities following in vitro transcription and translation. Such biochemical readout assays include, but are not limited to, enzymatic and receptor-based assays. There are a wide variety of assays for enzymatic activities and receptor-binding activities which may be adapted for use in identification of new genes upon screening a library of interest, as further exemplified in this Section below.
There are many resources available describing such enzymatic and receptor-based assays suitable for use with the methods of the invention. For example, Methods in Enzymology is a multi-volume reference published by Academic Press which describes biological methods, including enzymatic and receptor-based assays, in detail. Further, Fernandez-Botran and Vetvicka, 1995, Methods in Cellular Immunology, CRC Press, describes assays for immune cell activation, including cytokine receptor assays.
Biochemical readout assays may include, e.g., detection of: GABA receptor activity, glutamate receptor activity, monoamine oxidase activity, nitric oxide synthetase activity, opiate receptor activity, serotonin receptor activity, adenosine A1 agonist and antagonist activity, adrenergic α1, α2, β1 agonist and antagonist activity, calcium channel blocker activity, inflammatory mediator activity, such as the interleukins (e.g. IL-1, IL-6), tumor necrosis factor activity, arachidonic acid activity and phosphatase activity (e.g. tyrosine phosphatase). Further, biochemical readout assays may include, for example, binding to a protein domain or subdomain, for example, a PDZ domain, a PH domain, an SH2 domain, and an SH3 domain. Still further, biochemical readout assays may include binding to a molecule, for example, phosphotyrosine and phosphorylated inositol. A functional assignment given to a particular gene may be derived from results obtained in more than one assay. Indeed, it is preferred that a functional assignment be derived from results obtained in a panel of two or more assays. Generally, one skilled in the art would know which assays are appropriate to employ to best identify genes having, or modifying, a particular function-of-interest.
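The preference for deriving a functional assignment from a panel of two or more assays may be sketched as a simple tally. The following is an illustrative assumption about bookkeeping only, not a procedure disclosed herein; the record layout (clone identifier, function, assay name), the function name and the example clone names are hypothetical.

```python
from collections import defaultdict


def assign_functions(hits, min_assays=2):
    """Assign a function to a clone only when at least `min_assays`
    distinct assays support it. `hits` is an iterable of
    (clone_id, function, assay_name) records."""
    support = defaultdict(set)
    for clone_id, function, assay_name in hits:
        # A set ignores repeat runs of the same assay.
        support[(clone_id, function)].add(assay_name)

    assignments = defaultdict(list)
    for (clone_id, function), assays in sorted(support.items()):
        if len(assays) >= min_assays:
            assignments[clone_id].append(function)
    return dict(assignments)


example_hits = [
    ("clone_1", "stress signaling", "p53 reporter"),
    ("clone_1", "stress signaling", "p-JNK immunostaining"),
    ("clone_1", "stress signaling", "p53 reporter"),  # repeat, counted once
    ("clone_2", "stress signaling", "p53 reporter"),  # only one assay
]
print(assign_functions(example_hits))
```

In this sketch a clone scoring positive in a single assay is left unassigned until a second, independent assay supports the same function.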
Further specific examples of assays based on enzymes or receptors include the following: acetylcholinesterase; aldol-reductase; angiotensin converting enzyme (ACE); cyclooxygenases; DNA repair; β-glucuronidase; lipoxygenases; monoamine oxidases; phospholipase A2; platelet activating factor (PAF); potassium channel assays; prostaglandin synthetase; serotonin re-uptake activity; and steroid receptors. Additional assays may include: ATPase inhibition, benzopyrene hydroxylase inhibition, HMG-CoA reductase inhibition, phosphodiesterase inhibition, protease inhibition, and tyrosine kinase inhibition.
5.6. USER-DEFINED ASSAYS
The methods of the invention are not limited to use with the readout assays described herein. Such readout assays merely serve to exemplify a few of the myriad possibilities suitable for use with the invention. When the readout assay is a cellular readout assay, virtually any cell line identified as suitable by one skilled in the art may be used. Further, virtually any reporter gene, or endogenous gene functioning as a reporter gene, identified as suitable by one skilled in the art may be used. It will be appreciated by one skilled in the art that the methods of the invention are suitable for use with any known readout assay, whether the assay be cellular or biochemical.
The skilled practitioner will recognize that it is the particular readout assay, whether chosen from the literature or designed by the user, which determines the type (i.e. function) of genes identified. For example, if one wishes to identify genes associated with cancer, one may choose to screen the library of interest using the p53 and/or MDR assays described above. Often, the user will provide the most appropriate readout assay to be employed for identification of particular genes-of-interest.
5.7. AUTOMATION
It is preferred that automation technology be applied throughout the entire functional gene identification process. Many steps in the overall process are amenable to such automation. For example, robotic colony picking may be used for building a library of 10^6 clones from plates containing well-isolated colonies. Robots suitable for this purpose are available commercially from, e.g., Qiagen, Genetix, etc. Similarly, transfection of retroviral vectors into producer cells, and in situ transfection of bar-coded, sorted libraries into readout cells, are repetitive operations suitable for robotic automation. Further, the system is suitable for automated immunostaining of the co-culture, and for automated microscopic viewing of the immunostained result. Only one population of bar codes is needed for all screenings and the same nucleic acid array can be used repeatedly. Automation can be applied to hybridization to an array such that the same hybridization conditions are used for various libraries. Automation can also be applied to in situ transfections and in situ bioassays.
6. EXAMPLES
The following examples are provided merely as illustrative of several embodiments and should not be construed to limit in any way the invention.
6.1. ASSAYS FOR CELLULAR PROLIFERATION
In the proliferation/anti-proliferation assays described in this example, the function of genes identified will depend on the type of library screened. For example, if a sense cDNA library is screened, genes associated with proliferation will be identified. By contrast, if an antisense cDNA library is screened, genes associated with anti-proliferation will be identified.
In one assay, PCNA immunostaining is performed using readout cells that are starved for at least 24 hours in low serum (e.g. 0.5%) medium prior to tetracycline induction. After 12-24 hours of tetracycline treatment, the cells are fixed in cold methanol (e.g. 5 minutes at -20°C) and air dried. Cells are blocked with 10% normal goat serum in phosphate buffered saline (PBS) for 30 minutes at 37°C. This is followed by incubation with a 1:10 dilution of anti-PCNA antibody (e.g. PC10 antibody, Dakopatts) for 2 hours at 37°C. Cells are rinsed with PBS (e.g. three times for 5 minutes each) and incubated with goat anti-mouse IgG antibody conjugated with fluorescein isothiocyanate (FITC; NBL, Nagoya) for 45 minutes at 37°C. After washing, cells are mounted in mounting medium (e.g. 3% hydroquinone, 50% glycerin, pH 8-9). Observation (i.e. readout visualization) is performed using a fluorescence microscope (Zeiss, Germany).
In another assay, BRDU incorporation is performed using readout cells similarly starved as above. 10 μM 5-bromo-2'-deoxyuridine (BRDU) may be added at the time of tetracycline induction, or a few hours later (e.g. 2-12 hours). After 12-24 hours of induction, the readout cells are fixed (e.g. absolute methanol for 10 minutes at 4°C) and air dried. Cells are next rehydrated (e.g. PBS for 3 minutes), DNA is denatured (e.g. 2 M HCl for 60 minutes at 37°C), the preparation is neutralized (e.g. 0.1 M borate buffer, pH 8.5, 2X for 5 minutes) and washed (e.g. PBS 3X over 10 minutes). The readout cell preparation is incubated with anti-BRDU antibody (e.g. 50 μg/ml for 60 minutes at room temperature; Boehringer Mannheim), washed (e.g. PBS 3X over 10 minutes), and counterstained with Harris-modified hematoxylin. The preparation is then dehydrated and mounted for observation (i.e. visualization of readout cells staining positive with anti-BRDU antibody).
6.2. ASSAYS FOR p53 REGULATION
An assay for p53 induction may be used to identify genes associated with p53 expression. One or more p53 inducer genes may be identified if a sense cDNA library is screened. On the other hand, one or more p53 inhibitor genes may be identified if an antisense library is used. The p53 assays may be conducted by measuring endogenous p53 levels or by measuring levels of a reporter gene operably linked to a p53 promoter, as described below.
For an assay for p53 induction in which endogenous p53 expression is activated, readout cells are treated with tetracycline for 12-24 hours, to induce library gene expression, prior to anti-p53 antibody staining for endogenous p53 expression. Cells are fixed with 1:1 acetone:methanol at -20 °C for 10 minutes and air dried. This is followed by blocking with 3% BSA in PBS for 30 minutes. Monoclonal antibody Pab 1801 (Novocastra, Newcastle, UK), which recognizes both wild-type and mutant p53, may be used at a dilution of 1:25 for a 1 hour incubation. After washing with PBS three times over 10 minutes, FITC conjugated anti-mouse IgG antibody (Cappel) may be used for detection.
For an assay of p53 induction using a reporter gene, a p53 promoter operably linked to a β-gal reporter gene may be used. Readout cells containing the β-gal gene under the direction of the p53 promoter can be obtained, e.g., from transgenic mice (see Komarova et al., 1997, EMBO J. 16, 1391-1400) or by establishing a stable cell line expressing a reporter gene under control of the p53 promoter. Such readout cells are induced with tetracycline, fixed with 1% glutaraldehyde in PBS, washed three times with PBS, and stained in 0.2% X-gal, 3.3 mM K4Fe(CN)6, 3.3 mM K3Fe(CN)6 and 1 mM MgCl2 for one or more hours. Positive cells are detected by the characteristic blue color which develops from the β-gal staining.
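By way of illustration only, the preparation of the X-gal staining solution described above may be sketched as a dilution calculation (C1 x V1 = C2 x V2). The final concentrations are those stated in the text; the stock concentrations, the 10 ml batch volume, the unspecified diluent, and the function and variable names are assumptions made for this sketch and are not part of the described method.

```python
# Final concentrations come from the text; every stock value is an assumption.
FINAL = {
    "X-gal (% w/v)": 0.2,
    "K4Fe(CN)6 (mM)": 3.3,
    "K3Fe(CN)6 (mM)": 3.3,
    "MgCl2 (mM)": 1.0,
}
ASSUMED_STOCK = {
    "X-gal (% w/v)": 4.0,       # hypothetical stock
    "K4Fe(CN)6 (mM)": 100.0,    # hypothetical stock
    "K3Fe(CN)6 (mM)": 100.0,    # hypothetical stock
    "MgCl2 (mM)": 1000.0,       # hypothetical stock
}


def stock_volumes_ml(final, stock, total_ml):
    """Return the volume (ml) of each stock, plus diluent, for total_ml of solution."""
    volumes = {}
    for name, c_final in final.items():
        c_stock = stock[name]
        if c_stock <= c_final:
            raise ValueError(f"stock for {name} must be more concentrated than the target")
        # C1 * V1 = C2 * V2, solved for the stock volume V1
        volumes[name] = c_final * total_ml / c_stock
    # Whatever volume is left is made up with diluent (buffer not specified here).
    volumes["diluent"] = total_ml - sum(volumes.values())
    return volumes


volumes = stock_volumes_ml(FINAL, ASSUMED_STOCK, total_ml=10.0)
for component, ml in volumes.items():
    print(f"{component}: {ml:.2f} ml")
```

With different stock concentrations only the `ASSUMED_STOCK` table needs to change; the final concentrations remain those given in the text.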
6.3. HSF INTRACELLULAR TRANSLOCATION ASSAY
A heat shock transcription factor (HSF) intracellular translocation assay may be used to identify genes which are associated with transport of HSF from the cytoplasm to the nucleus. One or more HSF transport inducer genes may be identified if a sense library is screened. Alternatively, one or more genes associated with inhibition of HSF transport may be identified if an antisense library is screened. Readout cells may be fixed with either absolute methanol or 4% paraformaldehyde. After blocking with 10% normal goat serum in PBS, cells are incubated with a 1:300 dilution of anti-HSF3 in 10% normal goat serum in PBS, followed by FITC-conjugated goat anti-rabbit IgG antibody (1:200 dilution) (Cappel). The preparation may be washed and mounted as described above.
6.4. CHOLESTEROL TRANSPORT ASSAY
A filipin staining assay may be used to identify genes associated with blocking cholesterol transport when screening a sense cDNA library or to identify genes associated with facilitating cholesterol transport when screening an antisense cDNA library. The assay is based on the principle that filipin can specifically stain unesterified cholesterol located inside of cells (Carstea et al., 1997, Science 277, 228). The presence of large amounts of unesterified cholesterol inside of cells indicates breakdown of the cholesterol transport pathway.
After tetracycline induction, readout cells are washed three times with Dulbecco's PBS and fixed with 10% phosphate-buffered (pH 7.4) formalin at room temperature for a minimum of 1 hour. Cells are rinsed three times with Dulbecco's PBS before they are stained in filipin (Sigma) for 60 minutes. The filipin staining solution is prepared by dissolving 2.5 mg of filipin in 1 ml of DMSO, which is then added to 50 ml of Dulbecco's PBS. The stained cells are washed three times with Dulbecco's PBS and mounted with glycerol/gelatin containing 1% phenol.
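By way of illustration only, the working concentration implied by the filipin staining recipe above (2.5 mg of filipin in 1 ml of DMSO, added to 50 ml of Dulbecco's PBS) may be computed as follows. The sketch assumes that the two volumes are simply additive; the function name and the returned keys are hypothetical.

```python
def working_concentration(mass_mg, solvent_ml, buffer_ml):
    """Return the final filipin concentration and DMSO fraction of the stain."""
    total_ml = solvent_ml + buffer_ml  # assumption: volumes are additive
    return {
        "filipin_ug_per_ml": mass_mg * 1000.0 / total_ml,
        "dmso_percent_v_v": 100.0 * solvent_ml / total_ml,
    }


# Values taken from the recipe in the text.
result = working_concentration(mass_mg=2.5, solvent_ml=1.0, buffer_ml=50.0)
print(f"filipin: {result['filipin_ug_per_ml']:.1f} ug/ml")   # about 49 ug/ml
print(f"DMSO:    {result['dmso_percent_v_v']:.2f} % (v/v)")  # about 2 %
```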
6.5. LIBRARY MUTATIONAL ANALYSIS OF A SINGLE GENE
This invention provides a high throughput method for structure-function analysis of a particular protein using random mutagenesis of a single gene of interest and the functional screening methods described herein. In this embodiment, a library containing randomly mutagenized recombinants from a gene is obtained using, e.g., PCR with two primers framing the DNA region to be mutagenized under conditions of reduced Taq polymerase fidelity (see e.g. Rice et al., 1992, Proc. Natl. Acad. Sci. U.S.A. 89:5467; Leung et al., 1989, Technique 1:11). The mutagenized library may also be a deletion library which can be obtained through inverse PCR using 5'-truncated primers (Pues et al., 1997, Nucl. Acids Res. 25:1303).
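By way of illustration only, the composition of a randomly mutagenized library may be estimated with a simple Poisson model, in which the number of mutations per clone has mean equal to the per-base error rate multiplied by the length of the mutagenized region. The model itself, the error rate and the region length used below are assumptions chosen for this sketch; none of them is specified in this application.

```python
import math


def mutation_distribution(error_rate_per_bp, region_bp, max_k=5):
    """Poisson probabilities of a clone carrying 0..max_k mutations.

    error_rate_per_bp is the cumulative error rate over the whole PCR
    (a hypothetical input), and region_bp is the length of the region
    framed by the two primers.
    """
    lam = error_rate_per_bp * region_bp  # expected mutations per clone
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(max_k + 1)]


# Hypothetical example: 0.002 errors per base over a 500 bp region (mean of 1).
probs = mutation_distribution(error_rate_per_bp=0.002, region_bp=500)
for k, p in enumerate(probs):
    print(f"{k} mutation(s): {p:.3f}")
```

Under these assumed values roughly a third of the clones would be unmutated and a third would carry a single mutation, which may guide how many clones are screened.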
6.6. CONSTRUCTION OF A BIOLOGICAL ARRAY VECTOR
In one embodiment, when a bar-coded cDNA library is in double stranded form (except for the genetic bar code region), there is no limitation on the type of vector which may be used to construct the biological arrays. However, when a bar-coded cDNA library is in single stranded form, only vectors which share no homology with the cDNA library vector should be used to preclude vector hybridization outside of the bar code region. For example, if the cDNA library plasmid is pBR322 and contains an ampicillin resistance gene, suitable vectors for construction of biological arrays include plasmid M13 or pACYC (available from New England BioLabs) with deletion of the ampicillin resistance gene. If the microorganism used to construct a biological array is yeast, a yeast vector such as YEp24, YIp5, etc. (available from New England BioLabs) may be used.
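By way of illustration only, each biological array vector carries a bar code complementary to a bar code of the cDNA library, and the claims recite populations of from 10² to 10⁸ complementary bar codes. The following sketch computes the shortest bar code length, over the four-letter DNA alphabet, that can provide a given number of distinct bar codes. This is a combinatorial lower bound only; the function name is hypothetical, and practical bar codes would likely need to be longer to ensure specific hybridization.

```python
def min_barcode_length(n_codes, alphabet_size=4):
    """Smallest length L such that alphabet_size**L >= n_codes."""
    if n_codes < 1:
        raise ValueError("n_codes must be at least 1")
    length = 0
    capacity = 1  # number of distinct sequences of the current length
    while capacity < n_codes:
        capacity *= alphabet_size
        length += 1
    return length


for n in (10**2, 10**8):
    print(f"{n} bar codes need at least {min_barcode_length(n)} nucleotides")
```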
6.7. MANUALLY-SORTED cDNAs AND OLIGONUCLEOTIDES FOR USE IN IN SITU TRANSFECTION AND CELLULAR READOUT ASSAYS
Described hereinabove is the use of nucleic acid arrays for sorting bar-coded cDNA libraries. In another embodiment, this invention provides a method for manually sorting cDNAs and oligonucleotides for screening using the in situ transfection procedures and cellular readout assays described herein. Such manual sorting has the advantage of not requiring a bar-coded vector. Manual sorting may be carried out by mechanical spotting of individual cDNAs onto a solid support (e.g. nitrocellulose or nylon). Such a manually-sorted cDNA population can be considered to be another form of a nucleic acid array. Such an array can also be used in the in situ transfection and cellular readout assays described herein, so long as the cDNA is cloned into an expression vector which is capable of expressing either sense or antisense cDNA (as desired by the user) when transfected into readout cells. In this way, a manually-sorted nucleic acid array can be used to analyze a collection of full-length genes-of-interest from any given source.
A manually-sorted single-stranded oligonucleotide array can also be constructed and used for the in situ transfection procedures and cellular readout assays described herein to identify a particular oligonucleotide which is most effective in manifesting a biological function-of-interest (e.g. antisense inhibition of oncogene expression). Such a manually-sorted oligonucleotide array may be obtained through mechanical spotting of individual oligonucleotides onto a solid support (e.g. nitrocellulose or nylon). Such an approach may be an effective way of identifying, from among a population of antisense oligonucleotides, an optimal antisense oligonucleotide which is effective in altering the expression of a particular target gene, such as the ras oncogene.
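By way of illustration only, the bookkeeping for a manually-sorted array may be sketched as a map from spot position to the cDNA or oligonucleotide spotted there, so that a positive signal observed at a given position in the cellular readout assay can be traced back to its source. The row-by-row layout, the identifiers and the function names below are hypothetical and are not prescribed by this application.

```python
from typing import Dict, Iterable, List, Tuple

Position = Tuple[int, int]  # (row, column) of a spot on the solid support


def build_spot_map(clone_ids: Iterable[str], n_cols: int) -> Dict[Position, str]:
    """Assign clones to grid positions row by row, n_cols spots per row."""
    if n_cols < 1:
        raise ValueError("n_cols must be at least 1")
    # divmod(i, n_cols) yields (row, column) for the i-th spotted clone
    return {divmod(i, n_cols): clone_id for i, clone_id in enumerate(clone_ids)}


def clones_at(spot_map: Dict[Position, str], positives: Iterable[Position]) -> List[str]:
    """Return the clones spotted at positions that scored positive."""
    return [spot_map[pos] for pos in positives if pos in spot_map]


# Hypothetical example: six clones spotted three per row.
spot_map = build_spot_map([f"cDNA-{n:03d}" for n in range(1, 7)], n_cols=3)
print(clones_at(spot_map, [(0, 2), (1, 0)]))
```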
The invention described and claimed herein is not to be limited in scope by the specific embodiments herein disclosed since these embodiments are intended as illustrations of several aspects of the invention. Any equivalent embodiments are intended to be within the scope of this invention. Indeed, various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims. Throughout this application various publications and patents are cited. Their contents are hereby incorporated by reference into the present application in their entireties.

Claims

We claim:
1. A method for conducting a biological readout assay used to screen a bar-coded cDNA library comprising:
(a) sorting the bar-coded cDNA library using a nucleic acid array;
(b) transfecting the library sorted in step (a) into a readout cell line in situ; and
(c) conducting the biological readout assay.
2. The method of Claim 1, wherein the nucleic acid array is a biological array or a gene chip.
3. The method of Claim 2, wherein the biological array comprises a population of vectors, each vector containing a different bar code complementary to a bar code of the cDNA library to form a population of complementary bar codes, wherein the population of vectors is immobilized on a support.
4. The method of Claim 3, wherein the population of complementary bar codes consists of from 10² to 10⁸ complementary bar codes.
5. The method of Claim 3, wherein the support is formed of nitrocellulose or nylon.
6. The method of Claim 1, wherein transfecting in situ is carried out using a chemical transfectant or electroporation.
7. The method of Claim 1, wherein the readout cell line is NIH 3T3 cells carrying a reporter gene under the control of a response element or promoter.
8. The method of Claim 7, wherein the reporter gene is selected from the group consisting of β-galactosidase, luciferase and chloramphenicol acetyltransferase.
9. The method of Claim 7, wherein the response element or promoter is selected from the group consisting of an NFκB response element, an NFAT response element, a cyclic adenosine monophosphate response element, a STAT-inducible promoter, a LEF-1-inducible promoter and a p53-inducible promoter.
10. The method of Claim 1, wherein expression of the bar-coded cDNA library is tetracycline inducible or estrogen inducible.
11. The method of Claim 1, wherein the biological readout assay is capable of detecting genes in a pathway selected from the group consisting of a mitogenic signaling pathway, a STAT signaling pathway, an NFκB signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling pathway, a proliferation signaling pathway and an anti-proliferation signaling pathway.
12. A method for conducting a biological readout assay used to screen a bar-coded cDNA library comprising:
(a) sorting the bar-coded cDNA library using a nucleic acid array having a plurality of concave loci;
(b) expressing the bar-coded cDNA library sorted in step (a) using in vitro transcription and translation to produce a population of proteins; and
(c) screening the population of proteins produced in step (b) for an activity-of-interest, so as to conduct the biological readout assay.
13. The method of Claim 12, wherein the activity-of-interest screened in step (c) is selected from the group consisting of a receptor-binding activity, a ligand-binding activity and a growth factor activity.
14. The method of Claim 12, wherein screening is carried out by immobilizing the population of proteins on a solid support and conducting a binding assay with the immobilized population of proteins.
15. The method of Claim 14, wherein the solid support is formed of nitrocellulose or nylon.
16. The method of Claim 12, wherein screening is carried out by placing the population of proteins in contact with readout cells and conducting a biological activity assay.
17. A method for identifying one or more genes-of-interest in a pre-sorted cDNA library comprising:
(a) transfecting the pre-sorted cDNA library into a population of readout cells; and
(b) screening the population of transfected readout cells in a biological readout assay, to identify one or more genes-of-interest.
18. The method of Claim 17, wherein the pre-sorted cDNA library comprises a bar-coded cDNA library hybridized to a nucleic acid array.
19. The method of Claim 17, wherein transfecting is carried out using chemical transfectants or electroporation.
20. The method of Claim 17, wherein the biological readout assay identifies one or more genes-of-interest in a pathway selected from the group consisting of a mitogenic signaling pathway, a STAT signaling pathway, an NFκB signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling pathway, a proliferation signaling pathway and an anti-proliferation signaling pathway.
21. A method of expression cloning one or more genes-of-interest in a cDNA library comprising:
(a) sorting the cDNA library;
(b) transfecting the sorted library into a readout cell line; and
(c) identifying a positive signal from the transfected library in a biological readout assay, so as to expression clone one or more genes-of-interest in the cDNA library.
22. The method of Claim 21, wherein sorting the cDNA library is carried out using a nucleic acid array.
23. The method of Claim 21, wherein transfecting the sorted library is carried out using chemical transfectants or electroporation.
24. The method of Claim 21, wherein the positive signal is identified by immunocytochemistry.
25. The method of Claim 21, wherein the biological readout assay identifies one or more genes-of-interest in a pathway selected from the group consisting of a mitogenic signaling pathway, a STAT signaling pathway, an NFκB signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling pathway, a proliferation signaling pathway and an anti-proliferation signaling pathway.
26. A method of sorting a cDNA library for use in an expression cloning assay comprising:
(a) cloning a population of cDNA inserts into a population of bar-coded vectors;
(b) preparing the population of bar-coded vectors for hybridization to a DNA array by exposing only the bar code region in single-stranded form; and
(c) hybridizing the population of bar-coded vectors to a nucleic acid array to sort the cDNA library.
27. The method of Claim 26, wherein the nucleic acid array is selected from the group consisting of a gene chip and a biological array.
28. The method of Claim 26, wherein preparing the population of bar-coded vectors for hybridization to a DNA array by exposing only the bar code region in single-stranded form in step (b) is carried out by a method comprising the following steps in the order stated:
(a) digesting the population with a restriction endonuclease to linearize the population;
(b) binding a DNA-binding protein to at least two sites on the population; and
(c) digesting the population bound in step (b) to expose the single-stranded bar code region.
29. The method of Claim 28, wherein the DNA-binding protein is selected from the group consisting of a lactose repressor protein, a tetracycline repressor protein, E2F, AP1, SP1 and p53.
30. The method of Claim 28, wherein the restriction endonuclease is selected from the group consisting of NotI, SfiI and EcoRI.
31. The method of Claim 28, wherein digesting the vector population in step (c) is carried out using an enzyme selected from the group consisting of exonuclease III, T4 DNA polymerase, Klenow fragment, T7 DNA polymerase, Vent DNA polymerase and Pfu DNA polymerase.
32. A method for conducting a biological readout assay used to screen a bar-coded cDNA library comprising:
(a) sorting the bar-coded cDNA library using a nucleic acid array having a plurality of concave loci;
(b) contacting the nucleic acid array to a readout cell line in the presence of a solution which facilitates release of the bar-coded cDNA library from the nucleic acid array to carry out in situ transfection; and
(c) conducting the biological readout assay.
33. The method of Claim 32, wherein the nucleic acid array is a biological array or a gene chip.
34. The method of Claim 33, wherein the biological array comprises a population of vectors, each vector containing a different bar code complementary to a bar code of the cDNA library to form a population of complementary bar codes, wherein the population of vectors is immobilized on a support.
35. The method of Claim 34, wherein the population of complementary bar codes consists of from 10² to 10⁸ complementary bar codes.
36. The method of Claim 34, wherein the support is formed of nitrocellulose or nylon.
37. The method of Claim 32, wherein transfecting in situ is carried out using a chemical transfectant or electroporation.
38. The method of Claim 32, wherein the readout cell line is NIH 3T3 cells carrying a reporter gene under the control of a response element or promoter.
39. The method of Claim 38, wherein the reporter gene is selected from the group consisting of β-galactosidase, luciferase and chloramphenicol acetyltransferase.
40. The method of Claim 38, wherein the response element or promoter is selected from the group consisting of an NFκB response element, an NFAT response element, a cyclic adenosine monophosphate response element, a STAT-inducible promoter, a LEF-1-inducible promoter and a p53-inducible promoter.
41. The method of Claim 32, wherein expression of the bar-coded cDNA library is tetracycline inducible or estrogen inducible.
42. The method of Claim 32, wherein the biological readout assay is capable of detecting genes in a pathway selected from the group consisting of a mitogenic signaling pathway, a STAT signaling pathway, an NFκB signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling pathway, a proliferation signaling pathway and an anti-proliferation signaling pathway.
PCT/US1999/008823 1998-04-24 1999-04-21 Function-based gene discovery WO1999055886A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU35727/99A AU3572799A (en) 1998-04-24 1999-04-21 Function-based gene discovery

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US6577598A 1998-04-24 1998-04-24
US09/065,775 1998-04-24

Publications (1)

Publication Number Publication Date
WO1999055886A1 true WO1999055886A1 (en) 1999-11-04

Family

ID=22065021

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/008823 WO1999055886A1 (en) 1998-04-24 1999-04-21 Function-based gene discovery

Country Status (2)

Country Link
AU (1) AU3572799A (en)
WO (1) WO1999055886A1 (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001020015A1 (en) * 1999-09-17 2001-03-22 Whitehead Institute For Biomedical Research Reverse transfection method
WO2001032858A1 (en) * 1999-11-05 2001-05-10 Novozymes A/S A high throughput screening (hts) method
DE19963536A1 (en) * 1999-12-20 2001-09-06 Epigenomics Ag Procedure for the analysis of nucleic acid sequences
WO2001098533A2 (en) * 2000-05-31 2001-12-27 Bernauer Hubert S Artificial marking using synthetic dna
WO2002014553A2 (en) * 2000-08-11 2002-02-21 Favrille, Inc. A molecular vector identification system
EP1239038A1 (en) * 2001-03-07 2002-09-11 Galapagos Genomics B.V. High-throughput identification of modulators of e2f activity
WO2002070744A2 (en) * 2001-03-07 2002-09-12 Galapagos Genomics B.V. Adenoviral library assay for e2f regulatory genes and methods and compositions for screening compounds
WO2002072783A2 (en) * 2001-03-12 2002-09-19 Irm, Llc Identification of cellular targets for biologically active molecules
EP1249499A1 (en) * 2001-04-10 2002-10-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and device for the determination and selection of molecule-molecule interactions
EP1313883A2 (en) * 2000-08-29 2003-05-28 YEDA RESEARCH AND DEVELOPMENT Co. LTD. Methods of isolating genes encoding proteins of specific function and of screening for pharmaceutically active agents
US6652878B2 (en) 2001-09-24 2003-11-25 Corning Incorporated Cell transfection apparatus and methods for making and using the cell transfection apparatus
EP1379642A2 (en) * 2001-03-22 2004-01-14 Whitehead Institute For Biomedical Research Arrayed transfection method and uses related thereto
WO2004101819A1 (en) * 2003-05-13 2004-11-25 Universität Potsdam Method for the identification of cell lineages
US6897067B2 (en) 2000-11-03 2005-05-24 Regents Of The University Of Michigan Surface transfection and expression procedure
US6902933B2 (en) 2000-11-03 2005-06-07 Regents Of The University Of Michigan Surface transfection and expression procedure
WO2005106037A1 (en) * 2004-04-19 2005-11-10 Pioneer Hi-Bred International, Inc. A method for identifying activators of gene transcription
US7056741B2 (en) 2000-11-03 2006-06-06 Regents Of The University Of Michigan Surface transfection and expression procedure
EP1997889A3 (en) * 2003-01-29 2009-09-23 454 Corporation Method for preparing single-stranded dna libraries
US20110021369A1 (en) * 2007-09-12 2011-01-27 Institut Pasteur Single cell based reporter assay to monitor gene expression patterns with high spatio-temporal resolution
US7993925B2 (en) 2005-05-31 2011-08-09 Cold Spring Harbor Laboratory Methods for producing microRNAs
US8137907B2 (en) 2005-01-03 2012-03-20 Cold Spring Harbor Laboratory Orthotopic and genetically tractable non-human animal model for liver cancer and the uses thereof
EP2436766A1 (en) * 2010-09-29 2012-04-04 Deutsches Krebsforschungszentrum Means and methods for improved protein interaction screening
US8637638B2 (en) 2000-08-11 2014-01-28 Mmrglobal, Inc. Method and composition for altering a B cell mediated pathology
US8945884B2 (en) 2000-12-11 2015-02-03 Life Technologies Corporation Methods and compositions for synthesis of nucleic acid molecules using multiplerecognition sites
US9534252B2 (en) 2003-12-01 2017-01-03 Life Technologies Corporation Nucleic acid molecules containing recombination sites and methods of using the same
EP3822365A1 (en) * 2015-05-11 2021-05-19 Illumina, Inc. Platform for discovery and analysis of therapeutic agents
US11767534B2 (en) * 2011-05-04 2023-09-26 The Broad Institute, Inc. Multiplexed genetic reporter assays and compositions

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4675285A (en) * 1984-09-19 1987-06-23 Genetics Institute, Inc. Method for identification and isolation of DNA encoding a desired protein
US5445934A (en) * 1989-06-07 1995-08-29 Affymax Technologies N.V. Array of oligonucleotides on a solid substrate
EP0534619A2 (en) * 1991-08-26 1993-03-31 Rhode Island Hospital Expression cloning method
US5604097A (en) * 1994-10-13 1997-02-18 Spectragen, Inc. Methods for sorting polynucleotides using oligonucleotide tags
US5654150A (en) * 1995-06-07 1997-08-05 President And Fellows Of Harvard College Method of expression cloning
EP0799897A1 (en) * 1996-04-04 1997-10-08 Affymetrix, Inc. (a California Corporation) Methods and compositions for selecting tag nucleic acids and probe arrays

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHOEMAKER D D, ET AL.: "QUANTITATIVE PHENOTYPIC ANALYSIS OF YEAST DELETION MUTANTS USING A HIGHLY PARALLEL MOLECULAR BAR-CODING STRATEGY", NATURE GENETICS., NATURE PUBLISHING GROUP, NEW YORK, US, vol. 14, 1 December 1996 (1996-12-01), NEW YORK, US, pages 450 - 456, XP002919517, ISSN: 1061-4036, DOI: 10.1038/ng1296-450 *

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6951757B2 (en) 1999-09-17 2005-10-04 Whitehead Institute For Biomedical Research Transfection method and uses related thereto
WO2001020015A1 (en) * 1999-09-17 2001-03-22 Whitehead Institute For Biomedical Research Reverse transfection method
US6544790B1 (en) 1999-09-17 2003-04-08 Whitehead Institute For Biomedical Research Reverse transfection method
WO2001032858A1 (en) * 1999-11-05 2001-05-10 Novozymes A/S A high throughput screening (hts) method
DE19963536C2 (en) * 1999-12-20 2003-04-10 Epigenomics Ag Procedure for the analysis of nucleic acid sequences
DE19963536A1 (en) * 1999-12-20 2001-09-06 Epigenomics Ag Procedure for the analysis of nucleic acid sequences
WO2001098533A2 (en) * 2000-05-31 2001-12-27 Bernauer Hubert S Artificial marking using synthetic dna
WO2001098533A3 (en) * 2000-05-31 2003-04-10 Hubert S Bernauer Artificial marking using synthetic dna
WO2002014553A2 (en) * 2000-08-11 2002-02-21 Favrille, Inc. A molecular vector identification system
US8637638B2 (en) 2000-08-11 2014-01-28 Mmrglobal, Inc. Method and composition for altering a B cell mediated pathology
WO2002014553A3 (en) * 2000-08-11 2003-02-27 Favrille Inc A molecular vector identification system
US9309520B2 (en) 2000-08-21 2016-04-12 Life Technologies Corporation Methods and compositions for synthesis of nucleic acid molecules using multiple recognition sites
EP1313883A4 (en) * 2000-08-29 2005-01-12 Yeda Res & Dev Methods of isolating genes encoding proteins of specific function and of screening for pharmaceutically active agents
EP1313883A2 (en) * 2000-08-29 2003-05-28 YEDA RESEARCH AND DEVELOPMENT Co. LTD. Methods of isolating genes encoding proteins of specific function and of screening for pharmaceutically active agents
US6897067B2 (en) 2000-11-03 2005-05-24 Regents Of The University Of Michigan Surface transfection and expression procedure
US7056741B2 (en) 2000-11-03 2006-06-06 Regents Of The University Of Michigan Surface transfection and expression procedure
US6902933B2 (en) 2000-11-03 2005-06-07 Regents Of The University Of Michigan Surface transfection and expression procedure
US8945884B2 (en) 2000-12-11 2015-02-03 Life Technologies Corporation Methods and compositions for synthesis of nucleic acid molecules using multiplerecognition sites
WO2002070744A3 (en) * 2001-03-07 2003-10-09 Galapagos Genomics B V Adenoviral library assay for e2f regulatory genes and methods and compositions for screening compounds
EP1239038A1 (en) * 2001-03-07 2002-09-11 Galapagos Genomics B.V. High-throughput identification of modulators of e2f activity
WO2002070744A2 (en) * 2001-03-07 2002-09-12 Galapagos Genomics B.V. Adenoviral library assay for e2f regulatory genes and methods and compositions for screening compounds
WO2002072783A3 (en) * 2001-03-12 2003-04-10 Irm Llc Identification of cellular targets for biologically active molecules
WO2002072783A2 (en) * 2001-03-12 2002-09-19 Irm, Llc Identification of cellular targets for biologically active molecules
EP1379642A4 (en) * 2001-03-22 2006-04-19 Whitehead Biomedical Inst Arrayed transfection method and uses related thereto
EP1379642A2 (en) * 2001-03-22 2004-01-14 Whitehead Institute For Biomedical Research Arrayed transfection method and uses related thereto
EP1249499A1 (en) * 2001-04-10 2002-10-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and device for the determination and selection of molecule-molecule interactions
WO2002083942A2 (en) * 2001-04-10 2002-10-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e. V. Method and device for determining and selecting molecule-molecule interactions
WO2002083942A3 (en) * 2001-04-10 2003-11-20 Fraunhofer Ges Forschung Method and device for determining and selecting molecule-molecule interactions
US6670129B2 (en) 2001-09-24 2003-12-30 Corning Incorporated Cell transfection apparatus and methods for making and using the cell transfection apparatus
US6652878B2 (en) 2001-09-24 2003-11-25 Corning Incorporated Cell transfection apparatus and methods for making and using the cell transfection apparatus
EP1997889A3 (en) * 2003-01-29 2009-09-23 454 Corporation Method for preparing single-stranded dna libraries
WO2004101819A1 (en) * 2003-05-13 2004-11-25 Universität Potsdam Method for the identification of cell lineages
US9534252B2 (en) 2003-12-01 2017-01-03 Life Technologies Corporation Nucleic acid molecules containing recombination sites and methods of using the same
WO2005106037A1 (en) * 2004-04-19 2005-11-10 Pioneer Hi-Bred International, Inc. A method for identifying activators of gene transcription
US8137907B2 (en) 2005-01-03 2012-03-20 Cold Spring Harbor Laboratory Orthotopic and genetically tractable non-human animal model for liver cancer and the uses thereof
US8426675B2 (en) 2005-05-31 2013-04-23 Cold Spring Harbor Laboratory Methods for producing microRNAs
US7993925B2 (en) 2005-05-31 2011-08-09 Cold Spring Harbor Laboratory Methods for producing microRNAs
US20110021369A1 (en) * 2007-09-12 2011-01-27 Institut Pasteur Single cell based reporter assay to monitor gene expression patterns with high spatio-temporal resolution
US9663833B2 (en) * 2007-09-12 2017-05-30 Institut Pasteur Single cell based reporter assay to monitor gene expression patterns with high spatio-temporal resolution
EP2436766A1 (en) * 2010-09-29 2012-04-04 Deutsches Krebsforschungszentrum Means and methods for improved protein interaction screening
WO2012041802A1 (en) * 2010-09-29 2012-04-05 Deutsches Krebsforschungszentrum Means and methods for improved protein interaction screening
US11767534B2 (en) * 2011-05-04 2023-09-26 The Broad Institute, Inc. Multiplexed genetic reporter assays and compositions
EP3822365A1 (en) * 2015-05-11 2021-05-19 Illumina, Inc. Platform for discovery and analysis of therapeutic agents
US11795581B2 (en) 2015-05-11 2023-10-24 Illumina, Inc. Platform for discovery and analysis of therapeutic agents

Also Published As

Publication number Publication date
AU3572799A (en) 1999-11-16

Similar Documents

Publication Publication Date Title
WO1999055886A1 (en) Function-based gene discovery
US11845979B2 (en) Spatial transcriptomics for antigen-receptors
McDonel et al. Approaches for understanding the mechanisms of long noncoding RNA regulation of gene expression
Gilbert Evaluating genome-scale approaches to eukaryotic DNA replication
Hayashi et al. Genome‐wide localization of pre‐RC sites and identification of replication origins in fission yeast
Johnson et al. REST regulates distinct transcriptional networks in embryonic and neural stem cells
US20020142345A1 (en) Methods for encoding and decoding complex mixtures in arrayed assays
EP1761639B1 (en) Analysis of methylated nucleic acid
US20070077571A1 (en) Methods and apparatus for identifying allosterically regulated ribozymes
Dieudonné et al. The effect of heterogeneous Transcription Start Sites (TSS) on the translatome: implications for the mammalian cellular phenotype
US20210301329A1 (en) Single Cell Genetic Analysis
WO1999055826A1 (en) Micro-compartmentalization device and uses thereof
Xue et al. Single cell sequencing: technique, application, and future development
US11946163B2 (en) Methods for measuring and improving CRISPR reagent function
Xia et al. CRISPR-based engineering of gene knockout cells by homology-directed insertion in polyploid Drosophila S2R+ cells
US7125664B2 (en) Method for identifying genes that are upstream regulators of other genes of interest
Boettcher et al. Decoding pooled RNAi screens by means of barcode tiling arrays
Gross et al. Characterization of CRISPR/Cas9 RANKL knockout mesenchymal stem cell clones based on single-cell printing technology and Emulsion Coupling assay as a low-cellularity workflow for single-cell cloning
Brossas et al. Clustering of strong replicators associated with active promoters is sufficient to establish an early‐replicating domain
Hankinson A genetic analysis of processes regulating cytochrome P4501A1 expression
Ishikawa et al. A highly sensitive trap vector system for isolating reporter cells and identification of responsive genes
Yan et al. Breaks labeling in situ and sequencing (BLISS)
Zhang et al. [27] Yeast three-hybrid system to detect and analyze RNA-protein interactions
US20080248958A1 (en) System for pulling out regulatory elements in vitro
Litwin et al. ISW1a modulates cohesin distribution in centromeric and pericentromeric regions

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase