US20100235105A1 - Method for analyzing dynamic detectable events at the single molecule level - Google Patents

Method for analyzing dynamic detectable events at the single molecule level

Info

Publication number
US20100235105A1
US20100235105A1 (application US12/686,372)
Authority
US
United States
Prior art keywords
data
donor
acceptor
event
traces
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/686,372
Inventor
Andrei Volkov
Costa M. Colbert
Ivan Pan
Anelia Kraltcheva
Mitsu Reddy
Nasanshargal Battulga
Michael A. Rea
Keun Woo Lee
Susan H. Hardin
Brent Mulder
Chris Hebel
Alok Bandekar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Life Technologies Corp
Original Assignee
Life Technologies Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/901,872 external-priority patent/US6681682B2/en
Priority claimed from US10/007,621 external-priority patent/US7211414B2/en
Application filed by Life Technologies Corp filed Critical Life Technologies Corp
Priority to US12/686,372 priority Critical patent/US20100235105A1/en
Publication of US20100235105A1 publication Critical patent/US20100235105A1/en
Assigned to VISIGEN BIOTECHNOLOGIES, INC. reassignment VISIGEN BIOTECHNOLOGIES, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LEE, KEUN WOO, BATTULGA, NASANSHARGAL, BANDEKAR, ALOK, HARDIN, SUSAN H., HEBEL, CHRIS, KRALTCHEVA, ANELIA, PAN, IVAN, REDDY, MITSU, VOLKOV, ANDREI, MULDER, BRENT, REA, MICHAEL A., COLBERT, COSTA
Assigned to Life Technologies Corporation reassignment Life Technologies Corporation ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: VISIGEN BIOTECHNOLOGIES, INC.
Abandoned legal-status Critical Current

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q 1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q 1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q 1/6813 Hybridisation assays
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N 21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N 21/62 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N 21/63 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N 21/64 Fluorescence; Phosphorescence
    • G01N 21/6408 Fluorescence; Phosphorescence with measurement of decay time, time resolved fluorescence
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N 21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N 21/62 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N 21/63 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N 21/64 Fluorescence; Phosphorescence
    • G01N 21/645 Specially adapted constructive features of fluorimeters
    • G01N 21/6456 Spatial resolved fluorescence measurements; Imaging
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N 21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N 21/62 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N 21/63 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N 21/64 Fluorescence; Phosphorescence
    • G01N 21/6428 Measuring fluorescence of fluorescent products of reactions or of fluorochrome labelled reactive substances, e.g. measuring quenching effects, using measuring "optrodes"
    • G01N 2021/6439 Measuring fluorescence of fluorescent products of reactions or of fluorochrome labelled reactive substances, e.g. measuring quenching effects, using measuring "optrodes" with indicators, stains, dyes, tags, labels, marks

Definitions

  • the present invention relates to a method for characterizing signals generated from molecular events at the single molecule level, such as donor-acceptor fluorescent resonance energy transfer events, of dynamic systems or static systems over a period of time, where the event data can be collected continuously, periodically, or intermittently and analyzed continuously, periodically or intermittently.
  • the data collection and analysis thus can be in real time or near real time, while analysis can also occur at any time post collection.
  • a dynamic system means that data is collected on the system in real time, over a period of time, as the system undergoes detectable changes in one or more detectable properties.
  • a static system means that data is collected for a given period of time and the system is unchanging during that period of time.
  • the present invention relates to a method for characterizing signals generated from detectable molecular events at single molecule level, where the method includes the steps of collecting and storing data from a viewing field associated with a detector, where the viewing field includes a plurality of molecules or molecular assemblies capable of being detected directly and undergoing a detectable event or a plurality of detectable events, where direct detection involves monitoring at least one detectable property associated with the molecule or molecular assembly and where the detectable events involve interactions associated with or occurring at the molecule or molecular assembly.
  • Data associated with the viewing field is collected into one data channel or a plurality of data channels, where each data channel corresponds to an attribute of the detected events, such as intensity, frequency or wavelength, duration, phase, attenuation, etc.
  • the method also includes the step of reading the stored data and spatially registering or calibrating the data channels so that a given location within the viewing field in one channel corresponds to the same location in the other channels—the data is registered relative to the viewing field.
  • candidate molecules or molecular assemblies are identified.
  • the candidate identification is generally designed to minimize locations within the viewing field that include more than a single directly detected molecule or molecular assembly to simplify data analysis.
  • an n×m array of data elements such as pixels is selected for each candidate so that the array includes all data elements having a detection value above a definable threshold, such as a definable intensity threshold value, originating from or associated with that candidate.
  • a plurality of “dark” data elements or pixels in an immediate neighborhood of the array associated with each candidate are selected to improve background removal.
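The background-removal step above can be illustrated with a short sketch. It assumes a square signal array (3×3 by default) centered on a candidate's brightest pixel and a surrounding ring of "dark" pixels (out to a 7×7 neighborhood) whose mean is used as the local background estimate; the structure, array sizes and function names are illustrative only, not the patent's actual implementation.

        #include <cstdlib>
        #include <vector>

        // Illustrative local background removal around one candidate spot.
        // The frame is stored row-major in `image` with the given width and height.
        // `inner` bounds the signal array (1 -> 3x3); `outer` bounds the dark ring (3 -> 7x7).
        struct Spot { int row, col; };

        double estimateLocalBackground(const std::vector<double>& image, int width, int height,
                                       const Spot& s, int inner = 1, int outer = 3)
        {
            double sum = 0.0;
            int count = 0;
            for (int dr = -outer; dr <= outer; ++dr) {
                for (int dc = -outer; dc <= outer; ++dc) {
                    if (std::abs(dr) <= inner && std::abs(dc) <= inner)
                        continue;                          // skip the signal array, keep only "dark" pixels
                    int r = s.row + dr, c = s.col + dc;
                    if (r < 0 || r >= height || c < 0 || c >= width)
                        continue;                          // stay inside the frame
                    sum += image[r * width + c];
                    ++count;
                }
            }
            return count > 0 ? sum / count : 0.0;
        }

        void subtractBackground(std::vector<double>& image, int width, int height,
                                const Spot& s, int inner = 1)
        {
            double bg = estimateLocalBackground(image, width, height, s, inner);
            for (int dr = -inner; dr <= inner; ++dr)
                for (int dc = -inner; dc <= inner; ++dc) {
                    int r = s.row + dr, c = s.col + dc;
                    if (r >= 0 && r < height && c >= 0 && c < width)
                        image[r * width + c] -= bg;        // background-corrected signal pixel
                }
        }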
  • a hybrid dataset for each candidate is constructed derived from data from two or more data channels.
  • the hybrid dataset is then smoothed and differentiated.
  • non-productive events are separated from productive events based on a set of criteria, where the criteria are dependent on the detectable property and events being detected.
  • the productive events are then placed in time sequence.
  • the method includes determining anti-correlated donor and acceptor fluorescent signals.
  • the criteria are designed to separate binding and mis-incorporation events from true incorporation events, and when placed in time order, evidence a sequence of monomers in a target sequence of monomers.
  • single molecule level means any individual system capable of undergoing detectable chemical or physical events that can be detected and analyzed independently.
  • systems of isolated atoms, molecules, ions, or assemblages of atoms, molecules and/or ions that have a detectable property that changes during a chemical or physical event capable of individual detection and analysis satisfy the definition.
  • Such systems include, without limitation, any isolated reactive system having a detectable property that undergoes a change before, during or after a chemical and/or physical event or reaction.
  • Exemplary examples of such systems include, again without limitation, DNA replication complexes, protein translation complexes, transcription complexes, any other isolated or isolatable biological system, quantum dots, catalysts, cellular sites, tissue sites, domains on chips (grooves, lines, channels, pads, etc.), or any other system having a detectable property that undergoes a change before, during and/or after a chemical and/or physical event.
  • although images including only isolated single reactive systems simplify analysis, images including overlapping or multiply occupied sites can be analyzed as well, but with greater difficulty.
  • detection at the single molecule level means that chemical events are being detected at the single molecule level.
  • anti-correlated means that changes in a value of a first detected response are opposite to changes in a value of a second detected response.
  • correlated means that changes in a value of a first detected response coincide (same direction) with changes in a value of a second detected response.
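For illustration, the two definitions can be tested numerically on a pair of sampled traces by looking at the sign of the product of their frame-to-frame changes; this is a minimal sketch (the function name and the simple majority rule are assumptions, not the detection algorithm described later in this document).

        #include <algorithm>
        #include <cstddef>
        #include <vector>

        // Returns +1 if the two responses are predominantly correlated (changes in the
        // same direction), -1 if predominantly anti-correlated (opposite directions),
        // and 0 if neither behaviour dominates.
        int correlationSign(const std::vector<double>& first, const std::vector<double>& second)
        {
            int score = 0;
            std::size_t n = std::min(first.size(), second.size());
            for (std::size_t i = 1; i < n; ++i) {
                double d1 = first[i] - first[i - 1];    // change in the first detected response
                double d2 = second[i] - second[i - 1];  // change in the second detected response
                if (d1 * d2 > 0) ++score;               // same direction
                else if (d1 * d2 < 0) --score;          // opposite direction
            }
            return (score > 0) - (score < 0);
        }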
  • data channel or data quadrant means data that has a particular attribute, such as data within a given frequency range of light derived from a given detector or imaging system.
  • a quadrant more specifically is terminology relating to a data channel of a particular type of imaging apparatus such as a charge coupled device (CCD) imaging apparatus.
  • sample means an actual sample, which is often disposed on a treated or untreated surface such as the surface of a cover slip.
  • viewing field or “viewing volume” means the actual portion of the sample that is being observed by the imaging or detecting system. Often this volume is considerably smaller than the actual sample and is dependent on the exact nature of the imaging or detection system being used.
  • frame means an image of the viewing field taken over a short period of time within the imaging or detecting system prior to being outputted to the processing system.
  • the size and time span of the frame depends on the memory, buffering, outputting speed and receiving speed of the imaging system and of the processing system.
  • stack or “stream” means a set of frames.
  • frames from a single slide are collected as a stack of frames or a stream of frames.
  • trace means data for a particular data element or pixel over all the frames in a stack or over a given number of frames in a stack.
  • related data means data from other data channels that are related to data from a selected data channel.
  • the data can be spatially related, temporally related, network related, etc. or related through a combination of these relationship types.
  • data calibration or registration means transforming data in one data channel so all locations within that data channel are matched to corresponding locations in other data channels.
  • assemblage means a collection of atoms, molecules and/or ions to form an isolated or isolatable system.
  • a DNA replication complex is an assemblage and a ribosome translation complex is an assemblage.
  • the collection can be of a single atomic or molecular type (atom clusters, molecular cluster, etc.) or a collection of mixtures of atoms, molecules, and/or ions.
  • Assemblages can also be constructed of assemblages.
  • the main criterion in the definition is that the assemblage be capable of being isolated or formed in an isolated manner so that detectable events occurring at each individual assemblage can be separately detected and analyzed.
  • spot means a location within a viewing field of an imaging apparatus that evidences fluorescent light from one or more atoms, molecules, ions or assemblages.
  • although the method has focused on fluorescent light, the method can be applied to any detectable property that corresponds to one or more atoms, molecules, ions or assemblages within a viewing field.
  • the present invention provides a method implemented on a computer for collecting data in real or near real time, at the single molecule level corresponding to detectable chemical and/or physical events and analyzing the collected data to identify the events and classify the events as to their intrinsic nature.
  • the method can be used to collect and analyze data from monomer additions, polymerase extension reactions, protein biosynthesis at ribosomal machinery (translation reactions), saccharide polymerization reactions, kinase phosphorylation reactions, or any other reaction that involves interactions between atoms, ions, molecules or assemblages having at least one detectable property that undergoes a change before, during or after the reaction being monitored.
  • the present invention also provides a method implemented on a computer including the step of collecting data representing values of an attribute or attributes of a detectable property or detectable properties of an atom, an ion, a molecule or an assemblage of atoms, ions and/or molecules or a plurality of atoms, ions, molecules or assemblages of atoms, ions and/or molecules within a viewing volume or field over a period of time.
  • the collected data includes data derived directly from the atom(s), molecule(s) and/or assemblage(s) and data derived from events evidencing interactions between the atom(s), ion(s), molecule(s) or assemblage(s) and other atomic, ionic, molecular, and/or assemblage species or between different parts of the ion(s), molecule(s) or assemblage(s). If the data is collected simultaneously in a plurality of data channels, then after data collection, the data in the data channels are calibrated or registered to align the data within the channels spatially and temporally.
  • After data registration, data in one data channel, often a primary data channel corresponding to the directly detected data, are scanned and an atom, ion, molecule or assemblage candidate, or candidates, within the viewing volume or field that meet a set of detection criteria are selected.
  • After candidate selection, the candidate data is smoothed, hybridized and differentiated.
  • data from other data channels are scanned and related data are selected from these other channels, where the related data is data that evidences changes in a detectable property or an attribute or attributes thereof spatially, temporally, or otherwise related to the candidate data.
  • the related data is data that evidences changes in a detectable property or an attribute or attributes thereof occurring within a neighborhood of each candidate.
  • Anti-correlation means that changes in the detectable property(ies) of the atom(s), ion(s), molecule(s) or assemblage(s) are accompanied by opposite changes in the detectable property(ies) of the other atomic, ionic, molecular or assemblage species, such as a reduction in a donor intensity and a corresponding increase in acceptor intensity.
  • the anti-correlated events are classified as relating to one of a set of event types, such as a productive event type, a non-productive event type, a binding event type, a pre-binding event type, a group release event type, a mis-incorporation event type, a complexing event, a transition event, etc.
  • the classification scheme includes a correct base incorporation event type, a mis-match or incorrect base incorporation event type, a binding event type, a pre-base incorporation event type, a proximity event type, a pyrophosphate release event, etc.
  • the present invention also provides a method implemented on a computer including the step of collecting data including a plurality of data channels representing fluorescent data from a plurality of fluorophores within a viewing volume or field. After data collection, the data within the data channels are calibrated or registered to align the data spatially and temporally, i.e., locations within the viewing field are matched between the channels. After data alignment, the data in a primary channel is scanned for the candidate fluorophores within the viewing volume that meet a set of candidate criteria. For example, if the system is a donor-acceptor system, then the primary channel is the donor channel. After candidate selection, the data associated with each candidate is smoothed, hybridized and differentiated.
  • related data from the other channels are selected, where the related data is data within a neighborhood of each donor candidate that undergoes a change over time.
  • the related data is smoothed, hybridized and differentiated.
  • the candidate and related data are then analyzed together to identify events.
  • the events are then classified. If the system is a donor-acceptor system, the related data is acceptor data and the donor data and the acceptor data are analyzed for anti-correlated events evidenced by anti-correlated intensity shifts. After identification of anti-correlated intensity events, the identified anti-correlated events are classified as relating to one of a set of event types, such as a productive binding event, a pre-binding event, a non-productive binding event, etc.
  • the classification scheme includes a correct base incorporation event, a mis-match or incorrect base incorporation event, a non-productive base binding event, a pre-base incorporation event, a proximity event, etc.
  • the present invention provides a system for characterizing events at the single molecule level, including a sample subsystem and optionally an irradiating subsystem for irradiating a sample in the sample subsystem.
  • the system also includes a detector subsystem for detecting and collecting data evidencing changes in a detectable property associated with an atom, ion, molecule or assemblage within the sample subsystem or within a region of the sample subsystem.
  • the system also includes a processing subsystem that stores and processes the data collected by the detector.
  • the processing subsystem uses methods of this invention to identify events and to classify the identified events. The classification is then related to aspects of the dynamic system being detected. For DNA, RNA or DNA/RNA hybrid sequencing, the classification permits identification of the base sequence of an unknown nucleic acid molecule.
  • although the system collects data in real time, the data processing can occur in real time or near real time, or the data can be processed later, or both.
  • the present invention also provides a system for characterizing donor-acceptor fluorescent resonance energy transfer events at the single molecule level, including a TIRF or similar sample assembly, a detector system for irradiating the sample assembly with an incident light having a wavelength range designed to excite the donor fluorophores within a sample viewing volume and detecting fluorescent light emitted by emitters within the volume, where the emitters are the donors, acceptors activated by a donor via fluorescent resonance energy transfer (FRET), and background or non-donor/acceptor emitters.
  • the system also includes a processing subsystem that stores and processes the data collected from the detector. The processing subsystem uses methods of this invention to produce a classification of detected fluorescent events. The classification is then related to aspects of the dynamic system being detected. For DNA, RNA or DNA/RNA hybrid sequencing, the classification permits identification of the base sequence of an unknown nucleic acid molecule. Although the system collects data in real time, the data processing can occur in real time, or it can be processed later, or both.
  • the present invention also provides a method for characterizing signals generated from molecular events at the single molecule level, dNTP or nucleotide incorporation fluorescent resonance energy transfer (dNTPFRET) events at the single molecule level, where the method includes the steps of collecting and storing pixelated data in a plurality of fluorescent data channels of a plurality of dNTPFRET events, reading the stored data, spatially registering or calibrating the data channels, identifying candidate single polymerase/primer/template complexes, selecting an n×n array of pixels including each identified candidate, selecting a plurality of “dark” pixels in the immediate neighborhood of the pixel array associated with each identified candidate for background removal, constructing a hybrid dataset for each candidate, smoothing the hybrid dataset, differentiating the hybrid dataset, determining anti-correlated donor and acceptor fluorescent events, separating true incorporation events from mis-incorporation and non-productive binding events, and identifying one or a plurality of incorporated dNTPs corresponding to sequencing information associated with an unknown nucleic acid sequence.
  • FIG. 1 depicts a graphical illustration of certain of the parameters that are used to define an event.
  • FIG. 2 depicts spot candidates displayed on an overlay picture of the viewing field, where the accepted candidates are shown as large dots, sometimes with gray boxes (green in a color image), and the very faint dots represent candidates rejected by staged filtering (in a color image, blue spots are candidates eliminated by the stage 1 filter and red dots are candidates rejected by the stage 2 and 3 filters).
  • FIG. 2 ′ is a black and white version of FIG. 2 , where ⁇ represents accepted spots, ⁇ represents spots rejected at stages 2 and 3 and represents spots rejected at stage 1 .
  • FIG. 3a depicts a case in which the intensity of the candidate pixel is below 3na; the candidate is rejected.
  • FIG. 3 a ′ is a black and white version of FIG. 3 a , where + represents the brightest pixel, ⁇ represents background pixels selected for computing c and na, dashed square represent the 7 ⁇ 7 pixel area around the spot, dotted line represents the 3na cutoff level, dashed line represents the brightest pixel intensity.
  • FIG. 3b depicts a case in which the intensity of the candidate pixel is equal to or above 3na; the candidate is accepted.
  • FIG. 3 b ′ is a black and white version of FIG. 3 b , where + represents the brightest pixel, ⁇ represents background pixels selected for computing c and na, dashed square represent the 7 ⁇ 7 pixel area around the spot, dotted line represents the 3na cutoff level, dashed line represents the brightest pixel intensity.
  • FIG. 4 depicts a “poor” spot candidate passed through stage 1 filter.
  • FIG. 4 ′ is a black and white version of FIG. 4 , where + represents the brightest pixel, ⁇ represents background pixels selected for computing c and na, dashed square represent the 7 ⁇ 7 pixel area around the spot, dotted line represents the 3na cutoff level, dashed line represents the brightest pixel intensity.
  • FIG. 5 depicts stage 2 filter.
  • FIG. 5′ is a black and white version of FIG. 5, where the dotted line represents the doubt*avgna value and the dash-dotted line represents the minc*avgna value.
  • FIG. 6 a depicts graphically the spot candidate filtering process of the stage 1 filter.
  • FIG. 6 a ′ is black and white version of FIG. 6 a , where + represents the brightest pixel, ⁇ represents background pixels selected for computing c and na, dashed square represent the 7 ⁇ 7 pixel area around the spot, dotted line represents the 3na cutoff level, dashed line represents the brightest pixel intensity.
  • FIG. 6 b depicts graphically the spot candidate filtering process of the stage 2 filter.
  • FIG. 6b′ is a black and white version of FIG. 6b, where the dotted line represents the doubt*avgna value and the dash-dotted line represents the minc*avgna value.
  • FIG. 6 c depicts graphically the spot candidate filtering process of the stage 3 filter.
  • FIG. 6c′ is a black and white version of FIG. 6c, where the dash-dotted line represents the minc2*avgna value.
  • FIG. 7a depicts pixel values (9×9 neighborhood) after voting over the average donor image.
  • FIG. 7 b depicts selection of single spots in an average donor image after voting.
  • FIG. 7 c depicts snapshot of grouped spots after voting and selection of the donor pixel.
  • FIG. 8 depicts histogram of an average intensity stack image.
  • FIG. 9 depicts donors detected using dynamic threshold and consolidated donors.
  • FIG. 10a depicts how the noise pixel traces are averaged into a single averaged noise trace (top graph), then its polynomial approximation is computed using a least squares algorithm, and finally the value of the polynomial is subtracted from every individual pixel trace.
  • FIG. 10b depicts how the value of the approximating polynomial is subtracted from donor signal pixels (upper graph); the result is shown in the lower graph, where the horizontal line now represents the zero level (mean of the background noise intensity distribution).
  • FIG. 10c depicts how the noise pixel traces from an acceptor channel are averaged into a single averaged noise trace (top graph), then its polynomial approximation is subtracted from every individual acceptor pixel trace.
  • FIG. 10d depicts how the value of the approximating polynomial is subtracted from acceptor signal pixels (upper graph); the result is shown in the lower graph, where the horizontal line now represents the zero level (mean of the background noise intensity distribution).
  • FIGS. 10a′-d′ are black and white versions of FIGS. 10a-d, where the left panel represents the donor data, the middle panel represents the acceptor 1 data, the right panel represents the acceptor 2 data, represent signal pixels, represents noise pixels, the solid square represents the 3×3 pixel area for donor signal pixels and the dashed square represents the 7×7 pixel area for donor noise pixels.
  • FIGS. 11a-d depict donor pixel selection.
  • FIGS. 11a′-d′ are black and white versions of FIGS. 11a-d, where the top trace in each graph represents the original (non-background-subtracted) signal and the bottom trace in each graph represents the signal after background subtraction: 11a presents the donor noise signals, 11b presents the donor signal, 11c presents the acceptor noise signals, and 11d presents the acceptor signals.
  • FIG. 12 depicts the intensity-based donor pixel selection algorithm.
  • FIG. 12′ is a black and white version of FIG. 12 showing donor pixel selection, where the top panel represents the hybrid trace, the bar right below represents the donor lifetime, the remaining 9 panels represent individual donor pixel traces, and the grayed ones represent pixels rejected by the pixel selection process.
  • the symbol represents accepted pixels, the symbol represents rejected pixels, and the + symbol represent noise pixels.
  • FIG. 13 a depicts the intensity-based acceptor pixel selection algorithm.
  • FIG. 13 b depicts the derivative-based acceptor pixel selection algorithm.
  • FIGS. 13 a ′-b′ is black and white version of FIGS. 13 a - b showing acceptor pixel selection, where 13 a ′ represents intensity based selection and 13 b ′ represents DAC-based selection. From top to bottom: Donor (with donor lifetime bar), Acceptor hybrid (with lifetime), 9 individual pixel traces, the grayed ones rejected by the selection process. In the overlayed image, the ⁇ symbols represent accepted pixels, the + symbols represent rejected pixels and the ⁇ symbols represent noise pixels
  • FIG. 14 depicts graphically the results of the donor and acceptor pixel selection process showing donor—acceptor 1 —acceptor 2 overlays after pixel selection, where the ⁇ symbols represent accepted pixels, the + represent rejected pixels, the ⁇ symbols represent noise pixels.
  • FIG. 15 depicts the donor model, representing initial segments.
  • FIG. 15′ is a black and white version of FIG. 15 showing donor model initial stage selection, where the top panel represents the donor; the darker curve is the smoothed donor signal, the lighter one represents the original, and the grayed area represents the donor noise level. In the middle panel, donor derivatives are shown; the grayed area is their standard deviation. The bottom bar represents the donor derivative “lifetime” used to set segment boundaries (vertical lines).
  • FIGS. 16a-c depict the donor model, representing segment optimization.
  • FIGS. 16a′-c′ are black and white versions of FIGS. 16a-c showing the donor model optimization.
  • FIG. 17 depicts the donor model, representing final stage optimization.
  • FIG. 17′ is a black and white version of FIG. 17 showing the donor model final stage.
  • the segmented curve represents suggested ‘donor high’ level, gray area around it represents ‘noise level in donor high state’.
  • Bottom represents donor lifetime computed based on the donor model.
  • FIG. 18 depicts a numeric experiment using a 17-point Savitzky-Golay smoothing filter.
  • FIG. 18′ is a black and white version of FIG. 18, where the dark circle represents the middle sample, the dark squares represent samples being used together with the middle one to compute the polynomial (curve), the light squares represent samples not in use, and represents the value of the polynomial at the middle data sample location (smoothed value).
  • FIG. 19 depicts simulated data, simulated data after addition of noise, and the noisy data after smoothing, to show the veracity of the smoother.
  • FIG. 19′ is a black and white version of FIG. 19, where the top panel represents a simulated signal (numbers showing duration in data samples), the middle panel represents the simulated signal with added Gaussian noise, and the bottom panel represents the smoothed signal.
  • FIG. 20 a depicts derivative anti-correlation for simulated non-noisy data.
  • FIG. 20 b depicts derivative anti-correlation of simulated moderately noisy data.
  • FIG. 20c depicts derivative anti-correlation of simulated heavily noisy data.
  • FIGS. 20a′-c′ are black and white versions of FIGS. 20a-c, where the top panel represents the donor signal, the middle panel represents the acceptor signal, and the bottom panel represents the DAC function (20a′: no noise; 20b′: low noise (high S/N); 20c′: high noise (low S/N)).
  • FIG. 21 depicts a smart smoothing process.
  • a system including a method implemented on a computer can be constructed that is capable of collecting data corresponding to changes in a detectable property of one or more atoms, molecules, ions or assemblages within a viewing volume or field of an imaging apparatus such as a charge coupled device.
  • the method processes the single molecule level image data to identify and classify chemical events occurring at the atoms, molecules, ions or assemblages within a viewing volume.
  • the system and method are ideally suited for collecting and analyzing DNA extension data derived from single molecule fluorescent events, especially single molecule fluorescent resonance energy transfer events between a donor associated with a replication complex and acceptors on incorporating nucleotides.
  • the system and method are capable of being applied to any single molecule level data corresponding to events occurring at atomic, molecular, ionic or assemblage sites.
  • the inventors have found that the system and method are also well suited for detection formats with limited viewing fields, such as TIRF-limited viewing fields, wave-guide-limited viewing fields, channel-limited viewing fields, or any other method of restricting the volume or field being detected by the detector or imaging apparatus.
  • the methods of this invention are well suited for detecting fluorescent resonance energy transfer (FRET) fluorescent events between a donor and an acceptor or plurality of acceptors, especially FRET fluorescent events associated with nucleic acid sequencing complexes including a donor labeled polymerase and an acceptor labeled nucleotide.
  • the inventors have applied the system and method to the identification and analysis of spots (fluorescent light) derived from individual DNA replicating complexes within a viewing field of an imaging apparatus.
  • the method and associated software is designed to:
  • the present invention broadly relates to a system for collecting and analyzing chemical and/or physical event data occurring at one or a plurality of locations within a viewing volume or field of an imaging apparatus.
  • the system includes a sample subsystem for containing a sample to be detected and analyzed, where the sample includes one atom, molecule, ion and/or assemblage or a plurality of atoms, molecules, ions and/or assemblages, at least one having a detectable property that undergoes a change before, during or after one or a sequence of chemical and/or physical events involving the atom, molecule, ion or assemblage.
  • the system also includes a detection apparatus having a viewing field that permits the detection of changes in the detectable property of one atom, molecule, ion and/or assemblage or a plurality of atoms, molecules, ions and/or assemblages within the viewing field.
  • the system also includes a data processing subsystem connected to the imaging apparatus for collecting, storing and analyzing data corresponding to the chemical and/or physical events occurring at definable locations in the viewing field involving one or more atoms, molecules, ions and/or assemblages within the viewing field of the imaging subsystem.
  • the data processing subsystem converts the data into classifications of events according to the event type determined by a set of parameters defining or characterizing each event type.
  • the method broadly includes the step of receiving data from the detection apparatus comprising one or a plurality of data channels.
  • the data channels can represent data associated with different parts of the viewing field or can represent data from the same viewing field, but separated by attributes such as frequency, intensity, phase, attenuation, flux density, any other detectable property, and mixtures thereof.
  • the data from each channel is stored. After, and simultaneously with, storage, the data in each data channel is registered or calibrated. This process matches locations in one data channel to corresponding locations in the other channels. Often, the data in different channels does not directly line up, i.e., a location in the data in one data channel is not coincident with its corresponding location in another data channel.
  • This distortion may occur over the entire image, in portions of the image, or may vary across the image.
  • the registration process makes sure that all locations are registered between the channels—each location in one channel directly corresponds to the same location in all the other channels. If one data channel is a primary channel, then the primary channel data is analyzed to identify localized areas or regions—spots—within the viewing field that evidence a given value of the detected property. For example, if the primary channel represents immobilized or confined components of a reaction system such as a DNA replication complex, then the data in the primary channel is analyzed to locate the confined or immobilized components within the viewing field.
  • data in the other channels is analyzed to determine if data in the other channels can be related to the spots in the primary data. If a spot is associated with a reactive species, then the other channels should include data evidencing reactions involving the identified reactive species. Otherwise, each data channel is analyzed for such localized areas or regions—spots, and locations are identified in which data in some or all of the channels evidence reactions—changes in detectable properties over time at each spot.
  • the event data is classified into a set of event types. After classification, a time profile of events occurring at each active site is determined. The time profile of events is then output to the user. This time profile can evidence a single event or a sequence of events. For sequences of events, the sequence can correspond to a sequence of monomer additions, a sequence of catalytic reactions, a sequence of structural changes, a sequence of monomer removals, etc.
  • the present invention broadly relates to a method for analyzing fluorescent resonance energy transfer (FRET) events corresponding to interactions between a donor fluorophore associated with a first molecule or assemblage and an acceptor fluorophore associated with a second molecule or assemblage, e.g., a donor fluorophore associated with a member of a polymerase/template/primer complex and acceptor fluorophores associated with nucleotides for the polymerase.
  • the method includes the step of collecting or receiving data from a viewing volume of an imaging apparatus such as a CCD or iCCD detection system, in real time or near real time.
  • the data can be in a single data channel or a plurality of channels.
  • the data is collected in a plurality of data channels, each data channel representing a different frequency range of emitted fluorescent light, e.g., one channel can include fluorescent light data emitted by a donor (a donor channel), while other channels include fluorescent light data emitted by an acceptor (an acceptor channel) or by another donor (a second donor channel).
  • a channel will exist for each different fluorophore being detected simultaneously.
  • the number of data channels monitored is five (5).
  • the number of data channels monitored is four (4).
  • the number of data channels monitored is three (3), where three generally represents a minimally configured system.
  • two (2) channels can be used provided that the acceptors are selected so that they can be separately identified based on detectable attributes of their signals e.g., intensity, frequency shifts, signal duration, attenuation, etc.
  • the separate data channels are spatially correlated within the viewing volume so that active fluorophores can be spatially and temporally related, a process called calibration or registration.
  • the goal of calibration is to determine the pixel coordinates in each quadrant that correspond to a single position on the slide or a single location within the viewing field—to make sure that the data in each channel is spatially coincident over the viewing field and through time of detection.
  • location distortions between channels comprise almost exclusively translations and rotations. In other systems, the distortions may be translations, rotations, shearing, stretching, compressing, screwing, twisting, etc., and the calibrating process must be able to register the data between the channels so that locations within one channel correspond to the same locations in the other channels.
  • the calibration procedure includes two principal components. Both components utilize image files comprising an average over a set of frames of a data stream from a data channel, where the set of frames can be the entire data stream collected or any subset thereof.
  • a frame is data collected by the imaging apparatus over a given short period of time that is received by the processing unit and assembled into a temporal data set for each data channel.
  • the frames generally represent average data over the collection time period, depending on the imaging apparatus data collection and transmission speeds.
  • the first component is a visual tool that allows the quadrants or data channel averaged data or cumulated image to be overlaid with transparency to quickly check data alignment. This tool was constructed using standard MATLAB libraries.
  • the second component is an automated tool based on maximizing mutual information across the quadrants or data channels.
  • Mutual information quantifies the predictive power that one image has for another. For example, knowing there is a bright spot in one quadrant should mean that there is a corresponding bright spot in one or more of the other quadrants or data channels.
  • the component determines and outputs the rotation and translation operators that, when applied to the data in one or more of the channels, produce the greatest mutual information between the quadrants.
  • This calibration process produces improved data calibration or registration.
  • the process avoids the effects of individual pixels having poor brightness, spurious or missing data or other noise.
  • the program encoding this second component was written in C++ and includes libraries from the standard ITK project libraries.
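For illustration only, the mutual-information measure itself (independent of the ITK-based implementation referenced above) can be sketched as a joint intensity histogram of two averaged quadrant images, from which MI = Σ p(a,b)·log(p(a,b)/(p(a)p(b))) is computed; the automated tool then searches for the rotation and translation that maximize this value. The bin count and function name are assumptions of the sketch.

        #include <algorithm>
        #include <cmath>
        #include <cstddef>
        #include <vector>

        // Illustrative histogram-based mutual information between two equally sized,
        // already averaged quadrant images stored as flat arrays.
        double mutualInformation(const std::vector<double>& imgA,
                                 const std::vector<double>& imgB,
                                 int bins = 64)
        {
            std::size_t n = std::min(imgA.size(), imgB.size());
            if (n == 0) return 0.0;

            auto mmA = std::minmax_element(imgA.begin(), imgA.begin() + n);
            auto mmB = std::minmax_element(imgB.begin(), imgB.begin() + n);
            double loA = *mmA.first, rangeA = std::max(*mmA.second - loA, 1e-12);
            double loB = *mmB.first, rangeB = std::max(*mmB.second - loB, 1e-12);

            std::vector<double> joint(bins * bins, 0.0), pa(bins, 0.0), pb(bins, 0.0);
            for (std::size_t i = 0; i < n; ++i) {                     // joint intensity histogram
                int a = std::min(bins - 1, (int)((imgA[i] - loA) / rangeA * bins));
                int b = std::min(bins - 1, (int)((imgB[i] - loB) / rangeB * bins));
                joint[a * bins + b] += 1.0;
            }
            for (double& v : joint) v /= (double)n;                   // joint probability p(a,b)
            for (int a = 0; a < bins; ++a)
                for (int b = 0; b < bins; ++b) {
                    pa[a] += joint[a * bins + b];                     // marginal p(a)
                    pb[b] += joint[a * bins + b];                     // marginal p(b)
                }

            double mi = 0.0;                                          // MI = sum p(a,b) log(p(a,b)/(p(a)p(b)))
            for (int a = 0; a < bins; ++a)
                for (int b = 0; b < bins; ++b) {
                    double p = joint[a * bins + b];
                    if (p > 0.0) mi += p * std::log(p / (pa[a] * pb[b]));
                }
            return mi;
        }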
  • the method then includes the step of reading a configuration file and a corresponding open log file. After reading the configuration file and the open log file, calibrations, if any, are loaded from the command line. After loading the calibration information, a corresponding directory is read as specified in the command line with all subdirectories, for each one.
  • This read step includes: (1) scanning for calibration stacks, and if there are some not matched by the available calibrations, generate new calibrations out of them; (2) scanning for stacks; if there are some, assume this directory is a slide; and (3) scanning the directory path for a date and slide name comprising reaction conditions such as donor identity, acceptor identity, buffers, etc.
  • the method also includes the step of looping over all stacks for every slide.
  • the looping step includes: (1) finding calibration data by date and frame dimensions; (2) averaging all the donor frames in the stack or averaging the donor frames over an adjustable number of frames in the stack; (3) finding spots in the averaged donor data or quadrant; (4) applying the calibration data to the acceptor channels to find acceptor coordinates corresponding to each found donor spot; (5) identifying a 3×3 pixel array associated with each found donor spot in the donor and acceptor channels (although the method has been tuned to use a 3×3 array, the method can use smaller and larger arrays, and the array size will depend on the detector system and on the system being detected); (6) collecting traces for each pixel in the array over the frames in the averaged data; (7) applying a pixel selection algorithm to the pixels in the array to select pixels that have a value above a threshold value; (8) averaging the selected pixels to form hybrid traces (signals); (9) checking the donor traces for minimal requirements on lifetime and average intensity; and (10)
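Steps (5) through (8) of this loop can be illustrated with a minimal sketch that takes the per-pixel traces of one spot's 3×3 array, keeps the pixels whose mean intensity exceeds a configurable threshold, and averages the kept traces into a hybrid trace; the simple mean-intensity rule and the names used here are illustrative stand-ins for the pixel selection algorithms described elsewhere in this document.

        #include <cstddef>
        #include <numeric>
        #include <vector>

        // Illustrative hybrid-trace construction for one spot: keep pixels whose mean
        // intensity exceeds the threshold and average the kept traces, frame by frame.
        // `pixelTraces` holds one intensity trace per pixel of the spot's array
        // (e.g. 9 traces for a 3x3 array), each with one sample per frame of the stack.
        std::vector<double> buildHybridTrace(const std::vector<std::vector<double>>& pixelTraces,
                                             double intensityThreshold)
        {
            std::vector<const std::vector<double>*> selected;
            for (const auto& trace : pixelTraces) {
                if (trace.empty()) continue;
                double mean = std::accumulate(trace.begin(), trace.end(), 0.0) / (double)trace.size();
                if (mean > intensityThreshold)
                    selected.push_back(&trace);            // pixel passes the selection criterion
            }
            if (selected.empty()) return {};

            std::size_t nFrames = selected.front()->size();
            std::vector<double> hybrid(nFrames, 0.0);
            for (const auto* trace : selected)
                for (std::size_t f = 0; f < nFrames && f < trace->size(); ++f)
                    hybrid[f] += (*trace)[f];
            for (double& v : hybrid) v /= (double)selected.size();    // average of the selected pixel traces
            return hybrid;
        }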
  • the method also includes the step of computing the acceptor “lifetimes” for each found donor spot using two different smoothing algorithms, a regular Savitzky-Golay smoother, which is adapted to identify short-lived, sharp signals, and a smart smoother, which is adapted to identify long-lived, weak signals and “broken” signals.
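The regular Savitzky-Golay smoother fits a low-order polynomial to a sliding window of samples by least squares and replaces the central sample with the polynomial's value (or its derivative) at the window center. A minimal sketch follows, using a 5-point quadratic window with the classical closed-form coefficients; the patent's own filters (for example the 17-point filter shown in FIG. 18) use longer windows, and the smart smoother is not reproduced here.

        #include <cstddef>
        #include <vector>

        // Illustrative 5-point quadratic Savitzky-Golay filters (unit sample spacing).
        // Smoothing coefficients: (-3, 12, 17, 12, -3)/35; first-derivative coefficients:
        // (-2, -1, 0, 1, 2)/10. The first and last two samples are left unfiltered.
        std::vector<double> sgSmooth5(const std::vector<double>& x)
        {
            static const double c[5] = { -3.0 / 35, 12.0 / 35, 17.0 / 35, 12.0 / 35, -3.0 / 35 };
            std::vector<double> y(x);
            for (std::size_t i = 2; i + 2 < x.size(); ++i) {
                double s = 0.0;
                for (int k = -2; k <= 2; ++k) s += c[k + 2] * x[i + k];
                y[i] = s;                         // value of the fitted parabola at the window center
            }
            return y;
        }

        std::vector<double> sgDerivative5(const std::vector<double>& x)
        {
            static const double d[5] = { -0.2, -0.1, 0.0, 0.1, 0.2 };
            std::vector<double> y(x.size(), 0.0);
            for (std::size_t i = 2; i + 2 < x.size(); ++i) {
                double s = 0.0;
                for (int k = -2; k <= 2; ++k) s += d[k + 2] * x[i + k];
                y[i] = s;                         // slope of the fitted parabola at the window center
            }
            return y;
        }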
  • the method also includes the step of creating lists of acceptor events from the identified acceptor lifetimes.
  • the method also includes the step of adjusting boundaries of the acceptor events using numeric derivatives using a similar Savitzky-Golay process to achieve maximum correlation/anticorrelation with the donor.
  • the method also includes the step of computing a set of parameters for every acceptor event and assigning every acceptor event a score based on these parameters, as described below.
  • the method also includes the step of joining adjacent segments from the acceptor event lists, and finding and resolving overlaps (if any), as described in detail below. For instance, if there is a long event overlapped by several shorter events, their scores are checked to decide which case describes the data better: one large event or a series of smaller ones.
  • the method also includes the step of using the resulting acceptor event list as a list of FRET event candidates: for every candidate, compute a set of FRET event parameters, such as FRET efficiency, acceptor and donor signal to noise ratios, probabilities, boundary anti-correlation coefficients, etc. as described in more detail below.
  • the method determines if these parameters meet minimal criteria (specified in the configuration file), and if they do, accepts the candidate as a FRET event for output.
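Two of the FRET event parameters named above can be sketched as follows, assuming background-corrected hybrid traces, the common proximity-ratio form E = I_A/(I_A + I_D) for the FRET efficiency, and a mean-over-noise-level ratio for the signal to noise ratios; the exact formulas used by the method are not spelled out here, so this is illustrative only.

        #include <cstddef>
        #include <numeric>
        #include <vector>

        // Illustrative FRET event parameters over an event window [start, end) of the
        // background-corrected donor and acceptor hybrid traces.
        struct FretEventParams {
            double efficiency;    // proximity ratio E = Ia / (Ia + Id)
            double donorSNR;      // mean donor intensity over the donor noise level
            double acceptorSNR;   // mean acceptor intensity over the acceptor noise level
        };

        static double windowMean(const std::vector<double>& t, std::size_t start, std::size_t end)
        {
            if (start >= end || end > t.size()) return 0.0;
            return std::accumulate(t.begin() + start, t.begin() + end, 0.0) / (double)(end - start);
        }

        FretEventParams computeEventParams(const std::vector<double>& donor,
                                           const std::vector<double>& acceptor,
                                           std::size_t start, std::size_t end,
                                           double donorNoiseLevel, double acceptorNoiseLevel)
        {
            double id = windowMean(donor, start, end);
            double ia = windowMean(acceptor, start, end);
            FretEventParams p;
            p.efficiency  = (ia + id) > 0.0 ? ia / (ia + id) : 0.0;
            p.donorSNR    = donorNoiseLevel    > 0.0 ? id / donorNoiseLevel    : 0.0;
            p.acceptorSNR = acceptorNoiseLevel > 0.0 ? ia / acceptorNoiseLevel : 0.0;
            return p;
        }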
  • the method also includes the step of sorting spots of the current stack by how “event-rich” they are, and outputting an event list for the whole stack. Also, the detected events are added to the slide's event list. The method also includes the step of, after finishing with all the stacks in the slide, generating the combined report containing results from every spot of every stack in the slide.
  • the process starts with the construction of a workspace and data structures to support the analysis.
  • the workspace includes configurational data, current state information such as slide/stream information stored in a separate structure, data result structures, etc.
  • the process reads the default configuration file, if present in the same directory.
  • the configuration file includes a set of configurational parameter data, which is used throughout the process by the routines to find needed configurational data.
  • the process then scans a command line for a log file of options. If a log file is present, then the process opens the specified log file. If the log file is not present, then the process attempts to open a log file in the directory specified by the configurational parameter data. If no log file is found in this directory, then the process attempts to open a log file in the current working directory. If that fails, the process exits with an error message.
  • the log file is opened with shared reading options, which is required for proper inter-routine communications and proper interactions with Windows operating system routines.
  • the process then checks the command line for the first argument, which is supposed to be a sub-directory in the source root directory, specified by the configurational parameter data. If not present, the process prompts the user to enter the sub-directory from the standard input (generally a keyboard).
  • the extra arguments can be either additional configurational files, a user-specified log file, or a no-calibration flag.
  • the last option overrides the configurational parameter data, and specifies whether the routines in the process are allowed to use the cached calibrations, either found in the calibration directory given in the configurational parameter data or default calibrations given in the configurational parameter data separately for each frame size. If the configurational parameter data or the command line sets the no-calibration flag on, the process is instructed not to use the cached calibration data. In this case, original calibration stacks must be present in the directory starting with the date of the slide, and a new calibration is generated every time subsequent routines require calibration. If the calibration stacks are not present, the process fails with the error message “No calibration present”.
  • the process recursively scans the subdirectory for stacks/slides data. The process then cleans up and exits.
  • This routine scans the directories for calibration data and slide information. The routine then constructs corresponding output directory names. Assuming the current directory corresponds to data derived from a slide, the routine reads the list of stack files contained in the directory. If the list is not empty, the routine processes each stack file. The routine then reads the list of FITS files (FITS stands for Flexible Image Transport System) and generates slide-wide statistics for reporting purposes. The routine then reads the list of associated sub-directories, and processes the subdirectories recursively, extracting the data contained in the subdirectories.
  • If the directory name starts with the proper date pattern, then the routine reads the date pattern from the directory name; otherwise, the routine returns control to the calling routine. The routine then scans the directory configurational parameter data for calibration data matching the date pattern and downloads any matches found. The routine next scans the current directory for stack and fit data files containing no more than 3 planes or frames of data. The routine then checks whether calibration data for the given frame size and date is present. If the calibration data is not present, then the routine queues the file for generation of new calibration data. A queue is necessary because there can be more than one calibration stack, so that the routine implemented in add calibration data can choose the best calibration stack by comparing the number of donor spots detected in each stack.
  • the calibration data is generated in context and is represented by a data structure containing overlays and spot lists from each quadrant, generated by the find spot routine described herein.
  • the routine checks the calibration queue, and generates calibration data via a generate calibration routine that determines the transformation needed to register pixel locations in one channel with corresponding pixel locations in the other channels.
  • the transform generally comprises simply a translation and a rotation. However, the transformation can be much more complex and is constructed to map pixels from one channel into corresponding pixels in other channels.
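A minimal sketch of such a transform follows, assuming a rigid mapping (rotation about a given center followed by a translation) from donor-quadrant pixel coordinates to an acceptor quadrant; the structure and names are illustrative, not the generate calibration routine itself.

        #include <cmath>

        // Illustrative rigid calibration transform: map a pixel location in one channel
        // (quadrant) to the corresponding location in another channel by rotating about
        // a center and then translating.
        struct Calibration {
            double angle;     // rotation angle in radians
            double cx, cy;    // center of rotation, in pixels
            double tx, ty;    // translation, in pixels
        };

        void mapPixel(const Calibration& cal, double xIn, double yIn, double& xOut, double& yOut)
        {
            double c = std::cos(cal.angle), s = std::sin(cal.angle);
            double dx = xIn - cal.cx, dy = yIn - cal.cy;
            xOut = cal.cx + c * dx - s * dy + cal.tx;    // rotate about the center, then translate
            yOut = cal.cy + s * dx + c * dy + cal.ty;
        }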
  • the routine starts by opening a stack file.
  • the routine then applies non-standard geometry settings if specified.
  • the routine checks to ensure that the file is valid, i.e., the file includes 16-bit non-compressed data, has a known frame size, has enough frames, and has an acceptable integration cycle time. The routine then searches for calibration data associated with the frame size and the date/time of the file collection.
  • the calibration is cached as defined above. If all conditions are met and the calibration is found, then the routine allocates the data structures needed for detection processing and forwards control to the stack processing routines.
  • the stack processing routine reads and averages frames from the stack file to generate an overlay.
  • the routine then generates an overlay picture for the donor quadrant and searches for donor spots in the donor quadrant using the find spot routines.
  • the routine then uses the existing picture object to mark the initial donor spots.
  • the routine then creates signal to noise structures for individual pixel traces, one per channel per spot.
  • the routine then applies the calibration transform to register the acceptor pixel coordinates to the donor channel pixel coordinate system.
  • the routine then reads the stack file again, collecting data samples at each frame for the identified pixel traces. For each spot, the routine applies the hi-pass filter to the donor traces and performs the donor pixel selection and generates the donor hybrid traces.
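Read together with the description of FIGS. 10a-10d, the hi-pass filtering step amounts to fitting a low-order polynomial to the averaged noise (background) trace by least squares and subtracting the polynomial's value, frame by frame, from each pixel trace. The sketch below assumes a cubic baseline fitted through normal equations and Gaussian elimination; the degree, scaling and names are assumptions, not the patent's implementation.

        #include <algorithm>
        #include <cmath>
        #include <cstddef>
        #include <utility>
        #include <vector>

        // Illustrative baseline (hi-pass) removal: fit a degree-`deg` polynomial to the
        // averaged noise trace by least squares, then subtract its value from a pixel trace.
        // Assumes the traces are long compared to the polynomial degree.
        std::vector<double> fitPolynomial(const std::vector<double>& baseline, int deg = 3)
        {
            int n = deg + 1;
            std::vector<std::vector<double>> A(n, std::vector<double>(n + 1, 0.0)); // augmented normal equations
            std::size_t len = baseline.size();
            for (std::size_t i = 0; i < len; ++i) {
                double t = len > 1 ? (double)i / (double)(len - 1) : 0.0;            // frame index scaled to [0,1]
                std::vector<double> pw(2 * n, 1.0);
                for (int k = 1; k < 2 * n; ++k) pw[k] = pw[k - 1] * t;
                for (int r = 0; r < n; ++r) {
                    for (int c = 0; c < n; ++c) A[r][c] += pw[r + c];
                    A[r][n] += pw[r] * baseline[i];
                }
            }
            for (int col = 0; col < n; ++col) {                                      // Gaussian elimination with pivoting
                int piv = col;
                for (int r = col + 1; r < n; ++r)
                    if (std::fabs(A[r][col]) > std::fabs(A[piv][col])) piv = r;
                std::swap(A[col], A[piv]);
                for (int r = col + 1; r < n; ++r) {
                    double f = A[r][col] / A[col][col];
                    for (int c = col; c <= n; ++c) A[r][c] -= f * A[col][c];
                }
            }
            std::vector<double> coeff(n, 0.0);                                       // back substitution
            for (int r = n - 1; r >= 0; --r) {
                double s = A[r][n];
                for (int c = r + 1; c < n; ++c) s -= A[r][c] * coeff[c];
                coeff[r] = s / A[r][r];
            }
            return coeff;                                                            // coeff[k] multiplies t^k
        }

        void subtractBaseline(std::vector<double>& trace, const std::vector<double>& coeff)
        {
            std::size_t len = trace.size();
            for (std::size_t i = 0; i < len; ++i) {
                double t = len > 1 ? (double)i / (double)(len - 1) : 0.0;
                double value = 0.0, pw = 1.0;
                for (double c : coeff) { value += c * pw; pw *= t; }                 // evaluate the polynomial at t
                trace[i] -= value;                                                   // background-subtracted sample
            }
        }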
  • the routine applies the hi-pass filter to the acceptor traces, and performs the acceptor pixel selection and generates the acceptor hybrid traces.
  • the acceptor hybrid trace routine is repeated for each acceptor channel.
  • the routine then stores the hybrid traces into signal structures, which are stored as part of the signal to noise structures.
  • the routine then filters out spots that do not satisfy the donor lifetime and the donor S/N ratio conditions from the initial data file.
  • the routine then generates an overlay picture of the donor quadrant with spots found/filtered out.
  • the routine then writes the results as the list of donor spots.
  • the routine then sends the list of donor spots to the FRET analysis routines.
  • the routine generates an overlay picture of the donor quadrant with active spots, and outputs text data files related to the current stack.
  • the FRET analysis routine first allocates structures to keep the results from the analysis.
  • the routine then, for each spot in the donor spot list, makes a separate array of signal structures by copying the signal data structures from the previously stored input signal to noise data structures.
  • the FRET analysis routine then calls the create donor model routine.
  • the create donor model routine then adds a dynamic list of acceptor data traces from corresponding pixels in the acceptor channels.
  • the FRET analysis routine then generates a list of FRET event candidates from the donor spot list.
  • the routine stores the resulting event list into previously allocated data structures.
  • the routine then counts the number of high probability events and low probability events in the list, and determines the highest probability to set a spot efficiency entry on the current spot.
  • the routine sorts the arrays based on spot efficiency entry, the number of high probability events, the number of low probability events, and the highest probability. The index within this sorted array becomes the spot ranking.
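A minimal sketch of this ranking step, assuming each spot summary carries the four quantities listed above and that the sort compares them in the stated order; the struct and field names are illustrative.

        #include <algorithm>
        #include <cstddef>
        #include <tuple>
        #include <vector>

        // Illustrative spot ranking: sort by spot efficiency, then the number of
        // high-probability events, then low-probability events, then the highest probability.
        struct SpotSummary {
            double efficiency;
            int    highProbEvents;
            int    lowProbEvents;
            double highestProbability;
            int    rank;                 // filled in after sorting
        };

        void rankSpots(std::vector<SpotSummary>& spots)
        {
            std::sort(spots.begin(), spots.end(),
                      [](const SpotSummary& a, const SpotSummary& b) {
                          return std::tie(a.efficiency, a.highProbEvents, a.lowProbEvents, a.highestProbability)
                               > std::tie(b.efficiency, b.highProbEvents, b.lowProbEvents, b.highestProbability);
                      });
            for (std::size_t i = 0; i < spots.size(); ++i)
                spots[i].rank = (int)i;  // index within the sorted array becomes the spot ranking
        }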
  • For each spot, the routine creates a list of donor events by calling a construct donor events routine. This routine computes adjusted donor lifetimes by calling a compute adjusted lifetime routine. The routine then stores all the data, such as event lists, noise level, donor lifetime, adjusted donor lifetime, etc., into a previously allocated entry in the spot list structure associated with the current slide. The stored information becomes persistent across the whole slide, while the rest of the data is deallocated.
  • For each spot, the routine also detects donor around-event data, stores it into a slide-wide persistent area, and generates signal and FRET detection trace pictures if necessary. The routine then generates a rich spot file that contains spot info for so-called rich spots.
  • a rich spot is a spot that contains at least one FRET event.
  • the routine also generates an activity picture, with the rich spots colored.
  • the SIGNAL data structure is a data structure containing hybrid traces of one of the channels: donor, acceptor 1, acceptor 2, etc.
  • the elements of the data structure include:
  • nsamp number of data samples in the trace (same as number of frames in the stack file)
  • nlvl noise level computed as standard deviation (sometimes scaled by a factor) of the noise channel
  • sigbuf buffer containing hybrid trace data samples
  • noise buffer containing hybrid noise data samples
  • first (ACC-DETECTOR *) first element in the list of additional data structures, usually related to a particular detection algorithm
  • the ACC-DETECTOR data structure contains additional information about a hybrid trace, such as intermediate data from different types of detectors, simulation data, or a donor model.
  • the data structure includes the following elements:
  • stdac (double) standard deviation of the derivative
  • destructor (void (*)(struct tag-ACC-DETECTOR *ad)) pointer to a function which is called when the object is deallocated
  • An actual implementation of ACC_DETECTOR object may contain some extra data, which is sometimes allocated dynamically. Since the control logic is not aware of such data, an implementation-specific code must be provided to handle that.
  • the standard delete_acc_detector( ) function checks whether this pointer is not NULL, and if so, calls that function, which is supposed to take care of any implementation-specific de-initialization.
  • When a routine (such as a detection routine) needs to associate some extra data with a given signal, the routine constructs an ACC-DETECTOR object and adds it to the list of ACC-DETECTOR objects pointed to by the ‘->first’ member of the SIGNAL data structure.
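  • By way of illustration, the following is a minimal C++ sketch of the SIGNAL and ACC-DETECTOR layout described above. The member names follow the description; the ‘next’ link used to chain detector objects and the use of malloc/free are assumptions made only for this sketch.

```cpp
// Minimal sketch of the SIGNAL / ACC_DETECTOR layout described above.
// Field names follow the text; the 'next' link and the allocation scheme are assumptions.
#include <cstdlib>

struct tag_ACC_DETECTOR {
    double stdac;                                      // standard deviation of the derivative
    void (*destructor)(struct tag_ACC_DETECTOR *ad);   // implementation-specific cleanup, may be NULL
    struct tag_ACC_DETECTOR *next;                     // next detector attached to the same signal (assumed)
};
typedef struct tag_ACC_DETECTOR ACC_DETECTOR;

struct SIGNAL {
    int nsamp;            // number of data samples (frames in the stack file)
    double nlvl;          // noise level (scaled standard deviation of the noise channel)
    double *sigbuf;       // hybrid trace data samples
    double *noise;        // hybrid noise data samples
    ACC_DETECTOR *first;  // head of the list of attached detector objects
};

// Deallocate one detector: call its implementation-specific destructor first, if any.
void delete_acc_detector(ACC_DETECTOR *ad) {
    if (ad == NULL) return;
    if (ad->destructor != NULL)
        ad->destructor(ad);    // implementation-specific de-initialization
    free(ad);                  // sketch assumes detectors are malloc-allocated
}
```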
  • the model constructs a Smart Smoother object for subsequent operations via construct smart smoother routine.
  • the routine allocates a donor model object.
  • the model smart smoothes the original donor trace, and then computes its first derivative using a Savitzky-Golay (SG) fitting routine.
  • the model then computes a standard deviation of the derivative and stores it in the donor model object. This derivative will be used to detect slow changes in the donor trace.
  • the model then calls a donor lifetime routine to compute the donor's derivative lifetime. It computes another “finer” derivative of the original trace using a different SG smoother to detect fast changes in the donor trace. The model then computes segments, where both derivatives go outside their standard deviations either way (positive or negative), and then combines detected segments from both processes.
  • the results representing segments where fast donor changes were detected (high derivative value) are then stored in the lifetime buffer.
  • the SG-smoothed original donor trace is stored in signal smoothed buffer for subsequent operation using the SG-smoother from the Smart Smoother object.
  • the model then calls a routine to create initial static segments, which examines each segment having a high-derivative value, to find the sample index at which the change is highest (max/min derivative), and to break down the entire donor trace into segments with the boundaries set at those ‘high-change’ points.
  • the model typically creates a large set of tiny segments, which need certain types of optimization to determine if neighbor or adjacent donor segments (i.e., donor segments to the immediate right or left of a particular donor segment) are substantially different. If adjacent segments are not substantially different, the adjacent donor segments are joined into a single larger segment.
  • substantially different is determined by applying a variety of criteria, such as close enough average value, a tiny segment in between two larger ones with close averages, etc.
  • the model decides whether each segment represents a donor on state or a donor off state.
  • the model iteratively calls a finalize donor model routine a few times (each time the routine iteratively improves the segment joining process) to compute final donor lifetimes and to construct a best polynomial fit of the appropriate donor segments.
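  • The following C++ fragment is a hedged sketch of the adjacent-segment joining step described above; it assumes a segment carries its boundaries and mean intensity, and uses a single tolerance on the difference of neighboring averages (one of the several joining criteria mentioned) purely for illustration.

```cpp
// Illustrative sketch of joining adjacent donor segments whose averages are not
// substantially different. The single 'tol' criterion is an assumption; the actual
// routine applies several joining criteria and iterates the process.
#include <cmath>
#include <vector>

struct Segment { int start, end; double mean; };   // [start, end) in frame indices

static Segment merge(const Segment &a, const Segment &b) {
    Segment s;
    s.start = a.start;
    s.end   = b.end;
    double na = a.end - a.start, nb = b.end - b.start;
    s.mean  = (a.mean * na + b.mean * nb) / (na + nb);   // length-weighted average
    return s;
}

// One joining pass: neighbors whose averages differ by less than 'tol' are merged.
std::vector<Segment> join_similar_segments(const std::vector<Segment> &in, double tol) {
    std::vector<Segment> out;
    for (const Segment &s : in) {
        if (!out.empty() && std::fabs(out.back().mean - s.mean) < tol)
            out.back() = merge(out.back(), s);
        else
            out.push_back(s);
    }
    return out;
}
```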
  • For each acceptor channel, the routine calls a subroutine to generate a list of long lived acceptor event candidates using the long lived event detection algorithm, an algorithm optimized to identify long lived events. Next, the routine calls a subroutine to generate a list of short lived event candidates using the short lived event detection algorithm, an algorithm optimized to identify short lived events. The routine then joins all the event candidate lists into a single event candidate list, where the number of lists joined is two times the number of acceptor channels (a long lived and a short lived list per channel). The routine then calls a subroutine adapted to exclude conflicting entries in the joint list of event candidates as described below. The routine then returns the list of event candidates to its calling routine.
  • This routine constructs a Smart Smoother object.
  • the routine first checks to determine whether ACC-DETECTOR objects of type DETECTOR-LONG are already attached to both the donor and acceptor SIGNAL objects. If not, the routine creates new ones, fills them with smoothed data, and attaches the objects to the SIGNAL objects.
  • the routine operates by calling a static routine to determine rough acceptor lifetimes to fill the lifetime buffer. Zero values in the lifetime buffer represent signal in the channel that are in an OFF state, while non-zero values in the lifetime buffer represent signal in the channel that are in an ON state.
  • the routine then reads the acceptors events from the lifetime buffer to create an initial array of event candidates stored as ACC-EVENT objects by scanning for non-zero segments in the lifetime buffer.
  • the routine then optimizes the acceptor event segments by joining adjacent segments iteratively based on a set of joining criteria to form joined acceptor event segments. This process is a more thorough test to determine whether adjacent ‘on’ segments should be joined together because they belong to a single event accidentally broken apart by noise spikes.
  • the routine then calls a subroutine to determine and adjust event boundaries, where the subroutine uses the Derivative Anti-correlation (DAC) function to adjust boundaries of the event candidates.
  • For each event candidate, the routine also computes a variety of event parameters, such as average intensities and signal to noise ratios, and computes an event score, which is used later to evaluate how “good” the event candidate is.
  • the event score is computed according to the following formula:
  • x1 is the acceptor signal to noise ratio
  • x2 is the product of differential acceptor and donor signal to noise ratios at the beginning
  • x3 is the product of differential acceptor and donor signal to noise ratios at the end of the event. If the product is negative, it is multiplied by −0.25.
  • the coefficient f depends on the event duration and is computed according to the following formula:
  • dl is the ratio of the event duration to a long scan distribution parameter in the configurational parameter data.
  • the purpose of the coefficient f is to provide a configurable boost to the score of longer lived events.
  • the routine then cleans up and returns the resulting list of acceptor events to its calling routine.
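  • As an illustration of the lifetime-buffer scan used earlier in this routine, the following C++ sketch turns non-zero runs in a lifetime buffer into an initial array of event candidates; the ACC-EVENT fields shown (start frame and duration) are assumptions made for the sketch.

```cpp
// Sketch of scanning the lifetime buffer for non-zero (ON) segments and turning
// each run into an event candidate. Event fields are assumptions for illustration.
#include <vector>

struct ACC_EVENT { int start; int length; };

std::vector<ACC_EVENT> scan_lifetime_buffer(const std::vector<int> &lifetime) {
    std::vector<ACC_EVENT> events;
    int i = 0, n = (int)lifetime.size();
    while (i < n) {
        if (lifetime[i] != 0) {                  // channel is ON here
            int start = i;
            while (i < n && lifetime[i] != 0) ++i;
            events.push_back({start, i - start});
        } else {
            ++i;                                  // channel is OFF here
        }
    }
    return events;
}
```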
  • This routine constructs SG smoother objects for a signal trace (function) and its derivative.
  • the routine checks whether acceptor detector objects of the short lived detector type are already attached to both the donor and acceptor SIGNAL objects. If not, the routine creates new ones, fills them with smoothed data, and attaches them to the appropriate SIGNAL objects.
  • the routine operates by calling a static subroutine adapted to fill in a lifetime buffer.
  • Zero values in the lifetime buffer represent channel signals in an OFF state, while non-zero values in the lifetime buffer represent channel signals in an ON state.
  • the routine calls a subroutine adapted to join lifetime segments that are separated by short interruptions, generally caused by noise.
  • the routine then calls a subroutine adapted to split up lifetime segments, which were unjustifiably joined by accidental noise or smoothing algorithm peculiarities.
  • the routine then calls a subroutine to create an initial array of event candidates, stored as acceptor event objects, by scanning for non-zero segments in the lifetime buffer.
  • the routine calls a subroutine to adjust short event boundaries, where the subroutine uses the Derivative Anti-correlation (DAC) function to adjust boundaries of the event candidates.
  • the routine calls a subroutine adapted to compute a variety of event parameters like average intensities, signal to noise ratios, etc., and compute the event acceptor score, which is used later to evaluate how “good” this event candidate is.
  • the acceptor event score is computed according to the following formula:
  • x1 is the acceptor signal to noise ratio
  • x2 is the product of differential acceptor and donor signal to noise ratios at the beginning of the event
  • x3 is the product of differential acceptor and donor signal to noise ratios at the end of the event. If the product is negative, it is multiplied by −0.25. If the event is at the beginning of the trace, x2 is forced to the value of 2.0; likewise, if the event is at the end of the trace, x3 is forced to the value of 2.0. This forcing reflects the fact that the anti-correlation status is not known under these circumstances.
  • the routine then cleans up and returns the resulting list of acceptor events to its calling routine.
  • the purpose of this routine is to eliminate overlapping event candidates from the list of acceptor events.
  • the routine first sorts the input array of event candidates in order of event starts. Next, the routine breaks down the array into sub-arrays containing conflicting areas. The routine operates by adding a first event to the current sub-list. The routine then iterates over subsequent events, adding each overlapping event to the sub-list, until no events overlap with any events in the sub-list. If no new overlapping events are found, the routine closes that sub-list, selects an event, and creates a new sub-list of overlapping events. The routine repeats this process until all events have been processed, creating a set of sub-lists of overlapping events.
  • the sub-lists contain a set of conflicting (overlapping) event candidates, but each sub-list is independent of events in any other sub-list, i.e., the sub-lists are distinct with no shared events.
  • For each conflicting or overlapping area sub-list, the routine calls a subroutine to find the best rated non-conflicting sub-list of event candidates. The routine operates by sorting the events in the conflicting sub-list by their acceptor event score. Next, for every event in the sub-list, the routine constructs a further sub-list containing only events that do not conflict with the starting event. The routine then computes the resulting score of every sub-list as the sum of the adjusted scores of its events, and selects the sub-list with the highest adjusted score.
  • the ‘adjusted score’ is computed according to the following formula:
  • score is the acceptor event score and bias is the configurational parameter data element biasN (N is the acceptor channel number); bias is set to biasN for segments from the long lived routine or to 1-biasN for segments from the short lived routine.
  • After resolving the overlapping event data, the routine joins the non-conflicting sub-lists into a single list of event candidates and returns control to its calling routine.
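  • A hedged C++ sketch of the first step of this routine, breaking a start-sorted array of candidates into independent sub-lists of mutually overlapping events, is given below; the candidate fields are assumptions made for the sketch.

```cpp
// Sketch of splitting event candidates into independent conflict sub-lists:
// events are sorted by start, and an event joins the current sub-list only if it
// overlaps some event already in it (i.e., starts before the furthest end so far).
#include <algorithm>
#include <vector>

struct Candidate { int start; int length; double score; };
inline int end_of(const Candidate &c) { return c.start + c.length; }

std::vector<std::vector<Candidate>> split_into_conflict_lists(std::vector<Candidate> events) {
    std::sort(events.begin(), events.end(),
              [](const Candidate &a, const Candidate &b) { return a.start < b.start; });

    std::vector<std::vector<Candidate>> sublists;
    for (const Candidate &e : events) {
        if (!sublists.empty()) {
            int furthest = 0;
            for (const Candidate &c : sublists.back())
                furthest = std::max(furthest, end_of(c));
            if (e.start < furthest) {            // overlaps the current sub-list: extend it
                sublists.back().push_back(e);
                continue;
            }
        }
        sublists.push_back({e});                 // no overlap: close the current sub-list, start a new one
    }
    return sublists;
}
```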
  • the purpose of this routine is to compute FRET event parameters for every input event candidate.
  • the routine also applies certain basic criteria to filter out any obvious non-events or trash events.
  • the routine operates by computing DAC functions based on derivatives from the acceptor detector objects of type short lived events.
  • the routine then creates a ‘finer’ SG-smoother/derivative, and compute DAC functions based on the smoother output.
  • the routine adjusts event boundaries. If the resulting duration does not exceed the maximum short event parameter in the configurational parameter data, the routine repeats the event boundary adjustments with the ‘finer’ DAC functions.
  • the six frame problem occurs with the standard smoother used to analyze short lived signals.
  • the DAC functions which are based on donor and acceptor derivatives, have peaks at the event boundaries, and the peaks are not infinitely narrow, but have certain widths. If the event duration is less than or equal to about two times the boundary widths, then adjusting the event boundaries using the standard smoothing routines gives inaccurate results. As the event duration gets shorter, the adjusted duration does not, which creates certain errors. To reduce these errors to a tolerable level, ‘finer’ digital derivatives/DAC functions are used.
  • the routine computes a whole set of parameters, associated with FRET events.
  • basic FRET event parameters e.g., start, duration, acceptor number
  • the routine determines if the computed probability is smaller than a desired or allowed minimum value given in the configurational parameter data as the low probability limit. If the probability of the event is less than the low probability limit, then the event is removed from the final FRET event list. The routine then compacts the FRET event list.
  • the routine sets the parameters il and ir for each event.
  • the parameter il is the acceptor intensity at the beginning of the event, while ir is the acceptor intensity at the end of the event.
  • if the duration or length of the event is less than 20 frames, the routine sets both il and ir equal to the average acceptor intensity value during the event. Otherwise, the routine first best fits the acceptor trace during the event with a straight line.
  • the routine then sets the value of il to the value of the straight line at the beginning of the event and the value of ir to the value of the straight line at the end of the event.
  • the best fit routine can use a polynomial of any degree, provided that il and ir are set to the polynomial values at the beginning and end of the event, respectively.
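  • The following C++ sketch illustrates the il/ir computation just described (average for events shorter than 20 frames, otherwise a least-squares straight line evaluated at the event boundaries); the function name and data layout are assumptions made for the sketch.

```cpp
// Sketch of setting il and ir from the acceptor samples taken during an event.
#include <cstddef>
#include <utility>
#include <vector>

std::pair<double, double> event_il_ir(const std::vector<double> &acc /* samples during the event */) {
    size_t n = acc.size();
    double mean = 0.0;
    for (double v : acc) mean += v;
    mean /= (n ? n : 1);

    if (n < 20) return {mean, mean};             // short event: il = ir = average acceptor intensity

    // Least-squares straight line acc[t] ~ a + b*t over t = 0..n-1.
    double tMean = (n - 1) / 2.0, sxx = 0.0, sxy = 0.0;
    for (size_t t = 0; t < n; ++t) {
        sxx += (t - tMean) * (t - tMean);
        sxy += (t - tMean) * (acc[t] - mean);
    }
    double b = sxy / sxx;
    double a = mean - b * tMean;
    return {a, a + b * (n - 1)};                 // il at the start of the event, ir at its end
}
```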
  • the routine performs cleanup operations and returns the FRET event list to its calling routine.
  • in certain embodiments, the routine includes data structures having the following data.
  • the following table tabulates the slide event data stored in the data structures.
  • Stream Stream ID. Normally, a 2-digit number taken from the stack file name. For example, if the stack name is Stream05, the stream ID is 05.
  • Rank Spot trace rank within the slide based on how event-rich the spot is. Lower number means richer spot.
  • DonCol Donor X-coordinate of the spot.
  • DonRow Donor Y-coordinate of the spot.
  • Start Start of the event in ms.
  • Length Duration of the event in ms.
  • Acc Acceptor number of the acceptor causing the event. Currently can be either 1 or 2, but in future releases it will also take values 3 and 4.
  • Prob Event probability. A value in the range 0 . . .
  • FRETEff FRET Efficiency computed as AiSN/(AiSN + DSN), where AiSN is the acceptor signal to noise ratio AiInt/AiNL (i is the acceptor number, same as Acc), and DSN is the donor dark state signal to noise ratio, either DLR/DNLC or DRL/DNLC, depending on which difference is higher, DLL − DLR or DRR − DRL.
  • Style Event style. Possible values are: 0 - No correlation between donor and acceptor of any kind (both LACC and RACC are above −1 but below 2); 1 - Positive correlation at least at one end (either LACC, or RACC, or both are below −1, while none of them is above 2); 2 - Negative (anti-) correlation at one end (one of LACC or RACC is above 2, while the other is not); 3 - Negative (anti-) correlation at both ends (both LACC and RACC are above 2).
  • Hi Indicates whether the event is hi-prob. If Prob is greater than the value of the configurational parameter hi_probi, then Hi is 1, otherwise, 0.
  • DSN donor differential signal to noise ratio, either (DLL − DLR)/DNLL or (DRR − DRL)/DNLR, whichever is higher;
  • WTD is a coefficient equal to 0.4 for short events (shorter than the configurable max_short_event), or 0.71 for long events;
  • DInt is either DLL or DRR, depending on which differential signal to noise ratio is higher.
  • Ac1Prob Acceptor 1 “probability” computed as 1.
  • DNLL Donor noise level right before the start of the event. Normally taken from the donor model, and equal to the standard deviation from the polynomial fit at the corresponding donor segment.
  • DLR Donor Intensity right after the start of the event.
  • DNLC Donor noise level during the event. It is taken from the donor model, and is frequently equal to DNL.
  • DRL Donor Intensity right before the end of the event.
  • NR Number of donor data samples following the end of the event that were used to calculate DRR (see below).
  • DRR Donor Intensity right after the end of the event.
  • if NR is large enough (larger than 20), an average is computed; otherwise, a peak value of the fine-smoothed data less DNL/√2 is used.
  • DNLR Donor noise level right after the end of the event. Normally taken from the donor model, and equal to the standard deviation from the polynomial fit at the corresponding donor segment.
  • DNL Donor Background Noise Level. Computed as the standard deviation of the donor “noise” hybrid trace.
  • A1Int Average (for long events, longer than max_short_event) or peak acceptor 1 intensity during the event.
  • A1L Acceptor 1 Intensity at the start of the event. Computed by modeling the acceptor with a straight line best fit.
  • A1NL Acceptor 1 background Noise Level Computed as the standard deviation of the acceptor 1 “noise” hybrid trace.
  • A2Int Average (for long events, longer than max_short_event) or peak acceptor 2 intensity during the event.
  • A2L Acceptor 2 Intensity at the start of the event Computed by modeling acceptor with a straight line best fit.
  • A2R Acceptor 2 Intensity at the end of the event Computed by modeling acceptor with a straight line best fit.
  • A2NL Acceptor 2 background Noise Level Computed as the standard deviation of the acceptor 2 “noise” hybrid trace.
  • Referring now to FIG. 1, a graphical illustration of certain of the parameters that are defined for an event is shown.
  • the parameters are defined in the table above.
  • the first line contains tab delimited text labels, the rest, data, one line per donor trace.
  • Stream Stream ID. Normally, a 2-digit number taken from the stack file name. For example, if the stack name is Stream05, the stream ID is 05.
  • Rank Spot trace rank within the slide based on how event-rich the spot is. Lower number means richer spot.
  • DonCol Donor X-coordinate of the spot.
  • DonRow Donor Y-coordinate of the spot.
  • AvgInt Average Donor Intensity during Lifetime.
  • LifeTm Donor Lifetime (ms).
  • DE Ratio (Total Donor Event Duration)/(Total Trace Duration).
  • DEAC Ratio (Total Anti-Correlated Donor Event Duration)/(Total Trace Duration).
  • Cnt Number of Donor Events detected.
  • NPDon Number of Donor pixel traces selected by Pixel Selection and averaged into Hybrid Donor Trace.
  • NPAc1 Number of Acceptor 1 pixels selected by Pixel Selection for averaging into Acceptor 1 Hybrid Trace.
  • NPAc2 Number of Acceptor 2 pixels selected by Pixel Selection for averaging into Acceptor 2 Hybrid Trace.
  • Tab delimited file The first line contains tab delimited text labels, the rest, data, one line per donor event.
  • a Donor Event is defined as a temporary switch to dark state of limited duration, which happens in the middle of the trace (that is, there is always excited donor before and after that event.)
  • Stream Stream ID. Normally, a 2-digit number taken from the stack file name. For example, if the stack name is Stream05, the stream ID is 05.
  • Rank Spot trace rank within the slide based on how event-rich the spot is. Lower number means richer spot.
  • DonCol Donor X-coordinate of the spot.
  • DonRow Donor Y-coordinate of the spot.
  • DonProb Donor “probability”, computed in a way similar to slide_events:DonProb.
  • Start Start time of the Donor Event (ms).
  • Length Duration of the Donor Event (ms).
  • AC Anti-Correlation. If ‘Y’, the Donor Event has a match of a detected FRET Event.
  • the first line contains tab delimited text labels, the rest, data, one line per donor segment.
  • DSegId Slidewise unique number, identifying a Donor Segment.
  • Stream Stream ID. Normally, a 2-digit number taken from the stack file name. For example, if the stack name is Stream05, the stream ID is 05.
  • Length Duration of the Donor Segment (ms).
  • Excited 1 - excited, 0 - dark.
  • Int Average Intensity.
  • Dev Deviation of the polynomial approximation from the average intensity. Valid only for large (80 frames or more) excited segments.
  • NL Noise Level within the segment. Based on the standard deviation of the actual intensity from the polynomial approximation (or the average intensity if no PA).
  • DSegId Slidewise unique number, identifying a Donor Segment.
  • Stream Stream ID. Normally, a 2-digit number taken from the stack file name. For example, if the stack name is Stream05, the stream ID is 05.
  • Int Average Intensity.
  • Dev Deviation of the polynomial approximation from the average intensity. Valid only for large (80 frames or more) excited segments.
  • NL Noise Level within the segment. Based on the standard deviation of the actual intensity from the polynomial approximation (or the average intensity if no PA).
  • Tab delimited file The first line contains tab delimited text labels, the rest, data, one line per FRET Event. Line-to-line match with slide_events.dat.
  • Stream Stream ID. Normally, a 2-digit number taken from the stack file name. For example, if the stack name is Stream05, the stream ID is 05.
  • Rank Spot trace rank within the slide based on how event-rich the spot is. Lower number means richer spot.
  • DonCol Donor X-coordinate of the spot.
  • DonRow Donor Y-coordinate of the spot.
  • Start Start time of the FRET Event (ms).
  • Length Duration of the FRET Event (ms).
  • LDur Duration of the portion of the Donor Segment immediately preceding the FRET Event (ms).
  • LDInt Average Intensity of the Donor Segment on the left (same as donor_segments:Int).
  • LDDev Deviation of the polynomial approximation from the average intensity of the Donor Segment on the left (same as donor_segments:Dev).
  • LDNL Noise Level within the Donor Segment on the left (same as donor_segments:NL).
  • RDur Duration of portion of the Donor Segment immediately following the FRET Event (ms).
  • RDInt Average Intensity of the Donor Segment on the right (same as donor_segments:Int).
  • RDDev Deviation of the polynomial approximation from the average intensity of the Donor Segment on the right (same as donor_segments:Dev).
  • RDNL Noise Level within the Donor Segment on the right (same as donor_segments:NL).
  • the sequencing technology utilized for analysis in this application produces fluorescence events at multiple wavelengths in a large number of individual sequencing complexes (polymerase/template/primer/nucleotides).
  • the primary analysis centers around identifying positions of the individual sequencing complexes generally within a small viewing volume or field associated with an experimental sample. That is, the actual sample volume may be disposed over a fairly large area of a surface of a substrate or in a fairly large volume of a container and the system is adapted to only view a small volume or field of the actual sample volume.
  • the viewing field could be the entire small volume if the sample is sufficiently confined to restrict its overall volume.
  • the technology is adapted to follow fluorescence intensity at multiple wavelengths over time within the viewing volume, and to extract sequence information from the coordinated, time-dependent changes in fluorescence at each wavelength (base calling).
  • the imager used specifically in this application is a frame-based CCD camera
  • data acquisition can be considered a parallel array of single detectors, each monitoring one sequencing complex.
  • simultaneous sequencing is estimated at several hundred up to 1000 individual sequencing complexes
  • efficient use of computational resources is important, particularly where the goal is near real-time output. While the inventors have not yet needed to rely on parallel computing to produce results quickly, the technology lends itself to straightforward parallelization, such as pipeline or matrix processing.
  • Routines were implemented in C++ in conjunction with standard functions in MatLab as well as MPI libraries (Gropp et al., 1994).
  • the routines can be run on any acceptable computer operating system platform such as Windows, Linux, Macintosh OS X, or other windowing platforms.
  • Each sequencing complex produces fluorescence signals at multiple wavelengths or frequencies. Individual fluorophores produce signals in specific wavelength or frequency ranges or bands of the electromagnetic spectrum. Thus, each sequencing complex will include more than one fluorophore, at least one donor and at least one acceptor. Each wavelength band is independently monitored. In certain detection systems, the optical system splits the spectrum and directs various wavelength or frequency bands to different quadrants of a single CCD imager. Calibration is needed to determine pixel coordinates within each quadrant or data channel of the CCD that correspond to a single sequencing complex, i.e., the calibration permits the individual quadrants to be spatially correlated or registered—locations in one quadrant correspond to locations in the other quadrants.
  • the necessary transformation is primarily a translation operation; however, a small amount of rotation, caused by misalignments in the optical system, may also occur and require correction.
  • while translation and rotation are the major components of the calibration operation here, in other systems the calibration may have to correct for many other types of data distortion, such as twisting, stretching, compressing, skewing, etc.
  • the inventors have found that light emitted from each sequencing complex is generally localized within a single pixel or a small array of contiguous pixels within the frames, and that quadrant rotations of even a fraction of a degree are sufficient to mis-align pixel positions at the ends of the sensors. Additionally, small deviations in the optical system over time require that the system be calibrated on a daily basis.
  • the system can be calibrated more frequently if desired. While it is desirable to minimize these errors inherent in the hardware, the inventors believe that all systems will have some type of errors, such as alignment errors, that require calibration.
  • the inventors currently use a calibration program that adjusts translation and rotation of each image until multi-wavelength emitting fluorescent beads and/or grids (Molecular Probes) are brought into alignment.
  • Automated calibration routines are based on maximizing mutual information (MI; Viola and Wells, 1997; National Library of Medicine Insight toolkit).
  • the MI approach appears to work very well for data having small errors in alignment.
  • the inventors believe that the mutual information approach allows them to tweak the calibration using the fluorescence captured during sequencing itself, because the errors in alignment are small and develop slowly.
  • Using the actual sequencing data for registration should eliminate the need for a separate calibration step (i.e., with beads), and thus allow constant updating during sequencing, but is not absolutely necessary.
  • Fluorescence within the viewing field is continuously monitored by the CCD imager.
  • the first step in the analysis is to identify sequencing complexes within the viewing volume of the imaging device. Computationally, this process must be highly efficient because it is carried out for each pixel or data element in the imager (i.e., millions of pixel positions). Once the sequencing complexes are found, more complex and time consuming analyses can be directed at this subset of pixel positions.
  • the inventors have been successful using a simple averaging approach to identify potential sequencing complexes. By observing an image formed by averaging pixel intensity values over all the collected data frames or over a subset of the collected data frames, pixel locations that have fluorescence values greater than background fluorescence can be identified, particularly under conditions of static FRET. In situations where FRET is more dynamic, the inventors have found that this approach still works, but requires a running average over fewer frames.
  • the fluorescent signals are recorded by the CCD imager by counting the number of photons that arrive at a given pixel during a fixed integration time, an adjustable parameter of the imaging device.
  • Estimating the fluorescent state of each fluorophore in a sequencing complex requires two interrelated processes. First the instantaneous fluorescence intensity emitted in each band of the spectrum, donor fluorescence and acceptor fluorescence, must be extracted from background noise. Second, the fluorescence state must be estimated using this multi-band information (see below). It is clear at this point that there is considerable variance in the fluorescence intensity both from the coming together of the sequencing reagents and from instrumentation noise such as laser intensity fluctuations and camera readout noise.
  • the signals can be smoothed by standard techniques such as averaging, fast Fourier transform techniques, and wavelet techniques (Donoho and Johnstone, 1994; Cooley and Tukey, 1965; Frigo and Johnson, 1998).
  • the inventors have systematically characterized, or are systematically characterizing, the statistical properties of each of the noise sources. This characterization involves performing controlled experiments where each noise source, alone and in combination, is isolated as much as possible and characterized. These experiments are used to determine instrumentation noise and the characteristics of each of the fluorescent indicators. Next, controlled experiments are used to characterize dynamic spFRET.
  • FRET signatures are determined for different event types, such as true nucleotide incorporation events, mis-incorporation events, nonproductive binding events, random-collision FRET events, etc.
  • sample runs can be performed in the absence of donors.
  • for mis-incorporation events, the inventors observe samples where only a mismatched base is available.
  • for nonproductive binding events, reactions are performed under conditions in which incorporation cannot occur, e.g., in the presence of a 3′ dideoxy-terminated primer. Other similar controlled reaction conditions can be used to characterize other event types.
  • Signal estimation is the process of assigning a fluorescent state to each of the molecules of interest.
  • a molecule can be in the base state (non-emitting), the excited state (emitting), triplet blinking, or bleached. Additionally, the molecule may be in FRET with another fluorophore, or in partial FRET, where it transfers energy to another molecule but continues to emit light at a lower intensity level.
  • certain fluorophores emit light in more than one band of the spectrum. Under some conditions where the signal-to-noise ratio is relatively high, this assignment is easily accomplished.
  • the ability to assign the correct state of each of the fluorophores at each time point in a trace ultimately determines the sensitivity of the system and will determine whether specific sequencing strategies are feasible. For example, FRET efficiency decreases rapidly with distance. The maximum usable distance is that in which the fluorescence of the acceptor molecule can still be distinguished reliably from background noise.
  • model-based estimation routines, such as Kalman filtering, can be applied, where each sequencing complex is considered to be in one of a series of internal states.
  • a set of observables is defined (in this case fluorescence intensity of the various molecules).
  • the observables are also analyzed for how their values vary as a function of the internal state and how their values are influenced, corrupted or degraded by various noise sources.
  • the Kalman filter then produces a maximum likelihood estimate of the state of the model given the observables.
  • this filtering represents a powerful approach; it is well developed and has been applied to a variety of areas, from satellite position detection to stock market prediction.
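  • By way of illustration only, the following C++ sketch shows a minimal scalar Kalman filter of the general kind referred to above, tracking a slowly varying fluorescence intensity observed through additive measurement noise; the random-walk state model and the process/measurement variances q and r are assumptions of the sketch, not parameters of the system described herein.

```cpp
// Minimal scalar Kalman filter: random-walk state model, noisy observations.
#include <vector>

std::vector<double> kalman_track(const std::vector<double> &obs, double q, double r) {
    std::vector<double> est;
    est.reserve(obs.size());
    double x = obs.empty() ? 0.0 : obs[0];   // state estimate (fluorescence intensity)
    double p = 1.0;                          // estimate variance
    for (double z : obs) {
        p += q;                              // predict: state may drift between frames
        double k = p / (p + r);              // Kalman gain
        x += k * (z - x);                    // update with the new observation
        p *= (1.0 - k);
        est.push_back(x);
    }
    return est;
}
```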
  • the time-dependent changes in the states are then interpreted as or related to sequencing events occurring at the observed sequencing complexes. This interpretation depends on the specific configuration of reagents. For example, if an acceptor molecule on a labeled nucleotide travels into a FRET volume surrounding a donor, such as a donor-labeled enzyme, FRET may occur, where the FRET volume surrounding a donor is the volume in which a donor can transfer energy to an acceptor at a rate to be observed by the imaging system. Because of the nature of a FRET event, FRET events are characterized by a decrease in a donor fluorescent signal and a corresponding and simultaneous increase in an acceptor signal: the signals are anti-correlated.
  • This time-dependent pattern of fluorescence at different wavelengths may represent or be interpreted as an incorporation event. If the fluorescence data are relatively clean, this step is very straightforward. One simply looks for specific patterns in the fluorescence signals. However, depending on the signal-to-noise ratio, it may be difficult or impossible to decide whether a specific set of changes in fluorescence is just noise. Thus, the inventors developed a set of criteria based on studying sequencing reactions subjected to a set of specific controls so that each assignment is accompanied by a numerical indicator of confidence. Such criteria includes the strength or clarity of the FRET signal, and the specific base being incorporated (characteristic patterns and/or lifetimes associated with fluorescence throughout incorporation).
  • the process starts by looking for pixels in the donor channel or quadrant that have a ‘local maximum’ donor intensity value in an averaged image, an image formed from averaging all or some of the frames in a stack for a given slide. For every value a of a pixel located at [col,row] in the image, the process determines whether the value a is greater than or equal to adjacent pixel values, and greater than 0.95 times diagonal neighbor pixel values. The condition ‘greater than or equal to’ is chosen to resolve the situation when two or more adjacent pixels have equal intensity, then the first one is picked as a candidate.
  • the pixel at [col,row] is taken as a spot candidate. Because the number of candidates can be huge (typically around 3000 on a 360×360 overlay), several filters are applied to limit the number of spot candidates that are passed on for subsequent processing.
  • spot candidates on an overlay image are shown as large and small dots (large dots are green and small dots are blue and red in a color image).
  • the small dots represent candidates rejected by the stage 1 filter and by the stage 2 and 3 filters (blue and red, respectively).
  • the stage 1 filter estimates background noise level around each candidate pixel, then compares it to the pixel value a.
  • the stage 1 filter determines these levels by selecting the 15 least bright pixels in a 5×5 area [col-2,row-2 . . . col+2,row+2] and computing a mean c and a standard deviation na of their intensity distribution.
  • the signal to noise ratio (a-c)/na is a measure of how much a candidate pixel intensity value is above local background noise. If this ratio is less than a signal-to-noise threshold value, then the candidate is rejected.
  • the signal-to-noise threshold value is generally between about 1.5 and about 5. In certain embodiments, the signal-to-noise threshold value is 3.
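  • A hedged C++ sketch of the stage 1 test described above is shown below: local background statistics are taken from the 15 least bright pixels in the surrounding 5×5 area, and the candidate is kept only if (a−c)/na reaches the threshold. Image access and edge handling are assumptions made for the sketch.

```cpp
// Sketch of the stage 1 filter test: compare the candidate value to the local
// background estimated from the 15 least bright pixels of its 5x5 neighborhood.
// Assumes the candidate is at least 2 pixels away from the image border.
#include <algorithm>
#include <cmath>
#include <vector>

bool passes_stage1(const std::vector<std::vector<double>> &img, int col, int row,
                   double snThreshold = 3.0) {
    std::vector<double> area;
    for (int r = row - 2; r <= row + 2; ++r)
        for (int c = col - 2; c <= col + 2; ++c)
            area.push_back(img[r][c]);

    std::sort(area.begin(), area.end());
    area.resize(15);                              // keep the 15 least bright pixels

    double mean = 0.0;
    for (double v : area) mean += v;
    mean /= area.size();
    double var = 0.0;
    for (double v : area) var += (v - mean) * (v - mean);
    double na = std::sqrt(var / area.size());     // local background noise level

    double a = img[row][col];
    return na > 0.0 && (a - mean) / na >= snThreshold;
}
```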
  • Referring now to FIGS. 2a&b and 2a′&b′, the methodology for candidate pixel rejection and acceptance is shown.
  • candidate rejection is shown, where pixel candidates are rejected if their intensity values are below (less than) the signal-to-noise threshold value of 3 or equivalently, where the intensity a is below (less than) 3 na.
  • candidate acceptance is shown, where pixel candidates are accepted if their intensity values are greater than or equal to the signal-to-noise threshold value of 3 or equivalently where the intensity a is greater than or equal to 3 na.
  • a cross marks the candidate pixel in the left hand portion of the averaged image.
  • a gray square (blue in a color image) surrounds that candidate pixel and is a 5×5 surrounding pixel area. The 15 least bright pixels within the 5×5 surrounding pixel area are marked with dots (green in a color image).
  • the graph on the right in the figures plots the intensity distribution of the 15 selected pixels represented by the dots inside the square.
  • a gray area in the plot shows the standard deviation of the background noise level.
  • a black vertical line marks the mean value c of the distribution.
  • a dark grey vertical line (red in a color image) is 3 times standard deviation na (same as the threshold signal to noise ratio) away from the mean.
  • a light grey vertical line (green in a color image) is the intensity value a of the candidate pixel. If the light gray (green) line is to the left of the dark gray (red) line, the candidate is filtered out.
  • This filter typically eliminates about 2/3 of the pixel candidates, leaving about 1000 out of ~3000 spot candidates.
  • the inventors have found that about 3/4 of the remaining candidates also do not represent a true candidate.
  • this stage 1 filter is therefore not very efficient at candidate elimination.
  • the principal reason for the stage 1 filter's lack of robustness is that it uses a local noise level, computed on statistically insufficient data. Referring now to FIGS. 3 and 3′, an example of a “poor” spot candidate that passed through the stage 1 filter is shown.
  • the stage 2 filter was designed to compensate for the lack of robustness of the stage 1 filter.
  • the stage 2 filter works in a very similar way to the stage 1 filter.
  • the stage 2 filter uses a global noise level, which is an average avgna of the local noise levels na of all spot candidates from the previous step.
  • the graph shows a horizontal slice of the overlay area around the candidate pixel shown in FIGS. 3 and 3 ′.
  • the dark grey (green in a color image) bars represent pixel intensity values around and including the candidate pixel, which is the middle bar.
  • a black horizontal line marks a local ‘zero’ level, the mean c of the intensity distribution of low-intensity pixels, which passes through most of the bars.
  • the gray area with the black horizontal line centered in the middle represents the global noise level avgna, an average of the standard deviations na derived from all the spot candidates as explained above.
  • a bell curve (green in a color image) represents an estimated intensity model of the spot candidate, having its maximum at the brightest (middle) pixel. The maximum is also shown as a horizontal (green in a color image) line touching the top of the bell curve.
  • a dark (red in color image) line represents a level of minc times avgna, where minc is a parameter having value between about 3 and about 12. In certain embodiments, the parameter is 7.
  • a light gray (brown in color image) line represents a level of doubt times avgna, where doubt is a parameter having value between about 5 and about 20. In certain embodiments, the parameter is 12.
  • the signal to noise (SN) ratio is re-computed for every spot candidate as (a − c)/avgna. If the candidate SN ratio is below (less than) the value of minc, the candidate is rejected. If the candidate SN ratio is greater than or equal to the value of doubt, the candidate is accepted with no further checking. If the candidate SN ratio is in between the values of minc and doubt, which typically happens in approximately 50 to 100 cases, the candidate is passed on to the stage 3 filter.
  • stage 2 filter effectively eliminates almost all candidate pixels found by the inventors to not represent spots for further analysis, leaving only good spots (typically, 250 out of ⁇ 1000) with a relatively small amount of doubtful spot candidates.
  • the stage 3 filter is applied only to the doubtful spot candidates from the stage 2 filter.
  • the stage 3 filter starts by computing a more precise spot model by best-fitting spot pixel intensities in the 5×5 area [col-2,row-2 . . . col+2,row+2] according to the formula:
  • I(col,row) = C + A*exp(-((col-Xm)^2 + (row-Ym)^2)/R^2)
  • C, A, Xm, Ym, and R are computed so as to satisfy the least squares condition.
  • the adjusted signal to noise ratio A/avgna is then compared to the value of a parameter minc2 (ranging from about 5 to about 12, and in certain embodiments having the value 9 ). If the adjusted signal to noise ratio A/avgna is below (less than) minc2, then the doubtful spot candidate is finally rejected.
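  • The following C++ sketch is an illustration of the stage 3 spot model: it evaluates the Gaussian-plus-offset formula above, accumulates the sum-of-squares residual over the 5×5 area, and applies the A/avgna test; the least-squares minimizer that actually adjusts C, A, Xm, Ym and R is assumed to exist elsewhere and is not shown.

```cpp
// Sketch of the stage 3 spot model and the adjusted signal-to-noise decision.
#include <cmath>
#include <vector>

struct SpotModel { double C, A, Xm, Ym, R; };

inline double model_intensity(const SpotModel &m, double col, double row) {
    double dx = col - m.Xm, dy = row - m.Ym;
    return m.C + m.A * std::exp(-(dx * dx + dy * dy) / (m.R * m.R));
}

// Sum of squared residuals over the 5x5 area centered on [col,row]; a least-squares
// minimizer (not shown) would adjust the model parameters to minimize this value.
double spot_residual(const SpotModel &m, const std::vector<std::vector<double>> &img, int col, int row) {
    double ss = 0.0;
    for (int r = row - 2; r <= row + 2; ++r)
        for (int c = col - 2; c <= col + 2; ++c) {
            double d = img[r][c] - model_intensity(m, c, r);
            ss += d * d;
        }
    return ss;
}

// Stage 3 decision: the doubtful candidate is kept only if A/avgna reaches minc2.
bool passes_stage3(const SpotModel &fitted, double avgna, double minc2 = 9.0) {
    return fitted.A / avgna >= minc2;
}
```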
  • Referring now to FIGS. 5a-c and 5a′-c′, the stage 3 filter is depicted graphically.
  • a spot candidate that passed through the stage 1 filter is shown
  • FIGS. 5 b and 5 b ′ a spot candidate that passed through the stage 2 filter is shown.
  • the best-fitted pixel intensity model is shown as a bell curve (blue in a color image) used in a stage 3 filter rejection.
  • a horizontal line (green in a color image) represents the maximum intensity of the model; the line contacts the top of the bell curve.
  • a dark (red) horizontal line represents the level of minc2 times avgna. If the model maximum line is below the dark minc2 times avgna line, the spot candidate is finally rejected.
  • the stage 3 filter typically eliminates 10 to 20 percent of the doubtful spot candidates.
  • the remaining spot candidate objects are stored in an array and returned to the caller. They are shown as green dots on the FIGS. 1 and 1 ′.
  • potential donor candidates are sometimes not identified due to averaging over too large a set of frames in a stack. This missing of potential donors is especially apparent when the averaging is performed over all frames of a stack.
  • the potential reason for missing acceptable candidates is that certain active sequencing complexes may not have donors with detectable lifetimes that span all of the frames, or a significant fraction of the total frames, in an average over too large a frame set. Thus, these potential donor candidates generally have shorter lifetimes, and the average donor intensity is consequently too low for the site to be selected as a donor candidate.
  • a dynamic binning process (adjusting the number of frames to average over) was implemented to determine whether the process changed the number of donor candidates.
  • the user enters the number of the bins as a parameter, e.g., 1, 2, 4, 8 and 10 as number of bins.
  • the parameter is modifiable based on the observed experimental donor lifetimes results.
  • the inventors found an increase in the number of the donor candidates. The inventors also found that the number of candidates increased with decreasing binning number.
  • the process generates multiple average images requiring consolidation of the donor spots.
  • the spot find process I is applied to identify initial spots.
  • the process performs voting of the donor spots. Voting involves adding the binary value associated with each spot across the averaged images, and that value is stored in the new master image. For example, if the stack includes 1000 frames, which were imaged in 250-frame bins, then the voting would have a maximum value of 4 for each spot and a minimum value of 1.
  • FIG. 6 a depicts pixel values after voting over average donor images.
  • a neighborhood criterion is then applied to obtain a consolidated donor image. All pixels which have a value greater than or equal to 1 are considered donor candidates.
  • the spots with highest votes are selected, with consecutive selections proceeding on decreasing vote values. Any donor candidate within the 3 ⁇ 3 neighborhood of a previously selected candidate is rejected. This is a recursive operation performed until all pixels with votes greater than or equal to 1 (donor candidates) have been considered. In the case of a tie in vote value, the pixel with higher intensity is selected as a donor spot.
  • the process identifies both single spots and grouped spots. Only the grouped spots undergo the consolidation operation.
  • FIG. 6 b depicts single spot selection in an average donor image after voting
  • FIG. 6 c depicts a snapshot of grouped spots after voting and selection of the donor pixel.
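  • The following C++ sketch illustrates the vote-based consolidation just described: candidates are visited in order of decreasing vote, ties are broken by intensity, and any candidate inside the 3×3 neighborhood of an already selected donor is rejected; the data layout is an assumption made for the sketch.

```cpp
// Sketch of consolidating donor candidates by vote, with 3x3 neighborhood exclusion.
#include <algorithm>
#include <cstdlib>
#include <vector>

struct DonorCandidate { int col, row, votes; double intensity; };

std::vector<DonorCandidate> consolidate_by_votes(std::vector<DonorCandidate> cands) {
    std::sort(cands.begin(), cands.end(), [](const DonorCandidate &a, const DonorCandidate &b) {
        if (a.votes != b.votes) return a.votes > b.votes;   // higher vote first
        return a.intensity > b.intensity;                   // tie: brighter pixel first
    });

    std::vector<DonorCandidate> selected;
    for (const DonorCandidate &c : cands) {
        if (c.votes < 1) continue;                          // only pixels with at least one vote
        bool nearSelected = false;
        for (const DonorCandidate &s : selected)
            if (std::abs(c.col - s.col) <= 1 && std::abs(c.row - s.row) <= 1) {
                nearSelected = true;                        // inside a selected 3x3 neighborhood
                break;
            }
        if (!nearSelected)
            selected.push_back(c);
    }
    return selected;
}
```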
  • Dynamic thresholding is an alternate process for identifying or finding spots (pixel locations for which fluorescence is above background and which may represent active sequencing complexes).
  • the pre-selection stage of the selection of donor candidates sometimes overestimates the donors and can be seen as redundant.
  • initial donor candidates can be estimated by computing a dynamic threshold. The user can enter the number of expected donors (the default is set to an experimentally obtained value). Using histogram analysis, the brightest spots on the image are selected using intensity information as shown in FIG. 7.
  • An accurate threshold value is generally determined from the intensity data alone, but can also be based on intensity and lifetime data.
  • Thresholding is a global operation and may result in donor candidates that are actually within the closed 3×3 neighborhood of a previously identified donor candidate.
  • the candidate identification process keeps track of single spots and grouped spots or clusters by using morphological operations (single pixels in the 3×3 neighborhood matrix are separated from grouped pixels).
  • FIG. 6 b and FIG. 6 c depict single spots and grouped spots identified after voting, and the selected donor pixels after consolidation.
  • the process uses an approach similar to the approach used for consolidation of donors as described before, where the process analyzes the distance (3×3 neighborhood information) between candidates, votes, and intensity information.
  • the thresholding gives rise to several instances of a donor candidate within the 3×3 neighborhood of another donor candidate. These occurrences are resolved into real donor candidates using vote and intensity information as discriminators.
  • For every spot (donor) at [col,row], the process selects the nine brightest pixels for the donor signal and up to eight pixels around the nine brightest pixels as donor noise data. First, the process sorts the pixels in a 7×7 area [col-3,row-3 . . . col+3,row+3] surrounding the spot by decreasing intensity. Then, the process selects nine (9) pixels in a 3×3 array or area [col-1,row-1 . . . col+1,row+1] with the candidate pixel in the middle of the 3×3 area. After that, the process selects up to eight (8) pixels having the lowest intensity from the set of pixels outside of the 3×3 array, i.e., in the remaining part of the 7×7 area, as noise pixels for each 3×3 array including a bright pixel.
  • the method was tuned using 3×3 and 7×7 arrays, but the method can work equally well with larger and smaller arrays, n×n and n×m arrays where m and n are integers and m>n, with the array size being a function of the detection system and the system being detected.
  • the donor quadrant coordinates [col,row] are transformed into acceptor quadrant coordinates [colA,rowA] by applying the coordinate transform obtained from the calibration data. That is, the data in the acceptor channels are transformed by the calibration transform so that locations in the acceptor channels correspond to locations in the donor channel. Then, the nine (9) pixels in a 3×3 area or array including the pixel location [colA,rowA] in the acceptor channel corresponding to each selected donor pixel location [col,row] are selected as candidates from the acceptor channel. Because at this stage of the analysis there is no way to discriminate a priori between good and poor acceptor pixels, all nine pixels are selected in the 3×3 array including the donor-corresponding acceptor pixel. The coordinates of the acceptor noise pixels are obtained by applying the coordinate transform to the donor noise pixels.
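  • A hedged C++ sketch of the per-spot pixel selection described above appears below: the 3×3 block around the donor candidate supplies the 9 signal pixels, and up to 8 of the dimmest pixels in the remainder of the 7×7 area supply the noise pixels; image access and border handling are assumptions made for the sketch.

```cpp
// Sketch of selecting the 9 donor signal pixels and up to 8 noise pixels per spot.
#include <algorithm>
#include <cstdlib>
#include <vector>

struct Pixel { int col, row; double intensity; };

void select_spot_pixels(const std::vector<std::vector<double>> &img, int col, int row,
                        std::vector<Pixel> &signal, std::vector<Pixel> &noise) {
    std::vector<Pixel> outer;
    for (int r = row - 3; r <= row + 3; ++r)
        for (int c = col - 3; c <= col + 3; ++c) {
            Pixel p{c, r, img[r][c]};
            if (std::abs(c - col) <= 1 && std::abs(r - row) <= 1)
                signal.push_back(p);           // 3x3 block around the spot: signal pixels
            else
                outer.push_back(p);            // remainder of the 7x7 area: noise pool
        }
    std::sort(outer.begin(), outer.end(),
              [](const Pixel &a, const Pixel &b) { return a.intensity < b.intensity; });
    for (size_t i = 0; i < outer.size() && i < 8; ++i)
        noise.push_back(outer[i]);             // up to 8 of the dimmest pixels as noise pixels
}
```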
  • FIGS. 9 a - d and 9 a ′-d′ four examples of the initial pixel selection methodology are depicted graphically.
  • an inner square (green in a color image) delimits the 3×3 area [col-1,row-1 . . . col+1,row+1] from which the 9 donor signal pixels are selected.
  • An outer square (blue in a color image) delimits the 7×7 area [col-3,row-3 . . . col+3,row+3] from which the 8 donor noise pixels are selected, shown as gray dots (cyan dots in a color image).
  • dark dots (red in a color image) represent the 9 selected acceptor pixels in acceptor channel 1, and gray dots represent the 8 selected acceptor 1 noise pixels.
  • the dark dots likewise represent the 9 selected acceptor pixels in acceptor channel 2, and gray dots represent the 8 selected acceptor 2 noise pixels.
  • the exact locations of the acceptor pixels are determined by applying the calibration transformation derived from the calibration routines.
  • the process reads the stack file again, frame by frame, and collects individual pixel traces, i.e., data associated with a given pixel location in each frame through all the frames in the entire stack or that portion of the stack that includes potentially relevant sequencing data.
  • the candidates would represent pixels that have values above a threshold.
  • the candidates would represent pixels that have values above a threshold as well, but the average would be over less than all the frames.
  • the candidate signals may extend from one bin to the next bin, so the trace would extend until the relevant data is collected into the trace.
  • Every signal trace can be considered as a useful signal to which an amount of random (chaotic) noise is added.
  • the zero-point of the signal intensity can be defined as the mean of the noise intensity distribution. This zero-point is not constant as it has been found to slowly change over time.
  • This slowly changing portion of the intensity is computed as a polynomial approximation (using a least squares fitting approach) of the averaged noise trace, which is a simple arithmetic average of all noise pixel traces in a channel. Although least squares fitting has been used, other fitting approaches can be used as well as a hi-pass filter for the pixel traces. The value of the approximating polynomial is then subtracted from every individual pixel trace in a channel to remove this slowly varying noise.
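  • The following C++ sketch illustrates the hi-pass step just described: the noise pixel traces are averaged, a low-degree polynomial is fit to the average by least squares, and the polynomial value is subtracted from every pixel trace. The polynomial degree (2 here) and the plain normal-equation solver are assumptions chosen to keep the sketch short.

```cpp
// Sketch of the polynomial hi-pass filter: fit the averaged noise trace and
// subtract the fitted slow trend from every individual pixel trace.
#include <cmath>
#include <vector>

// Least-squares polynomial fit via normal equations; returns coef[k] for t^k.
static std::vector<double> polyfit(const std::vector<double> &y, int degree) {
    int n = (int)y.size(), m = degree + 1;
    std::vector<double> S(2 * degree + 1, 0.0), b(m, 0.0), coef(m, 0.0);
    for (int t = 0; t < n; ++t) {
        double p = 1.0;
        for (int k = 0; k <= 2 * degree; ++k) {
            S[k] += p;
            if (k <= degree) b[k] += p * y[t];
            p *= t;
        }
    }
    std::vector<std::vector<double>> A(m, std::vector<double>(m));
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < m; ++j) A[i][j] = S[i + j];
    for (int i = 0; i < m; ++i) {                     // Gaussian elimination with partial pivoting
        int piv = i;
        for (int r = i + 1; r < m; ++r)
            if (std::fabs(A[r][i]) > std::fabs(A[piv][i])) piv = r;
        std::swap(A[i], A[piv]);
        std::swap(b[i], b[piv]);
        for (int r = i + 1; r < m; ++r) {
            double f = A[r][i] / A[i][i];
            for (int c = i; c < m; ++c) A[r][c] -= f * A[i][c];
            b[r] -= f * b[i];
        }
    }
    for (int i = m - 1; i >= 0; --i) {                // back substitution
        double s = b[i];
        for (int j = i + 1; j < m; ++j) s -= A[i][j] * coef[j];
        coef[i] = s / A[i][i];
    }
    return coef;
}

void hi_pass_filter(std::vector<std::vector<double>> &traces,
                    const std::vector<std::vector<double>> &noiseTraces, int degree = 2) {
    if (noiseTraces.empty()) return;
    size_t len = noiseTraces[0].size();
    std::vector<double> avg(len, 0.0);                // average of all noise pixel traces
    for (const auto &nt : noiseTraces)
        for (size_t t = 0; t < len; ++t) avg[t] += nt[t] / noiseTraces.size();
    std::vector<double> coef = polyfit(avg, degree);
    for (auto &tr : traces)                           // subtract the slow trend from every trace
        for (size_t t = 0; t < tr.size(); ++t) {
            double p = 1.0, val = 0.0;
            for (double c : coef) { val += c * p; p *= t; }
            tr[t] -= val;
        }
}
```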
  • FIGS. 10 a - d and 10 a ′-d′ the operation of the hi-pass filter is graphically illustrated.
  • the noise pixel traces are averaged into a single averaged noise trace (top graph), then its polynomial approximation is computed using a least squares algorithm.
  • the value of the polynomial is subtracted from every individual pixel trace.
  • the value of the approximating polynomial is subtracted from donor signal pixels as shown in the top graph with the result of the subtraction shown in the bottom graph.
  • the horizontal line (blue in a color image) represents the zero-level, the mean of the background noise intensity distribution for the donor data.
  • FIGS. 10 c and 10 c ′ the noise pixel traces from an acceptor channel are averaged into a single averaged noise trace shown in the top graph.
  • its polynomial approximation is subtracted from every individual acceptor pixel trace.
  • FIGS. 10 d and 10 d ′ the value of the approximating polynomial is subtracted from acceptor signal pixels as shown in the top graph with the result of the subtraction shown on the bottom graph.
  • the horizontal line (blue in a color image) represents the zero-level, the mean of the background noise intensity distribution for the acceptor data.
  • This procedure is performed separately on the traces from each channel, donor and acceptors. As a result, for every identified spot object, a set of channel objects is created. Every channel object contains 9 signal pixel traces, and up to 8 noise pixel traces that were picked from around the signal pixels. Not all of the 9 signal traces are retained in the final data output, since not all of them contain useful signal information. Lower intensity signal traces are eliminated by subsequent processing of donor and acceptor pixel selection methodology described herein.
  • a pixel trace set typically includes 9 signal pixel traces and up to 8 noise pixel traces.
  • the process described below constructs single hybrid traces from the donor channel and from each acceptor channel for every spot.
  • the hybrid traces are constructed to optimize or maximize the signal to noise ratio of the data from every channel.
  • Every individual donor pixel trace is smoothed with a Smart Smoother as described below, then compared to the noise level in order to determine segments where the signal goes above the noise level (lifetime).
  • the noise level NL is computed as a square root of a square average of all noise samples across all noise pixel traces, assuming that the mean of the noise intensity distribution is zero after application of the hi-pass filter.
  • a score of every pixel trace is computed as an average of original (non-smoothed) data during the lifetime. If the lifetimes of individual traces differ significantly, the traces with short lifetimes (shorter than half of the longest lifetime in the set) are rejected.
  • the remaining traces are sorted by score. Then those traces having a score higher than half of the highest score are selected for averaging into the hybrid trace. However, if the number of traces having a score greater than half the highest score is greater than 5, then only the five traces with the highest scores are selected.
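  • The following C++ sketch illustrates the selection logic just described, assuming the per-trace lifetime (in frames) and score (mean original intensity during the lifetime) have already been computed; everything beyond the stated thresholds is an assumption of the sketch.

```cpp
// Sketch of selecting which donor pixel traces contribute to the hybrid trace.
#include <algorithm>
#include <vector>

struct TraceInfo { int index; int lifetime; double score; };

std::vector<int> select_donor_traces(std::vector<TraceInfo> traces) {
    int longest = 0;
    for (const TraceInfo &t : traces) longest = std::max(longest, t.lifetime);

    // Reject traces whose lifetime is shorter than half of the longest lifetime.
    traces.erase(std::remove_if(traces.begin(), traces.end(),
                                [&](const TraceInfo &t) { return t.lifetime * 2 < longest; }),
                 traces.end());
    if (traces.empty()) return {};

    // Sort by score; keep traces scoring above half of the highest score, at most five.
    std::sort(traces.begin(), traces.end(),
              [](const TraceInfo &a, const TraceInfo &b) { return a.score > b.score; });
    std::vector<int> selected;
    for (const TraceInfo &t : traces) {
        if (selected.size() >= 5) break;
        if (t.score * 2 >= traces.front().score) selected.push_back(t.index);
    }
    return selected;    // indices of traces averaged into the hybrid donor trace
}
```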
  • donor pixel selection process is illustrated graphically.
  • the figure includes an overlaid data image and ten panels that include pixels traces.
  • the nine bottom panels show the individual donor pixel traces in the 3 ⁇ 3 donor pixel array.
  • the traces that do not include solid segment lines below the trace represent traces rejected by the analysis and are not used in producing the average donor trace shown in the top panel.
  • the rejected donor pixels are shown as dots in the pixel image box.
  • Each trace having a solid segment line below the trace is graphed with its original, non-smoothed data (light green in a color image) shown as fine line about a solid thicker line (dark green in a color image) representing its smoothed data generated using the Smart Smoother of this invention.
  • the horizontal bars (green in a color image) below the accepted traces are the lifetime segments used in calculating the hybrid donor trace.
  • the top panel in the figure is the hybrid trace, an average of the selected traces.
  • the gray horizontal strip centered about a zero line evidences the final noise level, computed as the standard deviation centered at 0 of the hybrid noise trace.
  • the solid bar (green in a color image) underneath the trace shows the donor's hybrid lifetime.
  • the overlaid data image shows the spatial position of the donor signal pixels and noise pixels.
  • the selected traces are shown as large boxes, while rejected traces are shown as small boxes. In this example, four traces were selected and five traces were rejected.
  • An equal number of noise traces randomly picked from the 8 available are averaged into a single hybrid noise trace. From this averaged noise, the final noise level is computed as the standard deviation from 0 of the hybrid noise pixels.
  • (1) a lifetime LT representing the number of data samples (frames) above the noise level (convertible to seconds by multiplying by the time between samples),
  • (2) an average donor intensity during the lifetime, Int, and (3) a donor signal to noise ratio S/N, computed as Int/NL.
  • the rejection criteria are based on the computed average lifetime and the signal to noise ratio computed during the donor lifetime, compared to the configurable minima of these values.
  • the minimum lifetime is contained in the parameter bad_lifetime, which is adjustable and is currently set to 20 data samples or frames, and the signal to noise minimum is the parameter designated bad_dsn, which is also adjustable and is currently set to 1.5.
  • the configurable minima were chosen based on empirical evidence that it is practically impossible to reliably detect anything at all in traces that do not meet these criteria.
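  • A minimal sketch of the donor-trace statistics and the configurable rejection minima discussed above is given below; it assumes the hybrid trace is a NumPy array and reuses the parameter names bad_lifetime and bad_dsn with the currently preferred (adjustable) values of 20 and 1.5.

```python
import numpy as np

def donor_statistics(hybrid_trace, nl, bad_lifetime=20, bad_dsn=1.5):
    """Lifetime LT, average intensity Int and S/N for a hybrid donor trace,
    followed by the configurable rejection minima described above."""
    trace = np.asarray(hybrid_trace, dtype=float)
    above = trace > nl
    lt = int(above.sum())                       # samples (frames) above the noise level
    intensity = float(trace[above].mean()) if lt else 0.0
    snr = intensity / nl if nl > 0 else 0.0     # S/N = Int / NL
    accepted = lt >= bad_lifetime and snr >= bad_dsn
    return {"LT": lt, "Int": intensity, "S/N": snr, "accepted": accepted}
```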
  • the discrimination between good and not-so-good acceptor pixel traces is trickier, because the acceptor signals are typically short and weak.
  • the inventors currently use two competing methods to analyze the acceptor signals. These two methods can and often do produce different results. The inventors then use special logic to choose the method that yields the best results.
  • the first method is an intensity-based method and was optimized to detect long-living events.
  • the method applies a Smart Smoothing routine (described below) to each pixel trace, then computes lifetimes as segments in the acceptor traces, where the smoothed data values are above the noise level.
  • the method then assigns a score to the computed lifetimes as the ratio of standard deviation during lifetime to standard deviation outside lifetime.
  • FIG. 12 a shows the score scaled by the factor 1000 next to each pixel trace.
  • the factor 1000 is chosen solely for presentation; it has no meaning in the application of the method.
  • the traces are then sorted by score in descending order, and a cut-off value is defined as half the average of the two highest scores.
  • the cut-off at 50% is chosen because adding lower intensities to the final hybrid trace does not improve signal to noise ratio, which has been confirmed experimentally on both simulated and real data.
  • the traces that have lower scores are rejected.
  • An additional routine is applied to check whether the lifetimes of individual traces match each other at least half of the time. If the lifetime of a trace has a significant (more than 50% of the longest lifetime) mismatch with the others, the trace is also rejected.
  • a spatial configuration of the pixel cluster is checked to ensure that non-adjacent pixels were not included in the cluster, because non-adjacent pixels cannot be from the same replication or sequencing complex.
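  • The scoring and cut-off logic of the intensity-based acceptor method can be sketched as follows. This is an illustrative simplification, not the actual implementation: the lifetime-mismatch check and the spatial adjacency check described above are omitted, and the function names are assumptions.

```python
import numpy as np

def score_acceptor_trace(raw, lifetime_mask):
    """Intensity-based score: standard deviation during the lifetime divided by
    the standard deviation outside the lifetime."""
    raw = np.asarray(raw, dtype=float)
    mask = np.asarray(lifetime_mask, dtype=bool)
    inside, outside = raw[mask], raw[~mask]
    if inside.size < 2 or outside.size < 2 or outside.std() == 0:
        return 0.0
    return float(inside.std() / outside.std())

def select_acceptor_traces(scores):
    """Keep traces whose score exceeds half the average of the two highest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if len(order) < 2:
        return order
    cutoff = 0.5 * (scores[order[0]] + scores[order[1]]) / 2.0
    return [i for i in order if scores[i] > cutoff]
```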
  • the intensity-based acceptor pixel selection method is illustrated graphically.
  • the nine bottom graphs show individual acceptor pixel traces.
  • the grayed graphs are the traces that have been rejected by the logic.
  • the top (green) graph shows the donor hybrid trace, and the graph right below it shows the hybrid acceptor trace obtained by averaging the selected (non-grayed) individual acceptor pixel traces.
  • the overlay picture shows the spatial location of all nine candidates, the selected pixels shown in bold, and the individual noise pixels.
  • In FIG. 12 b, a derivative-based acceptor pixel selection process is illustrated graphically.
  • the graphs below the time line show individual acceptor pixel traces.
  • the grayed one(s) have been rejected, and did not contribute to the average (red) graph at the top.
  • Below each graph, the product of its derivative and the donor's derivative is shown.
  • the green graph at the top is the hybrid donor signal.
  • the logic checks whether the intensity-based method has produced satisfactory results. That means it has detected one or more acceptor lifetime segments comparable in duration to the S-G parameters nL and nR, and the signal to noise ratio of these segments is higher than a minimal signal to noise ratio, which can range from about 1.5 to about 2 (the current preferred value is 0.7). If the above conditions are not met, the logic applies the derivative-based algorithm. Finally, the logic averages the selected acceptor traces into a single hybrid trace, then averages an equal number of noise traces to create a hybrid acceptor noise channel, which is expected to have a compatible noise level.
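  • The choice between the two competing acceptor methods and the construction of the hybrid acceptor and noise traces might be organized as in the sketch below. The callables intensity_select and derivative_select, and the defaults for nL, nR and min_snr, are placeholders for illustration only and are not asserted to match the actual software.

```python
import numpy as np

def build_acceptor_hybrid(pixel_traces, noise_traces, intensity_select,
                          derivative_select, nL=10, nR=10, min_snr=1.5):
    """Try the intensity-based selection first; fall back to the derivative-based
    (DAC) selection if the result looks unsatisfactory; then build hybrid traces.

    intensity_select is assumed to return (selected indices, lifetime segments,
    segment S/N); derivative_select returns selected indices only.
    """
    selected, segments, seg_snr = intensity_select(pixel_traces)
    satisfactory = (len(selected) > 0 and len(segments) > 0
                    and all(len(seg) >= min(nL, nR) for seg in segments)
                    and seg_snr >= min_snr)
    if not satisfactory:
        selected = derivative_select(pixel_traces)
    if not selected:
        return None, None

    hybrid = np.mean([np.asarray(pixel_traces[i], dtype=float) for i in selected], axis=0)
    # Average an equal number of noise traces so that the hybrid acceptor noise
    # channel has a comparable noise level.
    hybrid_noise = np.mean([np.asarray(t, dtype=float)
                            for t in noise_traces[:len(selected)]], axis=0)
    return hybrid, hybrid_noise
```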
  • the result may be saved into a signal file in the following format:
  • stack_name: file name of the stack file (normally, without extension);
  • stack_directory: path to the directory of the stack file;
  • nsamples: number of data samples in every trace, equal to the number of frames in the stack file;
  • donCol, donRow: coordinates of the central donor pixel;
  • spotname: trace name, one of the following:
  • col, row: represents the coordinates of the signal center pixel.
  • the parameter mask is a bit mask that shows which of the 9 pixels in the 3 ⁇ 3 area around the center pixel have contributed to the cumulative signal. Bit 0 is set when the pixel at (col-1,row-1) has been selected, bit 1 for (col,row-1), and so on.
  • the value is a hexadecimal sum of one or more bit values represented in the table below.
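  • The mask encoding follows directly from the bit assignment described above (bit 0 for (col−1,row−1), bit 1 for (col,row−1), and so on, presumably continuing in row-major order across the 3×3 area). A small illustrative sketch, with hypothetical helper names encode_mask and decode_mask:

```python
def encode_mask(selected):
    """Encode selected pixels of the 3x3 area as a 9-bit mask.

    selected: iterable of (dcol, drow) offsets relative to the center pixel,
    each offset in {-1, 0, +1}.  Bit 0 is (-1,-1), bit 1 is (0,-1), and so on
    in row-major order, as described above."""
    mask = 0
    for dcol, drow in selected:
        bit = (drow + 1) * 3 + (dcol + 1)
        mask |= 1 << bit
    return mask

def decode_mask(mask):
    """Return the list of (dcol, drow) offsets whose bits are set in the mask."""
    return [(bit % 3 - 1, bit // 3 - 1) for bit in range(9) if mask & (1 << bit)]

# Example: center pixel plus the pixel directly above it -> bits 4 and 1 -> 0x12.
assert encode_mask([(0, 0), (0, -1)]) == 0x12
assert decode_mask(0x12) == [(0, -1), (0, 0)]
```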
  • a signal can be considered as transitioning between a digital zero state and a digital unit state, i.e., transitioning between 0 and 1. While the digital zero level can be established fairly well by examining the noise channel, the digital unit level poses a problem, because it is not stable.
  • For acceptor channels, the task is relatively easy and straightforward, because the acceptors are normally at their zero level, which is well established and fixed by the hi-pass filter. That is, the acceptors are in a dark state unless or until they receive sufficient energy from a source to fluoresce. Although some background acceptor emissions are seen, the principal pathway to acceptor fluorescence is via energy transfer from an excited donor as the sample is being irradiated with light that only the donor can accept. Therefore, the process simply assumes that an acceptor is at zero level as long as its intensity does not go above the noise level.
  • the donor data is more difficult to digitize.
  • the donor signal can be on—it is being irradiated by a light source on a continuous basis.
  • the donor can be transferring energy to an acceptor.
  • the donor can inter-system cross from a singlet manifold to a triplet manifold, which is observed experimentally as blinking.
  • the donor can non-radiatively lose excitation energy, also observed as blinking.
  • the donor can temporarily photobleach or permanently photobleach. Additionally, the donor intensity has been found to fluctuate around its unit level and its unit level has been found not to remain constant over time. Thus, this routine is designed to find donor unit levels at different moments in time.
  • the process breaks down the entire donor signal into segments on which no swift or rapid changes occur. This segmentation of the signal is done by computing the signal's derivative and finding its outstanding extrema, that is, where the derivative goes above or below 1.2 times its own standard deviation. The factor of 1.2 was experimentally established to give the best overall results, but the parameter can range from about 0.8 to about 2.0. Every such extremum defines a segment boundary. The area between two consecutive extrema is a segment. At this point, there are too many segments, and most of them are too small.
  • the bottom portion graphs the derivative of the donor signal (red in a color image).
  • the gray area denotes 1.2 times its standard deviation as evidence of the noise level associated with the signal.
  • the vertical lines (cyan in a color image) in the bottom graph mark boundaries of the segments derived by application of the routine onto the data trace.
  • the top portion graphs the donor signal; the raw signal is shown in light gray (light green in a color image) and the smoothed signal is shown in dark gray (dark green in a color image).
  • the gray area denotes 1.2 times its standard deviation as evidence of the noise level associated with the signal.
  • the straight line graph (dark blue in a color image) plotted through the raw and smoothed data shows averaged intensities for the segments.
  • For every segment, the method computes two parameters.
  • the parameters are the segment length, or temporal duration, and the average intensity of the signal in that segment. These two parameters are then used to decide whether one or more adjacent segments should be joined into a single larger segment. This joining is typically done when two adjacent segments have close average intensities.
  • the term “close average intensities” means that adjacent segments have intensity values that differ by between 1 and 2 times the noise level. In certain embodiments, the term “close average intensities” means that the adjacent segments have intensity values that differ by less than 1.4 times the noise level. Segments are also joined if a small data segment is interposed between two relatively long segments. Generally, a small data segment is a segment that extends over fewer than 40 frames or data samples.
  • the routine joins two segments if an intervening segment has a duration between about 20 and about 40 data samples. In other embodiments, the routine joins two segments if an intervening segment has a duration of about 30 data samples.
  • the routine considers segments separated by a short segment relatively long for the purpose of segment joining if the segments on each side of the short segment have durations or lengths 1 to 2 times larger than the short segment. In certain embodiments, the two segments on each side of the short segment have durations or lengths 3 to 4 times larger than the short segment.
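  • A simplified sketch of the segmentation and segment-joining steps described above is given below. It is illustrative only: the boundary factor k defaults to 1.2, the closeness factor to 1.4 times the noise level, and the joining rule for short intervening segments is reduced to a single-neighbor test rather than the two-sided test described above.

```python
import numpy as np

def segment_donor(signal, k=1.2):
    """Split a donor trace at outstanding derivative extrema.

    A boundary is placed wherever the derivative rises above or falls below
    k times its own standard deviation (k defaults to 1.2, adjustable ~0.8-2.0).
    Returns a list of (start, end) index pairs covering the whole trace.
    """
    signal = np.asarray(signal, dtype=float)
    deriv = np.gradient(signal)
    threshold = k * deriv.std()
    boundaries = np.flatnonzero(np.abs(deriv) > threshold).tolist()
    edges = sorted(set([0] + boundaries + [len(signal)]))
    return [(a, b) for a, b in zip(edges[:-1], edges[1:]) if b > a]

def join_segments(segments, signal, nl, close_factor=1.4, short_len=40):
    """Merge adjacent segments with close average intensities (difference below
    close_factor times the noise level) and absorb short intervening segments."""
    if not segments:
        return []
    signal = np.asarray(signal, dtype=float)
    merged = [segments[0]]
    for start, end in segments[1:]:
        p_start, p_end = merged[-1]
        close = abs(signal[start:end].mean() -
                    signal[p_start:p_end].mean()) < close_factor * nl
        short = (end - start) < short_len and (p_end - p_start) >= 2 * (end - start)
        if close or short:
            merged[-1] = (p_start, end)      # extend the previous segment
        else:
            merged.append((start, end))
    return merged
```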
  • the segment optimization routine computes a segment length or duration and a segment average donor intensity. Based on these two parameters, several adjacent segments are joined into one larger segment. Also, the routine determines whether the donor signal is mostly at its unit level, as evidenced by horizontal and vertical lines through the data trace (blue lines in a color image). This segmented representation of the data trace also includes a horizontal line that represents when the fluorophore is at zero level (not emitting light) (red lines in a color image).
  • the optimization routine also distinguishes between segments where the signal is mostly at the unit level and segments where the signal is mostly at the zero level.
  • For the former, the unit level can be computed out of segment data alone, but for the latter, the unit level has to be derived from its neighbors.
  • In FIG. 16, aspects of the donor model relating to final stage processing are illustrated graphically.
  • the unit segments, segments where the fluorophore is active, are best fitted to a polynomial function represented by a solid curve through the trace (blue in a color image).
  • the standard deviation (unit noise level) associated with the polynomial function is shown as a gray area with the curve centered therein.
  • the dark gray horizontal bars (dark green in a color image) at the bottom of the figure show segments where the donor signal has a high intensity value; while light gray horizontal bars (light green in a color image) show segments, where the donor signal has a low intensity value.
  • the final step in the process is to fit all unit segments, segments where the fluorophore signal stays at the unit level most of the time, with a polynomial function that follows the variable unit level of the signal intensity.
  • the standard deviation associated with the polynomial function is also computed, and serves as a measure of the noise level around the unit level. For all zero segments, the unit level is assumed to be constant and equal to the unit level value computed at the previous step, and the noise level is assumed to be equal to the background noise level.
  • the donor trace at a particular location in the viewing field is represented by a set of zeros and ones through the frames.
  • the value of 1 over a segment of the donor trace signifies that the donor is in a high state and is determined simply by comparing the trace segment to the local unit level less the local noise level: if the signal is above this value, the unit level value is set at 1 (donor is in a high state); otherwise, the unit level value of this donor is set at 0 (donor is in a low state).
  • a donor segment may not fall to a value below the local noise level, but is situated between two much higher intensity peaks; in such a case, the segment is also assigned a zero value.
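  • The frame-by-frame digitization of the donor trace amounts to a single comparison against the local unit level less the local noise level. A minimal sketch, assuming the unit level and unit noise have already been computed per frame (for example from the polynomial fit described above); the peak-situated exception noted above is not handled here.

```python
import numpy as np

def digitize_donor(signal, unit_level, unit_noise):
    """Convert a donor trace to a 0/1 trace frame by frame.

    signal, unit_level, unit_noise are arrays of the same length: unit_level is
    the (possibly polynomial-fitted) local unit level and unit_noise the local
    noise level around it.  The donor is called 'high' (1) wherever the signal
    exceeds the local unit level less the local noise level, and 'low' (0) otherwise.
    """
    signal = np.asarray(signal, dtype=float)
    unit_level = np.asarray(unit_level, dtype=float)
    unit_noise = np.asarray(unit_noise, dtype=float)
    return (signal > unit_level - unit_noise).astype(int)
```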
  • a low-pass filter is usually applied to signals that are variable, that is, both slowly varying and corrupted by random noise. In such a case, it is sometimes useful to replace each actual data point with a local average of surrounding data points. Because nearby points measure very nearly the same underlying signal value, averaging over these surrounding data points can and often does reduce the level of noise without much biasing of the averaged signal value obtained.
  • the present invention utilizes a particular lo-pass or smoothing filter sometimes referred to as a “Savitzky-Golay” lo-pass filter, “least-squares” lo-pass filter, or DISPO (“Digital Smoothing Polynomial”) lo-pass filter.
  • the lo-pass filter operates by replacing a value of every input data point with a value derived from a polynomial fitted to that input data point and several nearby, generally adjacent, input data points.
  • a Savitzky-Golay lo-pass smoothing filter is illustrated graphically.
  • For a data point f i, represented by a large square DP (green in a color image) in the figure, the filter fits a polynomial of order M, represented by the solid line curve (blue in a color image), to all data points from i−nL to i+nR (green dots), then replaces the value of the data point f i with the value of the polynomial at position i, represented by a large square PV (red in a color image).
  • the coefficients of a fitted polynomial are themselves linear in the values of the data.
  • to compute a derivative, the value of the data trace at the i-th position is replaced not by the value of the fitting polynomial, but by the value of the derivative of the polynomial at the i-th data position.
  • the coefficient computation for the polynomial can be performed in advance, by pre-computing the coefficients C i−nL . . . C i+nR.
  • the fitting polynomial is at least of order 4.
  • the parameters of the Savitzky-Golay lo-pass smoothing filter are:
  • the simulated data comprises a constant signal interrupted by progressively narrower gaps. The size of gaps in data is shown above as numbers.
  • the simulated data is shown with simulated white Gaussian noise added having a standard deviation of about 0.25.
  • the horizontal gray bar represents the noise level, computed as about 1.2 times the standard deviation of the noise (about 0.3 in this case).
  • a set of 3 coefficients c(i−1), c(i), and c(i+1) are determined to have the values 1/3, 1/3, and 1/3, respectively, which is identical to a three-point averaging smoothing filter.
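  • Because the smoothed value is a fixed linear combination of the neighboring samples, the coefficients can indeed be pre-computed once from a least-squares polynomial fit, as noted above. The sketch below is an illustrative NumPy implementation, not the actual filter code of this invention; as a consistency check, the nL = nR = 1, order 1 case reduces to the 1/3, 1/3, 1/3 coefficients mentioned above.

```python
import math
import numpy as np

def savgol_coefficients(nL, nR, order, deriv=0):
    """Pre-compute Savitzky-Golay coefficients c(-nL) .. c(+nR).

    The smoothed value (or, for deriv > 0, the derivative value) at position i is
    then just the dot product of these coefficients with the samples f[i-nL..i+nR].
    """
    positions = np.arange(-nL, nR + 1)
    # Least-squares design matrix for a polynomial of the given order.
    A = np.vander(positions, order + 1, increasing=True)   # columns: x^0, x^1, ...
    # Row 'deriv' of the pseudo-inverse gives the x^deriv coefficient of the fit;
    # multiplied by deriv! it is the deriv-th derivative of the fit at x = 0.
    return np.linalg.pinv(A)[deriv] * math.factorial(deriv)

def savgol_smooth(signal, nL, nR, order):
    """Apply the filter by sliding the pre-computed coefficients over the trace."""
    signal = np.asarray(signal, dtype=float)
    c = savgol_coefficients(nL, nR, order)
    out = signal.copy()
    for i in range(nL, len(signal) - nR):
        out[i] = np.dot(c, signal[i - nL:i + nR + 1])
    return out

# Sanity check: nL = nR = 1 with a first-order polynomial gives 1/3, 1/3, 1/3.
assert np.allclose(savgol_coefficients(1, 1, 1), [1 / 3, 1 / 3, 1 / 3])
```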
  • DAC: Derivative Anti-Correlation.
  • DAC is a function that operates by deriving a value at every point of the donor and acceptor traces. If at any point both the donor and acceptor derivatives have the same sign, then the value of DAC is set to zero (0). If at any point the donor and acceptor derivatives have opposite signs, then the value of DAC is set as the product of the acceptor derivative value and the absolute value of the donor derivative value at that point.
  • In FIG. 19 a, an example of the derivative anti-correlation methodology is illustrated graphically for ideal, non-noisy anti-correlated data.
  • a simulated donor trace having an intensity dip in the middle of the trace is shown.
  • a simulated acceptor trace having an intensity bump, anti-correlated with the donor dip is shown.
  • the DAC values for the above signals are shown. The positive peak marks the start of an anti-correlated event, and the negative peak marks the end of the anti-correlated event.
  • In FIG. 19 b, an example of the derivative anti-correlation methodology is illustrated graphically for moderately noisy data.
  • the peaks are well above the standard deviation of the DAC function, so the DAC helps to detect even short anti-correlated events that would otherwise go undetected.
  • In FIG. 19 c, an example of the derivative anti-correlation methodology is illustrated graphically for heavily noisy data. If the noise level is too high, the DAC is unable to detect anti-correlated events, because the peaks are comparable to the standard deviation of the noise level. Short events become very difficult to detect, while long events are detected by other means, such as heavy data smoothing and analyzing average signal intensities over long periods of time.
  • the DAC is effective even for short signals, provided that their shape is not too distorted or attenuated by the noise.
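  • The DAC function itself reduces to a sign test on the two derivatives followed by a product. A minimal sketch, assuming NumPy arrays and a simple numerical gradient standing in for the smoothed derivative:

```python
import numpy as np

def derivative_anti_correlation(donor, acceptor):
    """Compute the DAC trace from donor and acceptor traces.

    Where the two derivatives have the same sign the DAC is zero; where they have
    opposite signs it is the acceptor derivative times the absolute value of the
    donor derivative.  A positive peak marks the start of an anti-correlated event
    and a negative peak marks its end.
    """
    d_donor = np.gradient(np.asarray(donor, dtype=float))
    d_accept = np.gradient(np.asarray(acceptor, dtype=float))
    opposite = np.sign(d_donor) * np.sign(d_accept) < 0
    return np.where(opposite, d_accept * np.abs(d_donor), 0.0)
```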
  • a standard Savitzky-Golay (S-G) smoothing filter (as described above) does not produce satisfactory results for heavily-noisy data, even if it contains some obvious long-lived signals.
  • An S-G filter designed for heavy smoothing (e.g., larger number of samples, lower polynomial order), while removing enough noise, distorts the boundaries of the rectangular-shaped signals, making it nearly impossible to detect the correct boundaries. Also, the filter tends to lose shorter signals.
  • An S-G filter designed for fine smoothing (e.g., smaller number of samples, higher polynomial order), on the other hand, tends to leave a great deal of noise, which can break down large signals into series of smaller ones, and also create many false positives in between the real signals.
  • the principal idea of the smart smoother of this invention is to balance the two S-G filters so that on flat segments, the heavy smoother takes precedence, removing most of the noise, while in areas where the intensity is rapidly changing, the fine smoother is invoked, preserving the exact signal boundaries, critical for detecting anti-correlated spFRET signals.
  • the next step is to convert the derivative, a function that theoretically ranges from −∞ to +∞, into a balance function, which ranges from zero (0) to one (1), where the balance function has the value of zero (0) when the derivative is zero, and the value of one (1) when the derivative goes to infinity in either direction.
  • the balance function b is computed as:
  • Var is the variance, given by ΣFi²/n, where n is the total number of data samples.
  • the balance function is smoothed with the same “middle” S-G parameters as the ones used for the derivative.
  • values of the balance function may fall outside the range of zero to one at a few points, so an additional process is applied to force the values within the zero to one boundaries.
  • the resulting balance function is shown in the middle panel in FIG. 20 comprising a solid curve with a shaded area below the curve (light-red in a color image) and a shaded area above the curve (light-blue in a color image).
  • the top three panels represent a simulated data trace.
  • the topmost panel comprises six high intensity bumps of different lengths, with the lengths shown below the bumps, having an S/N of 1.35.
  • the next panel represents the simulated data trace with Gaussian noise.
  • the gray bar about the solid zero line denotes the noise level, computed as standard deviation of a separate noise-only trace, generated with the same settings as used with the original signal.
  • the solid horizontal bars below the gray area represent the data segments of the smoothed curve, i.e., the segments of the curve that have values above the gray bar.
  • the next panel ***
  • e) Red graph: the light-smoothed data Fs. The grey area is the noise level, the same as in (c); the red bars below are the lifetime, similar to (c).
  • f) Green graph: the smart-smoothed signal, i.e., the combined signal computed as Fsm = b*Fs + (1−b)*Fr, where Fs is the light-smoothed data (e), Fr is the heavy-smoothed data (c), and b is the balance function (d). The grey area is the noise level (the same as above); the green bars below are the lifetime.
  • Fs is the fine smoothed data
  • Fr is the heavy smoothed data
  • Fsm is the smart smoothed data
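  • Putting the pieces together, the smart-smoothed trace is simply the balance-weighted blend of the fine- and heavy-smoothed traces. The sketch below is illustrative only: the fine and heavy smoothers and the balance function are passed in as callables (for example, the Savitzky-Golay sketch above with different nL, nR and order settings), and the balance values are clamped to the zero-to-one range as described above.

```python
import numpy as np

def smart_smooth(signal, fine_smoother, heavy_smoother, balance):
    """Blend a fine-smoothed and a heavy-smoothed trace with a balance function.

    balance(signal) must return values in [0, 1]: near 0 on flat segments (so the
    heavy smoother dominates and removes most of the noise) and near 1 where the
    intensity is changing rapidly (so the fine smoother preserves the boundaries).
    """
    signal = np.asarray(signal, dtype=float)
    Fs = fine_smoother(signal)               # fine-smoothed data
    Fr = heavy_smoother(signal)              # heavy-smoothed data
    b = np.clip(balance(signal), 0.0, 1.0)   # force the balance into [0, 1]
    return b * Fs + (1.0 - b) * Fr           # Fsm, the smart-smoothed data
```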

Abstract

A method to be implemented on or in a computer is disclosed, where the method includes data collection, calibration, candidate selection, and analysis of data streams associated with each candidate to classify single molecule fluorescence resonance energy transfer events. Once classified, the classification can be related to the nature of the events, such as the identification of dNTP incorporation during primer extension to obtain a base read out of an unknown template.

Description

    RELATED APPLICATIONS
  • This application claims priority as a continuation under 35 U.S.C. §120 to U.S. patent application Ser. No. 11/671,956, filed Feb. 6, 2007, which in turn claims priority as a continuation-in-part to U.S. patent application Ser. No. 09/901,872, filed Jul. 7, 2001, as a continuation-in-part to U.S. patent application Ser. No. 10/007,621, filed Dec. 3, 2001, now U.S. Pat. No. 7,211,414, and to U.S. Provisional Patent Application Ser. No. 60/765,693 filed 6 Feb. 2006. The disclosures of the above-identified applications are incorporated herein by reference as if set forth in full.
  • GOVERNMENTAL INTEREST
  • Some of the subject matter disclosed in this application was funded to some degree by funds supplied by the United States Government under NIH grant no. 5 R01 HG003580.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates to a method for characterizing signals generated from molecular events at the single molecule level, such as donor-acceptor fluorescent resonance energy transfer events, of dynamic systems or static systems over a period of time, where the event data can be collected continuously, periodically, or intermittently and analyzed continuously, periodically or intermittently. The data collection and analysis, thus, can be in real time or near real time, while analysis can be any time post collection. A dynamic system means that the data is collected on the system in real time over the period of time as the system undergoes detectable changes in one or more detectable properties, while a static system means that the data is collected for a given period of time and the system is unchanging during that period of time.
  • More particularly, the present invention relates to a method for characterizing signals generated from detectable molecular events at single molecule level, where the method includes the steps of collecting and storing data from a viewing field associated with a detector, where the viewing field includes a plurality of molecules or molecular assemblies capable of being detected directly and undergoing a detectable event or a plurality of detectable events, where direct detection involves monitoring at least one detectable property associated with the molecule or molecular assembly and where the detectable events involve interactions associated with or occurring at the molecule or molecular assembly. Data associated with the viewing field is collected into one data channel or a plurality of data channels, where each data channel corresponds to an attribute of the detected events, such as intensity, frequency or wavelength, duration, phase, attenuation, etc. The method also includes the step of reading the stored data and spatially registering or calibrating the data channels so that a given location within the viewing field in one channel corresponds to the same location in the other channels—the data is registered relative to the viewing field. After registering, candidate molecules or molecular assemblies are identified. The candidate identification is generally designed to minimize locations within the viewing field that include more than a single directly detected molecule or molecular assembly to simplify data analysis. Next, an nxm array of data elements such as pixels is selected for each candidate so that the array includes all data elements having a detection value above a definable threshold originating from or associated with each candidate such as a definable intensity threshold value. Then, a plurality of “dark” data elements or pixels in an immediate neighborhood of the array associated with each candidate are selected to improve background removal. Once the array and background elements have been selected, a hybrid dataset for each candidate is constructed derived from data from two or more data channels. The hybrid dataset is then smoothed and differentiated. After smoothing and differentiating, non-productive events are separated from productive events based on a set of criteria, where the criteria are dependent on the detectable property and events being detected. The productive events are then placed in time sequence. For donor-acceptor systems, the method includes determining anti-correlated donor and acceptor fluorescent signals. For monomer sequencing (nucleotide, amino acid, saccharide, etc.), the criteria are designed to separate binding and mis-incorporation events from true incorporation events, and when placed in time order, evidence a sequence of monomers in a target sequence of monomers.
  • 2. Description of the Related Art
  • With the increase in single molecule analytical techniques, many software routines have been developed for analyzing the resulting data. However, each single molecule analytic technique gives rise to many unique problems, and normal analytical software is ill suited to analyze data from very specific single molecule data detection systems.
  • Thus, there is a need in the art for data processing processes that can help researchers understand and characterize data corresponding to detectable events arising at the single molecule level especially in the area of single molecule fluorescence detection such as fluorescent resonance energy transfer signals originating from interactions between a donor or plurality of donors and an acceptor or a plurality of acceptors.
  • DEFINITIONS USED IN THE INVENTION
  • The term “single molecule level” means any individual system capable of undergoing detectable chemical or physical events that can be detected and analyzed independently. For example, systems of isolated atoms, molecules, ions, or assemblages of atoms, molecules and/or ions that have a detectable property that changes during a chemical or physical event capable of individual detection and analysis satisfy the definition. Such systems include, without limitation, any isolated reactive system having a detectable property that undergoes a change before, during or after a chemical and/or physical event or reaction. Exemplary examples of such systems include, again without limitation, DNA replication complexes, protein translation complexes, transcription complexes, any other isolated or isolatable biological system, quantum dots, catalysts, cellular sites, tissue sites, domains on chips (grooves, lines, channels, pads, etc.), or any other system having a detectable property that undergoes a change before, during and/or after a chemical and/or physical event. Although the isolated single reactive systems simplify analysis, images including overlapping or multiply occupied sites can be analyzed as well, but with greater difficulty.
  • The term “detection at the single molecule level” means that chemical events are being detected at the single molecule level.
  • The term “anti-correlated” means that changes in a value of a first detected response are opposite to changes in a value of a second detected response.
  • The term “correlated” means that changes in a value of a first detected response coincide (same direction) with changes in a value of a second detected response.
  • The term “data channel or data quadrant” means data that has a particular attribute such as data within a given frequency range of light derived from a given detector or imaging system. A quadrant more specifically is terminology relating to a data channel of a particular type of imaging apparatus such as a charge coupled device (CCD) imaging apparatus.
  • The term “slide” means an actual sample, which is often disposed on the surface of a treated or untreated surface such as the surface of a cover slip.
  • The term “viewing field” or “viewing volume” means the actual portion of the sample that is being observed by the imaging or detecting system. Often this volume is considerably smaller than the actual sample and is dependent on the exact nature of the imaging or detection system being used.
  • The term “frame” means an image of the viewing field taken over a short period of time within the imaging or detecting system prior to being outputted to the processing system. The size and time span of the frame depends on the memory, buffering, outputting speed and receiving speed of the imaging system and of the processing system.
  • The term “stack” or “stream” means a set of frames. Thus, frames from a single slide are collected as a stack of frames or a stream of frames.
  • The term “trace” means data for a particular data element or pixel over all the frames in a stack or over a given number of frames in a stack.
  • The term “related data” means data from other data channels that are related to data from a selected data channel. The data can be spatially related, temporally related, network related, etc. or related through a combination of these relationship types.
  • The term “data calibration or registration” means transforming data in one data channel so all locations within that data channel are matched to corresponding locations in other data channels.
  • The term “assemblage” means a collection of atoms, molecules and/or ions to form an isolated or isolatable system. For example, a DNA replication complex is an assemblage and a ribosome translation complex is an assemblage. The collection can be of a single atomic or molecular type (atom clusters, molecular clusters, etc.) or a collection of mixtures of atoms, molecules, and/or ions. Assemblages can also be constructed of assemblages. The main criterion in the definition is that the assemblage be capable of being isolated or formed in an isolated manner so that detectable events occurring at each individual assemblage can be separately detected and analyzed.
  • The term “spot” means a location within a viewing field of an imaging apparatus that evidences fluorescent light from one or more atoms, molecules, ions or assemblages. Although the methods have focused on fluorescent light, the methods can be applied to any detectable property that corresponds to one or more atoms, molecules, ions or assemblages within a viewing field.
  • SUMMARY OF THE INVENTION
  • The present invention provides a method implemented on a computer for collecting data in real or near real time, at the single molecule level, corresponding to detectable chemical and/or physical events and analyzing the collected data to identify the events and classify the events as to their intrinsic nature. The method can be used to collect and analyze data from monomer additions, polymerase extension reactions, protein biosynthesis at ribosomal machinery (translation reactions), saccharide polymerization reactions, kinase phosphorylation reactions, or any other reaction that involves interactions between atoms, ions, molecules or assemblages having at least one detectable property that undergoes a change before, during or after the reaction being monitored.
  • The present invention also provides a method implemented on a computer including the step of collecting data representing values of an attribute or attributes of a detectable property or detectable properties of an atom, an ion, a molecule or an assemblage of atoms, ions and/or molecules or a plurality of atoms, ions, molecules or assemblages of atoms, ions and/or molecules within a viewing volume or field over a period of time. The collected data includes data derived directly from the atom(s), molecule(s) and/or assemblage(s) and data derived from events evidencing interactions between the atom(s), ion(s), molecule(s) or assemblage(s) and other atomic, ionic, molecular, and/or assemblage species or between different parts of the ion(s), molecule(s) or assemblage(s). If the data is collected simultaneously in a plurality of data channels, then after data collection, the data in the data channels are calibrated or registered to align the data within the channels spatially and temporally. After data registration, data in one data channel, often times a primary data channel corresponding to the directly detected data, are scanned and an atom, ion, molecule or assemblage candidate or atom, ion, molecule, or assemblage candidates within the viewing volume or field that meet a set of detection criteria are selected. After candidate selection, the candidate data is smoothed, hybridized and differentiated. After or simultaneously, data from other data channels are scanned and related data are selected from these other channels, where the related data is data that evidences changes in a detectable property or an attribute or attributes thereof spatially, temporally, or otherwise related to the candidate data. Generally, the related data is data that evidences changes in a detectable property or an attribute or attributes thereof occurring within a neighborhood of each candidate. This related data is then analyzed, smoothed, hybridized and differentiated. The candidate data and their related data are then analyzed together to produce events. If the interactions are anti-correlated, then the candidate data and their related data are analyzed for anti-correlated events. Anti-correlation means that changes in the detectable property(ies) of the atom(s), ion(s), molecule(s) or assemblage(s) and opposite changes in the detectable property(ies) of the other atomic, ionic, molecular or assemblage species, such as a reduction in a donor intensity and a corresponding increase in acceptor intensity. After anti-correlation analysis, the anti-correlated events are classified as relating to one of a set of event types, such as a productive event type, a non-productive event type, a binding event type, a pre-binding event type, a group release event type, a mis-incorporation event type, a complexing event, a transition event, etc. For example, if the method is directed toward nucleic acid sequencing, the classification scheme includes a correct base incorporation event type, a mis-match or incorrect base incorporation event type, a binding event type, a pre-base incorporation event type, a proximity event type, a pyrophosphate release event, etc.
  • The present invention also provides a method implemented on a computer including the step of collecting data including a plurality of data channels representing fluorescent data from a plurality of fluorophores within a viewing volume or field. After data collection, the data within the data channels are calibrated or registered to align the data spatially and temporally, i.e., locations within the viewing field are matched between the channels. After data alignment, the data in a primary channel is scanned for the candidate fluorophores within the viewing volume that meet a set of candidate criteria. For example, if the system is a donor-acceptor system, then the primary channel is the donor channel. After candidate selection, the data associated with each candidate is smoothed, hybridized and differentiated. After or simultaneously, related data from the other channels are selected, where the related data is data within a neighborhood of each donor candidate that undergoes a change over time. After selection of the related data, the related data is smoothed, hybridized and differentiated. The candidate and related data are then analyzed together to identify events. The events are then classified. If the system is a donor-acceptor system, the related data is acceptor data and the donor data and the acceptor data are analyzed for anti-correlated events evidence by anti-correlated intensity shifts. After identification of anti-correlated intensity events, the identified anti-correlated events are classified as relating to one of a set of event types, such as a productive binding event, a pre-binding event, a non-productive binding event, etc. For example, if the method is directed toward determining base incorporation events, the classification scheme includes a correct base incorporation event, a mis-match or incorrect base incorporation event, a non-productive base binding event, a pre-base incorporation event, a proximity event, etc.
  • The present invention provides a system for characterizing events at the single molecule level, including a sample subsystem and optionally an irradiating subsystem for irradiating a sample in the sample subsystem. The system also includes a detector subsystem for detecting and collecting data evidencing changes in a detectable property associated with an atom, ion, molecule or assemblage within the sample subsystem or within a region of the sample subsystem. The system also includes a processing subsystem that stores and processes the data collected by the detector. The processing subsystem uses methods of this invention to identify events and to classify the identified events. The classification is then related to aspects of the dynamic system being detected. For DNA, RNA or DNA/RNA hybrid sequencing, the classification permits identification of the base sequence of an unknown nucleic acid molecule. Although the system collects data in real time, the data processing can occur in real time, near real time or it can be processed later or both.
  • The present invention also provides a system for characterizing donor-acceptor fluorescent resonance energy transfer events at the single molecule level, including a TIRF or similar sample assembly, a detector system for irradiating the sample assembly with an incident light having a wavelength range designed to excite the donor fluorophores within a sample viewing volume and detecting fluorescent light emitted by emitters within the volume, where the emitters are the donors, acceptors activated by a donor via fluorescent resonance energy transfer (FRET), and background or non-donor/acceptor emitters. The system also includes a processing subsystem that stores and processes the data collected from the detector. The processing subsystem uses methods of this invention to produce a classification of detected fluorescent events. The classification is then related to aspects of the dynamic system being detected. For DNA, RNA or DNA/RNA hybrid sequencing, the classification permits identification of the base sequence of an unknown nucleic acid molecule. Although the system collects data in real time, the data processing can occur in real time or it can be processed later or both.
  • The present invention also provides a method for characterizing signals generated from molecular events at the single molecule level, such as dNTP or nucleotide incorporation fluorescent resonance energy transfer (dNTPFRET) events at the single molecule level, where the method includes the steps of collecting and storing pixelated data in a plurality of fluorescent data channels of a plurality of dNTPFRET events, reading the stored data, spatially registering or calibrating the data channels, identifying candidate single polymerase/primer/template complexes, selecting an n×n array of pixels including each identified candidate, selecting a plurality of “dark” pixels in the immediate neighborhood of the pixel array associated with each identified candidate for background removal, constructing a hybrid dataset for each candidate, smoothing the hybrid dataset, differentiating the hybrid dataset, determining anti-correlated donor and acceptor fluorescent events, separating true incorporation events from mis-incorporation and non-productive binding events, and identifying one or a plurality of incorporated dNTPs corresponding to sequencing information associated with an unknown nucleic acid sequence.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The invention can be better understood with reference to the following detailed description together with the appended illustrative drawings in which like elements are numbered the same.
  • FIG. 1 depicts a graphical illustration of certain of the parameters that are used to define an event.
  • FIG. 2 depicts spot candidates displayed on an overlay picture of the viewing field, where the accepted candidates are shown as large dots sometimes with gray boxes (green in a color image) and the very faint dots represent candidates rejected by staged filtering (in a color image, blue spots are candidates eliminated by the stage 1 filter and red dots are candidates rejected by the stage 2 and 3 filters).
  • FIG. 2′ is a black and white version of FIG. 2, where ⊕ represents accepted spots, ▴ represents spots rejected at stages 2 and 3 and represents spots rejected at stage 1.
  • FIG. 3 a depicts a case where the intensity of the candidate pixel is below 3na; the candidate is rejected.
  • FIG. 3 a′ is a black and white version of FIG. 3 a, where + represents the brightest pixel, ▴ represents background pixels selected for computing c and na, the dashed square represents the 7×7 pixel area around the spot, the dotted line represents the 3na cutoff level, and the dashed line represents the brightest pixel intensity.
  • FIG. 3 b depicts a case where the intensity of the candidate pixel is equal to or above 3na; the candidate is accepted.
  • FIG. 3 b′ is a black and white version of FIG. 3 b, where + represents the brightest pixel, ▴ represents background pixels selected for computing c and na, the dashed square represents the 7×7 pixel area around the spot, the dotted line represents the 3na cutoff level, and the dashed line represents the brightest pixel intensity.
  • FIG. 4 depicts a “poor” spot candidate that passed through the stage 1 filter.
  • FIG. 4′ is a black and white version of FIG. 4, where + represents the brightest pixel, ▴ represents background pixels selected for computing c and na, the dashed square represents the 7×7 pixel area around the spot, the dotted line represents the 3na cutoff level, and the dashed line represents the brightest pixel intensity.
  • FIG. 5 depicts the stage 2 filter.
  • FIG. 5′ is a black and white version of FIG. 5, where the dotted line represents the doubt*avgna value and the dash-dotted line represents the minc*avgna value.
  • FIG. 6 a depicts graphically the spot candidate filtering process of the stage 1 filter.
  • FIG. 6 a′ is a black and white version of FIG. 6 a, where + represents the brightest pixel, ▴ represents background pixels selected for computing c and na, the dashed square represents the 7×7 pixel area around the spot, the dotted line represents the 3na cutoff level, and the dashed line represents the brightest pixel intensity.
  • FIG. 6 b depicts graphically the spot candidate filtering process of the stage 2 filter.
  • FIG. 6 b′ is a black and white version of FIG. 6 b, where the dotted line represents the doubt*avgna value and the dash-dotted line represents the minc*avgna value.
  • FIG. 6 c depicts graphically the spot candidate filtering process of the stage 3 filter.
  • FIG. 6 c′ is a black and white version of FIG. 6 c, where the dash-dotted line represents the minc2*avgna value.
  • FIG. 7 a depicts pixel values (9×9 neighborhood) after voting over average donor image.
  • FIG. 7 b depicts selection of single spots in an average donor image after voting.
  • FIG. 7 c depicts snapshot of grouped spots after voting and selection of the donor pixel.
  • FIG. 8 depicts histogram of an average intensity stack image.
  • FIG. 9 depicts donors detected using dynamic threshold and consolidated donors.
  • FIG. 10 a depicts how the noise pixel traces are averaged into a single averaged noise trace (top graph); then its polynomial approximation is computed using a least squares algorithm, and finally, the value of the polynomial is subtracted from every individual pixel trace.
  • FIG. 10 b depicts how the value of the approximating polynomial is subtracted from the donor signal pixels (upper graph); the result is shown in the lower graph, where the horizontal line now represents the zero-level (mean of the background noise intensity distribution).
  • FIG. 10 c depicts how the noise pixel traces from an acceptor channel are averaged into a single averaged noise trace (top graph); then its polynomial approximation is subtracted from every individual acceptor pixel trace.
  • FIG. 10 d depicts how the value of the approximating polynomial is subtracted from the acceptor signal pixels (upper graph); the result is shown in the lower graph, where the horizontal line now represents the zero-level (mean of the background noise intensity distribution).
  • FIGS. 10 a′-d′ are black and white versions of FIGS. 10 a-d, where the left panel represents the donor data, the middle panel represents the acceptor 1 data, the right panel represents the acceptor 2 data, represent signal pixels, represents noise pixels, the solid square represents the 3×3 pixel area for donor signal pixels, and the dashed square represents the 7×7 pixel area for donor noise pixels.
  • FIGS. 11 a-d depict donor pixel selection.
  • FIGS. 11 a′-d′ are black and white versions of FIGS. 11 a-d, where the top trace in each graph represents the original (non-background subtracted) signal and the bottom trace in each graph represents the signal after background subtraction: 11 a presents the donor noise signals, 11 b presents the donor signal, 11 c presents the acceptor noise signals, and 11 d presents the acceptor signals.
  • FIG. 12 depicts the intensity-based donor pixel selection algorithm.
  • FIG. 12′ is a black and white version of FIG. 12 showing donor pixel selection, where the top panel represents the hybrid trace, the bar right below represents the donor lifetime, the remaining 9 panels represent individual donor pixel traces, and the grayed ones represent pixels rejected by the pixel selection process. In the overlaid image, the symbol represents accepted pixels, the symbol represents rejected pixels, and the + symbol represents noise pixels.
  • FIG. 13 a depicts the intensity-based acceptor pixel selection algorithm.
  • FIG. 13 b depicts the derivative-based acceptor pixel selection algorithm.
  • FIGS. 13 a′-b′ are black and white versions of FIGS. 13 a-b showing acceptor pixel selection, where 13 a′ represents intensity-based selection and 13 b′ represents DAC-based selection. From top to bottom: Donor (with donor lifetime bar), Acceptor hybrid (with lifetime), and 9 individual pixel traces, the grayed ones rejected by the selection process. In the overlaid image, the ⊕ symbols represent accepted pixels, the + symbols represent rejected pixels, and the ⋄ symbols represent noise pixels.
  • FIG. 14 depicts graphically the results of the donor and acceptor pixel selection process showing donor, acceptor 1, and acceptor 2 overlays after pixel selection, where the ⊕ symbols represent accepted pixels, the + symbols represent rejected pixels, and the ⋄ symbols represent noise pixels.
  • FIG. 15 depicts the donor model representing initial segments.
  • FIG. 15′ is a black and white version of FIG. 15 showing donor model initial stage selection, where the top panel represents the donor, the darker curve is the smoothed donor signal, the lighter curve represents the original, and the grayed area represents the donor noise level. In the middle panel, the donor derivative is shown, and the grayed area is its standard deviation. The bottom bar represents the donor derivative “lifetime” used to set segment boundaries (vertical lines).
  • FIGS. 16 a-c depict the donor model representing segment optimization.
  • FIGS. 16 a′-c′ are black and white versions of FIGS. 16 a-c showing the donor model optimization.
  • FIG. 17 depicts the donor model representing final stage optimization.
  • FIG. 17′ is a black and white version of FIG. 17 showing the donor model final stage. The segmented curve represents the suggested ‘donor high’ level, and the gray area around it represents the ‘noise level in donor high state’. The bottom bar represents the donor lifetime computed based on the donor model.
  • FIG. 18 depicts a numeric experiment using a 17-point Savitzky-Golay smoothing filter.
  • FIG. 18′ is a black and white version of FIG. 18, where the dark circle represents the middle sample, the dark squares represent samples being used together with the middle one to compute the polynomial (curve), the light squares represent samples not in use, and represents the value of the polynomial at the middle data sample location (smoothed value).
  • FIG. 19 depicts simulated data, the simulated data after addition of noise, and the noisy data after smoothing, to show the veracity of the smoother.
  • FIG. 19′ is a black and white version of FIG. 19, where the top panel represents a simulated signal (numbers showing duration in data samples), the middle panel represents the simulated signal with added Gaussian noise, and the bottom panel represents the smoothed signal.
  • FIG. 20 a depicts derivative anti-correlation for simulated non-noisy data.
  • FIG. 20 b depicts derivative anti-correlation of simulated moderately noisy data.
  • FIG. 20 c depicts derivative anti-correlation of simulated heavily noisy data.
  • FIGS. 20 a′-c′ are black and white versions of FIGS. 20 a-c, where the top panel represents the donor signal, the middle panel represents the acceptor signal, and the bottom panel represents the DAC function: 20 a′ shows no noise, 20 b′ shows low noise (high S/N), and 20 c′ shows high noise (low S/N).
  • FIG. 21 depicts a smart smoothing process.
  • DETAILED DESCRIPTION OF THE INVENTION
  • The inventors have found that a system including a method implemented on a computer can be constructed that is capable of collecting data corresponding to changes in a detectable property of one or more atoms, molecules, ions or assemblages within a viewing volume or field of an imaging apparatus such as a charge coupled device. The method processes the single molecule level image data to identify and classify chemical events occurring at the atoms, molecules, ions or assemblages within the viewing volume. The inventors have found that the system and method are ideally well suited for collecting and analyzing DNA extension data derived from single molecule fluorescent events, especially single molecule fluorescent resonance energy transfer events between a donor associated with a replication complex and acceptors on incorporating nucleotides. Although the inventors have focused primarily on the use of the system and method for DNA sequence data collection and analysis, the system and method are capable of being applied to any single molecule level data corresponding to events occurring at atomic, molecular, ionic or assemblage sites. The inventors have found that the system and method are also well suited for detection formats with limited viewing fields such as TIRF limited viewing fields, wave-guide limited viewing fields, channel limited viewing fields, or any other method of restricting the volume or field being detected by the detector or imaging apparatus.
  • The methods of this invention are well suited for detecting fluorescent resonance energy transfer (FRET) fluorescent events between a donor and an acceptor or plurality of acceptors, especially FRET fluorescent events associated with nucleic acid sequencing complexes including a donor labeled polymerase and an acceptor labeled nucleotide. For further details of sequencing technologies involving FRET strategies, the reader is directed to United States patent and patent application Ser. Nos. 6,982,146, and 7,056,661 and pending patent application Ser. Nos. 09/901,782 and 11/648,723, and abandoned patent application Ser. Nos. 11/007,642 and 11/648,107, incorporated herein by reference.
  • In certain embodiments, the inventors have applied the system and method to the identification and analysis of spots (fluorescent light) derived from individual DNA replicating complexes within a viewing field of an imaging apparatus. The method and associated software is designed to:
      • 1) correctly identify a position or location of each fluorescently active species in each data channel or quadrant view of a viewing volume or field. The identities are based on single molecule fluorescent properties including:
        • a) intensity of the fluorescent signal relative to background, and
        • b) size of an area associated with signal, e.g., number of pixels or data elements containing the signal, for each identified molecule, where the size can be fixed or adjustable,
          • where the background is determined locally from an average intensity of pixels or data elements surrounding each area, e.g., a ring of 2 pixels removed from the area that define the “core” of each signal, where the background element selection criteria can be fixed or adjustable;
        • 2) correlate or register positions of molecules in each quadrant or data channel to determine whether a molecule in one quadrant is the same molecule observed in another quadrant. The correlation or registration of molecules within each quadrant or data channel is facilitated by placing a grid on the viewing volume for overlap and proper correlation or registration. Correction algorithms such as rubber sheeting software can be used to correct for image distortions in the different quadrants or channels;
        • 3) track and graphically present information about:
          • a) a length of time a fluorophore is detected, and
          • b) an intensity of the fluorophore over the time period (length of time the fluorophore is detected);
        • 4) plot intensity ratios between molecules observed in each quadrant or channel (signal intensities observed in each quadrant for an individual location in the viewing field, which corresponds to fluorescing species associated with the location such as a donor labeled replication complex and incorporating labeled nucleotides). This step really starts the base identity analysis of this method. The ratios are used to determine a confidence of a base call, i.e., each base call is assigned a confidence value.
        • 5) time correlate spot data. For a spot to be a TRUE sequencing complex, there should be a connection of data points over time, producing a line of data associated with a single active replicating complex or sequencing source. Timing associated with the data line creation is generally an adjustable feature of the software, but can be fixed for systems run under substantially similar conditions, conditions that generate data that is consistent and substantially reproducible. Timing generally depends on reaction conditions such as buffer, substrate concentration, enzyme concentration, temperature, viscosity, template and primer sequences, etc. Timing of modified or labeled nucleotide or monomer incorporations will also be used to assign a confidence value to a base call. For example, when the donor can move out of the viewing volume during base extension, e.g., a system where the primer or template is immobilized on a surface or confined in a structure, then a penetration depth of light via TIRF (100 nm) generally permits detection of about 300 incorporation events per site, but for other systems, the number of detectable events may be in the thousands.
        • 6) identify evidence of true incorporation events. Depending on the fluorophore or linker-fluorophore-nucleotide combination used and on the detection system configuration, a TRUE incorporation event is evidenced by wavelength shifts and intensity changes in the donor and acceptor channels (e.g., intensity increases for acceptors and intensity decreases for donors) during nucleotide incorporation and pyrophosphate (PPi) release. The donors are monitored and serve to punctuate an incorporation event. During FRET, the donor intensity is decreased (or even eliminated, i.e., decreased to zero). Thus, FRET events between a donor and acceptor result in a decrease in donor fluorescence and an anti-correlated increase in acceptor fluorescence.
        • 7) determine and map localized signals. As the nascent DNA strand grows, its signal is not extended beyond the original 4 pixel area (assuming a 16 μm pixel size). Thus, the program may compare positional information between early and late data. Similarly, movement from an immobilized elongating molecule is not spread across more pixels.
        • 8) subtract bursts of light not associated with a sequencing complex from the data file to reduce analysis time.
        • 9) classify backgrounds. For certain sequencing systems, the background, data from pixels or data elements in the solution surrounding a replicating complex, may become fairly standard or known. Thus, for a given system, the background may eventually become a known or standardized quantity. The background signal can then be used to set starting values and less computational time will need to be expended in determining localized background.
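  • By way of illustration only, the following is a minimal C++ sketch of the intensity-ratio base calling of step 4 above. The four-acceptor-channel layout, the channel-to-base assignment, and the margin-based confidence heuristic are assumptions made for the example; the actual ratio analysis and confidence assignment may differ.
    // Minimal sketch of intensity-ratio base calling (step 4 above).
    // Assumes background-subtracted intensities for one spot location,
    // one value per acceptor channel; the channel-to-base mapping and the
    // margin-based confidence are illustrative assumptions only.
    #include <array>
    #include <algorithm>
    #include <cstdio>

    struct BaseCall {
        char base;         // called base
        double confidence; // heuristic confidence in the range 0..1
    };

    BaseCall callBase(double donor, const std::array<double, 4>& acceptors) {
        static const char bases[4] = {'A', 'C', 'G', 'T'};     // assumed channel order
        std::array<double, 4> ratios{};
        for (std::size_t i = 0; i < 4; ++i)
            ratios[i] = acceptors[i] / (donor + acceptors[i]); // per-channel intensity ratio
        std::size_t best = static_cast<std::size_t>(
            std::max_element(ratios.begin(), ratios.end()) - ratios.begin());
        double second = 0.0;                                   // second-best ratio
        for (std::size_t i = 0; i < 4; ++i)
            if (i != best) second = std::max(second, ratios[i]);
        double conf = ratios[best] > 0.0 ? (ratios[best] - second) / ratios[best] : 0.0;
        return {bases[best], conf};
    }

    int main() {
        BaseCall c = callBase(120.0, {35.0, 410.0, 28.0, 40.0});
        std::printf("base=%c confidence=%.2f\n", c.base, c.confidence);
        return 0;
    }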
  • The present invention broadly relates to a system for collecting and analyzing chemical and/or physical event data occurring at one or a plurality of locations within a viewing volume or field of an imaging apparatus. The system includes a sample subsystem for containing a sample to be detected and analyzed, where the sample includes one atom, molecule, ion and/or assemblage or a plurality of atoms, molecules, ions and/or assemblages, at least one having a detectable property that undergoes a change before, during or after one or a sequence of chemical and/or physical events involving the atom, molecule, ion or assemblage. The system also includes a detection apparatus having a viewing field that permits the detection of changes in the detectable property of one atom, molecule, ion and/or assemblage or a plurality of atoms, molecules, ions and/or assemblages within the viewing field. The system also includes a data processing subsystem connected to the imaging apparatus for collecting, storing and analyzing data corresponding to the chemical and/or physical events occurring at definable locations in the viewing field involving one or more atoms, molecules, ions and/or assemblages within the viewing field of the imaging subsystem. The data processing subsystem converts the data into classifications of events according to the event type determined by a set of parameters defining or characterizing each event type.
  • The method broadly includes the step of receiving data from the detection apparatus comprising one or a plurality of data channels. The data channels can represent data associated with different parts of the viewing field or can represent data from the same viewing field, but separated by attributes such as frequency, intensity, phase, attenuation, flux density, any other detectable property, and mixtures thereof. Once the data is received, the data from each channel is stored. After and simultaneous with storage, the data in each data channel is registered or calibrated. This process matches locations in one data channel to corresponding locations in the other channels. Oftentimes, the data in different channels does not directly line up, i.e., a location in the data in one data channel is not coincident with its corresponding location in another data channel. This distortion may occur over the entire image, in portions of the image, or may vary across the image. The registration process makes sure that all locations are registered between the channels—each location in one channel directly corresponds to the same location in all the other channels. If one data channel is a primary channel, then the primary channel data is analyzed to identify localized areas or regions—spots—within the viewing field that evidence a given value of the detected property. For example, if the primary channel represents immobilized or confined components of a reaction system such as a DNA replication complex, then the data in the primary channel is analyzed to locate the confined or immobilized components within the viewing field. Simultaneously or subsequently, data in the other channels is analyzed to determine if data in the other channels can be related to the spots in the primary data. If a spot is associated with a reactive species, then the other channels should include data evidencing reactions involving the identified reactive species. Otherwise, each data channel is analyzed for such localized areas or regions—spots, and locations are identified in which data in some or all of the channels evidence reactions—changes in detectable properties over time at each spot. Once the active spots and related data have been identified, then the event data is classified into a set of event types. After classification, a time profile of events occurring at each active site is determined. The time profile of events is then output to the user. This time profile can evidence a single event or a sequence of events. For sequences of events, the sequence can correspond to a sequence of monomer additions, a sequence of catalytic reactions, a sequence of structural changes, a sequence of monomer removals, etc.
  • In certain embodiments, the present invention broadly relates to a method for analyzing fluorescent resonance energy transfer (FRET) events corresponding to interactions between a donor fluorophore associated with a first molecule or assemblage and an acceptor fluorophore associated with a second molecule or assemblage, e.g., a donor fluorophore associated with a member of a polymerase/template/primer complex and acceptor fluorophores associated with nucleotides for the polymerase. The method includes the step of collecting or receiving data from a viewing volume of an imaging apparatus such as a CCD or iCCD detection system, in real time or near real time. The data can be in a single data channel or a plurality of channels. In most embodiments, the data is collected in a plurality of data channels, each data channel representing a different frequency range of emitted fluorescent light, e.g., one channel can include fluorescent light data emitted by a donor, a donor channel, while other channels include fluorescent light data emitted by an acceptor, an acceptor channel, or by another donor, a second donor channel. In certain embodiments, a channel will exist for each different fluorophore being detected simultaneously. For DNA sequencing and in certain embodiments of the methodology of this invention, the number of data channels monitored is five (5). In other embodiments, the number of data channels monitored is four (4). In other embodiments, the number of data channels monitored is three (3), where three generally represents a minimally configured system. However, two (2) channels can be used provided that the acceptors are selected so that they can be separately identified based on detectable attributes of their signals, e.g., intensity, frequency shifts, signal duration, attenuation, etc.
  • After data collection, the separate data channels are spatially correlated within the viewing volume so that active fluorophores can be spatially and temporally related, a process called calibration or registration. The goal of calibration is to determine the pixel coordinates in each quadrant that correspond to a single position on the slide or a single location within the viewing field—to make sure that the data in each channel is spatially coincident over the viewing field and through the time of detection. For most of the data collected on the imaging systems used by the inventors, the inventors have been able to determine empirically that location distortions between channels comprise almost exclusively translations and rotations. In other systems, the distortions may be translations, rotations, shearing, stretching, compressing, skewing, twisting, etc., and the calibrating process must be able to register the data between the channels so that locations within one channel correspond to the same locations in the other channels.
  • The calibration procedure includes two principal components. Both components utilize image files comprising an average over a set of frames of a data stream from a data channel, where the set of frames can be the entire data stream collected or any subset thereof. A frame is data collected by the imaging apparatus over a given short period of time that is received by the processing unit and assembled into a temporal data set for each data channel. The frames generally represent average data over the collection time period depending on the imaging apparatus data collection and transmission speeds.
  • The first component is a visual tool that allows the quadrant or data channel averaged data (cumulated image) to be overlaid with transparency to quickly check data alignment. This tool was constructed using standard MATLAB libraries.
  • The second component is an automated tool based on maximizing mutual information across the quadrants or data channels. Mutual information quantifies the predictive power that one image has for another. For example, knowing there is a bright spot in one quadrant should mean that there is a corresponding bright spot in one or more of the other quadrants or data channels. The component determines and outputs the rotation and translation operators that, when applied to the data in one or more of the channels, produce the greatest mutual information between the quadrants.
  • This calibration process produces improved data calibration or registration. The process avoids the effects of individual pixels having poor brightness, spurious or missing data or other noise. The program encoding this second component was written in C++ and uses the standard ITK project libraries.
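  • A minimal sketch of the mutual information figure of merit used by this second component follows, assuming two already-overlaid, equally sized 8-bit quadrant images and a coarse 16-level joint histogram; an outer search over candidate rotation and translation operators (not shown) would select the transform maximizing this value. The image representation and binning are assumptions; the actual tool is built on the ITK registration components.
    // Sketch: mutual information between two equally sized quadrant images,
    // used as the figure of merit for registration. The 8-bit image
    // representation and the 16-level binning are assumptions.
    #include <vector>
    #include <cstdint>
    #include <cmath>

    double mutualInformation(const std::vector<std::uint8_t>& a,
                             const std::vector<std::uint8_t>& b) {
        const int bins = 16;                       // coarse intensity bins
        std::vector<double> pa(bins, 0.0), pb(bins, 0.0), pab(bins * bins, 0.0);
        const double n = static_cast<double>(a.size()); // assumes a.size() == b.size()
        for (std::size_t i = 0; i < a.size(); ++i) {
            const int ia = a[i] * bins / 256;
            const int ib = b[i] * bins / 256;
            pa[ia] += 1.0 / n;
            pb[ib] += 1.0 / n;
            pab[ia * bins + ib] += 1.0 / n;
        }
        double mi = 0.0;
        for (int ia = 0; ia < bins; ++ia)
            for (int ib = 0; ib < bins; ++ib) {
                const double pj = pab[ia * bins + ib];
                if (pj > 0.0) mi += pj * std::log(pj / (pa[ia] * pb[ib]));
            }
        return mi; // larger when one quadrant predicts the other well
    }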
  • The method then includes the step of reading a configuration file and opening a corresponding log file. After reading the configuration file and opening the log file, calibrations, if any, are loaded from the command line. After loading the calibration information, a corresponding directory, as specified in the command line, is read along with all of its subdirectories; for each one, the read step includes: (1) scanning for calibration stacks, and, if there are some not matched by the available calibrations, generating new calibrations out of them; (2) scanning for stacks and, if there are some, assuming this directory is a slide; and (3) scanning the directory path for a date and slide name comprising reaction conditions such as donor identity, acceptor identity, buffers, etc.
  • The method also includes the step of looping over all stacks for every slide. The looping step includes: (1) finding calibration data by date and frame dimensions; (2) averaging all the donor frames in the stack or averaging the donor frames over an adjustable number of frames in the stack; (3) finding spots in the averaged donor data or quadrant; (4) applying the calibration data to the acceptor channels to find acceptor coordinates corresponding to each found donor spot; (5) identifying a 3×3 pixel array associated with each found donor spot in the donor and acceptor channels (although the method has been tuned to use a 3×3 array, the method can use smaller and larger arrays, and the array size will depend on the detector system and on the system being detected); (6) collecting traces for each pixel in the array over the frames in the averaged data; (7) applying a pixel selection algorithm to the pixels in the array to select pixels that have a value above a threshold value; (8) averaging the selected pixels to form hybrid traces (signals); (9) checking the donor traces for minimal requirements on lifetime and average intensity; and (10) discarding any found donor spots and associated acceptor data that do not meet these criteria.
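  • The following is a minimal C++ sketch of steps (5) through (8) of the loop above: collecting the 3×3 pixel traces around a found donor spot, keeping only pixels whose values exceed a threshold, and averaging the surviving traces into a hybrid trace. The specific selection rule shown (per-pixel mean above a caller-supplied threshold) is an assumed stand-in for the actual pixel selection algorithm.
    // Sketch of steps (5)-(8): build a hybrid trace from a 3x3 pixel array
    // around a donor spot. The selection rule (per-pixel mean above a
    // threshold) stands in for the actual pixel selection algorithm.
    #include <vector>
    #include <numeric>

    // traces[p][f]: intensity of pixel p (0..8 in the 3x3 array) at frame f.
    std::vector<double> hybridTrace(const std::vector<std::vector<double>>& traces,
                                    double threshold) {
        const std::size_t nFrames = traces.empty() ? 0 : traces[0].size();
        std::vector<const std::vector<double>*> selected;
        for (const auto& t : traces) {
            if (t.empty()) continue;
            const double mean = std::accumulate(t.begin(), t.end(), 0.0) / t.size();
            if (mean > threshold) selected.push_back(&t);       // keep bright pixels only
        }
        std::vector<double> hybrid(nFrames, 0.0);
        if (selected.empty()) return hybrid;
        for (std::size_t f = 0; f < nFrames; ++f) {
            for (const auto* t : selected) hybrid[f] += (*t)[f];
            hybrid[f] /= static_cast<double>(selected.size());  // average selected pixels
        }
        return hybrid;
    }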
  • The method also includes the step of computing the acceptor “lifetimes” for each found donor spot using two different smoothing algorithms, a regular Savitzky-Golay smoother, which is adapted to identify short-lived, sharp signals, and a smart smoother, which is adapted to identify long-lived, weak signals and “broken” signals.
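  • For reference, a minimal sketch of a fixed-window Savitzky-Golay smoothing pass is given below, using the standard 5-point quadratic convolution coefficients (−3, 12, 17, 12, −3)/35. The production smoothers presumably use configurable window sizes and polynomial orders, and the edge handling shown (copying the unsmoothed samples) is an assumption.
    // Sketch of a fixed-window Savitzky-Golay smoother (5-point quadratic).
    // The actual smoothers use configurable parameters; edge samples are
    // simply copied here, which is an assumption.
    #include <vector>

    std::vector<double> savitzkyGolay5(const std::vector<double>& x) {
        static const double c[5] = {-3.0, 12.0, 17.0, 12.0, -3.0}; // divide by 35
        std::vector<double> y(x);                 // edges left unsmoothed
        for (std::size_t i = 2; i + 2 < x.size(); ++i) {
            double s = 0.0;
            for (std::size_t k = 0; k < 5; ++k) s += c[k] * x[i - 2 + k];
            y[i] = s / 35.0;
        }
        return y;
    }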
  • The method also includes the step of creating lists of acceptor events from the identified acceptor lifetimes.
  • The method also includes the step of adjusting boundaries of the acceptor events using numeric derivatives, computed with a similar Savitzky-Golay process, to achieve maximum correlation/anticorrelation with the donor.
  • The method also includes the step of computing a set of parameters for every acceptor event and assigning every acceptor event a score based on these parameters as described below.
  • The method also includes the step of joining adjacent segments from the acceptor event lists, and finding and resolving overlaps (if any) as described in detail below. For instance, if there is a long event overlapped by several shorter events, their scores are checked to decide which case describes the data better: one large event or a series of smaller ones.
  • The method also includes the step of using the resulting acceptor event list as a list of FRET event candidates: for every candidate, compute a set of FRET event parameters, such as FRET efficiency, acceptor and donor signal to noise ratios, probabilities, boundary anti-correlation coefficients, etc., as described in more detail below. The method determines if these parameters meet minimal criteria (specified in the configuration file), and if they do, accepts this candidate as a FRET event for output.
  • The method also includes the step of sorting spots of the current stack by how “event-rich” they are, and outputting an event list for the whole stack. The detected events are also added to the slide's event list. The method also includes the step of, after finishing with all the stacks in the slide, generating the combined report containing results from every spot of every stack in the slide.
  • Another embodiment of the methodology of this invention is described below.
  • Main Routine
  • The process starts with the construction of workspace and data structures to support the analysis. The workspace includes configurational data, current state information such as slide/stream information stored in a separate structure, data result structures, etc.
  • Next, the process reads the default configuration file, if present in the same directory. The configuration file includes a set of configurational parameter data, which are used throughout the process by the routines to find needed configurational data. The process then scans a command line for a log file of options. If a log file is present, then the process opens the specified log file. If the log file is not present, then the process attempts to open a log file in the directory specified by the configurational parameter data. If no log file is found in this directory, then the process attempts to open a log file in the current working directory. If that fails, the process exits with an error message. The log file is opened with shared reading options, which is required for proper inter-routine communication and proper interactions with Windows operating system routines.
  • The process then checks the command line for the first argument, which is supposed to be a sub-directory in the source root directory, specified by the configurational parameter data. If not present, the process prompts the user to enter the sub-directory from the standard input (generally a keyboard).
  • If the command line has more than one argument, the process parses the extra arguments. The extra arguments can be either additional configurational files, a user-specified log file, or a no calibration flag. The last option overrides configurational parameter data, and specifies whether the routines in the process are allowed to use the cached calibrations either found in the calibration directory given in the configurational parameter data or default calibrations given in the configurational parameter data separately for each frame size. If the configurational parameter data or the command line sets the no calibration flag on, the process is instructed not to use the cached calibration data. In this case, original calibration stacks must be present in the directory starting with the date of the slide, and a new calibration is generated every time subsequent routines require calibration. If the calibration stacks are not present, the process fails with the error message “No calibration present”.
  • If the first command argument (or user input) is valid and corresponds to an existing subdirectory in the source data directory, then the process recursively scans the subdirectory for stacks/slides data. The process then cleans up and exits.
  • Process Directory Routine
  • This routine scans the directories for calibration data and slide information. The routine then constructs corresponding output directory names. Assuming the current directory corresponds to data derived from a slide, the routine reads the list of stack files contained in the directory. If the list is not empty, the routine processes each stack file. The routine then reads the list of FITS files (FITS stands for Flexible Image Transport System) and generates slide wide statistics for reporting purposes. The routine then reads the list of associated sub-directories, and processes the subdirectories recursively, extracting the data contained in the subdirectories.
  • Scan for Calibration Data Routine
  • If the directory name starts with the proper date pattern, then the routine reads the date pattern from the directory name; otherwise, the routine returns control to the calling routine. The routine then scans the directory configurational parameters data for calibration data matching the date pattern and downloads any matches found. The routine next scans the current directory for stack and fit data files containing no more than 3 planes or frames of data. The routine then checks if calibration data for the given frame size and date is present. If the calibration data is not present, then the routine queues the file for generation of new calibration data. A queue is necessary because there can be more than one calibration stack, so that the routine implemented in add calibration data can choose the best calibration stack by comparing the number of donor spots detected in each stack. The calibration data is generated in context and is represented by a data structure containing overlays and spot lists from each quadrant, generated by the find spot routine described herein. The routine then checks the calibration queue, and generates calibration data via a generate calibration routine that determines the transformation needed to register pixel locations in one channel with corresponding pixel locations in the other channels. The transform generally comprises simply a translation and a rotation. However, the transformation can be much more complex and is constructed to map pixels from one channel into corresponding pixels in other channels.
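  • To make the registration step concrete, the following is a brief C++ sketch of applying a rotation-plus-translation calibration transform to map a donor-quadrant pixel coordinate into the corresponding acceptor-quadrant coordinate. The structure layout and the rotation-about-the-origin convention are assumptions; a real implementation would also handle the more complex distortions mentioned above.
    // Sketch: apply the rotation + translation calibration transform to map
    // a donor-quadrant pixel coordinate into an acceptor quadrant. The
    // struct layout and rotation-about-origin convention are assumptions.
    #include <cmath>

    struct CalibrationTransform {
        double angle;  // rotation in radians
        double dx, dy; // translation in pixels
    };

    struct PixelCoord { double x, y; };

    PixelCoord mapDonorToAcceptor(const PixelCoord& p, const CalibrationTransform& t) {
        const double c = std::cos(t.angle), s = std::sin(t.angle);
        return { c * p.x - s * p.y + t.dx,   // rotate, then translate
                 s * p.x + c * p.y + t.dy };
    }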
  • Generate Calibration Data Routine
  • The routine starts by opening a stack file. The routine then applies non-standard geometry settings if specified. The routine then checks to ensure that the file is valid, i.e., the file includes 16-bit non-compressed data, has a known frame size, has enough frames, and has an acceptable integration cycle time. The routine then searches for calibration data associated with the frame size and the date/time of the file collection. The calibration is cached as defined above. If all conditions are met and the calibration is found, then the routine allocates the data structures needed for detection processing and forwards control to the stack processing routines.
  • Process Stack Routine
  • The stack processing routine reads and averages frames from the stack file to generate an overlay. The routine then generates an overlay picture for the donor quadrant and searches for donor spots in the donor quadrant using the find spot routines. The routine then uses the existing picture object to mark the initial donor spots.
  • The routine then creates signal to noise structures for individual pixel traces, one per channel per spot. The routine then applies the calibration transform to register the acceptor pixel coordinates to the donor channel pixel coordinate system. The routine then reads the stack file again, collecting data samples at each frame for the identified pixel traces. For each spot, the routine applies the hi-pass filter to the donor traces and performs the donor pixel selection and generates the donor hybrid traces.
  • Next, the routine applies the hi-pass filter to acceptor traces, and performs the acceptor pixel selection and generates acceptor hybrid traces. The acceptor hybrid trace routine is repeated for each acceptor channel. The routine then stores the hybrid traces into signal structures, which are stored as part of the signal to noise structures.
  • The routine then filters out spots that do not satisfy the donor lifetime and the donor S/N ratio conditions from the initial data file. The routine then generates an overlay picture of the donor quadrant with spots found/filtered out. The routine then writes the results as the list of donor spots.
  • The routine then sends the list of donor spots to the FRET analysis routines. Next, the routine generates an overlay picture of the donor quadrant with active spots, and outputs text data files related to the current stack.
  • FRET Analysis Routine
  • The FRET analysis routine first allocates structures to keep the results from the analysis. The routine then, for each spot in the donor spot list, makes a separate array of signal structures by copying the signal data structures from the previously stored input signal to noise data structures. The FRET analysis routine then calls the create donor model routine. The create donor model routine then adds a dynamic list of acceptor data traces from corresponding pixels in the acceptor channels. The FRET analysis routine then generates a list of FRET event candidates from the donor spot list. The routine then stores the resulting event list into previously allocated data structures. The routine then counts the number of high probability events and low probability events in the list, and determines the highest probability to set a spot efficiency entry on the current spot. The routine then sorts the arrays based on spot efficiency entry, the number of high probability events, the number of low probability events, and the highest probability. The index within this sorted array becomes the spot ranking.
  • For each spot, the routine creates a list of donor events by calling a construct donor events routine. This routine computes adjusted donor lifetimes by calling a compute adjusted lifetime routine. The routine then stores all the data such as event lists, noise level, donor lifetime, adjusted donor lifetime, etc. into a previously allocated entry in the spot list structure, associated with the current slide. The stored information becomes persistent across the whole slide, while the rest of the data is deallocated.
  • For each spot, the routine detects donor-around-event data, stores it into a slide-wide persistent area and generates signal and FRET detection trace pictures if necessary. The routine then generates a rich spot file that contains spot info for so-called rich spots. A rich spot is a spot that contains at least one FRET event. The routine also generates an activity picture, with the rich spots colored.
  • SIGNAL Data Structure
  • The SIGNAL data structure is a data structure containing hybrid traces of one of the channels: donor, acceptor 1, acceptor 2, etc. The elements of the data structure include:
  • accno - channel number: 0 for the donor channel, 1 for the first acceptor channel, 2 for the second acceptor channel, etc.
    x, y - spot coordinates: coordinates of the middle pixel of the 3 × 3 pixel spot array
    mask - bit mask indicating which individual pixels from the 3 × 3 area were included in constructing the hybrid trace
    nsamp - number of data samples in the trace (same as the number of frames in the stack file)
    nlvl - noise level computed as the standard deviation (sometimes scaled by a factor) of the noise channel
    *sigbuf - buffer containing hybrid trace data samples
    *noise - buffer containing hybrid noise data samples
    ACC-DETECTOR *first - first element in the list of additional data structures, usually related to a particular detection algorithm
  • ACC-DETECTOR Data Structure
  • The ACC-DETECTOR data structure contains additional information about a hybrid trace, such as intermediate data from different types of detectors, simulation data or the donor model. The data structure includes the following elements:
  • struct tag_ACC_DETECTOR *next - a pointer to the next ACC_DETECTOR object in the list, or NULL if this is the last object
    detector - detector type, one of the following:
      0 - undefined detector type
      DETECTOR_LONG - long lived event candidate detector
      DETECTOR_SHORT - short lived event candidate detector
      DETECTOR_DONOR_MODEL - Donor Model
      DETECTOR_SIMULATION - simulation data (such as the original trace before blending with noise)
    nlvl - noise level used in particular computations (usually the nlvl from SIGNAL scaled by a factor)
    *sigsmooth - hybrid trace data after smoothing
    *sigder - digital derivative
    *life - lifetime buffer indicating which data samples represent the on or off state of the channel
    double stdac - standard deviation of the derivative
    void (*destructor)(struct tag_ACC_DETECTOR *ad) - pointer to a function which is called when the object is deallocated. An actual implementation of an ACC_DETECTOR object may contain some extra data, which is sometimes allocated dynamically. Since the control logic is not aware of such data, implementation-specific code must be provided to handle that. When the standard delete_acc_detector( ) function is called, it checks whether this pointer is not NULL, and if so, calls that function, which is supposed to take care of any implementation-specific de-initialization.
  • When a routine (such as a detection routine) needs to associate some extra data with a given signal, the routine constructs an ACC-DETECTOR object, and adds it to the list of ACC-DETECTOR objects, pointed to by ‘->first’ member of the SIGNAL data structure.
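  • The listing below restates the two structures as a compilable C++ sketch, including the list insertion described in the preceding paragraph. Field types not stated in the tables (for example, whether the buffers hold integer or floating point samples) are assumptions.
    // Compilable sketch of the SIGNAL / ACC_DETECTOR structures described
    // above; exact field types and buffer ownership are assumptions.
    struct ACC_DETECTOR {
        ACC_DETECTOR* next;      // next detector object, or NULL if last
        int           detector;  // DETECTOR_LONG, DETECTOR_SHORT, ...
        double        nlvl;      // noise level used by this detector
        double*       sigsmooth; // hybrid trace data after smoothing
        double*       sigder;    // digital derivative
        int*          life;      // lifetime buffer (on/off per sample)
        double        stdac;     // standard deviation of the derivative
        void (*destructor)(ACC_DETECTOR* ad); // implementation-specific cleanup
    };

    struct SIGNAL {
        int           accno;  // 0 = donor, 1 = first acceptor, ...
        int           x, y;   // center pixel of the 3x3 spot array
        unsigned      mask;   // bit mask of pixels used in the hybrid trace
        int           nsamp;  // number of samples (frames) in the trace
        double        nlvl;   // noise level (standard deviation)
        double*       sigbuf; // hybrid trace samples
        double*       noise;  // hybrid noise samples
        ACC_DETECTOR* first;  // list of attached detector objects
    };

    // Attach extra detector data to a signal, as described above.
    void attach_detector(SIGNAL* sig, ACC_DETECTOR* ad) {
        ad->next = sig->first;
        sig->first = ad;
    }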
  • Construct Donor Model
  • The model constructs a Smart Smoother object for subsequent operations via a construct smart smoother routine. The routine allocates a donor model object. The model smart smoothes the original donor trace, and then computes its first derivative using a Savitzky-Golay (SG) fitting routine. The model then computes a standard deviation of the derivative and stores it in the donor model object. This derivative will be used to detect slow changes in the donor trace.
  • The model then calls a donor lifetime routine to compute the donor's derivative lifetime. It computes another “finer” derivative of the original trace using a different SG smoother to detect fast changes in the donor trace. The model then computes segments, where both derivatives go outside their standard deviations either way (positive or negative), and then combines detected segments from both processes.
  • The results representing segments where fast donor changes were detected (high derivative values) are then stored in the lifetime buffer.
  • The SG-smoothed original donor trace is stored in the signal smoothed buffer for subsequent operations, using the SG smoother from the Smart Smoother object.
  • The model then calls a routine to create initial static segments, which examines each segment having a high-derivative value, to find the sample index at which the change is highest (max/min derivative), and to break down the entire donor trace into segments with the boundaries set at those ‘high-change’ points.
  • The model typically creates a large set of tiny segments, which need certain types of optimization to determine if neighbor or adjacent donor segments (i.e., donor segments to the immediate right or left of a particular donor segment) are substantially different. If adjacent segments are not substantially different, the adjacent donor segments are joined into a single larger segment. Whether segments are substantially different is determined by applying a variety of criteria, such as close enough average values, a tiny segment in between two larger ones with close averages, etc. In addition, the model decides whether each segment represents a donor on state or a donor off state.
  • Finally, the model iteratively calls a finalize donor model routine a few times (each time the routine iteratively improves the segment joining process) to compute final donor lifetimes and to construct a best polynomial fit of the appropriate donor segments.
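  • A simplified C++ sketch of the segmentation idea used by the donor model appears below: the trace is broken at high-derivative samples, and adjacent segments whose averages are close are joined. The specific closeness criterion (difference below twice the noise level) is an assumption standing in for the full set of joining criteria described above.
    // Sketch of the donor-model segmentation: split the trace at
    // high-derivative points, then join adjacent segments with close
    // averages. The join criterion is an assumed simplification; the
    // derivative is assumed to have the same length as the trace.
    #include <vector>
    #include <cmath>

    struct Segment { std::size_t start, end; double mean; }; // [start, end)

    static double meanOf(const std::vector<double>& x, std::size_t a, std::size_t b) {
        double s = 0.0;
        for (std::size_t i = a; i < b; ++i) s += x[i];
        return s / static_cast<double>(b - a);
    }

    std::vector<Segment> segmentDonor(const std::vector<double>& trace,
                                      const std::vector<double>& deriv,
                                      double derivStd, double noiseLevel) {
        std::vector<std::size_t> cuts{0};
        for (std::size_t i = 1; i + 1 < deriv.size(); ++i)
            if (std::fabs(deriv[i]) > derivStd) cuts.push_back(i); // 'high-change' points
        cuts.push_back(trace.size());

        std::vector<Segment> segs;
        for (std::size_t k = 0; k + 1 < cuts.size(); ++k)
            if (cuts[k + 1] > cuts[k])
                segs.push_back({cuts[k], cuts[k + 1], meanOf(trace, cuts[k], cuts[k + 1])});

        // Join adjacent segments that are not substantially different.
        std::vector<Segment> joined;
        for (const Segment& s : segs) {
            if (!joined.empty() && std::fabs(joined.back().mean - s.mean) < 2.0 * noiseLevel) {
                joined.back().end  = s.end;
                joined.back().mean = meanOf(trace, joined.back().start, s.end);
            } else {
                joined.push_back(s);
            }
        }
        return joined;
    }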
  • Detect Acceptor Events Routine
  • For each acceptor channel, the routine calls a subroutine to generate a list of long lived acceptor event candidates using the long lived event detection algorithm, an algorithm optimized to identify long lived events. Next, the routine calls a subroutine to generate a list of short lived event candidates using the short lived event detection algorithm, an algorithm optimized to identify short lived events. The routine then joins all the event candidate lists into a single event candidate list, where the number of candidate lists joined is two times the number of acceptor channels (a long lived list and a short lived list per channel). The routine then calls a subroutine adapted to exclude conflicting entries in the joint list of event candidates as described below. The routine then returns the list of event candidates to its calling routine.
  • Detect Long Lived Acceptor Events Routine
  • This routine constructs a Smart Smoother object. The routine first checks to determine whether ACC-DETECTOR objects of type DETECTOR-LONG are already attached to both the donor and acceptor SIGNAL objects. If not, the routine creates new ones, fills them with smoothed data, and attaches the objects to the SIGNAL objects. The routine operates by calling a static routine to determine rough acceptor lifetimes to fill the lifetime buffer. Zero values in the lifetime buffer represent signals in the channel that are in an OFF state, while non-zero values in the lifetime buffer represent signals in the channel that are in an ON state. The routine then reads the acceptor events from the lifetime buffer to create an initial array of event candidates stored as ACC-EVENT objects by scanning for non-zero segments in the lifetime buffer. The routine then optimizes the acceptor event segments by joining adjacent segments iteratively based on a set of joining criteria to form joined acceptor event segments. This process is a more thorough test to determine whether adjacent ‘on’ segments should be joined together because they belong to a single event, accidentally broken apart by noise spikes. The routine then calls a subroutine to determine and adjust event boundaries, where the subroutine uses the Derivative Anti-correlation (DAC) function to adjust boundaries of the event candidates.
  • For each event candidate, the routine also computes a variety of event parameters like average intensities, signal to noise ratios, etc., and computes an event score, which is used later to evaluate how “good” this event candidate is. The event score is computed in a static routine according to the following formula (a sketch of this scoring appears after this routine description):

  • f*sqrt(x1*x1+x2+x3)−0.5
  • where x1 is the acceptor signal to noise ratio, x2 is the product of differential acceptor and donor signal to noise ratios at the beginning of the event, and x3 is the product of differential acceptor and donor signal to noise ratios at the end of the event. If a product is negative, it is multiplied by −0.25. The coefficient f depends on the event duration and is computed according to the following formula:

  • 1.+2.*(1.−exp(−dl*dl))
  • where dl is the ratio of the event duration to a long scan distribution parameter in the configurational parameter data. The coefficient f provides a configurable boost to the score of longer lived events.
  • The routine then cleans up and returns the resulting list of acceptor events to its calling routine.
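  • The score formula above is restated as a compilable C++ sketch below. Variable names mirror the text; reading the singular “the product” as applying the −0.25 scaling to each negative product is an interpretation, and the long scan distribution parameter is passed in rather than read from the configurational parameter data.
    // Sketch of the long lived acceptor event score described above.
    // x1: acceptor S/N; x2, x3: products of differential acceptor and donor
    // S/N at the event start and end; negative products are scaled by -0.25.
    #include <cmath>

    double longEventScore(double x1, double x2, double x3,
                          double duration, double longScanDistribution) {
        if (x2 < 0.0) x2 *= -0.25;
        if (x3 < 0.0) x3 *= -0.25;
        const double dl = duration / longScanDistribution;        // configurable parameter
        const double f  = 1.0 + 2.0 * (1.0 - std::exp(-dl * dl)); // boost for longer events
        return f * std::sqrt(x1 * x1 + x2 + x3) - 0.5;
    }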
  • Detect Short Lived Acceptor Events Routine
  • This routine constructs SG smoother objects for a signal trace (function) and its derivative. First, the routine checks whether acceptor detector objects of the short lived detector type are already attached to both the donor and acceptor SIGNAL objects. If not, the routine creates new ones, fills them with smoothed data, and attaches them to the appropriate SIGNAL objects.
  • The routine operates by calling a static subroutine adapted to fill in a lifetime buffer. Zero values in the lifetime buffer represent channel signals in an OFF state, while non-zero values in the lifetime buffer represent channel signals in an ON state. Next, the routine calls a subroutine adapted to join lifetime segments that are separated by short interruptions, generally caused by noise.
  • The routine then calls a subroutine adapted to split up lifetime segments that were unjustifiably joined by accidental noise or smoothing algorithm peculiarities. The routine then calls a subroutine to create an initial array of event candidates stored in acceptor event objects by scanning for non-zero segments in the lifetime buffer. Next, the routine calls a subroutine to adjust short event boundaries, where the subroutine uses the Derivative Anti-correlation (DAC) function to adjust boundaries of the event candidates.
  • For each event candidate, the routine calls a subroutine adapted to compute a variety of event parameters like average intensities, signal to noise ratios, etc., and to compute the acceptor event score, which is used later to evaluate how “good” this event candidate is.
  • Similar to a long lived event score, the acceptor event score is computed according to the formula (a sketch of this scoring appears after this routine description):

  • sqrt(x1*x1+x2+x3)−2.0
  • where x1 is the acceptor signal to noise ratio, x2 is the product of differential acceptor and donor signal to noise ratios at the beginning of the event and x3 is the product of differential acceptor and donor signal to noise ratios at the end of the event. If a product is negative, it is multiplied by −0.25. If the event is at the beginning of the trace, x2 is forced to the value of 2.0; likewise, if the event is at the end of the trace, x3 is forced to the value of 2.0. This forcing process reflects the fact that the anti-correlation status is not known under these circumstances.
  • The routine then cleans up and returns the resulting list of acceptor events to its calling routine.
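  • The short lived score is restated as a compilable C++ sketch below, with the boundary forcing described above; the same interpretation of the negative-product scaling is assumed.
    // Sketch of the short lived acceptor event score described above;
    // x2 (x3) is forced to 2.0 when the event starts (ends) at a trace
    // boundary, where anti-correlation cannot be assessed.
    #include <cmath>

    double shortEventScore(double x1, double x2, double x3,
                           bool atTraceStart, bool atTraceEnd) {
        if (x2 < 0.0) x2 *= -0.25;
        if (x3 < 0.0) x3 *= -0.25;
        if (atTraceStart) x2 = 2.0;
        if (atTraceEnd)   x3 = 2.0;
        return std::sqrt(x1 * x1 + x2 + x3) - 2.0;
    }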
  • Resolve Acceptor Event Overlap Routine
  • The purpose of this routine is to eliminate overlapping event candidates from the list of acceptor events. The routine first sorts the input array of event candidates in order of event starts. Next, the routine breaks down the array into sub-arrays containing conflicting areas. The routine operates by adding a first event to the current sub-list. The routine then iterates over subsequent events until no events overlap with any events in the sub-list, adding each overlapping event to the list. If no new overlapping events are found, the routine closes that sub-list, selects an event and creates a new sub-list of overlapping events. The routine repeats this process until all events have been processed, creating a set of sub-lists including overlapping events. The sub-lists contain a set of conflicting (overlapping) event candidates, but each sub-list is independent of events in any other sub-list, i.e., the sub-lists are distinct with no shared events.
  • For each conflicting or overlapping area sub-list, the routine calls a subroutine to find the best rated non-conflicting sub-list of event candidates. The routine operates by sorting events in the conflicting sub-list by their acceptor event score. Next, for every event in the sub-list, the routine constructs a further sub-list containing only events that do not conflict with the starting event. The routine then computes the resulting score of every sub-list as the sum of the adjusted scores of its events, and selects the sub-list with the highest adjusted score.
  • The ‘adjusted score’ is computed according to the following formula:

  • score*2.0*bias
  • where score is the acceptor event score and bias is the configurational parameter data element biasN (N is the acceptor channel number) and is set to biasN for segments from the long lived routine or 1-biasN for segments from the short lived routine. Using this process, it is possible to manipulate the scores and eligibility of events identified by the short lived detection routine versus events identified by the long lived algorithm by adjusting the value of the parameter biasN.
  • After resolving overlapping event data, the routine joins the non-conflicting sub-lists into a single list of event candidates, and returns control to its calling routine.
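  • A C++ sketch of this overlap resolution follows: candidates are grouped into independent clusters of mutually overlapping events, and within each cluster a non-conflicting sub-list is built around every candidate (in descending adjusted score) and the sub-list with the highest total adjusted score is kept. The event representation and the way the bias value is carried on each event are assumptions.
    // Sketch of acceptor event overlap resolution: cluster overlapping
    // candidates, then keep the non-conflicting subset with the highest
    // total adjusted score (score * 2.0 * bias). Event layout is assumed.
    #include <vector>
    #include <algorithm>
    #include <cstddef>

    struct Event {
        int start, end;   // frame range, [start, end)
        double score;     // acceptor event score
        double bias;      // biasN (long lived) or 1 - biasN (short lived)
    };

    static bool overlaps(const Event& a, const Event& b) {
        return a.start < b.end && b.start < a.end;
    }

    std::vector<Event> resolveOverlaps(std::vector<Event> events) {
        std::sort(events.begin(), events.end(),
                  [](const Event& a, const Event& b) { return a.start < b.start; });
        std::vector<Event> result;
        std::size_t i = 0;
        while (i < events.size()) {
            // Build a cluster of mutually reachable overlapping events.
            std::size_t j = i + 1;
            int clusterEnd = events[i].end;
            while (j < events.size() && events[j].start < clusterEnd) {
                clusterEnd = std::max(clusterEnd, events[j].end);
                ++j;
            }
            std::vector<Event> cluster(events.begin() + static_cast<std::ptrdiff_t>(i),
                                       events.begin() + static_cast<std::ptrdiff_t>(j));
            // Build a non-conflicting sub-list around each candidate, in
            // descending adjusted score, and keep the best-scoring sub-list.
            std::sort(cluster.begin(), cluster.end(),
                      [](const Event& a, const Event& b) {
                          return a.score * 2.0 * a.bias > b.score * 2.0 * b.bias;
                      });
            std::vector<Event> best;
            double bestScore = -1e300;
            for (std::size_t k = 0; k < cluster.size(); ++k) {
                std::vector<Event> sub{cluster[k]};
                double total = cluster[k].score * 2.0 * cluster[k].bias;
                for (std::size_t m = 0; m < cluster.size(); ++m) {
                    if (m == k) continue;
                    bool ok = true;
                    for (const Event& e : sub)
                        if (overlaps(e, cluster[m])) { ok = false; break; }
                    if (ok) {
                        sub.push_back(cluster[m]);
                        total += cluster[m].score * 2.0 * cluster[m].bias;
                    }
                }
                if (total > bestScore) { bestScore = total; best = sub; }
            }
            result.insert(result.end(), best.begin(), best.end());
            i = j;
        }
        return result;
    }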
  • Detect FRET Events Routine
  • The purpose of this routine is to compute FRET event parameters for every input event candidate. The routine also applies certain basic criteria to filter out any obvious non-events or trash events.
  • The routine operates by computing DAC functions based on derivatives from the acceptor detector objects of the short lived event type. The routine then creates a ‘finer’ SG smoother/derivative, and computes DAC functions based on the smoother output.
  • Next, for every event candidate, the routine adjusts event boundaries. If the resulting duration does not exceed the maximum short event parameter in the configurational parameter data, the routine repeats event boundary adjustments with the ‘finer’ DAC functions.
  • Using finer DAC functions to analyze short lived events is necessary to avoid problems such as the 6-frame problem. The six frame problem occurs with the standard smoother used to analyze short lived signals. The DAC functions, which are based on donor and acceptor derivatives, have peaks at the event boundaries, and the peaks are not infinitely narrow, but have certain widths. If the event duration is less than or equal to about two times the boundary width, then adjusting the event boundaries using the standard smoothing routines gives inaccurate results. As the event duration gets shorter, the adjusted duration does not, which creates certain errors. To reduce these errors to a tolerable level, ‘finer’ digital derivatives/DAC functions are used.
  • Next, for every event candidate, after basic FRET event parameters (e.g., start, duration, acceptor number) are set, the routine computes a whole set of parameters, associated with FRET events.
  • Then, for every FRET event, the routine determines if the computed probability is smaller than a desired or allowed minimum value given in the configurational parameter data as the low probability limit. If the probability of the event is less than the low probability limit, then the event is removed from the final FRET event list. The routine then compacts the FRET event list.
  • The routine then sets the parameters il and ir for each event. The parameter il is the acceptor intensity at the beginning of the event, while ir is the acceptor intensity at the end of the event. If the duration or length of the event is less than 20 frames, the routine sets both il and ir equal to the average acceptor intensity during the event. Otherwise, the routine first best fits the acceptor trace during the event with a straight line; the routine then sets the value of il to the value of the straight line at the beginning of the event and the value of ir to the value of the straight line at the end of the event (a sketch of this endpoint computation follows this routine description). Of course, an ordinary artisan can recognize that the best fit routine can be to a polynomial of any degree, provided that il and ir are set to the polynomial values at the beginning and end of the event, respectively.
  • Finally, the routine performs cleanup operations and returns the FRET event list to its calling routine.
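  • A brief C++ sketch of the il/ir computation follows: for events of at least 20 frames, the acceptor trace over the event is fit with a least-squares straight line and the line is evaluated at the event boundaries; shorter events use the plain average. The trace-slice representation is an assumption.
    // Sketch of the il/ir computation: least-squares straight line fit over
    // the event for long events, plain average for events shorter than 20
    // frames. The trace slice representation is an assumption.
    #include <vector>
    #include <utility>

    // Returns {il, ir}: modeled acceptor intensity at the event start and end.
    std::pair<double, double> eventEndpoints(const std::vector<double>& acc) {
        const std::size_t n = acc.size();
        if (n == 0) return {0.0, 0.0};
        double mean = 0.0;
        for (double v : acc) mean += v;
        mean /= static_cast<double>(n);
        if (n < 20) return {mean, mean};               // short event: use the average

        // Least-squares fit acc[i] ~ a + b * i.
        double sx = 0.0, sxx = 0.0, sxy = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double xi = static_cast<double>(i);
            sx  += xi;
            sxx += xi * xi;
            sxy += xi * acc[i];
        }
        const double nn = static_cast<double>(n);
        const double b  = (nn * sxy - sx * (nn * mean)) / (nn * sxx - sx * sx);
        const double a  = mean - b * sx / nn;
        return {a, a + b * static_cast<double>(n - 1)}; // line value at start and end
    }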
  • The process of this invention utilizes routines that, in certain embodiments, include data structures having the following data.
  • Output File Format
  • slide events data
  • The following table tabulates the slide event data stored in the data structures.
  • Label Name Description
    Stream Stream ID. Normally, 2-digit number taken from the stack file name. For
    example, if the stack name is Stream05, the stream ID is 05.
    Rank Spot trace rank within the slide based on how event-rich is the spot. Lower
    number means richer spot.
    DonCol Donor X-coordinate of the spot.
    DonRow Donor Y-coordinate of the spot.
    Start Start of the event in ms.
    Length Duration of the event in ms.
    Acc Acceptor number of the acceptor causing the event. Currently can be either 1 or
    2, but in the future releases it will also take values 3 and 4.
    Prob Event probability. A value in the range 0 . . . 1.0, indicating how “good” is the
    event, that is, how reliably it is detected. The closer the value to 1, the more
    reliably the event is detected.
    FRETEff FRET Efficiency computed as AiSN/(AiSN + DSN), where AiSN is the
    acceptor signal to noise ratio AiInt/AiNL (i is the acceptor number, same as
    Acc), and DSN is the donor dark state signal to noise ratio, either DLR/DNLC
    or DRL/DNLC, depending on which difference is higher, DLL − DLR or DRR −
    DRL.
    Style Event style. Possible values are:
    0 - No correlation between donor and acceptor of any kind (both LACC and
    RACC are above −1 but below 2);
    1 - Positive correlation at least at one end (either LACC, or RACC, or both are
    below −1, while none of them is above 2);
    2 - Negative (anti-) correlation at one end (one of the LACC or RACC is above
    2, while the other is not);
    3 - Negative (anti-) correlation at both ends (both LACC and RACC are above
    2.)
    Hi Indicates whether the event is hi-prob. If Prob is greater than the value of the
    configurational parameter hi_probi, then Hi is 1, otherwise, 0.
    LACC Anti-correlation coefficient on the left (at the start of the event), calculated as
    product of acceptor signal to noise ratio AiInt/AiNL and donor differential
    signal to noise ratio (DLL − DLR)/DNLL.
    RACC Anti-correlation coefficient on the right (at the end of the event), calculated as
    product of acceptor signal to noise ratio AiInt/AiNL and donor differential
    signal to noise ratio (DRR − DRL)/DNLR.
    Dark Average donor intensity during the event.
    DonProb Donor “probability” computed as
    (1. − exp(−DSN^2 * WTD)) * (1. − exp(−2 * (DInt/DNL)^2)),
    where DSN is the donor differential signal to noise ratio, either (DLL − DLR)/
    DNLL or (DRR − DRL)/DNLR, whichever is higher;
    WTD is a coefficient equal to 0.4 for short events (shorter than the configurable
    max_short_event), or 0.71 for long events;
    DInt is either DLL or DRR, depending on which differential signal to noise
    ratio is higher.
    Ac1Prob Acceptor 1 “probability” computed as
    1. − exp(−(A1Int/A1NL)^2 * WT1),
    where WT1 is a coefficient equal to the product of the configurable parameter
    wt_ac1 and a value of 0.4 for short events or 0.71 for long events.
    Ac2Prob Acceptor 2 “probability” computed as
    1. − exp(−(A2Int/A2NL)^2 * WT2),
    where WT2 is a coefficient equal to the product of the configurable parameter
    wt_ac2 and a value of 0.4 for short events or 0.71 for long events.
    NL Number of donor data samples preceding the start of the event that were used
    to calculate DLL (see below.)
    DLL Donor Intensity right before the start of the event. If NL is large enough (larger
    than 20), an average is computed; otherwise, a peak value of fine-smoothed data
    less DNL/√2.
    DNLL Donor noise level right before the start of the event. Normally taken from the
    donor model, and is equal to the standard deviation from the polynomial fit at
    the corresponding donor segment.
    DLR Donor Intensity right after the start of the event.
    DNLC Donor noise level during the event. It is taken from the donor model, and
    frequently equal to DNL.
    DRL Donor Intensity right before the end of the event.
    NR Number of donor data samples following the end of the event, that were used to
    calculate DRR (see below.)
    DRR Donor Intensity right after the end of the event. If NR is large enough (larger
    than 20), an average is computed; otherwise, a peak value of fine-smoothed data
    less DNL/√2.
    DNLR Donor noise level right after the end of the event. Normally taken from the
    donor model, and is equal to the standard deviation from the polynomial fit at
    the corresponding donor segment.
    DNL Donor Background Noise Level. Computed as the standard deviation of the
    donor “noise” hybrid trace.
    A1Int Average (for long events, longer than max_short_event) or peak acceptor 1
    intensity during the event.
    A1L Acceptor 1 Intensity at the start of the event. Computed by modeling acceptor
    with a straight line best fit.
    A1R Acceptor 1 Intensity at the end of the event. Computed by modeling acceptor
    with a straight line best fit.
    A1NL Acceptor 1 background Noise Level. Computed as the standard deviation of the
    acceptor 1 “noise” hybrid trace.
    A2Int Average (for long events, longer than max_short_event) or peak acceptor 2
    intensity during the event.
    A2L Acceptor 2 Intensity at the start of the event. Computed by modeling acceptor
    with a straight line best fit.
    A2R Acceptor 2 Intensity at the end of the event. Computed by modeling acceptor
    with a straight line best fit.
    A2NL Acceptor 2 background Noise Level. Computed as the standard deviation of the
    acceptor 2 “noise” hybrid trace.
  • Referring now to FIG. 1, a graphical illustration of certain of the parameters that are defined for an event is shown. The parameters are defined in the table above.
  • donor spots data
  • Tab delimited file. The first line contains tab delimited text labels, the rest, data, one line per donor trace.
  • Label Name Description
    Stream Stream ID. Normally, 2-digit number taken from the stack file name. For
    example, if the stack name is Stream05, the stream ID is 05.
    Rank Spot trace rank within the slide based on how event-rich is the spot. Lower
    number means richer spot.
    DonCol Donor X-coordinate of the spot.
    DonRow Donor Y-coordinate of the spot.
    AvgInt Average Donor Intensity during Lifetime.
    LifeTm Donor Lifetime (ms).
    DE Ratio (Total Donor Event Duration)/(Total Trace Duration).
    DEAC Ratio (Total Anti-Correlated Donor Event Duration)/(Total Trace Duration).
    Cnt Number of Donor Events detected.
    CntAC Number of Anti-Correlated Donor Events (that have a FRET event match).
    NPDon Number of Donor pixel traces selected by Pixel Selection and averaged into
    Hybrid Donor Trace.
    NPAc1 Number of Acceptor 1 pixels selected by Pixel Selection for averaging into
    Acceptor 1 Hybrid Trace.
    NPAc2 Number of Acceptor 2 pixels selected by Pixel Selection for averaging into
    Acceptor 2 Hybrid Trace.
  • donor events data
  • Tab delimited file. The first line contains tab delimited text labels, the rest, data, one line per donor event. A Donor Event is defined as a temporary switch to dark state of limited duration, which happens in the middle of the trace (that is, there is always excited donor before and after that event.)
  • Label Name Description
    Stream Stream ID. Normally, 2-digit number taken from the stack file name.
    For example, if the stack name is Stream05, the stream ID is 05.
    Rank Spot trace rank within the slide based on how event-rich is the spot. Lower
    number means richer spot.
    DonCol Donor X-coordinate of the spot.
    DonRow Donor Y-coordinate of the spot.
    DonProb Donor “probability”, computed in a way similar to slide_events:DonProb.
    Start Start time of the Donor Event (ms).
    Length Duration of the Donor Event (ms).
    AC Anti-Correlation. If ‘Y’, the Donor Event has a match of a detected FRET Event.
  • donor segments data
  • Tab delimited file. The first line contains tab delimited text labels, the rest, data, one line per donor segment.
  • Label Name Description
    DSegId Slidewise unique number, identifying a Donor Segment.
    Stream Stream ID. Normally, 2-digit number taken from the stack file name. For
    example, if the stack name is Stream05, the stream ID is 05.
    Rank Spot trace rank within the slide based on how event-rich is the spot. Lower
    number means richer spot.
    DonCol Donor X-coordinate of the spot.
    DonRow Donor Y-coordinate of the spot.
    Length Duration of the Donor Segment (ms).
    Excited 1 - excited, 0 - dark.
    Int Average Intensity.
    Dev Deviation of the polynomial approximation from the average intensity. Valid
    only for large (80 frames or more) excited segments.
    NL Noise Level within the segment. Based on standard deviation of the actual
    intensity from the polynomial approximation (or average intensity if no PA).
  • donseg events data
  • Label Name Description
    DSegId Slidewise unique number, identifying a Donor Segment.
    Stream Stream ID. Normally, 2-digit number taken from the stack file name. For
    example, if the stack name is Stream05, the stream ID is 05.
    Rank Spot trace rank within the slide based on how event-rich is the spot. Lower
    number means richer spot.
    DonCol Donor X-coordinate of the spot.
    DonRow Donor Y-coordinate of the spot.
    Start Start time of the Donor Segment (ms).
    Length Duration of the Donor Segment (ms).
    Excited 1 - excited, 0 - dark.
    Int Average Intensity.
    Dev Deviation of the polynomial approximation from the average intensity. Valid
    only for large (80 frames or more) excited segments.
    NL Noise Level within the segment. Based on standard deviation of the actual
    intensity from the polynomial approximation (or average intensity if no PA).
  • Tab delimited file. A Donor Segment Event is defined as a temporary change in the donor behaviour within a defined Donor Segment. If the Donor Segment is dark (Excited=0), the event is a temporary switch to excited state. If the Donor Segment is excited (Excited=1), the event is a temporary switch to dark state.
  • There can be zero to many Donor Segment Events in each Donor Segment.
  • Label Name Description
    DSegId Donor Segment ID of the Donor Segment where
    this event belongs in.
    Start Start time of the Donor Segment Event (ms).
    Length Duration of the Donor Segment Event (ms).
    Int Average Intensity during the event.
  • donor around event data
  • Tab delimited file. The first line contains tab delimited text labels, the rest, data, one line per FRET Event. Line-to-line match with slide_events.dat.
  • Label Name Description
    Stream Stream ID. Normally, 2-digit number taken from the stack file name. For
    example, if the stack name is Stream05, the stream ID is 05.
    Rank Spot trace rank within the slide based on how event-rich is the spot. Lower
    number means richer spot.
    DonCol Donor X-coordinate of the spot.
    DonRow Donor Y-coordinate of the spot.
    Start Start time of the FRET Event (ms).
    Length Duration of the FRET Event (ms).
    LDur Duration of the portion of the Donor Segment immediately preceding the FRET
    Event (ms).
    LDInt Average Intensity of the Donor Segment on the left (same as
    donor segments:Int).
    LDDev Deviation of the polynomial approximation from the average intensity of the
    Donor Segment on the left (same as donor _segments:Dev).
    LDNL Noise Level within the Donor Segment on the left (same as
    donor _segments:NL).
    RDur Duration of portion of the Donor Segment immediately following the FRET
    Event (ms).
    RDInt Average Intensity of the Donor Segment on the right (same as
    donor_segments:Int).
    RDDev Deviation of the polynomial approximation from the average intensity of the
    Donor Segment on the right (same as donor_segments:Dev).
    RDNL Noise Level within the Donor Segment on the right (same as
    donor_segments:NL).
  • BRIEF SUMMARY OF SEQUENCING TECHNOLOGY
  • The sequencing technology utilized for analysis in this application produces fluorescence events at multiple wavelengths in a large number of individual sequencing complexes (polymerase/template/primer/nucleotides). The primary analysis centers on identifying positions of the individual sequencing complexes, generally within a small viewing volume or field associated with an experimental sample. That is, the actual sample volume may be disposed over a fairly large area of a surface of a substrate or in a fairly large volume of a container, and the system is adapted to view only a small volume or field of the actual sample volume. However, in certain embodiments of sequencing systems, the viewing field could be the entire small volume if the sample is sufficiently confined to restrict its overall volume. The technology is adapted to follow fluorescence intensity at multiple wavelengths over time within the viewing volume, and to extract sequence information from the coordinated, time-dependent changes in fluorescence at each wavelength (base calling). Although the imager used specifically in this application is a frame-based CCD camera, data acquisition can be considered a parallel array of single detectors, each monitoring one sequencing complex. The inherently parallel nature of simultaneous sequencing (estimated to be several hundred up to 1000 individual sequencing complexes) occurring within the viewing field demands efficient use of computational resources, particularly where the goal is to have near real-time output. While the inventors have not yet needed to rely on parallel computing to produce results quickly, the technology lends itself to straightforward parallelization—pipeline or matrix processing. Computationally intensive routines were implemented in C++ in conjunction with standard functions in MatLab as well as MPI libraries (Gropp et al., 1994). The routines can be run on any acceptable computer operating system platform such as Windows, Linux, Macintosh OS X, or other windowing platforms.
  • BRIEF OVERVIEW OF SIGNAL PROCESSING METHODOLOGY Calibration
  • Each sequencing complex produces fluorescence signals at multiple wavelengths or frequencies. Individual fluorophores produce signals in specific wavelength or frequency ranges or bands of the electromagnetic spectrum. Thus, each sequencing complex will include more than one fluorophore, at least one donor and at least one acceptor. Each wavelength band is independently monitored. In certain detection systems, the optical system splits the spectrum and directs various wavelength or frequency bands to different quadrants of a single CCD imager. Calibration is needed to determine pixel coordinates within each quadrant or data channel of the CCD that correspond to a single sequencing complex, i.e., the calibration permits the individual quadrants to be spatially correlated or registered—locations in one quadrant correspond to locations in the other quadrants. The necessary transformation is primarily a translation operation; however, a small amount of rotation may also occur, requiring correction, due to misalignments in the optical system. Although in the CCD system being currently used translation and rotation are the major components of the calibration operation, in other systems, the calibration may have to correct for many other types of data distortion such as twisting, stretching, compressing, skewing, etc. In the imager used in this application, the inventors have found that light emitted from each sequencing complex is generally localized within a single pixel or a small array of contiguous pixels within the frames, and quadrant rotations of even a fraction of a degree are sufficient to mis-align pixel positions at the ends of the sensors. Additionally, small deviations in the optical system over time require that the system be calibrated on a daily basis. Of course, the system can be calibrated more frequently if desired. While it is desirable to minimize these errors inherent in the hardware, the inventors believe that all systems will have some type of errors, such as alignment errors, that require calibration. To determine the correct image transformations, the inventors currently use a calibration program that adjusts translation and rotation of each image until multi-wavelength emitting fluorescent beads and/or grids (Molecular Probes) are brought into alignment. Automated calibration routines are based on maximizing mutual information (MI; Viola and Wells, 1997; National Library of Medicine Insight toolkit). The MI approach appears to work very well for data having small errors in alignment. The inventors believe that the mutual information approach allows them to tweak the calibration using the fluorescence captured during sequencing itself, because the errors in alignment are small and develop slowly. Using the actual sequencing data for registration should eliminate the need for a separate calibration step (i.e., with beads), and thus allow constant updating during sequencing, but is not absolutely necessary.
  • Spot Identification
  • Fluorescence within the viewing field is continuously monitored by the CCD imager. The first step in the analysis is to identify sequencing complexes within the viewing volume of the imaging device. Computationally, this process must be highly efficient because it is carried out for each pixel or data element in the imager (i.e., millions of pixel positions). Once the sequencing complexes are found, more complex and time consuming analyses can be directed at this subset of pixel positions. The inventors have been successful using a simple averaging approach to identify potential sequencing complexes. By observing an image formed by averaging pixel intensity values over all the collected data frames or over a subset of the collected data frames, pixel locations that have fluorescence values greater than background fluorescence can be identified, particularly under conditions of static FRET. In situations where FRET is more dynamic, the inventors have found that this approach still works, but requires a running average over fewer frames.
  • Filtering/Denoising
  • The fluorescent signals are recorded by the CCD imager by counting the number of photons that arrive at a given pixel during a fixed integration time, an adjustable parameter of the imaging device. Estimating the fluorescent state of each fluorophore in a sequencing complex requires two interrelated processes. First, the instantaneous fluorescence intensity emitted in each band of the spectrum, donor fluorescence and acceptor fluorescence, must be extracted from background noise. Second, the fluorescence state must be estimated using this multi-band information (see below). It is clear at this point that there is considerable variance in the fluorescence intensity, both from the coming together of the sequencing reagents and from instrumentation noise such as laser intensity fluctuations and camera readout noise. The signals can be smoothed by standard techniques such as averaging, fast Fourier transform techniques, and wavelet techniques (Donoho and Johnstone, 1994; Cooley and Tukey, 1965; Frigo and Johnson, 1998). However, before rationally applying these techniques to yield an optimal signal that does not lose valuable information, the inventors have characterized, or are systematically characterizing, the statistical properties of each of the noise sources. This characterization involves performing controlled experiments in which each noise source, alone and in combination, is isolated as much as possible and characterized. These experiments are used to determine instrumentation noise and characteristics of each of the fluorescent indicators. Next, controlled experiments are used to characterize dynamic spFRET. These data have been and are being used to classify FRET signatures for different event types such as true nucleotide incorporation events, mis-incorporation events, nonproductive binding events, random-collision FRET events, etc. For example, to characterize signals due to a random collision of the labeled nucleotides, sample runs can be performed in the absence of donors. To characterize mis-incorporation events, the inventors observe samples where only a mismatched base is available. To characterize nonproductive binding events, reactions are performed under conditions in which incorporation cannot occur, e.g., in the presence of a 3′ dideoxy-terminated primer. Other similar controlled reaction conditions can be used to characterize other event types.
  • Signal Estimation
  • Signal estimation is the process of assigning a fluorescent state to each of the molecules of interest. A molecule can be at the base state (non-emitting), the excited state (emitting), triplet blinking, or bleached. Additionally, the molecule may be in FRET with another fluorophore, or in partial FRET, where it transfers energy to another molecule but continues to emit light at a lower intensity level. In addition, certain fluorophores emit light in more than one band of the spectrum. Under some conditions where the signal-to-noise ratio is relatively high, this assignment is easily accomplished. However, in general, the ability to assign the correct state of each of the fluorophores at each time point in a trace ultimately determines the sensitivity of the system and will determine whether specific sequencing strategies are feasible. For example, FRET efficiency decreases rapidly with distance. The maximum usable distance is that in which the fluorescence of the acceptor molecule can still be distinguished reliably from background noise.
  • It is not necessary that this estimation function be fully distinct from the filtering functions described above. The inventors apply model-based estimation routines such as Kalman filtering, where each sequencing complex is considered to be in one of a series of internal states. A set of observables is defined (in this case fluorescence intensity of the various molecules). The observables are also analyzed for how their values vary as a function of the internal state and how their values are influenced, corrupted or degraded by various noise sources. The Kalman filter then produces a maximum likelihood estimate of the state of the model given the observables. This filtering represents a powerful approach; it is well developed and has been applied to a variety of areas from satellite position detection to stock market prediction. Although the basic Kalman filter is limited in our application by a number of assumptions on linearity, extensions of this process such as extended Kalman filtering and particle filtering (Arulampalam et al., 2002) relax these assumptions (at the cost of additional computational requirements). The success of these algorithms for our purposes depends in large part on the ability to define statistics for different noise sources, and on available computational resources.
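  • By way of illustration only, the following sketch shows how a model-based estimate of this kind can be produced for a single trace. It is a minimal scalar Kalman filter assuming a random-walk state model; the function and parameter names (kalmanSmooth, processVar, measVar) are illustrative assumptions, not the implementation actually used.

#include <vector>

struct KalmanEstimate { double x; double p; };   // state estimate and its variance

// Smooths a noisy intensity trace with a scalar Kalman filter under a
// random-walk state model.  processVar and measVar are illustrative tuning
// parameters, not values taken from the actual instrument.
std::vector<double> kalmanSmooth(const std::vector<double>& observations,
                                 double processVar, double measVar)
{
    std::vector<double> estimates;
    estimates.reserve(observations.size());
    KalmanEstimate k{observations.empty() ? 0.0 : observations.front(), 1.0};
    for (double z : observations) {
        double xPred = k.x;                       // predict: random-walk state
        double pPred = k.p + processVar;
        double gain = pPred / (pPred + measVar);  // Kalman gain
        k.x = xPred + gain * (z - xPred);         // update with the new observation
        k.p = (1.0 - gain) * pPred;
        estimates.push_back(k.x);
    }
    return estimates;
}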
  • Base Assignment
  • Once the fluorescence states of the sequencing complexes have been assigned, the time-dependent changes in the states are then interpreted as or related to sequencing events occurring at the observed sequencing complexes. This interpretation depends on the specific configuration of reagents. For example, if an acceptor molecule on a labeled nucleotide travels into a FRET volume surrounding a donor, such as a donor-labeled enzyme, FRET may occur, where the FRET volume surrounding a donor is the volume in which a donor can transfer energy to an acceptor at a rate observable by the imaging system. Because of the nature of a FRET event, FRET events are characterized by a decrease in a donor fluorescent signal and a corresponding and simultaneous increase in an acceptor signal; that is, the signals are anti-correlated. This time-dependent pattern of fluorescence at different wavelengths may represent or be interpreted as an incorporation event. If the fluorescence data are relatively clean, this step is very straightforward. One simply looks for specific patterns in the fluorescence signals. However, depending on the signal-to-noise ratio, it may be difficult or impossible to decide whether a specific set of changes in fluorescence is just noise. Thus, the inventors developed a set of criteria based on studying sequencing reactions subjected to a set of specific controls so that each assignment is accompanied by a numerical indicator of confidence. Such criteria include the strength or clarity of the FRET signal, and the specific base being incorporated (characteristic patterns and/or lifetimes associated with fluorescence throughout incorporation).
  • DETAILED DESCRIPTION OF THE SIGNAL PROCESSING METHODOLOGY Spot Find Process I
  • The process starts by looking for pixels in the donor channel or quadrant that have a ‘local maximum’ donor intensity value in an averaged image, an image formed from averaging all or some of the frames in a stack for a given slide. For every value a of a pixel located at [col,row] in the image, the process determines whether the value a is greater than or equal to adjacent pixel values, and greater than 0.95 times diagonal neighbor pixel values. The condition ‘greater than or equal to’ is chosen to resolve the situation when two or more adjacent pixels have equal intensity; in that case, the first one is picked as a candidate.
  • If the above conditions are met, the pixel at [col,row] is taken as a spot candidate. Because the number of candidates can be huge (typically around 3000 on a 360×360 overlay), several filters are applied to limit the number of spot candidates that are passed on for subsequent processing.
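  • A minimal sketch of this local-maximum candidate search is given below, assuming a row-major averaged image and ignoring border pixels; the function name findSpotCandidates and the container types are illustrative only.

#include <vector>
#include <utility>

// Returns (col,row) pairs of pixels that are >= their four edge neighbours and
// > 0.95 times their four diagonal neighbours in the averaged image.
std::vector<std::pair<int,int>> findSpotCandidates(const std::vector<double>& img,
                                                   int width, int height)
{
    std::vector<std::pair<int,int>> candidates;
    auto at = [&](int col, int row) { return img[row * width + col]; };
    for (int row = 1; row + 1 < height; ++row) {
        for (int col = 1; col + 1 < width; ++col) {
            double a = at(col, row);
            bool isMax =
                a >= at(col - 1, row) && a >= at(col + 1, row) &&
                a >= at(col, row - 1) && a >= at(col, row + 1) &&
                a > 0.95 * at(col - 1, row - 1) && a > 0.95 * at(col + 1, row - 1) &&
                a > 0.95 * at(col - 1, row + 1) && a > 0.95 * at(col + 1, row + 1);
            if (isMax) candidates.emplace_back(col, row);
        }
    }
    return candidates;
}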
  • Referring now to FIGS. 1 and 1′, spot candidates on an overlay image are shown as large and small dots (large dots are green and small dots are blue and red in a color image). The small dots represent candidates rejected by the stage 1 filter and by the stage 2 and 3 filters (blue and red, respectively).
  • Stage 1 Filter
  • The stage 1 filter estimates the background noise level around each candidate pixel, then compares it to the pixel value a. The stage 1 filter determines these levels by selecting the 15 least bright pixels in a 5×5 area [col-2,row-2 . . . col+2,row+2] and computing a mean c and a standard deviation na of their intensity distribution. The signal to noise ratio (a-c)/na is a measure of how much a candidate pixel intensity value is above local background noise. If this ratio is less than a signal-to-noise threshold value, then the candidate is rejected. The signal-to-noise threshold value is generally between about 1.5 and about 5. In certain embodiments, the signal-to-noise threshold value is 3.
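  • The following sketch illustrates the stage 1 decision for a single candidate, assuming the candidate lies at least two pixels from the image border; the name passesStage1 and the row-major image layout are illustrative assumptions.

#include <vector>
#include <algorithm>
#include <cmath>

// Estimates local background from the 15 least bright pixels of the 5x5
// neighbourhood and keeps the candidate only if (a - c)/na clears the
// signal-to-noise threshold (3 in the embodiment described above).
bool passesStage1(const std::vector<double>& img, int width,
                  int col, int row, double snThreshold = 3.0)
{
    std::vector<double> area;
    for (int r = row - 2; r <= row + 2; ++r)
        for (int c = col - 2; c <= col + 2; ++c)
            area.push_back(img[r * width + c]);      // caller keeps a 2-pixel margin
    std::sort(area.begin(), area.end());
    area.resize(15);                                 // 15 least bright pixels
    double c = 0.0;
    for (double v : area) c += v;
    c /= area.size();                                // background mean
    double var = 0.0;
    for (double v : area) var += (v - c) * (v - c);
    double na = std::sqrt(var / area.size());        // background standard deviation
    double a = img[row * width + col];
    return na > 0.0 && (a - c) / na >= snThreshold;
}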
  • Referring now to FIGS. 2 a&b and 2 a′&b′, the methodology for candidate pixel rejection and acceptance is shown. Looking at FIGS. 2 a and 2 a′, candidate rejection is shown, where pixel candidates are rejected if their intensity values are below (less than) the signal-to-noise threshold value of 3 or equivalently, where the intensity a is below (less than) 3 na. Looking at FIGS. 2 b and 2 b′, candidate acceptance is shown, where pixel candidates are accepted if their intensity values are greater than or equal to the signal-to-noise threshold value of 3 or equivalently where the intensity a is greater than or equal to 3 na.
  • In the figures, a cross (red in a color image) marks the candidate pixel in the left hand portion of the averaged image. A gray square (blue in a color image) surrounds that candidate pixel and is a 5×5 surrounding pixel area. 15 least bright pixels within the 5×5 surrounding pixel area are marked with dots (green in a color image).
  • The graph on the right in the figures plots the intensity distribution of the 15 selected pixels represented by the dots inside the square. A gray area in the plot shows the standard deviation of the background noise level. A black vertical line marks the mean value c of the distribution. A dark grey vertical line (red in a color image) is 3 times standard deviation na (same as the threshold signal to noise ratio) away from the mean. A light grey vertical line (green in a color image) is the intensity value a of the candidate pixel. If the light gray (green) line is to the left of the dark gray (red) line, the candidate is filtered out.
  • This filter typically eliminates about ⅔ of the pixel candidates, leaving about 1000 out of ˜3000 spot candidates. The inventors have found that about ¾ of the remaining candidates also do not represent a true candidate. Thus, this stage 1 filter is not very efficient at candidate elimination. The principal reason for the stage 1 filter's lack of robustness is that it uses a local noise level, computed on statistically insufficient data. Referring now to FIGS. 3 and 3′, an example of a “poor” spot candidate that passed through the stage 1 filter is shown.
  • Stage 2 Filter
  • The stage 2 filter was designed to compensate for the lack of robustness of the stage 1 filter. The stage 2 filter works in a very similar way to the stage 1 filter. The stage 2 filter uses a global noise level, which is an average avgna of the local noise levels na of all spot candidates from the previous step.
  • Note that the global noise level cannot be easily obtained by just computing statistical parameters of low-intensity pixels from the entire overlay area, because the mean of their distribution is not constant; it slowly changes across the quadrant. However, an average of local noise levels around the spot candidates gives a fair approximation to the global noise level (average deviations from variable local pixel intensity means).
  • Referring to FIGS. 4 and 4′, the stage 2 filter is illustrated graphically. The graph shows a horizontal slice of the overlay area around the candidate pixel shown in FIGS. 3 and 3′. The dark grey (green in a color image) bars represent pixel intensity values around and including the candidate pixel, which is the middle bar. A black horizontal line marks a local ‘zero’ level, the mean c of the intensity distribution of low-intensity pixels, which passes through most of the bars. The gray area with the black horizontal line centered in the middle represents the global noise level avgna, an average of standard deviations na derived from all the spot candidates as explained above. A bell curve (green in a color image) represents an estimated intensity model of the spot candidate, having its maximum at the brightest (middle) pixel. The maximum is also shown as a horizontal (green in a color image) line touching the top of the bell curve.
  • A dark (red in a color image) line represents a level of minc times avgna, where minc is a parameter having a value between about 3 and about 12. In certain embodiments, the parameter is 7. A light gray (brown in a color image) line represents a level of doubt times avgna, where doubt is a parameter having a value between about 5 and about 20. In certain embodiments, the parameter is 12. The signal to noise (SN) ratio is re-computed for every spot candidate as (a−c)/avgna. If the candidate SN ratio is below (less than) the value of minc, the candidate is rejected. If the candidate SN ratio is greater than or equal to the value of doubt, the candidate is accepted with no further checking. If the candidate SN ratio is between the values of minc and doubt, which typically happens in approximately 50 to 100 cases, the candidate is passed on to the stage 3 filter.
  • The stage 2 filter effectively eliminates almost all candidate pixels found by the inventors not to represent spots for further analysis, leaving only good spots (typically, 250 out of ~1000) with a relatively small number of doubtful spot candidates.
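  • A compact sketch of the stage 2 decision logic follows; the default values 7 and 12 for minc and doubt mirror the embodiments described above, while the enum and function names are illustrative.

// Classifies one stage 1 survivor using the global noise level avgna (the mean
// of the local na values of all stage 1 survivors) and the minc/doubt cut-offs.
enum class Stage2Result { Rejected, Accepted, Doubtful };

Stage2Result stage2Classify(double a, double c, double avgna,
                            double minc = 7.0, double doubt = 12.0)
{
    double sn = (a - c) / avgna;        // candidate intensity above the local zero level
    if (sn < minc)   return Stage2Result::Rejected;
    if (sn >= doubt) return Stage2Result::Accepted;
    return Stage2Result::Doubtful;      // passed on to the stage 3 filter
}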
  • Stage 3 Filter
  • The stage 3 filter is applied only to the doubtful spot candidates from the stage 2 filter. The stage 3 filter starts by computing a more precise spot model by best-fitting spot pixel intensities in the 5×5 area [col-2,row-2 . . . col+2,row+2] according to the formula:

  • I(col,row) = C + A·exp(−((col − Xm)^2 + (row − Ym)^2)/R^2)
  • where C, A, Xm, Ym, and R are computed so as to satisfy the least squares condition. The adjusted signal to noise ratio A/avgna is then compared to the value of a parameter minc2 (ranging from about 5 to about 12, and in certain embodiments having the value 9). If the adjusted signal to noise ratio A/avgna is below (less than) minc2, then the doubtful spot candidate is finally rejected.
  • Referring now to FIGS. 5 a-c and 5 a′-c′, the stage 3 filter is depicted graphically. Looking at FIGS. 5 a and 5 a′, a spot candidate that passed through the stage 1 filter is shown, while looking at FIGS. 5 b and 5 b′, a spot candidate that passed through the stage 2 filter is shown. Looking at FIGS. 5 c and 5 c′, the best-fitted pixel intensity model is shown as a bell curve (blue in a color image) used in a stage 3 filter rejection. A curve horizontal line (green line in a color image) represents the maximum intensity of the model; the line contacts the top of the bell curve. A dark (red) horizontal line represents the level of minc2 times avgna. If the curve horizontal line is below the dark minc2 times avgna line, the spot candidate is finally rejected. The stage 3 filter typically eliminates 10 to 20 percent of the doubtful spot candidates.
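  • The sketch below shows the stage 3 spot model and the least squares objective it implies over the 5×5 area; the actual minimizer is not reproduced here, and the structure and function names are illustrative.

#include <vector>
#include <cmath>

struct SpotModel { double C, A, Xm, Ym, R; };

// Evaluates I(col,row) = C + A*exp(-((col-Xm)^2 + (row-Ym)^2)/R^2).
double evalModel(const SpotModel& m, double col, double row)
{
    double dx = col - m.Xm, dy = row - m.Ym;
    return m.C + m.A * std::exp(-(dx * dx + dy * dy) / (m.R * m.R));
}

// Sum of squared residuals over the 5x5 area; an off-the-shelf least-squares
// minimiser would drive this to its minimum over C, A, Xm, Ym, R.  The A of
// the minimising model is then compared against minc2*avgna.
double residualSumSquares(const SpotModel& m, const std::vector<double>& img,
                          int width, int col, int row)
{
    double ss = 0.0;
    for (int r = row - 2; r <= row + 2; ++r)
        for (int c = col - 2; c <= col + 2; ++c) {
            double diff = img[r * width + c] - evalModel(m, c, r);
            ss += diff * diff;
        }
    return ss;
}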
  • The remaining spot candidate objects are stored in an array and returned to the caller. They are shown as green dots in FIGS. 1 and 1′.
  • Average Stack Over Different Intervals
  • In some cases, potential donor candidates are not identified because the averaging is performed over too large a set of frames in a stack. This missing of potential donors is especially apparent when the average includes all frames of a stack. The likely reason for missing acceptable candidates is that certain active sequencing complexes may have donors whose detectable lifetimes do not span all of the frames, or a significant fraction of the total frames, included in the average. These potential donor candidates generally have shorter lifetimes, and the average donor intensity is consequently too low for the site to be selected as a donor candidate.
  • Dynamic Binning
  • To address this problem, a dynamic binning process (adjusting the number of frames to average over) was implemented to determine whether the process changed the number of donor candidates. The user enters the number of bins as a parameter, e.g., 1, 2, 4, 8 and 10 as the number of bins. The parameter is modifiable based on the observed experimental donor lifetime results. After implementing dynamic binning in candidate identification, the inventors found an increase in the number of donor candidates. The inventors also found that the number of candidates increased with decreasing binning number.
  • Consolidation of Donors
  • Once the stack image is averaged over various intervals, the process generates multiple average images requiring consolidation of the donor spots. For each averaged image, the spot find process I is applied to identify initial spots. After the spot identification, the process performs voting of the donor spots. Voting involves adding the binary value associated with each spot across the averaged images, and that value is stored in the new master image. For example, if the stack includes 1000 frames, which were imaged in 250 frame bins, then the voting would have a maximum value of 4 for each spot and a minimum value of 1. FIG. 6 a depicts pixel values after voting over average donor images.
  • After the voting operation, the process uses a neighborhood criterion to obtain a consolidated donor image. All pixels which have a value greater than or equal to 1 are considered donor candidates. In the consolidated donor image, first the spots with the highest votes are selected, with consecutive selections proceeding on decreasing vote values. Any donor candidate within the 3×3 neighborhood of a previously selected candidate is rejected. This is a recursive operation performed until all pixels with votes greater than or equal to 1 (donor candidates) have been considered. In the case of a tie in vote value, the pixel with the higher intensity is selected as the donor spot. The process identifies both single spots and grouped spots. Only the grouped spots undergo the consolidation operation. FIG. 6 b depicts single spot selection in an average donor image after voting, while FIG. 6 c depicts a snapshot of grouped spots after voting and selection of the donor pixel.
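  • The voting and 3×3 consolidation described above can be sketched as follows, assuming each candidate carries its accumulated vote count and averaged intensity; the Spot structure and tie-breaking details are illustrative.

#include <vector>
#include <algorithm>
#include <cstdlib>

struct Spot { int col, row, votes; double intensity; };

// Selects spots in decreasing vote order (ties broken by intensity) and drops
// any candidate within the 3x3 neighbourhood of an already selected spot.
std::vector<Spot> consolidateDonors(std::vector<Spot> candidates)
{
    std::sort(candidates.begin(), candidates.end(),
              [](const Spot& a, const Spot& b) {
                  if (a.votes != b.votes) return a.votes > b.votes;
                  return a.intensity > b.intensity;     // tie: brighter pixel wins
              });
    std::vector<Spot> selected;
    for (const Spot& s : candidates) {
        bool nearSelected = false;
        for (const Spot& t : selected)
            if (std::abs(s.col - t.col) <= 1 && std::abs(s.row - t.row) <= 1) {
                nearSelected = true;
                break;
            }
        if (!nearSelected) selected.push_back(s);
    }
    return selected;
}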
  • Dynamic Threshold
  • Dynamic thresholding is an alternate process for identifying or finding spots (pixel locations for which fluorescence is above background and which may represent active sequencing complexes). The pre-selection stage of the selection of donor candidates sometimes overestimates the donors and can be seen as redundant. As an alternative to stage 1 filtering, initial donor candidates can be estimated by computing a dynamic threshold. The user can enter the expected number of donors (the default is set to an experimentally obtained value). Using histogram analysis, the brightest spots on the image are selected using intensity information as shown in FIG. 7. An accurate threshold value is generally determined from the intensity data alone, but can also be based on intensity and lifetime data.
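  • As a simple stand-in for the histogram analysis, the sketch below derives a dynamic threshold from the intensity of the Nth brightest pixel, where N is the expected number of donors entered by the user; this order-statistic formulation is an illustrative assumption, not the exact histogram procedure.

#include <vector>
#include <algorithm>
#include <functional>
#include <cstddef>

// Returns an intensity cut-off such that roughly expectedDonors pixels of the
// averaged image lie at or above it.
double dynamicThreshold(std::vector<double> averagedImage, std::size_t expectedDonors)
{
    if (expectedDonors == 0 || averagedImage.empty()) return 0.0;
    std::size_t n = std::min(expectedDonors, averagedImage.size());
    std::nth_element(averagedImage.begin(), averagedImage.begin() + (n - 1),
                     averagedImage.end(), std::greater<double>());
    return averagedImage[n - 1];   // pixels at or above this value become candidates
}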
  • Clustering
  • Thresholding is a global operation and may result in donor candidates that are actually within the close 3×3 neighborhood of a previously identified donor candidate. The candidate identification process keeps track of single spots and grouped spots or clusters by using morphological operations (single pixels in the 3×3 neighborhood matrix are separated from grouped pixels). FIG. 6 b and FIG. 6 c depict single spots and grouped spots identified after voting and the selected donor pixels after consolidation. To determine which pixels are real donors in a cluster, the process uses an approach similar to the approach used for consolidation of donors as described before, where the process analyzes the distance (3×3 neighborhood information) between candidates, votes and intensity information. Referring to FIG. 8, the thresholding gives rise to several instances of a donor candidate within the 3×3 neighborhood of another donor candidate. These occurrences are resolved into real donor candidates using vote and intensity information as discriminators.
  • Initial Pixel Selection
  • For every spot (donor) at [col,row], the process selects the nine brightest pixels for the donor signal and up to eight pixels around the nine brightest pixels as donor noise data. At first, the process sorts pixels in a 7×7 area [col-3,row-3 . . . col+3,row+3] surrounding a spot by decreasing intensity. Then, the process selects nine (9) pixels in a 3×3 array or area [col-1,row-1 . . . col+1,row+1] with the candidate pixel in the middle of the 3×3 area. After that, the process randomly selects up to eight (8) pixels having the lowest intensity from the set of pixels outside of the 3×3 array, or in the second part of the 7×7 area, as noise pixels for each 3×3 array including a bright pixel. Again, the method was tuned using 3×3 and 7×7 arrays, but the method can equally well work with larger and smaller arrays, n×n and n×m arrays where m and n are integers and m>n, with the array size being a function of the detection system and the system being detected.
  • Next, the donor quadrant coordinates [col,row] are transformed into acceptor quadrant coordinates [colA,rowA] by applying the coordinate transform obtained from the calibration data. That is, the data in the acceptor channels are transformed by the calibration transform so that locations in the acceptor channels correspond to locations in the donor channel. Then, the nine (9) pixels in a 3×3 area or array including a pixel location [colA,rowA] in the acceptor channel corresponding to each of the selected donor pixel locations [col,row] are selected as candidates from the acceptor channel. Because at this stage of the analysis there is no way to a priori discriminate between good and poor acceptor pixels, all nine pixels are selected in the 3×3 array including the acceptor pixel corresponding to the donor. The coordinates of acceptor noise pixels are obtained by applying the coordinate transform to the donor noise pixels.
  • Referring now to FIGS. 9 a-d and 9 a′-d′, four examples of the initial pixel selection methodology are depicted graphically. In the leftmost images, an inner square (green in a color image) delimits the 3×3 area [col-1,row-1 . . . col+1,row+1] from which the 9 donor signal pixels are selected. An outer square (blue in a color image) delimits the 7×7 area [col-3,row-3 . . . col+3,row+3] from which the 8 donor noise pixels are selected, shown as gray dots (cyan dots in a color image). In the middle images, dark dots (red in a color image) represent the 9 selected acceptor pixels in acceptor channel 1 and gray dots represent the 8 selected acceptor 1 noise pixels. In the right images, the dark dots (blue in a color image) represent the 9 selected acceptor pixels in acceptor channel 2 and gray dots represent the 8 selected acceptor 2 noise pixels. The exact locations of the acceptor pixels are determined by applying the calibration transformation derived from the calibration routines.
  • After all relevant pixel coordinates for all candidate spots have been identified and selected, the process reads the stack file again, frame by frame, and collects individual pixel traces, i.e., data associated with a given pixel location in each frame through all the frames in the entire stack or that portion of the stack that includes potentially relevant sequencing data. Thus, if the above analysis was directed to whole stack averages, then the candidates would represent pixels that have values above a threshold. If the above analysis was directed to partial stack averages, then the candidates would represent pixels that have values above a threshold as well, but the average would be over less than all the frames. Again, if binning is used, then the candidate signals may extend from one bin to the next bin, so the trace would extend until the relevant data is collected into the trace.
  • Hi-Pass Filter
  • Every signal trace can be considered as a useful signal to which an amount of random (chaotic) noise is added. The zero-point of the signal intensity can be defined as the mean of the noise intensity distribution. This zero-point is not constant; it has been found to change slowly over time. This slowly changing portion of the intensity is computed as a polynomial approximation (using a least squares fitting approach) of the averaged noise trace, which is a simple arithmetic average of all noise pixel traces in a channel. Although least squares fitting has been used, other fitting approaches can also serve as a hi-pass filter for the pixel traces. The value of the approximating polynomial is then subtracted from every individual pixel trace in a channel to remove this slowly varying noise.
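  • A sketch of this baseline removal is shown below: a low-order polynomial is fitted to the averaged noise trace by least squares, and its value is subtracted from each trace. The naive normal-equation solver and the function names are illustrative; the production code may use a different fitting routine.

#include <vector>
#include <cstddef>

// Returns polynomial coefficients c[0..degree] minimising sum (y - p(x))^2,
// with x taken as the frame index 0..n-1 (normal equations, naive elimination).
std::vector<double> fitPolynomial(const std::vector<double>& y, int degree)
{
    int n = static_cast<int>(y.size()), m = degree + 1;
    std::vector<std::vector<double>> A(m, std::vector<double>(m + 1, 0.0));
    for (int i = 0; i < n; ++i) {
        std::vector<double> xp(m, 1.0);
        for (int k = 1; k < m; ++k) xp[k] = xp[k - 1] * i;   // powers of the index
        for (int r = 0; r < m; ++r) {
            for (int c = 0; c < m; ++c) A[r][c] += xp[r] * xp[c];
            A[r][m] += xp[r] * y[i];
        }
    }
    for (int p = 0; p < m; ++p)                    // forward elimination (no pivoting)
        for (int r = p + 1; r < m; ++r) {
            double f = A[r][p] / A[p][p];
            for (int c = p; c <= m; ++c) A[r][c] -= f * A[p][c];
        }
    std::vector<double> coeff(m, 0.0);
    for (int r = m - 1; r >= 0; --r) {             // back substitution
        double s = A[r][m];
        for (int c = r + 1; c < m; ++c) s -= A[r][c] * coeff[c];
        coeff[r] = s / A[r][r];
    }
    return coeff;
}

// Subtracts the fitted slowly varying zero level from one pixel trace.
void subtractBaseline(std::vector<double>& trace, const std::vector<double>& coeff)
{
    for (std::size_t i = 0; i < trace.size(); ++i) {
        double x = static_cast<double>(i), v = 0.0, p = 1.0;
        for (double c : coeff) { v += c * p; p *= x; }
        trace[i] -= v;
    }
}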
  • Referring now to FIGS. 10 a-d and 10 a′-d′, the operation of the hi-pass filter is graphically illustrated. Looking at FIGS. 10 a and 10 a′, the noise pixel traces are averaged into a single averaged noise trace (top graph), then its polynomial approximation is computed using a least squares algorithm. Next, the value of the polynomial is subtracted from every individual pixel trace. Looking at FIGS. 10 b and 10 b′, the value of the approximating polynomial is subtracted from donor signal pixels as shown in the top graph, with the result of the subtraction shown in the bottom graph. The horizontal line (blue in a color image) represents the zero-level, the mean of the background noise intensity distribution for the donor data. Looking at FIGS. 10 c and 10 c′, the noise pixel traces from an acceptor channel are averaged into a single averaged noise trace shown in the top graph. Next, its polynomial approximation is subtracted from every individual acceptor pixel trace. Looking at FIGS. 10 d and 10 d′, the value of the approximating polynomial is subtracted from acceptor signal pixels as shown in the top graph, with the result of the subtraction shown in the bottom graph. Again, the horizontal line (blue in a color image) represents the zero-level, the mean of the background noise intensity distribution for the acceptor data.
  • This procedure is performed separately on the traces from each channel, donor and acceptors. As a result, for every identified spot object, a set of channel objects is created. Every channel object contains 9 signal pixel traces, and up to 8 noise pixel traces that were picked from around the signal pixels. Not all of the 9 signal traces are retained in the final data output, since not all of them contain useful signal information. Lower intensity signal traces are eliminated by subsequent processing of donor and acceptor pixel selection methodology described herein.
  • At this point for every spot, a set of pixel traces is accumulated, from the donor channel and from each acceptor channel. A pixel trace set typically includes 9 signal pixel traces and up to 8 noise pixel traces. The process described below constructs single hybrid traces from the donor channel and from each acceptor channel for every spot. The hybrid traces are constructed to optimize or maximize the signal to noise ratio of the data from every channel.
  • Donor Pixel Selection
  • Every individual donor pixel trace is smoothed with a Smart Smoother as described below, then compared to the noise level in order to determine segments where the signal goes above the noise level (lifetime). The noise level NL is computed as the square root of the mean square of all noise samples across all noise pixel traces, assuming that the mean of the noise intensity distribution is zero after application of the hi-pass filter.
  • Next, a score of every pixel trace is computed as an average of original (non-smoothed) data during the lifetime. If the lifetimes of individual traces differ significantly, the traces with short lifetimes (shorter than half of the longest lifetime in the set) are rejected.
  • The remaining traces are sorted by score. Then those traces having a score higher than half of the highest score are selected for averaging into the hybrid trace. However, if the number of traces having a score greater than half the highest score is greater than 5, then only the five traces with the highest scores are selected.
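  • The lifetime/score computation and the selection of at most five traces can be sketched as follows, assuming the smoothed traces are supplied by the Smart Smoother described below; the structure and function names are illustrative.

#include <vector>
#include <algorithm>
#include <cstddef>

struct PixelTrace {
    std::vector<double> raw;        // original intensity samples
    std::vector<double> smoothed;   // smart-smoothed samples
    int lifetime = 0;               // samples above the noise level
    double score = 0.0;             // mean raw intensity during the lifetime
};

// Returns the indices of the (at most five) traces averaged into the hybrid donor trace.
std::vector<std::size_t> scoreAndSelect(std::vector<PixelTrace>& traces, double noiseLevel)
{
    int longest = 0;
    for (auto& t : traces) {
        double sum = 0.0;
        t.lifetime = 0;
        for (std::size_t i = 0; i < t.smoothed.size(); ++i)
            if (t.smoothed[i] > noiseLevel) { ++t.lifetime; sum += t.raw[i]; }
        t.score = t.lifetime ? sum / t.lifetime : 0.0;
        longest = std::max(longest, t.lifetime);
    }
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < traces.size(); ++i)
        if (traces[i].lifetime * 2 >= longest) kept.push_back(i);   // drop short lifetimes
    std::sort(kept.begin(), kept.end(), [&](std::size_t a, std::size_t b) {
        return traces[a].score > traces[b].score;
    });
    std::vector<std::size_t> selected;
    for (std::size_t i : kept)
        if (selected.size() < 5 && traces[i].score > 0.5 * traces[kept.front()].score)
            selected.push_back(i);
    return selected;
}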
  • Referring now to FIG. 11, the donor pixel selection process is illustrated graphically. The figure includes an overlaid data image and ten panels that include pixel traces. In the figure, the nine bottom panels show the individual donor pixel traces in the 3×3 donor pixel array. The traces that do not include solid segment lines below the trace represent traces rejected by the analysis and are not used in producing the average donor trace shown in the top panel. The rejected donor pixels are shown as dots in the pixel image box. Each trace having a solid segment line below the trace is graphed with its original, non-smoothed data (light green in a color image) shown as a fine line about a solid thicker line (dark green in a color image) representing its smoothed data generated using the Smart Smoother of this invention. The horizontal bars (green in a color image) below the accepted traces are the lifetime segments used in calculating the hybrid donor trace.
  • The top panel in the figure is the hybrid trace, an average of the selected traces. The gray horizontal strip centered about the zero line indicates the final noise level, computed as the standard deviation centered at 0 of the hybrid noise trace. The solid bar (green in a color image) underneath the trace shows the donor's hybrid lifetime. The overlaid data image shows the spatial position of the donor signal pixels and noise pixels. The selected traces are shown as large boxes, while rejected traces are shown as small boxes. In this example, four traces were selected and five traces were rejected. An equal number of noise traces randomly picked from the 8 available are averaged into a single hybrid noise trace. From this averaged noise, the final noise level is computed as the standard deviation from 0 of the hybrid noise pixels.
  • On the final hybrid donor signal trace, a few general parameters are computed: (1) a lifetime LT representing the number of data samples (frames) above the noise level (convertible to seconds by multiplying by the time between samples), (2) the average donor intensity during the lifetime Int, and (3) the donor signal to noise ratio S/N, computed as Int/NL.
  • At this point of the analysis a few spots from the initial list may be rejected. The rejection criteria are based on the computed average lifetime and the signal to noise ratio computed during the donor lifetime, compared to configurable minima of these values. The minimum lifetime parameter is contained in the parameter bad_lifetime, which is adjustable and is currently set to 20 data samples or frames, and the signal to noise minimum parameter is designated bad_dsn, which is also adjustable and is currently set to 1.5. The configurable minima were chosen based on empirical evidence that it is practically impossible to reliably detect anything at all in traces that do not meet these criteria.
  • Acceptor Pixel Selection
  • The discrimination between good and not so good acceptor pixel traces is trickier, because the acceptor signals are typically short and weak. The inventors currently use two competing methods to analyze the acceptor signals. These two methods can and often do produce different results. The inventors then use special logic to choose the method that yields the best results.
  • The first method is an intensity-based method and was optimized to detect long-lived events. The method applies a Smart Smoothing routine (described below) to each pixel trace, then computes lifetimes as segments in the acceptor traces where the smoothed data values are above the noise level. The method then assigns a score to the computed lifetimes as the ratio of the standard deviation during the lifetime to the standard deviation outside the lifetime. FIG. 12 a shows the score scaled by the factor 1000 next to each pixel trace. The factor 1000 is chosen solely for presentation; it has no meaning in the application of the method.
  • The traces are then sorted by score in descending order, and a cut-off value is defined as half the average of the two highest scores. The cut-off at 50% is chosen because adding lower intensities to the final hybrid trace does not improve the signal to noise ratio, which has been confirmed experimentally on both simulated and real data. The traces that have lower scores are rejected.
  • An additional routine is applied to check whether the lifetimes of individual traces match each other at least half of the time. If the lifetime of a trace has a significant (more than 50% of the longest lifetime) mismatch with the others, the trace is also rejected.
  • Finally, the spatial configuration of the pixel cluster is checked to ensure that non-adjacent pixels were not included in the cluster, because non-adjacent pixels cannot be from the same replication or sequencing complex.
  • Referring now to FIG. 12 a, the intensity-based acceptor pixel selection method is illustrated graphically. In the figure, the nine bottom graphs show individual acceptor pixel traces. The grayed graphs are the traces that have been rejected by the logic. The top (green) graph shows the donor hybrid trace, and the graph right below it, the hybrid acceptor trace obtained by averaging the selected (non-grayed) individual acceptor pixel traces. The overlay picture shows the spatial location of all nine candidates, with selected pixels shown in bold, and the individual noise pixels.
  • An alternative algorithm (derivative-based) is optimized for short-lived events, if any. It works in a very similar way, but instead of a smoothed function of the trace itself, it takes the product of the donor and acceptor derivatives, then computes a “noise level” as the standard deviation, a “lifetime” when the derivative product is above the noise level, “scores” of the traces, and so on.
  • Referring now to FIG. 12 b, a derivative-based acceptor pixel selection process is illustrated graphically. The graphs below the time line show individual acceptor pixel traces. The grayed one(s) have been rejected, and did not contribute to the average (red) graph at the top. Below each graph the product of its derivative and donor's derivative is shown. The green graph at the top is the hybrid donor signal.
  • After the intensity-based algorithm is applied, the logic checks whether it has produced satisfactory results, that is, whether it detected one or more acceptor lifetime segments comparable in duration to the S-G parameters nL and nR, and whether the signal to noise ratio of these segments is higher than a minimal signal to noise ratio, which can range from about 1.5 to about 2 (the currently preferred value is 0.7). If the above conditions are not met, the logic applies the derivative-based algorithm. Finally, the logic averages the selected acceptor traces into a single hybrid trace, then averages an equal number of noise traces to create a hybrid acceptor noise channel, which is expected to have a comparable noise level.
  • Referring now to FIG. 13, the results of the filtering and hybridizing operations are shown graphically for the donor, acceptor 1 and acceptor 2.
  • Signal File Format
  • At this point, the result may be saved into a signal file in the following format:
  • spotdata (donCol,donRow) nsamples delta
  • stack stack_name
  • directory stack_directory
  • spot spotname col row mask
  • spot . . .
  • start data
  • spot0sample[0] spot1sample[0] . . .
  • spot0sample[1] spot1sample[1] . . .
  • . . .
  • spot0sample[nsamples-1] spot1sample[nsamples-1] . . .
  • stack_name—file name of the stack file (normally, without extension);
  • stack_directory—path to the directory of stack file;
  • nsamples—number of data samples in every trace, equal to the number of frames in the stack file;
  • delta—delta time in milliseconds between samples;
  • donCol,donRow—coordinates of the central donor pixel;
  • spotname—trace name, one of the following:
  • don—cumulative donor signal trace
  • donn—cumulative donor noise trace
  • ac1—cumulative acceptor 1 signal trace
  • ac1n—cumulative acceptor 1 noise trace
  • ac2—cumulative acceptor 2 signal trace
  • ac2n—cumulative acceptor 2 noise trace
  • col,row represents the coordinates of the signal center pixel. The parameter mask is a bit mask that shows which of the 9 pixels in the 3×3 area around the center pixel have contributed to the cumulative signal. Bit 0 is set when the pixel at (col-1,row-1) has been selected, bit 1 for (col,row-1), and so on. The value is a hexadecimal sum of one or more bit values represented in the table below.
  • col − 1 col col + 1
    row − 1 001h 002h 004h
    row 008h 010h 020h
    row + 1 040h 080h 100h

    The value of mask is meaningless for noise traces.
  • A fragment of such a file is shown below:
      • spotdata (196,266) 1000 25
      • stack Stream05
      • directory D:\Dteam\Detection Data\05-10-05\16pCg-QTLAA-PiW-25 ms
      • spot don 196 266 030
      • spot donn 196 266 1FF
      • spot ac1 23 89 1B8
      • spot ac1n 23 89 1FF
      • spot ac2 23 266 0BA
      • spot ac2n 23 266 1FF
      • start data
      • 305 -107 33 106 -1 -21
      • 276 62 -25 10 17 -39
      • 233 13 146 -7 -42 -9
      • 504 86 170 -64 -25 45
      • . . .
    The Donor Model
  • At this point in the analysis, the signals are analyzed in a digital format. Thus, a signal can be considered as transitioning between a digital zero state and a digital unit state, i.e., transitioning between 0 and 1. While the digital zero level can be established fairly well by examining the noise channel, the digital unit level poses a problem, because it is not stable.
  • For acceptor channels, the task seems to be relatively easy and straightforward, because the acceptors are normally at their zero level, well established and fixed by the hi-pass filter. That is, the acceptors are in a dark state unless or until they receive sufficient energy from a source to fluoresce. Although some background acceptor emissions are seen, the principal pathway to acceptor fluorescence is via energy transfer from an excited donor while the sample is being irradiated with light that only the donor can absorb. Therefore, the process simply assumes that an acceptor is at the zero level as long as its intensity does not go above the noise level.
  • On the other hand, the donor data is more difficult to digitize. From a chemical viewpoint, the donor signal can be on, since the donor is being irradiated by a light source on a continuous basis. The donor can be transferring energy to an acceptor. The donor can inter-system cross from a singlet manifold to a triplet manifold, which is observed experimentally as blinking. The donor can non-radiatively lose excitation energy, also observed as blinking. The donor can temporarily photobleach or permanently photobleach. Additionally, the donor intensity has been found to fluctuate around its unit level, and its unit level has been found not to remain constant over time. Thus, this routine is designed to find donor unit levels at different moments in time.
  • Because the donor signal may not only slowly change around its supposed unit level, but swiftly go up and down as well, a simple technique like a hi-pass filter is ineffective. Before applying a polynomial fitting routine to the donor traces, the process breaks the entire donor signal into segments on which no swift and rapid changes occur. This segmentation of the signal is done by computing the signal's derivative and finding its outstanding extrema, that is, where the derivative goes above or below 1.2 times its own standard deviation. The value of 1.2 was experimentally established to give the best overall results, but the parameter can range from about 0.8 to about 2.0. Every such extremum defines a segment boundary. The area between two consecutive extrema is a segment. At this point, there are too many segments, and most of them are too small.
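  • A simplified sketch of this initial segmentation is given below: samples where the magnitude of the derivative exceeds 1.2 times its standard deviation are treated as segment boundaries. Marking every such sample, rather than isolating individual extrema, is a simplification, and the names are illustrative; the derivative itself is assumed to come from the Savitzky-Golay derivative filter described later.

#include <vector>
#include <cmath>

// Returns the frame indices that bound the initial segments of a donor trace.
std::vector<int> segmentBoundaries(const std::vector<double>& derivative,
                                   double factor = 1.2)
{
    if (derivative.empty()) return {0};
    double mean = 0.0, var = 0.0;
    for (double d : derivative) mean += d;
    mean /= derivative.size();
    for (double d : derivative) var += (d - mean) * (d - mean);
    double sd = std::sqrt(var / derivative.size());

    std::vector<int> boundaries{0};
    for (std::size_t i = 0; i < derivative.size(); ++i)
        if (std::fabs(derivative[i]) > factor * sd)
            boundaries.push_back(static_cast<int>(i));   // outstanding extremum
    boundaries.push_back(static_cast<int>(derivative.size()));
    return boundaries;      // small adjacent segments are merged in a later pass
}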
  • Referring now to FIG. 14, aspects of the donor model relating to initial signal segmentation are illustrated graphically. The bottom portion graphs the derivative of the donor signal (red in a color image). The gray area denotes 1.2 times its standard deviation as an evidence of the noise level associated with the signal. The vertical lines (cyan in a color image) in the bottom graph mark boundaries of the segments derived by application of the routine onto the data trace. The top portion graphs the donor signal; the raw signal is shown in light gray (light green in a color image) and the smoothed signal is shown in dark gray (dark green in a color image). Again, the gray area denotes 1.2 times its standard deviation as an evidence of the noise level associated with the signal. The straight line graph (dark blue in a color image) plotted through the raw and smoothed data show averaged intensities for the segments.
  • For every segment, the method computes two parameters. The parameters are the segment length or temporal duration and the average intensity of the signal in that segment. These two parameters are then used to decide whether one or more adjacent segments should be joined into a single larger segment. This joining is typically done when two adjacent segments have close average intensities. The term “close average intensities” means that adjacent segments have intensity values that differ by between 1 and 2 times the noise level. In certain embodiments, the term “close average intensities” means that the adjacent segments have intensity values that differ by less than 1.4 times the noise level. Segments are also joined if a small data segment is interposed between two relatively long segments. Generally, a small data segment is a segment that extends over less than 40 frames or data samples. In certain embodiments, the routine joins two segments if an intervening segment has a duration between about 20 and about 40 data samples. In other embodiments, the routine joins two segments if an intervening segment has a duration of about 30 data samples. The routine considers segments separated by a short segment relatively long for the purpose of segment joining if the segments on each side of the short segment have durations or lengths 1 to 2 times larger than the short segment. In certain embodiments, the two segments on each side of the short segment have durations or lengths 3 to 4 times larger than the short segment.
  • Referring now to FIG. 15, aspects of the donor model relating to segment optimization are illustrated graphically. A series of successive optimizations is applied to the initial list of segments. For every segment, the segment optimization routine computes a segment length or duration and a segment average donor intensity. Based on these two parameters, several adjacent segments are joined into one larger segment. Also, the routine determines whether the donor signal is mostly at its unit level, as evidenced by horizontal and vertical lines through the data trace (blue lines in a color image). This segmentation representation of the data trace also includes a horizontal line that represents when the fluorophore is at the zero level (not emitting light) (red lines in a color image).
  • The optimization routine also distinguishes between segments, where the signal is mostly at the unit level, and the segments, where the signal is mostly at the zero level. For the former, the unit level can be computed out of segment data alone, but for the latter, the unit level has to be derived out of its neighbors.
  • Referring now to FIG. 16, aspects of the donor model relating to final stage processing are illustrated graphically. The unit segments, segments where the fluorophore is active, are best fitted to a polynomial function represented by a solid curve through the trace (blue in a color image). The standard deviation (unit noise level) associated with the polynomial function is shown as a gray area with the curve centered therein. The dark gray horizontal bars (dark green in a color image) at the bottom of the figure show segments where the donor signal has a high intensity value, while light gray horizontal bars (light green in a color image) show segments where the donor signal has a low intensity value.
  • The final step in the process is to fit all unit segments, segments where the fluorophore signal stays at the unit level most of the time, with a polynomial function that follows the variable unit level of the signal intensity. The standard deviation associated with the polynomial function is also computed, and serves as a measure of the noise level around the unit level. For all zero segments, the unit level is assumed to be constant and equal to the unit level value computed at the previous step, and the noise level is assumed to be equal to the background noise level.
  • Now, the donor trace at a particular location in the viewing field is represented by a set of zeros and ones through the frames. The value of 1 over a segment of the donor trace signifies that the donor is in a high state and is determined simply by comparing the trace segment to the local unit level less the local noise level: if the signal is above this value, the unit level value is set at 1 (donor is in a high state); otherwise, the unit level value of this donor is set at 0 (donor is in a low state). In certain traces, a donor segment may not fall to a value below the local noise level but is situated between two much higher intensity peaks; in such a case, the segment is also assigned a zero value.
  • Lo-pass Filtering Algorithm
  • A low-pass filter is usually applied to signals that are both slowly varying and corrupted by random noise. In such cases, it is sometimes useful to replace each actual data point with a local average of surrounding data points. Because nearby points measure very nearly the same underlying signal value, averaging over these surrounding data points can and often does reduce the level of noise without much biasing of the averaged signal value obtained.
  • The present invention utilizes a particular lo-pass or smoothing filter sometimes referred to as a “Savitzky-Golay” lo-pass filter, “least-squares” lo-pass filter, or DISPO (“Digital Smoothing Polynomial”) lo-pass filter. The lo-pass filter operates by replacing the value of every input data point with a value derived from a polynomial fitted to that input data point and several nearby, generally adjacent, input data points.
  • Referring now to FIG. 17, a Savitzky-Golay, lo-pass smoothing filter is illustrated graphically. For a data point fi represented by a large square DP (green in a color image) in the figure, the filter then fits a polynomial of order M represented by the solid line curve (blue in a color image) to all data points from i-nL to i+nR (green dots), then replaces the value of the data point fi with the value of the polynomial at position i represented by a large square PV (red in a color image). In this example, nL=8, nR=8, and M=6.
  • Because the process of least-squares fitting involves only a linear matrix inversion, the coefficients of a fitted polynomial are themselves linear in the values of the data. Thus, all the polynomial fitting can be done in advance, resulting in a set of coefficients which do not depend on the particular data point values. Therefore, the polynomial or smoothed value is computed simply as a linear combination of these pre-computed coefficients and the data samples around the ith point, i.e., the sum of cj*fj over j = i−nL . . . i+nR.
  • A similar technique is used to obtain smoothed data values of the derivative of a data trace. In this case, the ith derivative value of the data trace is replaced not by the value of the fitting polynomial, but by the value of the derivative of the polynomial at the ith data position. As is true with the application of the lo-pass filter to the trace data, the coefficients for the polynomial can be computed in advance, by pre-computing coefficients Ci−nL . . . Ci+nR. In most embodiments of this filtering process for computing replacement derivative values, the fitting polynomial is at least of order 4.
  • The parameters of the Savitzky-Golay, lo-pass smoothing filter are:
      • nL—number of nearby pixels to the left of the i-th pixel.
      • nR—number of nearby pixels to the right of the i-th pixel.
      • M—order of the fitting polynomial.
      • Id—order of the derivative (if 0, the function itself).
  • Referring now to FIG. 18, a numeric experiment using a 17-point Savitzky-Golay smoothing filter is illustrated graphically. In the top panel, the simulated data comprise a constant signal interrupted by progressively narrower gaps. The size of the gaps in the data is shown above as numbers. In the center panel, the simulated data is shown with simulated white Gaussian noise added having a standard deviation of about 0.25. In the bottom panel, the noisy data of the center panel is shown after applying a Savitzky-Golay, lo-pass smoothing filter with nL=8, nR=8, M=6, and ld=0. The horizontal gray bar represents the noise level, computed as a multiple of the standard deviation of the noise (about 0.3 in this case).
  • For example, for a lo-pass filter represented by the set of input parameters nL=1, nR=1, M=1, and ld=0, a set of 3 coefficients ci−1, ci, and ci+1 are determined to have the values ⅓, ⅓, and ⅓, respectively, which is identical to a three-point moving average smoothing filter.
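  • Once the coefficients are pre-computed, applying the filter is just the linear combination described above, as the sketch below shows for the trivial nL=nR=1, M=1 case; edge handling and naming are illustrative assumptions.

#include <vector>

// Applies a Savitzky-Golay filter given its pre-computed window coefficients.
// The smoothed sample is the linear combination sum_j coeff[j]*f[i - nL + j].
std::vector<double> applySavitzkyGolay(const std::vector<double>& f,
                                       const std::vector<double>& coeff, int nL)
{
    const int n = static_cast<int>(f.size());
    const int win = static_cast<int>(coeff.size());
    std::vector<double> out(f);                     // copy; edges left unsmoothed
    for (int i = nL; i + (win - nL - 1) < n; ++i) {
        double v = 0.0;
        for (int j = 0; j < win; ++j) v += coeff[j] * f[i - nL + j];
        out[i] = v;
    }
    return out;
}

// Example: the nL = nR = 1, M = 1 filter mentioned above.
// std::vector<double> c = {1.0/3.0, 1.0/3.0, 1.0/3.0};
// auto smoothed = applySavitzkyGolay(trace, c, 1);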
  • Derivative Anti-Correlation
  • Several parts of the detection software use a concept which the inventors call DAC, or Derivative Anti-Correlation. The DAC function operates as follows. If at any point both the donor and acceptor derivatives have the same sign, then the value of DAC is set to zero (0). If at any point the donor and acceptor derivatives have opposite signs, then the value of DAC is set as the product of the acceptor derivative value and the absolute value of the donor derivative value at that point.
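  • A direct sketch of the DAC computation is shown below, assuming the donor and acceptor derivative traces have already been computed (for example, with the Savitzky-Golay derivative filter described above); the function name is illustrative.

#include <vector>
#include <cmath>
#include <algorithm>

// Zero where the derivatives share a sign; acceptor derivative times the
// absolute donor derivative where they have opposite signs, so an anti-
// correlated event starts with a positive DAC peak and ends with a negative one.
std::vector<double> derivativeAntiCorrelation(const std::vector<double>& donorDeriv,
                                              const std::vector<double>& acceptorDeriv)
{
    std::size_t n = std::min(donorDeriv.size(), acceptorDeriv.size());
    std::vector<double> dac(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        if (donorDeriv[i] * acceptorDeriv[i] < 0.0)       // opposite signs only
            dac[i] = acceptorDeriv[i] * std::fabs(donorDeriv[i]);
    }
    return dac;
}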
  • Referring now to FIG. 19 a, an example of the derivative anti-correlation methodology is illustrated graphically for ideal, non-noisy anti-correlated data. In the top panel, a simulated donor trace having an intensity dip in the middle of the trace is shown. In the center panel, a simulated acceptor trace having an intensity bump, anti-correlated with the donor dip is shown. In the bottom panel, the DAC values for the above signals are shown. The positive peak marks the start of an anti-correlated event, and the negative peak marks the end of the anti-correlated event.
  • Referring now to FIG. 19 b, an example of the derivative anti-correlation methodology is illustrated graphically for moderately noisy data. In data having a moderate noise level, the peaks are well above the standard deviation of the DAC function, so the DAC helps to detect even short anti-correlated events that would otherwise go undetected.
  • Referring now to FIG. 19 c, an example of the derivative anti-correlation methodology is illustrated graphically for heavily noisy data. If the noise level is too high, the DAC is unable to detect anti-correlated events, because the peaks are comparable to the standard deviation of the noise level. Short events become very difficult to detect, while long events are detected by other means, such as heavy data smoothing and analyzing average signal intensities over long periods of time.
  • Because the final goal of the detection software is to detect anti-correlated events, in which a dip in the donor signal intensity occurs synchronously with a bump in an acceptor signal intensity, the DAC is effective even for short signals, provided that their shape is not too much distorted or attenuated by the noise.
  • Smart Smoothing Algorithm
  • A standard Savitzky-Golay (S-G) smoothing filter (as described above) does not produce satisfactory results for heavily noisy data, even if the data contain some obvious long-lived signals. An S-G filter designed for heavy smoothing (e.g., larger number of samples, lower polynomial order), while removing enough noise, distorts the boundaries of the rectangular-shaped signals, making it nearly impossible to detect the correct boundaries. Also, the filter tends to lose shorter signals.
  • An S-G filter, designed for fine smoothing (e.g., smaller number of samples, higher polynomial order), on the other hand, tends to leave a great deal of noise, which can break down large signals into series of smaller ones, and also create many false positives in between the real signals.
  • The principal idea of the smart smoother of this invention is to balance the two S-G filters so that on flat segments the heavy smoother takes precedence, removing most of the noise, while in areas where the intensity is rapidly changing the fine smoother is invoked, preserving the exact signal boundaries that are critical for detecting anti-correlated spFRET signals.
  • The balance function b is computed out of the derivative D of the original data, computed with an S-G filter with the settings somewhere in the middle between the settings for heavy smoother and fine smoother. For example, if the heavy smoother has nL=nR=32 and M=2 and the fine smoother has nL=nR=8 and M=6, then the derivative filter would have nL=nR=16 and M=4.
  • The next step is to convert the derivative, a function that theoretically ranges from −∞ to +∞, into a balance function, which ranges from zero (0) to one (1), where the balance function has the value of zero (0) when the derivative is zero, and the value of one (1) when the derivative goes to infinity in either direction.
  • The balance function b is computed as:

  • bi = 1 − exp(−Di^2/Var),
  • where Di is the value of the derivative at sample i and Var is the variance of the derivative, given by Σ Di^2/n, where n is the total number of data samples.
  • After that, the balance function is smoothed with the same “middle” S-G parameters as the ones for the derivative. After the smoothing, values of the balance function may be out of the range zero to one at a few points, so an additional process is applied to force the values within the zero to one boundaries. The resulting balance function is shown in the middle panel in FIG. 20, comprising a solid curve with a shaded area below the curve (light red in a color image) and a shaded area above the curve (light blue in a color image).
  • Looking at FIG. 20, the top three panels represent a simulated data trace. The topmost panel comprises six high intensity bumps of different lengths, with the length shown below each bump, having a S/N of 1.35. The next panel represents the simulated data trace with Gaussian noise added. The next panel represents the noisy data trace after a Savitzky-Golay filter with nL=32, nR=32, M=4; the gray bar about the solid zero line denotes the noise level, computed as the standard deviation of a separate noise-only trace generated with the same settings as used for the original signal, and the solid horizontal bars below the gray area represent the data segments of the smoothed curve, i.e., the segments of the curve that have values above the gray bar. The next panel is a graph of the balance function ranging from 0 to 1, computed from the derivative of the noisy signal (second panel from the top) obtained by a Savitzky-Golay process with nL=16, nR=16, M=4. The next panel (red graph in a color image) shows the noisy signal after a Savitzky-Golay filter with nL=8, nR=8, M=6; the gray area is the same noise level as above, and the bars below show the lifetime, as in the heavy-smoothed panel. The bottom panel (green graph in a color image) shows the smart-smoothed signal, the combined signal computed as b*Fs + (1−b)*Fr, where Fs is the fine-smoothed data, Fr is the heavy-smoothed data, and b is the balance function; the gray area is the noise level (same as above), and the bars below show the lifetime.
  • The last step is simply to compute the "balanced" (smart-smoothed) function as:

  • Fsm_i = Fs_i*b_i + Fr_i*(1 − b_i),
  • where Fs is the fine smoothed data, Fr is the heavy smoothed data, and the result Fsm is the smart smoothed data.
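  • Continuing the illustrative sketch above (same assumptions: Python with NumPy/SciPy and the hypothetical balance_function helper; the default window and order values simply mirror the example settings given earlier and are not limiting), the complete smart smoother might look like:

```python
# Continues the sketch above; numpy and scipy.signal.savgol_filter
# are assumed to be imported, and balance_function to be defined.
def smart_smooth(data, n_heavy=32, m_heavy=2, n_fine=8, m_fine=6,
                 n_mid=16, m_mid=4):
    """Blend heavy and fine Savitzky-Golay smoothings of a trace.

    Flat regions are dominated by the heavy smoother (noise removal),
    rapidly changing regions by the fine smoother (boundary
    preservation), weighted point by point by the balance function.
    """
    fr = savgol_filter(data, 2 * n_heavy + 1, m_heavy)   # heavy-smoothed Fr
    fs = savgol_filter(data, 2 * n_fine + 1, m_fine)     # fine-smoothed Fs
    b = balance_function(data, n_mid, m_mid)             # 0..1 weights
    return b * fs + (1.0 - b) * fr                       # Fsm = b*Fs + (1-b)*Fr
```

  • Applied to a trace like the simulated one in FIG. 20 (rectangular bumps plus Gaussian noise), the intention is that the flat baseline comes out as smooth as the heavy-smoothed curve while the bump boundaries stay as sharp as in the fine-smoothed curve.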
  • All references cited herein are incorporated by reference. While this invention has been described fully and completely, it should be understood that, within the scope of the appended claims, the invention may be practiced otherwise than as specifically described. Although the invention has been disclosed with reference to its preferred embodiments, those of skill in the art may, from reading this description, appreciate changes and modifications that may be made which do not depart from the scope and spirit of the invention as described above and claimed hereafter.

Claims (11)

1. A method for detecting and analyzing events at the single molecule level, the method comprising the steps of:
collecting data corresponding to changes in a detectable property of a detectable entity in a sample over time within a viewing volume or field of a detection system, where the data comprises a collection of data frames associated with a plurality of data channels, where the data channels represent different features of the detectable property, and where each frame is an image of the viewing field over a data collection interval comprising a set of data elements representable in a column row matrix format, and where the detectable entity is selected from the group consisting of an atom, a molecule, an ion, an assemblage of atoms, molecules and/or ions, a plurality of atoms, a plurality of molecules, a plurality of ions, and/or a plurality of assemblages,
forwarding the data frames to a processing unit, where the data frames are stored along with data associated with the detection of the detectable property including sample data, time/data and detector data,
generating a calibration transformation adapted to register data elements in one data channel with corresponding data elements in the other data channels,
averaging a value of the detectable property for each data element over all of the frames from one data channel to produce an averaged image, where each data element in the averaged image includes the average value of the detectable property across all the frames,
identifying data elements in the averaged images having a value of the detectable property above a threshold value to produce a list of potential active entity candidates,
retrieving and storing candidate data traces, one trace for each data element in an n×n data element array centered at each identified candidate,
retrieving and storing noise data traces from a plurality of data elements within an m×m data element array centered at each identified candidate excluding the data elements of the n×n array, where the noise data traces represent local noise associated with each candidate,
filtering the candidates to find candidates that satisfy a set of selection criteria or pass a set of rejection criteria,
retrieving and storing other channel data traces, one trace for each data element in an n×n data element array centered at the data element of the other data channels corresponding to the candidate,
retrieving and storing other channel noise data traces from a plurality of data elements within an m×m data element array centered at the data element of the other data channels corresponding to the candidate, excluding the data elements of the n×n array, where the noise data traces represent local noise associated with the other data channels,
smoothing the traces and forming hybrid traces, one hybrid trace for each candidate, for each candidate noise, for each other channel corresponding candidate data and for each other channel noise data,
identifying hybrid traces that evidence correlated or anti-correlated changes in the detectable property for the candidate traces and the corresponding other channel traces to produce an event list, and
classifying the event list into a class of events, and
storing the classified list of events.
2. The method of claim 1, wherein at least one component of the entities includes a fluorophore and the detectable property is fluorescence.
3. The method of claim 1, wherein at least one component of the entities includes a donor fluorophore, at least one component of the entities includes an acceptor fluorophore and the detectable property is fluorescence derived from fluorescence resonance energy transfer.
4. The method of claim 1, wherein each detectable entity comprises replication complex including a polymerase, a template, a primer and nucleotides for the polymerase, where the polymerase, template, and/or primer includes a donor fluorophore and at least one nucleotide type including an acceptor fluorophore forming a FRET pair and the detectable property is fluorescence derived from fluorescence resonance energy transfer.
5. The method of claim 4, where the identified hybrid traces are anti-correlated.
6. The method of claim 1, wherein each detectable entity comprises replication complex including a polymerase, a template, a primer and nucleotides for the polymerase, where the polymerase, template, and/or primer includes a donor fluorophore and at least two nucleotide types including acceptor fluorophores forming a FRET pair, where the acceptor fluorophores are the same or different, and the detectable property is fluorescence derived from fluorescence resonance energy transfer.
7. The method of claim 6, where the identified hybrid traces are anti-correlated.
8. The method of claim 1, wherein each detectable entity comprises replication complex including a polymerase, a template, a primer and nucleotides for the polymerase, where the polymerase, template, and/or primer includes a donor fluorophore and at least three nucleotide types including acceptor fluorophores forming a FRET pair, where the acceptor fluorophores are the same or different, and the detectable property is fluorescence derived from fluorescence resonance energy transfer.
9. The method of claim 8, where the identified hybrid traces are anti-correlated.
10. The method of claim 1, wherein each detectable entity comprises replication complex including a polymerase, a template, a primer and nucleotides for the polymerase, where the polymerase, template, and/or primer includes a donor fluorophore and each nucleotide type including acceptor fluorophores forming a FRET pair, where the acceptor fluorophores are the same or different, and the detectable property is fluorescence derived from fluorescence resonance energy transfer.
11. The method of claim 10, where the identified hybrid traces are anti-correlated.
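
For illustration only, the per-candidate trace extraction recited in claim 1 (time traces from an n×n data-element array centered on a candidate, plus local-noise traces from the surrounding m×m array excluding the inner n×n elements) could be sketched as follows. This is a hypothetical NumPy example, not the claimed method itself; `frames` is assumed to be a stack shaped (time, rows, columns), and the candidate is assumed to lie at least m//2 elements from the image border:

```python
import numpy as np


def candidate_and_noise_traces(frames, row, col, n=3, m=7):
    """Return (candidate_traces, noise_traces) for one candidate pixel.

    candidate_traces: one time trace per element of the n x n array
    centered at (row, col); noise_traces: one trace per element of the
    surrounding m x m array, excluding the inner n x n elements.
    """
    half_m, half_n = m // 2, n // 2
    # m x m neighborhood of the candidate across all frames.
    block = frames[:, row - half_m:row + half_m + 1,
                   col - half_m:col + half_m + 1]
    inner = slice(half_m - half_n, half_m + half_n + 1)
    ring = np.ones((m, m), dtype=bool)
    ring[inner, inner] = False                  # True only outside the n x n core
    candidate_traces = block[:, inner, inner].reshape(len(frames), -1)
    noise_traces = block[:, ring]               # shape (time, m*m - n*n)
    return candidate_traces, noise_traces
```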
US12/686,372 2001-07-09 2010-01-12 Method for analyzing dynamic detectable events at the single molecule level Abandoned US20100235105A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/686,372 US20100235105A1 (en) 2001-07-09 2010-01-12 Method for analyzing dynamic detectable events at the single molecule level

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US09/901,872 US6681682B2 (en) 2000-01-17 2001-07-09 Hydraulic cylinder
US10/007,621 US7211414B2 (en) 2000-12-01 2001-12-03 Enzymatic nucleic acid synthesis: compositions and methods for altering monomer incorporation fidelity
US76569306P 2006-02-06 2006-02-06
US11/671,956 US7668697B2 (en) 2006-02-06 2007-02-06 Method for analyzing dynamic detectable events at the single molecule level
US12/686,372 US20100235105A1 (en) 2001-07-09 2010-01-12 Method for analyzing dynamic detectable events at the single molecule level

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/671,956 Continuation US7668697B2 (en) 2001-07-09 2007-02-06 Method for analyzing dynamic detectable events at the single molecule level

Publications (1)

Publication Number Publication Date
US20100235105A1 true US20100235105A1 (en) 2010-09-16

Family

ID=38620529

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/671,956 Expired - Lifetime US7668697B2 (en) 2001-07-09 2007-02-06 Method for analyzing dynamic detectable events at the single molecule level
US12/686,372 Abandoned US20100235105A1 (en) 2001-07-09 2010-01-12 Method for analyzing dynamic detectable events at the single molecule level

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US11/671,956 Expired - Lifetime US7668697B2 (en) 2001-07-09 2007-02-06 Method for analyzing dynamic detectable events at the single molecule level

Country Status (1)

Country Link
US (2) US7668697B2 (en)


Families Citing this family (89)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1354064A2 (en) 2000-12-01 2003-10-22 Visigen Biotechnologies, Inc. Enzymatic nucleic acid synthesis: compositions and methods for altering monomer incorporation fidelity
US7668697B2 (en) 2006-02-06 2010-02-23 Andrei Volkov Method for analyzing dynamic detectable events at the single molecule level
JP3931214B2 (en) * 2001-12-17 2007-06-13 日本アイ・ビー・エム株式会社 Data analysis apparatus and program
US7311794B2 (en) 2004-05-28 2007-12-25 Wafergen, Inc. Methods of sealing micro wells
WO2007109659A2 (en) * 2006-03-21 2007-09-27 Metabolon, Inc. A system, method, and computer program product for analyzing spectrometry data to indentify and quantify individual components in a sample
US11339430B2 (en) 2007-07-10 2022-05-24 Life Technologies Corporation Methods and apparatus for measuring analytes using large scale FET arrays
AU2007334393A1 (en) 2006-12-14 2008-06-26 Life Technologies Corporation Methods and apparatus for measuring analytes using large scale FET arrays
US8262900B2 (en) 2006-12-14 2012-09-11 Life Technologies Corporation Methods and apparatus for measuring analytes using large scale FET arrays
JP5335280B2 (en) * 2008-05-13 2013-11-06 キヤノン株式会社 Alignment processing apparatus, alignment method, program, and storage medium
US20100301398A1 (en) 2009-05-29 2010-12-02 Ion Torrent Systems Incorporated Methods and apparatus for measuring analytes
US20100137143A1 (en) 2008-10-22 2010-06-03 Ion Torrent Systems Incorporated Methods and apparatus for measuring analytes
WO2010111674A2 (en) 2009-03-27 2010-09-30 Life Technologies Corporation Methods and apparatus for single molecule sequencing using energy transfer detection
US8673627B2 (en) * 2009-05-29 2014-03-18 Life Technologies Corporation Apparatus and methods for performing electrochemical reactions
US20120261274A1 (en) 2009-05-29 2012-10-18 Life Technologies Corporation Methods and apparatus for measuring analytes
US8776573B2 (en) 2009-05-29 2014-07-15 Life Technologies Corporation Methods and apparatus for measuring analytes
WO2011014811A1 (en) 2009-07-31 2011-02-03 Ibis Biosciences, Inc. Capture primers and capture sequence linked solid supports for molecular diagnostic tests
WO2011019993A2 (en) * 2009-08-14 2011-02-17 Epicentre Technologies Corporation METHODS, COMPOSITIONS, AND KITS FOR GENERATING rRNA-DEPLETED SAMPLES OR ISOLATING rRNA FROM SAMPLES
US8965076B2 (en) 2010-01-13 2015-02-24 Illumina, Inc. Data processing system and methods
WO2011108369A1 (en) * 2010-03-01 2011-09-09 オリンパス株式会社 Optical analysis device, optical analysis method, and computer program for optical analysis
JP2013527769A (en) 2010-05-06 2013-07-04 アイビス バイオサイエンシズ インコーポレイティッド Integrated sample preparation system and stabilized enzyme mixture
WO2012003368A2 (en) 2010-06-30 2012-01-05 Life Technologies Corporation Transistor circuits for detection and measurement of chemical reactions and compounds
TWI580955B (en) 2010-06-30 2017-05-01 生命技術公司 Ion-sensing charge-accumulation circuits and methods
WO2012003359A1 (en) 2010-06-30 2012-01-05 Life Technologies Corporation Methods and apparatus for testing isfet arrays
US11307166B2 (en) 2010-07-01 2022-04-19 Life Technologies Corporation Column ADC
CN103168341B (en) 2010-07-03 2016-10-05 生命科技公司 There is the chemosensitive sensor of lightly doped discharger
WO2012036679A1 (en) 2010-09-15 2012-03-22 Life Technologies Corporation Methods and apparatus for measuring analytes
EP2667183A4 (en) 2011-01-20 2017-05-10 Olympus Corporation Photoanalysis method and photoanalysis device using detection of light from single light-emitting particle
US8319181B2 (en) 2011-01-30 2012-11-27 Fei Company System and method for localization of large numbers of fluorescent markers in biological samples
US20130024349A1 (en) * 2011-05-10 2013-01-24 Incredible Signals Inc. Method for trading stocks
US10190986B2 (en) 2011-06-06 2019-01-29 Abbott Laboratories Spatially resolved ligand-receptor binding assays
WO2013031439A1 (en) 2011-08-26 2013-03-07 オリンパス株式会社 Optical analyzer using single light-emitting particle detection, optical analysis method, and computer program for optical analysis
JP6010034B2 (en) 2011-08-30 2016-10-19 オリンパス株式会社 Target particle detection method
US9970984B2 (en) 2011-12-01 2018-05-15 Life Technologies Corporation Method and apparatus for identifying defects in a chemical sensor array
EP2794927B1 (en) 2011-12-22 2017-04-12 Ibis Biosciences, Inc. Amplification primers and methods
US9803231B2 (en) 2011-12-29 2017-10-31 Ibis Biosciences, Inc. Macromolecule delivery to nanowells
EP2800969B1 (en) * 2011-12-30 2019-06-19 DH Technologies Development Pte. Ltd. Intelligent background data acquisition and subtraction
US9222115B2 (en) 2011-12-30 2015-12-29 Abbott Molecular, Inc. Channels with cross-sectional thermal gradients
CN108611398A (en) 2012-01-13 2018-10-02 Data生物有限公司 Genotyping is carried out by new-generation sequencing
US9732387B2 (en) 2012-04-03 2017-08-15 The Regents Of The University Of Michigan Biomarker associated with irritable bowel syndrome and Crohn's disease
ES2683707T3 (en) 2012-05-02 2018-09-27 Ibis Biosciences, Inc. DNA sequencing
US8786331B2 (en) 2012-05-29 2014-07-22 Life Technologies Corporation System for reducing noise in a chemical sensor array
JP6118512B2 (en) * 2012-07-04 2017-04-19 株式会社日立製作所 Biological light measurement device
JP5993237B2 (en) * 2012-07-25 2016-09-14 オリンパス株式会社 Fluorescence observation equipment
EP3699577B1 (en) 2012-08-20 2023-11-08 Illumina, Inc. System for fluorescence lifetime based sequencing
ES2701750T3 (en) 2012-10-16 2019-02-25 Abbott Molecular Inc Procedures for sequencing a nucleic acid
JP6105903B2 (en) * 2012-11-09 2017-03-29 キヤノン株式会社 Image processing apparatus, image processing method, radiation imaging system, and program
US9080968B2 (en) 2013-01-04 2015-07-14 Life Technologies Corporation Methods and systems for point of use removal of sacrificial material
US9841398B2 (en) 2013-01-08 2017-12-12 Life Technologies Corporation Methods for manufacturing well structures for low-noise chemical sensors
US8963216B2 (en) 2013-03-13 2015-02-24 Life Technologies Corporation Chemical sensor with sidewall spacer sensor surface
CN105378107A (en) 2013-03-14 2016-03-02 雅培分子公司 Multiplex methylation-specific amplification systems and methods
US20140264471A1 (en) 2013-03-15 2014-09-18 Life Technologies Corporation Chemical device with thin conductive element
US9890425B2 (en) 2013-03-15 2018-02-13 Abbott Molecular Inc. Systems and methods for detection of genomic copy number changes
US9835585B2 (en) 2013-03-15 2017-12-05 Life Technologies Corporation Chemical sensor with protruded sensor surface
US20140264472A1 (en) 2013-03-15 2014-09-18 Life Technologies Corporation Chemical sensor with consistent sensor surface areas
US20140336063A1 (en) 2013-05-09 2014-11-13 Life Technologies Corporation Windowed Sequencing
US10458942B2 (en) 2013-06-10 2019-10-29 Life Technologies Corporation Chemical sensor array having multiple sensors per well
CN105431759B (en) 2013-07-31 2018-04-13 奥林巴斯株式会社 Utilize the optical microphotograph lens device of single incandescnet particle detection technique, microscopic observation and computer program for micro- sem observation
US20150051088A1 (en) 2013-08-19 2015-02-19 Abbott Molecular Inc. Next-generation sequencing libraries
EP3161157B1 (en) 2014-06-24 2024-03-27 Bio-Rad Laboratories, Inc. Digital pcr barcoding
EP3578668B1 (en) 2014-07-24 2020-12-30 Abbott Molecular Inc. Methods for the detection and analysis of mycobacterium tuberculosis
JP6707520B2 (en) 2014-08-08 2020-06-10 クアンタム−エスアイ インコーポレイテッドQuantum−Si Incorporated Integrated device for time binning of received photons
US10077472B2 (en) 2014-12-18 2018-09-18 Life Technologies Corporation High data rate integrated circuit with power management
WO2016130025A1 (en) * 2015-02-13 2016-08-18 Auckland Uniservices Limited Optical detection of fluorescence
US10208339B2 (en) 2015-02-19 2019-02-19 Takara Bio Usa, Inc. Systems and methods for whole genome amplification
WO2016134342A1 (en) 2015-02-20 2016-08-25 Wafergen, Inc. Method for rapid accurate dispensing, visualization and analysis of single cells
CN107683340A (en) * 2015-05-07 2018-02-09 加利福尼亚太平洋生物科学股份有限公司 Multi-processor pipeline framework
US10174363B2 (en) * 2015-05-20 2019-01-08 Quantum-Si Incorporated Methods for nucleic acid sequencing
US10542961B2 (en) 2015-06-15 2020-01-28 The Research Foundation For The State University Of New York System and method for infrasonic cardiac monitoring
WO2017011538A1 (en) 2015-07-14 2017-01-19 Abbott Molecular Inc. Purification of nucleic acids using copper-titanium oxides
US9843739B2 (en) * 2015-10-07 2017-12-12 Mediatek Singapore Pte. Ltd. Method for correcting flickers in a single-shot multiple-exposure image and associated apparatus
EP3382439A4 (en) * 2015-11-27 2019-12-11 Nikon Corporation Microscope, observation method, and image processing program
WO2017098597A1 (en) 2015-12-09 2017-06-15 オリンパス株式会社 Optical analysis method and optical analysis device employing single light-emitting particle detection
EP3400298B1 (en) 2016-01-08 2024-03-06 Bio-Rad Laboratories, Inc. Multiple beads per droplet resolution
JP6902052B2 (en) 2016-02-08 2021-07-14 アールジーン・インコーポレイテッドRgene, Inc. Multiple ligase compositions, systems, and methods
TWI734748B (en) 2016-02-17 2021-08-01 美商太斯萊特健康股份有限公司 Sensor and device for lifetime imaging and detection applications
WO2018017892A1 (en) 2016-07-21 2018-01-25 Takara Bio Usa, Inc. Multi-z imaging and dispensing with multi-well devices
US11543417B2 (en) 2016-08-29 2023-01-03 Oslo Universitetssykehus Hf ChIP-seq assays
US10366501B2 (en) * 2016-11-07 2019-07-30 The Boeing Company Method and apparatus for performing background image registration
WO2018118971A1 (en) 2016-12-19 2018-06-28 Bio-Rad Laboratories, Inc. Droplet tagging contiguity preserved tagmented dna
JP7149275B2 (en) 2016-12-22 2022-10-06 クアンタム-エスアイ インコーポレイテッド Integrated photodetector with direct binning pixels
CN110446787A (en) 2017-03-24 2019-11-12 生物辐射实验室股份有限公司 General clamp primers
EP4180534A1 (en) 2017-11-02 2023-05-17 Bio-Rad Laboratories, Inc. Transposase-based genomic analysis
US20190241944A1 (en) 2018-01-31 2019-08-08 Bio-Rad Laboratories, Inc. Methods and compositions for deconvoluting partition barcodes
US11512002B2 (en) 2018-04-18 2022-11-29 University Of Virginia Patent Foundation Silica materials and methods of making thereof
WO2019237123A1 (en) * 2018-06-08 2019-12-12 Waters Technologies Corporation Techniques for handling messages in laboratory informatics
US11391626B2 (en) 2018-06-22 2022-07-19 Quantum-Si Incorporated Integrated photodetector with charge storage bin of varied detection time
US20200019840A1 (en) * 2018-07-13 2020-01-16 Arizona Board Of Regents On Behalf Of Arizona State University Systems and methods for sequential event prediction with noise-contrastive estimation for marked temporal point process
WO2020041293A1 (en) 2018-08-20 2020-02-27 Bio-Rad Laboratories, Inc. Nucleotide sequence generation by barcode bead-colocalization in partitions
US10860197B1 (en) * 2019-07-31 2020-12-08 Microsoft Technology Licensing, Llc Multi-source trace processing in computing systems


Family Cites Families (59)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4711955A (en) 1981-04-17 1987-12-08 Yale University Modified nucleotides and methods of preparing and using same
US5241060A (en) 1982-06-23 1993-08-31 Enzo Diagnostics, Inc. Base moiety-labeled detectable nucleatide
US5230781A (en) 1984-03-29 1993-07-27 Li-Cor, Inc. Sequencing near infrared and infrared fluorescence labeled DNA for detecting using laser diodes
US5366603A (en) 1984-03-29 1994-11-22 Li-Cor, Inc. Sequencing near infrared and infrared fluorescence labeled DNA for detecting useing laser diodes
US6004446A (en) 1984-03-29 1999-12-21 Li-Cor, Inc. DNA Sequencing
US4729947A (en) 1984-03-29 1988-03-08 The Board Of Regents Of The University Of Nebraska DNA sequencing
US5360523A (en) 1984-03-29 1994-11-01 Li-Cor, Inc. DNA sequencing
US6086737A (en) 1984-03-29 2000-07-11 Li-Cor, Inc. Sequencing near infrared and infrared fluorescence labeled DNA for detecting using laser diodes and suitable labels therefor
US5571388A (en) 1984-03-29 1996-11-05 Li-Cor, Inc. Sequencing near infrared and infrared fluorescense labeled DNA for detecting using laser diodes and suitable labels thereof
CA1338457C (en) 1986-08-22 1996-07-16 Henry A. Erlich Purified thermostable enzyme
US5606040A (en) * 1987-10-30 1997-02-25 American Cyanamid Company Antitumor and antibacterial substituted disulfide derivatives prepared from compounds possessing a methyl-trithio group
US5401847A (en) 1990-03-14 1995-03-28 Regents Of The University Of California DNA complexes with dyes designed for energy transfer as fluorescent markers
US6004744A (en) 1991-03-05 1999-12-21 Molecular Tool, Inc. Method for determining nucleotide identity through extension of immobilized primer
US5405747A (en) 1991-09-25 1995-04-11 The Regents Of The University Of California Office Of Technology Transfer Method for rapid base sequencing in DNA and RNA with two base labeling
US6048690A (en) 1991-11-07 2000-04-11 Nanogen, Inc. Methods for electronic fluorescent perturbation for analysis and electronic perturbation catalysis for synthesis
US5470705A (en) 1992-04-03 1995-11-28 Applied Biosystems, Inc. Probe composition containing a binding domain and polymer chain and methods of use
US5503980A (en) 1992-11-06 1996-04-02 Trustees Of Boston University Positional sequencing by hybridization
EP1262564A3 (en) 1993-01-07 2004-03-31 Sequenom, Inc. Dna sequencing by mass spectrometry
US5677196A (en) 1993-05-18 1997-10-14 University Of Utah Research Foundation Apparatus and methods for multi-analyte homogeneous fluoro-immunoassays
WO1995006138A1 (en) 1993-08-25 1995-03-02 The Regents Of The University Of California Microscopic method for detecting micromotions
US6448944B2 (en) * 1993-10-22 2002-09-10 Kopin Corporation Head-mounted matrix display
US5470710A (en) 1993-10-22 1995-11-28 University Of Utah Automated hybridization/imaging device for fluorescent multiplex DNA sequencing
US5512462A (en) 1994-02-25 1996-04-30 Hoffmann-La Roche Inc. Methods and reagents for the polymerase chain reaction amplification of long DNA sequences
US6593148B1 (en) 1994-03-01 2003-07-15 Li-Cor, Inc. Cyanine dye compounds and labeling methods
US5695934A (en) 1994-10-13 1997-12-09 Lynx Therapeutics, Inc. Massively parallel sequencing of sorted polynucleotides
US5961923A (en) 1995-04-25 1999-10-05 Irori Matrices with memories and uses thereof
US5856174A (en) 1995-06-29 1999-01-05 Affymetrix, Inc. Integrated nucleic acid diagnostic device
US5661028A (en) 1995-09-29 1997-08-26 Lockheed Martin Energy Systems, Inc. Large scale DNA microsequencing device
US5972603A (en) 1996-02-09 1999-10-26 President And Fellows Of Harvard College DNA polymerase with modified processivity
US5846727A (en) 1996-06-06 1998-12-08 Board Of Supervisors Of Louisiana State University And Agricultural & Mechanical College Microsystem for rapid DNA sequencing
US6403311B1 (en) 1997-02-12 2002-06-11 Us Genomics Methods of analyzing polymers using ordered label strategies
US6485944B1 (en) 1997-10-10 2002-11-26 President And Fellows Of Harvard College Replica amplification of nucleic acid arrays
CA2330673C (en) 1998-05-01 2009-05-26 Arizona Board Of Regents Method of determining the nucleotide sequence of oligonucleotides and dna molecules
US6263286B1 (en) 1998-08-13 2001-07-17 U.S. Genomics, Inc. Methods of analyzing polymers using a spatial network of fluorophores and fluorescence resonance energy transfer
US6210896B1 (en) 1998-08-13 2001-04-03 Us Genomics Molecular motors
US6280939B1 (en) 1998-09-01 2001-08-28 Veeco Instruments, Inc. Method and apparatus for DNA sequencing using a local sensitive force detector
US6221592B1 (en) 1998-10-20 2001-04-24 Wisconsin Alumi Research Foundation Computer-based methods and systems for sequencing of individual nucleic acid molecules
CA2355816C (en) 1998-12-14 2007-10-30 Li-Cor, Inc. A system and methods for nucleic acid sequencing of single molecules by polymerase synthesis
JP2002537858A (en) 1999-03-10 2002-11-12 エーエスエム サイエンティフィック, インコーポレイテッド Methods for direct sequencing of nucleic acids
US6399335B1 (en) 1999-11-16 2002-06-04 Advanced Research And Technology Institute, Inc. γ-phosphoester nucleoside triphosphates
US6329178B1 (en) 2000-01-14 2001-12-11 University Of Washington DNA polymerase mutant having one or more mutations in the active site
US20020168678A1 (en) 2000-06-07 2002-11-14 Li-Cor, Inc. Flowcell system for nucleic acid sequencing
US20070172866A1 (en) 2000-07-07 2007-07-26 Susan Hardin Methods for sequence determination using depolymerizing agent
US7597878B2 (en) 2000-09-19 2009-10-06 Li-Cor, Inc. Optical fluorescent imaging
EP1368497A4 (en) 2001-03-12 2007-08-15 California Inst Of Techn Methods and apparatus for analyzing polynucleotide sequences by asynchronous base extension
AU2002258997A1 (en) 2001-04-24 2002-11-05 Li-Cor, Inc. Polymerases with charge-switch activity and methods of generating such polymerases
US7118907B2 (en) 2001-06-06 2006-10-10 Li-Cor, Inc. Single molecule detection systems and methods
US20040161741A1 (en) 2001-06-30 2004-08-19 Elazar Rabani Novel compositions and processes for analyte detection, quantification and amplification
US20030064400A1 (en) 2001-08-24 2003-04-03 Li-Cor, Inc. Microfluidics system for single molecule DNA sequencing
US7041812B2 (en) 2001-08-29 2006-05-09 Amersham Biosciences Corp Labeled nucleoside polyphosphates
US7033762B2 (en) 2001-08-29 2006-04-25 Amersham Biosciences Corp Single nucleotide amplification and detection by polymerase
US7223541B2 (en) 2001-08-29 2007-05-29 Ge Healthcare Bio-Sciences Corp. Terminal-phosphate-labeled nucleotides and methods of use
DE60335144D1 (en) 2003-02-05 2011-01-05 Ge Healthcare Bio Sciences nucleic acid amplification
US7462452B2 (en) 2004-04-30 2008-12-09 Pacific Biosciences Of California, Inc. Field-switch sequencing
BRPI0511293A (en) 2004-05-21 2007-12-04 Halliburton Energy Serv Inc method for measuring a formation property
US7767394B2 (en) 2005-02-09 2010-08-03 Pacific Biosciences Of California, Inc. Nucleotide compositions and uses thereof
US7130041B2 (en) 2005-03-02 2006-10-31 Li-Cor, Inc. On-chip spectral filtering using CCD array for imaging and spectroscopy
EP1969153A2 (en) 2005-11-28 2008-09-17 Pacific Biosciences of California, Inc. Uniform surfaces for hybrid material substrates and methods for making and using same
CA2640441C (en) 2006-02-15 2015-11-24 Ahmed Bouzid Fluorescence filtering system and method for molecular imaging

Patent Citations (100)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4654267A (en) * 1982-04-23 1987-03-31 Sintef Magnetic polymer particles and process for the preparation thereof
US4994373A (en) * 1983-01-27 1991-02-19 Enzo Biochem, Inc. Method and structures employing chemically-labelled polynucleotide probes
US5200313A (en) * 1983-08-05 1993-04-06 Miles Inc. Nucleic acid hybridization assay employing detectable anti-hybrid antibodies
US6200748B1 (en) * 1984-01-16 2001-03-13 California Institute Of Technology Tagged extendable primers and extension products
US6207421B1 (en) * 1984-03-29 2001-03-27 Li-Cor, Inc. DNA sequencing and DNA terminators
US4739044A (en) * 1985-06-13 1988-04-19 Amgen Method for derivitization of polynucleotides
US4800159A (en) * 1986-02-07 1989-01-24 Cetus Corporation Process for amplifying, detecting, and/or cloning nucleic acid sequences
US5079352A (en) * 1986-08-22 1992-01-07 Cetus Corporation Purified thermostable enzyme
US5492806A (en) * 1987-04-01 1996-02-20 Hyseq, Inc. Method of determining an ordered sequence of subfragments of a nucleic acid fragment by hybridization of oligonucleotide probes
US5498523A (en) * 1988-07-12 1996-03-12 President And Fellows Of Harvard College DNA sequencing with pyrophosphatase
US5720928A (en) * 1988-09-15 1998-02-24 New York University Image processing and analysis of individual nucleic acid molecules
US4997928A (en) * 1988-09-15 1991-03-05 E. I. Du Pont De Nemours And Company Fluorescent reagents for the preparation of 5'-tagged oligonucleotides
US5091310A (en) * 1988-09-23 1992-02-25 Cetus Corporation Structure-independent dna amplification by the polymerase chain reaction
US5198543A (en) * 1989-03-24 1993-03-30 Consejo Superior Investigaciones Cientificas PHI29 DNA polymerase
US5001050A (en) * 1989-03-24 1991-03-19 Consejo Superior Investigaciones Cientificas PHφ29 DNA polymerase
US5302509A (en) * 1989-08-14 1994-04-12 Beckman Instruments, Inc. Method for sequencing polynucleotides
US5188934A (en) * 1989-11-14 1993-02-23 Applied Biosystems, Inc. 4,7-dichlorofluorescein dyes as molecular probes
US5091652A (en) * 1990-01-12 1992-02-25 The Regents Of The University Of California Laser excited confocal microscope fluorescence scanner and method
US5079169A (en) * 1990-05-22 1992-01-07 The Regents Of The Stanford Leland Junior University Method for optically manipulating polymer filaments
US7329496B2 (en) * 1990-12-06 2008-02-12 Affymetrix, Inc. Sequencing of surface immobilized polymers utilizing microflourescence detection
US5403708A (en) * 1992-07-06 1995-04-04 Brennan; Thomas M. Methods and compositions for determining the sequence of nucleic acids
US5707797A (en) * 1993-01-08 1998-01-13 Ctrc Research Foundation Color imaging method for mapping stretched DNA hybridized with fluorescently labeled oligonucleotide probes
US5403707A (en) * 1993-05-14 1995-04-04 Eastman Kodak Company Diagnostic compositions, elements, methods and test kits for amplification and detection of retroviral DNA using primers having matched melting temperatures
US5494030A (en) * 1993-08-12 1996-02-27 Trustees Of Dartmouth College Apparatus and methodology for determining oxygen in biological systems
US5706805A (en) * 1993-08-12 1998-01-13 Trustees Of Dartmouth College Apparatus and methodology for determining oxygen tension in biological systems
US5725839A (en) * 1993-08-16 1998-03-10 Hsia; Jen-Chang Compositions and methods utilizing nitroxides in combination with biocompatible macromolecules for ERI or MRI
US5605662A (en) * 1993-11-01 1997-02-25 Nanogen, Inc. Active programmable electronic devices for molecular biological analysis and diagnostics
US5723332A (en) * 1993-11-26 1998-03-03 British Technology Group Limited Translational enhancer DNA
US5869255A (en) * 1994-02-01 1999-02-09 The Regents Of The University Of California Probes labeled with energy transfer couples dyes exemplified with DNA fragment analysis
US5707804A (en) * 1994-02-01 1998-01-13 The Regents Of The University Of California Primers labeled with energy transfer coupled dyes for DNA sequencing
US5599675A (en) * 1994-04-04 1997-02-04 Spectragen, Inc. DNA sequencing by stepwise ligation and cleavage
US5601982A (en) * 1995-02-07 1997-02-11 Sargent; Jeannine P. Method and apparatus for determining the sequence of polynucleotides
US5599695A (en) * 1995-02-27 1997-02-04 Affymetrix, Inc. Printing molecular library arrays using deprotection agents solely in the vapor phase
US6015714A (en) * 1995-03-17 2000-01-18 The United States Of America As Represented By The Secretary Of Commerce Characterization of individual polymer molecules based on monomer-interface interactions
US6362002B1 (en) * 1995-03-17 2002-03-26 President And Fellows Of Harvard College Characterization of individual polymer molecules based on monomer-interface interactions
US6027890A (en) * 1996-01-23 2000-02-22 Rapigene, Inc. Methods and compositions for enhancing sensitivity in the analysis of biological-based assays
US7179906B2 (en) * 1996-04-01 2007-02-20 Applera Corporation Asymmetric benzoxanthene dyes
US6020481A (en) * 1996-04-01 2000-02-01 The Perkin-Elmer Corporation Asymmetric benzoxanthene dyes
US5863727A (en) * 1996-05-03 1999-01-26 The Perkin-Elmer Corporation Energy transfer dyes with enhanced fluorescence
US5866336A (en) * 1996-07-16 1999-02-02 Oncor, Inc. Nucleic acid amplification oligonucleotides with molecular energy transfer labels and methods based thereon
US5723298A (en) * 1996-09-16 1998-03-03 Li-Cor, Inc. Cycle labeling and sequencing with thermostable polymerases
US5858671A (en) * 1996-11-01 1999-01-12 The University Of Iowa Research Foundation Iterative and regenerative DNA sequencing method
US6027709A (en) * 1997-01-10 2000-02-22 Li-Cor Inc. Fluorescent cyanine dyes
US6355420B1 (en) * 1997-02-12 2002-03-12 Us Genomics Methods and products for analyzing polymers
US5888792A (en) * 1997-07-11 1999-03-30 Incyte Pharmaceuticals, Inc. ATP-dependent RNA helicase protein
US7008766B1 (en) * 1997-07-28 2006-03-07 Medical Biosystems, Ltd. Nucleic acid sequence analysis
US6207229B1 (en) * 1997-11-13 2001-03-27 Massachusetts Institute Of Technology Highly luminescent color-selective materials and method of making thereof
US7645596B2 (en) * 1998-05-01 2010-01-12 Arizona Board Of Regents Method of determining the nucleotide sequence of oligonucleotides and DNA molecules
US20050032076A1 (en) * 1998-05-01 2005-02-10 Arizona Board Of Regents Method of determining the nucleotide sequence of oligonucleotides and dna molecules
US20050042649A1 (en) * 1998-07-30 2005-02-24 Shankar Balasubramanian Arrayed biomolecules and their use in sequencing
US6524829B1 (en) * 1998-09-30 2003-02-25 Molecular Machines & Industries Gmbh Method for DNA- or RNA-sequencing
US20060057606A1 (en) * 1999-05-19 2006-03-16 Jonas Korlach Reagents containing terminal-phosphate-labeled nucleotides for nucleic acid sequencing
US20030044781A1 (en) * 1999-05-19 2003-03-06 Jonas Korlach Method for sequencing nucleic acid molecules
US7485424B2 (en) * 1999-05-19 2009-02-03 Cornell Research Foundation, Inc. Labeled nucleotide phosphate (NP) probes
US7501245B2 (en) * 1999-06-28 2009-03-10 Helicos Biosciences Corp. Methods and apparatuses for analyzing polynucleotide sequences
US20020025529A1 (en) * 1999-06-28 2002-02-28 Stephen Quake Methods and apparatus for analyzing polynucleotide sequences
US6696022B1 (en) * 1999-08-13 2004-02-24 U.S. Genomics, Inc. Methods and apparatuses for stretching polymers
US6982146B1 (en) * 1999-08-30 2006-01-03 The United States Of America As Represented By The Department Of Health And Human Services High speed parallel molecular nucleic acid sequencing
US20030055257A1 (en) * 1999-11-03 2003-03-20 Applera Corporation Water-soluble rhodamine dye conjugates
US6534269B2 (en) * 2000-02-23 2003-03-18 City Of Hope Pyrophosphorolysis activated polymerization (PAP): application to allele-specific amplification and nucleic acid sequence determination
US6869764B2 (en) * 2000-06-07 2005-03-22 L--Cor, Inc. Nucleic acid sequencing using charge-switch nucleotides
US20060063173A1 (en) * 2000-06-07 2006-03-23 Li-Cor, Inc. Charge switch nucleotides
US7329492B2 (en) * 2000-07-07 2008-02-12 Visigen Biotechnologies, Inc. Methods for real-time single molecule sequence determination
US20060057565A1 (en) * 2000-09-11 2006-03-16 Jingyue Ju Combinatorial fluorescence energy transfer tags and uses thereof
US6995274B2 (en) * 2000-09-19 2006-02-07 Li-Cor, Inc. Cyanine dyes
US20060063247A1 (en) * 2000-09-19 2006-03-23 Li-Cor, Inc. Cyanine dyes
WO2002044425A2 (en) * 2000-12-01 2002-06-06 Visigen Biotechnologies, Inc. Enzymatic nucleic acid synthesis: compositions and methods for altering monomer incorporation fidelity
US6514829B1 (en) * 2001-03-12 2003-02-04 Advanced Micro Devices, Inc. Method of fabricating abrupt source/drain junctions
US20040015964A1 (en) * 2001-04-25 2004-01-22 Mccann Thomas Matthew Methods and systems for load sharing signaling messages among signaling links in networks utilizing international signaling protocols
US20030059822A1 (en) * 2001-09-18 2003-03-27 U.S. Genomics, Inc. Differential tagging of polymers for high resolution linear analysis
US20070036502A1 (en) * 2001-09-27 2007-02-15 Levene Michael J Zero-mode waveguides
US20070020772A1 (en) * 2002-04-16 2007-01-25 Princeton University Gradient structures interfacing microfluidics and nanofluidics, methods for fabrication and uses thereof
US20040009612A1 (en) * 2002-05-28 2004-01-15 Xiaojian Zhao Methods and apparati using single polymer analysis
US20040053399A1 (en) * 2002-07-17 2004-03-18 Rudolf Gilmanshin Methods and compositions for analyzing polymers using chimeric tags
US7005518B2 (en) * 2002-10-25 2006-02-28 Li-Cor, Inc. Phthalocyanine dyes
US20050042633A1 (en) * 2003-04-08 2005-02-24 Li-Cor, Inc. Composition and method for nucleic acid sequencing
US20050042665A1 (en) * 2003-08-21 2005-02-24 U.S. Genomics, Inc. Quantum dots and methods of use thereof
US6982186B2 (en) * 2003-09-25 2006-01-03 Dongbuanam Semiconductor Inc. CMOS image sensor and method for manufacturing the same
US7491498B2 (en) * 2003-11-12 2009-02-17 Helicos Biosciences Corporation Short cycle methods for sequencing polynucleotides
US7169560B2 (en) * 2003-11-12 2007-01-30 Helicos Biosciences Corporation Short cycle methods for sequencing polynucleotides
US20060019267A1 (en) * 2004-02-19 2006-01-26 Stephen Quake Methods and kits for analyzing polynucleotide sequences
US20060046258A1 (en) * 2004-02-27 2006-03-02 Lapidus Stanley N Applications of single molecule sequencing
US20060024711A1 (en) * 2004-07-02 2006-02-02 Helicos Biosciences Corporation Methods for nucleic acid amplification and sequence determination
US20080032307A1 (en) * 2004-07-28 2008-02-07 Helicos Biosciences, Inc. Use of Single-Stranded Nucleic Acid Binding Proteins In Sequencing
US20060061754A1 (en) * 2004-09-17 2006-03-23 Stephen Turner Arrays of optical confinements and uses thereof
US20060063264A1 (en) * 2004-09-17 2006-03-23 Stephen Turner Apparatus and method for performing nucleic acid analysis
US20060060766A1 (en) * 2004-09-17 2006-03-23 Stephen Turner Apparatus and methods for optical analysis of molecules
US20060062531A1 (en) * 2004-09-17 2006-03-23 Stephen Turner Fabrication of optical confinements
US20060061755A1 (en) * 2004-09-17 2006-03-23 Stephen Turner Apparatus and method for analysis of molecules
US20070048748A1 (en) * 2004-09-24 2007-03-01 Li-Cor, Inc. Mutant polymerases for sequencing and genotyping
US7482120B2 (en) * 2005-01-28 2009-01-27 Helicos Biosciences Corporation Methods and compositions for improving fidelity in a nucleic acid synthesis reaction
US20070042398A1 (en) * 2005-06-30 2007-02-22 Li-Cor, Inc. Cyanine dyes and methods of use
US7666593B2 (en) * 2005-08-26 2010-02-23 Helicos Biosciences Corporation Single molecule sequencing of captured nucleic acids
US20070044538A1 (en) * 2005-09-01 2007-03-01 Li-Cor, Inc. Gas flux system chamber design and positioning method
US7668697B2 (en) * 2006-02-06 2010-02-23 Andrei Volkov Method for analyzing dynamic detectable events at the single molecule level
US20080076189A1 (en) * 2006-03-30 2008-03-27 Visigen Biotechnologies, Inc. Modified surfaces for the detection of biomolecules at the single molecule level
US20090053705A1 (en) * 2006-04-14 2009-02-26 Helicos Biosciences Corporation Methods for increasing accuracy of nucleic acid sequencing
US20090075252A1 (en) * 2006-04-14 2009-03-19 Helicos Biosciences Corporation Methods for increasing accuracy of nucleic acid sequencing
US20080076123A1 (en) * 2006-09-27 2008-03-27 Helicos Biosciences Corporation Polymerase variants for DNA sequencing
US7678894B2 (en) * 2007-05-18 2010-03-16 Helicos Biosciences Corporation Nucleotide analogs

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012047417A1 (en) * 2010-10-07 2012-04-12 Thermo Finnigan Llc Learned automated spectral peak detection and quantification
US8428889B2 (en) 2010-10-07 2013-04-23 Thermo Finnigan Llc Methods of automated spectral peak detection and quantification having learning mode
WO2012103423A2 (en) * 2011-01-28 2012-08-02 Cornell University Computer-implemented platform for automated fluorescence imaging and kinetic analysis
WO2012103423A3 (en) * 2011-01-28 2013-01-17 Cornell University Computer-implemented platform for automated fluorescence imaging and kinetic analysis
US9501822B2 (en) 2011-01-28 2016-11-22 Cornell University Computer-implemented platform for automated fluorescence imaging and kinetic analysis
US9146248B2 (en) 2013-03-14 2015-09-29 Intelligent Bio-Systems, Inc. Apparatus and methods for purging flow cells in nucleic acid sequencing instruments
US9591268B2 (en) 2013-03-15 2017-03-07 Qiagen Waltham, Inc. Flow cell alignment methods and systems
US10249038B2 (en) 2013-03-15 2019-04-02 Qiagen Sciences, Llc Flow cell alignment methods and systems
US20210310055A1 (en) * 2017-12-04 2021-10-07 Wisconsin Alumni Research Foundation Systems and methods for identifying sequence information from single nucleic acid molecule measurements
US11808701B2 (en) * 2017-12-04 2023-11-07 Wisconsin Alumni Research Foundation Systems and methods for identifying sequence information from single nucleic acid molecule measurements

Also Published As

Publication number Publication date
US20070250274A1 (en) 2007-10-25
US7668697B2 (en) 2010-02-23

Similar Documents

Publication Publication Date Title
US7668697B2 (en) Method for analyzing dynamic detectable events at the single molecule level
EP0681177B1 (en) Method and apparatus for cell counting and cell classification
US10235559B2 (en) Dot detection, color classification of dots and counting of color classified dots
EP2155855B1 (en) Methods and processes for calling bases in sequence by incorporation methods
US20030078703A1 (en) Cytometry analysis system and method using database-driven network of cytometers
US8077960B2 (en) Methods for altering one or more parameters of a measurement system
US11879829B2 (en) Methods and systems for classifying fluorescent flow cytometer data
JP2015500466A (en) Method for identifying microorganisms by mass spectrometry
US7136517B2 (en) Image analysis process for measuring the signal on biochips
US6944549B2 (en) Method and apparatus for automated detection of peaks in spectroscopic data
Forero-Vargas et al. Segmentation, autofocusing, and signature extraction of tuberculosis sputum images
CN112906740B (en) Method for removing batch-to-batch differences aiming at tissue mass spectrum imaging result
EP2824444A1 (en) Determination method, determination device, determination system, and program
US20230393066A1 (en) Information processing system and information processing method
CN114792383A (en) Method and device for identifying digital PCR (polymerase chain reaction) fluorescence image of microfluidic chip
CN113380318A (en) Artificial intelligence assisted flow cytometry 40CD immunophenotyping detection method and system
US6832163B2 (en) Methods of identifying heterogeneous features in an image of an array
US20210358566A1 (en) Resolution indices for detecting heterogeneity in data and methods of use thereof
Nanou et al. Training an automated circulating tumor cell classifier when the true classification is uncertain
US20220113253A1 (en) Method of intrinsic spectral analysis and applications thereof
US20230273191A1 (en) Information processing apparatus, information processing system, information processing method, and program
US20040052412A1 (en) Satellite image enhancement and feature identification using information theory radial-basis bio-marker filter
Stanley et al. Abnormal cell detection using the choquet integral
CN117274739A (en) Base recognition method, training set construction method thereof, gene sequencer and medium
Terentyeva et al. Dynamic Disorder in Single Enzyme Reactions: Facts and Artifacts

Legal Events

Date Code Title Description
AS Assignment

Owner name: VISIGEN BIOTECHNOLOGIES, INC., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:VOLKOV, ANDREI;COLBERT, COSTA;PAN, IVAN;AND OTHERS;SIGNING DATES FROM 20070525 TO 20080609;REEL/FRAME:025761/0284

Owner name: LIFE TECHNOLOGIES CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:VISIGEN BIOTECHNOLOGIES, INC.;REEL/FRAME:025761/0587

Effective date: 20090107

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION