nanoHUB-U Principles of Nanobiosensors/Lecture 5.2: Putting the pieces together Genome Sequencer – Part 2
========================================
>>
[Slide 1] Welcome back. We are talking about genome sequencing. If you remember, until this point we have had about 27 lectures, talking about how the molecules diffuse around and land on the sensor surface, and how settling time is an important indicator of the performance of a biosensor. Then we talked about three types of sensors, right? We talked about potentiometric, amperometric and cantilever-based sensors. And we discussed the issue of selectivity. Then last lecture we essentially put all of them together to show how this genome sequencer works: the machine that can read the letters of a human genome, which, remember, is a DNA about 3.2 to 3.5 billion letters long, and how it can be read by a genome sequencer based on potentiometric sensors. It uses local generation too, so that the diffusion limits are beaten in some way. And instead of DNA diffusion it uses the protons as a proxy, so that we don't have any problem with the diffusion limit. So we have discussed those. Today I want to tell you a little bit more about the subtleties associated with the genome sequencer.
[Slide 2] So I will start by talking about signal amplification by PCR. Then some of the issues related to selectivity and sequencing, signal amplification issues, signal degradation issues, and the problem of how all the pieces can be put together, before concluding.
[Slide 3] So let's quickly remind ourselves where we were. You may remember that we started with a double-stranded DNA. The temperature was raised, and that allowed us to make single-stranded DNA instead. The single-stranded DNA was cut into pieces on the order of 100 to 1,000 base pairs. That is indicated by the different colors. Now, these individual pieces, each one of them 100 to 1,000 base pairs, were then multiplied or amplified thousands of times. On average about 10,000 copies were made. So this is like making Xerox copies. And then, you remember, we put them on a substrate and brought in the bases one at a time. As they bound to individual bases along the DNA sequence, we were able to read it by this method. Now, of course, I discussed these five things in detail last time, so I'm not going into any details. You may wish to review it if things are not clear. But I want to emphasize how these amplifications are made. Because, you see, you're making photocopies, let's say 10,000 copies. And then think about it. If the whole genome is, let's say, 4 billion base pairs, and you have cut it into pieces on the order of 100 base pairs or so, there will be millions of individual pieces that you have to copy. So there would be something like 40 million different colors. Each one of them you have to amplify 10,000 times. How are you going to do that? There's no simple way of doing it. Or is there? That's what I want to show.
[Slide 4] And this amplification is necessary, why? Because, remember, if we don't do amplification then my signal-to-noise ratio will not be good. This is something we discussed in Lecture 27, so you may want to refer to that. So let me explain how this signal amplification is done to enhance selectivity.
[Slide 5] You may recall that I mentioned one way of amplifying the number of DNA you have. Assume that this is 100 base pairs, and I want to make 10,000 copies of it. How should I do it? By PCR, the polymerase chain reaction. It's almost like a nuclear chain reaction, so that's where the name comes from. In each cycle we can double the number. So after the first cycle, 4 copies. After the second cycle, 8 copies, and so on and so forth. Eventually, if the number of cycles is N, the total number, with the counting used here (4 after the first cycle, 8 after the second), is 2 to the power N plus 1. So, for example, here by the 35th cycle you could have 68 billion copies. We do not need that many. We need about 10,000. So we don't have to go that far. But, remember, this is the ideal case. I told you about non-ideal cases, where the bases or the primers may get depleted, or you may not have allowed enough time for the replication process to complete. So we'll stay with the ideal case, understanding that there are non-ideal effects.
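As a rough check of how few cycles are actually needed, here is a minimal sketch of the ideal doubling argument. It assumes perfect doubling in every cycle (no primer depletion, no incomplete replication). The function names and the `start` parameter are illustrative only; `start` is left adjustable because the counting convention for the starting material is an assumption of this sketch.

```python
def copies_after(cycles: int, start: int = 1) -> int:
    """Ideal PCR: every cycle doubles the number of copies."""
    return start * 2 ** cycles


def cycles_needed(target: int, start: int = 1) -> int:
    """Smallest number of ideal doubling cycles that reaches `target` copies."""
    cycles = 0
    copies = start
    while copies < target:
        copies *= 2
        cycles += 1
    return cycles


n = cycles_needed(10_000)
print(n, copies_after(n))  # 14 cycles give 16384 copies (13 cycles give only 8192)
```

So in the ideal case, roughly 13 to 14 cycles are already enough for about 10,000 copies, far fewer than 35.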
[Slide 6] And how did each cycle go around so that it doubled the number? Well, it had three steps, you may remember. The first was to raise the temperature so that the double-stranded DNA gets divided into two single-stranded DNA. The temperature is above 92 degrees centigrade. It depends on the length of the DNA. If the DNA is longer then, of course, the temperature will be slightly more. If it's shorter, it will be slightly less. Then I told you that in the second step within each cycle there is a primer which comes in. This short segment you see here has to come in and get attached so that the rebuilding process can begin. And then, in the third step, this rebuilding process starts. There are the A, T, C and G, the four colors you see in the solution. Each one of them comes in and, starting from the primer, gradually builds the DNA back, on both strands. At the end of the cycle, the number will be doubled. You go back, and the number will be doubled again, and so on and so forth. So this is how the classical amplification takes place. But, you see, here you just amplified one piece. I need to amplify 40 million pieces. Different segments, different colors, remember? How do I do that? Because, if I put it in one simple beaker, just one DNA will get multiplied many times. How do I get 40 million? Do I need 40 million different beakers to do the experiment?
[Slide 7] It turns out that it is not really necessary. The approach is something called emulsion PCR. It is sort of like having 40 million beakers within a single beaker. This is how it works. Assume that you have a set of beads, the blue beads. Now, these beads may remind you of the bio-barcode approach that we talked about. And the red DNA, although they are all colored the same, you should really be thinking of as the pieces chopped from the 3 billion base pair long DNA, each about 100 base pairs. So this red one is different from that red one. These are the various segments of the human DNA cut into different pieces. Now, the beads are on the order of two or three microns, which is relatively big. And the DNA are about 100 to 1,000 base pairs long. And remember, although all of them are red, they have different 100 base pairs here and here. These are parts of the total 3.2 billion base pairs. Now, all the reaction will actually occur around the DNA. And I'll explain that in the next slide. So this is how it occurs. You put the DNA in and mix in the beads. And then what happens is the following. After a while the beads will capture the DNA. Now, of course, this is a statistical process, because it's within a solution. So some of the beads will get nothing. Some of the beads will get one DNA. Some may even get two. Some may get three. But, of course, the probability of getting three is somewhat less than, let's say, the probability of getting one. What is the formula for this? Well, we know that's the Poisson formula. If the DNA-to-bead ratio is lambda, and lambda could be 25 percent, 50 percent, 75 percent, then the probability of a bead capturing k DNA is lambda to the power k, times e to the power minus lambda, divided by k factorial. I will show you the results in one second.
I'll explain the implication in one second. But, if you wanted to know how many beads are empty, all you have to do is set k equal to zero. If you set k equal to zero, then you will know what fraction of the beads are empty, for example.
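Written out, the Poisson formula described in words above takes the following form. Here $\lambda$ is the average number of DNA per bead (the DNA-to-bead ratio) and $k$ is the number of DNA captured by a given bead; the symbol $P(k;\lambda)$ is notation introduced only for this note.

```latex
P(k;\lambda) = \frac{\lambda^{k}\, e^{-\lambda}}{k!},
\qquad
P(0;\lambda) = e^{-\lambda},
\qquad
P(1;\lambda) = \lambda\, e^{-\lambda}
```

Setting $k = 0$ gives the fraction of empty beads, and setting $k = 1$ gives the fraction of beads carrying exactly one DNA.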
[Slide 8] So before I get there, let me first tell you how this process actually works. Here I have a test tube. Within the test tube I have, let's say, 100 million beads. And let's say I have 30 million DNA. So what is the ratio of DNA to beads? It is about 0.3. So for every three DNA, there are ten beads. Let's say I take that ratio. Now, this is not a normal solution. What it has is oil and water. So what will happen is that around each bead there will be a little surrounding of water, and then oil everywhere around that. So each one of them will become a little reaction volume, sort of like a little beaker on its own. Now, after a while the DNA will float around in oil and eventually enter this surface, to land on the bead. There may be one DNA landing on a bead. There may be two DNA landing. Or it may be completely empty, no DNA landing. And what we really want, as I'll explain in a second, is that no more than one DNA lands on a bead. If it's more, there'll be trouble. So we always want one and no more. In the process, if you have to keep some empty, fine. But we must not have more than one. So let me explain how that can be achieved.
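To see the statistical nature of the loading, here is a small Monte Carlo sketch, scaled down from 100 million beads and 30 million DNA to 100,000 beads and 30,000 DNA. It assumes that every DNA lands on a uniformly random bead, independently of all the others; that is a modeling assumption for illustration, not a description of the actual emulsion chemistry. The function name `simulate_loading` is hypothetical.

```python
import math
import random
from collections import Counter


def simulate_loading(n_beads: int, n_dna: int, seed: int = 0) -> Counter:
    """Drop each DNA on a uniformly random bead.

    Returns a histogram mapping 'DNA on a bead' -> 'number of such beads'.
    """
    rng = random.Random(seed)
    per_bead = Counter(rng.randrange(n_beads) for _ in range(n_dna))
    hist = Counter(per_bead.values())
    hist[0] = n_beads - len(per_bead)  # beads that captured nothing
    return hist


n_beads, n_dna = 100_000, 30_000  # same 0.3 ratio as in the example, scaled down
lam = n_dna / n_beads
hist = simulate_loading(n_beads, n_dna)

for k in range(4):
    simulated = hist[k] / n_beads
    poisson = lam ** k * math.exp(-lam) / math.factorial(k)
    print(f"k={k}: simulated {simulated:.4f}, Poisson {poisson:.4f}")
```

Under these assumptions the simulated fractions of empty, singly loaded and multiply loaded beads should track the Poisson values closely.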
[Slide 9]
[Slide 10] So this is how it works. Here I have plotted the Poisson formula. For k equals 1, 1 DNA per bead, that is the green line. How did I get the green line? Essentially, I put k equals 1 and simply plotted this curve as a function of lambda. Similarly, for k equals 2, you can have another curve. And for k equals zero, the curve, not plotted here, will essentially decay exponentially. Because k equals zero will give you 1 here, 1 here, and it will be e to the power minus lambda. So the curve I have not shown here will decay as the DNA-to-bead ratio increases. So let's say you do not want more than one DNA per bead. How can you make that work? This is how it works. For example, let's say I have a lambda of 0.2591. And I'll explain where it came from. If I put that in, then I will see that the probability of getting a single-DNA-loaded bead is about 20 percent. Now, that may look very low, because you are wasting a lot of beads. How many beads are you wasting? Well, if lambda is 0.2591, then, remember, with k equals zero the probability simply becomes e to the power minus lambda. So 77 percent of the beads have nothing, in order to get 20 percent, 1/5 of the beads, loaded by one DNA. What about the probability of loading two DNA, three DNA, four DNA and so on and so forth? If you add these two, the probability of getting one and the probability of getting nothing, and subtract from one, because everything must sum up to 100 percent, it's about 0.03. That means about 3 percent of the beads will have more than one DNA. Now, what's wrong with having more than one DNA? You see, each bead will become a PCR chamber. So it will begin to amplify, photocopy. If you have more than one, it is going to make photocopies of two things together. And, therefore, when you are trying to read it, it will be a jumbled-up reading.
You want one per bead so that, when you read it, the read is unique, right? And, in order to get there, it looks like you have to waste something like 80 percent of the beads, and you should not have a DNA-to-bead ratio of more than about 25 percent. And, in fact, this is theory, and this is experiment. If you do it by experiment, you'll essentially get very similar loading statistics of the beads. So this is an important statistical problem. If a bead takes more than one DNA, you have to throw it away at the end of the day, because it gives a contaminated reading.
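The numbers quoted for lambda equal to 0.2591 can be checked with a few lines. This is only a sketch that evaluates the Poisson formula directly; the helper name `poisson` is illustrative.

```python
import math


def poisson(k: int, lam: float) -> float:
    """Probability that a bead captures exactly k DNA at DNA-to-bead ratio lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


lam = 0.2591
p_empty = poisson(0, lam)             # beads with nothing
p_single = poisson(1, lam)            # beads with exactly one DNA
p_multi = 1.0 - p_empty - p_single    # beads with two or more DNA

print(f"empty:  {p_empty:.3f}")   # about 0.772
print(f"single: {p_single:.3f}")  # about 0.200
print(f"multi:  {p_multi:.3f}")   # about 0.028
```

So about 77 percent of the beads stay empty, about 20 percent carry exactly one DNA, and roughly 3 percent carry more than one and have to be discarded.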
[Slide 11] So this is one problem with selectivity that has to be addressed by making sure your DNA-to-bead ratio is relatively small. The second important issue that has to do with selectivity is the following. You see, there are many wells, millions of wells, in here. Let's say you have randomly loaded all those beads, each of which has 10,000 copies of an individual DNA piece after the PCR amplification. And you put one in the red well. Now, the red well, I'll explain in a second, must be surrounded by a set of empty wells. These wells must be empty so that you can compare the signal coming from the red well, which contains the bead, with the signal coming from the empty well, so that you can locally differentiate the signals. So it is very important that each loaded well is surrounded by empty wells. This will be like a local reference as the solution is running by, going over. It will generate a signal here. It will not generate any signal here. And so, when you take a local difference of the signals, the signal-to-noise ratio will be very good, the selectivity will be very good. Now, the question is, what is the probability that you can have a loaded well surrounded by a set of unloaded wells? Remember, it is, again, statistical. So it is something that we have to think about.
[Slide 12] In fact, it turns out that it's not a complicated problem. On the very left, this is the top view. Everywhere you see the black dots, these are the loaded wells. Each has a bead containing 10,000 copies of a single DNA, let's assume that. And everywhere you see white, the wells are empty. Those can potentially be used as reference wells. Now, remember, it's statistical, random. So there's a random distribution of black and white. The question is, what is the relative fraction of black that you can have so that each black is surrounded by white? Because you need the reference well. So we can take this particular configuration. Look at the green box: I have four neighbors empty and one occupied. Correspondingly, we can have the red boxes also, which have all these white wells empty on the surrounding. But, essentially, it is a five-element local pattern with a loaded well in the middle. So let's take a look at this one first. What is the probability that you will have a loaded well surrounded by empty wells? Well, let's look at that. If p is the probability that a well is occupied, black, then it is p times 1 minus p to the power 4, because the four neighbors are empty. You can then solve for the p that maximizes this probability, and you get p equal to 0.2. That is the loading at which this probability is maximum. So the fraction of loaded wells here is about 20 percent. If 20 percent of the wells contain the black, then you can be more or less certain that you will have many loaded wells surrounded by empty wells. Now, on the other hand, maybe you're happy with the last configuration, where you don't want each loaded well to be surrounded by empty wells one by one. Instead you can essentially have nine loaded wells surrounded by white wells, so that you can still pair them up one by one: a reference and a loaded well, a reference and a loaded well.
If you're happy with this configuration, then the probability is p to the power 9, because nine black wells are occupied, times 1 minus p to the power 16, which says that the sixteen surrounding wells are unoccupied. If you maximize this the same way, it tells you that the maximum loading you can have is only 36 percent. So what this says is that you cannot have more than a certain fraction of the black. You cannot load more than a certain fraction if you want to have a reference electrode. You see, forget about all the math for one second. If you have to have one loaded well with one reference well, then you can immediately see you cannot have more than 50 percent, because it is one to one. And then all the extra constraints that you have, for some neighbors to be open and so on and so forth, ensure that the loading probability, the fraction of black, has to be even lower if you want this particular configuration to be viable. Okay, so at this point we'll take a few minutes' break. And once we come back, I'll tell you a little bit more about other issues that also degrade the selectivity. I'll see you in a few minutes.
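The two loading limits can be reproduced with a simple numerical search. The sketch below assumes each well is loaded independently with probability p, and it reads the 20 percent and 36 percent figures as the values of p that maximize the probability of each local pattern. That reading is an interpretation of the spoken argument, and the function names are illustrative.

```python
def pattern_probability(p: float, loaded: int, empty: int) -> float:
    """Probability that `loaded` given wells are full and `empty` given wells are not."""
    return p ** loaded * (1.0 - p) ** empty


def best_loading(loaded: int, empty: int, steps: int = 10_000) -> float:
    """Grid search over 0 < p < 1 for the loading fraction that maximizes the pattern."""
    grid = (i / steps for i in range(1, steps))
    return max(grid, key=lambda p: pattern_probability(p, loaded, empty))


print(best_loading(1, 4))    # one loaded well, four empty neighbors: near 0.2
print(best_loading(9, 16))   # nine loaded wells, sixteen empty around them: near 0.36
```

Under this reading the optimum is simply loaded / (loaded + empty), which gives 1/5 and 9/25 for the two patterns.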