nanoHUB-U Biological Engineering: Cellular Design Principles/L2.4: Introduction to Biological Networks
========================================
[Slide 1 L2.4] Welcome back, I'm Professor Rickus. In this lecture, we'll discuss an introduction to biological networks. We will go over the importance of networks in cells, and we'll talk about some basic terminology of networks as well as look at the structure of biological networks, what the consequence of that structure can be on cell function, and how we design and redesign cells.
[Slide 2] So, let's think about genes, and proteins, and cells, and organisms. Interaction is a fundamental feature of life at all scales. Genes and proteins and cells do not function in isolation, but rather, they form networks of interacting units. And there's structure within these networks that can impact function. This structure occurs both globally, in how the full network is organized, and locally, in how specific genes and cells are interacting with one another.
[Slide 3] So let's first start with just some terminology and how we represent networks. We'll define here a node. In our case, in biological networks, these nodes could be a gene, an individual protein, a cell, an organism or a metabolite. And we'll define an edge, labelled E, as the interaction between two nodes, here labelled I and J. So E sub IJ defines the interaction between I and J.
[Slide 4] So there are many types of biological networks, such as protein-protein networks. In protein-protein networks, the nodes represent individual proteins and the links, or the edges, are the interactions, such as binding, or other interactions between those proteins. We can look at metabolic networks, where the nodes are small molecule metabolites, and the links or edges are reactions, such as enzymatic reactions, that would convert one metabolite node into another.
We can also look at neuronal networks, where each node is a cell, in this case a neuron, and the links or edges represent the synaptic connections between those neurons. We'll also look at transcription networks, which we'll go into in more detail in the next lecture. In a transcription network, the nodes represent genes whose gene products themselves function as transcription factors. And the links, or edges, represent the action of that transcription factor, either activating or repressing the expression of the next gene that it's connected to in the network.
[Slide 5] So let's take a look at a real biological network. Here's an example of a protein-protein interaction network in yeast. Now, just by visual inspection, we can look at this and ask, do we see any structure in this network? In other words, is there structure here compared to what we'd see if all of these proteins were just randomly connected, or randomly interacting? By visual inspection, there appears to be some structure, at least that's what we see qualitatively just by looking at it. But how do we determine whether there is, in fact, the structure that the eye seems to suggest? How do we answer that question quantitatively and more rigorously? And then what is the significance of any structure that may exist in this network?
[Slide 6] So let's first ask a question. What would a network look like if proteins were just randomly connected to one another? There are different models of random networks. One model is the Erdős-Rényi model. To create this random network, we would just define some number N of nodes and then connect those nodes at random with some probability P until a set number of edges or interactions are formed.
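As a rough illustration of the random model just described, here is a minimal Python sketch. It assumes the simplest variant, in which every possible pair of nodes is linked independently with probability p. The function name `random_network`, the `seed` argument, and the values n = 100 and p = 0.05 are made up for this example and do not come from the lecture. A variant that instead stops once a fixed number of edges has been placed would look very similar.

```python
import itertools
import random


def random_network(n, p, seed=None):
    """Return the edge set of a random network on nodes 0..n-1.

    Assumption for this sketch: each of the n*(n-1)/2 possible node
    pairs is linked independently with probability p.
    """
    rng = random.Random(seed)  # a seed makes the example reproducible
    edges = set()
    for i, j in itertools.combinations(range(n), 2):
        # combinations() yields each unordered pair once, with i < j,
        # so there are no self-loops and no duplicate edges.
        if rng.random() < p:
            edges.add((i, j))
    return edges


# Hypothetical parameters chosen only for illustration.
n, p = 100, 0.05
edges = random_network(n, p, seed=1)

possible_pairs = n * (n - 1) // 2
print("edges formed:", len(edges))
print("expected on average:", p * possible_pairs)
```

Running it with different seeds gives different networks, but the number of edges stays close to p times the number of possible pairs.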
And when we look at this example here of one of these random networks, we can see, again by eye, that it looks qualitatively different from the yeast network that we looked at before. But how do we really know for sure? And just to point out, this network that you're looking at has 130 nodes and 215 interactions. The red nodes are the ones with the most connections, and the black nodes are the ones with the fewest connections. And this'll become important in a second, we'll see why.
[Slide 7] So one of the things that we can quantitatively define about our network is the node degree, which we'll call k. The node degree k is the measure of the connectivity of a node. In other words, how many other nodes does a node connect to? In this example here, we've got A highlighted in red. In our case, A could be a protein, if we are looking at protein-protein interactions. A could represent a gene or a cell, whatever it is we're looking at in our network. And so we can simply ask the question, how many other nodes does A connect to? If we look at the node degree, or connectivity, of A, we see that A connects directly to B, to C, to D, and to G, and then to another one down here without a label. So we can see that the connectivity, or the node degree, of A is five. It connects to five other nodes. We can go in here now and define the connectivity for all the nodes in our network. We found that A is five. If we look at G, for example, it connects to two other nodes, F and A. So its degree or connectivity would be two. The connectivity of B, for example, is three; it connects to F, A, and C directly, and so on. So now that we've defined this feature of our nodes, we can look at the distribution of the connectivity k for all the nodes in our network.
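The degree count walked through above can be sketched in a few lines of Python. This is only an illustration: the edge list contains just the connections named in the lecture, the unlabeled neighbor of A is given the placeholder name "X", and the slide may show further edges that are not mentioned, so the degrees of nodes other than A, G, and B may differ from the figure. The function names are hypothetical.

```python
from collections import Counter

# Undirected edges named in the lecture. "X" is a placeholder for the
# unlabeled node that A connects to.
edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "G"), ("A", "X"),
    ("B", "C"), ("B", "F"),
    ("F", "G"),
]


def node_degrees(edges):
    """Return the degree k of every node: the number of edges touching it."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def degree_distribution(degree):
    """Return P(k): the fraction of nodes that have each degree k."""
    n_nodes = len(degree)
    counts = Counter(degree.values())
    return {k: counts[k] / n_nodes for k in sorted(counts)}


degree = node_degrees(edges)
distribution = degree_distribution(degree)

print(degree["A"], degree["G"], degree["B"])  # 5 2 3
print(distribution)
```

The second function is the step from individual node degrees to the distribution of k over the whole network.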
[Slide 8] So if we go back to our random network model here, we can look at the distribution, which we plot here now, of probability against connectivity, with the degree k of a node on our X axis. We can see, in this case, a Poisson-type distribution that has exponential features. And what I mean by that is, as we look at the tails, the probability of having a very highly connected node, or conversely, one with very low connectivity, falls off exponentially. And there's a characteristic mean connectivity here, where the majority of nodes have some typical connectivity.
[Slide 9] So what about our real network? When we go in and look at the distribution of connectivity for all the nodes in this particular protein-protein interaction network, we see that the probability distribution is better fit by a power law. And in a power law distribution, we end up with more highly connected nodes than would be expected in a randomly connected network, where that probability falls off exponentially, okay?
[Slide 10] So why might this be important? Let's compare. Networks that have this sort of power law distribution of connectivity are called scale-free networks. And they're called scale-free particularly because of this absence of a characteristic node; there's no mean connectivity that characterizes the nodes in our network. So just to compare: in a random network, connectivity follows a Poisson distribution, with these exponential tails at the very high and low levels of connectivity. There's a mean characteristic connectivity. In other words, deviation from the mean is rare. Whereas a scale-free network, which follows a power law distribution, has no characteristic node. And so you can see two examples, of a random and a scale-free network, generated here.
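For reference, the two distributions being compared can be written out as below. The symbols are notation added here rather than taken from the slides: the angle-bracketed k stands for the mean connectivity and gamma for the power law exponent. The Poisson form is the usual approximation for a large random network in which each pair of nodes is linked independently.

```latex
% Random network: Poisson degree distribution with mean connectivity <k>
P_{\text{random}}(k) = \frac{e^{-\langle k \rangle}\,\langle k \rangle^{k}}{k!}

% Scale-free network: power law degree distribution with exponent gamma > 0
P_{\text{scale-free}}(k) \propto k^{-\gamma}
```

Because of the k! in the denominator, the Poisson probability drops off at least exponentially fast once k is well above the mean, whereas the power law decays only polynomially. That slower decay is what makes very highly connected nodes so much more likely in the scale-free case.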
[Slide 11] And you can see now the presence of some of these highly connected nodes that are very unlikely in a randomly connected network.
[Slide 12] So it turns out that many biological and cellular networks have these scale-free features. Some of the first examples that were looked at came from an extensive study of metabolic networks from 43 different organisms. Just to remind you from our earlier discussion, in a metabolic network each node represents a metabolite, and each edge or interaction represents a reaction, often enzymatic, that converts one metabolite node into another. And when you look at these across a range of model organisms, from E. coli to our nematode worm model organism C. elegans, we can see evidence of these power law distributions in how these metabolic networks are organized. They seem to have these scale-free features, where we see the existence of these highly connected nodes, in this case metabolites, more often than would be predicted if all the metabolites were just randomly connected.
[Slide 13] So what's the functional consequence of this structure that we often see in biological networks? Well, there's some color coding in our yeast protein network that we looked at before, but I didn't tell you what that color coding was. It turns out that in this study, they went into each of these genes, or proteins, in the yeast and knocked out, or deleted, that protein from the network. And then they asked the question: was that deletion or knockout lethal, did it have some effect on cell survival and growth, or was it non-lethal? And what they found was a correlation with the number of links, right? The number of edges or connections that a particular protein made. So there's a relationship here between how essential that protein was and how many connections it made.
So you can see that among the most highly connected proteins here, about 60% were essential. In other words, knocking them out had a lethal consequence for the cell. By comparison, if you look at the very low connectivity nodes, proteins that made very few connections relative to other proteins in the network, only about 20% of those proteins were found to be essential. In other words, only 20% of the time did knocking out one of those proteins result in a lethal impact for the cell. So there's a functional consequence, therefore, of the connectivity of the nodes. And having these highly connected nodes affects the stability and robustness of the network.
[Slide 14] So one of the functional implications of this particular type of structure is that these scale-free networks with very highly connected nodes are stable to random attacks, right? Stable to random attacks, and therefore tolerant of errors as well. There's a small number of these nodes that keep the network intact. And the random networks, therefore, would be more likely to fall apart if we were to just randomly go in and attack one of these nodes. Our probability of hitting a very highly connected node in a random attack is low, and so our network is more likely to stay intact. Now this type of structure and theory has been applied in a really wide range of networks beyond just biological ones, right? A directed attack at hubs can be very effective. This has been looked at in disease spread and epidemics, used in strategic warfare, and more recently, in the last 15 years or so, been considered in biological networks. So let's think a little bit more about what the consequence of this is for cell design.
[Slide 14] So, when might we want to attack a network, right?
So we, as engineers, might want to intentionally attack a network or change it, as in metabolic engineering. Or we might be trying to kill a particular cell, perhaps a cancer cell or a pathogenic bacterium, by going in and targeting particular proteins or genes as drug targets. So this structure has some implication for how we might do that, targeting our signaling system or our metabolic system as a full network and considering all of the interactions.
[Slide 15] So let's look at some other quantitative features of networks and their impact or significance in biological networks. Another quantitative feature that can be defined in networks is the shortest path length. What this asks is, what is the least number of steps or edges I have to go through to get from one node to another? The networks we've been looking at, such as our protein-protein interaction network, have been undirected networks. In other words, protein A interacts with B; this could be binding. So there's no direction to that. If A binds to B, B binds to A, and there's no directionality. Our metabolic networks, or our transcription factor networks that we'll look at later, could be directed networks, where now we have arrows, right? So in this case, if each node is a metabolite, say metabolite C can be enzymatically converted to A, A is not necessarily able to be enzymatically converted back to C. In other words, not all of our reactions are necessarily practically reversible in the cell, so we can have directionality. And this becomes important when we look at our shortest path length, which we'll label L sub IJ. If we apply this to the example here, and we look at the shortest path length to get from A to B, well, it only takes one step, one edge, right? So our shortest path length would be 1.
And conversely, in our undirected network, flipping that, our path length from B to A is the same, because we have no direction. If we were to look at the shortest path length between A and F, now I have two ways to get there, but in both cases it takes me two steps; I have to go through two edges. So from A to G to F, or A to B to F, and conversely F to B to A, or F to G to A. My shortest path length is now 2. Now there are other paths to get there, right? I could go from A to C to B to F. But my shortest is directly via B, and that's what we're looking at here. Now in our directed networks, such as our metabolic networks, where we have directionality in our edges or connections, this converse relationship is not always equal. In other words, the shortest path length from A to B does not always equal the shortest path length from B to A. In this example, if we start at A and want to get to B, now we have to follow the direction of the arrows, right? So we can't go directly to B in this case. But we could get to B via G and F and up to B, following the arrows, and so our shortest path length is 3. Now, on the other hand, to go from B to A, we can go directly. If B is a metabolite, we can convert B directly to metabolite A via a single edge, or enzymatic step in the case of a metabolic network. So in this case, our length L sub BA is 1. So, why do we care about this? And the other thing I'll point out too is that there can be pairs of nodes where there is no path. So in this case, from A to C, because of the direction of the arrows, I have no way to convert A to C, okay?
[Slide 16] So, that can exist in our directed networks, such as metabolism, as well. So, let's look at some of our real metabolic networks. Our scale-free networks are also sometimes considered small-world networks because of this feature that we'll talk about here. The shortest path length is often very small in our scale-free networks.
Precisely because we have these very highly connected nodes, right? So, in our metabolic networks, we can think of the shortest path length as essentially the minimum number of reactions that it would take to convert A to B, right? And in our metabolic network, the shortest path length connecting most metabolites turned out to be 3 or 4 reactions. Very small. This is a common feature of a lot of social networks too. If you've ever played the game Six Degrees of Kevin Bacon, it takes a very small number of steps to get from any actor who's been in a movie to Kevin Bacon, via connections with other actors or actresses. Maybe a more modern example of this might be your Facebook page, right? You could probably get from you to anybody else in the world in a surprisingly small number of steps, because there are people out there who are highly connected nodes, people that have ridiculous numbers of Facebook friends, right? And so by traveling through highly connected nodes, you can get from one friend to almost anybody in the world with a surprisingly small number of steps. This is the six degrees of separation concept that's also been popularized in books and movies. But what does this mean for cell function and dynamics, and for how we would engineer the metabolism of these cells?
[Slide 17] And so, let's also now take a step and think about local structure. Now we're zooming into individual nodes, right? We've been talking about just simple connections of one node to another. But are there some particular local patterns of how those nodes, which may be genes or proteins, are interacting with one another that have some particular structure that is enriched in our network compared to a random structure? We could have these sorts of local structures, one of which is autoregulation. And one we'll look at in particular in a few lectures coming up is negative autoregulation.
We could also ask, are there any feedback or feedforward loops in our system? Any local structure such as these that is enriched in a real network, in our biological network, compared to a randomly connected network, we would call a motif. And then we can ask the question, why might these local structures, these motifs, be enriched in our network? First, do they exist? And if they do, what consequence or what design features do they bring? Why might evolution have enriched these in our biological networks?
[Slide 18] So let's take a look at another real network now. This is an E. coli transcription network, where each node represents a gene or an operon in E. coli. And each arrow or edge here, and we have direction to these now, since this is a directed network, represents how the gene or operon at that node either activates or represses the expression of the node, the gene or operon, that it connects to, okay? This particular network represents about 20% of the interactions known in E. coli. It contains about 420 genes or operons, with about 520 edges or connections, which are gene activation or gene repression interactions. And so we can ask a question, does this network contain motifs? Also, just by visual observation, even without looking at the connectivity distribution, though we certainly could do that for this network, you can see the existence of some very highly connected nodes, such as this very obvious one here that's connected to quite a few other genes.
[Slide 19] So, returning to our question, are there any motifs that are enriched in this particular network? It turns out that there are. And one that is enriched in this network is the feedforward loop. So here now, nodes that are participating in a feedforward loop have been highlighted in blue. Where X has some direct path to influencing the expression of Z.
As well as an indirect path, via another gene, to the expression of Z. And we would call this a feedforward loop.
[Slide 20] And this represents now a motif, a local structure that is enriched within the larger structure of our network. And so coming up, we're going to go a little deeper into transcription networks, looking at this regulation of gene expression. We'll also build a simple mathematical model of gene expression that will allow us to dive deeper into the functional consequences of these structural motifs that are enriched in transcription networks. And then we'll take a deeper dive into some specific motifs that are particularly enriched in a number of biological networks, particularly in transcription networks, as well as in other cell-cell interaction networks, neuronal networks, for example. And we will take a deeper look into the negative autoregulation that we briefly introduced, as well as feedforward loops. These particular structural motifs have consequences that allow functions such as speeding up the temporal response of gene expression or, in the case of feedforward loops, filtering noise in the input-output response of those connections. So I hope to see you next time.