Science Interactome: 2013

Saturday, November 16, 2013

Cytoscape: Getting Started Part 1

Cytoscape: It's installed, now what?
Intended audience:
Those who have already installed Cytoscape, have found the official Cytoscape tutorial, but still have little idea of what to do.

Background:
I have noticed for some time that both my graduate colleagues and others I have met at conferences seem to be quite annoyed with Cytoscape. In brief, Cytoscape is software commonly used to visually construct networks of nodes and edges (I am most interested in biological networks). With the plethora of data out there, visualization is absolutely critical to helping others understand networks. In addition, Cytoscape can be used in conjunction with several "plugins" (which are basically software extensions that add functionality to Cytoscape) in order to calculate properties of, or better visualize, any given network.

As a sidenote, I will be using Cytoscape 2.8.3 for my tutorial. I suspect that, at least for all the core basics, different versions of the software will not vary too much.

Getting started 101:
Initially, this is (more or less) the software screen you get when you first open the application.

The first thing you want to do is import the data that will be used to construct your visual network. The great thing about Cytoscape is that it was built with Notepad or Excel in mind. I am comfortable with Excel, so I will use it to build my network of protein-protein interactions (i.e. I have a list of which protein interacts with which protein).

Data import format:

The format is as such:

In this example, I have arranged my proteins in pairs, so that I know SPAC1002.06C interacts with SPAC6G9.13C (the very first pair). Likewise, the second, third, and subsequent lines represent other pairs. Cytoscape has a built-in feature which can easily remove all instances of duplicate pairs, so if this is a problem, do not worry -- we will get to that soon!
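If you would rather tidy the list before importing it, the pair format can be sketched in a few lines of Python. This is only a minimal sketch and not how Cytoscape does it: the function names, the column headers "source" and "target", and the PROTEIN_A / PROTEIN_B identifiers are made up for illustration, and I am assuming that A-B and B-A describe the same interaction.

```python
import csv
import io


def unique_pairs(pairs):
    """Drop repeated pairs, keeping the first occurrence of each.

    Assumption: interactions are undirected, so (A, B) and (B, A)
    count as the same pair.
    """
    seen = set()
    unique = []
    for a, b in pairs:
        key = frozenset((a, b))
        if key not in seen:
            seen.add(key)
            unique.append((a, b))
    return unique


def edge_table_text(pairs):
    """Return the pairs as tab-delimited text, one interaction per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "target"])  # hypothetical column headers
    writer.writerows(pairs)
    return buffer.getvalue()


pairs = [
    ("SPAC1002.06C", "SPAC6G9.13C"),   # the first pair from the example
    ("PROTEIN_A", "PROTEIN_B"),        # placeholder identifiers
    ("SPAC6G9.13C", "SPAC1002.06C"),   # same pair, reversed
    ("PROTEIN_A", "PROTEIN_B"),        # exact duplicate
]
deduplicated = unique_pairs(pairs)
table_text = edge_table_text(deduplicated)
```

Saving `table_text` to a .txt file gives a two-column table that can be opened in Notepad or Excel.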

Importing the data:

To upload your Excel file, go to FILE > IMPORT > Network from Table (Text, MS Excel)....

Make sure to select the appropriate columns for "Source Interaction" and "Target Interaction". Cytoscape also has an "Interaction Type" option in case you have a file with multiple types of interactions (i.e. a combination of protein interactions and genetic interactions). In our case, all pairs represent the same type of interaction (protein-protein). However, if multiple types of interactions are present and indicated by "Interaction Type", then the different types of edges can be told apart.
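As a rough illustration of a table with mixed interaction types, here is a small Python sketch that adds a third column between the source and the target. The column names, the type labels ("protein-protein", "genetic"), and the PROTEIN_* identifiers are my own placeholders, not anything Cytoscape requires; you pick which column plays which role in the import dialog.

```python
from collections import Counter

# Each row: (source, interaction type, target). Labels are hypothetical.
rows = [
    ("SPAC1002.06C", "protein-protein", "SPAC6G9.13C"),
    ("PROTEIN_A", "protein-protein", "PROTEIN_B"),
    ("PROTEIN_A", "genetic", "PROTEIN_C"),
]


def to_table_lines(rows):
    """Format the rows as tab-delimited lines with a header row."""
    header = "source\tinteraction_type\ttarget"
    return [header] + ["\t".join(row) for row in rows]


# Quick sanity check before importing: how many edges of each type?
type_counts = Counter(interaction for _, interaction, _ in rows)
table_lines = to_table_lines(rows)
```

Counting the types first is a cheap way to catch typos in the labels (for example "genetic" versus "Genetic"), which would otherwise show up as separate edge types.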

Importing attributes:

In Cytoscape, attributes are various descriptions of the nodes in your network. Let's say

Playing with the network:

This is the part that takes some time to explore.

Let Cytoscape organize your network's look in several ways:

Cytoscape allows you to manually drag nodes (the circles) and/or edges (the lines) however you want on the main screen. However, it is often best to let Cytoscape organize the "shape" of your network based on various built-in algorithms (more can be imported via plugins).

Starting with the upper toolbar, go to LAYOUT > Cytoscape Layouts. Choose any layout and see how it changes your network look. The layout seen above is Force-directed layout > (unweighted).

Zoom-in/ Zoom-out:

You can zoom in or out of the network by either scrolling with your mouse or using the zoom functions in the upper-left corner (the two magnifying glasses).

In the lower left corner, there is a global view of your network with a blue rectangle over it. The blue rectangle tells you how much of your network you are currently seeing in the main screen (the bigger area spanning the middle to upper-right of the screen).

Sunday, July 21, 2013

The Right Lab, the Right Student, the Right Fit?

About a month ago, I had a casual conversation with a colleague that went something like this:

"Colleague: You're an incoming 5th-year graduate student; you're just about done!"
"Me: ...Actually, I'm an incoming 4th-year graduate student...."
"Colleague: Oh! ...well, you've got a long way to go man!"

It was a friendly joke, of course, but it got me thinking: how do graduate students measure their time on this degree (PhD) and why? I have heard it measured in terms of years left. I have heard it measured in terms of years since started. I have even heard it measured in terms of "forever-ness". I have never really put too much thought into how long I have been here and how much more I have to go -- they have been merely numerical facts and averages. But thinking about it, many people seem to attach vast emotions to these numbers. But why?

In agreement with the famous PhD comics series, it seems that nearly all newcomers to the graduate program are "full of life, hope, and dreams of fame and success." It seems to usually be accompanied by the dream of "get rich quick." Finish the program asap and publish as many papers as possible. But I never really adhered to that dream coming into the PhD program; I entered hoping for a long tenure whereby I would emerge as a "Renaissance biologist." To me, graduate school is the last frontier of studentship -- where I would be most free to study all that I wanted (excluding my retirement where I could do just that too).

High-throughput versus low-throughput

I recently had a discussion with my lab-mate about the graduate student in a high-throughput versus a low-throughput lab. High-throughput labs run large-scale, systematic experiments (often thousands to millions of mini-experiments) that usually feel repetitive. Publishable data almost always take a long time, even with maximum physical exertion, due to the scale of the projects. However, once compiled, the end data are always rich with information for a very wide range of scientists. Low-throughput labs tend to be more focused on a few genes/proteins; they are more detailed and thorough. Intellect, hard work, and some luck can definitely lead to a slew of publishable work, although this does not have to be the case.

By the end of this discussion, I concluded that both lab types are simply just that, different in nature; thus rejecting the notion of a better or worse work type.

This discussion ties into my earlier conversation because I came to a general hypothesis:
"New graduate students tend to be overly optimistic about their future success as a graduate student
and will tend to see low-throughput labs as the best opportunity to get rich quick [in papers]."

Biology is a fast-changing field and deciding how to make moves in the field will always be difficult. I have naively separated lab types into high versus low-throughput; clearly, there is a spectrum. For me, I will try to tread the unknown forest [high-throughput focus] and leave a path of my own. Only time will tell how that turns out.

Sunday, June 23, 2013

US Supreme Court Rules "Products of Nature" not Patentable

US Supreme Court rules human genes not patentable

In a 9-to-0 decision on June 13, 2013, as reported by the New York Times and Science Magazine, the US Supreme Court ruled that human DNA sequences (genes) which are "naturally occurring" and extracted from tissues cannot be patented. Quoting Science on Justice Clarence Thomas, "...separating that [BRCA1 and BRCA2] gene from its surrounding genetic material is not an act of invention." Therefore, no one can hold exclusive rights to them.

The case focused on the BRCA1 and BRCA2 patents, which were held for almost 16 years by Myriad Genetics. It was reported in the Times article that the Salt Lake City firm made over $100 million in revenue in the most recent quarter on BRCA genetic testing alone. More specifically, one of the most lucrative businesses of Myriad Genetics was to test the BRCA genes for mutations that might put patients at risk for breast or ovarian cancer -- a type of test that could not be performed by any other group without Myriad's consent. I recall that when I was a senior undergraduate, my professor of Cancer Biology, who was personally afflicted by breast cancer and had mutations in her BRCA genes, first brought to my attention the long-running conflict surrounding these patents. On the one hand, these patents financially benefit the patent-holders and are supposed to "protect the integrity of scientific discovery." On the other, they drastically increase the cost of the said discoveries. For instance, Myriad charged $3000 to $4000 per diagnostic test. In addition, no one else could perform the test without the threat of lawsuits from Myriad.

However, even though the high court ruled that extracted DNA could not be patented, cDNA derived from these sequences can be because it is synthesized in the lab. Myriad interpreted this ruling [on cDNAs] as a win for them because, as most geneticists and molecular biologists know, the creation of cDNA is today a crucial step in working with genes in the lab for a multitude of experiments. All this calls for is for labs around the world to continue pushing forward on cost-effective sequencing technologies that require minimal amounts of DNA. As the cost of sequencing has decreased at an astonishing rate over the past decade and new ways of doing the sequencing have been invented, I would not be surprised if this concern became obsolete in the near future. However, the many lab experiments which require large amounts of cDNA for cloning might be a tougher hurdle to navigate. A question comes to mind: how different can we make the cDNA before it is determined to be significantly different from the original, patented cDNA?

Nevertheless, the ruling may prove to be a source of uncertainty for patent lawyers in the near future. Other areas of influence will probably be stem cells, "lower" organisms' genes, and crop genes. Another Times article underscores the parallel between the Myriad case and that of genetically-modified crops: once a crop has been introduced into the "wild", when can its genes be considered non-patentable, natural products?

The ownership of biological entities, especially genes, is very important for molecular biologists. Our work relies entirely on the free ability to study what's out there. More of these landmark cases are bound to arise in the near future and it would be in our best interests to keep tabs.

Friday, May 31, 2013

Heterogeneity of the human metabolic network in human tumors

Late last month, the paper Heterogeneity of tumor-induced gene expression changes in the human metabolic network was published in Nature Biotechnology. In 1924, Otto Heinrich Warburg theorized that tumor cells might not derive their energy via the usual aerobic respiratory pathway but adopt the anaerobic pathway instead. Since then, this theory has gained much support, especially for explaining the survival of tumor cells in the dense hypoxic cores of neoplasias. Regardless, cancer metabolism has been largely ignored -- until recently. In this paper, the group of Hu, J. et al. sought to look (at the individual and systems level) for similarities or differences in the expression profiles of metabolic genes in multiple tumor types.

In brief, the authors analyzed a compendium of gene expression profiles collected over the past decade from microarray studies of 22 different tumor types. They used only data from the most comprehensive microarray platform to date (HG U133 Plus 2.0) in order to capture expression profiles from as many human genes as possible. Furthermore, they used data derived only from studies which used tissue from the biopsies of primary tumors.

They reported several interesting findings including:

The identification of several key pathways in the glycolytic cycle which are overall upregulated or downregulated across all or most of the tumor types.
A significant amount of heterogeneity in gene expression patterns across tumor types
The rewiring of the gene expression program for several key respiratory pathways

Overall, they showed statistically supported examples of metabolic pathways/genes that clearly change expression patterns when comparing tumor to normal tissue. Of these examples, some appear to be conserved across different tumor types, while others appear to depend on cancer type. But overall, they show that the metabolic network of human tumors does go haywire and really does warrant further attention by the cancer research community.

Relation to the Warburg effect?

The authors do not make clear how their results support or refute the claim of the Warburg effect. They show that the metabolic network changes going from a normal to a tumor tissue but how these changes, together, play out is still unclear. The goal of network biology is to put biological context onto the wiring details of a biological system but the functional understanding appears to be sparse.

One hypothesis could stem from the authors' finding that, at the individual gene level, the expression program of TCA cycle components appears to have mostly changed in colon cancers. This example might hold the clues to suggest that perhaps this pathway imbalance somehow "forces" the cell to adopt anaerobic respiration. Further biochemical studies must be undertaken to address the biological question surrounding this mechanism.

Unavoidable bias.

The authors were unable to perform paired analyses (paired by the patient from whom the tumor originated) at the single gene level due to small sample size. Furthermore, it is unclear how heterogeneous the biopsy samples from each tissue type were, or how old the patients were. These are clearly unavoidable pitfalls of the data, but they emphasize the need for larger sample sizes in order to better understand what is happening. There are statistical methods to attempt to "correct" for these biases, but the gold standard is usually to correct for them at the experimental, data-collecting stage.

Comparison to yeast.

Interestingly, the yeast metabolic network also changes during growth, depending on the availability of glucose - shifting from aerobic respiration during the early phase of growth to anaerobic during the late phase (hence the eventual production of ethanol). Since yeast are much easier to work with in the lab, it would be interesting to compare the yeast versus human cancer phenomena. The only simple (yet major) caveat to such a comparison would be that yeast were "built" within their biology to readily switch respiration pathways. The underlying difficulty with human tumors is the multitude of various mutations that must first occur. These mutations vary widely from tumor to tumor and it is currently tough to differentiate the driver from the passenger mutations. This might make it difficult to draw generalized conclusions from such comparisons.

Final remarks.

The paper definitely reveals insightful findings, although it is not clear how much "WOW factor" there is to the general conclusion that the metabolic network is rewired in cancer. It definitely provides some clues for where scientists can and should focus for future work. But, as with most computational work, much experimental work will be required to better understand the biology.

Sunday, May 26, 2013

World, Hello!

As this is my first post to the blogger world, I'd like to introduce myself:

My name is Tommy. I am a Ph.D. graduate student studying protein interactomes (networks) at Cornell.

What does that mean?

Well, starting from the core basics - all living organisms have DNA, which sometimes encodes RNA, which sometimes encodes proteins. Proteins are these molecules with all sorts of strange shapes and sizes which float around in the cell and often interact physically with everything around them, including other proteins! These physical interactions between proteins can lead to other interactions, which can then lead to other ones, and so on and so forth. The point is, eventually and somehow, "protein-protein" interactions lead to very confusing and complex cascades of events that lead to the plethora of phenotypes there are. Brave souls out there have worked for decades trying to map out these cascades (or pathways) but scientists have only begun to scratch the surface. For an example of some pretty well worked out cascades, try googling "p53 cascade" or "ras cascade". Scientists like myself believe that if we can accurately draw out ALL of these interaction pathways in a cell at any given time under any given condition, we will be able to unlock the mystery of exactly how and why a cell does what it does.

It's a small-world out there.

One of the weird things about these protein networks is that, in many ways, they are quite similar to social networks. This point was first illustrated by Watts and Strogatz in their 1998 Nature paper "Collective dynamics of 'small-world' networks". Basically, many interaction networks seem to "organize" themselves by having a few things interacting with a ton of other things, whereas most things will have their own small niche. The first time I learned about this concept, I thought about the social dynamics of high school - you have the few popular kids who seem to know everyone and everyone knows them, but most of the student body will know far fewer people. Furthermore, many of us have probably heard the "six degrees of separation" theory where every person in the world is separated from every other by at most 6 steps (or 6 linearly related people). In this high school example, it is pretty easy to understand the small-world idea. But the fact that networks of inanimate objects (proteins) follow very similar patterns is just bizarre. This is how I initially got intrigued by the idea of biological networks - how and why are inanimate objects, with no canonical thinking capability, able to organize their relationships the same way that humans do?
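To make the "popular kids" and "steps of separation" ideas concrete, here is a toy Python sketch. It is only an illustration under my own assumptions: the network is made up (one hub called HUB plus a handful of placeholder nodes), the function names are hypothetical, and it is not the analysis from the Watts and Strogatz paper. It counts how many partners each node has and finds the fewest steps between two nodes with a breadth-first search.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Map each node to the set of nodes it interacts with (undirected)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def steps_between(adjacency, start, goal):
    """Fewest edges separating start from goal, or None if unconnected."""
    if start == goal:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == goal:
                return distance + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


# Made-up network: one "popular" hub and a few nodes with small niches.
edges = [("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("HUB", "D"), ("D", "E")]
adjacency = build_adjacency(edges)

# Degree = number of interaction partners per node.
degrees = {node: len(partners) for node, partners in adjacency.items()}
separation = steps_between(adjacency, "A", "E")  # path A - HUB - D - E
```

In this toy network the hub has four partners while most nodes have one, and the hub keeps any two nodes within a few steps of each other, which is the intuition behind the high school example.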

Google map it.

Today, what would you do if you wanted to find directions to some location? Use Apple Maps? Of course not! You would Google Map it. Back to the original idea of harnessing the power of interactomes, what if we could generate "Google maps" of biological entities? I think that is one of the major goals of systems biology today, one in which I am actively involved as a graduate student.

The experiment.

Now that you know my interest lies in how things relate to each other at a network level, I will impart one of my reasons for this blog. This is an experiment (probably a poor one but it will suffice to amuse me) where I want to discover the network of my own scientific interests. I plan on blogging about anything I find amusing/interesting related to science and my science career progress. As I am in a malleable state in my scientific development, I would imagine it difficult to force myself to be interested in and post only a very narrow range of topics. If the small-world theory holds up, the long-term of this blog should do well at capturing the size of my interest network and the "architecture" of it.

I suppose I will now begin the journey that I was already on, though this would be the first time I would be documenting it. Enjoy blogger-sphere.
