Network news
Hui Ge sifts through oceans of data to explore how
genes collaborate
Consider the worm.
For a multi-cellular organism, Caenorhabditis elegans keeps
things simple. A typical adult roundworm has 959 somatic cells, no more and no
less, and scientists have traced the exact lineage of each cell. The
animal goes through life without a brain or much of a sex life (almost
all are hermaphrodites).
But C. elegans also has about 19,000 genes—almost as
many genes as humans. And just as in humans, no gene does its work alone.
Instead, tasks are accomplished through highly complex networks of protein
interactions.

Today's advances in systems biology began with genome sequencing. Hui Ge
is among those creating more powerful next-generation platforms that integrate
protein interactions and other kinds of high-volume analyses as well.
Photo: Sam Ogden
This is the realm of systems biology, and of Whitehead Fellow Hui Ge,
who studies embryonic development in the worm.
“I’ve always been interested in how a fertilized egg develops
into a whole organism like us,” says Ge. She gets the big picture
on this process by combining data from several high-throughput analysis
techniques that cut wide swaths through the worm genome, and using advanced
statistical methods to sort through the results.
This systems biology approach starts with the components of a system,
and studies how those components work together to achieve a certain function,
such as protein synthesis or protein degradation. “It’s important
to know not just the individual components of a cell but how they are
mapped together,” Ge notes. “It’s like a subway system;
if you remove a station, the effect on the system will depend on its
position.”
“These are data-driven approaches as opposed to the more traditional
hypothesis-driven approaches,” she adds. “We put together
all this high-throughput information, and then we can find predictions
for uncharacterized genes and connections between them, which we can
test. And this can be an iterative process in the lab. You make predictions
and validate them, and then that validation can help your prediction
techniques.”
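Ge's subway analogy can be made concrete with a toy network. The sketch below is purely illustrative and is not drawn from Ge's own work: the station names, the `subway` map and the `components_after_removal` helper are all invented for the example. It counts how many disconnected pieces are left when a single station is taken out.

```python
from collections import deque

def components_after_removal(graph, removed):
    """Count the connected pieces left once `removed` is taken out of the network."""
    remaining = set(graph) - {removed}
    seen = set()
    pieces = 0
    for start in remaining:
        if start in seen:
            continue
        # Each unseen station starts a new piece; explore it breadth-first.
        pieces += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor in remaining and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
    return pieces

# Hypothetical "subway map": a hub H with three branches of different lengths.
subway = {
    "H": {"A1", "B1", "C1"},
    "A1": {"H", "A2"},
    "A2": {"A1"},
    "B1": {"H", "B2"},
    "B2": {"B1"},
    "C1": {"H"},
}

for station in ("A2", "A1", "H"):
    print(station, components_after_removal(subway, station))
```

In this toy map, closing the end-of-line station A2 leaves the rest of the system in one piece, closing A1 strands A2, and closing the hub H splits the system into three. The same kind of question can be asked of a protein in an interaction network.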
Reading the "Interactome"
Ge graduated with a bachelor’s degree in biochemistry and molecular
biology from Beijing University in 1999, and won a scholarship to
Harvard Medical School.

Ge and her co-workers in Marc Vidal's Harvard lab mapped out interactions
between many proteins in the C. elegans worm,
as reported in a 2004 Science paper.
In a 2005 Nature paper, Ge and colleagues combined maps of protein interactions
(blue), gene expression (red) and loss-of-function profiling from RNA interference
studies (red).
Next, the researchers filtered out a network "backbone" that grouped proteins
shown to be interacting by at least two sets of data.
When the researchers zeroed in on "sub-networks," they found that proteins
whose functions were already known helped to characterize unknown proteins
nearby.
Images courtesy of Nature and Science
She arrived just as systems biology began to soar.
“I fell in love with the idea that you can study how the organism
works, not just by studying individual genes but understanding what a
lot of genes do at a time,” she says.
In a homework assignment for a class taught by Harvard’s George
Church, Ge came up with a computational strategy that eventually turned
into a paper in Nature Genetics.
The strategy correlated data from large-scale studies of gene expression
with data from large-scale studies of protein interactions. More specifically,
it matched transcriptome data (reflecting which genes a cell
expresses under certain conditions) against interactome data (mapping interactions
between the proteins). Where the two data sets agreed, clearer pictures
of protein roles would emerge.
“She was one of the first researchers to suggest putting data
together from very different data sets to get something that was better
than the sum of the parts,” says Harvard’s Marc Vidal, in
whose lab Ge ended up.
Doing research on yeast, Ge and her co-workers demonstrated “the
first global evidence that genes with similar expression profiles are
more likely to encode interacting proteins,” as their paper put
it. And they showed that the integrated data could help to improve hypotheses
generated from either approach alone.
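One rough way to picture that comparison is to score how similar two genes' expression profiles are, then ask whether pairs known to interact score higher on average than pairs that are not. The sketch below is an illustration under stated assumptions, not the method from the paper: the gene names, expression values and interaction list are made up, and Pearson correlation is simply an assumed choice of similarity measure.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def mean_correlation_by_group(expression, interactions):
    """Average correlation for interacting pairs and for all other pairs."""
    interacting, other = [], []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if frozenset((g1, g2)) in interactions:
            interacting.append(r)
        else:
            other.append(r)
    return sum(interacting) / len(interacting), sum(other) / len(other)

# Hypothetical expression levels measured under four conditions.
expression = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
    "geneD": [1, 3, 2, 4],
}
# Hypothetical interactome: only the A-B protein pair is known to interact.
interactions = {frozenset(("geneA", "geneB"))}

mean_interacting, mean_other = mean_correlation_by_group(expression, interactions)
print(mean_interacting, mean_other)
```

With real data the two groups overlap heavily, so a genome-wide analysis would rest on the statistics of many thousands of pairs rather than on a handful of averages like these.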
Getting worm
Also at the Vidal lab, Ge worked on a big project to map out much of
the interactome of C. elegans. The resulting paper, eventually published in Science,
had no fewer than eight other co-first-authors. (The worm
was a likely target because it was the first multi-cellular organism
to have its genome sequenced completely, in 1998.)
Next, Ge and colleagues tackled very early embryogenesis, the process
by which the fertilized egg divides twice, into four differentiated cells, in the
first hour after fertilization.
The scientists combined data from three sources: protein-protein interaction,
gene expression, and loss-of-function profiling based on RNA interference.
They then made predictions about how the embryonic “molecular machines” work.
Testing 10 uncharacterized proteins by seeing where they popped up in
live animals, the researchers found that the locations generally were
consistent with the proteins’ predicted roles, findings reported
in a 2005 Nature paper.
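One simple way to picture combining the three data types is to keep only the gene pairs that are linked in at least two of them. The sketch below is a minimal illustration, assuming each data type has already been reduced to a plain list of gene pairs, which is a considerable simplification. The `backbone` helper, the `min_support` parameter and the gene names are hypothetical.

```python
from collections import Counter

def backbone(*evidence_sets, min_support=2):
    """Return the gene pairs linked in at least `min_support` evidence sets."""
    support = Counter()
    for edges in evidence_sets:
        # Treat pairs as unordered and count each pair once per evidence set.
        for pair in {frozenset(edge) for edge in edges}:
            support[pair] += 1
    return {pair for pair, count in support.items() if count >= min_support}

# Hypothetical links from three kinds of experiment.
protein_interaction = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
coexpression = [("g2", "g1"), ("g3", "g4"), ("g1", "g4")]
rnai_phenotype = [("g3", "g4"), ("g2", "g5")]

shared = backbone(protein_interaction, coexpression, rnai_phenotype)
print(sorted(tuple(sorted(pair)) for pair in shared))
```

Raising `min_support` to 3 would keep only links seen in every data set, trading coverage for confidence.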
After completing her PhD in genetics, Ge was selected as a Whitehead Fellow.
Before starting at the Institute, though, she spent six months at the
lab of Harvard’s Craig Hunter, learning the craft of worm wet-lab
work.
Healthy pairs of genes
At Whitehead, Ge and colleagues have embarked on two main projects with
the worm, the first being further explorations of genetic interactions
during embryonic development.
RNA interference studies have highlighted about 2,500 genes whose loss
kills the worm embryo. That number seems pretty small compared to the
total of around 19,000 genes, she says, and she suggests that it’s
because genes can back up each other’s functions.
“We are combining the protein interaction map with the genetic
interaction map to predict these pairs that give you a synthetic phenotype,” Ge
says. “These genes are not functionally equivalent, but they complement
each other’s function. Knowing these kinds of genetically buffering
pairs is very important for understanding development but also for understanding
disease.” She gives the example of the mammalian p53 tumor suppressor
gene: “Even if you knock down p53, a mouse will not get cancer
immediately. It increases the chance that when something else is damaged,
the mouse will get cancer.”
The second major effort is to take a dynamic view of gene expression
that integrates time and space information. “In multi-cellular
organisms, at different locations, the molecular networks are actually
different, because not all of the genes are expressed at the same time,” Ge
says. One student in her lab, for instance, is adding data about location
and trying to predict the genes that are expressed in certain tissues
such as muscles or skin.
“Quantitative science is providing us with a process that helps
us understand biology more quickly and in a systematic way,” says
Ge. “Little by little, we are learning how to achieve these projects.”
Today’s attempts at detailed molecular modeling are “still
first draft, still relatively fuzzy—just like genome sequencing once
was,” acknowledges Ge’s mentor Vidal. “But they are really
shaping up.”