Wednesday, November 28, 2007

The Incredible Shrinking Human Genome

When the human genome was still terra incognita (or, at least, our knowledge of the sequence was something like my view of the world sans my glasses, which are oft mistaken for bulletproof glass), a key question was how many genes were present. Textbooks widely cited a number somewhere in the 50K-70K range, or perhaps even 100K, and some of the gene database companies such as Incyte and HGS and Hyseq were gleefully proclaiming the number much higher (just think what you are missing without our product!). The number wasn't unimportant. If you had an estimate of what fraction of genes might be good targets for drug development, then the total number of drug targets depended on your estimate of the number of genes -- and drug targets were saleable -- and patentable.

At some point, a clever chap at Millennium decided to try to pin down these estimates. First he went for the textbook numbers, which everyone thought were well reasoned from old DNA melting curve experiments estimating the amount of non-repetitive DNA. Surprisingly, he was unable to find any solid calculation converting one to the other -- for all his searching, it appeared that the human gene estimate had appeared spontaneously like a quantum particle.

Using some other lines of thinking (I actually have a copy of his neat document somewhere, though technically it is a Millennium secret -- nothing ever ages out of confidentiality. Silly, isn't it?), he argued from the gene content of yeast and from what had been found in C. elegans to a new estimate. Now, I couldn't find the flaw in his logic, but I couldn't quite get myself to accept the number. It was preposterous! Only 30K genes for human?

Well, of course the estimate came in even further south of there. And a new paper from the Broad has nearly nipped that down to 20K even. Alas, the spectacularly endowed Broad wasn't munificent enough to publish with the Open Access option for PNAS, so until I make another pilgrimage to the MIT Library I'm stuck skimming the abstract, supporting materials & GenomeWeb writeup.

In some sense, the analysis is inevitable. It's hard to look at one genome and get an accurate gene estimate, but with so many mammalian genomes it gets easier -- and this paper apparently focused on primate genomes, which we have an amazing number of already. It sounds like they focused on ORFs found in human mRNA data, which at least removes the exon prediction problem.

The paper has the usual caveats. The genome is finished -- but not so finished. Bits and pieces are still getting polished up, and while they are generally dull and monotonous, a gene or two might still hide there (the GenomeWeb piece mentions 197 genes found since the 'completion' of the genome which had been omitted). The definition of gene is always tricky, generally going along the lines of Humpty Dumpty in Through the Looking-Glass: 'When I use a word...it means just what I choose it to mean -- neither more nor less.' Gene here means protein-coding gene, to the exclusion of the RNA-only genes of seemingly endless flavor that pepper the genome.

The other class of caveat is very short ORFs -- and some very short ORFs do interesting things. For example, many peptide neurotransmitters are synthesized from short ORFs -- and these tend to evolve quickly, making them challenging to find (I know, I tried in my past life).

Will this gene accounting ever end? The number will probably keep twiddling back and forth, but not by huge leaps barring some entirely new class of translational mechanism.

Speaking of genes & accounting, one of the little gags in Mr. Magorium's Wonder Emporium, a bit of movie fluff that is neither harmful nor wonderful, is a word derivation. The title character hires an accountant to assay his monetary worth, and promptly dissects the title: clearly it is a counting mutant. I find mutants more interesting than accountants, but both have their place -- and I never before realized that one was a subset of the other!

Tuesday, November 20, 2007

Gene Logic successfully repositions, Ore What?

Gene Logic today announced that Pfizer has filed a patent based on a Gene Logic drug repositioning effort. This would appear to be one of the most significant votes of confidence in such efforts by an outside partner.

Drug repositioning is the idea of finding new therapeutic uses for advanced compounds, particularly compounds which are very advanced but failed due to poor efficacy in the originally targeted disease. A number of companies have sprung up in this field -- the two I am most familiar with are Gene Logic and Genstruct -- and at least some large pharmas have in-house programs.

The reality is that many existing drugs ended up in therapeutic areas quite different from those they started in. Perhaps the most notorious case is Viagra, which was muddling along as an anti-hypertensive until an unusual side effect was spotted. Minoxidil similarly began as an anti-hypertensive until its side effect was noted. The route to some psychiatric medications began with anti-tuberculosis agents and antihistamines. I doubt that's a complete list.

Gene Logic is one of the original cohort of genomics companies and has been through many iterations of business plan. If memory serves, they were one of several companies originally built around a differential display technology, a pre-microarray method of obtaining mRNA signatures for diseases. Gene Logic later became one of the major players in the toxicogenomics space, and as part of that effort built a large in-house Affy-based microarray operation. They built microarray databases for a number of disease areas (I've used their oncology database), built a sizable bioinformatics effort, and even acquired their own CRO.

However, none of that could quite be converted into a stream of gold, so over the last year or so the whole mess has been deconstructed, leaving behind the drug repositioning business which had begun as a unit of Millennium (which is one reason I'm familiar with it). They'll even be changing their name soon, to Ore Pharmaceuticals (presumably Overburden and Slag, while appropriate for the mining theme, did not last long in the naming queue).

While there is certainly historical precedent for repositioning, the question remains whether companies can make money doing it, and whether those companies will simply be the big pharmas or the gaggle of biotechs chasing after the concept. Depending on the company, some mixture of in vivo models, in vitro models and computational methods is used. One way to think of it is doing drug discovery, but with a compound which already has safety data on it. There is also extensive interest in the concept in the academic sector, which is a very good thing -- many drugs which may be repositionable have little or no patent life left, meaning companies will find it difficult to invest in them with any hope of a return.

Gene Logic / Ore has one repositioned drug which has gone through clinical trials, GL1001 (née MLN4760). This is a drug originally developed by Millennium as an ACE2 inhibitor. Since I'm among the discoverers of ACE2, I tend to keep an eye on this one. Millennium gave it a whirl in obesity, but now Gene Logic has found a signal in inflammatory bowel disease in animal models.

That Pfizer bothered to file a patent is significant, as it triggered a milestone payment -- amount unspecified, but these are usually something interesting. But that is still a long way from starting a new trial -- that will be the real milestone, and whichever drug repositioning firm can claim that will really be crowing -- that is, until somebody actually gets a drug approved this way.

Friday, November 16, 2007

Docs eager to winkle wrinkles, slow to hole a mole

I'm no fan of the local TV news broadcasts & therefore rarely catch them. So it was quite by accident that I caught a story last night that is the sort to give one the shudders.

The station had two staffers call 20+ dermatologists. One staffer would state that she had a suspicious mole to be checked out, whereas the other would call requesting an appointment for cosmetic Botox. Now, at some level the results shouldn't be surprising; if the pattern had been the opposite, it wouldn't have made the local news. But what was striking was the size of the difference: in one case the mole would get an appointment several months in the future, but the same office would be willing to Botox away the next day. Yikes!

Perhaps more striking was the one doc who showed such a pattern and was interviewed on camera. She made no apologies and showed neither shame nor remorse. Her practice is no longer taking on new 'medical' patients, but is happy to accept new cosmetic ones. She did say that if the caller had been more persistent, perhaps she would have gotten an earlier appointment. Alas, the interviewer did not ask her to explain the ethics of the situation. It is not hard to imagine that many patients calling about a mole are on the fence as to whether to worry or not, and being given a long wait will push them back to complacency (hey, if the doctor's not worried about it, why should I be?). Some small fraction of those persons may have early-stage melanomas, with potentially lethal results from delay in removal.

It's not hard to guess at the driver of this issue: Botox is elective, probably not covered by insurance, and therefore patients will pay top dollar; mole screening is covered & governed by the immense pricing power of insurance companies.

Somewhere in the last week or so I saw an article commenting that increasing numbers of doctors from a wide variety of specialties are performing cosmetic procedures. A Pollyanna might think this would provide the competition to drive the dermatologists back to doing the important stuff, but more likely the corruption will just spread to more specialties. In high school I once switched dentists because it was impossible to get appointments with my long-time dentist, but I went running back after a few appointments when I realized the new guy opened every session with an inquiry as to whether I might want to change out my cap for a better color match.

A real pessimist might note that these new-fangled genetic tests are coming down the pike, that they may also be considered elective and not covered by insurance, and may represent another monetary siren tempting docs to neglect treating disease.

By coincidence, I had just watched the one movie I know of that opens with a monologue on ethics, Miller's Crossing. Caspar & the interviewed doc would be unlikely to have any arguments in that department.

Thursday, November 15, 2007

Trapping the wily VEGF

Blogging on Peer-Reviewed Research

A significant challenge in pharmacology is the correct dosing of a drug. "The dose makes the poison" is an adage in toxicology, but "the dose makes the drug" is just as true. Too little drug and insufficient effect will occur; too much and the patient is likely to suffer toxic side effects.

Traditionally, drug dosages evolved completely empirically. Many drugs have profiles allowing very crude dosing -- "take two aspirins and call me in the morning" is remarkable advice, remarkable because it generally works. Other drugs, particularly in trials, are dosed by body size. This makes rough sense, as if you wish to obtain a certain concentration of drug the amount of body it will be diluted in should be taken into account.

Over time various influences on dosing have been realized. I ate a grapefruit today pondering whether one day this small pleasure will be forbidden to me; the metabolism of many drugs is altered by natural compounds present in grapefruit. Individual variation plays a major role as well, with some chemotherapy drugs at normal doses near-lethal to small fractions of the population, because those individuals metabolize the drug differently. Some drugs have notoriously narrow dosing windows: underdose a heart patient and they may have angina or other nasty events; overdose them and they can have nosebleeds which simply won't end.

It is hard enough to dose drugs for which there are decades of experience or which are relatives of drugs with long pedigrees. Dosing brand new agents with new activity profiles is far more difficult. Hence, there is a real need for compasses which could point the way to the correct dose.

VEGF is an important soluble signaling factor which stimulates angiogenesis, the formation of new blood vessels. Anti-angiogenesis agents have emerged as an important tool in oncology and also in the vision-robbing disease macular degeneration. VEGF can be targeted in a number of ways: the antibody drug Avastin (bevacizumab) directly binds VEGF, whereas multi-targeted ("dirty") kinase inhibitors such as Nexavar (sorafenib) and Sutent (sunitinib) knock out the cellular receptors for VEGF (which are tyrosine kinases) among their many targets.

VEGF-Trap is an investigational drug being developed by Regeneron, one of those feline biotech companies (9 lives!) which keep plugging along. VEGF-Trap is a pastiche of carefully chosen protein parts: pieces of two different human VEGF receptors plus a bit from a human antibody (IgG1) constant region.

In a new paper in PNAS (open access) the Regeneron folks show that VEGF-Trap forms stable, inert, monomeric complexes with VEGF which remain in circulation. By measuring the amount of free and VEGF-complexed VEGF-Trap in circulation they can measure VEGF levels and identify a dose which ensures that maximal trapping occurs. If insufficient drug is applied, then little or no free VEGF-Trap is detected.
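The dosing logic can be sketched with a deliberately crude toy model (my own sketch, not the paper's math: it assumes simple 1:1 stoichiometric binding and uses made-up units and function names):

```python
def titrate(dose: float, vegf_production: float) -> dict:
    """Toy 1:1 binding model: circulating trap is consumed by VEGF
    until production is saturated; any excess appears as free trap."""
    bound = min(dose, vegf_production)
    free = dose - bound
    return {"bound_trap": bound, "free_trap": free,
            "maximal_blockade": free > 0}

print(titrate(5.0, 8.0))   # underdosed: all trap is complexed, no free trap
print(titrate(10.0, 8.0))  # free trap detected, so VEGF capture is maximal
```

The readout mirrors the paper's logic: detectable free VEGF-Trap marks the point where dosing has outrun VEGF production.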

One significant surprise, in both mice and humans, is that VEGF levels are higher than previously reported. Furthermore, VEGF levels do not differ greatly between individuals with cancer (human patients or xenografted mice) and those without. Human and mouse VEGF levels were very similar, when normalized for body mass. Maximal anti-tumor effects were observed in the mouse models at the dosing point where free VEGF-Trap was observed, suggesting that this method of VEGF measurement can guide dosing.

Can you do the same trick with bevacizumab? Not according to the paper: antibodies form multivalent complexes with their targets, and these complexes are removed from circulation by various mechanisms. Measurements of bound complex are therefore difficult and not informative.

During my previous job I got interested in whether VEGF, or other angiogenic mediators, might be useful for patient stratification. Several papers claimed that soluble angiogenesis factor levels were useful in predicting cancer outcome, but when I compared the measurements in the papers they weren't even on the same scale: the reported baseline measurements in normal individuals were completely different. It didn't invalidate the concept, but certainly prevented any useful synthesis of various papers.

John S. Rudge, Jocelyn Holash, Donna Hylton, Michelle Russell, Shelly Jiang, Raymond Leidich, Nicholas Papadopoulos, Erica A. Pyles, Al Torri, Stanley J. Wiegand, Gavin Thurston, Neil Stahl, and George D. Yancopoulos
VEGF Trap complex formation measures production rates of VEGF, providing a biomarker for predicting efficacious angiogenic blockade
PNAS published November 13, 2007, DOI: 10.1073/pnas.0708865104

Wednesday, November 14, 2007

Ring around the protein...

Blogging on Peer-Reviewed Research
One of the journals I monitor by RSS is Nucleic Acids Research, and the usual steady flow of new items has become a torrent, mostly about databases. Yes, the annual Database Issue is on its way and the Advance Access shows the signs. And, it's all Open Access.

Every year this issue grows and grows, and each year I skim through all the little niche databases. They may be small and esoteric, but somebody has a passion for that niche & that's great!

I've always liked oddities & anomalies in biology: the rules are useful, but the mess is fascinating. Somewhere in my undergraduate days I came across the fact that there were known examples of circularly permuted proteins, proteins whose final sequence is attained by moving a segment from the tail end around to the front. But somehow the existence of proteins whose mature form is a circle (via a post-translational step) had escaped me. But now that void is filled, as I can loop around to CyBase, a database of cyclic proteins and peptides.

Why circlets? Well, one obvious advantage is two fewer free ends for proteases to make mischief of -- and many of these proteins are protease inhibitors. Indeed, the stability extends to other abuses, with the suggestion that these might make interesting scaffolds for drug development. Circles also make for attractive sequence profile displays. And not only does the database cover naturally cyclic proteins, it also has tools to help you design your own!
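The circular permutation mentioned above is simple enough to sketch as string surgery (the 10-residue peptide here is a made-up example, not a real protein):

```python
def circularly_permute(seq: str, tail_len: int) -> str:
    """Move a tail segment of length tail_len around to the front,
    as in a circularly permuted protein."""
    return seq[-tail_len:] + seq[:-tail_len]

# a made-up 10-residue peptide with its last 4 residues moved up front
print(circularly_permute("MKTAYIAKQR", 4))  # -> AKQRMKTAYI
```

Note the residue content is unchanged; only the start point along the (conceptual) circle moves.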

Conan K. L. Wang, Quentin Kaas, Laurent Chiche, and David J. Craik
CyBase: a database of cyclic protein sequences and structures, with applications in protein discovery and engineering
Nucleic Acids Research Advance Access published on November 5, 2007.
doi:10.1093/nar/gkm953

Monday, November 12, 2007

Just how wrong is Marilyn vos Savant?

Marilyn vos Savant is a writer of a regular column in Parade magazine. These columns address many things, but often have interesting logic puzzles. Given that she is claimed to have the highest recorded IQ ever, whole sites have sprung up to find fault with her writings. Now, I'll confess I'm always looking for an angle -- and rarely finding one.

But this past Sunday, she gave me a bit of an opening: in response to a question as to whether there are any beneficial viruses. Her response:
No. Bacteria are living one-celled microorganisms. By contrast, viruses aren’t alive in a strict sense: They are the ultimate parasites and cannot replicate without a host. They invade the cells of animals, plants and even bacteria, then either lie dormant or get into the driver’s seat and cause changes that disrupt normal cell functioning, the very essence of disease.


The first two sentences and most of the third are dead on the money: bacteria are unicellular organisms, and viruses aren't considered alive & invade other cells, where they can lie dormant or immediately go crazy. However, that last bit is the clincher. Apparently Ms. vos Savant is unaware that in the bacterial world there are examples of viruses benefiting their host by bringing along useful genetic material. Diphtheria is one example, in which the toxin (which presumably benefits the bacterial host) is encoded by a virus (a phage).

Are there examples outside of bacteria? I don't know of any, but I'm hardly up on my viruses. Moreover, how would we know? Suppose there were viruses which were simply neutral (or nearly so), would we have ever detected them?

Also, in a broader sense some of those phage out there may be an important ecological control on bacterial nasties. So this could be another class of "beneficial" viruses.

Just because you are a parasite doesn't mean you're guaranteed bad!

Wednesday, November 07, 2007

Can you Flex your DNA?

One component of many employment benefit packages in the U.S. (I don't know about other countries) is a Medical Flexible Spending Account. At the beginning of the plan year you choose an amount to set aside from each paycheck which goes pre-tax into an account, the contents of which can be used to reimburse medical expenses. Depending on your tax bracket, this is equivalent to getting a 20-35% discount on your medical bills.
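That "discount" is just your marginal tax rate applied to the expense; a quick sketch (the bracket and bill amounts are made-up examples, and payroll and state taxes are ignored for simplicity):

```python
def fsa_effective_cost(expense: float, marginal_rate: float) -> float:
    """After-tax cost of an expense paid with pre-tax FSA dollars:
    you avoid tax at your marginal rate on the amount set aside."""
    return expense * (1.0 - marginal_rate)

# a $1,000 medical bill in a 28% bracket effectively costs $720
print(fsa_effective_cost(1000.0, 0.28))
```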

These accounts show the many fingerprints of bureaucracy. Contribution limits are stiff, often both a percentage of pay and an absolute maximum (currently $4K, I think). There are, as one might expect, curiosities in the restrictions on what the money can be used for. Health insurance premiums are not coverable, but co-pays and co-insurance are. Both the basic eyeglass frame and the Elton John special are generally reimbursable. A periodontal visit is reimbursable, but the dental floss which might prevent or moderate that visit is not. In general, you must save all your receipts and then send them in to the plan operator. Some plans offer a handy debit card that works with one of the credit card networks, but if you spend on anything out-of-bounds you'll need those receipts to sort it out -- and you may need those receipts in any case. Lots of paper to save.

But perhaps the most burdensome requirement is use-it-or-lose-it: any money not spent by the end of the plan year is forfeited to the plan operator. Over the last year or so the IRS has loosened this restriction to allow some overlap in plan years, but not all plans allow it. So, you must carefully plan your expenses or the whole benefit is lost -- or worse. And should you wish a mid-course correction, that's generally not allowed -- you can't change your contribution level unless something big happens, such as the addition or subtraction of a family member (the former is hard to plan precisely, the latter should never be planned!).

So, around this time companies offering covered items start urging folks to check their account balances and spend them before they lose them. Eyeglass merchants are at the head of the line, but so are laser vision correction places.

Which leads to the title question: can you use MedFlex account funds to pay for DNA testing? I honestly don't know, and really nobody lacking a tax law specialty has any business answering the question. But if I were running one of the personal genomics startups out there, I'd be finding out the answer. Perhaps a precedent has been set by other diagnostic procedures not (yet?) well recognized to be of medical value, such as the full-body scans which were heavily marketed a bunch of years back. For if these funds are available, then that is a ready pot of money which might be spent. One big-ticket receipt sure wouldn't be a pain to submit, and if the companies were clever they could split the bill over two fiscal years (say, one covering the scan and one the counseling) to enable two plan years to be charged with the expense. I don't know whether it would make good medical sense to have such a scan, but some folks on the fence might be swayed if they could see it as a bargain.

Tuesday, November 06, 2007

Lung Cancer Genomics

Blogging on Peer-Reviewed Research
A large lung cancer genomics study has been making a big splash. Using SNP microarrays to look for changes in the copy number of genes across the genome, the group looked at a large batch of lung adenocarcinoma samples. Note: the paper will require a Nature subscription, but the supplementary materials are available to all.

As with most such studies, there was some serious sample attrition. They started with 528 tumor samples, of which 371 gave high-quality data; 242 of these had matched normal tissue samples. All of the samples were snap-frozen, meaning the surgeon cut them out and they were immediately frozen in liquid nitrogen.

The sub-morphology of the samples is surprisingly murky; much of the text focuses on lung adenocarcinoma, the most common form of Non-Small Cell Lung Cancer (NSCLC), but the descriptions of the samples do not rule out other forms.

After hybridizing these to arrays, a new algorithm called GISTIC, whose full description is apparently in press, was used to identify genomic regions which were either deficient or amplified in multiple samples.

Many changes were found, which is no surprise given that cancer tends to hash the genome. Some of these changes are huge: 26 recurrent events involving alteration of at least half a chromosome arm. Others are more focused.

One confounding factor is that no tumor sample is homogeneous, and in particular there is some contamination with normal cells. These cells contribute DNA to the analysis and in particular make it more difficult to detect Loss-of-Heterozygosity (LOH), in which a region is at normal copy number but both copies are the same, such as both carrying the same mutated tumor suppressor.
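The dilution effect is easy to see with a toy calculation (my own sketch, not from the paper): at a copy-neutral LOH site where the normal genotype is AB and the tumor has lost the B allele, the measured B-allele fraction drifts from 0 back toward the heterozygous 0.5 as tumor purity falls.

```python
def observed_baf(purity: float) -> float:
    """B-allele fraction at a copy-neutral LOH site (normal AB, tumor AA),
    for a sample containing the given fraction of tumor cells."""
    b_alleles = (1.0 - purity) * 1.0  # each contaminating normal cell has one B
    total_alleles = 2.0               # two copies per cell in both populations
    return b_alleles / total_alleles

for purity in (1.0, 0.8, 0.5, 0.2):
    print(f"purity {purity:.1f} -> BAF {observed_baf(purity):.2f}")
```

At 20% tumor content the LOH signal (BAF 0.40) is nearly indistinguishable from an ordinary heterozygous site, which is exactly the problem.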

Seven recurrent focal deletions were identified, two of which cover the known tumor suppressors CDKN2A and CDKN2B, inhibitors of the cell-cycle-regulating cyclin-dependent kinases. The corresponding kinases were found in recurrently amplified regions: a neat but evil symmetry. The tumor suppressors PTEN and RB1 were also found in recurrent deletions. The remaining recurrent deletions hit genes not well characterized as tumor suppressors. One hits the phosphatase PTPRD -- the first time such deletions have been found in primary clinical specimens. Another hits PDE4D, a gene known to be active in airway cells. A third takes out a gene of unknown function, AUTS2.

In order to gain further evidence that these deletions are not simply epiphenomena of genomic instability, targeted sequencing was used to look for point mutants. Only PTPRD yielded point mutants from tumor samples, several of which are predicted to disable the enzymatic function of this gene's product.

On the amplification side, 24 recurrent amplifications were observed. Three cover known bad actors: EGFR (target of Iressa, Tarceva, Erbitux, etc), KRAS and ERBB2 (aka HER2, the target for Herceptin). Another amplification covers TERT, a component of the telomerase enzyme which is required for cellular immortality, a hallmark of cancer. Another amplification covers VEGFA, a driver of angiogenesis and part of the system targeted by drugs such as Avastin. Other amplifications, as mentioned above, target cell cycle regulation: CDK4, CDK6 and CCND1.

The most common amplification has gotten a lot of press, as it covered a gene not previously implicated in lung cancer: NKX2-1. A neighboring gene (MBIP2) was present in all but one of the amplifications, and so NKX2-1 was focused on. Fluorescent In Situ Hybridization (FISH), a technique which can resolve amplification on a cell-by-cell basis in a tissue sample, confirmed the frequent amplification of NKX2-1 specifically in tumor cells. Resequencing of NKX2-1, however, failed to reveal any point mutations in the tumor samples. RNAi in lung cancer cell lines with NKX2-1 amplification showed a reduction of a commonly-used tumor-likeness measure (anchorage-independent growth). This effect was not seen in a cell line with undetectable NKX2-1 expression, nor was it detected when MBIP2 was knocked down. Previous knockout mouse data has pointed to a key role for NKX2-1 in lung cell development. The protein product is a transcription factor, and the amplification of lineage-specific transcription factors has been observed in other tumors.

What will the clinical impact of this research be? None of the targetable genes which were amplified are novel, so this will nudge interest further along (such as in using Herceptin in select lung cancers), but not radically change things. Transcription factors in general have no history of being targeted with drugs, so it is unlikely that anything will come rapidly from the NKX2-1 observations. On the other hand, there will probably be a lot of work to try to characterize how NKX2-1 drives tumor development, such as to identify downstream pathways.

At least some of the press coverage has remarked on the price tag for this work & the surrounding controversy over the Cancer Genome Project that this represents. The claimed figure is $1 million, which does not seem at all outrageous given the large number of microarrays used (over one thousand, if I'm adding the right numbers) -- a few hundred dollars per microarray for the chip and processing is not unreasonable, and the study did a bunch more (analysis, sequencing, RNAi). If such a study were repeated at today's prices in the next 5 big cancer killers (breast, ovarian, prostate, pancreatic, colon), that means another $5M not spent on other approaches. In particular, the debate centers on whether the focus should be on more functional approaches rather than genomics surveys. As fond as I am of genomics approaches, it is worth pondering how else society might spend these resources.
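A back-of-the-envelope check on that figure (the array count and per-array cost here are my assumptions, not reported numbers):

```python
total_budget = 1_000_000  # the claimed price tag
arrays = 1100             # "over one thousand" microarrays (assumed count)
per_array = 400           # chip + processing, a few hundred dollars (assumed)

array_cost = arrays * per_array
print(array_cost, total_budget - array_cost)  # 440000 560000
```

Which would leave roughly half the budget for the sequencing, RNAi and analysis work, so the $1M figure seems entirely plausible.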

It is also worth noting what the study didn't or couldn't find. A large number of known lung cancer relevant genes did not turn up or turned up only weakly. In particular, p53 is mutated in huge numbers of cancers but didn't really turn up here. The technique used will be blind to point mutants and also can't detect balanced translocations. Nor could it detect epigenetic silencing. If you want to chase after those, then it is more genomics -- which is probably one of the things that eats at critics, the appearance that genomics will never stop finding ways to burn money.

Weir et al.
Characterizing the cancer genome in lung adenocarcinoma
Nature, advance online publication. doi:10.1038/nature06358