Monday, November 17, 2008

Biological Informatics Subject-Tracer Blog

Marcus P. Zillman's Biological Informatics Subject-Tracer Blog contains a long list of web resources. Some of the links are stale, but there's still a lot of good stuff there.

Friday, November 14, 2008

Amino acid "seqlets"

IBM publishes bio-dictionaries of 'seqlets' for a number of organisms. What's that? you ask. According to IBM: "In a number of publications, we have presented and discussed the idea of the Bio-Dictionary: the latter is a collection of recurrent amino acid combinations (='seqlets') which completely cover the sequence space defined by the biggest possible collection of amino acid sequences. Normally, we recompute the contents of the Bio-Dictionary on a regular basis, typically once a year."

Bio-dictionaries for a handful of Archaeal genomes and a dozen or so bacterial genomes are available for download here. Note that the files are in .Z format (the output of the Unix compress utility), so don't expect to view them in your browser.

Wednesday, November 12, 2008

IBM open-source bioinformatics tools

IBM Watson Research Center's Bioinformatics & Pattern Discovery Group has a number of interesting online tools (around a dozen in all) for which source code is available. Go to the Brief Tutorials page to get a quick idea of what the tools are all about. Some of them are rather esoteric, but by the same token that also means they're not the run-of-the-mill sequence analysis tools. These are IBM's own special contributions.

Worth checking out.

Tuesday, November 11, 2008

Journal of Proteomics & Bioinformatics

The Journal of Proteomics & Bioinformatics is an Open Access journal. All papers are available for free download in PDF format.

Saturday, January 26, 2008

Distribution of Protein Sizes in C. elegans

[Figure: distribution of protein sizes in C. elegans]

It would be interesting to see a similar graph for E. coli (and other organisms). I suspect the spike at ~300 residues is unique to C. elegans. If you "back out" the spike, the distribution looks fairly Gaussian.
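For anyone who wants to try this on another organism, here is a minimal sketch of how such a distribution could be tabulated from a protein FASTA file. The function names, the 50-residue bin width, and the text-only histogram are my own choices for illustration, not the method behind the graph above.

```python
from collections import Counter


def read_fasta_lengths(lines):
    """Yield (record_id, sequence_length) for each record in FASTA-formatted lines."""
    record_id, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if record_id is not None:
                yield record_id, length
            fields = line[1:].split()
            record_id, length = (fields[0] if fields else ""), 0
        else:
            # Some translated-protein files end sequences with '*'; do not count it.
            length += len(line.rstrip("*"))
    if record_id is not None:
        yield record_id, length


def length_histogram(lengths, bin_width=50):
    """Count sequences per bin; each key is the lower edge of a bin."""
    return Counter((n // bin_width) * bin_width for n in lengths)


def print_histogram(hist, bin_width=50, scale=1):
    """Print one row of '#' marks per bin (one mark per `scale` proteins)."""
    for lower in sorted(hist):
        upper = lower + bin_width - 1
        print(f"{lower:>5}-{upper:<5} {'#' * (hist[lower] // scale)}")
```

To run it on a real proteome, open the FASTA file, pass the file handle to `read_fasta_lengths`, and feed the resulting lengths to `length_histogram`. A spike like the one near 300 residues would show up as one bin that is much taller than its neighbors.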

What would be interesting is to indicate certain classes of proteins in red (or some other distinguishing color), such as proteins that require special help in folding. Naïvely, one would expect larger proteins to require more assistance with folding. But it could be more complicated than that: What if proteins that require some sort of special "folding assistance" tend to clump up at particular spots in the distribution?
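One way to look for that kind of clumping without a plot is to compute, for each size bin, the fraction of proteins that belong to the class of interest. The sketch below assumes you already have a mapping of protein IDs to lengths and a set of IDs for the flagged class (for example, proteins believed to need folding assistance). Both inputs and the function name are hypothetical.

```python
from collections import defaultdict


def flagged_fraction_by_bin(lengths_by_id, flagged_ids, bin_width=50):
    """Return {bin_lower_edge: (n_flagged, n_total, fraction_flagged)}."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for protein_id, length in lengths_by_id.items():
        lower = (length // bin_width) * bin_width
        totals[lower] += 1
        if protein_id in flagged_ids:
            flagged[lower] += 1
    # Only bins that contain at least one protein appear, so no division by zero.
    return {
        lower: (flagged[lower], totals[lower], flagged[lower] / totals[lower])
        for lower in sorted(totals)
    }
```

Bins whose flagged fraction sits well above the overall fraction are the "clumps". With small counts per bin the fractions will be noisy, so a wider bin width may be needed.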

Tuesday, January 22, 2008

GroEL-Dependent Proteins Are Mostly Small

R. John Ellis (Nature 2006 442:360) makes the following observations: "About 85 different proteins of the bacterium Escherichia coli are thought to require encapsulation inside GroEL–GroES to fold correctly. Of these, 60% are 30–50 kilodaltons in size, and only 14% are greater than 50 kDa in size. The cage of GroEL–GroES measures 80 times 85 Angstroms, sufficient in principle to house proteins up to 70 kDa. However, the available volume is somewhat less, owing to the presence of 23 amino acids at the end of each GroEL subunit. Removal of these tails does not affect the basic mechanism, so they provide a way of changing the size of the cage."

This is interesting in two respects. It is interesting that so few E. coli proteins (only 85) are thought to require GroEL assistance in folding. It's also interesting that GroEL-assisted proteins are mostly small to medium-sized, because one would imagine that small proteins would have simpler folding requirements than large proteins. But on the other hand, the E. coli proteins that require GroEL-assist are (evidently) exceptional in some way.
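To relate these masses to chain lengths, a rough conversion helps. The sketch below assumes the common rule of thumb of about 110 Da per amino acid residue. That average is my assumption, not a figure from Ellis, and real proteins deviate from it depending on composition.

```python
# Assumption: rule-of-thumb average mass of one amino acid residue in a protein.
AVG_RESIDUE_MASS_DA = 110.0


def kda_to_residues(mass_kda, avg_residue_mass_da=AVG_RESIDUE_MASS_DA):
    """Approximate number of residues in a protein of the given mass (kDa)."""
    return round(mass_kda * 1000.0 / avg_residue_mass_da)


def residues_to_kda(n_residues, avg_residue_mass_da=AVG_RESIDUE_MASS_DA):
    """Approximate mass in kDa of a protein with the given number of residues."""
    return n_residues * avg_residue_mass_da / 1000.0


for mass in (30, 50, 70):
    print(f"{mass} kDa is roughly {kda_to_residues(mass)} residues")
# 30 kDa is roughly 273 residues
# 50 kDa is roughly 455 residues
# 70 kDa is roughly 636 residues
```

Under that assumption, the 30–50 kDa range quoted above corresponds to chains of roughly 270 to 450 residues, and the 70 kDa ceiling to a little over 600.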

Thursday, January 17, 2008

Database of Large Proteins

I stumbled upon a database (one of several on the Web, it turns out) of large proteins. "Large," in this case, means greater than about 50 kDa in size.

Although these proteins are interesting subjects of study in their own right, to me they have extra significance in that many of them are probably too large to fit inside a GroEL cage.

Monday, January 14, 2008

GroEL/Tuberculosis Connection?

Interestingly, Mycobacteria encode two GroEL paralogs, known as GroEL1 and GroEL2. The latter is essential, while the former is nonessential and contains the attB site for phage Bxb1 integration. According to Ojha et al., GroEL1 modulates synthesis of mycolates (the long-chain fatty acid components of the mycobacterial cell wall) during biofilm formation and physically associates with KasA, a key component of the type II Fatty Acid Synthase involved in mycolic acid synthesis. These functionalities are pretty far removed from what most would consider typical GroEL functionality.

It turns out that inactivation of the Mycobacterium smegmatis GroEL1 gene by phage Bxb1 integration prevents the formation of mature biofilms. This is potentially significant since Mycobacterium strains with altered mycolate profiles are sometimes resistant to the antituberculosis drug Isoniazid.

Ojha, A., et al., "GroEL1: A dedicated chaperone involved in mycolic acid biosynthesis during biofilm formation in mycobacteria" in Cell Volume 123, Issue 5, 2 December 2005, pages 861-873 (DOI: 10.1016/j.cell.2005.09.012)

Friday, January 11, 2008

Folding Time vs. Size

I wonder if anybody has ever done a scatter-plot of protein folding time versus protein size? It would be interesting to see graphs for folding-time in vivo, in vitro, and in silico.

It would also be interesting to see the same kind of graphs, with and without GroEL, for the 85 or so E. coli proteins known to use GroEL-assist.

Wednesday, January 2, 2008

Folding Time-Scales w/wo GroEL

YANTS (Yet Another Note-to-Self): If the folding of certain proteins occurs up to 10 times faster inside a GroEL cage, one would think this would make an interesting system to study via MD simulation (especially for proteins that fold so slowly without a cage that they can't be studied with conventional MD-sim technology).