Friday, November 14, 2008

Amino acid "seqlets"

IBM publishes bio-dictionaries of 'seqlets' for a number of organisms. What's that, you ask? According to IBM: "In a number of publications, we have presented and discussed the idea of the Bio-Dictionary: the latter is a collection of recurrent amino acid combinations (='seqlets') which completely cover the sequence space defined by the biggest possible collection of amino acid sequences. Normally, we recompute the contents of the Bio-Dictionary on a regular basis, typically once a year."
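To make "recurrent amino acid combinations" and "cover" a little more concrete, here is a toy sketch. It is not IBM's method: the post doesn't describe how the Bio-Dictionary is built, so this assumes, purely for illustration, that a "seqlet" is an exact, contiguous run of k residues that occurs in at least `min_seqs` different sequences. The function names `recurrent_kmers` and `coverage` are hypothetical.

```python
from collections import defaultdict

def recurrent_kmers(seqs, k, min_seqs):
    """Toy stand-in for seqlets (an assumption, not IBM's definition):
    contiguous k-residue strings found in at least min_seqs sequences."""
    seen_in = defaultdict(set)  # k-mer -> indices of sequences containing it
    for idx, s in enumerate(seqs):
        for i in range(len(s) - k + 1):
            seen_in[s[i:i + k]].add(idx)
    return {kmer for kmer, ids in seen_in.items() if len(ids) >= min_seqs}

def coverage(seqs, kmers, k):
    """Fraction of all residues that fall inside at least one occurrence
    of a kept k-mer, i.e. how much of the sequence set the 'dictionary' covers."""
    covered = total = 0
    for s in seqs:
        mask = [False] * len(s)
        for i in range(len(s) - k + 1):
            if s[i:i + k] in kmers:
                for j in range(i, i + k):
                    mask[j] = True
        covered += sum(mask)
        total += len(s)
    return covered / total if total else 0.0

# Three made-up peptide fragments sharing a common core.
seqs = ["MKTAYIAK", "GKTAYIQR", "MKTAYLLE"]
kmers = recurrent_kmers(seqs, k=4, min_seqs=2)
print(sorted(kmers))             # ['KTAY', 'MKTA', 'TAYI']
print(coverage(seqs, kmers, 4))  # 16 of 24 residues covered, ~0.667
```

A real bio-dictionary aims to cover essentially all of sequence space, which exact short k-mers like these would not do; the sketch only shows what "recurrent" and "covering" mean in the simplest possible setting.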

Bio-dictionaries for a handful of archaeal genomes and a dozen or so bacterial genomes are available for download from IBM. Note that the files are compressed in the Unix .Z (compress) format, so don't expect to view them in your browser.
