Issue #43

Last Update December 24, 2005

Technology

O'Reilly's Gone Bio
by Eric Katz

You open an O'Reilly book and idly scan the table of contents.  Three chapter titles catch your eye: Multiple Sequence Alignments, Trees, and Profiles; Predicting Protein Structure and Function from Sequence; Tools for Genomics and Proteomics.  What are these titles doing in a book released by the famous computer reference publisher?  The answer: you are reading one of the books in O'Reilly's new bioinformatics series.

After decades of making life easier for businesses, government agencies and students, the computer industry has begun to direct its powerful tools at the problems of life itself.  Got too many DNA sequences? Let's organize them in a database.  Looking for recurring motifs in a bunch of genes? There's nothing like a pattern-matching algorithm to find them.  Trying to solve a truly intractable, non-linear math problem?  Why not try a heuristic search?

There is more complexity in a single bacterial cell than in any tax return. Life is bound by the laws of physics and chemistry, but that doesn't make it any easier to understand how an entire genome functions, or how a protein folds.

Quick to spot a knowledge gap they can fill, O'Reilly has jumped on the bioinformatics bandwagon.  The major practical problem in bioinformatics is that it is a blend of two very different fields. Biologists need desperately to learn how to really use their computers, and computer scientists can't be of any help until they understand the biological world.

Developing Bioinformatics Computer Skills by Cynthia Gibas and Per Jambeck helps to bring these two fields together.  After a lengthy introduction, the book presents a section describing to biologists (in brief) how to use a Linux workstation.  Basic commands are described so that the new user can manipulate and navigate through her files.  The Bash shell - a powerful command line environment - is described, so that she can make programs interact with each other.  Regular expressions are introduced, so that she can begin to examine patterns in her data files.  Finally, the tools of a multi-user environment are described so she can regulate the multiple processes in her busy laboratory.
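To give a flavor of what examining patterns with regular expressions can look like, here is a minimal sketch in Python. It is not taken from the book. The protein sequence is invented for illustration, and the helper name find_motif is hypothetical. The pattern is the commonly cited N-glycosylation motif: an asparagine (N), then any residue except proline (P), then serine or threonine (S or T), then any residue except proline.

```python
import re

# Hypothetical protein sequence, invented for this illustration.
protein = "MKNGSALNPTQWNVTR"

# N-glycosylation motif: N, then anything but P, then S or T, then anything but P.
# The lookahead (?=...) makes each match zero-width, so overlapping sites are all reported.
GLYCOSYLATION_MOTIF = re.compile(r"(?=N[^P][ST][^P])")

def find_motif(pattern, sequence):
    """Return the 0-based start position of every motif occurrence."""
    return [match.start() for match in pattern.finditer(sequence)]

positions = find_motif(GLYCOSYLATION_MOTIF, protein)
print(positions)  # [2, 12]
```

The candidate at position 7 (NPTQ) is rejected because a proline follows the asparagine, which is exactly the kind of rule a character class such as [^P] expresses compactly.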

After this quick tour of the cyber-lab, it's time to get to work.  The bulk of the book is dedicated to real bioinformatics tools.  It starts with an introduction to the use of the web for finding articles.  Ho hum.  But did you know that the articles database at NCBI is intimately linked with its genetic and protein databases?  You can search for articles by entering your sequence, finding similar molecules and then linking directly to the articles that described them first.  And if your sequence is novel, you can easily submit an on-line record of your discovery.  It's all in the book.

But you don't need a Linux station to fiddle around on the web.  The rest of the book describes the nuts and bolts of a number of bioinformatics packages.  How does ClustalW align multiple sequences? How can we create molecular models with MolScript?  How can we create a biochemical pathway database? It's all in the book.

The books O'Reilly is most famous for are those in its "In a Nutshell" series. These books present a language or computer system in a brief, consistent format.  True to tradition, O'Reilly's Sequence Analysis in a Nutshell is a compact volume that describes the basic functions and options available for a host of sequence analysis programs.  Warning for Biologists: whereas Developing Bioinformatics Computer Skills is largely designed for biologists, Sequence Analysis in a Nutshell is written in the terse style that programmers know from the Unix man pages.

The book starts with a section describing the data formats of several bioinformatics databases.  The formats described include FASTA, GenBank, SWISS-PROT, Pfam and PROSITE.  All of these are widely used, so knowing them is useful when designing a new tool.
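FASTA is the simplest of these formats: each record is a header line beginning with ">" followed by one or more lines of sequence. As a rough sketch of how a new tool might read it (this is not code from the book, the function name parse_fasta is hypothetical, and the records are made up), a few lines of Python are enough:

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header line -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}

# Made-up records, for illustration only.
example = """>seq1 hypothetical example record
ATGGCCATTG
TAATGGGCC
>seq2 another made-up record
GGGTTTAAA
"""

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```

A production tool would more likely rely on an established library than on a hand-rolled parser; the sketch only shows how little structure the format imposes.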

At the end of the book are two sections that have probably never been seen before in a computer manual: a complete table of the amino acids, with structures and chemical properties, and a complete table of the genetic code (that is, the translation code from DNA triplets to amino acids) for every kind of organism imaginable. Even biology texts will often skimp on this last part.
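To see why per-organism tables of the genetic code matter, consider this small Python sketch. It is an illustration written for this review, not material from the book; the function name translate and the test sequence are hypothetical. It builds the standard code from the conventional 64-letter summary string (codons ordered with bases T, C, A, G) and then derives the vertebrate mitochondrial code, which differs at four codons.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons of the standard code, in TTT, TTC, TTA, TTG, TCT, ... order.
STANDARD_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), STANDARD_AMINO_ACIDS)
}

# Vertebrate mitochondria reassign four codons ("*" marks a stop).
VERTEBRATE_MITO_CODE = {**STANDARD_CODE, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}

def translate(dna, code=STANDARD_CODE):
    """Translate DNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = code[dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

dna = "ATGTGAAAATAA"  # made-up sequence
print(translate(dna))                        # M
print(translate(dna, VERTEBRATE_MITO_CODE))  # MWK
```

The same twelve bases yield a one-residue peptide under the standard code and three residues under the mitochondrial one, because TGA is a stop signal in the former and tryptophan in the latter.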

O'Reilly has put out two excellent, useful volumes describing the tools and skills necessary to set up a modern bioinformatics laboratory.  The programming community will have no trouble finding them; let's hope the word gets out to the biologists as well.

Eric Katz is a doctoral student in bioinformatics at Kent State University.

New York Stringer is published by NYStringer.com. For all communications, contact David Katz, Editor and Publisher, at david@nystringer.com

All content copyright 2005 by nystringer.com
