Microsoft WORD format is not a sequence format

I found this on a bioinformatics info site related to the EMBOSS package. I find the tone of it rather amusing, especially as people usually refer to Word files simply as “text”:


Before reading the rest of this document, please note:
Microsoft WORD format is not a sequence format.

Sequences can be read and written in a variety of formats. These can be very confusing for users, but EMBOSS aims to make life easier by automatically recognising the sequence format on input.

That means that if you are converting from using another sequencing package to EMBOSS and you have your existing sequences in a format that is specific for that package, for example GCG format, you will have no problem reading them in.

If you don’t hold your sequence in a recognised standard format, you will not be able to analyse your sequence easily.
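To give a feel for what "automatically recognising the sequence format" can mean, here is a minimal sketch that guesses a format from the first non-blank line of a file. It is not how EMBOSS does it; the function name `guess_sequence_format` and the three formats checked are my own assumptions, chosen only because FASTA, GenBank and EMBL entries start with well-known markers.

```python
def guess_sequence_format(text: str) -> str:
    """Guess a sequence format from the first non-blank line.

    Illustrative only: real tools check far more than the first line,
    and this hypothetical helper knows just three formats.
    """
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith(">"):
            return "fasta"    # FASTA header lines start with '>'
        if line.startswith("LOCUS"):
            return "genbank"  # GenBank flat files start with a LOCUS line
        if line.startswith("ID   "):
            return "embl"     # EMBL flat files start with an ID line
        return "unknown"      # first real line matched nothing we know
    return "unknown"          # empty input


if __name__ == "__main__":
    print(guess_sequence_format(">seq1 example\nACGTACGT\n"))
```

Anything that does not begin with a recognised marker, a Word document for instance, simply comes back as "unknown".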

What a sequence format is NOT

When we talk about ‘sequence format’ we are NOT talking about any sort of program-specific format like a word processor format or text formatting language, so we are not talking about things like: ‘NOTEPAD’, ‘WORD’, ‘WORDPAD’, ‘PostScript’, ‘PDF’, ‘RTF’, ‘TeX’, ‘HTML’.

If you have somehow managed to type a sequence into a word-processor (!) you should:

  • Save the sequence to a file as ASCII text (try selecting: File, SaveAs, Text)
  • Stop using word-processors to write sequences.
  • Investigate a sequence editor, such as mse
  • Investigate using simple text editors, such as pico, nedit or, at a pinch, wordpad
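If you are not sure whether a file really was saved as ASCII text, a quick check of its bytes can tell you. The sketch below is my own illustration, not part of EMBOSS: the name `classify_file_bytes` and its return labels are made up, and the check relies only on the well-known leading bytes of legacy .doc files (OLE2 container), .docx files (ZIP container) and RTF.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                       # .docx is a ZIP container
RTF_MAGIC = b"{\\rtf"                           # RTF files start with {\rtf


def classify_file_bytes(data: bytes) -> str:
    """Roughly classify file content (hypothetical helper and labels).

    Note that a ZIP signature only suggests a .docx; other ZIP-based
    files start the same way.
    """
    if data.startswith((OLE2_MAGIC, ZIP_MAGIC, RTF_MAGIC)):
        return "word-processor"
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return "not-ascii"   # contains bytes outside the ASCII range
    return "ascii-text"


if __name__ == "__main__":
    print(classify_file_bytes(b">seq1\nACGTACGT\n"))
```

To check a real file, read it in binary mode, for example `classify_file_bytes(open(path, "rb").read())`, where `path` is whatever file you want to inspect.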

Now, repeat after me:
Microsoft WORD format is not a sequence format

EMBOSS programs will not read in anything which is held in Microsoft WORD files.

So, remember that Word format is not a sequence format, and be careful with your bioinformatics research! Original text found at:


