File Encoding Schemes
Steven Shepard
December 1996
An edited version of this
paper appeared with the title
"File Encoding Schemes" in the Burlington Business
Digest, December 1996.
The growth of distributed
computing has yielded a flexible and diverse environment that
includes PCs, minicomputers, and mainframes, all transparently
interconnected by a wide variety of network types. This
proliferation of hardware leads to a diversity of operating
systems, access protocols, user applications, and text
representation schemes, which create new challenges for the
user.
The compatibility issues
associated with operating systems, access protocols, and
applications have straightforward solutions. Text
representation, however, is a different animal. As networked
applications have evolved in concert with this distributed
computing model, a need has arisen for techniques that allow
users to transport a broad variety of file types across networks
in a simple and transparent fashion. For example, a user might
want to create a document in a proprietary word processor such
as Microsoft Word or WordPerfect, attach the file to an e-mail
message, and transmit it across a network to another user. While
this seems like a simple exercise, it can become quite
complicated.
Machines communicate by
transferring binary data: long strings of ones and zeroes that
are easily digestible by the buffers and processors of other
computers. Information that is transferred from machine to
machine is most commonly encoded as either pure binary data, or
in a form called hexadecimal. When data is encoded in binary
form, it is transferred as long, uninterrupted streams of ones
and zeroes. No attempt is made to chop the bits into
recognizable characters; that will be done by the receiving
application. Hex, as it is commonly called, is used to distill
long strings of binary ones and zeroes into more manageable data,
a form of shorthand in which each hexadecimal digit stands for
four bits.
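The relationship between the two notations can be sketched in a few
lines of Python. This is only an illustration; the helper names
to_binary and to_hex are invented for this example and do not come
from any particular product.

```python
def to_binary(data: bytes) -> str:
    """Render each byte as eight binary digits, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in data)


def to_hex(data: bytes) -> str:
    """Render each byte as two hexadecimal digits (four bits per digit)."""
    return data.hex()


sample = b"Hi"
binary_form = to_binary(sample)   # '01001000 01101001'
hex_form = to_hex(sample)         # '4869'

print(binary_form)
print(hex_form)

# The shorthand loses nothing: the hex string converts back to the same bytes.
restored = bytes.fromhex(hex_form)
```

Sixteen binary digits shrink to four hex digits, and the conversion
is reversible in both directions.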
If a user were to transfer a
document created in a commercial word processor across a
network, and a receiving machine were to try to read the data as
text, it would have problems interpreting the file. Files
created by word processing applications contain much more than
just text: they also contain proprietary formatting information
that is readable only by that application. Any attempt to
turn those command structures into interpretable characters will
result in an unreadable collection of gibberish, because the
files are not intended to be read by humans; they are intended
to be read by an application running on another machine.
Clearly, some technique is required that will allow these files
to be transparently and easily moved between machines, without
being corrupted by systems attempting to read them as text
files.
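A small Python sketch shows the effect. The byte values below are
made up for illustration and do not reflect the layout of any real
word processor format; read_as_text is likewise a hypothetical
helper standing in for a system that treats everything as plain
7-bit text.

```python
def read_as_text(data: bytes) -> str:
    """Decode bytes as 7-bit ASCII, marking undecodable bytes with U+FFFD."""
    return data.decode("ascii", errors="replace")


# Hypothetical file contents: readable text surrounded by invented
# "formatting" bytes that have no meaning as ASCII characters.
document = bytes([0xD7, 0x9C, 0x01]) + b"Quarterly report" + bytes([0xFF, 0x80])

as_text = read_as_text(document)
print(repr(as_text))
```

The words survive, but the bytes around them come out as replacement
marks and control characters, and the original values cannot be
recovered from the text that results.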
To satisfy this need,
programmers have created a variety of applications that allow
file structures created in one machine environment to be
interpreted and read on another. Generally speaking, they
convert files from one binary representation scheme to another,
or encode files in a format that is universally interpretable by
other machines that have the correct decode software. Some of
these programs are specific to particular computing
environments; others are generically applicable. The most common
of them are UUEncode, UUDecode, BinHex, and MIME.
UUEncode is a standard data
encoding scheme used on most transmission systems. It has been
adopted as a de facto standard on the Internet for sending files
by e-mail, and is used to convert binary (8-bit) files into a
form made up entirely of printable 7-bit text characters, which
can pass safely through systems that handle only text. UUDecode,
as the name implies, performs the opposite task.
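The round trip can be sketched with Python's standard binascii
module, which encodes one line of up to 45 bytes at a time. The
helper names uu_encode and uu_decode are assumptions made for this
example, and the sketch leaves out the "begin" and "end" lines that
a complete uuencoded file carries.

```python
import binascii

LINE_BYTES = 45  # maximum number of input bytes per uuencoded line


def uu_encode(data: bytes) -> list[bytes]:
    """Encode binary data as a list of uuencoded text lines."""
    return [
        binascii.b2a_uu(data[i:i + LINE_BYTES])
        for i in range(0, len(data), LINE_BYTES)
    ]


def uu_decode(lines: list[bytes]) -> bytes:
    """Decode uuencoded text lines back into the original bytes."""
    return b"".join(binascii.a2b_uu(line) for line in lines)


payload = bytes(range(256))      # every possible 8-bit value
encoded = uu_encode(payload)
decoded = uu_decode(encoded)

print(encoded[0])
```

Every byte in the encoded lines falls in the 7-bit range, yet the
decoded result matches the original 8-bit data exactly.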
BinHex is a standard
representation system for the Macintosh environment, and is used
to convert binary-encoded files to a text representation
(hexadecimal in its original form, as the name suggests), and
vice versa.
MIME, which is an acronym for
Multipurpose Internet Mail Extensions, is a specification that
was created and blessed by the Internet Engineering Task Force (IETF)
in response to the growing file diversity that makes up Internet
traffic. Whereas the primary traffic type on the Internet has
always been simple, text-based electronic mail, files containing
audio, video, and images have come to represent a significant
percentage of all Internet traffic. MIME allows text, audio,
video, and image data to be embedded in a single message
structure, and is primarily used to move heterogeneous file
types across TCP/IP networks.
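As a rough sketch of the idea, Python's standard email package can
build a MIME message that carries both text and a binary attachment.
The addresses, subject, file name, and the build_message helper are
all placeholders invented for this example, and the "audio" bytes
are filler rather than a real .WAV file.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_message(text: str, attachment: bytes, filename: str) -> EmailMessage:
    """Build a message with a text body and one audio attachment."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"        # placeholder address
    msg["To"] = "recipient@example.com"       # placeholder address
    msg["Subject"] = "Audio clip attached"
    msg.set_content(text)
    msg.add_attachment(
        attachment, maintype="audio", subtype="wav", filename=filename
    )
    return msg


fake_audio = bytes(range(256))   # filler bytes, not a real WAV file
message = build_message("The clip is attached.", fake_audio, "clip.wav")
wire_form = message.as_bytes()   # what would travel across the network

# The receiving side parses the message and extracts the attachment.
received = message_from_bytes(wire_form, policy=policy.default)
part = next(received.iter_attachments())
recovered = part.get_content()

print(received.get_content_type(), part.get_content_type())
```

The message that travels is plain 7-bit text from start to finish.
The type labels on each part tell the receiving software how to
decode the attachment and what kind of data it holds.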
Programs that implement these schemes are
available in commercial versions from various software houses,
or as shareware, downloadable from any of a number of archive
sites scattered across the Net. They are extremely easy to use:
after launching the selected application, the user simply selects
the file that is to be encoded, and the application does the
rest. The encoded file is created and given a unique name. It
can then be attached to an e-mail message, transmitted across
the network, and decoded and read properly by the receiving
machine.
To read the message, the
receiving machine must run the appropriate decode software.
This, of course, brings up a quandary: which program is the best
to run? If an e-mail user is connected to the Internet via a
commercial Internet Service Provider, the best choice is
probably UUEncode, because of its universal use and broad
availability. Again, the transmitting machine must be running
UUEncode, while the receiving machine must have UUDecode.
Many of the online services
provide a facility for the transparent attachment of files to
e-mail messages, as long as the user is transferring the file to
another user in that system. For example, an America Online user
can attach a document and send it to another AOL user without a
problem. If the file is to be sent to a user outside of AOL,
however, the attachment must be encoded.
If the attachments are to be
something other than text (audio files such as .AU or .WAV
files, still images, or video clips, for example), then they
should be encoded using MIME to ensure proper receipt on the
other end of the transmission. Again, the receiving device must
have the appropriate decode software.
The use of these encoding
applications is still a bit ponderous, in that there are many
different programs that can be used for the same purpose. This
places an added burden on the users of these applications,
because they must ensure that the receiver of the encoded
document has the appropriate decode software. Eventually,
standards will emerge (and in fact are already well underway;
MIME is perhaps enjoying the most popularity) that will make the
task even more transparent. Until that happens, though, users
will be forced to play the pick-and-choose game.
ABOUT THE AUTHOR: Steven
Shepard is a Senior Member of Technical Staff with Hill
Associates, a telecommunications education and consulting firm
in Colchester. He can be reached at s.shepard@hill.com.