
 



File Encoding Schemes

Steven Shepard
December 1996

An edited version of this paper appeared with the title
"File Encoding Schemes" in the Burlington Business Digest, December 1996.

The growth of distributed computing has yielded a flexible and diverse environment that includes PCs, minicomputers, and mainframes, all transparently interconnected by a wide variety of network types. This proliferation of hardware leads to a diversity of operating systems, access protocols, user applications, and text representation schemes, which create new challenges for the user.

The compatibility issues associated with operating systems, access protocols, and applications have straightforward solutions. Text representation, however, is a different animal. As networked applications have evolved in concert with this distributed computing model, a need has arisen for techniques that allow users to transport a broad variety of file types across networks in a simple and transparent fashion. For example, a user might want to create a document in a proprietary word processor such as Microsoft Word or WordPerfect, attach the file to an e-mail message, and transmit it across a network to another user. While this seems like a simple exercise, it can become quite complicated.

Machines communicate by transferring binary data: long strings of ones and zeroes that are easily digestible by the buffers and processors of other computers. Information that is transferred from machine to machine is most commonly encoded as either pure binary data or in a form called hexadecimal. When data is encoded in binary form, it is transferred as long, uninterrupted streams of ones and zeroes. No attempt is made to chop the bits into recognizable characters; that will be done by the receiving application. Hex, as it is commonly called, is used to distill long strings of binary ones and zeroes into more manageable data (a form of shorthand, if you will), with each hexadecimal digit standing for four bits.
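
To make the shorthand concrete, here is a minimal Python sketch that prints the same two bytes first as a bit string and then as hexadecimal. The helper names `to_bits` and `to_hex` are invented for this illustration and do not come from any particular tool.

```python
def to_bits(data: bytes) -> str:
    """Render each byte as eight binary digits, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in data)


def to_hex(data: bytes) -> str:
    """Render each byte as two hexadecimal digits (four bits per digit)."""
    return data.hex().upper()


sample = b"Hi"
print(to_bits(sample))  # 01001000 01101001
print(to_hex(sample))   # 4869
```

Sixteen ones and zeroes collapse into four hex digits, and the conversion loses nothing: `bytes.fromhex("4869")` gives back the original two bytes.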

If a user were to transfer a document created in a commercial word processor across a network, and a receiving machine were to try to read the data as text, it would have problems interpreting the file. Files created by word processing applications contain much more than just text: they also contain proprietary formatting information that is only readable by that application. Any attempt to turn those command structures into interpretable characters will result in an unreadable collection of gibberish, because these files are not intended to be read by humans; they are intended to be read by an application running in another machine. Clearly, some technique is required that will allow these files to be transparently and easily moved between machines, without being corrupted by systems attempting to read them as text files.
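
One way to picture the corruption is a link that only carries 7-bit characters. The Python sketch below is an illustration under that assumption, not a model of any specific mail system: `simulate_7bit_channel` is a hypothetical function that clears the top bit of every byte, and the "binary" payload is made up.

```python
def simulate_7bit_channel(data: bytes) -> bytes:
    """Clear the high bit of every byte, as a hypothetical 7-bit-only link might."""
    return bytes(byte & 0x7F for byte in data)


plain_text = b"Plain ASCII text"
# Made-up stand-in for formatting codes inside a word processor file.
binary_payload = bytes([0x50, 0xD0, 0xCF, 0x11, 0xE0, 0x00, 0xFF])

print(simulate_7bit_channel(plain_text) == plain_text)          # True
print(simulate_7bit_channel(binary_payload) == binary_payload)  # False
```

The plain text survives because every ASCII character already fits in seven bits; the binary payload does not, which is the kind of damage the encoding schemes described here are meant to prevent.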

To satisfy this need, programmers have created a variety of applications that allow file structures created in one machine environment to be interpreted and read on another. Generally speaking, they convert files from one binary representation scheme to another, or encode files in a format that is universally interpretable by other machines that have the correct decode software. Some of these programs are specific to particular computing environments; others are generically applicable. The most common of them are UUEncode, UUDecode, BinHex, and MIME.

UUEncode is a standard data encoding scheme used on most transmission systems. It has been adopted as a de facto standard on the Internet for moving files across text-only channels, and is used to convert binary (8-bit) file structures to a printable text format that is more easily transported (from 8-bit code to 7-bit code). UUDecode, as the name implies, performs the opposite task.
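
The encode and decode steps can be sketched with Python's standard `binascii` module, whose `b2a_uu` and `a2b_uu` functions handle one uuencoded line at a time (at most 45 input bytes per line). The wrapper names `uu_encode_lines` and `uu_decode_lines` are invented for this sketch, and it produces only the body lines; a complete uuencoded file also carries a `begin` header line and an `end` trailer, which are omitted here.

```python
import binascii


def uu_encode_lines(data: bytes) -> list[bytes]:
    """Encode data as uuencoded body lines, 45 input bytes per line."""
    return [binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)]


def uu_decode_lines(lines: list[bytes]) -> bytes:
    """Reverse uu_encode_lines by decoding each line and joining the results."""
    return b"".join(binascii.a2b_uu(line) for line in lines)


print(binascii.b2a_uu(b"Cat"))  # b'#0V%T\n'

payload = bytes(range(256))      # every possible 8-bit value
encoded = uu_encode_lines(payload)
seven_bit_safe = all(byte < 0x80 for line in encoded for byte in line)
round_trip_ok = uu_decode_lines(encoded) == payload
print(seven_bit_safe, round_trip_ok)  # True True
```

Every byte of the encoded output fits in seven bits, so it can pass through a text-only channel, and decoding restores the original 8-bit data exactly.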

BinHex is a standard representation system for the Macintosh environment, and is used to convert binary-encoded files to a hexadecimal text representation, and vice versa.

MIME, which is an acronym for Multipurpose Internet Mail Extensions, is a standard that was created and blessed by the Internet Engineering Task Force (IETF) in response to the growing file diversity that makes up Internet traffic. Whereas the primary traffic type on the Internet has always been simple, text-based electronic mail, audio, video, and image files have come to represent a significant percentage of all Internet traffic. MIME allows audio, video, and other non-text files to be embedded alongside text in a single message structure, and is primarily used to move heterogeneous file types across TCP/IP networks.
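
As a rough illustration of that single message structure, the sketch below uses Python's standard `email` package to build a message with a text body and an audio attachment, then parses it again as a receiver would. The addresses, subject, file name, and audio bytes are all placeholders invented for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Placeholder bytes standing in for a real .AU audio file.
audio = bytes([0x2E, 0x73, 0x6E, 0x64, 0x00, 0xFF, 0x80])

msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "receiver@example.com"
msg["Subject"] = "Audio clip"
msg.set_content("The clip is attached.")
msg.add_attachment(audio, maintype="audio", subtype="basic", filename="clip.au")

# What travels over the network: text and audio in one 7-bit-safe structure.
wire = msg.as_bytes()

# The receiving side parses the same bytes and recovers the attachment.
received = message_from_bytes(wire, policy=policy.default)
attachment = next(received.iter_attachments())
print(received.get_content_type())    # multipart/mixed
print(attachment.get_content_type())  # audio/basic
print(attachment.get_content() == audio)  # True
```

Each part carries its own content type, so the receiving software knows how to treat the audio without guessing from the file name.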

These applications are available in commercial versions from various software houses, or as shareware, downloadable from any of a number of archive sites scattered across the Net. They are extremely easy to use: after launching the selected application, the user simply selects the file that is to be encoded, and the application does the rest. The encoded file is created and given a unique name. It can then be attached to an e-mail message, transmitted across the network, and decoded and read properly by the receiving machine.

To read the message, the receiving machine must run the appropriate decode software. This, of course, brings up a quandary: which program is the best to run? If an e-mail user is connected to the Internet via a commercial Internet Service Provider, the best choice is probably UUEncode, because of its universal use and broad availability. Again, the transmitting machine must be running UUEncode, while the receiving machine must have UUDecode.

Many of the online services provide a facility for the transparent attachment of files to e-mail messages, as long as the user is transferring the file to another user in that system. For example, an America Online user can attach a document and send it to another AOL user without a problem. If the file is to be sent to a user outside of AOL, however, the attachment must be encoded.

If the attachments are to be something other than text (audio files such as .AU or .WAV files, still images, or video clips), then they should be encoded using MIME to ensure proper receipt on the other end of the transmission. Again, the receiving device must have the appropriate decode software.

The use of these encoding applications is still a bit ponderous, in that there are many different programs that can be used for the same purpose. This places an added burden on the users of these applications, because they must ensure that the receiver of the encoded document has the appropriate decode software. Eventually, standards will emerge (and in fact are already well underway; MIME is perhaps enjoying the most popularity) that will make the task even more transparent. Until that happens, though, users will be forced to play the pick-and-choose game.


ABOUT THE AUTHOR: Steven Shepard is a Senior Member of Technical Staff with Hill Associates, a telecommunications education and consulting firm in Colchester. He can be reached at s.shepard@hill.com.
