! ! compress ! 1 COMPRESS The COMPRESS command invokes a utility to copy a file, generating a file with (usually) fewer bytes. Files compressed by COMPRESS are recovered by DECOMPRESS. COMPRESS[/qualifiers] Input-file[,Input-file...] [Output-file] COMPRESS implements the Lempel-Ziv file compression algorithm. It operates by finding common substrings and replaces them with a variable-size code. This is deterministic, and can be done with a single pass over the file. Thus, the decompression procedure needs no input table, but can track the way the table was built. 2 Parameters Input-file Specifies the name of the file(s) to be compressed. If you have specified /EXPORT=VMS mode, the file must be stored on a disk. Input-file may be a list of file names separated by commas. The names may include wild cards. Output-file Specifies the name of the file created by COMPRESS. It is optional when /EXPORT=VMS is selected. If it is omitted, the name of the input file with _Z added to the extension will be used. If wildcards are used in the file name, they will be replaced by the corresponding field in the input file name. 2 Qualifiers Indicate special actions to be performed by the COMPRESS utility or special properties of either the input or output files. Qualifiers apply to the entire process. The following list shows all the qualifiers available with the COMPRESS command: o /BITS=value o /DELETE o /EXPORT=(option,...) o /IGNORE=(option,...) o /LOG o /METHOD=option o /MODE=(option,...) o /SHOW=(option,...) 2 /BITS /BITS=value This specifies the maximum number of bits to be used in the compression. It implicitly controls both the "quality" of the compression (more bits means more compression) and the amount of memory needed for both compression and decompression (more bits requires more memory). If the compressed file is to be read by a computer with limited memory (such as a PDP-11), choose /BITS=12, else leave BITS at its default of 16. The minimum value is 9 and the maximum value is 16. 2 /DELETE /DELETE /NODELETE (Default) If /DELETE is used, COMPRESS will delete the input file if the compression is successful. 2 /EXPORT /EXPORT=(option[,...]) Export controls the format of the output file. You can select the following: VMS (Default) Write a file that can only be read by VMS COMPRESS. UNIX Write a format that can be read by programs compatible with the Unix compress utility. [NO]ENDMARKER Write a special file endmarker after the data if specified. [NO]BLOCK Monitor compression and reinitialize if the quality decreases if specified. [NO]HEADER Write a file header with information for DECOMPRESS if specified. [NO]INTERCHANGE (Can only be used in conjunction with VMS.) Writes the information specific to VMS in such a format that the file may be decompressed on a separate VMS system that doesn't necessarily have the same population of usernames, etc. In general, use /EXPORT=VMS for compression where the result will be decompressed on a VMS system and /EXPORT=UNIX where the result will be decompressed on a Unix, RSX-11M, RSTS/E, or other non-VMS system. If /EXPORT=UNIX is specified, BLOCK, HEADER, or ENDMARKER may be negated to further qualify the output file format. 3 VMS VMS private format. VMS private format stores the true file name and attributes except for access control lists in the compressed file so LZDCMP can recreate a functionally identical file. Both the input and output files must be on disk. The file is read in block mode rather than record mode. (Note: The VMS 5.1 version of FDL$GENERATE has a bug so that indexed files with multiple areas are not stored correctly. The bug is fixed in VMS 5.2.) 3 UNIX Specifies an output format compatible with Unix compress v3.0. This allows transmitting sequential files to non-VMS systems that support a compress-compatible utility. If you have specified /EXPORT=UNIX, the utility can be configured for variants of Unix compress by negating BLOCK, HEADER, and/or ENDMARKER as needed. Notice that file attributes are not preserved by /EXPORT=UNIX. 3 BLOCK Selects an algorithm whereby COMPRESS evaluates its performance and re-initializes the compression tables whenever performance degrades. Older versions of Unix compress do not support this capability. If negated, ENDMARKER must also be negated. 3 HEADER If negated, COMPRESS does not write a header record. This is for compatiblity with very old versions of Unix compress. If negated, BLOCK and ENDMARKER must also be negated. 3 ENDMARKER If specified, a special "endmark" is written after the end of the file. This is necessary if the file is to be decompressed on RT11 or other systems that require the last block of a file to fill the last block. On the other hand, some versions of Unix compress cannot understand the "extra" endmarker. If you guess wrong, a few bytes of garbage may be appended to the decompressed file. A version of Unix compress that handles endmarkers correctly is available. 3 INTERCHANGE If specified, causes VMS-private file format to be recorded in a fashion that will permit decompression on a different machine; especially important if the owner of the file being compressed has an alphanumeric UIC that is not present amongst the population of users on the machine that will be performing the decompression. 2 /IGNORE /IGNORE=(option,...) When using /EXPORT=VMS, permits suppression of certain fields in the FDL so that the resultant file may be decompressed on a different VAX. For example, if a file owned by [CC,TEX] were to be compressed, a secondary FDL field `OWNER' would be included in the compressed FDL; were this to be decompressed on a VAX which did not include the corresponding UIC in its RIGHTSLIST.DAT, the decompression would fail due to illegal syntax in the FDL. All of these ignorable fields may also be suppressed by using the INTERCHANGE flag on the /EXPORT qualifier. FDL secondaries that may be suppressed with this qualifier are: 3 ALL Suppress all of the ignorable items. May be used in conjunction with negated versions of the other keywords, for example /IGNORE=(ALL,NOPROTECTION) to suppress everything apart from the PROTECTION secondary. 3 MAXIMIZE_VERSION Suppresses the MAXIMIZE_VERSION field in the FDL, if present. If this field has the value "no", DECOMPRESS would overwrite any existing file with the same name and version number. 3 MT Suppresses both fields relating to magnetic tape file storage, see MT_BLOCK_SIZE and MT_PROTECTION. 3 MT_BLOCK_SIZE Suppresses the field which sets the size in bytes to be used when recording the file onto magnetic tape. 3 MT_PROTECTION Suppresses the field which applies access restrictions if the file were to be recorded onto magnetic tape. 3 OWNER Do not record the FDL field which specifies the current owner of the file. This is the field most likely to cause problems when the file is decompressed on a different machine, if the same user identifier doesn't exist on that machine. 3 PROTECTION Suppress recording of the file's current protection; when decompressed, the file will assume the default protection of the user performing the decompression. 2 /LOG /LOG /NOLOG (Default) Prints the names of the input and output files when the compression completes. 2 /METHOD /METHOD=(option) This specifies the particular compression algorithm. Currently, only /METHOD=LZW is supported. 3 LZW Use the Lempel-Ziv-Welch compression algorithm. 2 /MODE /MODE=(option) This allows specification of variations on the compression method. 3 BINARY Tells COMPRESS that the input file is "binary", not "human readable text". This is necessary on Dec operating systems, such as VMS and RSX-11M, that treat these files differently. (Note that binary support is rudimentary and probably insufficient as yet.) This option is ignored if /EXPORT=VMS is specified. BINARY mode makes LZCOMP ignore record breaks in record- oriented files. 3 DELTA This option tells COMPRESS to perform a transformation on the input file whereby each byte is subtracted from its predecessor. I.e., the difference between bytes is compressed, rather than their actual value. This improves the performance of the algorithm on certain kinds of data files. However, it will worsen the compression of most other kinds of data. Note that files compressed with DELTA mode cannot be decompressed on versions of compress that do not support this mode. DELTA mode is useful for digitized video images where each pixel element is represented as a single byte. It may also be useful when compressing digitized encoding of similar analog data where each sample is contained in a single byte. 2 /SHOW /SHOW=(option, [,...]) Display information about the compression. If omitted, COMPRESS operates silently (except for error messages). 3 ALL Equivalent to /SHOW=(PROGRESS,STATISTICS,FDL)/LOG 3 PROGRESS Print status messages at intervals, showing the operation of the program. The report shows the current compression ratio (the ratio of input to output bytes). If this decreases, COMPRESS decides that the characteristics of the file have changed, and resets its internal parameters. The "gap" is the number of input codes used to compute the ratio. 3 STATISTICS Print a report at the end of the process. Note that COMPRESS reports the number of bytes it compresses, which includes the file as well as the information that COMPRESS records about the file (the File Definition Language block and some internal codes), and will therefore be several hundred characters greater than the actual size of the file. 3 FDL Dump the File Definition Language block that describes a VMS input file. 2 LZW_Overview LZW stands for a compression method described in "A technique for High Performance Data Compression." Terry A. Welch. IEEE Computer, Vol 17, No. 6 (June 1984) pp. 8-19. This section is abstracted from Terry Welch's article referenced below. The algorithm builds a string translation table that maps substrings in the input into fixed-length codes. The compress algorithm may be described as follows: 1. Initialize table to contain single-character strings. 2. Read the first character. Set (the prefix string) to that character. 3. (step): Read next input character, K. 4. If at end of file, output code(); exit. 5. If K is in the string table: Set to K; goto step 3. 6. Else K is not in the string table. Output code(); Put K into the string table; Set to K; Goto step 3. "At each execution of the basic step an acceptable input string has been parsed off. The next character K is read and the extended string K is tested to see if it exists in the string table. If it is there, then the extended string becomes the parsed string and the step is repeated. If K is not in the string table, then it is entered, the code for the successfully parsed string is put out as compressed data, the character K becomes the beginning of the next string, and the step is repeated." The decompression algorithm translates each received code into a prefix string and extension [suffix] character. The extension character is stored (in a push-down stack), and the prefix translated again, until the prefix is a single character, which completes decompression of this code. The entire code is then output by popping the stack. I.e., the last code put into the stack was the first code in the original file. "An update to the string table is made for each code received (except the first one). When a code has been translated, its final character is used as the extension character, combined with the prior string, to add a new string to the string table. This new string is assigned a unique code value, which is the same code that the compressor assigned to that string. In this way, the decompressor incrementally reconstructs the same string table that the decompressor used.... Unfortunately ... [the algorithm] does not work for an abnormal case. The abnormal case occurs whenever an input character string contains the sequence KKK, where K already appears in the compressor string table." The decompression algorithm, augmented to handle the abnormal case, is as follows: 1. Read first input code; Store in CODE and OLDcode; With CODE = code(K), output(K); FINchar = K; 2. Read next code to CODE; INcode = CODE; If at end of file, exit; 3. If CODE not in string table (special case) then Output(FINchar); CODE = OLDcode; INcode = code(OLDcode, FINchar); 4. If CODE == code(K) then Push K onto the stack; CODE == code(); Goto 4. 5. If CODE == code(K) then Output K; FINchar = K; 6. While stack not empty Output top of stack; Pop stack; 7. Put OLDcode,K into the string table. OLDcode = INcode; Goto 2. The algorithm as implemented here introduces two additional complications. The actual codes are transmitted using a variable-length encoding. The lowest-level routines increase the number of bits in the code when the largest possible code is transmitted. Periodically, the algorithm checks that compression is still increasing. If the ratio of input bytes to output bytes decreases, the entire process is reset. This can happen if the characteristics of the input file change. (This can be supressed by /EXPORT=(UNIX, NOBLOCK)). 2 Unix Is a trademark of AT&T Bell Laboratories. ! ! decompress ! 1 DECOMPRESS The DECOMPRESS command invokes a utility to restore copy a file that had been compressed by COMPRESS. DECOMPRESS Input-file [Output-file] 2 Command_Parameters Input-file input-file is the name of the compressed input file. It is required. It may be a list of file names separated by commas. The file names may include wildcards. Output-file Output-file is the name of the decompressed output file. It is optional when /IMPORT=VMS is selected. If it is omitted, the name of the original file input to LZCOMP will be used. If wildcards are used in the file name, the corresponding field of the default name will be used. 2 Command_Qualifiers Indicate special actions to be performed by the COMPRESS utility or special properties of either the input or output files. Qualifiers apply to the entire process. The following list shows all the qualifiers available with the DECOMPRESS command: o /BITS=value o /DELETE o /IGNORE=(option,...) o /IMPORT=(option,...) o /LOG o /METHOD=option o /MODE=(option,...) o /SHOW=(option,...) 2 /BITS /BITS=value If a header was not provided, this specifies the maximum number of bits that were used in the compression. This parameter is ignored if the compressed file contains a header. 2 /DELETE /DELETE /NODELETE Deletes the input file if the decompression is successful. 2 /IMPORT /IMPORT=(option, [,...]) Import describes the format of the input file. You can select the following: VMS (D) The file was created by VMS COMPRESS. UNIX The file was created by Unix compress or a program compatible with the Unix compress utility. [NO]ENDMARKER A special file endmarker follows the data. [NO]BLOCK The compress program may have reinitialized compression. [NO]HEADER The compress program wrote its parameters into a file header. [NO]INTERCHANGE (Can only be used in conjunction with VMS.) Ignore all fields in the recorded FDL that might prevent decompression of a file originally compressed on a different VAX; in particular, this prevents failure of the decompression process simply because the owner of the original file is not recognized in the rightslist of the machine performing the decompression. In general, the program can determine the proper value of these flags by reading the first few bytes of the file. If valid, the file header overrides the command line specification. Generally, this option is needed only if you are trying to read a file generated by a version of Unix compress that did not write a header. 3 VMS VMS private format. The output file is created with the same attributes as the input file to COMPRESS. 3 UNIX Specifies an input format compatible with Unix compress v3.0. /IMPORT=VMS and /IMPORT=UNIX are mutually exclusive. It is not usually necessary to specify /IMPORT=UNIX since this mode can be determined from the file header. 3 HEADER Tells DECOMPRESS that the input file does not have a header. This option is for compatibility with very early versions of Unix compress. 3 ENDMARKER Tells DECOMPRESS that the input file does not contain an endmarker. This option should be used when reading file written by the Unix compress program. 3 INTERCHANGE If specified, causes information recorded in the VMS-private file format to be ignored if it would interfere with decompression on a the target machine; especially important if the owner of the compressed file had an alphanumeric UIC that is not present amongst the population of users on the machine that is performing the decompression. 2 /IGNORE /IGNORE=(option,...) When using /IMPORT=VMS, permits suppression of certain fields in the FDL so that a file that was compressed on a different VAX may be decompressed successfully. For example, if a file owned by [CC,TEX] had been compressed, a secondary FDL field `OWNER' would have been included in the compressed FDL; if this is decompressed on a VAX which does not include the corresponding UIC in its RIGHTSLIST.DAT, the decompression would fail due to illegal syntax in the FDL. All of these ignorable fields may also be suppressed by using the INTERCHANGE flag on the /IMPORT qualifier. FDL secondaries that may be suppressed with this qualifier are: 3 ALL Suppress all of the ignorable items. May be used in conjunction with negated versions of the other keywords, for example /IGNORE=(ALL,NOPROTECTION) to suppress everything apart from the PROTECTION secondary. 3 MAXIMIZE_VERSION Suppresses the MAXIMIZE_VERSION field in the FDL, if present. If this field has the value "no", DECOMPRESS would overwrite any existing file with the same name and version number. 3 MT Suppresses both fields relating to magnetic tape file storage, see MT_BLOCK_SIZE and MT_PROTECTION. 3 MT_BLOCK_SIZE Suppresses the field which sets the size in bytes to be used when recording the file onto magnetic tape. 3 MT_PROTECTION Suppresses the field which applies access restrictions if the file were to be recorded onto magnetic tape. 3 OWNER Ignores the FDL field which specifies the original owner of the file. This is the field most likely to cause problems when the file is decompressed on a different machine, if the same user identifier doesn't exist on that machine. 3 PROTECTION Suppress use of the file's original protection; when decompressed, the file will assume the default protection of the user performing the decompression. 2 /METHOD /METHOD=(option) This specifies the particular compression algorithm. Currently, only /METHOD=LZW is supported. 3 LZW Use the Lempel-Ziv-Welch compression algorithm. 2 /MODE /MODE=(option) This allows specification of variations on the output file format. These values will be taken from the source file description if /IMPORT=VMS is chosen. 3 BINARY Create the file in "fixed-block, 512-byte record" format. This is probably the best format to use for decompressing binary files (such as tar archives) created on Unix. 3 DELTA Compress used the difference between successive bytes, rather than the bytes themselves. For certain file formats, such as bit-mapped graphics, this may yield a 10-15% improvement in compressibility. This is not compatible with some implementations of Unix compress. This value is normally read from the file header, and generally need not be specified by DECOMPRESS. 3 STREAM This creates the file in "binary" mode, rather than "text" mode. It is ignored if COMPRESS created the file in /IMPORT=VMS mode. The output file will be created in RMS "Stream-LF" format. 3 TEXT Create the file in "variable-length carriage-control" format. This is appropriate for decompressing readable text files created by Unix compress. 2 /SHOW /SHOW=(option, [,...]) Display information about the compression. If omitted, DECOMPRESS operates silently (except for error messages). 3 ALL Equivalent to /SHOW=(PROGRESS,STATISTICS,FDL) 3 PROGRESS Print status messages at intervals, showing the operation of the program. 3 STATISTICS Print a report at the end of the process. 3 FDL Dump the File Definition Language block that describes the output file. 2 Unix Is a trademark of AT&T Bell Laboratories. ! ! uncompress ! 1 UNCOMPRESS The UNCOMPRESS command invokes a utility to restore copy a file that had been compressed by COMPRESS. UNCOMPRESS Input-file [Output-file] (UNCOMPRESS is a synonym for DECOMPRESS (q.v.), provided for compatibilty with Unix systems.)