Newsgroups: comp.compression
From: gvcormac@plg.uwaterloo.ca (Gordon V. Cormack)
Subject: [News] Re: FAQ clarification - Arithmetic Coding Source Posted
Message-ID: <C2r8r1.2Dt@math.uwaterloo.ca>
Organization: University of Waterloo
Date: Sat, 20 Feb 93 16:06:36 GMT

In article <1m3e6qINNqcj@tamsun.tamu.edu>, sam3103@tamsun.tamu.edu (Stanley Archer Mcclellan) writes:
>
>   ... questions about arithmetic coding

Arithmetic coding is probably easier to demonstrate than to explain.  The
following implementation is based on Guazzo's description.  Most others are
based on Rissannen's (which is paraphrased in the FAQ).

I apologize for posting source in this newsgroup.  The programs are 171
lines of code, and may be of interest to the reader, not just to
the implementor.


Gordon V. Cormack     CS Dept, University of Waterloo, Canada N2L 3G1
gvcormac@plg.uwaterloo.ca
-----------------------------

/*   Guazzo Coding Implementation  -  Compression version 0.0.0


     Copyright Feb. 1993

     Gordon V. Cormack  Feb. 1993
     University of Waterloo
     cormack@uwaterloo.ca


     All rights reserved.

     This code and the algorithms herein are the property of Gordon V. Cormack.

     Neither the code nor any algorithm herein may be included in any software,
     device, or process which is sold, exchanged for profit, or for which a
     licence or royalty fee is charged.

     Permission is granted to use this code for educational, research, or
     commercial purposes, provided this notice is included, and provided this
     code is not used as described in the above paragraph.

*/

/*
     This code uses a one-byte finite state predictor to drive an arithmetic
     coder for data compression.  It should give compression nearly identical
     to one-byte huffman coding.

     Find a better predictor, and you'll have a better compressor!

     It handles end-of-file properly, which requires more than 5 minutes
     thought.
*/


#include <stdio.h>
main(){
int max = 0x1000000,
    min = 0,
    mid,
    index,
    c,
    i,
    bytes = 0,
    obytes = 3;
int bit;
int one[256], zero[256];
for (i=0;i<256;i++) {
   one[i] = 1;
   zero[i] = 1;
   }

for(;;){
   c = getchar();
   if (c == EOF) {
      min = max-1;
      fprintf(stderr,"compress done bytes in %d bytes out %d ratio %f\n",
                      bytes,obytes, (float)obytes/bytes);
      break;
   }
   bytes++;
   for (i=0;i<8;i++){
      bit = (c << i) & 0x80;
      index = (1<<i) - 1 + (c >> (8-i));
      mid = min + ((max-min-1)*((float)zero[index]) / (one[index]+zero[index]));
      if (mid == min) mid++;
      if (mid == (max-1)){ /* should never happen, but with floating pt? */
         mid--;
      }
      if (bit) {
         min = mid;
         one[index]++;
         }
      else {
         max = mid;
         zero[index]++;
         }
      while ((max-min) < 256) {
         max--;
         putchar(min >> 16);
         obytes++;
         min = (min << 8) & 0xffff00;
         max = ((max << 8) & 0xffff00) ;
         if (min >= max) max = 0x1000000;
         }
      }
   }
putchar(min>>16);
putchar((min>>8) & 0xff);
putchar(min & 0x00ff);
}
-------------------
/*   Guazzo Coding Implementation  -  Expansion  version 0.0.0


     Copyright Feb. 1993

     Gordon V. Cormack  Feb. 1993
     University of Waterloo
     cormack@uwaterloo.ca


     All rights reserved.

     This code and the algorithms herein are the property of Gordon V. Cormack.

     Neither the code nor any algorithm herein may be included in any software,
     device, or process which is sold, exchanged for profit, or for which a
     licence or royalty fee is charged.

     Permission is granted to use this code for educational, research, or
     commercial purposes, provided this notice is included, and provided this
     code is not used as described in the above paragraph.

*/

#include <stdio.h>
main(){
int max = 0x1000000,
    min = 0,
    mid,
    val,
    index,
    i;
int bit;
char c;
int one[256], zero[256];
for (i=0;i<256;i++){
   one[i] = 1;
   zero[i] = 1;
   }

val = getchar()<<16;
val += getchar()<<8;
val += getchar();
while(1) {
   c = 0;
   if (val == (max-1)) {
      fprintf(stderr,"expand done\n");
      break;
   }
   for (i=0;i<8;i++){
      index = (1<<i) - 1 + c;
      mid = min + ((max-min-1)*((float)zero[index]) /(zero[index]+one[index]));
      if (mid == min) mid++;
      if (mid == (max-1)){  /* should never happen */
         mid--;
      }
      if (val >= mid) {
         bit = 1;
         min = mid;
         one[index]++;
         }
      else {
         bit = 0;
         max = mid;
         zero[index]++;
         }
      c = c + c + bit;
      while ((max-min) < 256) {
         max--;
         val = (val << 8) & 0xffff00 | (getchar()& 0xff);
         min = (min << 8) & 0xffff00;
         max = ((max << 8) & 0xffff00) ;
         if (min >= max) max = 0x1000000;
         }
      }
   putchar(c);
   }
}
---------------
--
Gordon V. Cormack     CS Dept, University of Waterloo, Canada N2L 3G1
gvcormac@plg.uwaterloo.ca