C - simple compression - using 256 char values?

Discussion in 'Mac Programming' started by Dale Cooper, Sep 19, 2009.

  1. Dale Cooper macrumors regular

    Dale Cooper

    Joined:
    Sep 20, 2005
    #1
    Hi,
    I'm trying to find a way to compress four 2-bit values into one byte. I have a file made up of only four different 1-byte ASCII characters, and since the "alphabet" has just 4 letters I should in theory be able to code four of these characters into one byte.

    I've made a method that takes four characters at a time and "codes" them into one integer between 0 and 255. Since 1 byte can represent 256 different values, and one C char is one byte, I was under the impression that I could somehow convert that integer to a 1-byte char, but I can't figure out how to do it. Is this possible, and if so, how?

    Any suggestions would be greatly appreciated!
     
  2. Muncher macrumors 65816

    Muncher

    Joined:
    Apr 19, 2007
    Location:
    California
    #2
    Take the 2-bit numbers one at a time: shift the unsigned char 2 bits to the left, then OR the number in. Repeat for all four.
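
    A minimal sketch of that loop, assuming the four 2-bit values have already been mapped to 0-3 (the array contents here are just for illustration):

    Code:
    #include <stdio.h>
    
    int main(void) {
    	unsigned char values[4] = { 0, 1, 2, 3 };  /* four 2-bit codes, each 0-3 */
    	unsigned char packed = 0;
    	int i;
    
    	for (i = 0; i < 4; i++) {
    		packed <<= 2;          /* shift first, so the first value isn't pushed out the top */
    		packed |= values[i];   /* then OR the 2-bit value into the low bits */
    	}
    
    	printf("%d\n", packed);    /* 00 01 10 11 == 27 */
    	return 0;
    }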
     
  3. Dale Cooper thread starter macrumors regular

    Dale Cooper

    Joined:
    Sep 20, 2005
    #3
    Thanks so much for your reply! I'm not sure I completely understand, though :eek:

    Say I have an alphabet with the four chars '\n', 'h', '3' and '9'.
    Do I do it like this?
    Code:
    int main(void) {
    	char test[4];
    	test[0] = '\n';
    	test[1] = 'h';
    	test[2] = '3';
    	test[3] = '9';
        
    	unsigned char c;
    	int i;
    
    	for (i = 0; i < 4; i++) {
    		test[i] | c;
    		c<<2;
    	}
    
    	printf("%d\n", c);
    }
    
    This makes c null/0!

    Or do I first have to map each combination of the four chars to an integer between 0 and 255, and then do the operations on that one?

    For example
    "\n" "\n" "\n" "\n" --> 0
    "\n" "\n" "\n" "h" --> 1
    [...]
    "9" "9" "9" "9" --> 255

    and then somehow convert each of these integers to a specific char? (This was my original plan).
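
    For reference, a minimal sketch of that original plan, with hypothetical indices 0-3 standing in for '\n', 'h', '3' and '9'; the arithmetic is just base 4, so the result is always between 0 and 255 and fits in an unsigned char without losing anything:

    Code:
    #include <stdio.h>
    
    int main(void) {
    	/* indices for "\n" "\n" "\n" "h"  ->  0, 0, 0, 1 */
    	int i0 = 0, i1 = 0, i2 = 0, i3 = 1;
    
    	int value = 64*i0 + 16*i1 + 4*i2 + i3;       /* always 0-255 */
    	unsigned char packed = (unsigned char)value; /* fits in one byte */
    
    	printf("%d\n", packed);                      /* prints 1 */
    	return 0;
    }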
     
  4. lloyddean macrumors 6502a

    Joined:
    May 10, 2009
    Location:
    Des Moines, WA
    #4
    If there are truly the same four characters across ALL input files ...

    const char encode_match[] = { '\n', 'h', '3', '9' };

    for each input character search for a match within encode_match[]

    if the input character matches an entry in encode_match[], OR in (as suggested above) the INDEX of the matched character, not the character being encoded.
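
    A minimal sketch of that lookup (the helper name and the use of strchr are just for illustration; the table could equally be searched with a plain loop):

    Code:
    #include <stdio.h>
    #include <string.h>
    
    /* hypothetical helper: map a character to its index in the table,
       or -1 if it is not one of the four known characters */
    static int encode_index(char ch)
    {
    	const char encode_match[] = "\nh39";   /* index == 2-bit code */
    	const char *p = (ch != '\0') ? strchr(encode_match, ch) : NULL;
    	return p ? (int)(p - encode_match) : -1;
    }
    
    int main(void)
    {
    	printf("%d %d %d\n", encode_index('h'), encode_index('9'), encode_index('x'));
    	/* prints: 1 3 -1 */
    	return 0;
    }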
     
  5. lee1210 macrumors 68040

    lee1210

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #5
    Try this:

    Code:
    #include <stdio.h>
    
    unsigned char charValue(char);
    
    int main(int argc, char *argv[]) {
    	char test[4];
    	unsigned char c = 0;
    	int i;
    	test[0] = '\n';
    	test[1] = 'h';
    	test[2] = '3';
    	test[3] = '9';
        
    	for (i = 0; i < 4; i++) {
    		c <<= 2;                   /* make room for the next 2 bits */
    		c |= charValue(test[i]);   /* OR in the 2-bit code */
    	}
    
    	printf("%d\n", c);
    	return 0;
    }
    
    unsigned char charValue(char x) {
    	unsigned char result = 0;
    	switch(x) {
    		case '\n':
    			result = 0;
    			break;
    		case 'h':
    			result = 1;
    			break;
    		case '3':
    			result = 2;
    			break;
    		case '9':
    			result = 3;
    			break;
    	}
    	return result;
    }
    
    You need to get the 2-bit value first; you can't just OR in the character values of your "indicators", since they will not be 0-3.

    -Lee

    EDIT: Note that anything other than these 4 values passed into charValue will return 0, which matches the result for '\n'. Without additional error parameters, etc., there's not much to be done about this. I suppose charValue could call exit(-1) or something and kill the program, but that seems sort of extreme.
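
    For the reverse direction, which the code above doesn't show, a minimal sketch of unpacking one byte back into four characters, assuming the same 0-3 mapping (the helper name is just for illustration):

    Code:
    #include <stdio.h>
    
    /* hypothetical helper: unpack one byte into the four original characters */
    void unpackByte(unsigned char c, char out[4])
    {
    	const char table[4] = { '\n', 'h', '3', '9' };   /* index == 2-bit code */
    	int i;
    	for (i = 3; i >= 0; i--) {       /* the last character packed sits in the low bits */
    		out[i] = table[c & 0x03];
    		c >>= 2;
    	}
    }
    
    int main(void)
    {
    	char out[4];
    	unpackByte(27, out);             /* 27 == 00 01 10 11 -> '\n', 'h', '3', '9' */
    	printf("%c%c%c\n", out[1], out[2], out[3]);   /* skips the leading '\n' */
    	return 0;
    }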
     
  6. lloyddean macrumors 6502a

    Joined:
    May 10, 2009
    Location:
    Des Moines, WA
    #6
    I prefer a lookup-table method. It lets you change which characters get encoded and decoded with a single line of code.

    The 'encode_char' function's success is indicated by its return value, while the 2-bit encoded result is placed into the unsigned char whose address was passed in 'address_encode_result'.

    Code:
    #include <stdbool.h>   /* for true / false */
    #include <string.h>    /* for memset() */
    
    static const char encode_table[] =
    {
      '\n', 'h', '3', '9'
    };
    
    int encode_char(char ch, unsigned char* address_encode_result)
    {
      if ( address_encode_result )
      {
        for ( int i = 0; i < sizeof(encode_table); i++ )
        {
          if ( ch == encode_table[i] )
          {
            *address_encode_result = i;
            return true;
          }
        }
      }
      
      return false;
    }
    
    int main()
    {
      unsigned char encoded_temp;
      unsigned char encoded;
      char      signature[] = { '3', '3', '9', 'h' };
      
      // encode 'signature' ...
      encoded = 0;
      for ( int i = 0; i < sizeof(signature); i++ )
      {
        encoded <<= 0x2;
        if ( encode_char(signature[i], &encoded_temp) == false )
        {
          return 1; // failure, non-encodable 'signature' character
        }
    
        encoded |= encoded_temp;
      }
      
      // ... clear 'signature' for reconstruction ...
      memset(signature, 0, sizeof(signature)/sizeof(signature[0]));
      
    
    //  ... reconstruct 'signature' from encoded form
    
      // don't print result as 'alphabet' contains non-printable 'ASCII'
      for ( int i = (sizeof(signature)-1); i >= 0; --i )
      {
        signature[i] = encode_table[encoded & 0x03];
        encoded >>= 2;
      }
    
      return 0;
    }
    
     
  7. Dale Cooper thread starter macrumors regular

    Dale Cooper

    Joined:
    Sep 20, 2005
    #7
    I really appreciate your help guys, thank you so much! I now have both the encoding and decoding working :)
     
