
Dale Cooper

macrumors regular
Original poster
Sep 20, 2005
218
0
Hi,
I'm trying to find a way to pack four 2-bit values into one byte. I have a file made up of four different 1-byte ASCII characters, but since the "alphabet" has only 4 letters, I can in theory encode four of these into one byte.

I've made a method that takes four characters at a time and "codes" them into one integer between 0 and 255. Since 1 byte can represent 256 different values, and one C char is one byte, I was under the impression that I could somehow convert that integer (an int, which is at least 2 bytes and usually 4) to a 1-byte char, but I can't find out how to do this! Is this possible, and if so, how?

Any suggestions would be greatly appreciated!
 

Dale Cooper

macrumors regular
Original poster
Sep 20, 2005
218
0
Thanks so much for your reply! I'm not sure I completely understand, though.

Say I have an alphabet of the four chars '\n', 'h', '3' and '9'.
Do I do it like this?
Code:
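#include <stdio.h>

int main(void) {
	char test[4];
	test[0] = '\n';
	test[1] = 'h';
	test[2] = '3';
	test[3] = '9';

	unsigned char c = 0;
	int i;

	for (i = 0; i < 4; i++) {
		test[i] | c;	/* computes a value but discards it */
		c<<2;		/* likewise: c itself is never changed */
	}

	printf("%d\n", c);
	return 0;
}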
This makes c null/0!

Or do I first have to map each combination of the four chars to an integer between 0 and 255, and then do the operations on that?

For example
"\n" "\n" "\n" "\n" --> 0
"\n" "\n" "\n" "h" --> 1
[...]
"9" "9" "9" "9" --> 255

and then somehow convert each of these integers to a specific char? (This was my original plan.)
 

lloyddean

macrumors 65816
May 10, 2009
1,047
19
Des Moines, WA
If there are truly the same four characters across ALL input files ...

const char encode_match[] = { '\n', 'h', '3', '9' };

For each input character, search for a match within encode_match[].

If the input character matches an entry in encode_match[], do a bitwise OR (as suggested above) of the INDEX of the matched character, not of the character being encoded.
 

lee1210

macrumors 68040
Jan 10, 2005
3,182
3
Dallas, TX
Try this:

Code:
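#include <stdio.h>

unsigned char charValue(char);

int main(int argc, char *argv[]) {
	char test[4];
	unsigned char c = 0;
	int i;
	test[0] = '\n';
	test[1] = 'h';
	test[2] = '3';
	test[3] = '9';

	for (i = 0; i < 4; i++) {
		/* make room for the next 2-bit value first, then OR it into the low bits
		   (a bare "c<<2;" computes a value but never changes c) */
		c = (unsigned char)((c << 2) | charValue(test[i]));
	}

	printf("%d\n", c);
	return 0;
}

/* Maps each alphabet character to its 2-bit value (0-3); anything else gives 0 */
unsigned char charValue(char x) {
	unsigned char result = 0;
	switch(x) {
		case '\n':
			result = 0;
			break;
		case 'h':
			result = 1;
			break;
		case '3':
			result = 2;
			break;
		case '9':
			result = 3;
			break;
	}
	return result;
}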

You need to get the 2-bit value first; you can't just OR in the character values of your "indicators", since they will not be 0-3.

-Lee

EDIT: Note that any character other than these four passed into charValue will return 0, the same result as for '\n'. Without an additional error parameter or similar, there's not much to be done about this. I suppose charValue could call exit(-1) or something and kill the program, but that seems sort of extreme.
 

lloyddean

macrumors 65816
May 10, 2009
1,047
19
Des Moines, WA
I prefer a lookup-table method: changing which characters are encoded and decoded then takes a single line of code.

The encode_char function reports success through its return value, while the 2-bit encoded result is stored in the unsigned char whose address is passed in address_encode_result.

Code:
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// the 4-letter alphabet; a character's index here is its 2-bit code
static const char encode_table[] =
{
  '\n', 'h', '3', '9'
};

// returns true and stores the 2-bit code in *address_encode_result if 'ch'
// is in encode_table, otherwise returns false
bool encode_char(char ch, unsigned char* address_encode_result)
{
  if ( address_encode_result )
  {
    for ( size_t i = 0; i < sizeof(encode_table); i++ )
    {
      if ( ch == encode_table[i] )
      {
        *address_encode_result = (unsigned char)i;
        return true;
      }
    }
  }

  return false;
}

int main(void)
{
  unsigned char encoded_temp;
  unsigned char encoded;
  char      signature[] = { '3', '3', '9', 'h' };

  // encode 'signature': shift earlier codes up, then OR in the next one ...
  encoded = 0;
  for ( size_t i = 0; i < sizeof(signature); i++ )
  {
    encoded <<= 2;
    if ( !encode_char(signature[i], &encoded_temp) )
    {
      return 1; // failure, non-encodable 'signature' character
    }

    encoded |= encoded_temp;
  }

  // ... clear 'signature' for reconstruction ...
  memset(signature, 0, sizeof(signature));

  // ... reconstruct 'signature' from encoded form, last character first
  // (not printed, as the 'alphabet' contains non-printable ASCII)
  for ( int i = (int)sizeof(signature) - 1; i >= 0; --i )
  {
    signature[i] = encode_table[encoded & 0x03];
    encoded >>= 2;
  }

  return 0;
}
 

Dale Cooper

macrumors regular
Original poster
Sep 20, 2005
218
0
I really appreciate your help, guys, thank you so much! I now have both the encoding and decoding working :)
 