Converting a char* to a hexadecimal char* using C

Posted on September 16, 2008

Back when I used to work on microprocessors, I would find no end of half-working hex-to-decimal converters lurking around in various C repositories. At first glance, it’s such a trivial function to write. Often, in a vain effort to “keep it simple”, programmers go off half-cocked and partially implement another bad version of snprintf or strtol.

While it appears trivial to implement this, making it work correctly for all integer types on your platform is rather tricky. That’s why the standard library functions in stdio.h and stdlib.h exist. Unless you’re feeling brave or stupid, DON’T REWRITE THESE!

Last night, I stumbled across something similar on Stack Overflow. The particular question was a slight variation on the hex conversion formatting provided by snprintf. The caveat is that this new function must convert an arbitrary-length character array into a character array of hexadecimal ASCII codes as fast as possible. For example, the first eight bytes of “DO NOT WANT” become “444F204E4F542057”, since D corresponds to 0x44, O to 0x4F and so on.

One solution is to take an input character array, cast it to an array of unsigned long longs, and convert the integers one at a time. This is a reasonable strategy, but it works better on big-endian PowerPC: x86 is a little-endian architecture, so the bytes of each integer come out in reverse order.

Let me explain. In C, you can cast pointers with reckless abandon. Character arrays can become integer arrays and then be converted back into character arrays again. To C, it’s all just arrays of bytes. You can make those bytes into whatever you want. Consider the following.

  char a[] = "DO NOT WANT";
  printf("a, living a normal life as a char* ----------> %s\n", a);
  // Reads the first 8 bytes of a as a single integer. This relies on a being
  // suitably aligned and on the compiler tolerating the type-punned read.
  printf("a, disguised as an unsigned long long -------> %016llX\n", *((unsigned long long*)a));
  // outputs on x86
  // a, living a normal life as a char* ----------> DO NOT WANT
  // a, disguised as an unsigned long long -------> 5720544F4E204F44
  // the bytes are reordered on x86, a should be -> 444F204E4F542057
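
If the goal is the byte order shown on the last comment line regardless of the host CPU, one option is to assemble the integer from individual bytes instead of casting the pointer. The following is a minimal sketch, not part of the original code: the helper name load_be64 is hypothetical, and it assumes the buffer holds at least eight bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read eight bytes as a big-endian 64-bit value.
 * Shifting in one byte at a time gives the same result on little- and
 * big-endian hosts and needs no particular alignment of p. */
static uint64_t load_be64(const unsigned char *p) {
    assert(p != NULL);
    uint64_t value = 0;
    for (size_t i = 0; i < 8; i++) {
        value = (value << 8) | p[i];
    }
    return value;
}
```

Printing load_be64((const unsigned char*)a), cast to unsigned long long, with "%016llX" should give 444F204E4F542057 on either architecture.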

Unfortunately, it’s actually slower to call snprintf repeatedly with a 64-bit stride than it is to just loop through the character array one byte at a time and append the hexadecimal characters to the result. Also, on x86 you have to reverse the byte order of each unsigned long long first, and htonl won’t do that because it only swaps 32-bit values; you need a 64-bit byte swap.
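
The byte-at-a-time loop can be sketched roughly as follows. This is an illustration rather than the code from the Stack Overflow question: the function name hex_encode_nibbles and the caller-supplied output buffer are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: write 2*len uppercase hex digits plus a terminating
 * '\0' into out, which must have room for at least 2*len + 1 bytes. */
static void hex_encode_nibbles(const unsigned char *in, size_t len, char *out) {
    static const char digits[] = "0123456789ABCDEF";
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[in[i] >> 4];   /* high four bits */
        out[2 * i + 1] = digits[in[i] & 0x0F]; /* low four bits */
    }
    out[2 * len] = '\0';
}
```

Each input byte costs two small table reads and no formatting call, which is why this simple loop is hard to beat with snprintf.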

So, back to the drawing board. The original code had a 16-entry array that mapped individual hexadecimal digits to their character equivalents. This can be improved by using a 256-entry array that maps each possible byte value to its two-character hexadecimal equivalent. It can be initialized programmatically as follows.

  for(int i=0; i<256; i++) {
    snprintf(_hex2asciiU_value[i], 3,"%02X", i);
  }

This look-up routine is very fast. We turn each byte into its two hexadecimal digits with a single read from memory. This is what everything looks like in a runnable C program.

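#include <stdio.h>
#include <stdlib.h>

/* Converts p_array_len bytes into a newly allocated, NUL-terminated string of
   uppercase hex digits. hex2ascii maps each byte value to its two digits.
   Returns NULL if allocation fails; the caller frees the result. */
char* char_to_hex( const unsigned char* p_array, unsigned int p_array_len, char** hex2ascii) {
    char* str = malloc((size_t)p_array_len*2+1);
    if( str == NULL ) return NULL;
    str[(size_t)p_array_len*2] = '\0';
    const unsigned char* p_end = p_array + p_array_len;
    size_t pos=0;
    const unsigned char* p;
    for( p = p_array; p != p_end; p++, pos+=2 ) {
       str[pos] = hex2ascii[*p][0];
       str[pos+1] = hex2ascii[*p][1];
    }
    return str;
}

int main(void) {
  size_t hex2ascii_len = 256;
  char** hex2ascii;
  size_t i;
  /* Build the table: one 3-byte string ("00".."FF" plus NUL) per byte value. */
  hex2ascii = malloc(hex2ascii_len*sizeof(char*));
  if( hex2ascii == NULL ) return 1;
  for(i=0; i<hex2ascii_len; i++) {
    hex2ascii[i] = malloc(3*sizeof(char));
    if( hex2ascii[i] == NULL ) return 1;
    snprintf(hex2ascii[i], 3,"%02X", (unsigned int)i);
  }
  unsigned int len = 8;  /* convert only the first 8 bytes of a */
  const unsigned char a[] = "DO NOT WANT";
  char* hex = char_to_hex(a, len, hex2ascii);
  if( hex == NULL ) return 1;
  printf("%s\n", hex);  /* prints 444F204E4F542057 */
  free(hex);
  for(i=0; i<hex2ascii_len; i++) free(hex2ascii[i]);
  free(hex2ascii);
  return 0;
}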

This is the runtime difference between the original implementation and the one shown above.
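
To reproduce a comparison like this on your own machine, a rough timing harness might look like the sketch below. The baseline (one snprintf call per byte) and the names hex_encode_snprintf, hex_encoder and time_encoder are assumptions for illustration, not the implementations that were originally measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Assumed baseline, not the original code: format each byte with its own
 * snprintf call. out must have room for at least 2*len + 1 bytes. */
static void hex_encode_snprintf(const unsigned char *in, size_t len, char *out) {
    for (size_t i = 0; i < len; i++) {
        snprintf(out + 2 * i, 3, "%02X", (unsigned int)in[i]);
    }
    out[2 * len] = '\0';
}

/* Any encoder with this shape can be passed to time_encoder. */
typedef void (*hex_encoder)(const unsigned char *, size_t, char *);

/* Hypothetical helper: CPU seconds spent calling encode() rounds times. */
static double time_encoder(hex_encoder encode, const unsigned char *in,
                           size_t len, char *out, unsigned int rounds) {
    assert(encode != NULL);
    clock_t start = clock();
    for (unsigned int r = 0; r < rounds; r++) {
        encode(in, len, out);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Wrapping the table-driven version in a function with the same signature and timing both over a large input and many rounds gives a like-for-like comparison. The absolute numbers will vary with compiler, optimization flags and input size.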

Comments

  1. Wilmer Mon, 22 Sep 2008 03:07:53 UTC

    Thank you for this graph, Sir! I shall now rip it out of context and use it as a proof that C++ is inferior to C.

  2. Steve Schnepp Wed, 04 Nov 2009 12:35:42 UTC

    You are missing a str[p_array_len*2] = '\0'; somewhere after the str malloc, since the area isn’t necessarily zero-filled.

  3. Tony Perrie Thu, 05 Nov 2009 03:13:29 UTC

    @steve you’re totally right.

  4. hasan adil Wed, 10 Mar 2010 19:06:09 UTC

    Hi,
    Thanks for the information. I was really stuck on this one…
