Agile Physics Research

Posted on September 25, 2008

Let’s face it. Physicists are lazy. One way to motivate them is to take away their private offices and pile them into a single room. The idea is that we put two physicists at every desk. That way, if one gets bored and starts looking at lolcats, the other will keep him on track.

This system strictly prohibits physicists from having ideas that aren’t proven to be effective. For instance, if a rogue physicist has an original idea, the other must ridicule this practice. That’s called cowboy research and has been proven to be unmaintainable by other physicists.

Look, hard-working physicists are very busy doing Agile Research. They don’t have time to read papers and independently verify results. Creating new paradigms is not scalable as it requires workers be trained to think for themselves. This is both a costly and dangerous practice.

Consider this. If Caltech had forced Feynman to share an office with Gell-Mann eight hours per day, the quantity of physics completed would have easily doubled. Feynman wouldn’t have had the time to decode the Dresden Codex and hang out at strip clubs. He would have just been chained to a desk in Caltech’s new physics factory. Imagine all of the wondrous ideas that weren’t discovered because of his inefficient work environment!

We must presuppose that all researchers are identical and respond fantastically to an Agile mythology created by experts. After all, a society is merely a group of people who believe the same mythology. Why not just enforce homogeneity of thinking and work practices in physics research departments? Academic conformity must be assured lest the entire civilization devolve into a chaotic diversity of thought. The tyranny of autonomy and independent experimentation must be stopped.

Converting a char* to a hexadecimal char* using C

Posted on September 16, 2008

Back when I used to work on microprocessors, I would find an infinity of half-working hex-to-decimal converters lurking around in various C repositories. At first glance, it’s such a trivial function to write. Often, in a vain effort to “keep it simple”, programmers go off half-cocked and partially implement another bad version of snprintf or strtol.

While it appears trivial to implement, making it work correctly for all integer types on your platform is rather tricky. That’s why standard library functions such as snprintf (in stdio.h) and strtol (in stdlib.h) exist. Unless you’re feeling brave or stupid, DON’T REWRITE THESE!

Last night, I stumbled across something similar on Stack Overflow. The particular question was a slight variation on the hex conversion formatting provided by snprintf. The caveat is that this new function must convert an arbitrary length character array into a character array of hexadecimal ASCII codes as fast as possible. For example, the first eight bytes of “DO NOT WANT” become “444F204E4F542057”, since D corresponds to 0x44, O to 0x4F and so on.

One solution is to take an input character array, cast it to an array of unsigned long longs, and convert the integers one at a time. This is a reasonable strategy, but it works more naturally on a traditionally big-endian architecture such as PowerPC. x86 is little-endian, so each integer read from memory comes back with its bytes in the opposite order from the string.

Let me explain. In C, you can cast pointers with reckless abandon. Character arrays can become integer arrays and then be converted back into character arrays again. To C, it’s all just arrays of bytes. You can make those bytes into whatever you want. Consider the following.

  char a[] = "DO NOT WANT";
  // caution: this cast breaks strict aliasing and may be misaligned;
  // memcpy into an unsigned long long is the safe equivalent
  printf("a, living a normal life as a char* ----------> %s\n", a);
  printf("a, disguised as an unsigned long long -------> %016llX\n", *((unsigned long long*)a));
  // outputs on x86
  // a, living a normal life as a char* ----------> DO NOT WANT
  // a, disguised as an unsigned long long -------> 5720544F4E204F44
  // the bytes are reordered on x86, a should be -> 444F204E4F542057
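
If you want the 64-bit value to come out in string order on any machine, one option is to skip the pointer cast and build the integer with shifts. This is only a sketch: load_be64 is a name I made up for illustration, and it isn't part of the Stack Overflow code.

```c
#include <stddef.h>

/* Hypothetical helper: pack up to eight bytes into a 64-bit value,
   first byte in the most significant position. Because it uses shifts
   rather than a pointer cast, the result does not depend on the host
   byte order. */
unsigned long long load_be64(const unsigned char *bytes, size_t n) {
    unsigned long long value = 0;
    size_t i;
    for (i = 0; i < n && i < 8; i++) {
        value = (value << 8) | bytes[i];
    }
    return value;
}
```

Printing load_be64((const unsigned char *)"DO NOT WANT", 8) with %016llX gives 444F204E4F542057 on both x86 and PowerPC.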

Unfortunately, it’s actually slower to iteratively call snprintf with a 64-bit stride than it is to just loop through the character array one byte at a time and append the hexadecimal characters to the result. Also, on x86 you have to reverse the byte order of each unsigned long long first, and htonl won’t do that for you because it only swaps 32-bit values. You need a 64-bit byte swap.
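
Here is roughly what that byte-at-a-time loop looks like. Treat it as a sketch under my own assumptions rather than the code from the question: bytes_to_hex and the 16-character digit string are names I picked for illustration.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical helper: convert len bytes into an uppercase hex string.
   Each byte is split into its high and low nibble, and each nibble
   indexes a 16-character digit table. The caller frees the result. */
char *bytes_to_hex(const unsigned char *src, size_t len) {
    static const char digits[] = "0123456789ABCDEF";
    char *out = malloc(len * 2 + 1);
    size_t i;
    if (out == NULL) {
        return NULL;
    }
    for (i = 0; i < len; i++) {
        out[2 * i]     = digits[src[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[src[i] & 0x0F];  /* low nibble */
    }
    out[len * 2] = '\0';
    return out;
}
```

Since it works on single bytes, byte order never enters the picture, and the only cost per byte is two small table reads.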

So, back to the drawing board. The original code had a 16 entry array that mapped individual hexadecimal digits to their character equivalents. This can be improved to use a 256 entry array that maps each possible byte value to its two-character hexadecimal equivalent. It can be initialized programmatically as follows.

  for(int i=0; i<256; i++) {
    snprintf(_hex2asciiU_value[i], 3,"%02X", i);
  }

This look-up routine is very fast. We convert each byte into its two hexadecimal digits by simply performing a read from memory. This is what everything looks like in a runnable C program.

#include <stdio.h>
#include <stdlib.h>

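char* char_to_hex( const unsigned char* p_array, unsigned int p_array_len, char** hex2ascii) {
    /* two hex digits per input byte, plus the terminating NUL */
    unsigned char* str = malloc(p_array_len*2+1);
    const unsigned char* p_end = p_array + p_array_len;
    size_t pos=0;
    const unsigned char* p;
    if( str == NULL ) {
       return NULL;
    }
    for( p = p_array; p != p_end; p++, pos+=2 ) {
       str[pos] = hex2ascii[*p][0];
       str[pos+1] = hex2ascii[*p][1];
    }
    str[pos] = '\0';  /* without this, printing the result reads past the buffer */
    return (char*)str;
}

int main() {
  size_t hex2ascii_len = 256;
  char** hex2ascii;
  size_t i;
  /* build the table: hex2ascii[b] holds the two hex digits of byte value b */
  hex2ascii = malloc(hex2ascii_len*sizeof(char*));
  for(i=0; i<hex2ascii_len; i++) {
    hex2ascii[i] = malloc(3*sizeof(char));
    snprintf(hex2ascii[i], 3,"%02X", (unsigned int)i);
  }
  size_t len = 8;  /* convert only the first eight bytes, "DO NOT W" */
  const unsigned char a[] = "DO NOT WANT";
  char* hex = char_to_hex(a, len, hex2ascii);
  printf("%s\n", hex);  /* prints 444F204E4F542057 */
  free(hex);
  for(i=0; i<hex2ascii_len; i++) {
    free(hex2ascii[i]);
  }
  free(hex2ascii);
  return 0;
}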

This is the runtime difference between the original implementation and the one shown above.