performance - How to safely and/or quickly make up an int from n bytes in memory in C?
Assuming a little-endian architecture and a large (unsigned char *) memory area, I want to be able to interpret up to n <= sizeof(size_t) bytes anywhere in this area as an integer (size_t) value. I want this to be as fast as possible, assuming GCC and the x64 architecture, but I would also like to be able to offer safer code for other possible scenarios. What are the possible solutions?
Is it possible to do this faster than the following?
    static inline size_t bytes2num(const unsigned char * const addr, size_t const len)
    {
        switch (len) {
        /* sizeof(size_t) bytes starting at addr must have been allocated */
        case 5: return *(size_t *) addr & 0x0ffffffffffu;
        case 4: return *(size_t *) addr & 0x0ffffffffu;
        case 3: return *(size_t *) addr & 0x0ffffffu;
        case 2: return *(size_t *) addr & 0x0ffffu;
        case 1: return *(size_t *) addr & 0x0ffu;
        case 6: return *(size_t *) addr & 0x0ffffffffffffu;
        case 7: return *(size_t *) addr & 0x0ffffffffffffffu;
        case 8: return *(size_t *) addr & 0x0ffffffffffffffffu;
        }
        return 0;
    }
(The order of the branches reflects the actual probability distribution of the possible len values, but in fact it does not seem to have any significant impact: the compiler uses a constant-time solution.) Moreover, am I right that this is UB according to the standard, despite the fact that I can expect GCC to give either the "correct" interpretation or, with -fstrict-aliasing -Wstrict-aliasing=2, a warning or an error (because the pointer aliasing is visible to the compiler) if the compiler's behavior should happen to change in the future?
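For comparison, the load-and-mask idea can also be written without the pointer cast, by copying a full word with memcpy and masking afterwards. The following is only a sketch under the same assumptions as the code above (a little-endian host, 8-bit bytes, and at least sizeof(size_t) readable bytes at addr); the name bytes2num_masked is hypothetical and does not come from the original post.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: load a whole word via memcpy, then mask.
 * Assumes a little-endian host, 8-bit bytes, and that at least
 * sizeof(size_t) bytes starting at addr are readable. */
static inline size_t bytes2num_masked(const unsigned char *addr, size_t len)
{
    size_t word;

    /* memcpy sidesteps both the alignment and the aliasing problem. */
    memcpy(&word, addr, sizeof(word));

    if (len >= sizeof(word))
        return word;    /* nothing to mask off */

    /* Keep only the low len bytes; the shift count is below the
     * width of size_t because len < sizeof(word) here. */
    return word & (((size_t)1 << (len * 8)) - 1);
}
```

Whether this compiles down to a single load plus a mask depends on the compiler and optimization level, so it would need to be measured rather than assumed.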
A slightly slower solution (I have compared the whole program) is the following:
    static inline size_t bytes2num(const unsigned char * const addr, size_t const len)
    {
        union {
            size_t num;
            unsigned char bytes[8];
        } number = { 0 };

        /* every case deliberately falls through to the next one */
        switch (len) {
        case 8: number.bytes[7] = addr[7];
        case 7: number.bytes[6] = addr[6];
        case 6: number.bytes[5] = addr[5];
        case 5: number.bytes[4] = addr[4];
        case 4: number.bytes[3] = addr[3];
        case 3: number.bytes[2] = addr[2];
        case 2: number.bytes[1] = addr[1];
        case 1: number.bytes[0] = addr[0];
        }
        return number.num;
    }
Am I right that with this code no alignment problems arise and that, although it is still not strictly correct to write one member of a union and read another (see the discussion around https://stackoverflow.com/a/36705613), the "union-based" approach is widespread and supported by compilers? Is there a faster "almost correct" solution?
Finally, is there a faster correct solution than the one using shift and add (thanks to Hurkyl for pointing it out; of course I tried it but forgot!), which lies somewhere between the above and the slowest one, memcpy?
    static inline size_t bytes2num(const unsigned char * const addr, size_t const len)
    {
        size_t num = 0;
        int i;

        /* walk from the most significant byte down to the least */
        for (i = len - 1; i >= 0; --i) {
            num <<= 8;
            num |= addr[i];
        }
        return num;
    }
Footnote: I did not mention a particular revision of the standard and I added the c++ tag. The code should be compilable under standard C89 onward, and I'd like to limit myself to the common subset of the standards (possibly with optional definitions such as an empty #define inline, etc.).
Your last solution is the simplest. You should initialize num to 0 and use addr instead of &addr:
    #include <string.h>   /* memcpy */

    static inline size_t bytes2num(const unsigned char *addr, size_t len)
    {
        size_t num = 0;

        /* copy the low len bytes; the remaining bytes of num stay 0 */
        memcpy(&num, addr, len);
        return num;
    }
Note that if len is greater than sizeof(num), the above code invokes undefined behavior. If you need a safe solution, you need an extra test:
    #include <string.h>   /* memcpy */

    static inline size_t bytes2num(const unsigned char *addr, size_t len)
    {
        size_t num = 0;

        /* never copy more bytes than num can hold */
        memcpy(&num, addr, len <= sizeof(num) ? len : sizeof(num));
        return num;
    }
Note that this method assumes that integers are stored in little-endian order (least significant byte first).
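If the little-endian assumption is in doubt, it can be checked at run time before relying on the memcpy version. The following is a minimal sketch; host_is_little_endian and bytes2num_memcpy are hypothetical names chosen for illustration, and the check only distinguishes hosts that store the least significant byte first from all others.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the host stores the least
 * significant byte of an unsigned int first, 0 otherwise. */
static int host_is_little_endian(void)
{
    const unsigned int probe = 1u;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* The clamped memcpy version from above, renamed for this sketch.
 * Its result is only meaningful on a little-endian host. */
static size_t bytes2num_memcpy(const unsigned char *addr, size_t len)
{
    size_t num = 0;

    memcpy(&num, addr, len <= sizeof(num) ? len : sizeof(num));
    return num;
}
```

A caller could use the memcpy version when host_is_little_endian() returns 1 and fall back to a shift-based loop otherwise.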
For a portable solution, still assuming that the bytes are in little-endian order, that bytes are 8 bits wide, and that len <= sizeof(size_t), use a loop:
    static inline size_t bytes2num(const unsigned char *addr, size_t len)
    {
        size_t num = 0;

        /* byte i contributes bits 8*i .. 8*i+7 of the result */
        for (size_t i = 0; i < len; i++) {
            num |= (size_t)addr[i] << (i * 8);
        }
        return num;
    }
If your code uses this function with constant values for len, it will be expanded inline without a loop and possibly using a single instruction, depending on the compiler's configuration and abilities.
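As an illustration of the constant-len case, here is a small sketch in which a wrapper always passes the same length. The wrapper name read_u24 is hypothetical, and the loop counter is declared outside the for statement so that the sketch also compiles as C89 when inline is defined away. Whether the loop really disappears is up to the compiler; inspecting the generated assembly (for example with gcc -O2 -S) is the way to confirm it.

```c
#include <stddef.h>

/* Shift-based loop: independent of the host's byte order. Assumes
 * 8-bit bytes, len <= sizeof(size_t), and a size_t of at least
 * 32 bits for the 3-byte case below. */
static inline size_t bytes2num(const unsigned char *addr, size_t len)
{
    size_t num = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        num |= (size_t)addr[i] << (i * 8);
    }
    return num;
}

/* Hypothetical caller with a constant length: after inlining, the
 * compiler is free to unroll the three iterations. */
static inline size_t read_u24(const unsigned char *addr)
{
    return bytes2num(addr, 3);
}
```

For the bytes 0x01, 0x02, 0x03 the wrapper yields 0x030201, since the first byte is the least significant one.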