Chris EK, on life as a continually learning software engineer.
A Clojure Introduction to Binary and Bitwise Operators
I’ve been working in Clojure and ClojureScript lately. A few months back, I came across this one weird trick to master Clojure. TL;DR: it recommends reading the API docs, fully, top to bottom.
I’m not even joking! Go to this URL http://clojure.github.io/clojure/clojure.core-api.html and start reading from top to bottom. If you did not read through that page, you may not know about amap. If you stopped reading before you get to ‘f’ you wouldn’t know about frequencies. However, if you read all the way through, you will be rewarded with knowledge about vary-meta.
In the process of doing this, I was struck by all the bit- functions — of the ~600 vars and functions in clojure.core, there are 12 specifically for bitwise operations.
We can disregard leading zeros in binary, so binary 0010 and binary 10 both represent decimal 2.
Each position is one bit (“bit” means “binary digit”). So 1111 is a 4-bit integer (so we can represent up to 15 different values). A 32-bit integer maxes at 11111111111111111111111111111111 (so we can represent up to 4,294,967,295 values). We’ll see that we don’t always use every position counting up from 0 (the left-most position will be used to determine positive/negative).
This blog post uses 2r.... and Integer/toBinaryString to convert between binary and decimal. Other functions and formatters are discussed here.
This brings us to the bitwise operators. There are 12 in clojure.core (all visible in the source):
bit-and – Bitwise and
;; `bit-and` does a logical AND across columns;; 1010 ; 10;; 1001 ; 9;; ----;; 1000 ; 8 ; by column: 1 AND 1 => 1;; 0 AND 0 => 0;; 1 AND 0 => 0;; 0 AND 1 => 0user=>(bit-and 109)8user=>(bit-and 2r10102r1001)8
bit-or – Bitwise or
;; `bit-or` does a logical OR across columns;; 1010 ; 10;; 1001 ; 9;; ----;; 1011 ; 11 ; by column: 1 OR 1 => 1;; 0 OR 0 => 0;; 1 OR 0 => 1;; 0 OR 1 => 1user=>(bit-or 109)11user=>(bit-or 2r10102r1001)11
bit-xor – Bitwise exclusive or
;; `bit-xor` does an exclusive OR across columns (true when the two values are different);; 1010 ; 10;; 1001 ; 9;; ----;; 0011 ; 11 ; by column: 1 XOR 1 => 0;; 0 XOR 0 => 0;; 1 XOR 0 => 1;; 0 XOR 1 => 1user=>(bit-xor 109)3user=>(bit-xor 2r10102r1001)3
bit-not – Bitwise complement
;; `bit-not` does a complement, negating each position so each 0 becomes 1 and each 1 becomes 0;; this reads 1010 as the 32-bit integer 00000000000000000000000000001010 (including leading 0s),;; which, when negated/complemented, becomes 11111111111111111111111111110101 (represents decimal -11).user=>(bit-not 2r1010)-11
bit-and-not – Bitwise and with complement
;; `bit-and-not` ANDs the first integer with the complement of the second;; (bit-and-not 2r1010 2r1001);; (bit-and 2r1010 (bit-not 2r1001));; (bit-and 2r1010 2r0110) => 0010 => 2user=>(bit-and-not2r10102r1001)2user=>(bit-and 2r1010(bit-not 2r1001))2
bit-clear – Clear bit at index n
;; `bit-clear` sets the bit to 0 at a given index (0-based, right to left)user=>(bit-clear2r10110);; clears the 0th bit from right10;; 2r1010user=>(bit-clear2r10111);; clears the 1st bit from right9;; 2r1001user=>(bit-clear2r10112);; clears the 2nd bit from right11;; 2r1011user=>(bit-clear2r10113);; clears the 3rd bit from right3;; 2r0011
bit-flip – Flip bit at index n
;; `bit-flip` flips the bit (changes 0 to 1 or 1 to 0)at a given indexuser=>(bit-flip2r10100);; flips the 0th bit from right11;; 2r1011user=>(bit-flip2r10101);; flips the 1st bit from right8;; 2r1000user=>(bit-flip2r10102);; flips the 2nd bit from right14;; 2r1110user=>(bit-flip2r10103);; flips the 3rd bit from right2;; 2r0010
bit-set – Set bit at index n
;; `bit-set` sets the bit to 1 at a given indexuser=>(bit-set2r10102);; sets the 2nd bit from right to 114;; 2r1110
bit-test – Test bit at index n
user=>(bit-test2r10100);; checks if the 0th bit from right is 1falseuser=>(bit-test2r10101);; checks if the 1st bit from right is 1true
bit-shift-left – Bitwise shift left
;; `bit-shift-left` shifts the entire integer left, filling in empty spaces as 0user=>(bit-shift-left 2r10101);; shifts the integer left 1 position (filling empty position with 0)20; 2r10100user=>(bit-shift-left 2r10113);; shifts the integer left 3 positions (filling empty positions with 0)88; 2r1011000
bit-shift-right – Bitwise shift right
;; `bit-shift-right` shifts the entire integer right, clearing positions < 0user=>(bit-shift-right 2r10101);; shifts the integer right 1 position5; 2r101user=>(bit-shift-right 2r10113);; shifts the integer right 3 positions1; 2r1
unsigned-bit-shift-right – Bitwise shift right, without sign-extension
;; `unsigned-bit-shift-right` does a `bit-shift-right` (so same behavior for small integers). its;; difference is that all the other operations are for 32-bit signed integers, whereas this accepts;; unsigned (no negative values, only positive) integersuser=>(unsigned-bit-shift-right2r10101)5user=>(bit-shift-right 2r10101)5
Negative numbers can be represented using signed integers, where the left-most position represents whether the integer is positive or negative. In one implementation (sign-magnitude notation), the left-most bit is the sign, and the remaining bits are the value.
This means that in an n-bit integer (e.g., 4-bit), the left-most bit signifies positive/negative, so the remaining n-1 (3) bits can hold the actual value. There are 16 possible values for a 4-bit integer (0000 to 1111), meaning that a signed 4-bit integer can go from -7 to +7 (it is 16 possible values because we can represent both -0 and +0, as binary 1000 and 0000).
Two’s Complement is a slightly more complicated scheme for signed integers (though not overly so), and the one more commonly used. A leading 1 still signifies a negative integer. The difference is that it’s not simply sign (1st bit) and magnitude (remaining bits), because the magnitude is determined using the complement (hence the name).
Up until now, all the code samples have been unsigned integers, which is why all the values are positive (and all the examples with a leading 1 haven’t been negative). Assuming 8-bit signed integers, we interpret things as follows:
00001010;; 10;; to get -10, complement and add 111110101;; first complement11110110;; then add 1;; so 11110110 is the signed 8-bit representation of -10
Idiosyncracies between signed and unsigned integers
I alluded to this in the previous section: all the code samples have been unsigned integers. We’ve been exploring conversions between decimal and binary using Clojure’s radix-based entry format (e.g., 2r1010), which uses Java Longs and Java’s Integer/toBinaryString, which returns strings.
I say there appear to be inconsistencies, because this is the expected result of using Long. A full discussion gets into the intricacies of language design and primitives, as exemplified by threads like this and this.