While desktop computers have tons of computing power and storage, some small CPUs don’t have a lot of space to store things. What’s more is some CPUs don’t do multiplication and division very well. Same can be said for FPGAs. So suppose we are going to grab a bunch of three-digit decimal numbers from, say, a serial port. We want to store as many as we can, and we don’t want to do a lot of math because we can’t, it is slow, or perhaps it keeps our processor awake longer and we want to sleep to conserve power. We want a way to pack the numbers as close to the theoretical maximum as we can but with little or no math.

The simple approach is to store the numbers as ASCII. Great for processing since they are probably in ASCII already. If they aren’t, you just add 30 hex to each digit and you are done. That’s awful for storage space, though, since we can store 999 in 10 bits if it were binary and now we are using 24 bits! Storing in binary isn’t a good option, if you play by our rules, by the way. You need to multiply by 10 and 100 (or 10 twice) to get the encoding. Granted, you can change that to two shifts and an add (8x+2x=10x) but there’s no easy way to do the division you’ll have to do for the decode.

Of course, there’s no reason we can’t just store decimal digits. That’s call binary coded decimal or BCD and that has some advantages, too. It is pretty easy to do math on BCD numbers and you don’t get rounding problems. Some CPUs even have specific instructions for BCD manipulation. However, three digits will require 12 bits. That’s better than 24, we agree. But it isn’t as good as that theoretical maximum. After all, if you think about it, you could store 16 distinct codes in 4 bits, and we are storing only 10, so that 6 positions lost. Multiply that by 3 and you are wasting 18 codes.

But there is a way to hit that ten-bit target without doing any math. Its called DPD or densely packed decimal. You can convert three decimal digits into ten bits and then back again with no real math at all. You could implement it with a small lookup table or just do some very simple multiplexer-style logic which means it is cheap and easy to implement in software or onboard an FPGA.

This packing of bits was the problem that Thedore Hertz and Tien Chi Chen both noticed around 1969-1971. Hertz worked for Rockwell and Chen worked for IBM and consulted with another IBM employee, Irving Tze Ho. Hertz and Chen independently developed what would become known as Chen-Ho encoding. A bit later, Michael Cowlinshaw published a refinement of the encoding called DPD or densely packed decimal that became part of the IEEE floating point standard.

Both Hertz and Chen used slightly different encodings, but the Cowlinshaw scheme had several advantages. You can efficiently grow beyond 3 digits easily. Decimal numbers from 0 to 79 map to themselves. Bit zero of each digit is preserved so you can do things like check for even and odd numbers without unpacking.

## How Does it Work?

Think of three BCD digits written in binary. Because each digit is in the range [0-9], it can be represented with four bits, so our three digits look like `Xabc Ydef Zghi`

. For instance, if the first digit is 9 then `Xabc`

= 1001. Since we’re only encoding numbers up to nine, if `X`

is 1, then `a`

and `b`

must be 0, leaving us some space to pack other bits into. Now consider `XYZ`

as its own three-digit binary number. This leads to eight distinct cases. The table below shows the encoding for all eight cases, with a lower case x indicates a don’t care:

Case (XYZ) |
Xabc |
Ydef |
Zghi |
Encoding 9 8 7 6 5 4 3 2 1 0 |

000 | 0abc | 0def | 0ghi | a b c d e f 0 g h i |

001 | 0abc | 0def | 100i | a b c d e f 1 0 0 i |

010 | 0abc | 100f | 0ghi | a b c g h f 1 0 1 i |

011 | 0abc | 100f | 100i | a b c 1 0 f 1 1 1 i |

100 | 100c | 0def | 0ghi | g h c d e f 1 1 0 i |

101 | 100c | 0def | 100i | d e c 0 1 f 1 1 1 i |

110 | 100c | 100f | 0ghi | g h c 0 0 f 1 1 1 i |

111 | 100c | 100f | 100i | x x c 1 1 f 1 1 1 i |

Notice that `c`

, `f`

, and `i`

always pass through. The other bit positions vary depending on the case, but you don’t need any math to simply rearrange the bits and add the fixed indicator bits in bits 6-5 and 3-1 of the 10-bit encoding.

In the example figure, 105 matches case 000 — all leading bits are zero — so the encoding is 0010000101 or 085 hex. If the number had been, say, 905 that would match case 100 and would encode as 1010001101 or 28D hex. Decoding is just a matter of running the table backward. If bit 3 is zero, that’s case 000. Otherwise, look at bits 2 and 1. Unless they are 11, you can directly find the corresponding row in the table. If they are 11, you’ll have to further look at bits 6 and 5 to find the right entry. Then you just unwind the bits according to the table.

## Implementation

I was mostly interested in this for FPGA designs, so I wrote some simple Verilog to do the work and you can try it online. The testbench included runs through all 1,000 possible numbers and outputs the DPD code in hex, the 3 input digits, a slash, and the 3 output digits like this:

0fd = 9 7 1 / 9 7 1 1fc = 9 7 2 / 9 7 2 1fd = 9 7 3 / 9 7 3 2fc = 9 7 4 / 9 7 4 2fd = 9 7 5 / 9 7 5 3fc = 9 7 6 / 9 7 6

The encoding is very straightforward. Here’s a snippet for two rows of the table:

3'b000: dpd={digit2[2:0],digit1[2:0],1'b0,digit0[2:0]}; 3'b001: dpd={digit2[2:0],digit1[2:0],3'b100,digit0[0]};

You’ll notice the code used no clocks — it is pure logic and makes extensive use of the `case`

and `casez`

statements. These are like `switch`

statements in C although the `casez`

statement can use ? as a don’t care when matching.

I left both Verilog and C implementations on GitHub for you. Both are pretty naive. For example, the Verilog code doesn’t take advantage of the fact that some of the bits “pass through.” However, a good compiler or synthesizer may turn out some pretty good code anyway. But if you were really worried about minimizing floorplan or code space or minimizing power consumption, you’ll need to tune these to fit your architecture and your needs.

## Algorithms

It used to be when you learned about computers one of the first things you were taught was algorithms and data structures. I’m not sure that’s the case anymore. But the world is full of obscure algorithms we might use if we only knew them. I’m surprised there aren’t more catalogs of algorithms like [Sean Anderson’s] famous bit twiddling hacks page. Probably the best one for everything — which means it is overwhelming — is the NIST DADS dictionary. Be warned! You can spend a lot of your day browsing that site. Even as big as that site is, they specifically exclude business, communications, operating system, AI, and many other types of algorithms.

I mean, sure, most of us know a bubble sort and a shell sort, but do you know a cocktail sort or how to do a Richardson-Lucy deconvolution? The DADS doesn’t either, although Wikipedia is helpful. We couldn’t find consistent overhead byte stuffing or Chen-Ho in either place. Seems like this would be a great AI project to catalog and help locate algorithms and data structures for particular uses. But for now, you can at least add dense packed decimal to your bag of known tricks.