After a presentation I gave at a conference, one of the attendees came up and told me about his ASIC design team that consisted of young engineers. They had completed their design and told him that they were done. He then asked the question, “Six months from now when you get the silicon back and it does not work, what are you going to need to diagnose and solve the problems?” They went back to their desks and worked some more.
He taught them a very important principle: Think about what could go wrong and what would be needed to figure that out. Too often, hardware designs assume that nothing will go wrong. It is akin to a software function that does not check the validity of parameters being passed in, or a hardware module that does not synchronize an incoming signal to its clock.
Hardware engineers are good at troubleshooting chips by mounting them to test fixtures and attaching probes. However, once the chip is mounted inside a prototype device, there is often no way to attach those fixtures and probes. Unfortunately, some problems will not reveal themselves until the chip is inside the actual device running the actual firmware. For situations like this, it’s important for hardware to be designed to assist troubleshooting.
A technique that I have successfully employed is to build firmware-accessible debugging resources into the chip. It is like having a built-in logic analyzer. Here are three types of debugging resources I’ve found particularly useful.
Some devices use a specific number of signal pulses to control certain features. For example, a laser printer generates horizontal sync pulses, and the number of pulses corresponds to the size of the paper being printed. When the printer is operating properly, you probably do not need to know how many pulses occurred. But if something is wrong, you might want to know how many pulses were generated. Maybe only enough pulses for a Letter-size sheet of paper were generated when you were trying to print on a Legal-size sheet. In this case, a pulse counter on the signal that firmware can read and reset would help solve that problem.
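From the firmware side, using such a counter might look something like the following minimal sketch. The register layout, the write-zero-to-clear behavior, and every name here (`dbg_pulse_regs_t`, `hsync_count`, `pulse_counter_check`) are assumptions for illustration, not a description of any particular chip.

```c
#include <stdint.h>

/* Hypothetical debug register block; layout and names are assumptions. */
typedef struct {
    /* Pulses counted since the last reset. Assumed: writing 0 clears it. */
    volatile uint32_t hsync_count;
} dbg_pulse_regs_t;

typedef enum { PULSES_OK, PULSES_TOO_FEW, PULSES_TOO_MANY } pulse_check_t;

/* Clear the counter, for example just before a page starts. */
void pulse_counter_reset(dbg_pulse_regs_t *regs)
{
    regs->hsync_count = 0;
}

/* Read how many pulses the hardware has seen since the last reset. */
uint32_t pulse_counter_read(const dbg_pulse_regs_t *regs)
{
    return regs->hsync_count;
}

/* Compare the observed count with what firmware expected for this page. */
pulse_check_t pulse_counter_check(const dbg_pulse_regs_t *regs,
                                  uint32_t expected)
{
    uint32_t seen = pulse_counter_read(regs);

    if (seen < expected) {
        return PULSES_TOO_FEW;
    }
    if (seen > expected) {
        return PULSES_TOO_MANY;
    }
    return PULSES_OK;
}
```

On a real target, `regs` would point at the block's memory-mapped address; on a host build it can point at an ordinary struct, which makes the check easy to exercise. If the firmware expected 8400 pulses and the counter reads 6600 (made-up numbers), `pulse_counter_check` returns `PULSES_TOO_FEW`, which points straight at the page-size mismatch.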
Well-designed UARTs allow firmware to read the current levels of the handshaking signals. That helps troubleshoot the RS-232 communications. The same technique can be used for any other I/O signals where knowing the current levels of those signals could help troubleshoot problems.
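As a rough sketch of how firmware might consume such a level register, assume a single read-only status word with one bit per handshaking signal. The bit positions and the names `PIN_CTS`, `PIN_DSR`, `PIN_DCD`, `PIN_RI`, and `handshake_read` are hypothetical and would come from the chip's own register map.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments in a read-only pin status register. */
#define PIN_CTS (1u << 0)
#define PIN_DSR (1u << 1)
#define PIN_DCD (1u << 2)
#define PIN_RI  (1u << 3)

/* Decoded snapshot of the handshaking signal levels. */
typedef struct {
    bool cts;
    bool dsr;
    bool dcd;
    bool ri;
} handshake_levels_t;

/* Read the register once so all levels come from the same sample. */
handshake_levels_t handshake_read(const volatile uint32_t *pin_status)
{
    uint32_t raw = *pin_status;
    handshake_levels_t levels;

    levels.cts = (raw & PIN_CTS) != 0;
    levels.dsr = (raw & PIN_DSR) != 0;
    levels.dcd = (raw & PIN_DCD) != 0;
    levels.ri  = (raw & PIN_RI) != 0;
    return levels;
}
```

Decoding into named fields keeps the troubleshooting code readable: a debug command can print `cts`, `dsr`, `dcd`, and `ri` directly instead of making someone decode a hex value by hand.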
As a general rule, firmware does not need to know the current state of a state machine in hardware. However, when there are problems, knowing the state can prove useful. A co-worker was trying to get a new chip going and it would not work. He read the current state of the state machine and discovered that it was stuck in a state waiting for an external signal. He looked at the prototype board and discovered that a resistor was missing on that signal line. He solved that problem in a matter of minutes, whereas it probably would have taken him hours if that state register had not been there.
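A readable state register is only helpful if firmware can turn the raw value into something meaningful. The sketch below assumes a 3-bit state field and an invented set of states; the encoding and the names `engine_state_name` and `engine_state_stuck` are illustrative assumptions, since the real values depend entirely on the hardware design.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state encoding for a hardware state machine. */
#define STATE_FIELD_MASK  0x7u
#define ST_IDLE           0u
#define ST_WAIT_EXT_READY 1u  /* waiting for an external signal */
#define ST_TRANSFER       2u
#define ST_DONE           3u

/* Translate the raw state field into a name for debug output. */
const char *engine_state_name(uint32_t raw)
{
    switch (raw & STATE_FIELD_MASK) {
    case ST_IDLE:           return "IDLE";
    case ST_WAIT_EXT_READY: return "WAIT_EXT_READY";
    case ST_TRANSFER:       return "TRANSFER";
    case ST_DONE:           return "DONE";
    default:                return "UNKNOWN";
    }
}

/* Return true if the state never changes across `polls` consecutive reads. */
bool engine_state_stuck(const volatile uint32_t *state_reg, uint32_t polls)
{
    uint32_t first = *state_reg & STATE_FIELD_MASK;

    for (uint32_t i = 0; i < polls; i++) {
        if ((*state_reg & STATE_FIELD_MASK) != first) {
            return false;
        }
    }
    return true;
}
```

A debug command that reports "stuck in WAIT_EXT_READY" tells the engineer which external signal to go look at on the board, which is the kind of shortcut the story above describes.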
Until the next bug…