testing hardware diagnostics

Is hardware impossible to debug without software?


Disclaimer: I am (mostly) hardware ignorant. This is probably my problem. However, I find it hard to accept that it is not possible to debug hardware, so I wanted to get some second opinions.

We have an issue where certain actions (swapping USB devices in and out at run-time) can blow either the USB hub or the chip on our USB board (it's custom hardware). It's a fuzzy problem (it appears that the degree of "blownness" can vary a bit), and it manifests itself intermittently, with various symptoms that are very difficult to reproduce reliably (typically random corruption of packets).

This makes it difficult to ascertain whether a newly reported problem is due to this hardware fault or is actually a bug in the software. We have since implemented protection on these devices, but if an unprotected device is used with a protected device, it can still taint the (now protected) device. One of the ports is also not protected, meaning that someone could still "kill off" a unit that should be safe by accidentally using the wrong port.

The upshot of this is that it is impossible to tell which of our devices suffer this issue without completely replacing ALL the hardware (we've bitten the bullet for most of our production hardware but there is still a lot of dev and QA hardware out there with this issue).

I would imagine that, given a piece of hardware, one could use some kind of hardware diagnostics tool to determine whether the kit is faulty or not. Am I living in a dream world? My hardware department tell me that the only tests that can prove the fault would be software tests... but as I have stated, the symptoms are very difficult to reproduce. As I'm not that experienced with hardware, I don't know if this is the only answer or not. I therefore ask the world.


Solution

  • Built In Test Equipment is used for performing a Built In Test

    BITE for BIT

    (No bytes involved.)

    It is completely, utterly normal for military/aerospace equipment to have extra hardware to test itself with.

    The original IBM PC had a surprising quantity of test hardware built in.

    In the case of your equipment, a test device and some statistical analysis would do the trick. This could be done in hardware in a dongle, but frankly it would be easier to do with some software. Use two back-to-back USB to RS232 serial converters to make a USB loopback device. Send lots of data, checksum the packets, and measure error rates.
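    The loopback idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the packet framing (sequence number plus CRC32), the function names, and the `transfer` callable are all hypothetical. In real use, `transfer` would write a packet to one serial converter and read it back from the other; here a simulated flaky channel stands in for the hardware so the sketch runs on its own. The Wilson score interval is one reasonable choice for the "statistical analysis" step, not necessarily the one the answer had in mind.

```python
import math
import random
import struct
import zlib


def make_packet(seq, payload):
    """Frame a payload as: 4-byte sequence number + payload + 4-byte CRC32."""
    body = struct.pack(">I", seq) + payload
    return body + struct.pack(">I", zlib.crc32(body))


def packet_ok(received, expected_seq):
    """Accept a packet only if its CRC32 matches and the sequence number is right."""
    if len(received) < 8:
        return False
    body = received[:-4]
    (crc,) = struct.unpack(">I", received[-4:])
    (seq,) = struct.unpack(">I", body[:4])
    return zlib.crc32(body) == crc and seq == expected_seq


def run_loopback_test(transfer, n_packets=1000, payload_size=64, seed=0):
    """Send n_packets through `transfer` (hypothetical: bytes in, bytes out).

    Returns (packets_sent, packets_bad).
    """
    rng = random.Random(seed)
    bad = 0
    for seq in range(n_packets):
        payload = bytes(rng.getrandbits(8) for _ in range(payload_size))
        received = transfer(make_packet(seq, payload))
        if not packet_ok(received, seq):
            bad += 1
    return n_packets, bad


def wilson_interval(bad, total, z=1.96):
    """Approximate 95% Wilson score interval for the packet error rate."""
    if total == 0:
        return 0.0, 1.0
    p = bad / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def flaky_channel(error_rate, seed=1):
    """Simulated stand-in for the hardware: flips one bit in some packets."""
    rng = random.Random(seed)

    def transfer(packet):
        if rng.random() < error_rate:
            corrupted = bytearray(packet)
            corrupted[rng.randrange(len(corrupted))] ^= 0x01
            return bytes(corrupted)
        return packet

    return transfer


if __name__ == "__main__":
    sent, bad = run_loopback_test(flaky_channel(0.02), n_packets=5000)
    low, high = wilson_interval(bad, sent)
    print(f"{bad}/{sent} bad packets, error rate between {low:.4f} and {high:.4f}")
```

    Running the same test on a known-good unit and on a suspect unit, then comparing the two intervals, gives a rough way to flag damaged hardware even when individual failures are rare and hard to reproduce.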

    I'm assuming your errors occur on the in->out as well as the out->in side.

    Really, your hardware guys need to look at some application notes; USB IS hotplug-safe IF done according to the book. There is a cool example out on the net of opto-coupling a USB chip's connection to the board it's on, to prevent this sort of thing. The USB chip is connected to the host and powered from the host, and the interface to the USB chip is SPI, which is opto-coupled back to the rest of the board.

    As for you, the chips are failing partially. Injured devices may work fine for months then die. An electro-static discharge ("a static zap") can do the same thing that you describe. A device can be injured by shocks too small for you to feel.

    The wires and features in semiconductors are microscopic, and easily damaged by stray electricity. If the hardware design is mostly right, the likely cause of the problems you've been experiencing is ESD when the devices are handled to plug/unplug. Your device has its own power supply, and its ground voltage floats relative to the other end of the USB cable until it is connected.

    Hope this helps.