USB: some fixes

Werner Almesberger werner at almesberger.net
Sun Mar 4 15:02:15 EST 2012


Over the last few days, I looked at USB again, trying to make it
behave a little better. There are basically two syndromes with USB:

1) A lot of timeouts and things that look like framing errors or
   missed bytes.

2) Catastrophic lockup of the entire USB subsystem if one device has
   problems.

After shooting myself in the foot with a small but highly effective
bug in "usb load" (it loaded only part of the firmware, causing it to
fail in quite puzzling ways), I made good use of the new debug
capabilities and tried small code variations in rapid succession,
with the objective of identifying patterns that affected problem 2).

I found and fixed the following main issues:

- host-initiated bus reset, while working in low-speed mode, had no
  effect in full-speed mode,

- when retrying a transmission due to timeouts or what appeared to
  be low-level glitches, there was no retry limit, and

- we never forced a bus reset if a device wouldn't talk.

The lack of a retransmission limit was the main culprit for 2). If
the device decided for some reason to remain silent, softusb would
loop forever on that transfer and never even look at the other port
or pay attention to disconnects.
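
To illustrate the idea of a retry limit, here is a minimal host-side
sketch in Python. It is a model only, not the softusb code: the names
transfer_with_limit and MAX_RETRIES and the value 8 are assumptions
made up for this example.

```python
from typing import Callable, Optional

MAX_RETRIES = 8  # hypothetical limit; the real value is not stated here


def transfer_with_limit(attempt: Callable[[], Optional[bytes]],
                        max_retries: int = MAX_RETRIES) -> Optional[bytes]:
    """Call attempt() until it returns data or the retry budget is spent.

    attempt() models one transaction and returns None on a timeout or
    a low-level glitch.
    """
    for _ in range(max_retries):
        data = attempt()
        if data is not None:
            return data
    # Give up, so that the caller can look at the other port and
    # notice disconnects instead of spinning on a silent device.
    return None
```

With a device that never answers, the function returns None after
max_retries attempts instead of looping forever.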

I also removed the timeout message. In some tests last year, it had
seemed that removing the message would upset timing even more, but I
found no evidence of such a problem now.

After fixing the retries and a few smaller things, the Faderfox LV3
would enumerate after a softusb reset in about 40-60% of all cases.
I could get slightly better results if I unplugged and replugged the
LV3 too.

I then made the enumeration retry loop attempt bus resets before
giving up. This increased the success ratio to 50 out of 50
enumerations.
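
The escalation described above can be sketched roughly as follows.
Again, this is an illustrative Python model and not the actual
firmware: enumerate_with_resets, StuckDevice, and the counts (3 tries
per round, 2 resets) are hypothetical.

```python
from typing import Callable


def enumerate_with_resets(try_enumerate: Callable[[], bool],
                          bus_reset: Callable[[], None],
                          tries_per_reset: int = 3,
                          max_resets: int = 2) -> bool:
    """Try to enumerate; force a bus reset between rounds before giving up."""
    for round_no in range(max_resets + 1):
        if round_no > 0:
            bus_reset()  # escalate only after a round of plain retries failed
        for _ in range(tries_per_reset):
            if try_enumerate():
                return True
    return False


class StuckDevice:
    """Toy model: ignores enumeration until it has seen one bus reset."""

    def __init__(self) -> None:
        self.resets_seen = 0

    def bus_reset(self) -> None:
        self.resets_seen += 1

    def try_enumerate(self) -> bool:
        return self.resets_seen >= 1
```

For a StuckDevice, the first round of plain retries fails, one bus
reset is issued, and the next attempt succeeds.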

I also tried with the Rii wireless combo keyboard added to the mix,
and Rii plus LV3 enumerated and worked happily ever after (in 10 out
of 10 experiments).

So far, so good. I have now merged all this into the "master" branch of
milkymist.git, so that it can get some testing. There are still a
few issues, though:

- the low-level trouble is still there. With the tools I have, I'm
  almost blind when it comes to non-trivial USB protocol problems.

  So I need (to make) better tools. My idea is to make M1 act as a
  simple kind of "logic analyzer" that samples signals and transfers
  them to RAM. Then I can pick up the raw data and feed it to a
  variant of an interactive protocol decoder I wrote some years ago:

  http://downloads.qi-hardware.com/people/werner/tmp/teaser3.png

- we still have no CRC on data or control transfers. I experimentally
  added software CRC and didn't see any errors. It did decrease the
  number of successful enumerations a bit, though, which is why I
  backed that change out again, after seeing that CRC errors weren't
  a big issue at the moment.

  In any case, software CRC meets timing requirements only by
  acknowledging packets before checking their CRC. This is a
  protocol violation that can cause trouble of its own. In the end,
  we'll therefore want to calculate the CRC in the FPGA.

- the LV3 sometimes stops responding even to bus resets. Only power
  cycling (unplug, plug) makes it work again. Not sure if this is a
  bug on our side or if this is just the LV3's way of expressing its
  frustration with our bad USB stack.

  If it's the LV3 locking up, then there's little we can do (except
  for trying its patience not quite so much). In M1r4, we'll have
  the additional possibility of actually switching USB bus power
  under software control.

- I also need to merge Xiangfu's preliminary work on processing HID
  report descriptors. That should help with making more mice and
  keyboards work.
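
On the CRC item above: for reference, the CRC16 that USB uses on data
packets can be computed as in the following Python sketch. This is a
host-side illustration (for instance for checking captured data), not
the software CRC that was tried in the firmware; the function name
crc16_usb is made up for this example. The parameters are the usual
CRC-16/USB ones: polynomial x^16 + x^15 + x^2 + 1, all-ones start
value, bits processed LSB first, result inverted.

```python
def crc16_usb(data: bytes) -> int:
    """Bitwise CRC-16/USB over data (LSB-first, reflected poly 0xA001)."""
    crc = 0xFFFF  # all-ones start value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001  # 0x8005 with its bits reversed
            else:
                crc >>= 1
    return crc ^ 0xFFFF  # final inversion


# Standard check value for this CRC variant
assert crc16_usb(b"123456789") == 0xB4C8
```

A bit-at-a-time loop like this is slow, which hints at why doing the
same in software on the device makes it hard to check a packet before
the acknowledgement is due.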

The changes I've committed also include a few small changes to the
SoC. These are only for debugging support and not needed for normal
operation. Likewise, there's a bug fix for RTEMS that only affects
the "usb load" command and is not needed in normal operation.

When testing, if a USB device doesn't enumerate / is unresponsive,
unplugging and re-plugging should now always reset it. If it still
doesn't come to life, that probably means that it's hitting one of
the still unsolved mysteries.

- Werner



