This is the troubleshooting story about me finding out why some packets were getting dropped when running AX.25 over D-Star DV between a Kenwood TH-D74 and an Icom 9700.

Troubleshooting: “Trouble”, from the latin “turbidus” meaning “a disturbance”. “Shooting”, from American English meaning “to solve a problem”.

The end result is this post, and this is the troubleshooting story.

The setup: laptop->bluetooth->D74->rf->9700->usb->raspberry pi.

I’m downloading from the raspberry pi, with the laptop sending back ACKs. But one of the ACKs is not getting through.

axlisten -a clearly showed that the dropped packet was being sent from the laptop:

radio: fm M0XXX to 2E0XXX-9 ctl RR6-

But nothing received on the receiver side. I saw the D74 light up red to TX, and the 9700 light up green on RX, but then nothing. Error counters in ifconfig ax0 were counting up on the receiver side. So something is being sent over the air.

And it wasn’t the first packet. All the ones before it were fine. They were always fine. This packet was always dropped. It was always only that packet that caused it to stall. The window size was set to 2, so session establishment, RR0, RR2, and RR4 went through just fine. But RR6 keeps getting re-sent, and never gets there.

I tried slowing down the sender on the raspberry pi. Now it no longer stalled! Note that I didn’t say that RR6 arrived. It actually didn’t. But because the window size was 2 the raspberry pi would send another data packet, acked by RR7, which would arrive just fine and ACK everything up to there.

Is the packet being sent to the radio correctly? Is the radio actually sending it over the air correctly? Is it received? Is it being sent via serial to the receiving computer?

This sounds like something kernel tracing would help with. I could start sniffing the radio traffic using an SDR, but I wasn’t looking forward to banging my head against demodulating D-Star. Seems like I’ll do that if I have to, but will try pure software first.

And remember this is the kernel (AX.25 layer) sending data to the KISS driver, which in turn sends it on to the serial port (which in turn is USB or Bluetooth, in my case). So it’s not in user space at any point. I can’t just strace.

I search around the kernel source for rfcomm (bluetooth) and ax25, and find some functions that may tell me when something is sent, and what it is.

kprobe:ax25_kiss_rcv {
  printf("rx %p\n", arg1);
}
kprobe:ax25_queue_xmit {
  printf("tx\n");
}

This seems to confirm that I have some good endpoints. I see that the packet is being sent, and not received. Let’s go one level lower on the receiver side.

kprobe:ax25_kiss_rcv {
  printf("ax25 recv %s\n", kstack);
}
kprobe:mkiss_receive_buf {
  printf("mkiss_receive_buf\n");
}

Here I see that the KISS driver is receiving something, but it doesn’t get to the AX.25 layer.

What could the KISS layer on the laptop actually be sending to the radio?

Dumping what’s actually sent to the D74

I couldn’t immediately find any good way to dump payload (but see below), so I just printed the bytes as a string, for decoding in Python. Though this may not work if the packet has nulls in it, I’ve not tested it (again, see below)..

kprobe:rfcomm_tty_write {
  printf("%d\n%s", arg2, str(arg1, arg2));
}
#!/usr/bin/python3
f = open('complete-trace')
while True:
    l = f.readline()
    if l == '\n':
        continue
    if l == '':
        if f.read(1) != '':
            raise "Wat"
        break
    l = l.strip()
    n = int(l)
    data = f.read(n)
    print(' '.join(["%02x" % ord(x) for x in data]))
open('t.dat','w').write(data)

Output:

c0 80 9a 60 a8 90 86 40 e4 9a 60 a8 90 86 40 61 3f 78 a5 c0
c0 80 9a 60 a8 90 86 40 64 9a 60 a8 90 86 40 e1 51 f9 4f c0
c0 80 9a 60 a8 90 86 40 64 9a 60 a8 90 86 40 e1 91 f9 1f c0
c0 80 9a 60 a8 90 86 40 64 9a 60 a8 90 86 40 e1 d1 f8 ef c0
c0 80 9a 60 a8 90 86 40 64 9a 60 a8 90 86 40 e1 11 f8 bf c0
c0 80 9a 60 a8 90 86 40 64 9a 60 a8 90 86 40 e1 11 f8 bf c0
c0 80 9a 60 a8 90 86 40 64 9a 60 a8 90 86 40 e1 11 f8 bf c0
c0 80 9a 60 a8 90 86 40 64 9a 60 a8 90 86 40 e1 11 f8 bf c0
c0 80 9a 60 a8 90 86 40 64 9a 60 a8 90 86 40 e1 11 f8 bf c0

Note the 5 repeats of the last packet. That’s the RR6 packet that’s not getting there, repeated. Yeah, so far it looks like it’s getting to the sending radio. But what are those two bytes in the end? I don’t see those in my tcpdumps. A checksum? And what’s with that 0x80? That should be 0x00 indicating a data frame, right?

Looking in the kernel source I see that there’s a checksum added to the KISS stream sometimes (also mentioned in dmesg). That’s odd, I didn’t see that on the wikipedia page.

Reading the kernel code it looks like there are two checksum standards (why just one? that’d be too easy). If you send data to TNC port 8 (via command 0x80), then it’s one checksum system. If you send via port 2 using command 0x20 it’s another. Super. This is a different kind of “port” from TCP port and axports, by the way.

Since the checksum is for KISS, meaning for use between the computer and the TNC, it seems useless for me. Bluetooth/USB serial won’t corrupt data, right? I try kissparms -p radio -c 1 to turn off checksums, and it works! So the checksum is calculated wrong? Seems unlikely, since it works for every other packet. Also I seem to still have more intermittent corruption that’s unexpected.

But yeah, maybe the checksum calculation is just wrong? Nah, it wouldn’t affect just this packet, and with both CRC algorithms.

Back to printing the payload.

There’s a better way in the next version of bpftrace (it’s not in v0.10.0, which is the latest release as of time of writing): The buf() function and %r format specifyer. So I download and compile git HEAD.

Here’s a bluetooth serial sniffer bpf program using this new better way:

#include <linux/skbuff.h>
#include <linux/tty.h>
kprobe:rfcomm_tty_write
{
  $tty = (struct tty_struct*)arg0;
  // Optionally print $tty->index
  printf("TX %d %r\n", arg2, buf(arg1, arg2));
}
// Other interesting functions:
// * kprobe:rfcomm_recv_data
// * kprobe:rfcomm_tty_copy_pending
kprobe:rfcomm_dev_data_ready
{
  $skb = (struct sk_buff*)arg1;
  $buf = buf($skb->data, (int64)($skb->len));
  printf("RX %d %r\n", $skb->len, $buf);
}

This is example output on the laptop side, where we see the received probes from the remote end, and the ACKs that get dropped transmitted (it’s not quite the same payload, because I experimented with different SSIDs):

[…]
RX 20 \xc0\x80\x9a`\xa8\x90\x86@\xe0d\x8a`\xac\x9a\x84s\x11\x05\xff\xc0
TX 20 \xc0\x80d\x8a`\xac\x9a\x84r\x9a`\xa8\x90\x86@\xe1\x11\xea\xd5\xc0
RX 20 \xc0\x80\x9a`\xa8\x90\x86@\xe0d\x8a`\xac\x9a\x84s\x11\x05\xff\xc0
TX 20 \xc0\x80d\x8a`\xac\x9a\x84r\x9a`\xa8\x90\x86@\xe1\x11\xea\xd5\xc0
RX 20 \xc0\x80\x9a`\xa8\x90\x86@\xe0d\x8a`\xac\x9a\x84s\x11\x05\xff\xc0
TX 20 \xc0\x80d\x8a`\xac\x9a\x84r\x9a`\xa8\x90\x86@\xe1\x11\xea\xd5\xc0

This shows the same thing. All the way to the Bluetooth layer it’s correct. So I’d say it’s either getting dropped over-the-air, by the receiving radio, or by the receiving linux kernel.

But because the checksum is wrong the packets don’t make it to the AX.25 layer, so they can’t be seen with tcpdump or axlisten there.

So I make another bpftrace program to sniff reception on the serial level.

#include<linux/tty_ldisc.h>
#include<linux/tty.h>
#include<linux/skbuff.h>

/*
// Sniff the serial data on the KISS layer.
kprobe:mkiss_receive_buf
{
  $tty = (struct tty_struct*)arg0;
  $data = arg1;
  $buf = buf($data, (uint64)(arg3));
  printf("RX KISS (pre CRC) %s %d %r\n", arg3, $tty->tty->index, $buf);
}
*/

// Sniff on the tty layer.
kprobe:tty_ldisc_receive_buf
{
  $tty = (struct tty_ldisc*)arg0;
  $p = arg1; // data
  $count = arg3;
  $buf = buf($p, (uint64)(arg3));
  $name = str($tty->ops->name);
  $num = $tty->tty->index;
  if ($name == "mkiss") {
    printf("RX (pre CRC) %s(%d) %d %r\n", $name, $num, arg3, $buf);
  }
}

Annotated results:

# laptop: Probe received
RX (pre CRC) mkiss 20 \xc0\x80\x9a`\xa8\x90\x86@\xe0\x9a`\xa8\x90\x86@s\x11\xc6\xd9\xc0

# laptop: Probe got through CRC check
RX 20 \xc0\x80\x9a`\xa8\x90\x86@\xe0\x9a`\xa8\x90\x86@s\x11\xc6\xd9\xc0

# laptop: Resending ACK
TX 20 \xc0\x80\x9a`\xa8\x90\x86@r\x9a`\xa8\x90\x86@\xe1\x11\x1e\xdf\xc0

# raspberry pi: ACK bytes received
RX (pre CRC) mkiss 3 \xc0\x80\x9a
RX (pre CRC) mkiss 3 `\xa8\x90
RX (pre CRC) mkiss 3 \x86@r
RX (pre CRC) mkiss 3 \x9a`\xa8
RX (pre CRC) mkiss 3 \x90\x86@
RX (pre CRC) mkiss 3 \xe1\x1e\xdf
RX (pre CRC) mkiss 1 \xc0

# Nothing else. ACK didn't get through CRC check.

See the problem? 20 bytes sent. 19 arrives. 0x11 is gone. Of course the checksum fails. But how can it just disappear? And it’s like this every time for this packet.

After some more testing it seems that yes, all 0x11 bytes are lost. I can’t send packets with 0x11 in them!

What’s so special about 0x11? It’s flow control bytes for XON/XOFF.

The radio is stripping 0x11 and 0x13 (also confirmed) out because they are flow control characters.

Some more testing also showed that the radio sends these characters for flow control, so intended ones get dropped, and then extra ones are added.

While searching for a way to escape these bytes (it’s apparently protocol-dependent, and not as simple as “the XON/XOFF way”. In fact wikipedia says “This is frequently done with some kind of escape sequence”. “Some kind of”… thanks.

While looking through the 9700 advanced manual on page 10-22 I come across a note that only ASCII is supported. Oh. Oh they mean this is only for printable characters, don’t they?

This is when I open minicom directly against the ports, start typing, and realize that I’m not working with a KISS TNC-like interface at all, no. Every character I type is immediately and correctly sent and received. The KISS communication I was almost successfully using to send AX.25 packets was actually between the two linux kernels, not between computers and radios.

I was talking KISS to myself, not to the radios!

All the 0xC0 escapes, 0x80 command, and everything else, was just a stream of bytes to the radios, to be sent as-is. Even though they’re not printable ASCII characters, the radio only bothers to drop the XON/XOFF characters.

Oh.

Well, I learned bpftrace. So there’s that.

Also I learned that I should real the manual more carefully. In my defense it’s three manuals, totalling 296 pages.