Troubleshooting KISS with bpftrace
This is the troubleshooting story about me finding out why some packets were getting dropped when running AX.25 over D-Star DV between a Kenwood TH-D74 and an Icom 9700.
Troubleshooting: “Trouble”, from the latin “turbidus” meaning “a disturbance”. “Shooting”, from American English meaning “to solve a problem”.
The end result is this post, and this is the troubleshooting story.
The setup: laptop->bluetooth->D74->rf->9700->usb->raspberry pi.
I’m downloading from the raspberry pi, with the laptop sending back ACKs. But one of the ACKs is not getting through.
axlisten -a
clearly showed that the dropped packet was being sent
from the laptop:
radio: fm M0XXX to 2E0XXX-9 ctl RR6-
But nothing received on the receiver side. I saw the D74 light up red
to TX, and the 9700 light up green on RX, but then nothing. Error
counters in ifconfig ax0
were counting up on the receiver side. So
something is being sent over the air.
And it wasn’t the first packet. All the ones before it were fine. They
were always fine. This packet was always dropped. It was always only
that packet that caused it to stall. The window size was set to 2, so
session establishment, RR0
, RR2
, and RR4
went through just
fine. But RR6
keeps getting re-sent, and never gets there.
I tried slowing down the sender on the raspberry pi. Now it no longer
stalled! Note that I didn’t say that RR6
arrived. It actually
didn’t. But because the window size was 2 the raspberry pi would send
another data packet, acked by RR7
, which would arrive just fine and
ACK everything up to there.
Is the packet being sent to the radio correctly? Is the radio actually sending it over the air correctly? Is it received? Is it being sent via serial to the receiving computer?
This sounds like something kernel tracing would help with. I could start sniffing the radio traffic using an SDR, but I wasn’t looking forward to banging my head against demodulating D-Star. Seems like I’ll do that if I have to, but will try pure software first.
And remember this is the kernel (AX.25 layer) sending data to the KISS
driver, which in turn sends it on to the serial port (which in turn is
USB or Bluetooth, in my case). So it’s not in user space at any
point. I can’t just strace
.
I search around the kernel source for rfcomm
(bluetooth) and ax25
,
and find some functions that may tell me when something is sent, and
what it is.
kprobe:ax25_kiss_rcv {
printf("rx %p\n", arg1);
}
kprobe:ax25_queue_xmit {
printf("tx\n");
}
This seems to confirm that I have some good endpoints. I see that the packet is being sent, and not received. Let’s go one level lower on the receiver side.
kprobe:ax25_kiss_rcv {
printf("ax25 recv %s\n", kstack);
}
kprobe:mkiss_receive_buf {
printf("mkiss_receive_buf\n");
}
Here I see that the KISS driver is receiving something, but it doesn’t get to the AX.25 layer.
What could the KISS layer on the laptop actually be sending to the radio?
Dumping what’s actually sent to the D74
I couldn’t immediately find any good way to dump payload (but see below), so I just printed the bytes as a string, for decoding in Python. Though this may not work if the packet has nulls in it, I’ve not tested it (again, see below)..
kprobe:rfcomm_tty_write {
printf("%d\n%s", arg2, str(arg1, arg2));
}
#!/usr/bin/python3
f = open('complete-trace')
while True:
l = f.readline()
if l == '\n':
continue
if l == '':
if f.read(1) != '':
raise "Wat"
break
l = l.strip()
n = int(l)
data = f.read(n)
print(' '.join(["%02x" % ord(x) for x in data]))
open('t.dat','w').write(data)
Output:
c0 80 9a 60 a8 90 86 40 e4 9a 60 a8 90 86 40 61 3f 78 a5 c0
c0 80 9a 60 a8 90 86 40 64 9a 60 a8 90 86 40 e1 51 f9 4f c0
c0 80 9a 60 a8 90 86 40 64 9a 60 a8 90 86 40 e1 91 f9 1f c0
c0 80 9a 60 a8 90 86 40 64 9a 60 a8 90 86 40 e1 d1 f8 ef c0
c0 80 9a 60 a8 90 86 40 64 9a 60 a8 90 86 40 e1 11 f8 bf c0
c0 80 9a 60 a8 90 86 40 64 9a 60 a8 90 86 40 e1 11 f8 bf c0
c0 80 9a 60 a8 90 86 40 64 9a 60 a8 90 86 40 e1 11 f8 bf c0
c0 80 9a 60 a8 90 86 40 64 9a 60 a8 90 86 40 e1 11 f8 bf c0
c0 80 9a 60 a8 90 86 40 64 9a 60 a8 90 86 40 e1 11 f8 bf c0
Note the 5 repeats of the last packet. That’s the RR6
packet that’s
not getting there, repeated. Yeah, so far it looks like it’s getting
to the sending radio. But what are those two bytes in the end? I don’t
see those in my tcpdumps. A checksum? And what’s with that 0x80
?
That should be 0x00
indicating a data frame, right?
Looking in the kernel source I see that there’s a checksum added to the KISS stream sometimes (also mentioned in dmesg). That’s odd, I didn’t see that on the wikipedia page.
Reading the kernel code it looks like there are two checksum standards
(why just one? that’d be too easy). If you send data to TNC port 8
(via command 0x80
), then it’s one checksum system. If you send via
port 2 using command 0x20
it’s another. Super. This is a different
kind of “port” from TCP port and axports
, by the way.
Since the checksum is for KISS, meaning for use between the computer
and the TNC, it seems useless for me. Bluetooth/USB serial won’t
corrupt data, right? I try kissparms -p radio -c 1
to turn off
checksums, and it works! So the checksum is calculated wrong? Seems
unlikely, since it works for every other packet. Also I seem to still
have more intermittent corruption that’s unexpected.
But yeah, maybe the checksum calculation is just wrong? Nah, it wouldn’t affect just this packet, and with both CRC algorithms.
Back to printing the payload.
There’s a better way in the next version of bpftrace (it’s not in
v0.10.0, which is the latest release as of time of writing): The
buf()
function and %r
format specifyer. So I download and compile
git HEAD
.
Here’s a bluetooth serial sniffer bpf program using this new better way:
#include <linux/skbuff.h>
#include <linux/tty.h>
kprobe:rfcomm_tty_write
{
$tty = (struct tty_struct*)arg0;
// Optionally print $tty->index
printf("TX %d %r\n", arg2, buf(arg1, arg2));
}
// Other interesting functions:
// * kprobe:rfcomm_recv_data
// * kprobe:rfcomm_tty_copy_pending
kprobe:rfcomm_dev_data_ready
{
$skb = (struct sk_buff*)arg1;
$buf = buf($skb->data, (int64)($skb->len));
printf("RX %d %r\n", $skb->len, $buf);
}
This is example output on the laptop side, where we see the received probes from the remote end, and the ACKs that get dropped transmitted (it’s not quite the same payload, because I experimented with different SSIDs):
[…]
RX 20 \xc0\x80\x9a`\xa8\x90\x86@\xe0d\x8a`\xac\x9a\x84s\x11\x05\xff\xc0
TX 20 \xc0\x80d\x8a`\xac\x9a\x84r\x9a`\xa8\x90\x86@\xe1\x11\xea\xd5\xc0
RX 20 \xc0\x80\x9a`\xa8\x90\x86@\xe0d\x8a`\xac\x9a\x84s\x11\x05\xff\xc0
TX 20 \xc0\x80d\x8a`\xac\x9a\x84r\x9a`\xa8\x90\x86@\xe1\x11\xea\xd5\xc0
RX 20 \xc0\x80\x9a`\xa8\x90\x86@\xe0d\x8a`\xac\x9a\x84s\x11\x05\xff\xc0
TX 20 \xc0\x80d\x8a`\xac\x9a\x84r\x9a`\xa8\x90\x86@\xe1\x11\xea\xd5\xc0
This shows the same thing. All the way to the Bluetooth layer it’s correct. So I’d say it’s either getting dropped over-the-air, by the receiving radio, or by the receiving linux kernel.
But because the checksum is wrong the packets don’t make it to the
AX.25 layer, so they can’t be seen with tcpdump
or axlisten
there.
So I make another bpftrace program to sniff reception on the serial level.
#include<linux/tty_ldisc.h>
#include<linux/tty.h>
#include<linux/skbuff.h>
/*
// Sniff the serial data on the KISS layer.
kprobe:mkiss_receive_buf
{
$tty = (struct tty_struct*)arg0;
$data = arg1;
$buf = buf($data, (uint64)(arg3));
printf("RX KISS (pre CRC) %s %d %r\n", arg3, $tty->tty->index, $buf);
}
*/
// Sniff on the tty layer.
kprobe:tty_ldisc_receive_buf
{
$tty = (struct tty_ldisc*)arg0;
$p = arg1; // data
$count = arg3;
$buf = buf($p, (uint64)(arg3));
$name = str($tty->ops->name);
$num = $tty->tty->index;
if ($name == "mkiss") {
printf("RX (pre CRC) %s(%d) %d %r\n", $name, $num, arg3, $buf);
}
}
Annotated results:
# laptop: Probe received
RX (pre CRC) mkiss 20 \xc0\x80\x9a`\xa8\x90\x86@\xe0\x9a`\xa8\x90\x86@s\x11\xc6\xd9\xc0
# laptop: Probe got through CRC check
RX 20 \xc0\x80\x9a`\xa8\x90\x86@\xe0\x9a`\xa8\x90\x86@s\x11\xc6\xd9\xc0
# laptop: Resending ACK
TX 20 \xc0\x80\x9a`\xa8\x90\x86@r\x9a`\xa8\x90\x86@\xe1\x11\x1e\xdf\xc0
# raspberry pi: ACK bytes received
RX (pre CRC) mkiss 3 \xc0\x80\x9a
RX (pre CRC) mkiss 3 `\xa8\x90
RX (pre CRC) mkiss 3 \x86@r
RX (pre CRC) mkiss 3 \x9a`\xa8
RX (pre CRC) mkiss 3 \x90\x86@
RX (pre CRC) mkiss 3 \xe1\x1e\xdf
RX (pre CRC) mkiss 1 \xc0
# Nothing else. ACK didn't get through CRC check.
See the problem? 20 bytes sent. 19 arrives. 0x11
is gone. Of course
the checksum fails. But how can it just disappear? And it’s like this
every time for this packet.
After some more testing it seems that yes, all 0x11
bytes are
lost. I can’t send packets with 0x11
in them!
What’s so special about 0x11
? It’s flow control bytes for
XON/XOFF.
The radio is stripping 0x11
and 0x13
(also confirmed) out because
they are flow control characters.
Some more testing also showed that the radio sends these characters for flow control, so intended ones get dropped, and then extra ones are added.
While searching for a way to escape these bytes (it’s apparently protocol-dependent, and not as simple as “the XON/XOFF way”. In fact wikipedia says “This is frequently done with some kind of escape sequence”. “Some kind of”… thanks.
While looking through the 9700 advanced manual on page 10-22 I come across a note that only ASCII is supported. Oh. Oh they mean this is only for printable characters, don’t they?
This is when I open minicom
directly against the ports, start
typing, and realize that I’m not working with a KISS TNC-like
interface at all, no. Every character I type is immediately and
correctly sent and received. The KISS communication I was almost
successfully using to send AX.25 packets was actually between the two
linux kernels, not between computers and radios.
I was talking KISS to myself, not to the radios!
All the 0xC0
escapes, 0x80
command, and everything else, was just
a stream of bytes to the radios, to be sent as-is. Even though they’re
not printable ASCII characters, the radio only bothers to drop the
XON/XOFF characters.
Oh.
Well, I learned bpftrace. So there’s that.
Also I learned that I should real the manual more carefully. In my defense it’s three manuals, totalling 296 pages.