A few days ago I was casually watching the 4 MHz band when an unusually long STANAG-4538 (3G-HF) transmission on 4561.0 kHz/USB caught my attention, so I recorded it for later analysis.
In the usual STANAG-4538 ARQ scheme, once the data link connection has been set up, the sending and receiving stations alternate transmissions: the sender transmits xDL PDUs (Protocol Data Units) carrying payload data packets, and the receiver answers with acknowledgment/control packets reporting whether the data packet in the preceding PDU was received without error (1). In this case the LDL protocol is used (Figure 1).
| Fig. 1 - STANAG-4538 LDL transfer |
As per STANAG-4538, the "original" datagram to be sent is split into fixed-length segments, which the chosen LDLn protocol processes into packets. An LDL data packet is defined as a fixed-length n-byte data segment (n = 32, 64, 96, ..., 512) followed by a 17-bit Sequence Number plus an 8-bit Control Field (presently unused). During the construction of the LDL BW3 burst, a 32-bit Cyclic Redundancy Check (CRC) value is computed across the data bits of each data packet and then appended. Then 7 flush bits with the value 0 are added to ensure that the encoder is in the all-zero state after encoding the last flush bit. Summarizing, the on-air length of an LDLn BW3 burst is 8n + 64 bits; in this case (LDL512, n = 512) that is 4096 + 64 = 4160 bits, or 520 bytes (Figure 2).
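The length arithmetic above can be checked with a few lines of Python. This is only a sketch of the bit budget as described in this post; the constant and function names are mine, not taken from the standard.

```python
# Field sizes as described in the text (names are illustrative, not from the standard)
SEQ_BITS = 17    # Sequence Number
CTRL_BITS = 8    # Control Field (presently unused)
CRC_BITS = 32    # CRC appended to each data packet
FLUSH_BITS = 7   # all-zero flush bits

def ldl_burst_bits(n_bytes: int) -> int:
    """Bits in one LDLn BW3 burst carrying an n-byte data segment."""
    if n_bytes not in range(32, 513, 32):
        raise ValueError("n must be a multiple of 32 between 32 and 512")
    return 8 * n_bytes + SEQ_BITS + CTRL_BITS + CRC_BITS + FLUSH_BITS

for n in (32, 64, 512):
    bits = ldl_burst_bits(n)
    print(f"LDL{n}: {bits} bits = {bits // 8} bytes")
```

For LDL512 this gives the 4160 bits (520 bytes) seen in the demodulated bitstream.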
| Fig. 2 - LDL512 demodulated bitstream |
That said, we can inspect the last 64 bits (17-bit Sequence Number + 8-bit Control Field + 32-bit CRC + 7 flush bits) of the recorded BW3 bursts (Figure 3).
| Fig. 3 - last 64 bits of the demodulated LDL512 bitstream |
The 8-bit reserved field appears after the CRC field and not after the Sequence Number as specified in Annex C to STANAG-4538; I don't know whether this is just the way the decoder works. Moreover, the bits of the Sequence Number field appear starting with the least significant bit (bit 0) rather than the most significant bit (bit 16); most likely this too is down to the decoder.
In this sample you can see that Packet Number #3 is sent 20 times, which explains the unusual length of the transmission (the actual number of retransmissions is greater than 20, since some bursts were not correctly demodulated because of their poor SNR and therefore do not appear in the figures). The Packet Byte Count field is always 511, which means it is an LDL512 transfer. The EOM and SOM fields are both set to "0", meaning that it is an "interior" packet. The CRC fields all show the same value.
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
110000 111111111 00 10110010110011010001111111100111 000000000000000
Fields, left to right: Packet Number (3) | Packet Byte Count (511) | EOM, SOM | CRC field | control + flush bits
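The field split used for the rows above can be reproduced with a short parser. This is a sketch of how I read the decoder output, not a reference implementation. It assumes the 6 + 9 + 1 + 1 + 32 + 15 bit grouping shown, LSB-first numeric fields, and EOM before SOM (the flag order is a guess, since both are 0 here). All names are hypothetical.

```python
from typing import NamedTuple

class Trailer(NamedTuple):
    packet_number: int
    byte_count: int
    eom: int
    som: int
    crc_bits: str   # 32 CRC bits, kept in the order the decoder prints them
    tail_bits: str  # 8 control bits + 7 flush bits, as printed

def lsb_first(bits: str) -> int:
    """Value of a bit string whose first character is bit 0."""
    return int(bits[::-1], 2)

def parse_trailer(line: str) -> Trailer:
    bits = line.replace(" ", "")
    if len(bits) != 64 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 64 binary digits")
    return Trailer(
        packet_number=lsb_first(bits[0:6]),
        byte_count=lsb_first(bits[6:15]),
        eom=int(bits[15]),   # assumed order: EOM first, then SOM
        som=int(bits[16]),
        crc_bits=bits[17:49],
        tail_bits=bits[49:64],
    )

sample = "110000 111111111 00 10110010110011010001111111100111 000000000000000"
t = parse_trailer(sample)
print(t.packet_number, t.byte_count, t.eom, t.som)  # 3 511 0 0
```

Note that the byte count is all ones, so it reads as 511 whichever bit order is assumed; only the packet number actually depends on the LSB-first interpretation.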
I stopped recording after more than 5 minutes (see Figure 4). Most of that time (4 minutes and 4 seconds!) was spent on transmissions of the single packet #3; also note the repeated transmissions of packets #2 (2 times) and #4 (6 times) at the beginning and end of the recording (Figs. 2, 3).
| Fig. 4 - duration of the whole recording (software "audacity") |
A second important cause is the Adaptive Code Combining (Hybrid-ARQ). Indeed, STANAG-4538's data link protocols utilize a form of Hybrid-ARQ with code combining. This mechanism is specifically designed to work in poor channels and directly contributes to retransmissions. Instead of just discarding an unreadable packet, the receiver stores the corrupted data. When a retransmitted copy of that same packet arrives, the receiver combines the information from all received copies (the original and all retransmissions) to try and successfully decode it. The fact that it takes over 20 attempts may mean the combined information only reaches the decoding threshold after a significant number of transmissions.
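To get a feeling for why 20 copies might be needed, here is a rough back-of-the-envelope sketch. It assumes an idealized combiner in which k equal-quality copies multiply the effective SNR by k (a gain of 10·log10(k) dB) on a static channel. The real LDL code combining and a fading HF channel will behave differently, and the dB figures are made up for illustration, not measured from the recording.

```python
import math

def copies_needed(per_copy_snr_db: float, decode_threshold_db: float) -> int:
    """Smallest number of copies whose combined SNR reaches the threshold.

    Idealized assumption: combining k copies adds 10*log10(k) dB of SNR.
    """
    deficit_db = decode_threshold_db - per_copy_snr_db
    if deficit_db <= 0:
        return 1  # a single copy already decodes
    return math.ceil(10 ** (deficit_db / 10))

# Hypothetical deficits (dB below the decoding threshold per single copy)
for deficit in (3, 6, 10, 13):
    print(f"{deficit} dB short per copy -> {copies_needed(0.0, deficit)} copies")
```

Under these assumptions, a burst that sits about 13 dB below the decoding threshold would need roughly 20 combined copies, which is the order of magnitude observed here.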
Perhaps we are observing retransmissions "forced" by a higher-layer protocol running over the STANAG-4538 data link (Layer 2), specifically TCP (Transmission Control Protocol, Layer 4). The N_tx limit applies strictly to the LDL/HDL protocol itself. A high retransmission count may arise from the interaction between the two layers, particularly when running IP over HF:
- STANAG-4538 (Layer 2) action
The STANAG-4538 data link protocol tries to deliver a single packet; if all attempts fail, it discards the packet and moves on to the next one. Result: the packet is considered lost by the Data Link Layer.
- TCP (Layer 4) action
TCP is the reliable transport protocol often used for applications such as web browsing, email, or file transfer, and it runs over IP. When a packet is lost by the lower layer (Layer 2, STANAG-4538), the TCP sender never receives an acknowledgment for it. TCP's retransmission timer eventually expires and TCP assumes that the network failed to deliver the data (TCP is not HF-aware!). Crucially, TCP retransmits the data independently of the Layer 2 protocol.
- the domino effect
STANAG-4538 transmits a packet several times, fails, and discards it. TCP times out and retransmits the data. The new TCP segment is handed to STANAG-4538, which treats it as a new packet to send and transmits this "new" packet up to N_tx times again. If it fails again, TCP times out a second time, and the process repeats until the packet is successfully acknowledged.
Anyway, technically TCP cannot directly cause a packet retransmission; only STANAG-4538 does that. TCP increases the likelihood of retransmissions, but it does not initiate them!
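The multiplication effect described above is easy to put into numbers. The sketch below is purely illustrative: the N_tx value of 8 is a hypothetical figure chosen for the example, not one taken from the standard or from this recording, and the function name is mine.

```python
def on_air_transmissions(n_tx: int, failed_tcp_rounds: int, tx_in_final_round: int) -> int:
    """Total Layer 2 transmissions of the same payload.

    Each failed TCP round costs the full n_tx attempts at Layer 2;
    the final round succeeds after tx_in_final_round attempts.
    """
    if failed_tcp_rounds < 0:
        raise ValueError("failed_tcp_rounds must be non-negative")
    if not 1 <= tx_in_final_round <= n_tx:
        raise ValueError("final round must use between 1 and n_tx attempts")
    return failed_tcp_rounds * n_tx + tx_in_final_round

N_TX = 8  # hypothetical Layer 2 retransmission limit
print(on_air_transmissions(N_TX, failed_tcp_rounds=2, tx_in_final_round=4))  # 2*8 + 4 = 20
```

So with a per-packet limit well below 20, two TCP timeouts would already be enough to put the same payload on the air 20 times.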
Summarizing:
Channel conditions were undoubtedly poor.
large packet → high chance of corruption → many retransmissions
| Fig. 5 - Packet Error Probability (PEP) vs. packet length (theoretical model) |
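The trend "larger packet, higher chance of corruption" can be illustrated with the simplest textbook model: independent bit errors and no error correction, so that PEP = 1 - (1 - BER)^L for a packet of L bits. This is an assumption on my part for illustration; it is not necessarily the model behind Figure 5, and the BER value is hypothetical.

```python
def packet_error_probability(ber: float, n_bits: int) -> float:
    """PEP for n_bits independent bits, each wrong with probability ber (no FEC)."""
    if not 0.0 <= ber <= 1.0:
        raise ValueError("ber must be between 0 and 1")
    return 1.0 - (1.0 - ber) ** n_bits

BER = 1e-3  # hypothetical bit error rate on a poor channel
for n in (32, 64, 128, 256, 512):
    bits = 8 * n + 64  # LDLn burst length in bits, as computed earlier in the post
    print(f"LDL{n:<3} ({bits:4d} bits): PEP = {packet_error_probability(BER, bits):.3f}")
```

Even in this crude model the error probability climbs quickly with the packet length, which is why LDL512 is the worst choice on a bad channel.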
[1] recording thanks to linkz KiwiSDR #3 (French Alps)
HDL (High-throughput Data Link protocol) waveforms: BW1 for acknowledgement and traffic management, BW2 for traffic data
LDL (Low-latency Data Link protocol) waveforms: BW3 for traffic data, BW4 for acknowledgement and traffic management
HDL+ waveforms: BW6 for acknowledgement and traffic management, BW7 for traffic data
(2) Why the Limit is Necessary
HF Channel Volatility: HF radio conditions are highly variable due to ionospheric fading and noise. While the LDL protocol uses Code Combining (Hybrid-ARQ) to increase the likelihood of decoding a packet with each attempt, a limit is needed to account for extreme or prolonged channel outages.
Preventing Link Stall: Unlimited retransmissions would cause the entire data link to stall, continually trying to send a packet that may be unrecoverable due to a sudden deep fade or interference. By limiting the attempts, the protocol can decide to terminate the link and allow the higher layers (or the operator) to attempt a new link setup on a different, possibly better, frequency.