30 November 2025

STANAG-4538 LDL packets retransmissions, an interesting case

A few days ago I was casually watching the 4 MHz band when an unusually long STANAG-4538 (3G-HF) transmission on 4561.0 kHz/USB caught my attention and I decided to record it for later analysis.
Basically, in the usual STANAG-4538 way (ARQ), after the data link connection has been configured, the sending station and the receiving station alternate transmissions: the sending station transmits xDL PDUs (Protocol Data Units) containing payload data packets, and the receiving station transmits acknowledgment/control packets indicating whether or not the data packet in the preceding PDU was received without error (1). In this case the LDL protocol is used (Figure 1).

Fig. 1 - STANAG-4538 LDL transfer

As per STANAG-4538, the "original" datagram to be sent is split into fixed-length segments which are processed into packets by the chosen LDLn protocol. Indeed, an LDL data packet is defined as a fixed-length n-byte data segment (n = 32,64,96,...,512) followed by a 17-bit Sequence Number plus an 8-bit Control Field (presently unused). During the construction of the LDL BW3 burst, a 32-bit Cyclic Redundancy Check (CRC) value is computed across the data bits of each data packet and then appended. Then, 7 flush bits having the value 0 are added to ensure that the encoder is in the all-zero state upon encoding the last flush bit. Summarizing, the on-air length of an LDLn BW3 burst, in bits, is 8n + 64 (the 64 overhead bits being 17 + 8 + 32 + 7); in this case (LDL512, n = 512) 4096 + 64 = 4160 bits or 520 bytes (Figure 2).
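The length arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the field widths are the ones quoted in the text, while the constant and function names are my own and do not come from the standard.

```python
# Field widths in bits, as listed in the text; the names are illustrative only.
SEQUENCE_NUMBER_BITS = 17
CONTROL_FIELD_BITS = 8
CRC_BITS = 32
FLUSH_BITS = 7
OVERHEAD_BITS = (
    SEQUENCE_NUMBER_BITS + CONTROL_FIELD_BITS + CRC_BITS + FLUSH_BITS
)


def ldl_burst_bits(n: int) -> int:
    """Length in bits of one LDLn data packet: 8n payload bits plus overhead."""
    if n % 32 != 0 or not 32 <= n <= 512:
        raise ValueError("n must be a multiple of 32 between 32 and 512")
    return 8 * n + OVERHEAD_BITS


if __name__ == "__main__":
    for n in (32, 64, 128, 256, 512):
        bits = ldl_burst_bits(n)
        print(f"LDL{n}: {bits} bits = {bits // 8} bytes")
```

For n = 512 the function returns 4160 bits, i.e. the 520 bytes seen in the demodulated bitstream.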

Fig. 2 - LDL512 demodulated bitstream

That said, we can inspect the last 64 bits (17-bit Sequence Number + 8-bit Control Field + 32-bit CRC + 7 flush bits) of the recorded BW3 bursts  (Figure 3).

Fig. 3 - last 64 bits of the demodulated LDL512 bitstream

The 8-bit reserved field appears after the CRC field and not after the Sequence Number, as specified in Annex C to STANAG-4538; I don't know whether this is due to the modus operandi of the decoder. Moreover, the bits of the Sequence Number field are shown starting with the Least Significant Bit (bit 0) rather than the Most Significant Bit (bit 16); most likely this too is the modus operandi of the decoder.
In this sample you may see that Packet Number #3 is sent 20 times, which explains the unusual length of the transmission (actually the number of packet retransmissions is greater than 20, since some bursts were not correctly demodulated due to their bad SNR and hence are not present in the Figures). The Packet Byte Count field is always 511, which means it's an LDL512 transfer. The EOM & SOM fields are both set to "0", meaning that it's an "interior" packet. The CRC fields show the same value.

110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
110000  111111111 00  10110010110011010001111111100111 000000000000000
3       511       EOM CRC field                        control + flush bits
                  SOM
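The field split shown above can be reproduced with a short Python sketch. It assumes the layout exactly as my decoder prints it (6-bit packet number and 9-bit byte count, both LSB first, then the two EOM/SOM flags, the 32-bit CRC and 15 trailing zero bits); the function names are hypothetical and the CRC is kept as a raw bit string, since nothing here verifies it.

```python
def lsb_first_to_int(bits: str) -> int:
    """Interpret a bit string whose first character is the least significant bit."""
    return int(bits[::-1], 2)


def parse_ldl_trailer(trailer: str) -> dict:
    """Split the last 64 bits of a demodulated burst, in decoder output order."""
    bits = trailer.replace(" ", "")
    if len(bits) != 64 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 64 binary digits")
    return {
        "packet_number": lsb_first_to_int(bits[0:6]),
        "byte_count": lsb_first_to_int(bits[6:15]),
        "eom_som": bits[15:17],            # the two flags, order not asserted
        "crc": bits[17:49],                # raw 32-bit string, not checked
        "control_and_flush": bits[49:64],  # 8 control bits + 7 flush bits
    }


if __name__ == "__main__":
    row = ("110000  111111111 00  "
           "10110010110011010001111111100111 000000000000000")
    print(parse_ldl_trailer(row))
```

Applied to any of the twenty rows above, it gives packet number 3 and byte count 511.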

I stopped recording after more than 5 minutes, see Figure 4, and most of the time (4 minutes and 4 seconds!) was spent on the transmissions of the single packet #3; also note the repeated transmissions of packets #2 (2 times) and #4 (6 times) at the beginning and end of the recording (Figs. 2,3).

Fig. 4 - duration of the whole recording (software "audacity")
 
So, why so many retransmissions? 
STANAG-4538 is designed for High-Frequency (HF) radio communication, which is notorious for having a highly unstable and challenging channel environment. The large number of retransmissions may be a direct consequence of the protocol's need to ensure reliable data delivery over this unreliable medium. STANAG-4538 employs an ARQ (Automatic Repeat reQuest) scheme as part of its data link protocols to achieve this reliability. The maximum number of retransmissions in ARQ schemes is specified by the parameter N_tx (or often N_max or N_retries): if the packet is not acknowledged as successfully received after the last retransmission, it is typically discarded by the data link layer and the link connection may be closed (2). This limit sets the trade-off between reliability and efficiency/latency (preventing the channel from being permanently blocked by an unrecoverable packet).
The usual or default value for the N_tx parameter is not specified in standard documentation summaries, not even in STANAG-4538, as it is a configurable parameter of the HF modem which is set by the manufacturer or even by the system operator. In commercial and military HF implementations of similar ARQ schemes, the maximum retransmission count is often set to a value like 4, 8, or 10, but the actual number should be verified against the specific equipment's configuration settings.

One of the causes of retransmissions is poor HF channel conditions: fading, multipath distortion, noise and interference, including man-made interference (from electronics or other users). Any of these factors can cause bit errors in the received packet, which the receiver detects via a checksum (CRC) before requesting a retransmission of the corrupted packet.
A second important cause is the Adaptive Code Combining (Hybrid-ARQ). Indeed,  STANAG-4538's data link protocols utilize a form of Hybrid-ARQ with code combining. This mechanism is specifically designed to work in poor channels and directly contributes to retransmissions. Instead of just discarding an unreadable packet, the receiver stores the corrupted data. When a retransmitted copy of that same packet arrives, the receiver combines the information from all received copies (the original and all retransmissions) to try and successfully decode it. The fact that it takes over 20 attempts may mean the combined information only reaches the decoding threshold after a significant number of transmissions.
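To get a feeling for why combining may need many copies, here is a rough Python sketch. It is not the actual LDL combining algorithm: it assumes an idealized case in which every copy arrives with the same SNR in plain Gaussian noise and the copies are combined perfectly, so that the effective SNR grows by 10·log10(N) dB after N copies. The SNR and threshold figures used below are purely hypothetical.

```python
import math


def combined_snr_db(per_copy_snr_db: float, copies: int) -> float:
    """Effective SNR after ideally combining equal-SNR copies (idealized model)."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return per_copy_snr_db + 10 * math.log10(copies)


def copies_needed(per_copy_snr_db: float, threshold_db: float) -> int:
    """Smallest number of copies whose combined SNR reaches the threshold."""
    deficit_db = threshold_db - per_copy_snr_db
    return max(1, math.ceil(10 ** (deficit_db / 10)))


if __name__ == "__main__":
    # Hypothetical figures: -10 dB per burst, decoder needs about 3 dB.
    print(copies_needed(-10.0, 3.0))
    print(round(combined_snr_db(-10.0, 20), 2))
```

In this toy model a 13 dB shortfall per burst already calls for about 20 copies; on a real fading channel the copies do not all carry the same SNR, so the figure is only indicative.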
In short, the retransmissions aren't necessarily a sign of failure, but rather the core mechanism of STANAG-4538 actively working to provide error-free data transfer despite the inherent unreliability of the HF radio channel.
It should also be considered that the used KiwiSDR receiver [1] is a sort of "man in the middle", i.e. it sees all the traffic on the channel, but this does not necessarily apply to the stations actually involved (not all data/ACK packets can be correctly received by the two peers).

So, it is likely that the N_tx parameter is set to a value greater than 20 and channel conditions are very poor... but we could also consider another cause besides  STANAG-4538 ARQ.
Perhaps we are observing retransmissions "forced" by the higher-layer protocol running over the STANAG-4538 data link (running at Layer 2), specifically TCP (Transmission Control Protocol, running at Layer 4). The limit of N_tx applies strictly to the LDL/HDL protocol itself. A high retransmission count may occur because of the interaction between the two layers, particularly when running IP over HF:
 
- STANAG-4538 (Layer 2) action
the STANAG-4538 data link protocol tries to deliver a single packet and, if all attempts fail, it discards the packet and moves on to the next one. Result: the packet is considered lost by the Data Link Layer.
- TCP (Layer 4) action
TCP is the reliable transport protocol often used for applications like web browsing, email, or file transfer, and it runs over IP. When a packet is lost by the lower layer (Layer 2, STANAG-4538), the TCP sender never receives an acknowledgment for that data. TCP's retransmission timer eventually expires and it assumes the network failed to deliver the data (TCP is not HF-aware!). Crucially, TCP retransmits the data independently of the Layer 2 protocol.
- the domino effect
STANAG-4538 transmits a packet up to N_tx times, fails and discards the packet. TCP times out and retransmits the data. The new TCP segment is handed to STANAG-4538, which treats it as a new packet to send and transmits it up to N_tx times again. If it fails again, TCP times out a second time, and the process repeats until the data is successfully acknowledged.

If the channel conditions are so poor that the STANAG-4538 link fails more than N_tx times in a row before TCP gives up, the total number of retransmissions for a single packet could easily assume a large value, as the higher layer continually resubmits data to the persistently failing lower layer.
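The interaction described above can be put into numbers with a toy Python model. It simply groups the observed transmissions of one packet into link-layer cycles of at most N_tx bursts, each new cycle corresponding to a resubmission from the higher layer. The N_tx values are hypothetical (I don't know the real setting) and the function names are mine.

```python
def split_into_arq_cycles(total_bursts: int, n_tx: int) -> list[int]:
    """Group the transmissions of one packet into cycles of at most n_tx bursts."""
    if total_bursts < 1 or n_tx < 1:
        raise ValueError("both arguments must be positive")
    cycles = []
    remaining = total_bursts
    while remaining > 0:
        cycles.append(min(n_tx, remaining))
        remaining -= cycles[-1]
    return cycles


def higher_layer_resubmissions(total_bursts: int, n_tx: int) -> int:
    """How many times the higher layer would have had to resubmit the data."""
    return len(split_into_arq_cycles(total_bursts, n_tx)) - 1


if __name__ == "__main__":
    # 20 observed bursts with a hypothetical N_tx of 8, then of 25.
    print(split_into_arq_cycles(20, 8), higher_layer_resubmissions(20, 8))
    print(split_into_arq_cycles(20, 25), higher_layer_resubmissions(20, 25))
```

With N_tx = 8 the 20 observed bursts would correspond to three link-layer cycles and two resubmissions, while with N_tx = 25 they fit in a single cycle with no higher-layer involvement.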
Anyway, technically TCP cannot directly cause packet retransmission, only STANAG-4538 does that: TCP increases the likelihood of retransmissions, but it does not initiate them!

Summarizing: 
the N_tx parameter is set to a large value (>20) to prioritize ultimate delivery reliability over speed and to support the Hybrid-ARQ mechanism
or
the N_tx parameter is set to a low value (often in the range of 3 to 10, or a similarly small number) and TCP times out and re-injects the data.

In either case, channel conditions were undoubtedly very poor.
 
Perhaps - in my opinion - it would have been better to use an LDL packet size smaller than 512 bytes (4160 bits including overhead). Under poor channel conditions, a smaller packet size is generally better, even though the number of packets increases. Shorter transmissions (reduced airtime) reduce exposure to fades, have a lower probability of interference and collisions, enable faster ARQ cycles and improve latency:
large packet → high chance of corruption → many retransmissions
small packet → lower chance of corruption → fewer retransmissions (even though more packets)
 
Fig. 5 - Packet Error Probability (PEP) vs. packet length (theoretical model)
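A common textbook way to model this relationship is sketched below in Python. It assumes independent bit errors at a fixed bit error rate and ignores FEC, interleaving and code combining, so it is only an illustration of the trend and not necessarily the exact model behind Figure 5. The BER value is hypothetical.

```python
import math


def packet_error_probability(ber: float, packet_bits: int) -> float:
    """PEP = 1 - (1 - BER)^L, assuming independent bit errors and no FEC."""
    if not 0.0 <= ber <= 1.0 or packet_bits < 1:
        raise ValueError("ber must be in [0, 1] and packet_bits positive")
    return 1.0 - (1.0 - ber) ** packet_bits


def expected_transmissions(pep: float) -> float:
    """Mean number of attempts per packet for plain ARQ (geometric model)."""
    if pep >= 1.0:
        return math.inf
    return 1.0 / (1.0 - pep)


if __name__ == "__main__":
    ber = 1e-3  # hypothetical residual bit error rate
    for n in (64, 128, 256, 512):
        bits = 8 * n + 64
        pep = packet_error_probability(ber, bits)
        print(f"LDL{n}: PEP = {pep:.3f}, "
              f"mean attempts = {expected_transmissions(pep):.1f}")
```

Under these assumptions the error probability of a 4160-bit packet is far higher than that of a 576-bit one, which is the point made above.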

The airtime, or duration, of STANAG-4538 LDLn BW3 bursts is variable and depends on the amount of data being sent, specified by the parameter n. The total burst duration is given by the formula: duration (ms) = 373.33 + n × 13.33, (n = 32,64,96,...,512).
The total airtimes of a single BW3 burst for four common burst sizes (n = 64, 128, 256, 512) are:

n value   Payload duration   Total burst duration
64        853.12 ms          1226.45 ms (≈1.23 seconds)
128       1706.24 ms         2079.57 ms (≈2.08 seconds)
256       3412.48 ms         3785.81 ms (≈3.79 seconds)
512       6824.96 ms         7198.29 ms (≈7.20 seconds)
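The values in the table follow directly from the formula, as this small Python sketch shows. The two constants are taken from the formula quoted above; the names are my own.

```python
BW3_FIXED_MS = 373.33  # fixed part of the burst duration, from the formula
BW3_PER_N_MS = 13.33   # duration added per unit of n, from the formula


def bw3_payload_ms(n: int) -> float:
    """Duration of the payload part of an LDLn BW3 burst, in milliseconds."""
    return n * BW3_PER_N_MS


def bw3_burst_ms(n: int) -> float:
    """Total duration of an LDLn BW3 burst, in milliseconds."""
    return BW3_FIXED_MS + bw3_payload_ms(n)


if __name__ == "__main__":
    for n in (64, 128, 256, 512):
        print(f"n={n}: payload {bw3_payload_ms(n):.2f} ms, "
              f"total {bw3_burst_ms(n):.2f} ms")
```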
 
Longer packets transmit more data per burst, leading to higher throughput, but they are also significantly more vulnerable to errors (higher PEP) since they require a much longer airtime (up to about 7 seconds).

[1] recording thanks to linkz KiwiSDR #3 (French Alps)
 
(1) STANAG-4538 xDL protocols:
HDL (High-throughput Data Link protocol) waveforms: BW1 for acknowledgement and traffic management, BW2 for traffic data
LDL (Low-latency Data Link protocol) waveforms: BW3 for traffic data, BW4 for acknowledgement and traffic management
HDL+ waveforms: BW6 for acknowledgement and traffic management, BW7 for traffic data

(2) Why the Limit is Necessary
HF Channel Volatility: HF radio conditions are highly variable due to ionospheric fading and noise. While the LDL protocol uses Code Combining (Hybrid-ARQ) to increase the likelihood of decoding a packet with each attempt, a limit is needed to account for extreme or prolonged channel outages.
Preventing Link Stall: Unlimited retransmissions would cause the entire data link to stall, continually trying to send a packet that may be unrecoverable due to a sudden deep fade or interference. By limiting the attempts, the protocol can decide to terminate the link and allow the higher layers (or the operator) to attempt a new link setup on a different, possibly better, frequency. 

 
