Revert "Revert "trx_toolkit/transceiver.py: implement the transmit burst (details)
trxcon: Advance Uplink TDMA Fn by default again (details)
trx_toolkit/transceiver: Do not forward nor log from under tx_queue_lock (details)
trx_toolkit/transceiver: Do not scan tx_queue twice on tx path (details)
trx_toolkit/transceiver: Use with tx_queue_lock instead of manual (details)
trx_toolkit/*: Represent bursts as arrays instead of lists (details)
trx_toolkit/*: Try to avoid copying burst data where possible (details)
Commit
0f4714776a9c9b64c4a7268eb8a346f304835565
by Kirill Smelkov
Revert "Revert "trx_toolkit/transceiver.py: implement the transmit burst queue""
This reverts commit d4ed09df57b3461470af501e9687ddd80eb78838, reinstating tx queue into fake_trx.
It is ok to do so because, as explained in abc63d8d (trx_toolkit/clck_gen.py: Fix clock generator not to accumulate timing error), the reason for GSM clock jitter problem was timing error accumulation in CLCKgen, not problems with py threading.Event.
Note: this restores original tx queue implementation basically as-is with only resolve minor conflicts during the revert. The original tx queue implementation wastes CPU cycles though because it linearly scans the whole tx queue at every TDMA frame. If that CPU usage becomes a real problem it should be straightforward to fix by reworking tx queue to use priority queue instead of unordered array via heapq module from standard library. See https://docs.python.org/3/library/heapq.html for details.
The follow-up patches will make necessarry adjastments for tx-queue to function properly.
Commit
c80e193f6d95367e764684a6021ede981f44ebbd
by Kirill Smelkov
trxcon: Advance Uplink TDMA Fn by default again
This essentially reverts 923e9b0b (trxcon: do not advance Uplink TDMA Fn by default; I838b1ebc54e4c5d116f8af2155d97215a6133ba4) for the following reason:
In trxcon TRX clock is unused, because the signal from BTS is used as the master clock source instead (see 45c821ae/Ic8a5b6277c6b16392026e0557376257d71c9d230 "trxcon: get rid of the timer driven clock module" for details".
Before restoration of tx-queue in fake_trx this was working ok even with fn-advance=0 on Ms side, but after I41291708effdd2c767be680fff22ffbd9a56815e (Revert "Revert "trx_toolkit/transceiver.py: implement the transmit burst queue"") fake_trx is sending frames having Fn when exactly same Fn happens corresponding on fake_trx clock. This results in BTS frames (that are sent with fn-advance=2 by default (see I7da3d0948f38e12342fb714b29f8edc5e9d0933d in osmo-bts.git and OS#4487) to be queued, waited to be sent, and then actually sent to Ms on fn=msg.fn . And then even if Ms replies immediately with that same fn, that message will be dropped by fake_trx as stalled, because fake_trx thinks that the message is too late since that fn already happened according to fake_trx clock.
Here is a trace of how that looks like with 1 BTS and 1 MS(*):
10.009.987 MS <- fn=709 tn=0 # messages of BTS queued previously with that fn=709 are forwarded to Ms 10.010.696 MS <- fn=709 tn=1 10.010.904 MS -> fn=709 tn=0 # <-- MS sends UL message with that same fn=709 _before_ CLOCK fn=710 10.011.397 BTS -> fn=712 tn=0 10.011.507 MS <- fn=709 tn=2 10.011.770 MS <- fn=709 tn=3 10.011.968 MS <- fn=709 tn=4 10.012.156 MS <- fn=709 tn=5 10.012.342 MS <- fn=709 tn=6 10.012.527 MS <- fn=709 tn=7 10.012.914 BTS <- fn=709 tn=0 10.013.166 BTS -> fn=712 tn=1 10.013.524 MS -> fn=709 tn=1 # <-- MS sends UL message with that same fn=709 _before_ CLOCK fn=710 10.013.832 BTS -> fn=712 tn=2 10.013.949 MS -> fn=709 tn=2 # <-- MS sends UL message with that same fn=709 _before_ CLOCK fn=710 10.014.081 BTS -> fn=712 tn=3 10.014.177 MS -> fn=709 tn=3 # <-- MS sends UL message with that same fn=709 _before_ CLOCK fn=710 10.014.361 BTS -> fn=712 tn=4
10.014.475 CLOCK fn=710 # but most of those messages of MS with fn=709 are not picked up 10.014.713 MS -> fn=709 tn=4 # instantly and so become dropped as stale on CLOCK fn=710 10.014.815 MS <- fn=710 tn=0 10.015.032 BTS -> fn=712 tn=5 10.015.687 MS <- fn=710 tn=1 10.016.189 MS <- fn=710 tn=2 10.016.464 MS <- fn=710 tn=3 10.016.648 MS <- fn=710 tn=4 10.016.882 MS <- fn=710 tn=5 10.017.110 MS <- fn=710 tn=6 10.017.336 MS <- fn=710 tn=7 [WARNING] transceiver.py:321 (MS) Stale TRXD message (fn=710): fn=709 tn=1 pwr=0 [WARNING] transceiver.py:321 (MS) Stale TRXD message (fn=710): fn=709 tn=2 pwr=0 [WARNING] transceiver.py:321 (MS) Stale TRXD message (fn=710): fn=709 tn=3 pwr=0 [WARNING] transceiver.py:321 (MS) Stale TRXD message (fn=710): fn=709 tn=4 pwr=0
So without adding some fn-advance it is practically not possible for Ms to be on time with tx-queueing on TRX even if Ms sends its uplink frames right immediately after receiving downlink ones.
This way Ms fn-advance has to be 1 at the minimum, so that immediate UL replies can in principle arrive before fn+1 happens on fake_trx side, even for tn=7. And it is also better to increase fn-advance once more by another +1, to compensate for possible jitter due to OS scheduling latencies and similar things. This way default fn-advance=2 on Ms side becomes symmetric to default fn-advance on BTS side and Ms<->BTS exchange starts to work ok even with tx-queueing activated on fake_trx.
In theory it should be possible to reduce those fn-advances to 1 on both sides, but that will likely require to switch clock granularity from Fn to Tn increasing precision by an order of magnitude, which will likely also result in the need to make architectural change of moving trx to work inside BTS and MS instead of being separate service processes. That's a big task and I'm not delving into that here.
Note: Uplink Fn advance > 0 is needed for Ms when working with regular TRX'es as well. The reason is exactly the same as explained above. In 923e9b0b the reason for setting fn-advance=0 by default was that trxcon is usually being used with fake_trx, and that with fake_trx it is not needed. But after reenabling tx-queueing we have to revisit even fake_trx case again.
(*) the trace was captured with the help of the following debugging patch:
+ trace("%s\t<- fn=%d\ttn=%d" % (trx, rx_msg.fn, rx_msg.tn)) # Transform from TxMsg to RxMsg and forward tx_msg = rx_msg.trans(ver = trx.data_if._hdr_ver) trx.handle_data_msg(src_trx, rx_msg, tx_msg)
--- b/src/target/trx_toolkit/fake_trx.py +++ a/src/target/trx_toolkit/fake_trx.py @@ -29,7 +29,7 @@ import re
from app_common import ApplicationBase -from burst_fwd import BurstForwarder +from burst_fwd import BurstForwarder, trace from transceiver import Transceiver from data_msg import Modulation from clck_gen import CLCKGen @@ -473,6 +473,7 @@ def run(self):
# This method will be called by the clock thread def clck_handler(self, fn): + trace("CLOCK\tfn=%d" % fn) # We assume that this list is immutable at run-time for trx in self.trx_list.trx_list: trx.clck_tick(self.burst_fwd, fn)
--- b/src/target/trx_toolkit/transceiver.py +++ a/src/target/trx_toolkit/transceiver.py @@ -25,6 +25,7 @@ from data_if import DATAInterface from udp_link import UDPLink from trx_list import TRXList +from burst_fwd import trace
Commit
fc9044895d23393f0fb81843012b83221e6183b7
by Kirill Smelkov
trx_toolkit/transceiver: Do not forward nor log from under tx_queue_lock
Even though for 1 BTS + 1 MS fake_trx works ok with tx-queuing, when I try to run two ccch_scan's with 1 BTS fake_trx starts occupy ~ 100% of CPU and emits lots of "Stale ..." messages:
Inspecting a bit with a profiler showed that fake_trx simply cannot keep up with the load.
Let's try to fix this with optimizing things a bit where it is easy to notice and easy to pick up low-hanging fruits.
This is the first patch in that optimization series. It moves blocking calls from out of under tx_queue_lock on transmit path. The reason for this move is not to block receive path while the transmit path is busy more than necessary. I originally noticed tx_queue_lock.acquire being visible in profile of the rx thread which indicates that tx/rx contention on this lock can really happen if we do non-negligible tasks from under this lock. Here, in particular, it was forward_msg that was preparing and actually sending RxMsg to destination. tx_queue_lock is needed only to protect tx_queue itself and synchronize rx and tx threads access to it. Once necessary items are appended or popped, we can do everything else out of this lock.
-> Move everything on the tx codepath, not actually needing access to tx_queue out of this lock:
- only collect messages to be sent under the lock; actually forward them after releasing the log; - same for logging.
Commit
abfd60b3ee7b6763661f59fce76c1e45fb9c0012
by Kirill Smelkov
trx_toolkit/*: Represent bursts as arrays instead of lists
Continuing fake_trx profiling story I noticed that on rx path a noticeable time is spent in converting from ubits to sbits via list comprehensions. By changing burst representation from py list, which stores each item as full python object, to an array, which stores each item as just byte, and by leveraging bytearray.translate, we can speed up that conversion by ~ 10x:
Commit
06456f118d6fcd6d60a9e50df1d8f07b5fde2c8b
by Kirill Smelkov
trx_toolkit/*: Try to avoid copying burst data where possible
Conveying burst data is the primary flow in data place of what fake_trx does, so the less copies we do, the less we make CPU loaded.
After this change I can finally run 1 BTS + 2 Mobile + 1 ccch_scan without hitting "Stale message ..." on fake_trx side. However fake_trx cpu load is close to 100% and there are internal clock overruns often:
[WARNING] clck_gen.py:97 CLCKGen: time overrun by -1385us; resetting the clock [WARNING] clck_gen.py:97 CLCKGen: time overrun by -2657us; resetting the clock [WARNING] clck_gen.py:97 CLCKGen: time overrun by -1264us; resetting the clock [WARNING] clck_gen.py:97 CLCKGen: time overrun by -2913us; resetting the clock [WARNING] clck_gen.py:97 CLCKGen: time overrun by -1836us; resetting the clock ...
This suggests that even though fake_trx.py + tx-queue started to work somehow, the rewrite of fake_trx in C, as explained in OS#6672, is still better to do.