s1gw: fix TC_e_rab_setup: handle PFCP Session related PDUs
S1GW_Tests.TC_e_rab_setup is failing since we introduced PFCP support to osmo-s1gw. The IUT now requires a co-located UPF, which we need to emulate in the testsuite.
Move the inline shell commands from the Makefile to a separate script, so they are easier to edit and maintain. Proper syntax highlighting, no need for all the backslashes + &&, etc.
deps/update: don't fetch repos where COMMIT exists
Instead of unconditionally fetching each git repository, check if the commit we want to checkout already exists in the git repository. If that is the case, then don't fetch it.
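A minimal shell sketch of this check (the function name, its arguments and the output format are illustrative, not the actual deps/update script):

```shell
# Hedged sketch: only fetch if the desired commit is not already
# present in the local clone of the dependency.
fetch_if_missing() {
	repo_dir="$1"   # local clone of the dependency
	commit="$2"     # commit we want to check out
	# "git cat-file -e <sha>^{commit}" exits 0 iff the commit exists locally
	if git -C "$repo_dir" cat-file -e "$commit^{commit}" 2>/dev/null; then
		echo "[$(basename "$repo_dir")] commit present, skipping fetch"
	else
		echo "[$(basename "$repo_dir")] fetching"
		git -C "$repo_dir" fetch
	fi
}
```

This also keeps offline builds working, since no network access happens when all commits are already available.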
Instead of having a silent fetch and commits printed to stdout without information about the repository they belong to, change the output to have one line per git action and to include the repository name in each of them.
Example output:
[titan.ProtocolEmulations.M3UA] Checking out b58f92046e48a7b1ed531e243a2319ebca53bf4c
[titan.ProtocolModules.IP] Checking out 1be86705f39ae38f3c04b2109806ee20d25e91d0
[titan.ProtocolModules.GTP_v13.5.0] Checking out 6b769f985eb91bf5a4332f29faa4a043b23ce62e
[titan.ProtocolModules.ICMP] Checking out e49d9fb9f7de637b4bf4803dc6b6e911a8661640
[osmo-uecups] Initial git clone
[titan.ProtocolModules.DIAMETER_ProtocolModule_Generator] Checking out ffd939595a08da1b8c8176aaa1f8578bfe02a912
[titan.ProtocolModules.L2TP] Checking out 17e76d3662bd0bb815158e8a9de1ec413f21b530
[titan.ProtocolModules.ICMPv6] Checking out 46f4d9b6e1e3c794294a92588401a81e4881dd27
[titan.ProtocolModules.LLC_v7.1.0] Checking out 09817f113255d7fb56f1d45d3dd629a093d9248d
[titan.ProtocolModules.M3UA] Checking out c496d298876fed55c2b730278b7ee77982555563
[titan.ProtocolModules.PFCP_v15.1.0] Checking out d550ad9ddb6f9c823c9a555254cd76cf0e738d18
[titan.ProtocolModules.MobileL3_v13.4.0] Checking out b6602eb357673f097ea1a1d22edd568ecd239da1
[titan.TestPorts.TELNETasp] Checking out 873fe539642542cd9a901c208f1ec11c6d2f5387
[titan.TestPorts.SIPmsg] Checking out 78bf0daf8c599d374089d97a054914d8439d133a
[titan.TestPorts.UDPasp] Checking out 54176e95850654e5e8b0ffa2f1b5f35c412b949c
[titan.ProtocolModules.BSSGP_v13.0.0] Checking out e97d92a8b66bec399babea52f593771b76cb175a
[titan.ProtocolModules.BSSMAP] Checking out 4acb6ab5f058477f0b90c2da182d52054e3614b0
[osmo-uecups] Updating URL to https://gerrit.osmocom.org/osmo-uecups
[osmo-uecups] Checking out 8362efef7c6fa341eb947a75786878e0685767b7
Running `make deps` for the first time fetches all the dependencies. Running `make deps` again currently results in unnecessary git-fetch and get-checkout operations for each dependency.
This is not as bad as cloning dependencies from scratch every time, but still takes time and triggers unnecessary requests to the servers. It's also creating problems when building testsuites offline.
This patch makes the build system a bit smarter, so that it only tries to update dependencies if 'deps/Makefile' has changed.
When running in podman, the source files from the testsuite get copied to a temporary directory to build the testsuites out-of-tree (avoiding conflicts with possibly incompatible binary objects that may exist from previously building the testsuites on the host).
This also copies additional scripts for preparation / clean up that may be used in testenv.cfg. Use the --archive flag with rsync to ensure that the executable bit is preserved. I could have also used --executability, but --archive covers the two flags we were already passing plus additional ones, which may help us avoid running into unexpected situations such as this one.
Without this patch, there was a bug when:
* first creating a shell script but not making it executable
* running testenv with podman (where rsync runs and creates the file initially without executable permissions)
* making the script executable
* running testenv with podman again: rsync will not adjust the permissions for the copy of the file
* user wonders why there is a "sh: 1: script.sh: Permission denied" error
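The difference can be reproduced with a small shell sketch (temporary directories only; this illustrates the rsync behavior, it is not part of the patch):

```shell
# Without --archive, rsync skips an existing, otherwise unchanged
# destination file and never updates its permissions; with --archive
# (which implies -p) it keeps the permissions in sync.
src=$(mktemp -d); dst=$(mktemp -d)
printf '#!/bin/sh\necho hi\n' > "$src/script.sh"     # created non-executable
rsync -r "$src/" "$dst/"                             # initial copy
chmod +x "$src/script.sh"                            # user makes it executable
rsync -r "$src/" "$dst/"                             # unchanged content: perms stay stale
r_exec=no; [ -x "$dst/script.sh" ] && r_exec=yes     # should still be "no"
rsync -a "$src/" "$dst/"                             # --archive syncs the exec bit
a_exec=no; [ -x "$dst/script.sh" ] && a_exec=yes     # should now be "yes"
echo "after -r: $r_exec, after -a: $a_exec"
```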
Testenv may try to run a command in podman after the container was stopped, if there is a bug in the shutdown logic. Give a meaningful error in that case, instead of failing later with a cryptic error in subprocess.run() because None was passed inside cmd (for the container name) instead of a string.
In preparation for adding the initial testenv.cfgs for ggsn, allow copying full directories with copy= too. This will make the ggsn testenv.cfg files easier to maintain.
Add a file in the root dir of the repository to allow running "ruff format" in order to auto-format the code with expected max line length, PEP-8, etc.
Replace _testenv/pyproject.toml with .ruff.toml in the root directory of the repository, so we can exclude "compare-results.py" which doesn't follow that code style. Otherwise it would get formatted too when running "ruff format" in the root dir of the repository.
* Support using wildcards for the config names via fnmatch, as that makes it much easier to run the ggsn tests against all osmo-ggsn config variations, and update the examples in "testenv.py -h" to illustrate this.
* Fix that it didn't complain about an invalid --config argument, as long as there was a valid --config argument before it.
* Let raise_error_config_arg only output the invalid --config argument instead of all of them.
* Complain if "--config all" is used in combination with another --config argument.
* Sort testenv*.cfg files found alphabetically, so they are always executed in the same order.
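The wildcard matching is done with Python's fnmatch in testenv.py; as a rough shell illustration of the same glob-style matching (the helper name is hypothetical):

```shell
# Hypothetical helper: does a --config pattern match a config name?
matches_config() {
	# $1: pattern given via --config (may contain * wildcards)
	# $2: candidate config name (e.g. "osmo_ggsn_v4_only")
	case "$2" in
		$1) return 0 ;;   # unquoted $1: treated as a glob pattern
		*) return 1 ;;
	esac
}
```

For example, matches_config 'osmo_ggsn_*' osmo_ggsn_v4_only succeeds, while matches_config 'osmo_ggsn_*' open5gs fails.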
I had moved osmo-ggsn related files to the osmo-ggsn directory and forgot to adjust testenv_osmo_ggsn_{v4,v6,v4v6}_only.cfg. Fix them by changing them to match testenv_osmo_ggsn_all.cfg.
Add libosmocore-utils, so osmo-config-merge is installed when running with --binary-repo too. The osmo-config-merge program is used in osmo-ggsn/testenv.sh to merge the configs.
library: add generic Mutex API for parallel components
In certain scenarios, it's required to ensure that only one of multiple parallel components executes a specific code block at any given time.
This, for example, is the case for the S1GW testsuite, where we want to simulate multiple eNBs establishing E-RABs. Each new E-RAB triggers the IUT (osmo-s1gw) to send a PFCP Session Establishment Request, and there is no way for the PFCPEM to correlate which session belongs to which eNB. This problem can be solved by ensuring that only one eNB is triggering the PFCP Session Establishment Request(s) at a time.
This patch implements a generic Mutex API, which can also be used by other testsuites that orchestrate multiple parallel components.
library/PFCP_Emulation: a better PDU routing concept
In the recently merged 2962d170 I wrongly assumed that the SEID of outgoing PFCP PDUs can be used to correlate and route the incoming PDUs. In fact, the PFCP peers use two different SEID values, negotiating them using the F-SEID IE.
We could have implemented logic to look for the F-SEID in the outgoing PDUs, store it and then use it for routing. However, a more flexible approach is to allow the PFCP_ConnHdlr components to subscribe and unsubscribe to/from specific SEID values explicitly.
In this spirit, let's allow the PFCP_ConnHdlr components to subscribe and unsubscribe to/from broadcast PDUs (i.e. those for which the PFCPEM component could not find a single recipient) explicitly.
Implicit routing using the SeqNr remains unchanged and will be performed by the PFCPEM component automatically like before.
Change-Id: I25802471519fa297ad4cb2b056adaa6748b00af2
Related: 2962d170 "library/PFCP_Emulation: fix routing of incoming PDUs"
library: as_pfcp_ignore(): log SeqNr of received PDUs
Printing the PFCP PDU template ('?' by default) is not very informative when reading logs. Printing the message type of the received PDU is not informative either, because message types are defined as numbers in PFCP_Types.ttcn. Printing the whole PDU is way too verbose, and would be redundant given that the PFCPEM component already does print all received PDUs. Let's print the sequence number.
Revert "s1gw: cache PFCP Recovery Timestamp in ConnHdlr"
This reverts commit 7ad95e1cfb00d269069bd052c44a9cae9027f763.
A follow-up commit will remove the need for each ConnHdlr to call f_ConnHdlr_register_pfcp(), which, along with handling the PFCP association, retrieves a PFCP Recovery Timestamp from the PFCPEM.
Caching the PFCP Recovery Timestamp value is not really worth it, since it's rarely used and can always be retrieved on demand.
s1gw: move PFCP association handling into a dedicated ConnHdlr
Previously, the PFCP association request from the IUT was handled by the first ConnHdlr component (idx := 0). While this approach has worked, it fails when multiple ConnHdlr instances (idx > 0) are spawned.
The problem arises when other ConnHdlr (idx > 0) instances initiate PFCP procedures before the first ConnHdlr (idx := 0) has established the association, so we end up with race conditions.
This patch introduces a dedicated ConnHdlr component to handle the PFCP association independently. Once the association is established, the actual test ConnHdlr instances are spawned, ensuring a more reliable and orderly process.
The idea is to simulate multiple eNBs establishing one or more E-RAB(s) simultaneously. In order to achieve that, use the new Mutex API to ensure that only one ConnHdlr component is triggering PFCP session establishment at any given time.
The problem is that there is no way for the PFCPEM component to correlate which PFCP session belongs to which eNB when multiple ConnHdlr instances establish E-RAB(s) in parallel. This can be solved by making a part of the test scenario synchronous.
When starting podman, set the following sysctls to avoid ICMP redirects. ICMP redirects lead to test failures (TC_pdp4_clients_interact in the GGSN testsuite), and should not be sent in the test environment in general.
Both "all" and "default" really need to be set, otherwise ICMP redirects still show up. I've seen both being set in this patch: https://patchwork.kernel.org/project/linux-kselftest/patch/1570719055-25110-4-git-send-email-yanhaishuang@cmss.chinamobile.com/
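As an illustration only (the exact sysctl names are in the patch itself; the ones below are an assumption on my side), setting both variants when starting the container would look like:

```shell
# Assumed sysctls (illustrative): both the "all" and "default" variants
# are needed, otherwise interfaces created later still send/accept
# ICMP redirects. "$IMAGE" is a placeholder for the testenv image.
podman run --rm \
	--sysctl net.ipv4.conf.all.send_redirects=0 \
	--sysctl net.ipv4.conf.default.send_redirects=0 \
	--sysctl net.ipv4.conf.all.accept_redirects=0 \
	--sysctl net.ipv4.conf.default.accept_redirects=0 \
	"$IMAGE" /bin/true
```

Network sysctls are namespaced, so podman's --sysctl can set them per container without touching the host.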
Fix that testenv complains about a missing setcap program, if it is in /usr/sbin/setcap and /usr/sbin is not in PATH as it is the case with Debian. We actually run setcap with sudo when it is needed, and in that case /usr/sbin gets added to PATH in Debian.
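A sketch of such a lookup (hypothetical helper, not the actual testenv code, which is Python):

```shell
# Hypothetical helper: look up a program with the sbin directories
# appended to PATH, mirroring what sudo effectively does on Debian.
find_program() {
	PATH="$PATH:/usr/sbin:/sbin" command -v "$1"
}
```

With this, find_program setcap would locate /usr/sbin/setcap on Debian even when /usr/sbin is not in the user's regular PATH.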
Remove mongodb-org.list at the end of building the podman image, as we only need to install mongodb once in the container but won't use the repository afterwards. This avoids checking the mongodb repository in "apt update".
Install erlang-nox and use the pre-built rebar3 as linked from rebar3.org, instead of using the Debian package to avoid pulling in ~600 MB of GUI dependencies.
* Print log levels.
* Don't print categories as hex.
* Print the basename at the end of the line.
* Remove "logging level lgtp debug", there already is "logging level lgtp info" further above, and this is a more sensible setting. With "debug" there are way too many log messages in e.g. TC_lots_of_concurrent_pdp_ctx.
Replace the dummy netdev that was used as network device reachable through the GTP tunnel that can answer ICMP, with a bridge device. The bridge device fulfils the same purpose, plus it can be used in a future patch to connect osmo-ggsn when it is running in QEMU with the testsuite.
During code review it was decided that we want to keep the 127.0.0.1 (and other 127.0.0.x) IPs in the configs, so one can start the testsuite with osmo-ggsn directly on the host without using testenv scripts too, with the same config.
The testenv script for osmo-ggsn will replace 127.0.0.x with 172.18.3.x on the fly before the testsuite starts, so we can run osmo-ggsn optionally in QEMU on 172.18.3.2, which will be bridged to the host.
172.18.3.1 will be used by the GGSN testsuite now, instead of 172.18.3.201 as previously planned, so change the default IP of the bridge. The bridge is not used for another testsuite yet.
Add the 201 IPs as EXTRA_IPS for the non-QEMU case, as they are configured as DNS IPs and tests need to be able to reach them.
Replace IPs in testenv.sh so the SUT runs on 172.18.3.2 (testenv0 bridge) instead of 127.0.0.2 (lo). Later on, when we optionally run osmo-ggsn in QEMU to test kernel GTP-U, it will run on this IP as well. So with this change we can use the same IP for both the QEMU and non-QEMU case.
Access the VTY of osmo-ggsn via 172.18.3.2 (127.0.0.2 if running without testenv), so the testsuite can access the VTY when osmo-ggsn optionally runs in QEMU too (through the bridge).
Add two new arguments -C|--custom-kernel and -D|--debian-kernel. If any of these is set, pass an environment variable TESTENV_QEMU_KERNEL with the path to the kernel when running commands from testenv.cfg.
These commands can then source the new qemu_functions.sh and use it to build an initramfs with the SUT and depending libraries on the fly, and start up QEMU to boot right to starting the SUT. All of that takes about ~1s on my system with kvm. Without kvm ~5s.
A follow-up patch will adjust the ggsn testenv configs to optionally run osmo-ggsn in QEMU for testing kernel GTP-U.
These scripts are based on scripts/kernel-tests from docker-playground.
library/GTPv1U_Templates: support sending ext hdrs
Replace the seq (sequenceNumber) parameter in ts_GTP1U_PDU with opt_part (GTPU_Header_optional_part). opt_part contains seq:
type record GTPU_Header_optional_part {
        OCT2                            sequenceNumber,
        OCT1                            npduNumber,
        OCT1                            nextExtHeader,
        GTPU_ExtensionHeader_List       gTPU_extensionHeader_List optional
}
With this change it is possible to set the extension headers too when sending GTPU packets. This is in preparation for a GGSN test case with extension headers.
When running testsuites with multiple configurations in a row, as it is the case with the ttcn3-ggsn jobs in jenkins, the podman container gets restarted whenever switching to the next config.
Use a different name for each container by appending a restart count. This should fix that podman sometimes hadn't fully shut down the previous container yet and complains that the container name is already in use. This happens even though we use "podman kill" and "podman wait" on the previous container. When checking later, the container is really gone and the same name can be used; it seems that it just needs some more time to shut down in some cases.
Fix for:
> Error: error creating container storage: the container name
> "testenv-ggsn_tests-osmo_ggsn_-osmocom-nightly-20241012-0752-2eb85125" is
> already in use by "8b7ea42371a922ffbf4e966b853124b98cd25c9905ae443fefb4115a103d7779".
> You have to remove that container to be able to reuse that name.: that name is already in use
It is possible to run the GGSN testsuite in a lot of ways (as it was ported from docker-playground, which had the same variations but with a less consistent way of running them).
Document how it is typically run for development / in jenkins. This should make it easier for users, in addition to testenv already telling which configs are available if trying to run the ggsn testsuite without the -c argument, and to the general help output in "./testenv.py run -h".
$ ./testenv.py run ggsn
[testenv] Using testsuite ggsn_tests (via alias ggsn)
[testenv] Found multiple testenv.cfg files:
[testenv] * testenv_open5gs.cfg
[testenv] * testenv_osmo_ggsn_all.cfg
[testenv] * testenv_osmo_ggsn_v4_only.cfg
[testenv] * testenv_osmo_ggsn_v4v6_only.cfg
[testenv] * testenv_osmo_ggsn_v6_only.cfg
[testenv] Select a specific config (e.g. '-c open5gs') or all ('-c all')
cosmetic: library/GTPv1C_Templates: remove extra indentation level
The extra first indentation level around 99% of the file just wastes space, which makes it difficult to keep templates at an acceptable width. Do similarly to what we already do in lots of other template files which were added later than this one.
We already have an IPCP_Types.ttcn, and the GTPv1C_Types from ProtocolModules dep we use doesn't actually specify any record for IPCP, so those are totally protocol independent.
That enum is PAP related, plus it doesn't really match the section describing it, plus it's not used anywhere. Looks like a development artifact which was not dropped before submitting.
These tests so far only cover retrieval of the MTU over PCO, which is only used for IPv4 APNs. When IPv6 is in use, the MTU is expected to be retrieved over IPv6 SLAAC RA. Such tests will be added in a follow-up patch once the related procedure is implemented in osmo-ggsn.
- instead of "-1", print "not present", so humans know what is happening.
- the comma separated args in setverdict() create a lot of weird quotes. Use string concatenation to have only one set of quotes around the entire error message.
Related: OS#6545
Tweaked-by: Oliver Smith <osmith@sysmocom.de>
Change-Id: I672fcef819a6542a5b3bcfa0a6d9c84d34b468f3
If the SUT crashes inside QEMU, copy the coredump via 9p to the outer system (either host or podman) where we have the same binaries and also debug symbols, and run gdb there to show the backtrace.
When a testsuite has multiple testenv.cfg files, the user needs to explicitly choose a config, or "-c all" for all configs. Improve the help output to directly print the arguments that need to be passed, instead of printing the config file names. Mention that wildcards can be used too.
Old:
[testenv] Found multiple testenv.cfg files:
[testenv] * testenv_generic.cfg
[testenv] * testenv_sccplite.cfg
[testenv] * testenv_vamos.cfg
[testenv] Select a specific config (e.g. '-c generic') or all ('-c all')
New:
[testenv] Found multiple testenv.cfg files, use one of:
[testenv] -c generic
[testenv] -c sccplite
[testenv] -c vamos
[testenv] You can also select all of them (-c all) or use the * character as wildcard.
When gen_makefile.py from osmo-dev fails, it is likely that the osmo-dev git clone is outdated, for example if a new file with configure options was added to osmo-dev.git and is now being used by testenv. Display a hint for pulling this repository to the user.
testenv: remove dead code for [testsuite] prepare=
Remove some WIP code that I intended to use for enabling the mongodb repository before installing binary packages, to be able to dynamically install mongodb from there. I solved it differently by just always having mongodb in the podman image.
This was a dead code path because configs with prepare= in [testsuite] are currently not valid, and therefore testenv refuses to use these configs (see keys_valid_testsuite in testenv_cfg.py's verify()).
I have a different use case for running prepare= before running the testsuite, to replace a module parameter in the testsuite's config. This will be done in the next patch.
Move the execution time of prepare and clean scripts in testdir.prepare() after the testsuite config has been copied to the testdir, so it can be modified by the prepare script.
RAN_Emulation: Introduce field ranap_connect_ind_auto_res
This field allows skipping the automatic response to the connect_ind, hence allowing ConnHdlr to skip it entirely, delay it, or generate a CREF by means of sending RAN_Conn_Prim:MSC_CONN_PRIM_DISC_REQ to RAN_Emulation, as per ITU Q.711 Figure 8.
RAN_Emulation: Allow setting reason in primitive MSC_CONN_PRIM_DISC_REQ
This allows setting a specific reason in the CREF transmitted on the wire, other than "End user originated (0)". A follow-up commit will add a test in HNBGW_Tests where an emulated MSC answers with CREF reason "End user failure (0x02)".
RAN_Emulation: Support registering IuSigConId for connectionless RANAP messages
This allows dispatching received RANAP connectionless (UNITDATA) messages which target potentially existing connections identified by IuSigConId, like RANAP Reset Resource (Ack) messages. Dispatching it to relevant ConnHdlrs allows explicitly waiting to receive the message and answer from there.
Store pars into the component field "g_pars" before calling void_fn. This simplifies ConnHdlr test functions and also avoids potential problems modifying pars vs g_pars. This is the same as we do in lots of other testsuites.
In very rare cases it seems podman is just crashing for no apparent reason in jenkins. Add logging to the main script we run inside podman, and run podman with a logfile attached to figure out why.
3GPP TS 36.413, section 9.1.3.2 "E-RAB SETUP RESPONSE" defines the following two IEs as optional:
* E-RAB Setup List IE: 0..1 in the Range column means that it can either be omitted (0) or included only once (1);
* E-RAB Failed to Setup List: 'O' in the Presence column.
Our templates for this S1AP PDU require the former to be a value (as if it was mandatory) and do not allow passing the latter.
hnbgw: Send meaningful RANAP messages in f_tc_ranap_mo_disconnect()
Fix the code to send the messages that were for sure intended, where an MO disconnect is triggered. This sticks closer to reality, plus it makes it easier to follow the test in wireshark and in the code.
Terminate the watchdog process before killing the podman container. This avoids bogus errors from a race condition where the container gets killed first, and then the watchdog process tries to feed the watchdog and fails:
[testenv] Stopping podman container
[testenv] + ['podman', 'kill', 'testenv-hnbgw-all-osmocom-latest-20241031-1222-f34534a5-1']
e41700779a8ca5daf18ac5daa27d59a84d8442196e352f2756a19baf0592cf89
Error: no container with name or ID testenv-hnbgw-all-osmocom-latest-20241031-1222-f34534a5-1 found: no such container
[testenv] podman container crashed!
While at it, use "check=False" with the "podman kill" command, so we avoid additional error messages if the container was already killed at that point (could happen through a bug). If we fail to kill it here, it is not a problem because the watchdog will ensure it terminates shortly after the watchdog process was stopped.
Two tests are failing if the MTU is 65536 instead of 1500. This is an upstream bug in titan.TestPorts.SCTPasp: https://gitlab.eclipse.org/eclipse/titan/titan.TestPorts.SCTPasp/-/issues/3
Add a workaround so the behavior of the test environment is the same as with docker-playground and the tests can pass again.
* When the testsuite stopped and using podman, check if it stopped because the container crashed and raise an exception.
* Even after 9eb5e696 ("testenv: make podman stop more robust"), it sometimes happens in jenkins that the container gets stopped on purpose because the testsuite is done, but the watchdog process then prints an error saying it crashed (without actually stopping testenv at this point). Change the message to a debug message that just says it stopped; this should not be an error.
Increase the timer from 10s to 60s, as with 10s I see jobs failing with:
ERROR: /tmp/watchdog was not created, exiting
In theory 10s should already be enough, my guess is that if a jenkins node is currently under a lot of load then the feed command may take several seconds and so we hit the previous timer. Even if this is not the cause, I think it is good to rule it out.
Exiting after 60s if the jenkins job was (manually / with connection loss) aborted is still relatively quick.
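The watchdog side can be sketched like this (function name, file path and the 1 s polling interval are illustrative, not the actual script):

```shell
# Illustrative sketch: wait up to $timeout seconds for the watchdog
# file to appear, then give up with the error quoted above.
wait_for_watchdog() {
	watchdog_file="$1"
	timeout="$2"   # seconds; this patch raises it from 10 to 60
	i=0
	while [ ! -e "$watchdog_file" ]; do
		i=$((i + 1))
		if [ "$i" -gt "$timeout" ]; then
			echo "ERROR: $watchdog_file was not created, exiting"
			return 1
		fi
		sleep 1
	done
}
```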
statsd: Support f_statsd_snapshot() API when using VTY-triggered stats report
Until now that API was only used in testsuites which relied on time-triggered reports. This commit also supports getting a given snapshot if the IUT is configured to only trigger a report based on a VTY request.
Show the testsuite results from junit-*.log not after each testenv*.cfg file is through, but for all of them at the end. This way the results are in one place when running with multiple configs, we don't need to scroll to the middle of the huge log to find out what tests passed with a previous config.
Adjust the podman container stop and restart logic, so the last container is still running until we use it for showing the results.
* Don't do "podman wait" when restarting the container. The idea was to wait until the current container had shut down before starting one with the same name. But even with the wait we got "the container name ... is already in use" errors, and so we use different names when restarting the container since 6fe837de ("testenv: podman: restart_count in container_name"). This means "podman wait" is not needed anymore.
* feed_watchdog_loop: change sleep from 5s to 2s, as we wait up to that long after the container was shut down, before testenv stops. 5s is quite noticeable compared to 2s when running the script locally.
* feed_watchdog_loop: hide stderr of "podman exec" since it will print "Error: container ... does not exist in database: no such container" during shutdown. This is expected as we stop the container, but it looks like an actual error. We already have a more userfriendly message "feed_watchdog_loop: podman container has stopped" that will appear when the "podman exec" fails during shutdown.
Prepare for follow-up patches reworking SS related GSUP templates. Avoid passing "omit" for parameter 'ss' of the receive templates because this will no longer work as expected. Clean up code flow.
Seen while running lots of components concurrently: "RUA_Emulation.ttcn:315 Dynamic test case error: Port CLIENT has more than one active connections. Message can be sent on it only with explicit addressing."
deps: Update titan.ProtocolEmulations.SCCP to upstream master
Until now we were using our own fork with an extra patch with a fix for SCCP conn id 0. This patch, together with other patches, was merged upstream today. Hence, update our dependency to point to current upstream master.
Now that jenkins uses the osmo-*.cfg files from osmo-ttcn3-hacks for the testsuites that were ported to testenv, make sure that these configs enable logging to gsmtap log as it was the case in docker-playground. This gives useful additional context in the pcap files.
bts: use proper ActType in f_TC_data_rtp_loopback()
For the sake of correctness, use c_RSL_IE_ActType_ASS (assignment) when activating TCH/[FH] channels in TC_data_rtp_*. This is the kind of ActType that would normally be used by the BSC.
bsc: Fix sporadic failure in .TC_ho_in_fail_ipa_crcx_timeout
The code path was not waiting to receive DLCX if parameter ignore_mgw_mdcx was set to false. It should wait for DLCX in any case.
Since it didn't wait, the ConnHdlr would finish earlier than expected and MGCP_Emulation would fail when forwarding the DLCX to ConnHdlr:
"""
MGCP_Emulation.ttcn:257 Dynamic test case error: Sending data on the connection of port MGCP_CLIENT to 2023:MGCP failed. (Broken pipe)
"""
Recent commit 51490419 uncovered a problem of passing 'dom := *' to tr_GSUP_CL_REQ, which calls f_gen_tr_ies(), which in turn does not properly handle the '*' template kind:
''' Dynamic test case error: Restriction `present' on template of type @GSUP_Types.GSUP_CnDomain violated. '''
The old code was basically equivalent of passing 'dom := ?', i.e. expecting the OSMO_GSUP_CN_DOMAIN_IE to be present.
Work around the problem by having two alternatives:
Add an argument to run a specific test (if using --test) or a whole testsuite until it fails with "failure" or "error". This helped me in reproducing a race condition in the mgw testsuite (related issue).
I've added 'ruff check' to my pre-commit script. Make it pass initially, so it can detect future bugs. The missing f-string is a bug that causes ggsn testsuites with a custom kernel path to not work.
When using --binary-repo, figure out the -dbg and -dbgsym packages for all dependencies of packages to be installed, and install them as well.
This will make debug symbols available in jenkins, useful for the related issue. Before this patch debug symbols were only available when building locally without --binary-repo.
Pass TESTENV_BINARY_REPO=1 to the podman container if the --binary-repo argument is set. This will be used for the BTS testsuite to figure out from where we need to run fake-trx.
When a program fails to start up, look for the coredump and print a backtrace if it was found (instead of only doing it if a program crashes later on).
testenv: build virtphy from src with --binary-repo
Add logic to build virtphy from source when running with --binary-repo. This extra code path is needed because we currently don't have virtphy packaged (like trxcon and sccp_demo_user), and we need to build the libosmocore binary package instead of building completely from source as we would do it with osmo-dev.
Use ".split(" ", 1)[0]" on the program= value to only look at its first word, so we can later on use it in testenv.cfg file as follows:
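In shell terms, taking only the first word of the program= value looks like this (the real implementation is the Python `.split(" ", 1)[0]` quoted above; the example value is hypothetical):

```shell
# Hypothetical testenv.cfg value: only the first word is the program
# to look up, the rest are its arguments.
program="osmo-ggsn -c osmo-ggsn-custom.cfg"
first_word=${program%% *}   # shell counterpart of Python's .split(" ", 1)[0]
echo "$first_word"          # prints "osmo-ggsn"
```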
The MS Radio Capabilities must include A5 bits to inform the network of supported encryption capabilities. The a5bits of the first access network must be present; later ones can be omitted, meaning the ones of the first one also apply.
The module titan.ProtocolModules.RTP received a fix that avoids crashing (segfault) on the reception of short RTP packets. Let's make sure that this fix is included in our builds as well.
bsc: Fix missing teardown in TC_ho_in_fail_msc_clears_after_ho_detect
Missing handling of teardown messages made the test fail sporadically due to ttcn3 side already closing the SCCP connection when it was still expected to receive messages.
msc: fix a race condition in f_mt_call_establish()
It can happen that the MSC sends a paging request quicker than function f_ran_register_imsi() returns (e.g. when a node executing the testsuite is under significant load). In this case the BSSMAP PAGING message is dropped by the RAN_Emulation CT:
CommonBssmapUnitdataCallback: IMSI/TMSI not found in table
This can be avoided by calling f_ran_register_imsi() *before* sending the MNCC SETUP.req, which is triggering paging.
bsc: osmo-bsc.cfg: Listen IPA Abis and CTRL interfaces on 127.0.0.1
CTRL seems to bind to 127.0.0.1 by default, but IPA Abis listens on 0.0.0.0 by default, which is not needed and may create problems with concurrent instances.
SGSN: BSSGP_ConnHdlr: GMM Service Request: handle PMM IDLE UE correct
24.008, section 4.7.13.3 explicitly mentions that the completion of the lower layer security procedure is an implicit Security Command Accept if the UE is in PMM-IDLE. Extend as_service_request() to handle both cases: the UE being in PMM-IDLE as well as in PMM-CONNECTED.
SGSN: BSSGP_ConnHdlr: GMM Service Request: add support to expect authentication
On a Service Request, the authentication is optional. Either an authentication must happen or the key material from the previous authentication has to be used. The default behavior is still the same.
SGSN: TC_attach_auth_id_timeout: set TMSI to provoke a ID Request
This test case simulates an MS which ignores Identity Requests. To ensure the SGSN will ask for the IMSI, do an Attach Request with a TMSI as identity. Later the ID Request (type IMSI) will be ignored and the test case expects an Attach Reject.
SGSN: f_TC_cell_change_different_*: always expect the auth
The new SGSN will always ask for authentication when receiving an Attach or RA update, which is the correct behaviour as long as the LLC layer doesn't indicate integrity or encryption protection.
Only triplets which haven't been used should be included. The MME will only request and send back a single set of tuples. There shouldn't be any leftovers.
When using --binary-packages, use the osmocom-bb-trxcon etc. binary packages that are now available, instead of only installing the dependencies as binary packages and building these components from source.
PCU_Tests_SNS: SNS Add: handle NS_Alive while waiting for SNS_Ack
Similar to 61ccea9ecadc ("PCU_Tests_SNS: del/change weight: don't fail on NS"), the SNS Add procedure might have to handle an NS Alive PDU in the receive queue while waiting for an SNS ACK.
Currently copy= in testenv.cfg copies files to the target directory using the full source path: copy=dir/file.cfg creates dir/file.cfg (like "cp -a --parents"). This is not very intuitive; change it to create "file.cfg" directly, without the subdirectory. With this change, it behaves the same as "cp -a".
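The old vs. new behavior maps to GNU cp as follows (temporary directories; illustration only):

```shell
# Demonstrate the two copy semantics described above.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/dir"
echo "cfg" > "$src/dir/file.cfg"
cd "$src"
cp -a --parents dir/file.cfg "$dst"   # old behavior: creates $dst/dir/file.cfg
cp -a dir/file.cfg "$dst"             # new behavior: creates $dst/file.cfg
```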
The "server" testsuite is working as well as in docker-playground.git. The "bankd" testsuite is currently failing due to bankd exiting early after starting. The "client" testsuite is not currently working/running in docker-playground; the initial config is copied here for completeness.
This commit hence already allows quickly running the "server" testsuite.
When we receive the PCU_VERSION using tr_PCUIF_TXT_IND we must ignore the included BTS number because the PCU_VERSION is not addressed to a specific BTS. When we send a PCU_VERSION using ts_PCUIF_TXT_IND, we should always use the bts number 0 to be consistent (the BSC/BTS will ignore this number anyway).
Let's fix the usage of tr_PCUIF_TXT_IND and put comments, to make clear why the above applies.
This is needed for test RemsimBankd_Tests.TC_createMapping_exchangeTPDU to work. Add require_vsmartcard_vpcd.sh to give a meaningful error message when running without --podman, if the user doesn't have it installed.
Co-authored-by: Oliver Smith <osmith@sysmocom.de> Change-Id: Ib5ba5075eff4955354fa25d1c605f277e8a6962a
sccp: Let some time for SCCP RLC to reach IUT before finishing test
Otherwise tear down of the test component immediately afterwards creates a race condition where sometimes the RLC message is not sent before closing the socket. As a result, the SCCP-SCOC stays in DISCONN_PEND until T(rel) fires a while later, generating a new RLSD in a follow-up test and disrupting the expectancies of that unrelated test.
Since sccp_demo_user doesn't implement a Layer Manager, the recv() 0 from the socket doesn't automatically tear down the SCCP conn, since it could have several ASPs:
"""
0: asp-asp-srv-client: ss7_asp_xua_srv_conn_rx_cb(): sctp_recvmsg() returned 0 (flags=0x80)
...
asp-srv-client: connection closed
XUA_ASP(asp-srv-client){ASP_ACTIVE}: Received Event SCTP-COMM_DOWN.ind
XUA_ASP(asp-srv-client){ASP_ACTIVE}: state_chg to ASP_DOWN
XUA_AS(as-srv-client){AS_ACTIVE}: Received Event ASPAS-ASP_DOWN.ind
XUA_AS(as-srv-client){AS_ACTIVE}: state_chg to AS_PENDING
Delivering N-PCSTATE.indication to SCCP User 'SCCP Management'
Ignoring SCCP user primitive N-PCSTATE.indication
[Here same 2 lines for SCCP User 'refuser', 'echo', 'callback', 'test_client_vty']
XUA_ASP(asp-srv-client){ASP_DOWN}: No Layer Manager, dropping M-ASP_DOWN.indication
XUA_ASP(asp-srv-client){ASP_DOWN}: No Layer Manager, dropping M-SCTP_RELEASE.indication
"""
This way we have all ports in more or less the same state when handling messages. It should also speed up tests and mitigate sporadic failures under some scenarios where we already accept the SCTP conn instead of rejecting it and waiting for reconnect from client.
Add a new argument that avoids the problem that ./configure refuses to run if it has already been executed in the source dir. (It aborts and asks the user to run "make distclean" first, which is especially annoying if it has to be done in multiple source dirs before being able to build successfully.)
Put the new logic behind an "experimental argument" for now. I think it improves usability greatly and plan to make it the default later when it has been well tested.
I have also considered making the source dir read-only when mounted into podman and this argument is set. This was the original goal of Lynxis' related patch, on whose idea this one is based. But osmo-dev still needs to write into the source dir in case it clones a new repository, so making the sources dir read-only comes with a trade-off and should be a separate flag, which could be added in another patch later.
MGCP_Emulation: Make sure peer is running before Tx
This avoids a DTE with "Broken pipe" if messages are being transmitted while the tear-down process has already started, even if components are created as "alive".
HNBGW_Tests.TC_hnb_disconnected_timeout needs modification since it expects the component to drop the underlying conn towards the IUT when the component is stopped. This is no longer the case when the component is created as "alive". In order to make sure its resources are destroyed, one needs to kill it.
The test TC_dl_cs1_to_cs4 failed sporadically in ttcn3-pcu-test-asan. Due to how the DL data arriving at Gb is split in chunks over RLC/MAC (also based on how the CS changes over time), it may happen that the full PDU content doesn't finish at the exact block number where the PCU expects the DL ACK/NACK. As a result, since the PCU delays finishing the DL TBF and some data for that DL TBF has not yet been ACKed (and since there are no more active DL TBFs), it will decide to retransmit some of the not-yet-ACKed RLC/MAC blocks instead of transmitting nothing. This is an optimization to increase the probability that the MS has received all the data.

We need to account for this possibility in f_dl_data_exp_cs(), used in the mentioned test: it needs to check whether the received DL data block is a retransmission, and use that knowledge to resolve that all data has been transmitted and hence the final condition can be checked.
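The retransmission check can be sketched as follows (Python pseudocode only; the actual change is in TTCN-3 and all names here are illustrative, not the real f_dl_data_exp_cs() code):

```python
def dl_rx_block(seen_bsns, bsn, total_blocks):
    """Track received DL RLC/MAC blocks by BSN (Block Sequence Number).

    A block whose BSN was already received is a retransmission; once every
    BSN has been seen at least once, all data has been transmitted and the
    final DL ACK/NACK condition can be checked.
    """
    retransmission = bsn in seen_bsns
    seen_bsns.add(bsn)
    all_data_rxed = len(seen_bsns) == total_blocks
    return retransmission, all_data_rxed
```

So receiving e.g. BSN 0 again after all BSNs have arrived flags a retransmission while confirming that the full PDU content was already delivered.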
pcu: Fix dummy DL block received due to timer race conditions
Timer X2002, which manages the delay at the PCU between sending a DL TBF Ass over CCCH and starting to transmit for it over PDCH, is clock-time based. As a result, the timer at the PCU process and at the ttcn3 process may time out slightly differently. Hence, it can happen that we request a DL block immediately *before* the timer triggers at the PCU. In that scenario, the PCU transmits a dummy block instead of a data block. Account for this race condition in several tests; some tests already used this formula.
s1gw: f_ConnHdlr_session_delete(): respect any order
It's not guaranteed (nor required) that PFCP Session Deletion Request PDUs are sent in the same order as their respective ERab records are organized in the given ERabList. They can be emitted in any order.
Make f_ConnHdlr_session_delete() more flexible:
* Expect to receive N PFCP Session Deletion Request PDUs;
* For each received PFCP PDU, find the matching E-RAB;
* Make sure that an E-RAB is never released twice;
* Send a PFCP Session Deletion Response.
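In Python-like pseudocode (the actual implementation is TTCN-3; matching by SEID and the field names are assumptions for illustration), the any-order matching boils down to:

```python
def match_deletions(erabs, rx_seids):
    """For each received PFCP Session Deletion Request (identified here by
    its SEID), find the one matching E-RAB and make sure no E-RAB is
    released twice; finally check that every E-RAB got released."""
    released = set()
    for seid in rx_seids:
        matching = [e for e in erabs if e["seid"] == seid]
        assert len(matching) == 1, "no unique E-RAB for SEID %r" % seid
        erab_id = matching[0]["erab_id"]
        assert erab_id not in released, "E-RAB %r released twice" % erab_id
        released.add(erab_id)
        # ... a PFCP Session Deletion Response would be sent here ...
    assert len(released) == len(erabs), "not all E-RABs were released"
    return released
```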
% Deprecated 'sdp audio-payload number <0-255>' config no longer has any effect
% Deprecated 'sdp audio-payload name NAME' config no longer has any effect
% Deprecated 'loop (0|1)' config no longer has any effect
% Deprecated 'allow-transcoding' config no longer has any effect
% Deprecated 'loop (0|1)' config no longer has any effect
% Deprecated 'allow-transcoding' config no longer has any effect
Fix expected behavior of STP according to specs (RFC 4666 4.3.4.5), after osmo-stp got several related fixes in libosmo-sigtran.git Change-Id I85948ab98623a8a53521eb2d2e84244011b39a93 and Change-Id I3dffa2e9c554f03c7c721b757ff33a89961665b5.
These tests allow testing the behavior of scenarios related to dynamic ASP/AS/RKM improved/fixed in libosmo-sigtran.git Change-Ids:
I986044944282cea9a13ed59424f2220fee6fe567
I85948ab98623a8a53521eb2d2e84244011b39a93
I3dffa2e9c554f03c7c721b757ff33a89961665b5
stp: Fix expectancies of TC_clnt_quirk_snm_inactive
The test STP_Tests_M3UA.TC_clnt_quirk_snm_inactive validates the snm_inactive quirk by sending a DAUD before the link is activated, and expecting a DAVA to make sure osmo-stp did indeed process the SNM message. However, osmo-stp used to lack proper route validation based on link state, which means it would incorrectly assume the link for the affected PC (55) in the test was active and hence would answer with a DAVA. After libosmo-sigtran.git Change-Id I928fb1ef5db6922f1386a188e3fbf9e70780f25d this wrong behavior is fixed, and hence osmo-stp starts answering with a DUNA instead of a DAVA, since AS "as-client" has not yet been activated during the test. Fix the test expectancies by expecting a DUNA instead of a DAVA.
stp: Fix brokenness in STP_Tests_M3UA.TC_tmt_loadshare
The test was not even setting the traffic-mode in the IUT. Furthermore, it was expecting pure round-robin behavior, which was the older behavior of osmo-stp when loadshare traffic-mode was selected.
Actually split the test into two, naming them properly (since round-robin is not an AS traffic mode in itself, but a possible implementation of the loadshare traffic-mode).
The new test validates the usual loadshare traffic-mode based on SLS distribution.
stp: Fix brokenness in STP_Tests_IPA.TC_tmt_loadshare
Similar to previous commit for M3UA, this time for IPA. Since in IPA so far the SLS is fixed per ASP, we need to add an extra sender ASP which will get a new asp_id (and hence SLS) so that we can also test traffic being sent/distributed to the 2nd receiver.
es9p_Types_JSON: split headers into separate module
The headers used in the JSON binding of ES9+ are also used in ES2+, ES11 and ES12. Let's split the headers into a separate module, so that we can re-use them in other definitions too.
The host that is requested via the HTTP_Adapter is configured once on initialization. This is fine if the test scenario only has exactly one destination to query. For multiple destinations, this model does not work. Let's add an http_pars parameter to the request functions, so that the user can direct the requests to different hosts dynamically.
Pass --autoreconf-in-src-copy to osmo-dev's gen_makefile.py by default, so we can always avoid errors related to:
* running "./configure" in-tree and out-of-tree (results in "configure: error: source directory already configured; run "make distclean" there first")
* running "./configure" / "autoreconf" with different autotools versions (on host system and in podman container)
I had kept it as an experimental flag at first for better testing, but make it the default now as it seems to work reliably.
The old make dir is cleaned up when the user runs "./testenv.py clean" the next time.
This patch doesn't contain a hash update because the change was merged as a fast-forward, so the commit hash now at master HEAD did not change from the one on our repo fork branch.
The SLS is the same for all messages of a conn sent in one direction, but it doesn't need to be the same value in both directions. Since the SLS value in the other direction is not selected by the test itself, we cannot expect a given specific value. Update the test expectancies.
This started to fail recently since libosmo-sigtran started properly setting SLS values, e.g. libosmo-sigtran.git 7781eb275da41a9b6b1ea5d8b0e802e87a8e9d53 and 0061e8d0bcba3b0ed5ea255588619627d0975380.
SCCP_Templates: Expect either proto class0 or class1 upon rx SCCP
Until recently, libosmo-sigtran only sent class0, but it is now able to send class1 too (0061e8d0bcba3b0ed5ea255588619627d0975380). Adapt the test expectancies.
Use a real-time prio since it really needs to do stuff in real time with high prio. Use a lower RT prio than fake_trx, since that one is the most important piece, providing the clock.
asterisk: Rework test TC_ims_call_mo_after_tcp_conn_closed with new expectancies
The previously expected behavior (and Asterisk-UE implementation) was wrong. Since recently, Asterisk behaves better: whenever the TCP conn is dropped by the peer, it will attempt to re-connect and re-register.
deps: fix overriding recipe for target 'titan.ProtocolEmulations.SCCP'
This patch fixes the following warnings:
Makefile:188: warning: overriding recipe for target 'titan.ProtocolEmulations.SCCP'
Makefile:185: warning: ignoring old recipe for target 'titan.ProtocolEmulations.SCCP'
Makefile:188: warning: overriding recipe for target 'titan.ProtocolEmulations.SCCP/clean'
Makefile:185: warning: ignoring old recipe for target 'titan.ProtocolEmulations.SCCP/clean'
Makefile:188: warning: overriding recipe for target 'titan.ProtocolEmulations.SCCP/distclean'
Makefile:185: warning: ignoring old recipe for target 'titan.ProtocolEmulations.SCCP/distclean'
The problem is that 'titan.ProtocolEmulations.SCCP' is listed in both ECLIPSEGITLAB_REPOS and OSMOGITHUB_REPOS.
Change-Id: Ia215f02fc08d66fb56e7e0e452b75d6e2f6c59c5 Fixes: 207ce0370 ("deps: Update titan.ProtocolEmulations.SCCP to upstream master")
The UEMux is built upon the ConnHdlr component, making it possible to simulate concurrent activity of multiple virtual UEs. This new component will be used in follow-up patches.
So far all of our *_multi TCs have been running the test logic in multiple eNB connections. This is the first TC simulating activity of multiple virtual UEs within a single eNB connection.
Use --disable-remsim-client-ifdhandler as configure argument for osmo-remsim. We don't need this for running tests and this prevents the buildsystem from trying to write to /usr/lib/pcsc/drivers/ which fails the build.
stp: TC_tmt_loadshare*: Use new vty command 'binding-table reset'
Reset the eSLS binding table state before starting the test, to run it with a clean state.
This also fixes TC_unknown_client_dynamic_tmt_loadshare, since the table is now reset after connecting the 2nd dynamic ASP, which allows re-distributing all seeds in the table onto the newly available set of ASPs.
The current path only worked with podman and with osmo-dev. Make it work for the following use cases too:

* without podman, with osmo-dev
* with podman, with binary packages (instead of osmo-dev)
Removing package=no is required, so testenv builds sccp_demo_user from source when running with --binary-packages. This is needed as sccp_demo_user is not packaged (OS#5899).
sccp: testenv: fix run with asan + latest binaries
When running against osmocom:nightly:asan, build sccp_demo_user with --enable-sanitize. Otherwise this code is not running with asan and doesn't even start (as the libraries we link against are built with --enable-sanitize).
When running against osmocom:latest, check out the latest tag instead of current master.
The idea is to have two variants of the MT-Forward-SM.Err:
* _MS: originated by the MS/UE (via RP-ERROR),
* _NET: originated by the network (MSC) itself.
In both testcase scenarios we expect the network to indicate the MT_FORWARD_SM_ERROR on its own, due to the lack of response from the MS/UE. Use the right template kind for that, and expect a specific Cause value.
testenv: clone_project: fix getting latest version
Extend the logic for getting the last version, so it doesn't only work with libosmo-sigtran (where the last version happens to be the last one returned by "git ls-remote --tags") but also for libosmocore where this isn't the case. Filter the versions by the relevant ones and then sort them to get the highest one.
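The filtering/sorting idea can be sketched like this (a minimal sketch; the helper name and the plain X.Y.Z tag format are assumptions, not the actual testenv code):

```python
import re

def pick_latest_version(tags):
    """Filter a 'git ls-remote --tags' style list down to plain X.Y.Z
    release tags and compare them numerically, instead of relying on the
    order in which the remote happens to return them."""
    versions = []
    for tag in tags:
        m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", tag)
        if m:
            versions.append(tuple(int(x) for x in m.groups()))
    return ".".join(str(x) for x in max(versions))
```

Note that with plain lexical sorting "1.9.0" would wrongly beat "1.10.0"; the numeric tuple comparison gets this right.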
Don't try to build a PATH that contains the testsuite dir if running the "clean" action, because then no testsuite is defined.
Fix for:

$ ./testenv.py clean
[testenv] + ['rm', '-rf', '/home/user/.cache/osmo-ttcn3-testenv/git']
Traceback (most recent call last):
  File "/home/user/code/osmo-dev/src/osmo-ttcn3-hacks/./testenv.py", line 137, in <module>
    main()
  File "/home/user/code/osmo-dev/src/osmo-ttcn3-hacks/./testenv.py", line 133, in main
    clean()
  File "/home/user/code/osmo-dev/src/osmo-ttcn3-hacks/./testenv.py", line 117, in clean
    testenv.cmd.run(["rm", "-rf", path])
  File "/home/user/code/osmo-dev/src/osmo-ttcn3-hacks/_testenv/testenv/cmd.py", line 106, in run
    env=generate_env(env),
        ^^^^^^^^^^^^^^^^^
  File "/home/user/code/osmo-dev/src/osmo-ttcn3-hacks/_testenv/testenv/cmd.py", line 72, in generate_env
    path += f":{os.path.join(testenv.testsuite.ttcn3_hacks_dir, testenv.args.testsuite)}"
                                                                ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Namespace' object has no attribute 'testsuite'
Clean up the main directory by moving all buildsystem related scripts into a _buildsystem subdirectory.
Rename gen_links.sh.inc to gen_links.inc.sh while at it, so vim does syntax highlighting as shell script and not bitbake.
The rest of these patches in this series lead up to changing the buildsystem to build out-of-tree (so we don't clutter the source dirs with symlinks and build artifacts) and making the build output more readable.
Make the regen_makefile script more consistent with gen_links.inc.sh by also turning it into an include script. By doing this all previously declared variables are available in regen_makefile, which means export and passing as arguments is not needed anymore, making the resulting users simpler.
Use #!/bin/sh -e while at it and remove empty CPPFLAGS_TTCN3 vars.
The related debian bug has been fixed in 2018, so remove the workaround. I've also verified that the binary is called "compiler" in Arch Linux (as some developers are on Arch).
Make it more obvious that the various gen_links.sh scripts are running with "set -e" by adding it to the #! line instead of setting it through an included file.
Rename ignore_pp_results to gen_links_finish in preparation for the next patch where the function will be used for generating more symlinks instead of writing to a gitignore file. This is a separate commit to make the next one more readable.
The buildsystem used to create symlinks to dependency source files in the testsuite directories, and then build inside that source directory. This led to many unrelated files being in the source directory.
Change the logic to create symlinks to all sources in a separate $BUILDDIR instead (default: _build) and do the build there.
Advantages:
* Source directories are not cluttered with other files anymore.
* Clean up logic becomes much simpler and faster (rm -rf _build instead of generating a Makefile and running "make clean" in every testsuite directory).
* No need to generate gitignore files on the fly anymore.
* Using a separate $BUILDDIR is now possible; this will be used by testenv in a follow-up patch when running with podman, to make sure that build artifacts from podman and not using podman are not mixed, as they are incompatible.
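The core of the new linking step could look roughly like this (a minimal Python sketch under assumed names; the real implementation lives in the shell buildsystem scripts):

```python
import os

def gen_links(builddir, sources):
    """Create symlinks to all dependency source files under a separate
    build dir (default: _build) instead of inside the testsuite source
    dir, so that 'rm -rf <builddir>' is a complete clean-up."""
    os.makedirs(builddir, exist_ok=True)
    for src in sources:
        link = os.path.join(builddir, os.path.basename(src))
        if not os.path.islink(link):
            os.symlink(os.path.abspath(src), link)
```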
CC      IPL4asp_PT.o
CCLD    TCCInterface.so
CCLD    TELNETasp_PT.so
CCLD    MGCP_Test
Instead of the very verbose messages we would get otherwise. Especially the linking message clutters a whole page of terminal output without this path:
if ... g++ ... $ALL_OBJ_FILES; then : ; else ... $ALL_OBJ_FILES; fi
When running with podman, set a separate builddir to avoid conflicts with build objects generated from running "make" outside of podman. As the buildsystem supports setting a different builddir directly now, remove the copy_ttcn3_hacks_dir logic that was used to emulate this feature.
Fix that testenv / ttcn3_start kept running after the testsuites were already done. This was caused by passing an empty string to ttcn3_start as test argument, which causes it to still use the config file, but run in a single test mode:
After the first test ran, ttcn3_start sends "emtc" to the MTC, which replies with "MTC cannot be terminated." as it is still in MC_EXECUTING_TESTCASE instead of MC_READY:
Move f_{dec,enc}_mcc_mnc() API BSSMAP_Templates.ttcn -> GSM_Types.ttcn
The GsmMcc and GsmMnc types used in the functions are defined in GSM_Types.ttcn, which is also included by BSSMAP_Templates. Hence, move the functions there so that they can be used by other testsuites which include the more generic GSM_Types.ttcn but not BSSMAP_Templates.ttcn.
As the testsuites are now in the _build directory, running them might not be as obvious. Add an example to the README. While at it, explain a bit more what the testenv script does and where one can read more about it.
The Osmocom jenkins nodes run inside LXCs. When we get a coredump it appears on the host, fetch it from there via testenv-coredump-helper, which gets added to the hosts in the related patch.
Change the scheduling priority from 10 to 30, as we currently see osmo-bts suffering from scheduling latency in jenkins even though we don't run other jobs at that time:
20250425034138405 DL1C ERROR PC clock skew: elapsed_us=387574, error_us=382959 (scheduler_trx.c:449)
This should prevent the kernel from prioritizing other (userspace or kernel) processes running on the same machine that have a higher priority. We have seen such an improvement after increasing the scheduler priority for osmo-bts-sysmo too (see I2394e6bbc00a1d47987dbe7b70f4b5cbedf69b10).
Priority 30 is higher than 10. From sched(7):
> Processes scheduled under one of the real-time policies (SCHED_FIFO, > SCHED_RR) have a sched_priority value in the range 1 (low) to 99 (high).
This testsuite currently gets executed through docker-playground and it fetches this config from osmo-ttcn3-hacks (see If15461240f3037c142c176fc7da745a1701ae3f8).
Move kill_rm_pidfile out of the 4 ttcn3 tcpdump/dumpcap scripts into a shared include file. Use the version of the function that only tries to kill the command with sudo if it was started with sudo.
This fixes dumpcap not stopping if:
* it was started with ttcn3-tcpdump-start.sh (despite the name it will start dumpcap instead of tcpdump if dumpcap was found), where it gets started without sudo, and
* no rule is set in the user's sudoers file to run kill as root with NOPASSWD.