pfcp_peer: add configurable heartbeat fail threshold + association reset
Introduce heartbeat_fail_threshold (default: 3) that tracks consecutive unanswered PFCP Heartbeat Requests via the heartbeat_fail_count field and resets the association when the threshold is reached. The counter applies to both periodic heartbeats and those triggered via the REST API, and resets to zero on any successful Heartbeat Response.
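The counting rule described above can be sketched roughly as follows. This is a minimal illustration and not the actual pfcp_peer code: the module name, function names, and return tuples are assumptions; only heartbeat_fail_threshold and heartbeat_fail_count come from the commit message.

```erlang
-module(hb_count_sketch).
-export([on_timeout/2, on_response/1]).

%% Called when a Heartbeat Request goes unanswered.
%% Count is the current heartbeat_fail_count, Threshold is
%% heartbeat_fail_threshold (default: 3).
on_timeout(Count, Threshold) when Count + 1 >= Threshold ->
    %% Threshold reached: the caller resets the association
    %% and the counter starts over.
    {reset_association, 0};
on_timeout(Count, _Threshold) ->
    {keep, Count + 1}.

%% Any successful Heartbeat Response clears the counter.
on_response(_Count) ->
    0.
```

With the default threshold of 3, the third consecutive timeout yields `reset_association`; a response in between restarts the count from zero.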
pfcp_peer: detect UPF restart via RTS mismatch in Heartbeat Response
Check the Recovery Timestamp (RTS) in each Heartbeat Response against the value stored during Association Setup (`rem_rts`). A mismatch means the UPF has restarted; log a warning, increment the new `pfcp.peer_restart` counter, and reset the PFCP association so it is re-established with the restarted peer.
The `is_integer(ExpRTS)` guard prevents a spurious reset when a stale Heartbeat Response arrives in the connecting state (where `rem_rts` is still undefined).
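The effect of the guard can be sketched as below. The module name, function name, and return atoms are assumptions made for illustration; only the `is_integer(ExpRTS)` guard and the meaning of `rem_rts` come from the commit message.

```erlang
-module(rts_check_sketch).
-export([check_rts/2]).

%% ExpRTS is the stored rem_rts (undefined while connecting),
%% RTS is the Recovery Timestamp from the Heartbeat Response.
check_rts(ExpRTS, RTS) when is_integer(ExpRTS), RTS =/= ExpRTS ->
    %% The peer reports a different RTS: it has restarted.
    peer_restarted;
check_rts(_ExpRTS, _RTS) ->
    %% Either the timestamps match, or there is nothing to compare
    %% against yet, so a stale response is ignored.
    ok.
```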
Reduce verbosity of the Tx/Rx Heartbeat Req/Resp log messages from LOG_INFO to LOG_DEBUG in preparation for periodic heartbeat support. Without this change, periodic heartbeats would flood the log.
Add a heartbeat_interval parameter to the pfcp_peer config section. When non-zero, pfcp_peer sends a periodic Heartbeat Request to the UPF at the configured interval using a named gen_statem timeout (hb_timer). The timer is started on entry to the connected state and cancelled on re-entry to the connecting state. Default is 10000 ms (10 seconds).
Add a heartbeat_interval parameter to the pfcp_peer config section. When non-zero, pfcp_peer sends a periodic Heartbeat Request to the UPF at the configured interval using a named gen_statem timeout (hb_timer). The timer is started on entry to the connected state and cancelled on re-entry to the connecting state. Default is 0 (disabled).
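A named gen_statem timeout of this kind could look roughly like the sketch below. It is not the actual pfcp_peer module: the module name, the `assoc_setup_ok`/`assoc_lost` casts, and the fallback value passed to maps:get/3 are placeholders; only `heartbeat_interval`, `hb_timer`, and the two state names come from the commit message.

```erlang
-module(hb_timer_sketch).
-behaviour(gen_statem).

-export([start_link/1, set_connected/1, set_disconnected/1]).
-export([init/1, callback_mode/0, connecting/3, connected/3]).

start_link(Cfg) ->
    gen_statem:start_link(?MODULE, Cfg, []).

%% Hypothetical triggers standing in for Association Setup success/loss.
set_connected(Pid) -> gen_statem:cast(Pid, assoc_setup_ok).
set_disconnected(Pid) -> gen_statem:cast(Pid, assoc_lost).

callback_mode() ->
    [state_functions, state_enter].

init(Cfg) ->
    {ok, connecting, Cfg}.

connecting(enter, _OldState, _Cfg) ->
    %% Setting the time to infinity cancels a running timer of that
    %% name without starting a new one.
    {keep_state_and_data, [{{timeout, hb_timer}, infinity, cancel}]};
connecting(cast, assoc_setup_ok, Cfg) ->
    {next_state, connected, Cfg};
connecting(cast, _Other, _Cfg) ->
    keep_state_and_data.

connected(enter, _OldState, Cfg) ->
    {keep_state_and_data, hb_timer_action(Cfg)};
connected({timeout, hb_timer}, heartbeat, Cfg) ->
    %% A real implementation would send a PFCP Heartbeat Request here,
    %% then re-arm the timer for the next period.
    {keep_state_and_data, hb_timer_action(Cfg)};
connected(cast, assoc_lost, Cfg) ->
    {next_state, connecting, Cfg};
connected(cast, _Other, _Cfg) ->
    keep_state_and_data.

%% An interval of 0 means periodic heartbeats are disabled.
hb_timer_action(Cfg) ->
    case maps:get(heartbeat_interval, Cfg, 0) of
        0 -> [];
        Interval -> [{{timeout, hb_timer}, Interval, heartbeat}]
    end.
```

Unlike a state_timeout, a named timeout is not cancelled automatically on a state change, which is why the sketch cancels it explicitly on entry to connecting.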
pfcp_peer: add configurable heartbeat miss count + association reset
Introduce heartbeat_miss_count (default: 3) that tracks consecutive unanswered PFCP Heartbeat Requests and resets the association when the threshold is reached. The counter applies to both periodic heartbeats and those triggered via the REST API, and resets to zero on any successful Heartbeat Response.
pfcp_peer: make assoc_setup and heartbeat_req timeouts configurable
Add assoc_setup_timeout and heartbeat_req_timeout as optional fields in the pfcp_peer config map, with 2000 ms defaults. Store the full cfg() map in #peer_state{} and read values from it with maps:get/3 at the point of use.
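Reading the optional fields at the point of use might look like the following sketch. The module name, helper names, and macro are assumptions; the config keys, the 2000 ms defaults, and the use of maps:get/3 are taken from the commit message.

```erlang
-module(pfcp_timeout_sketch).
-export([assoc_setup_timeout/1, heartbeat_req_timeout/1]).

-define(DEFAULT_TIMEOUT_MS, 2000).

%% Cfg is the cfg() map kept in the peer state; both keys are optional.
assoc_setup_timeout(Cfg) ->
    maps:get(assoc_setup_timeout, Cfg, ?DEFAULT_TIMEOUT_MS).

heartbeat_req_timeout(Cfg) ->
    maps:get(heartbeat_req_timeout, Cfg, ?DEFAULT_TIMEOUT_MS).
```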
config/sys.config: group pfcp_peer params into a map
Following the same pattern as sctp_{client,server}, group the flat pfcp_loc_addr/pfcp_rem_addr environment variables into a pfcp_peer map. The old flat keys are still supported for backward compatibility.
Changes:
* osmo_s1gw_sup: add pfcp_cfg(), merging legacy flat keys with the new pfcp_peer map (new takes priority); store the resolved config back via set_env(pfcp_peer, ...) so all consumers see a single canonical map
* pfcp_peer: change start_link/2 to start_link/1 taking a cfg() map; simplify init() using sctp_common:parse_addr/1; add cfg() type
* rest_server: read pfcp laddr/raddr from the pfcp_peer map
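The merge rule (new map takes priority over legacy flat keys) can be illustrated with the sketch below. It operates on a plain proplist rather than the real application environment, and the map keys `laddr`/`raddr` are an assumption inferred from the rest_server note above; the actual pfcp_cfg() may differ.

```erlang
-module(pfcp_merge_sketch).
-export([pfcp_cfg/1]).

%% Env is a proplist standing in for the application environment.
pfcp_cfg(Env) ->
    %% Translate legacy flat keys into map keys, skipping unset ones.
    Legacy = lists:foldl(
        fun({OldKey, NewKey}, Acc) ->
            case proplists:get_value(OldKey, Env) of
                undefined -> Acc;
                Val -> Acc#{NewKey => Val}
            end
        end,
        #{},
        [{pfcp_loc_addr, laddr}, {pfcp_rem_addr, raddr}]),
    New = proplists:get_value(pfcp_peer, Env, #{}),
    %% maps:merge/2 prefers values from its second argument,
    %% so keys in the new map override the legacy ones.
    maps:merge(Legacy, New).
```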
As a bonus, `tried_mmes` now only serves its actual purpose - tracking which MMEs have already been tried for selection filtering - rather than being abused as a way to retrieve the current MME name.
rest_server: fix TOC/TOU race when listing/fetching E-RABs
The list of E-RAB FSM pids is a snapshot taken at one point in time. By the time we interrogate each erab_fsm process individually, any of them may have already terminated (e.g. bearer released mid-request). The current code fails to generate a response if this happens.
* fetch_erab_info/1: add a pid() clause that wraps erab_list_item/1 in a try/catch, returning 'error' if the process is gone.
* fetch_erab_info/1: catch both exit forms:
** `{noproc, _}` raised by gen_statem:call/2 on a monitored pid, and
** the bare noproc atom for other code paths.
* fetch_erab_list/1: switch from lists:map to lists:filtermap and call fetch_erab_info/1 per E-RAB, silently dropping any that died between the snapshot and the per-process interrogation.
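The approach described in the bullets above can be sketched as follows. This is a simplified illustration rather than the rest_server code itself: the module name and the extra `InfoFun` argument (standing in for erab_list_item/1) are assumptions that keep the example self-contained.

```erlang
-module(erab_list_sketch).
-export([fetch_erab_list/2]).

%% Query a single E-RAB FSM; InfoFun is expected to exit with
%% noproc (in either form) if the process has already terminated.
fetch_erab_info(InfoFun, Pid) when is_pid(Pid) ->
    try InfoFun(Pid) of
        Info -> {ok, Info}
    catch
        exit:{noproc, _} -> error;
        exit:noproc -> error
    end.

%% Pids is the snapshot; entries that died in the meantime are dropped.
fetch_erab_list(InfoFun, Pids) ->
    lists:filtermap(
        fun(Pid) ->
            case fetch_erab_info(InfoFun, Pid) of
                {ok, Info} -> {true, Info};
                error -> false
            end
        end,
        Pids).
```

Only the two noproc exit forms are caught here, so any other failure still propagates to the caller.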
enb_{proxy,registry}: signal MME conn info on SCTP comm_up
Previously, mme_aid/mme_saddr/mme_sport were only signalled to the enb_registry once the S1 Setup procedure completed. This meant the REST API could not show MME connection details for eNBs stuck in wait_s1setup_rsp state (e.g. due to a slow or retrying MME).
Add notify_mme_comm_up/2, called at SCTP comm_up, which stores the full conn_info (including mme_aid, mme_saddr, mme_sport) in the registry as soon as the SCTP connection is established.
notify_mme_connected/2 is simplified to notify_mme_connected/1: it now only flips the state to 'connected', since conn_info is already stored.
enb_proxy: split conn_info() into mme_conn_info() and proxy_info()
The old conn_info() conflated two distinct concerns: the MME SCTP connection info stored in enb_registry (aid, saddr, sport) and the broader operational state used for introspection (handler pid, enb connection info, etc.). Mixing them forced enb_registry to hold a handler pid it has no business knowing about, and required rest_server to extract that pid just to reach s1ap_proxy for E-RAB listing.
Split into two distinct types:
* mme_conn_info() - pure MME SCTP connection info (aid, saddr, sport), stored in the enb_registry and signalled via notify_mme_comm_up/2. The `mme_` prefix is dropped from field names as the type name provides the context.
* proxy_info() - richer operational snapshot (handler, enb_handle, enb_conn_info, mme_conn_info, genb_id_str, mme_info), returned by fetch_info/1 for introspection/debugging purposes.
Additionally:
* Add fetch_erab_list/1 to enb_proxy, delegating internally to s1ap_proxy:fetch_erab_list/1 via the cached handler pid. This allows the rest_server to obtain a list of E-RABs without having to obtain the pid of the s1ap_proxy and interact with it directly.
* Remove separate enb_aid/mme_aid/mme_saddr/mme_sport state fields; enb_aid is now read directly from enb_conn_info, and the MME fields are grouped in mme_conn_info.
The full enb_list table with all address columns is too wide to fit on a page and does not render well in PDF. Collapse the address columns with '...'; add a note that they are omitted for readability.
The enb_proxy now captures the local address and port of the S1GW-MME SCTP connection (mme_saddr/mme_sport) at comm_up and includes them in conn_info(). Expose this info through the REST API (EnbItem schema) and show it as a new column/row in the CLI (enb_list/enb_info).
For each eNB connection, include the name of the MME that was selected from the pool. Update the OpenAPI spec, CLI (enb_list/enb_info tables), and user manual accordingly.
Add support for selecting an MME or eNB by remote address and port in the REST API and CLI. The selector format is `addr:IP:PORT` for MMEs and `enb-conn:IP:PORT` for eNBs, where IP can be an IPv4 or IPv6 address. The colon is used as the address/port separator.
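Because IPv6 addresses contain colons themselves, one way to parse such a selector is to split on the last colon only. The sketch below assumes string (list) input; the module name, function names, and result tuples are hypothetical and not taken from the actual REST/CLI code.

```erlang
-module(selector_sketch).
-export([parse_selector/1]).

parse_selector("addr:" ++ Rest) ->
    parse_addr_port(mme, Rest);
parse_selector("enb-conn:" ++ Rest) ->
    parse_addr_port(enb, Rest);
parse_selector(_) ->
    {error, bad_selector}.

parse_addr_port(Kind, Str) ->
    %% Split at the last colon so that colons inside an
    %% IPv6 address stay part of the address.
    case string:split(Str, ":", trailing) of
        [IpStr, PortStr] ->
            case {inet:parse_address(IpStr), string:to_integer(PortStr)} of
                {{ok, Addr}, {Port, ""}} when Port >= 0, Port =< 65535 ->
                    {ok, {Kind, Addr, Port}};
                _ ->
                    {error, bad_selector}
            end;
        _ ->
            {error, bad_selector}
    end.
```

For example, `addr:::1:36412` would split into the address `::1` and the port `36412`.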
enb_proxy: obtain sctp_client sockopts from the env directly
Instead of passing the MmeConnCfg (sctp_client:cfg()) all the way from osmo_s1gw_sup through sctp_server (as priv) into enb_proxy (as state), read the sctp_client configuration in-place via osmo_s1gw:get_env/2 when it is actually needed (connecting/enter).
This works because osmo_s1gw_sup already normalizes and writes back the complete sctp_client config to the application env (set_env/2) before starting the supervision tree.
As a result, mme_conn_cfg is removed from enb_proxy's state record, start_link/2 no longer uses its Priv argument, and server_cfg/1 is simplified to server_cfg/0.
When an MME is selected from the pool, pass the mme_registry:mme_info() to the enb_registry via notify_mme_connecting/2 (replacing /1). This makes all MME configuration details (name, address, port, TAC list) available to consumers such as the REST server, without having to look them up separately from the mme_registry.
This endpoint returns the effective runtime configuration that OsmoS1GW is currently using, with all defaults applied. This reflects the values read via `osmo_s1gw:get_env/2` at startup.
config/sys.config: increase StatsD reporter interval to 10s
When running ttcn3-s1gw-test locally, I noticed OsmoS1GW consuming 30-40% of a CPU core while idle (not serving any eNBs). Profiling with etop pointed to `exometer_report_statsd` as the culprit: with ~1720 metrics registered, it was sending ~1720 datagrams per second.
A 1-second reporting interval causes noticeable CPU overhead as the number of active metrics grows. With per-eNB and per-MME counters, the metric set scales with the number of connected eNBs/MMEs.
Increase the default StatsD reporting interval from 1s to 10s to reduce that overhead. Update the documentation accordingly.
The new version uses ETS instead of dict for counter lookups. This significantly reduces the performance impact when multiple eNBs are registered, since ETS provides O(1) average-case hash lookups and in-place mutation.
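As a generic illustration of why ETS helps here (not the code of the updated dependency), a counter kept in an ETS table can be bumped in place with ets:update_counter/4, whereas a dict-based counter produces a new dict on every update. The module, table, and function names below are hypothetical.

```erlang
-module(ets_counter_sketch).
-export([new/0, inc/2, value/2]).

new() ->
    ets:new(counters, [set, public]).

%% Increment in place; the default object {Name, 0} is inserted
%% first if the key does not exist yet. Returns the new value.
inc(Tab, Name) ->
    ets:update_counter(Tab, Name, 1, {Name, 0}).

value(Tab, Name) ->
    case ets:lookup(Tab, Name) of
        [{Name, Val}] -> Val;
        [] -> 0
    end.
```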