cuart: Fix waiting time to be per-byte instead of total timeout
The previous code multiplied WT by the number of expected bytes, creating a total timeout proportional to the transfer size. This works fine for (currently unsupported) high baud rates, but at default rates the resulting delay is so long that the reader appears to "freeze".
Restart WT on every received byte and do not multiply it, so it behaves as the per-byte timeout it is meant to be.
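A minimal sketch of the per-byte behaviour described above. The struct wt_timer and the wt_* helpers are hypothetical stand-ins for illustration, not the actual cuart code, and the ETU values used with them are arbitrary:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the cuart waiting-time state (not the real struct). */
struct wt_timer {
	uint32_t wtime_etu;	/* WT in ETU, applies between two consecutive bytes */
	uint32_t remaining;	/* ETU left until the timer expires */
	bool running;
};

/* Arm the timer with exactly one WT period, regardless of how many
 * bytes are expected (the old behaviour multiplied by the byte count). */
static void wt_start(struct wt_timer *t)
{
	t->remaining = t->wtime_etu;
	t->running = true;
}

/* Called for every received byte: restart the full WT period. */
static void wt_on_rx_byte(struct wt_timer *t)
{
	if (t->running)
		t->remaining = t->wtime_etu;
}

/* Advance time by 'etu' ETU; returns true once the waiting time has expired. */
static bool wt_tick(struct wt_timer *t, uint32_t etu)
{
	if (!t->running)
		return false;
	if (etu >= t->remaining) {
		t->remaining = 0;
		t->running = false;
		return true;
	}
	t->remaining -= etu;
	return false;
}
```

With this shape a silent card is detected after a single WT period, while a slow but responsive card keeps the timer alive one byte at a time.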
7816fsm: reset stale cuart state on FSM RESET entry
Reset paths reached without power-cycling (WTIME_EXP, HW_ERR, CARD_REMOVAL during a warm reset) leave the cuart with stale tx_busy, rx_threshold and wtime_etu from the prior transaction. The next ATR then either hits the tx_busy assertion in card_uart_tx, or the ATR receive stalls because the 33-byte ATR can never reach a multi-byte rx_threshold left over from a TPDU.
The new card_uart_tx_abort() clears tx_busy + rx_after_tx_compl + WT, without driving a synthetic TX_COMPLETE through the FSM.
iso7816_3_reset_onenter is the right place to do this, alongside rx_threshold=1 and wtime_etu=default. This mirrors what card_uart_ctrl(POWER_*=0) already does, but for the warm-reset paths that don't touch power.
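A rough model of the reset-entry cleanup described above. The struct cuart_model, the model_* functions and MODEL_DEFAULT_WTIME_ETU are hypothetical and heavily simplified; the real struct card_uart, card_uart_tx_abort() and iso7816_3_reset_onenter differ in layout and detail:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed placeholder value; the real default lives in the cuart code. */
#define MODEL_DEFAULT_WTIME_ETU 9600

/* Hypothetical, simplified view of the state that can go stale. */
struct cuart_model {
	bool tx_busy;
	bool rx_after_tx_compl;
	bool wt_running;
	uint32_t rx_threshold;
	uint32_t wtime_etu;
	unsigned int tx_complete_events;	/* TX_COMPLETE notifications sent to the FSM */
};

/* Models the abort: drop the pending TX state and stop WT, but do NOT
 * report a synthetic TX_COMPLETE to the FSM. */
static void model_tx_abort(struct cuart_model *cu)
{
	cu->tx_busy = false;
	cu->rx_after_tx_compl = false;
	cu->wt_running = false;
}

/* Models the RESET state entry: abort TX, then restore the settings
 * needed to receive an ATR byte by byte. */
static void model_reset_onenter(struct cuart_model *cu)
{
	model_tx_abort(cu);
	cu->rx_threshold = 1;
	cu->wtime_etu = MODEL_DEFAULT_WTIME_ETU;
}
```

Keeping the abort separate from the normal completion path means the FSM never sees a TX_COMPLETE for a transmission that did not actually finish.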
libosmo_emb: type-safe tearfree_u64_t wrapper for LDRD/STRD access
Although types are frowned upon, because memorizing all the different flavors of void* is the usual way to get acquainted with any mature C code base, some heretics decided to introduce generics in C11. These can be used to make aligned access (which guarantees tear-free/restartable 64-bit access on Cortex-M4) less exciting.
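A minimal sketch of how a C11 _Generic wrapper can enforce this, assuming a hypothetical layout and hypothetical accessor names (TEARFREE_LOAD, TEARFREE_STORE and the trailing-underscore helpers are illustrative, not the actual libosmo_emb API). It uses plain volatile accesses so it also builds on a host; a compiler is not obliged to emit LDRD/STRD for these, so the real code may need something stronger:

```c
#include <stdint.h>

/* Hypothetical wrapper: the struct forces 8-byte alignment and prevents a
 * plain uint64_t from being passed where a tear-free value is required. */
typedef struct {
	_Alignas(8) volatile uint64_t v;
} tearfree_u64_t;

_Static_assert(_Alignof(tearfree_u64_t) == 8, "tearfree_u64_t must be 8-byte aligned");

/* Host-friendly accessors; on the target these would be the place to
 * make sure a single 64-bit load/store instruction is used. */
static inline uint64_t tearfree_u64_load_(const tearfree_u64_t *p)
{
	return p->v;
}

static inline void tearfree_u64_store_(tearfree_u64_t *p, uint64_t val)
{
	p->v = val;
}

/* _Generic has no default branch on purpose: any pointer type other than
 * tearfree_u64_t * (e.g. uint64_t * or void *) fails to compile. */
#define TEARFREE_LOAD(p) _Generic((p), \
	tearfree_u64_t *: tearfree_u64_load_, \
	const tearfree_u64_t *: tearfree_u64_load_)(p)

#define TEARFREE_STORE(p, val) _Generic((p), \
	tearfree_u64_t *: tearfree_u64_store_)((p), (val))
```

Passing a bare uint64_t * to either macro is then a compile-time error instead of a silently unaligned or torn access.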