[sr-dev] new dialog module design: tm trigger for dialog management

Wed Mar 24 17:32:44 CET 2010

Hello,

the new dialog module proposal seems to imply that dialog_out management
(and thereby, implicitly, dialog_in management) is to be carried out
when responses arrive at a proxy, i.e., on execution of the tm callback
type TMCB_RESPONSE_IN.

However, as the comment to this callback type in modules/tm/t_hooks.h
illustrates, such callbacks will also be run on retransmissions. This
means that the dialog module will have to take precautions and either

- check whether the response being observed is a retransmission, for
instance by means of checking whether a dialog_out entry corresponding
to the response already exists, or
- ignore retransmissions and stupidly (over-)write the dialog_out entry,
i.e., in case of a retransmission re-write the same entry.

The first option induces some delay for CPU cycles and memory accesses,
and adds some complexity w.r.t. correct retransmission checking which
must not miss possible corner cases. The second option, in addition to
wasting even more resources (maybe even I/O when persistent storage of
dialog data is enabled), requires that overwriting of dialog state does
not have any side-effects. I am not sure if this is always the case.

Instead of thinking about which option to prefer and what needs to be
taken care of, I'd rather re-propose a different approach that I have
brought forward before in a slightly different context: Instead of using
TMCB_RESPONSE_IN, run the dialog state machinery on
TMCB_RESPONSE_PRE_OUT. It's guaranteed to be retransmission-free and
should cover everything we have discussed so far.

Moreover, it will render dialog_out state needless and ease state
keeping: As long as provisional responses are being forwarded (either by
a local or remote proxy), the dialog_in state must be "early" because
the proxy would only stop informing the UAC about provisional responses
when it received a final response, either due to exhaustion of all
branches or a positive OK response. Once the forking proxy decides to
forward a response with code 200+, all other early dialogs can be safely
deleted and the dialog_in state set accordingly.

Consequently, step 2 of the algorithm that changes dialog_in state
("Algorithm for dialog_in state") given in the proposal boils down to a
simple matching rule as follows:

Set dialog_in state to the state corresponding to the return code of the
forwarded message. That is, if the response code is
  * greater than 100, but smaller than 200: set it to "early" (may be
    skipped if there is a dialog_out entry already as this implies that
    another early branch has already been established and the
    dialog_in state set to "early"),
  * 200: set it to "confirmed" (step 1 will have deleted the remaining
    branches),
  * greater than 200: set it to "terminated" (step 1 will have deleted
    the remaining branches).

In-dialog BYE requests will just have to set the dialog_in state to
"terminated", trimming down the state machine further.

Considering the corner case of two branches' 200 responses arriving at
exactly the same time (let's name them concurrently confirmed dialogs),
I believe that this should yield two completely independent dialog
entries, with one dialog_in and dialog_out each. Managing them under the
same dialog_in as two dialog_outs gives the impression that they are
somewhat related but this is not true as each carries a unique dialog
identifier. Furthermore, under the current proposal, I believe that
callbacks registered for the yet non-confirmed dialog will be effective
for the concurrently confirmed dialog too because both OK responses
should be forwarded by the same transaction, resulting in the same
dialog callbacks being run for both dialogs. If the transaction module
handles simultaneously forwarded OK responses differently, then there
will be no callbacks at all for one of the dialogs which is equally bad.

Instead, there should be a clean separation of the two dialogs supported
by a new dialog callback type (e.g., DLGCB_CONCURRENTLY_CONFIRMED) which
allows the user to decide whether he wants to track the additionally
confirmed dialog as well, and how long he wants to track it based on
further dialog callbacks he chooses. For example, he may register
DGLCB_TERMINATED on the additional dialog or opt that getting to know
about its generation is sufficient. Generally, in-dialog requests will
be mapped to the respective dialogs (which may involve some more effort
because dialog hashes are stored in the route headers during processing
of the initial request, but it's doable) and managed individually.

Concluding, this modified approach will keep the new dialog module from
doing unnecessary and hard-to-do duplicate checks, eases state
management, and keeps concurrently confirmed dialogs cleanly separated.

Opinions?

Cheers,

--Timo