[sr-dev] dedicated mutex for t_continue()

Daniel-Constantin Mierla miconda at gmail.com
Wed May 13 08:59:26 CEST 2015


Hi Jason,

the commit doesn't seem correct, because looks like you removed the
locking completely. It should still stay with locking on reply mutex.

Can you revert the last commit? Then I can take care of removing the
unnecessary code -- note that the code before your commit was already
using the reply lock, so was like it should just be, with dedicated
mutex code being inside defines which were not enabled.

Cheers,
Daniel

On 13/05/15 08:56, Jason Penton wrote:
> Hey Daniel,
>
> You are correct this is no longer needed - we actually only used this
> to be able to set a flag on the transaction that we later used for
> branch picking. We made a fix a while back for doing the correct
> branch picking which no longer required this extra flag and therefore
> no longer required the mutex. I have tested in our env and all looks good.
>
> p.s. I have committed the cleanup.
>
> Cheers
> Jason
>
> On Tue, 12 May 2015 at 10:34 Daniel-Constantin Mierla
> <miconda at gmail.com <mailto:miconda at gmail.com>> wrote:
>
>     Hi Jason,
>
>     ok, would be great to sort it out properly, thanks for your time!
>
>     Cheers,
>     Daniel
>
>
>     On 12/05/15 10:10, Jason Penton wrote:
>>     Hey Daniel,
>>
>>     Okay great, let me look into this. It will be great if we have
>>     one less mutex to worry about ;) - If not required I will remove
>>     and commit.
>>
>>     Cheers
>>     Jason
>>
>>     On Mon, 11 May 2015 at 09:55 Daniel-Constantin Mierla
>>     <miconda at gmail.com <mailto:miconda at gmail.com>> wrote:
>>
>>         Hi Jason,
>>
>>         over the weekend I pushed a patch that disables the use of
>>         dedicated
>>         mutex for t_continue(). It can be enabled by defining
>>         ENABLE_ASYNC_MUTEX.
>>
>>         While investigated some reports of crash when removing from
>>         time, I
>>         found the potential of a race when t_coninue() is executed at
>>         the time
>>         the fr_timer for suspended transaction elapsed. The timer
>>         process will
>>         get the transaction out and remove it from timer under the
>>         reply lock
>>         and the worker doing t_continue() will get it out under the
>>         async lock.
>>
>>         I looked at the commit you did when introducing the dedicated
>>         async
>>         mutex, the note being:
>>
>>                 - "dedicated lock to prevent multiple invocations of
>>         suspend on
>>         tz (reply lock used to be used)"
>>
>>         Perhaps tz is tx and stands for transmission - however, the
>>         reply lock
>>         should be safe for this case as well. Moreover, the continue
>>         is like the
>>         suspended branch got a reply and transaction continues
>>         processing, which
>>         implies the reply lock is aquired (like execution of
>>         failure_route,
>>         which can also happen if fr_timer elapses before t_continue()
>>         is executed).
>>
>>         Given those, I don't see anymore a reason for dedicated async
>>         mutex.
>>         Also, it protects to races of using two mutexes, which can
>>         easily lead
>>         to deadlocks (e.g., one process acquires the reply lock and
>>         tries to get
>>         the async lock while another one wanted first the reply lock
>>         and later
>>         the async lock).
>>
>>         For now I disabled the code with defines, as I wanted to
>>         discuss and be
>>         sure I haven't overlooked any issue you tried to avoid with the
>>         dedicated mutex. Let me know what you think about.
>>
>>         Cheers,
>>         Daniel
>>
>>         --
>>         Daniel-Constantin Mierla
>>         http://twitter.com/#!/miconda
>>         <http://twitter.com/#%21/miconda> -
>>         http://www.linkedin.com/in/miconda
>>         Kamailio World Conference, May 27-29, 2015
>>         Berlin, Germany - http://www.kamailioworld.com
>>
>
>     -- 
>     Daniel-Constantin Mierla
>     http://twitter.com/#!/miconda <http://twitter.com/#%21/miconda> - http://www.linkedin.com/in/miconda
>     Kamailio World Conference, May 27-29, 2015
>     Berlin, Germany - http://www.kamailioworld.com
>

-- 
Daniel-Constantin Mierla
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
Kamailio World Conference, May 27-29, 2015
Berlin, Germany - http://www.kamailioworld.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.sip-router.org/pipermail/sr-dev/attachments/20150513/d1079f10/attachment.html>


More information about the sr-dev mailing list