<div dir="ltr">Hey Daniel,<br><br><div>Ok great, I will test. No I don't see a need to make async_mutex re-entrant but let me check.</div><div><br></div><div>I will test your change too. Do you have a commit ref for your change?</div><div><br></div><div>Cheers</div><div>Jason</div></div><br><div class="gmail_quote">On Mon, 30 Mar 2015 at 17:03 Daniel-Constantin Mierla <<a href="mailto:miconda@gmail.com">miconda@gmail.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Jason,<br>

<br>

On 03/02/15 15:27, Daniel-Constantin Mierla wrote:<br>

> Hi Jason,<br>

><br>

> On 03/02/15 08:29, Jason Penton wrote:<br>

>> Hi Daniel,<br>

>><br>

>> I have found another instance of a similar bug where the a process<br>

>> gets itself into deadlock. The sequence of events:<br>

>><br>

>> 1. INVITE request is relayed downstream<br>

>> 2. error response or locally generated 408 occurs<br>

>> 2. Failure route in cfg is armed and tries to relay to alternate<br>

>> destination (voicemail/annoucnement server, etc) - using t_relay()<br>

>> 3. t_relay fails to destination for whatever reason (in this case the<br>

>> result of the destination being blacklisted in TM)<br>

>> 4. Original reply is attempted to be relayed upstream (this is all<br>

>> done safely - ie using t_reply_unsafe() as the REPLY_LOCK has already<br>

>> been acquired in failure route<br>

>> 5. At or around line 562 of t_reply.c we check if the reply is > 200<br>

>> (final response). If it is and if its an INVITE transaction, we clean<br>

>> up timers and cancel all UAC branches. This is where we hit a problem:<br>

>><br>

>> cancel_uacs( trans, &cancel_data, F_CANCEL_B_KILL, lock );<br>

>><br>

>> 6. cancel_uacs calls cancel_branch if there was no response on the<br>

>> branch by calling cancel_branch()<br>

>> 7. cancel branch calls LOCK_REPLIES(t) on the transaction and you hit<br>

>> deadlock.... (original call to LOCK_REPLIES(t) happened in failure<br>

>> route processing)<br>

>><br>

>> This use case is slightly different to the one you fixed related to<br>

>> retransmission timers... (the beginning of this thread)<br>

>><br>

>> Suggestion: What about a pkg/process variable that can be used to<br>

>> check if you have already acquired the reply lock to combat these<br>

>> non-reentrant deadlocks. A side-effect of this would save on passing<br>

>> variables around in various functions indicating intention to lock or<br>

>> not.<br>

>><br>

>> Thoughts?<br>

> What about making the reply lock re-entrant? I think it should be safer<br>

> in long term to allow parts of code executed by same process to safely<br>

> acquire the lock without worrying if the same process did it before.<br>

> Global variables in pkg would need more complex management to set, test<br>

> and reset.<br>

><br>

I made the reply lock re-entrant, this should handle the situation you<br>

were exposing it. if you have a chance to simulate it, then let me know<br>

the result.<br>

<br>

On a slightly different idea, I wonder if async_mutex should be made<br>

re-entrant as well, what do you think? I guess it is unlikely to get two<br>

async attempts in the same process, but you know better how the mutex is<br>

planned to be used.<br>

<br>

Cheers,<br>

Daniel<br>

<br>

--<br>

Daniel-Constantin Mierla<br>

<a href="http://twitter.com/#!/miconda" target="_blank">http://twitter.com/#!/miconda</a> - <a href="http://www.linkedin.com/in/miconda" target="_blank">http://www.linkedin.com/in/<u></u>miconda</a><br>

Kamailio World Conference, May 27-29, 2015<br>

Berlin, Germany - <a href="http://www.kamailioworld.com" target="_blank">http://www.kamailioworld.com</a><br>

<br>

</blockquote></div>