<br><br><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">Zunnun</b> <span dir="ltr"><<a href="mailto:zunnun@gmail.com">zunnun@gmail.com</a>></span><br>Date: Wed, Mar 30, 2011 at 10:51 AM<br>
Subject: crash & 100% CPU usage problem with kamailio 3.1.2<br>To: <a href="mailto:sr-dev@lists.sip-router.org">sr-dev@lists.sip-router.org</a><br><br><br><div><b>kamailio 3.1.2 issues</b></div><div><br></div><div><b>Problem 1:</b></div>
<div><br></div><div>Under heavy stress for a few hours, we have seen 100% CPU usage.</div><div>Reason: the linked list becomes circular. A next pointer points to its own node, so the loop never terminates.</div>
<div><br></div><div>file: tcp_main.c</div><div><br></div><div>function: </div><div><br></div><div>inline static int _tcpconn_add_alias_unsafe(struct tcp_connection* c, int port,struct ip_addr* l_ip, int l_port,int flags)</div>
<div><br></div><div>for (a=tcpconn_aliases_hash[hash], nxt=0; a; a=nxt){</div><div> nxt=a->next;</div><div><br></div><div>Here a->next points back to a, so the loop never exits.</div><div><br></div>
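<div>A minimal sketch of how a debug build could detect the self-referencing node described above before walking a bucket. The struct alias_node type and both function names are hypothetical stand-ins, not the real kamailio definitions; the check uses Floyd's tortoise-and-hare cycle detection and says nothing about how the list got corrupted.</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of an alias hash bucket. */
struct alias_node {
    struct alias_node *next;
    int port;
};

/* Floyd's cycle detection: `fast` moves two nodes per step, `slow` one.
 * Returns 1 if they meet (the list loops back on itself), 0 if `fast`
 * reaches the NULL terminator first. */
static int alias_list_has_cycle(const struct alias_node *head)
{
    const struct alias_node *slow = head;
    const struct alias_node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return 1;
    }
    return 0;
}

/* Debug-only guard a caller could run before iterating a bucket. */
static void alias_list_assert_acyclic(const struct alias_node *head)
{
    assert(!alias_list_has_cycle(head));
}
```

<div>Running such a guard under the same lock that protects the bucket would turn the silent 100% CPU spin into an immediate, debuggable failure at the point where the list is first seen corrupted.</div>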
<div>
<br></div><div><br></div><div><br></div><div><b>Problem 2: </b></div><div>The kamailio process terminates after more than 24 hours of heavy stress.</div><div><br></div><div>Reason: the flags sanity check fails and abort() is called.</div><div><br></div><div>file: tcp_main.c</div>
<div>function</div><div><br></div><div>inline static int tcpconn_chld_put(struct tcp_connection* tcpconn)</div><div>{</div><div> if (unlikely(atomic_dec_and_test(&tcpconn->refcnt))){</div><div> DBG("tcpconn_chld_put: destroying connection %p (%d, %d) "</div>
<div> "flags %04x\n", tcpconn, tcpconn->id,</div><div> tcpconn->s, tcpconn->flags);</div><div> /* sanity checks */</div><div>
membar_read_atomic_op(); /* make sure we see the current flags */</div><div> if (unlikely(!(tcpconn->flags & F_CONN_FD_CLOSED) ||</div><div> (tcpconn->flags &</div>
<div> (F_CONN_HASHED|F_CONN_MAIN_TIMER|</div><div> F_CONN_READ_W|F_CONN_WRITE_W)) )){</div><div> LOG(L_CRIT, "BUG: tcpconn_chld_put: %p bad flags = %0x\n",</div>
<div> tcpconn, tcpconn->flags);</div><div> abort(); //CALLS abort</div><div> }</div><div> _tcpconn_free(tcpconn); /* destroys also the wbuf_q if still present*/</div>
<div> return 1;</div><div> }</div><div> return 0;</div><div>}</div><div><br></div><div><br></div><div><b>Problem 3: </b></div><div>kamailio crashed under heavy stress (seen twice, after 4 days 8 hours).</div>
<div>Reason: the circular linked list is corrupted. A prev pointer is NULL and kamailio dereferences it.</div><div><br></div><div>#0 local_timer_list_expire (lt=0x82eea0, saved_ticks=1295476481) at local_timer.c:221</div><div>221 _timer_rm_list(tl); /* detach */</div>
<div>(gdb) bt</div><div>#0 local_timer_list_expire (lt=0x82eea0, saved_ticks=1295476481) at local_timer.c:221</div><div>#1 local_timer_expire (lt=0x82eea0, saved_ticks=1295476481) at local_timer.c:250</div><div>#2 local_timer_run (lt=0x82eea0, saved_ticks=1295476481) at local_timer.c:274</div>
<div>#3 0x0000000000510c3e in tcp_timer_run () at tcp_main.c:4384</div><div>#4 tcp_main_loop () at tcp_main.c:4564</div><div>#5 0x0000000000469eba in main_loop () at main.c:1641</div><div>#6 0x000000000046c04f in main (argc=<value optimized out>, argv=0x7fff3d28a3c8) at main.c:2398</div>
<div><br></div><div>(gdb) print tl</div><div>$1 = <value optimized out></div><div>(gdb) print h</div><div>$2 = (struct timer_head *) 0x855eb8</div><div>(gdb) print *h</div><div>$3 = {next = 0x0, prev = 0x2acbac5398b8}</div>
<div>(gdb) print *h->prev</div><div>$4 = {next = 0x0, prev = 0x855eb8, expire = 1295476481, initial_timeout = 1920, data = 0x2acbac5397d0, f = 0x4f8310 <tcpconn_main_timeout>, flags = 512, slow_idx = 0}</div><div>
(gdb)</div><div><br></div><div>In one crash the prev pointer was NULL; in the next crash the next pointer was NULL.</div><div><br></div><div><b>Problem 4:</b> </div><div>The kamailio process terminated under heavy stress (seen twice, after 16 hours).</div>
<div>Reason: qm_free() detects an already freed pointer and calls abort().</div><div><br></div><div>file: mem/q_malloc.c</div><div><br></div><div>function</div><div><br></div><div>void qm_free(struct qm_block* qm, void* p)</div><div><br></div><div>partial code:</div>
<div><br></div><div>#ifdef DBG_QM_MALLOC</div><div> qm_debug_frag(qm, f);</div><div> if (f->u.is_free){</div><div> LOG(L_CRIT, "BUG: qm_free: freeing already freed pointer,"</div>
<div> " first free: %s: %s(%ld) - aborting\n",</div><div> f->file, f->func, f->line);</div><div> abort(); //CALLS ABORT</div>
<div> </div><div> }</div><div> MDBG("qm_free: freeing frag. %p alloc'ed from %s: %s(%ld)\n",</div><div> f, f->file, f->func, f->line);</div><div>
#endif</div>
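<div>To illustrate the kind of check the partial code above performs, here is a small sketch of double-free detection with a per-fragment header. All names (struct frag_hdr, dbg_alloc, dbg_free, dbg_first_free_line, dbg_release) are hypothetical and much simpler than the bookkeeping in q_malloc.c; unlike the quoted code, the sketch returns an error code instead of calling abort().</div>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fragment header placed in front of every payload. */
struct frag_hdr {
    int is_free;            /* nonzero once the fragment was released */
    const char *free_file;  /* call site of the first free */
    long free_line;
};

enum free_result { FREE_OK = 0, FREE_DOUBLE = -1 };

/* Allocates a header plus `size` payload bytes; returns the payload. */
static void *dbg_alloc(size_t size)
{
    struct frag_hdr *f = malloc(sizeof(*f) + size);
    if (f == NULL)
        return NULL;
    f->is_free = 0;
    f->free_file = NULL;
    f->free_line = 0;
    return f + 1;
}

/* Marks the fragment free and records the call site. A second call on the
 * same pointer is reported instead of aborting. The memory itself stays
 * allocated (as in a pool allocator) so the header remains readable. */
static enum free_result dbg_free(void *p, const char *file, long line)
{
    struct frag_hdr *f;

    assert(p != NULL);
    f = (struct frag_hdr *)p - 1;
    if (f->is_free)
        return FREE_DOUBLE;
    f->is_free = 1;
    f->free_file = file;
    f->free_line = line;
    return FREE_OK;
}

/* Line number recorded by the first free, or 0 if never freed. */
static long dbg_first_free_line(const void *p)
{
    return ((const struct frag_hdr *)p - 1)->free_line;
}

/* Returns the backing memory to the system once the sketch is done. */
static void dbg_release(void *p)
{
    free((struct frag_hdr *)p - 1);
}
```

<div>Passing __FILE__ and __LINE__ at each call site lets the second free report where the first one happened, which is the information the log message in the quoted code prints before aborting.</div>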
<div><br></div><div><b>Problem 5:</b></div><div>Infinite loop: the log file fills up with the messages below, and CPU usage is at 100% while this happens.</div><div><div>/kamailio[21562]: : <core> [io_wait.h:617]: BUG: io_watch_del: invalid fd -1, not in [0, 2)</div>
<div>//kamailio[21562]: : <core> [tcp_read.c:1218]: ERROR: tcpconn_receive: handle_io: io_watch_del failed for 0x2acbac5397d0 </div></div>
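<div>The log lines show a delete being attempted on fd -1, which fails the [0, 2) range check. One possible reading, not confirmed by the logs, is that the same connection is handled again after its fd was already reset. The sketch below shows an idempotent "unwatch" under that assumption; the names unwatch_once and fd_in_range and the result codes are hypothetical and are not part of io_wait.h.</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum unwatch_result {
    UNWATCH_OK = 0,       /* fd was valid and has now been released */
    UNWATCH_SKIPPED = 1,  /* fd was already -1: nothing left to do */
    UNWATCH_BAD_FD = -1   /* fd is outside [0, max_fd) */
};

/* Same half-open range test as in the quoted log message. */
static int fd_in_range(int fd, int max_fd)
{
    return fd >= 0 && fd < max_fd;
}

/* Releases *fd at most once. A repeated call on an already released
 * descriptor is reported as skipped rather than as an error, so a caller
 * that loops on errors has no reason to retry. */
static enum unwatch_result unwatch_once(int *fd, int max_fd)
{
    assert(fd != NULL);
    if (*fd == -1)
        return UNWATCH_SKIPPED;
    if (!fd_in_range(*fd, max_fd))
        return UNWATCH_BAD_FD;
    /* A real implementation would remove *fd from the poll set here. */
    *fd = -1;
    return UNWATCH_OK;
}
```

<div>Whether this matches the actual cause depends on why the connection at 0x2acbac5397d0 is still being processed after its descriptor was closed, which the report does not show.</div>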
</div><br>