[SR-Users] High availability

Moacir Ferreira moacirferreira at hotmail.com
Wed Jun 8 21:39:28 CEST 2016


Thanks Chabert. Very comprehensive post. Maybe it is easy for me to understand because my strongest background is networking. Actually, I have been involved in a few projects where I used BGP to do what you mention, flipping an entire datacenter between sites more than 400 km apart. What we really need to care about is building, somehow, a process that will remove the route to the faulty unit.

Regarding VxLAN, Linux iproute2 is a Linux Foundation project and it provides VxLAN. It is very stable and widely used, especially in SDN environments where you may have thousands of VxLAN tunnels over different namespaces interconnecting racks, buildings, cities and countries. So don't worry about using it! It is very safe and easy to use.

My starting point was trying to understand what it takes to run Kamailio in an HA environment. But my final "recreational" goal is to create a virtual network function (VNF, in NFV terms) and deploy Kamailio using OpenStack. In this way OpenStack will be responsible for launching an instance (and relaunching it if it fails) wherever I have an OpenStack Compute node running. As OpenStack relies on VxLAN for inter-site L2 interconnection, I want to start from there, and that is why I chose to use VxLAN.

Now, to be really effective in an SDN environment, your virtualized network function should allow for "scaling out" (launching more instances) if the server workload is too high. This brings the need to really understand HA in Kamailio so you can use the right technique to do this. Just as "food for thought", think about having a virtual machine running Kamailio that is able to handle up to 1,000 calls. Should the number of calls go over 1,000, OpenStack would just launch another instance of Kamailio and load-balance between (or among) them. This could scale out as long as you have available physical resources, which could be anywhere in the world. That is why in a previous post I asked whether two Kamailio servers could use a single database without conflicts. Then, when your new instance comes alive, it would be able to reach any registered user/client using the user location feature. Of course, this would also allow for load-balancing.
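As a rough illustration of the scale-out arithmetic above, here is a minimal Python sketch. The function name and the fixed per-instance capacity are assumptions for illustration only; a real orchestrator would act on measured load rather than a hard-coded limit.

```python
import math

# Assumed capacity of one Kamailio VM, taken from the "1,000 calls" example above.
CALLS_PER_INSTANCE = 1000

def instances_needed(active_calls: int, per_instance: int = CALLS_PER_INSTANCE) -> int:
    """Return how many instances are needed to carry active_calls.

    At least one instance is always kept running, even with zero calls.
    """
    if active_calls < 0 or per_instance <= 0:
        raise ValueError("active_calls must be >= 0 and per_instance > 0")
    return max(1, math.ceil(active_calls / per_instance))

# Print the instance count around the 1,000-call threshold.
for calls in (0, 1000, 1001, 2500):
    print(calls, "calls ->", instances_needed(calls), "instance(s)")
```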

However, I have not got an answer yet. So can anyone tell me how many of the features that we enable with the "#!define WITH_*" flags in the config file can be used if 2 (or more) Kamailio servers are accessing the very same database? Please consider that in a MySQL cluster only one client at a time has permission to write the same record.

Cheers!
Moacir

From: chabert.loic.74 at gmail.com
Date: Wed, 8 Jun 2016 09:45:15 +0200
To: sr-users at lists.sip-router.org
Subject: Re: [SR-Users] High availability

Hi Moacir,

If you want to provide Kamailio HA in two different datacenters, BGP will be the best tool to achieve that.
Passing keepalived packets through VXLAN could be a solution, but if you have any packet loss or hit bugs in the VXLAN implementation in the Linux kernel, your service will be down.

I have tested a setup in my lab using ExaBGP. Please find the steps below:

1. The same IP address is configured on the loopback interface of each Kamailio host.
2. Kamailio listens on the loopback and on the physical NIC.
3. The Kamailio service is published through DNS (A record -> loopback address).
4. The BGP router is configured to produce the following behavior (using BGP communities):
   - Community 65000:1: this server is the master, and requests must be routed to it.
   - Community 65000:2: this server is the slave; the router must wait for the master community to be removed before inserting this route into the routing table (using BGP local preference).
5. sipsak is used to send a "ping" request to Kamailio (physical NIC):
   - If sipsak returns 0 and the server is the master, ask ExaBGP to announce the loopback address with the master BGP community.
   - If sipsak returns 0 and the server is the slave, ask ExaBGP to announce the loopback address with the slave BGP community.
   - If sipsak returns 1, ask ExaBGP to withdraw the BGP announcements.
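The sipsak-driven announce/withdraw logic in the steps above could be sketched as a small health-check process started by ExaBGP, which reads commands from the process's standard output. This is only a sketch under stated assumptions, not the script used in the lab: the loopback address 192.0.2.10/32, the function names and the 5-second interval are hypothetical, any non-zero sipsak result (not just 1) is treated as a failure, and the exact announce/withdraw command syntax should be checked against the ExaBGP version in use.

```python
import subprocess
import sys
import time

# Hypothetical service address (documentation range); replace with the real loopback.
LOOPBACK = "192.0.2.10/32"
# Communities taken from the steps above.
COMMUNITY = {"master": "65000:1", "slave": "65000:2"}

def exabgp_command(sipsak_rc: int, role: str) -> str:
    """Map a sipsak exit code and the node role to an ExaBGP text command."""
    if sipsak_rc == 0:
        return f"announce route {LOOPBACK} next-hop self community [{COMMUNITY[role]}]"
    # Assumption: every non-zero result means "SIP service unavailable".
    return f"withdraw route {LOOPBACK} next-hop self"

def probe(target_uri: str) -> int:
    """Run sipsak against the physical NIC; a missing binary or a hang counts as down."""
    try:
        result = subprocess.run(["sipsak", "-s", target_uri], timeout=10,
                                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return result.returncode
    except (OSError, subprocess.TimeoutExpired):
        return 1

def main(role: str, target_uri: str, interval: float = 5.0) -> None:
    """Loop forever, writing a command to stdout only when the state changes."""
    last = None
    while True:
        command = exabgp_command(probe(target_uri), role)
        if command != last:
            sys.stdout.write(command + "\n")
            sys.stdout.flush()  # ExaBGP reads the pipe line by line
            last = command
        time.sleep(interval)
```

A wrapper would call, for example, `main("master", "sip:ping@198.51.100.10")` on the master node and `main("slave", ...)` on the other one; both the role argument and the probe URI are placeholders.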

With this setup, the Kamailio public address is announced in two datacenters without using any L2 keepalive technologies (ucarp, VRRP, ...).
sipsak checks whether the SIP service is available on the node, and if Kamailio is not running, the loopback address is removed from the routing table. If the entire node fails (hardware failure), the loopback address is removed too, because the PE router no longer receives BGP keepalives. You can adjust the BGP timers to make HA convergence faster. In my lab, I have set the BGP keepalive to 5s and the hold timeout to 15s. So if my Kamailio host fails, there is at most 15s without any SIP service.
You can have more than 2 nodes: create as many BGP communities (and so priorities) as you want, and announce the loopback address in as many datacenters as you want. Full flexibility and agility.
More information: https://vincent.bernat.im/en/blog/2013-exabgp-highavailability.html



My lab has been running for 3 weeks with no "crash" or BGP flapping.
Regards.

2016-06-07 22:06 GMT+02:00 Moacir Ferreira <moacirferreira at hotmail.com>:



Sorry... It is a long answer.

Keepalived is a VRRP-like (RFC 5798) implementation. It is used all over the place and is distributed as a standard package in, for example, CentOS. So there are no compiling issues, and you get upgrades, fixes, etc., in a very simple way.

I found this nice blog post on Kamailio HA, and I will start my tests from it. Now, for MySQL my test bed is already prepared and I am using MariaDB Galera. I will have 2 MySQL servers, synchronizing the databases between them all the time. Each MySQL server will have Kamailio on it, where Kamailio server 1 will use the MySQL on server 1 and Kamailio server 2 will use the MySQL on server 2. However, while the MySQL services will be up and running on both servers and so synchronized, Kamailio will run only on server 1, on the VIP address. Should it fail, the keepalived script will then kill the local Kamailio instance and start Kamailio on server 2 (it will do the same if the complete server fails). Please notice that this is crucial to prevent split brain, where both Kamailios could "think" they were the master. As the databases will be synchronized, the service interruption should be quite short.
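The "monitor the active Kamailio" part of this plan needs some probe that keepalived can run. A minimal sketch in Python follows, assuming the check is a plain SIP OPTIONS request over UDP and that any SIP response at all means the process is alive. The function names, the `monitor` user part and the 2-second timeout are illustrative choices, not something taken from the blog post or from keepalived itself.

```python
import socket
import uuid

def build_options(host: str, port: int, local_ip: str, local_port: int) -> bytes:
    """Build a minimal SIP OPTIONS request addressed to host:port."""
    lines = [
        f"OPTIONS sip:{host}:{port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch=z9hG4bK{uuid.uuid4().hex}",
        "Max-Forwards: 70",
        f"From: <sip:monitor@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{host}:{port}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "",
        "",  # the two empty items produce the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")

def sip_alive(host: str, port: int = 5060, timeout: float = 2.0) -> bool:
    """Return True if anything that looks like a SIP response comes back."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.connect((host, port))
            local_ip, local_port = sock.getsockname()
            sock.send(build_options(host, port, local_ip, local_port))
            reply = sock.recv(4096)
    except OSError:  # timeout, ICMP port unreachable, unresolvable host, ...
        return False
    return reply.startswith(b"SIP/2.0 ")
```

A wrapper script could end with `sys.exit(0 if sip_alive("127.0.0.1") else 1)` so that a keepalived `vrrp_script` entry can use its exit status; which address to probe and how often are deployment decisions.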

Now, the VIP solution relies on L2 connectivity between (or among) the involved servers and, to be really redundant, the servers should be geographically distant. So, if the above initial tests work, my next step will be separating the two Kamailio servers with an L3 router and, using iproute2, creating a VxLAN tunnel between the 2 emulated remote sites. This would eliminate the need to buy an L2 extension between the two datacenters involved.
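For reference, the iproute2 side of such a point-to-point tunnel is only a few commands. The Python sketch below just assembles them as argument lists so they can be reviewed before being run as root (for example with `subprocess.run(argv, check=True)`). The interface names, the VNI and all addresses are made-up example values, and 4789 is the IANA-assigned VXLAN UDP port.

```python
from typing import List

def vxlan_commands(name: str, vni: int, local_ip: str, remote_ip: str,
                   underlay_dev: str, overlay_cidr: str) -> List[List[str]]:
    """Return the iproute2 commands for a unicast point-to-point VXLAN link."""
    return [
        # Create the VXLAN interface on top of the routed (L3) underlay.
        ["ip", "link", "add", name, "type", "vxlan", "id", str(vni),
         "local", local_ip, "remote", remote_ip,
         "dstport", "4789", "dev", underlay_dev],
        # Give this end an address in the shared L2 segment.
        ["ip", "addr", "add", overlay_cidr, "dev", name],
        # Bring the tunnel interface up.
        ["ip", "link", "set", name, "up"],
    ]

# Site A of the emulated pair; site B would swap local/remote and use 10.100.0.2/24.
for argv in vxlan_commands("vxlan100", 100, "198.51.100.1", "203.0.113.1",
                           "eth0", "10.100.0.1/24"):
    print(" ".join(argv))
```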

While it solves the problem for a "small" system, if I were providing SIP services to the Internet my approach would be different: I would have 2 Kamailios using the same public IP address and use BGP to flip from one location to the other. In this case there would be no Kamailio VIP interface, but I would still use keepalived to monitor the services and drop the local BGP session, forcing the Internet to reach my second server in the other location. In this case your ISP would set its BGP local preference so that traffic to your Kamailio goes to the main site; should the main site fail, BGP would use the second path to reach your server, which in fact would be another Kamailio instance using the same IP as the main one. While this adds complexity to the design, it would also prevent split brain, and the geographical distance would be limited only by how far your ISP can reach.

If anyone is interested in these kinds of techniques, just drop into this thread and we will keep the discussion going.

You could help me a lot by answering a question I have regarding all this: how would Kamailio behave if I have one active Kamailio writing to the database and a second "live" Kamailio using the same database but not receiving any connections/requests? Would this jeopardize the way Kamailio works?

Cheers,
Moacir

To: sr-users at lists.sip-router.org
From: miconda at gmail.com
Date: Tue, 7 Jun 2016 09:01:23 +0200
Subject: Re: [SR-Users] High availability


  
    
  
  
    Just to mention that ucarp can be a very basic alternative to
      corosync/pacemaker to manage the virtual ip. A simple app, but
      usually keeps working...

    Cheers,

      Daniel

    

    

    On 06/06/16 02:21, Fred Posner wrote:

    
    
      
      If it's just 2 servers, consider as Juha said,
        corosync/pacemaker with drbd. 
      

      
      Fred Posner
        direct:
            +1 (224) 334-FRED (3733)
      
      

        On Jun 5, 2016, at 5:26 PM, Moacir Ferreira <moacirferreira at hotmail.com>
        wrote:

        

      
      
        
          
          Hi,

            

            Sorry... I should have mentioned this before. You guys are
            thinking of standard Internet SIP call behavior, while
            I am trying to use Kamailio on a large "industrial" project.
            That said:

            

            Assuming that the end-point is "smart", the DNS method is
            functional but it would take quite a while before the UA
            (phone) recovers from the previous name/IP binding it has in
            cache;

            SRV is good for a "smart" UA, which unfortunately is not the
            case here;

            The same goes for failover in the phone units, as they are
            industrial "Help Points" and so quite "dumb".

            

            While I never tested it, I thought I could use two Kamailio
            servers with a mysql cluster like mariadb-galera where, for
            Kamailio functions, one server would be "active" and another
            "passive" server. Then use keepalived for monitoring the
            "active" Kamailio and starting the "passive" server if the
            active Kamailio fails. Without any testing, tests that I
            think I should have done before putting questions in here,
            my questions are:

            

            Suppose that I have two Kamailio servers, one "active" and
            another one "passive" (not running), where the MySQL
            databases are synchronized between the two servers using
            MySQL Galera. Using keepalived I would monitor the active
            Kamailio instance. Should it fail, I would start the "passive"
            Kamailio instance using the same MySQL database, which is
            supposed to be synchronized. Would this new Kamailio
            instance be able to find a called number? Why this question?
            As far as I understand, Kamailio will always challenge the
            UA for authentication before making a call, so if this
            second server gets a call request it would just challenge
            and authenticate the caller. The "key point" would be making
            this new Kamailio instance aware of the called
            destinations. So, would delivering a MySQL database with the
            latest data the active Kamailio had to this new Kamailio
            instance be enough to allow it to find the called party?

            

            Anyway, can you guys comment on my "thoughts"? Is it
            possible? Am I missing something? Would you suggest another
            approach for such scenario?

            

            Cheers!

            Moacir 

            

            
              Date: Sun, 5 Jun 2016 21:07:41 +0200

              From: chabert.loic.74 at gmail.com

              To: sr-users at lists.sip-router.org

              Subject: Re: [SR-Users] High availability

              

              Hello Bill,
              I have built Kamailio HA using ExaBGP with
                loopbacks.
              Check https://github.com/Exa-Networks/exabgp
              With BGP, a Kamailio cluster can be split across
                several datacenters.
              Regards.
              On 5 June 2016 at 20:53, "Bill"
                <bill at novatrope.us>
                wrote:

                
                   Hi Moacir

                    

                    We have only found three ways to handle failover.

                    1. Change the DNS entry whenever a failure is
                    detected.

                    2. Use SRV records to display an alternate route.

                    3. Use the failover mechanism in the phone itself

                    

                    1. Works, but it may take some time for your UAs to
                    become aware of the change.

                    2. We have never been able to get this to work as
                    advertised.

                    3. Works pretty well depending on the phone. (We use
                    mostly Yealink's and they seem to handle the
                    failover pretty well.)

                    

                    Hope this helps

                    

                    On 06/05/2016 07:41 AM, Moacir Ferreira wrote:

                    
                    
                      Hi,

                        

                        I got two questions regarding high availability:

                        

                        1 - Should my Kamailio server fail, I would like
                        another Kamailio "box/server" to take over with
                        minimal service disruption. What is the
                        "community" advice for such an environment?

                        

                        2 - Should my main PSTN gateway fail, what would
                        be the best mechanism to redirect calls to a
                        second PSTN gateway?

                        

                        Cheers!

                        Moacir

                      
                      

                      
                      

                      _______________________________________________
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
sr-users at lists.sip-router.org
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users

                    
                    

                  
                  


                  

                
              
              

          
        
      
      

        
      
      

      
      


    
    

    -- 
Daniel-Constantin Mierla
http://www.asipto.com - http://www.kamailio.org
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
  







