[sr-dev] git:master: db_cluster: new module for generic database clustering

Fri Mar 30 12:04:04 CEST 2012

Hello,

On 3/30/12 11:47 AM, Marius Zbihlei wrote:
> On 03/27/2012 03:44 PM, Daniel-Constantin Mierla wrote:
>> Module: sip-router
>> Branch: master
>> Commit: 201fc2d600e48fbb717531c79013c1b971f82d76
>> URL:    
>> http://git.sip-router.org/cgi-bin/gitweb.cgi/sip-router/?a=commit;h=201fc2d600e48fbb717531c79013c1b971f82d76
>>
>> Author: Daniel-Constantin Mierla<miconda at gmail.com>
>> Committer: Daniel-Constantin Mierla<miconda at gmail.com>
>> Date:   Tue Mar 27 14:38:57 2012 +0200
>>
> Hello Daniel,
>
> I have a few questions regarding the db_cluster module and especially 
> the way it deals with errors:
>
> For serial operation, let's consider two handlers, DB1 and DB2, with 
> the same priority. Let's presume that the first write operation fails 
> on DB1 (network congestion, MySQL deadlock, etc.), so the insert is 
> done on DB2. For a subsequent serial select, DB1 is chosen first (I 
> looked through the code and see no way of caching the initial error), 
> which means the info is returned from DB1, which did not receive that 
> write (with insert_update or update, info might also be present in 
> DB1). How does the module handle this?
>
> I think the same scenario and question apply to round-robin mode as 
> well.
>
> The way we do this in p_usrloc is by keeping an error counter for each 
> handler, associated with a state (on/off) and a timestamp (when it 
> failed). This info can be used to disable usage of the DB handler 
> (and later put the handler in write-only mode until the data is 
> synchronized).

Error detection and automatic enabling/disabling of connections are not 
implemented yet. I had them in mind, but could not quickly decide which 
solution to use.

Each connection links to a structure in shared memory - dbcl_shared_t - 
one field there being 'state', planned to be used to mark the connection 
active/inactive.

Marking a connection inactive is easy: do it when a command fails. 
Bringing it back to active is more complex. I thought of:
- counting how many commands would have been sent while the connection 
is inactive and, once a threshold is reached, trying to reconnect
- keeping the timestamp of when the connection became inactive and 
trying to bring it back to active once a specified interval has elapsed
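
A minimal sketch of how the two reactivation ideas above could be 
combined, in C. Everything here is hypothetical: con_health_t, the 
function names, and both thresholds are illustration only and are not 
the actual dbcl_shared_t layout or any existing db_cluster code. Real 
code would also need locking, since the structure lives in shared 
memory.

```c
#include <time.h>

/* Hypothetical states; the module only plans a 'state' field so far. */
enum { CON_ACTIVE = 0, CON_INACTIVE = 1 };

/* Illustrative stand-in for the per-connection shared data. */
typedef struct con_health {
    int state;             /* CON_ACTIVE or CON_INACTIVE */
    unsigned int skipped;  /* commands skipped while inactive */
    time_t failed_at;      /* when the connection was marked inactive */
} con_health_t;

/* Assumed thresholds; in practice these would be module parameters. */
#define RETRY_AFTER_SKIPPED 100
#define RETRY_AFTER_SECONDS 30

/* Call when a command fails on this connection. */
void con_mark_failed(con_health_t *h, time_t now)
{
    h->state = CON_INACTIVE;
    h->skipped = 0;
    h->failed_at = now;
}

/* Call when a command (or a probe) succeeds on this connection. */
void con_mark_ok(con_health_t *h)
{
    h->state = CON_ACTIVE;
    h->skipped = 0;
}

/* Returns 1 if the next command should be sent on this connection,
 * 0 if it should be skipped. An inactive connection is probed again
 * once enough commands were skipped or enough time has passed. */
int con_should_try(con_health_t *h, time_t now)
{
    if (h->state == CON_ACTIVE)
        return 1;
    if (h->skipped >= RETRY_AFTER_SKIPPED)
        return 1;
    if (difftime(now, h->failed_at) >= RETRY_AFTER_SECONDS)
        return 1;
    h->skipped++;
    return 0;
}
```

The caller would invoke con_mark_ok() if the probe succeeds, or 
con_mark_failed() again if it does not, which restarts both the counter 
and the timer.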

Other suggestions/contributions are welcome -- the code is in the repo, 
so if anyone wants to jump into development, feel free to do so...

To your questions: doing serial or round-robin writes to some databases 
and also using them for reads in the same fashion would require 
replication at the database layer.

I thought of cases such as:
- master db servers for writes in parallel (e.g., location/presence in 
db only mode)
- slave db servers (replicated from masters) for reads in round robin
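
The split above can be sketched in C as follows. This is only an 
illustration under stated assumptions: cluster_t, db_cmd_f and the 
function names are hypothetical and not part of db_cluster, and 
"parallel" is taken to mean that the same write is issued to every 
master in turn.

```c
#include <stddef.h>

/* Hypothetical command callback: returns 0 on success. */
typedef int (*db_cmd_f)(int server_id, void *arg);

/* Illustrative cluster description: masters take writes, slaves reads. */
typedef struct cluster {
    const int *masters;
    size_t n_masters;
    const int *slaves;
    size_t n_slaves;
    size_t next_slave;  /* round-robin cursor for reads */
} cluster_t;

/* Write to all masters; returns how many of them accepted the write. */
int cluster_write(const cluster_t *c, db_cmd_f cmd, void *arg)
{
    int ok = 0;
    for (size_t i = 0; i < c->n_masters; i++) {
        if (cmd(c->masters[i], arg) == 0)
            ok++;
    }
    return ok;
}

/* Read from the next slave in round-robin order, moving on to the
 * following one if it fails. Returns the id of the slave that
 * answered, or -1 if none did. */
int cluster_read(cluster_t *c, db_cmd_f cmd, void *arg)
{
    for (size_t tried = 0; tried < c->n_slaves; tried++) {
        size_t idx = c->next_slave;
        c->next_slave = (c->next_slave + 1) % c->n_slaves;
        if (cmd(c->slaves[idx], arg) == 0)
            return c->slaves[idx];
    }
    return -1;
}

/* Demo command: fails only for the server id pointed to by arg. */
int fail_if_down(int server_id, void *arg)
{
    return (server_id == *(int *)arg) ? -1 : 0;
}
```

With slaves 10, 11 and 12 and slave 11 down, successive reads would be 
served by 10, then 12, then 10 again, while replication at the database 
layer keeps the slaves in sync with the masters.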

Of course, one could write to all servers and read from all servers, 
but if the traffic is very high, it might be good to reuse some 
database-layer features for scalability.

Cheers,
Daniel

>
> Cheers,
> Marius
>
>> Zbihlei Marius
>>
>> Head of
>> Linux Development Services Romania
>>
>> 1&1 Internet Development srl    Tel KA: 754-9152
>> Str Mircea Eliade 18            Tel RO: +40-31-223-9152
>> Sect 1, Bucuresti               mailto: marius.zbihlei at 1and1.ro
>> 71295, Romania
>
> _______________________________________________
> sr-dev mailing list
> sr-dev at lists.sip-router.org
> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev

-- 
Daniel-Constantin Mierla
Kamailio Advanced Training, April 23-26, 2012, Berlin, Germany
http://www.asipto.com/index.php/kamailio-advanced-training/