|
@@ -10,6 +10,12 @@ Anca Vamanu
|
|
|
|
|
|
<[email protected]>
|
|
|
|
|
|
+Edited by
|
|
|
+
|
|
|
+Alex Balashov
|
|
|
+
|
|
|
+ <[email protected]>
|
|
|
+
|
|
|
Copyright © 2012 1&1 Internet AG
|
|
|
__________________________________________________________________
|
|
|
|
|
@@ -59,7 +65,7 @@ Chapter 1. Admin Guide
|
|
|
|
|
|
Db_cassandra is one of the SIP Router database modules. It does not
|
|
|
export any functions executable from the configuration scripts, but it
|
|
|
- exports a subset of functions from the database API and thus other
|
|
|
+ exports a subset of functions from the database API, and thus, other
|
|
|
modules can use it as a database driver, instead of, for example, the
|
|
|
Mysql module.
|
|
|
|
|
@@ -67,35 +73,37 @@ Chapter 1. Admin Guide
|
|
|
SQL interface to be used by other modules for storing and retrieving
|
|
|
data. Because Cassandra is a NoSQL distributed system, there are
|
|
|
limitations on the operations that can be performed. The limitations
|
|
|
- concern the indexes on which queries are performed as it is only
|
|
|
- possible to have simple conditions(equal only) and only two indexation
|
|
|
- levels that will be explain in an example bellow.
|
|
|
+ concern the indexes on which queries are performed, as it is only
|
|
|
+ possible to have simple conditions (equality comparison only) and only
|
|
|
+ two indexing levels. These issues will be explained in an example
|
|
|
+ below.
|
|
|
|
|
|
Cassandra DB is especially suited for storing large data or data that
|
|
|
requires distribution, redundancy or replication. One usage example is
|
|
|
a distributed location system in a platform that has a cluster of SIP
|
|
|
- Router servers, with more proxies and registration servers accesing the
|
|
|
- same location database. This was actually the main usage we had in mind
|
|
|
- when implementing this module. Please NOTE that it has only been tested
|
|
|
- with usrloc and auth_db modules.
|
|
|
+ Router servers, with multiple proxies and registration servers accessing
|
|
|
+ the same location database. This was actually the main use case we had
|
|
|
+ in mind when implementing this module. Please NOTE that it has only
|
|
|
+ been tested with the usrloc and auth_db modules.
|
|
|
|
|
|
You can find a configuration file example for this usage in the module
|
|
|
- kamailio_cassa.cfg.
|
|
|
|
|
|
Because the module has to do the translation from SQL to Cassandra
|
|
|
- NoSQL queries, the schemes for the tables must be known by the module.
|
|
|
+ NoSQL queries, the schemas for the tables must be known by the module.
|
|
|
You will find the schemas for location, subscriber and version tables
|
|
|
in utils/kamctl/dbcassandra directory. You have to provide the path to
|
|
|
the directory containing the table definitions by setting the module
|
|
|
parameter schema_path. NOTE that there is no need to configure a table
|
|
|
metadata in Cassandra cluster.
|
|
|
|
|
|
- Special attention was given to the performance in Cassandra. Therefore,
|
|
|
- the implementation uses only the native row indexation in Cassandra and
|
|
|
- no secondary indexes that are costly. Instead, we simualte a secondary
|
|
|
- index by using the column names and putting information in them, which
|
|
|
- is very efficient. Also for deleting expired records, we let Cassandra
|
|
|
- take care of this with its own mechanism (by setting ttl for columns).
|
|
|
+ Special attention was given to performance in Cassandra. Therefore, the
|
|
|
+ implementation uses only the native row indexing in Cassandra and no
|
|
|
+ secondary indexes, because they are costly. Instead, we simulate a
|
|
|
+ secondary index by using the column names and putting information in
|
|
|
+ them, which is very efficient. Also, for deleting expired records, we
|
|
|
+ let Cassandra take care of this with its own mechanism (by setting the
|
|
|
+ TTL for columns).
|
|
|
|
|
|
2. Dependencies
|
|
|
|
|
@@ -122,9 +130,9 @@ Chapter 1. Admin Guide
|
|
|
3.1. schema_path (string)
|
|
|
|
|
|
The directory where the files with the table schemas are located. This
|
|
|
- directory has to contains the directories corresponding to the
|
|
|
- databases names(name of the directory = name of the database). And
|
|
|
- these directories contain the files with the table schema. See the
|
|
|
+ directory has to contain the subdirectories corresponding to the
|
|
|
+ databases' names (name of the directory = name of the database). These
|
|
|
+ directories, in turn, contain the files with the table schemas. See the
|
|
|
schemas in utils/kamctl/dbcassandra directory.
|
|
|
|
|
|
Example 1.1. Set schema_path parameter
|
|
@@ -141,11 +149,11 @@ Chapter 1. Admin Guide
|
|
|
|
|
|
 Because it depends on an external library, the db_cassandra module is
|
|
|
not compiled and installed by default. You can use one of these
|
|
|
- options.
|
|
|
+ options:
|
|
|
* - edit the "Makefile" and remove "db_cassandra" from
|
|
|
"excluded_modules" list. Then follow the standard procedure to
|
|
|
install SIP Router: "make all; make install".
|
|
|
- * - from command line use: 'make all include_modules="db_cassandra";
|
|
|
+ * - from command line, run: 'make all include_modules="db_cassandra";
|
|
|
make install include_modules="db_cassandra"'.
|
|
|
|
|
|
6. Table schema
|
|
@@ -166,12 +174,12 @@ Chapter 1. Admin Guide
|
|
|
for the entire record, we must ensure that when the ttl is updated,
|
|
|
it is updated for all columns for that record. In other words, to
|
|
|
update the expiration time of a record, an insert operation must be
|
|
|
- performed from the point of view of db_cassandra module (insert in
|
|
|
- Cassandra means replace if exists or insert new record otherwise).
|
|
|
- So, if you define a table with a timestamp column, the update
|
|
|
- operations on that table that also update the timestamp must update
|
|
|
- all columns. So these update operations must in fact be insert
|
|
|
- operations.
|
|
|
+ performed from the point of view of the db_cassandra module
|
|
|
+ ("insert" in Cassandra means "replace if exists or insert new
|
|
|
+ record otherwise"). So, if you define a table with a timestamp
|
|
|
+ column, the update operations on that table that also update the
|
|
|
+ timestamp must update all columns. So, these update operations must
|
|
|
+ in fact be insert operations.
|
|
|
* Second row: the columns that form the row key separated by space.
|
|
|
* Third row: the columns that form the secondary key separated by
|
|
|
space.
|
|
@@ -187,41 +195,42 @@ cket(string) user_agent(string) username(string)
|
|
|
...
|
|
|
|
|
|
Observe first that the row key is the username and the secondary index
|
|
|
- is the contact. And that we have also defined a timestamp column -
|
|
|
- expires. In this example, both the row key and the secondary index are
|
|
|
- defined by only one column, but they can be formed out of more columns,
|
|
|
+ is the contact. We have also defined a timestamp column - expires. In
|
|
|
+ this example, both the row key and the secondary index are defined by
|
|
|
+ only one column, but they can be formed out of more columns. You can
|
|
|
list them separated by space.
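As a rough illustration of the three-row schema file layout described above, the following Python sketch parses such a file into its column types, row key, and secondary key. It is not the module's own parser: the function and class names are hypothetical, the sample text is an abbreviated stand-in for the real location schema, and the "name(type)" token format for the first row is assumed from the excerpt shown.

```python
from dataclasses import dataclass


@dataclass
class TableSchema:
    columns: dict        # column name -> type name, e.g. {"username": "string"}
    row_key: list        # columns forming the row key (second row of the file)
    secondary_key: list  # columns forming the secondary key (third row)


def parse_schema(text):
    """Parse a schema file body (hypothetical helper, not part of the module)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected a column row and a row key row")

    columns = {}
    for token in lines[0].split():
        # Assumed token format: name(type)
        name, _, rest = token.partition("(")
        if not name or not rest.endswith(")"):
            raise ValueError("malformed column definition: " + token)
        columns[name] = rest[:-1]

    row_key = lines[1].split()
    secondary_key = lines[2].split() if len(lines) > 2 else []

    # Every key column must be declared in the first row.
    for col in row_key + secondary_key:
        if col not in columns:
            raise ValueError("key column not declared: " + col)
    return TableSchema(columns, row_key, secondary_key)


# Abbreviated, illustrative schema text (not the full location table).
SAMPLE = "contact(string) expires(timestamp) username(string)\nusername\ncontact\n"
schema = parse_schema(SAMPLE)
```

With the sample text, `schema.row_key` holds the username column and `schema.secondary_key` holds the contact column, matching the location example discussed here.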
|
|
|
|
|
|
- To understand why the schema looks like this we must first see which
|
|
|
- are the queries performed on the location table. (The 'callid'
|
|
|
- condition was ignored as it doesn't really have a well defined role in
|
|
|
- the SIP RFC).
|
|
|
+ To understand why the schema looks like this, we must first see which
|
|
|
+ queries are performed on the location table. (The 'callid' condition
|
|
|
+ was ignored as it doesn't really have a well-defined role in the SIP
|
|
|
+ RFC).
|
|
|
* When Invite received, lookup location: select where username='..'.
|
|
|
* When Register received, update registration: update where
|
|
|
username='..' and contact='..'.
|
|
|
|
|
|
- So the relation between these keys is the following:
|
|
|
+ So, the relation between these keys is the following:
|
|
|
* The unique key for a table is actually the combination of row key +
|
|
|
secondary key.
|
|
|
* A row defined by a row key will contain more records with different
|
|
|
secondary keys.
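The relation between the two keys can be pictured as a two-level map: the row key selects a row, and the secondary key selects one record inside that row. The Python sketch below is only a conceptual model under that assumption; the class and method names are hypothetical and it does not talk to Cassandra.

```python
from collections import defaultdict


class LocationStore:
    """In-memory model of row key -> secondary key -> record."""

    def __init__(self):
        # username (row key) -> {contact (secondary key) -> record}
        self._rows = defaultdict(dict)

    def insert(self, username, contact, record):
        # Mirrors the "replace if exists or insert new record otherwise"
        # semantics: the whole record is written every time.
        self._rows[username][contact] = dict(record)

    def lookup(self, username):
        # Query on the row key only: returns every record in the row.
        return list(self._rows.get(username, {}).values())

    def get(self, username, contact):
        # Query on row key + secondary key: at most one record.
        return self._rows.get(username, {}).get(contact)


store = LocationStore()
store.insert("alice", "sip:alice@10.0.0.1", {"expires": 3600})
store.insert("alice", "sip:alice@10.0.0.2", {"expires": 3600})
store.insert("alice", "sip:alice@10.0.0.1", {"expires": 7200})  # replaces the first
```

After these calls, a lookup for "alice" yields two records, because the third insert replaced the first one rather than adding a new entry.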
|
|
|
|
|
|
- The timestamp column that leaves the Cassandra cluster deal with
|
|
|
- deleting expired record can be used only with a modification in usrloc
|
|
|
- module that replaces the update performed at re-registration with an
|
|
|
- insert operation (so that all columns are updated). This behavior can
|
|
|
- be enabled by setting a parameter in usrloc module db_update_as_insert:
|
|
|
+ The timestamp column that leaves the Cassandra cluster to deal with
|
|
|
+ deleting expired records can be used only with a modification to the
|
|
|
+ usrloc module that replaces the update performed at re-registration
|
|
|
+ with an insert operation (so that all columns are updated). This
|
|
|
+ behavior can be enabled by setting a parameter in the usrloc module
|
|
|
+ db_update_as_insert:
|
|
|
|
|
|
...
|
|
|
modparam("usrloc", "db_update_as_insert", 1)
|
|
|
...
|
|
|
|
|
|
The alternative would have been to define an index on the expire column
|
|
|
- and run a external job to delete periodically the expired records. But
|
|
|
- this obviously is more costly.
|
|
|
+ and run an external job to periodically delete the expired records.
|
|
|
+ However, obviously, this would be more costly.
|
|
|
|
|
|
7. Limitations
|
|
|
|
|
|
- The module can be used used only when the queries use only one index
|
|
|
- which is also the unique key, or have two indexes that form the unique
|
|
|
- key like in the usrloc usage.
|
|
|
+ The module can be used only when the queries use only one index, which
|
|
|
+ is also the unique key, or have two indexes that form the unique key
|
|
|
+ like in the usrloc usage.
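One way to read this limitation is as a check on the conditions of a query: all comparisons must be equality, and the columns used must be exactly the row key, or the row key together with the secondary key. The following Python sketch encodes that reading; the function name and the (column, operator) representation are assumptions made for illustration, not the module's internal logic.

```python
def is_supported(conditions, row_key, secondary_key):
    """Return True if a query fits the one- or two-index model.

    conditions: list of (column, operator) pairs, e.g. [("username", "=")]
    row_key, secondary_key: lists of column names from the table schema
    """
    # Only simple equality conditions are possible.
    if any(op != "=" for _, op in conditions):
        return False

    cols = [col for col, _ in conditions]
    if len(cols) != len(set(cols)):
        return False  # the same column is constrained twice

    used = set(cols)
    full_key = set(row_key) | set(secondary_key)
    # Either the row key alone, or row key + secondary key (the unique key).
    return used == set(row_key) or used == full_key


# Checks modeled on the usrloc queries described earlier.
lookup_ok = is_supported([("username", "=")], ["username"], ["contact"])
update_ok = is_supported([("username", "="), ("contact", "=")],
                         ["username"], ["contact"])
range_ok = is_supported([("expires", "<")], ["username"], ["contact"])
```

Here the lookup and update conditions are accepted, while the range condition on the expires column is rejected.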
|