How are Secondary Indices really stored?
This is based on the DataStax article found here: https://www.datastax.com/blog/2016/04/cassandra-native-secondary-index-deep-dive
Let’s start by creating a simple table:
```sql
CREATE TABLE customer (
  id int PRIMARY KEY,
  city text,
  name text
);
```
Or, visualized as a table:
| Column | Type | Key |
|--------|------|-----|
| id | int | Primary Key |
| city | text | |
| name | text | |
If we then create an index on the city column like this:
```sql
CREATE INDEX customer_city_idx ON customer (city);
```
Cassandra then creates what is essentially a “normal” table, only hidden. In this index table, the indexed column becomes the partition key, and the partition key of the original table becomes the clustering key:
| Column | Type | Key |
|--------|------|-----|
| city | text | Partition Key |
| id | int | Clustering Key |
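To make the layout concrete, here is a minimal Python sketch that models the hidden index table as a mapping from partition key (the indexed value) to a list of clustering keys (base-table ids). The `IndexTable` class and its methods are hypothetical illustrations, not Cassandra's API, and keeping the clustering keys in ascending order by value is a simplifying assumption; the real on-disk order may differ.

```python
from bisect import insort


class IndexTable:
    """Toy model of a hidden secondary-index table (hypothetical class).

    Partition key: the indexed column value (here, city).
    Clustering key: the base table's partition key (here, id).
    """

    def __init__(self) -> None:
        self.partitions: dict[str, list[int]] = {}

    def insert(self, indexed_value: str, base_key: int) -> None:
        # Each entry lands in the partition for its indexed value.
        # insort keeps clustering keys sorted by value (a simplification).
        clustering = self.partitions.setdefault(indexed_value, [])
        if base_key not in clustering:
            insort(clustering, base_key)

    def lookup(self, indexed_value: str) -> list[int]:
        # Reading one index partition yields every base key with that value.
        return list(self.partitions.get(indexed_value, []))


idx = IndexTable()
idx.insert("Kalmar", 2)
idx.insert("Kalmar", 1)
idx.insert("Stockholm", 3)
print(idx.lookup("Kalmar"))  # [1, 2]
```

Reading a single index partition, such as `"Kalmar"`, returns every id with that city, which is what lets a query filter on a non-key column.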
With some sample data, the “customer” table could look like this:
| Id | Name | City |
|----|------|------|
| 1 | Italia Pizzeria | Kalmar |
| 2 | Thai Silk | Kalmar |
| 3 | Royal Thai | Stockholm |
| 4 | Indian Corner | Malmö |
The index, itself a hidden table, would then look like this:
| City | Id |
|------|----|
| Kalmar | 1 |
| Kalmar | 2 |
| Stockholm | 3 |
| Malmö | 4 |
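The index rows can be derived mechanically from the base rows. A sketch, assuming the sample data above is held in a plain Python dict (the names `customers`, `build_city_index` and `find_by_city` are hypothetical): it rebuilds the city-to-id pairs and then answers a city lookup in two steps, index first and base rows second.

```python
# Base "customer" rows from the example, keyed by partition key (id).
customers = {
    1: {"name": "Italia Pizzeria", "city": "Kalmar"},
    2: {"name": "Thai Silk", "city": "Kalmar"},
    3: {"name": "Royal Thai", "city": "Stockholm"},
    4: {"name": "Indian Corner", "city": "Malmö"},
}


def build_city_index(rows: dict[int, dict[str, str]]) -> dict[str, list[int]]:
    """Derive (city -> ids) entries, the same pairs as the index table."""
    index: dict[str, list[int]] = {}
    for customer_id in sorted(rows):
        index.setdefault(rows[customer_id]["city"], []).append(customer_id)
    return index


def find_by_city(
    rows: dict[int, dict[str, str]], index: dict[str, list[int]], city: str
) -> list[str]:
    """Two-step read: the index partition first, then the base rows it names."""
    return [rows[customer_id]["name"] for customer_id in index.get(city, [])]


city_index = build_city_index(customers)
for city, ids in city_index.items():
    for customer_id in ids:
        print(city, customer_id)
print(find_by_city(customers, city_index, "Kalmar"))
```

Running it prints the four index rows from the table above, one per line, followed by `['Italia Pizzeria', 'Thai Silk']`.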
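In a cluster, the rows of the source table are distributed over the nodes by hashing the partition key with the Murmur3 algorithm (Cassandra's default Murmur3Partitioner). The index table is also distributed, BUT it is not placed by hashing the indexed value: each node indexes only the source-table data it stores itself, so an index entry always lives on the same node as the row it points to.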
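This local placement can be simulated too. The sketch assumes a hypothetical two-node cluster, ignores replication, and uses `customer_id % NUM_NODES` purely as a stand-in for Murmur3 token ownership (it is not how Cassandra actually places data). Because each node indexes only its own rows, the two Kalmar entries can end up on different nodes, so a lookup by city generally has to consult every node and merge the results.

```python
NUM_NODES = 2  # hypothetical cluster size; replication is ignored

customers = {
    1: ("Italia Pizzeria", "Kalmar"),
    2: ("Thai Silk", "Kalmar"),
    3: ("Royal Thai", "Stockholm"),
    4: ("Indian Corner", "Malmö"),
}


def owner(customer_id: int) -> int:
    # Stand-in for the Murmur3 token -> node mapping, NOT Cassandra's real placement.
    return customer_id % NUM_NODES


# Each node stores its own base rows AND a local index over only those rows.
nodes = [{"rows": {}, "city_index": {}} for _ in range(NUM_NODES)]
for customer_id, (name, city) in sorted(customers.items()):
    node = nodes[owner(customer_id)]
    node["rows"][customer_id] = (name, city)
    node["city_index"].setdefault(city, []).append(customer_id)


def query_by_city(city: str) -> list[int]:
    # The index is split by node, not by city, so every node must be asked.
    hits: list[int] = []
    for node in nodes:
        hits.extend(node["city_index"].get(city, []))
    return sorted(hits)


for number, node in enumerate(nodes):
    print(number, node["city_index"])
print(query_by_city("Kalmar"))
```

With this toy placement, node 0 holds Kalmar → [2] and node 1 holds Kalmar → [1]; only merging the answers from both nodes gives [1, 2].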