For a long time it has not been possible to do a SELECT * FROM table WHERE firstname like ‘t%’; in Cassandra like you could in eg.. MySQL or any other Relation Database for that matter.
In Cassandra v3.4 this is now possible, BUT it requires some extra to do it right, and that is why I created this blog post cause I had trouble finding it.
The solution is to create a separate index, and not the secondary indexes that Cassandra came with, but a different index, called a SASI index.
This is what I have
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
CREATE TABLE bth.employee ( id int, lastname text, firstname text, dateofbirth date, PRIMARY KEY (id, lastname, firstname) ) WITH CLUSTERING ORDER BY (lastname ASC, firstname ASC) AND bloom_filter_fp_chance = 0.01 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} AND comment = '' AND compaction = {'bucket_high': '1.5', 'bucket_low': '0.5', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'enabled': 'true', 'max_threshold': '32', 'min_sstable_size': '50', 'min_threshold': '4', 'tombstone_compaction_interval': '86400', 'tombstone_threshold': '0.2', 'unchecked_tombstone_compaction': 'false'} AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1.0 AND dclocal_read_repair_chance = 0.1 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair_chance = 0.0 AND speculative_retry = '99PERCENTILE'; |
And the content of it looks like this
1 2 3 4 5 6 7 8 9 10 11 |
cqlsh:bth> select * from bth.employee; id | lastname | firstname | dateofbirth ----+-------------+-----------+------------- 1 | eriksson | tobias | 1973-06-11 2 | holgersson | marcus | 1972-08-01 7 | eriksson | linda | 1974-06-11 3 | abrahamsson | ingvar | 1959-10-05 (4 rows) cqlsh:bth> |
And now I would like to search for all the rows that has a first name that starts with a ‘t’
In SQL that would have been :
SELECTÂ * FROMÂ bth.employee WHEREÂ firstname LIKEÂ ‘t%’;
In fact we could have done that on any column …. but in Cassandra it would result in something like this:
1 2 3 |
cqlsh:bth> SELECT * FROM bth.employee WHERE firstname like 't%'; InvalidRequest: code=2200 [Invalid query] message="firstname LIKE '<term>%' restriction is only supported on properly indexed columns" cqlsh:bth> |
In Cassandra we first has to decide on which columns this should be possible, by creating an index like this:
1 |
CREATE CUSTOM INDEX employee_firstname_idx ON bth.employee (firstname) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 'false'}; |
And so you can now do the following
1 2 3 4 5 6 7 8 |
cqlsh:bth>SELECT * FROM bth.employee WHERE firstname LIKE 't%'; id | lastname | firstname | dateofbirth ----+----------+-----------+------------- 1 | eriksson | tobias | 1973-06-11 (1 rows) cqlsh:bth> |
But what if you decide that I would like to know all the employees that ends with an ‘s’ in their name, so something like this:
1 2 3 |
cqlsh:bth> SELECT * FROM bth.employee WHERE firstname like '%s'; InvalidRequest: code=2200 [Invalid query] message="firstname LIKE '%<term>' restriction is only supported on properly indexed columns" cqlsh:bth> |
So to be able to search for something that contains we have to change the index like this instead:
1 |
CREATE CUSTOM INDEX employee_firstname_idx ON bth.employee (firstname) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 'false'}; |
And now you can run that query again:
1 2 3 4 5 6 7 8 9 |
cqlsh:bth> SELECT * FROM bth.employee WHERE firstname like '%s'; id | lastname | firstname | dateofbirth ----+------------+-----------+------------- 1 | eriksson | tobias | 1973-06-11 2 | holgersson | marcus | 1972-08-01 (2 rows) cqlsh:bth> |
You can read more about the SASI index here https://docs.datastax.com/en/cql/3.3/cql/cql_reference/refCreateSASIIndex.html
Enjoy!
-Tobias