Does comparison operator =(equal) case sensitive?

Hello guys,

After reading this topic I have a question, does comparison operator =(equal) case sensitive for varchar or text values? And can we change this behavior for table/query/server.

I know that I can use ILIKE operator like solution, but I interest behavior like in MSSQL where operator =(equal) is case insensitive but can be case sensitive if we will change database/table collation i.e. we can change this behavior but could we do the same in the CrateDB?

Thanks

1 Like

HI @Valeriy_Dzhura

It is case sensitive yes. I am not aware of any table or general settings on cluster level.

  • ILIKE
  • using LOWER()
  • use a fulltext index with e.g. ‘simple’ analyze
CREATE TABLE tab (
  txt TEXT INDEX using FULLTEXT with (type='simple')
 );

INSERT INTO tab VALUES ('Hello');

SELECT txt FROM tab 
WHERE txt = 'hello'
--> 'Hello'
1 Like

Hello again,

I used next analyzer for my field:

create ANALYZER myAnalyzer(
  TOKENIZER CustomTokenizer with (type='ngram', min_gram=2, max_gram=2, token_chars=['letter']),
  TOKEN_FILTERS (my_synonyms WITH (type='synonym', synonyms_path='synonyms.txt'), lowercase, kstem));

and

CREATE TABLE IF NOT EXISTS "x"."x" (
"name" TEXT INDEX USING FULLTEXT WITH (analyzer = 'myAnalyzer')
)

And as you can see I used lowercase in TOKEN_FILTERS, but when I compare my field with this analyzer using operator =(equal) it is still case sensitive. Looks like lowercase not work for this field or maybe I do something wrong?

For example, these queries returns different results but should be the same:

select * from x.x where name = 'John'
select * from x.x where name = 'john'

Thanks!

The lowercase token filter lowercases the entries in the index. So you would need to search for the lower case

select * from x.x where name = 'john'

or use lower() in the comparison

select * from x.x where name = lower('John')

if you need to have both options you could create a 2nd index.

Can you show me please an example how I can use two indexes for 1 field in my test case:

create ANALYZER myAnalyzer(
  TOKENIZER CustomTokenizer with (type='ngram', min_gram=2, max_gram=2, token_chars=['letter']),
  TOKEN_FILTERS (my_synonyms WITH (type='synonym', synonyms_path='synonyms.txt'), lowercase, kstem));

CREATE TABLE IF NOT EXISTS "x"."x" (
"name" TEXT INDEX USING FULLTEXT WITH (analyzer = 'myAnalyzer')
)

Thanks!

e.g. like that:

CREATE TABLE IF NOT EXISTS "x"."x" (
"name" TEXT INDEX using plain,
INDEX "name_ft" USING FULLTEXT WITH (analyzer = 'myAnalyzer')
)

also see Fulltext indices — CrateDB: Reference

-- exact match
select name from x.x where name = 'John'

-- using full_text index
select name from x.x where name_ft = 'john'
1 Like

Not sure that I understand you right, but why when I use lowercase in TOKEN_FILTERS my queries works wrong, for example you can see next statement:

create ANALYZER myAnalyzer(
  TOKENIZER CustomTokenizer with (type='ngram', min_gram=2, max_gram=2, token_chars=['letter']),
  TOKEN_FILTERS (my_synonyms WITH (type='synonym', synonyms_path='synonyms.txt'), lowercase, kstem));

create table if not exists x.x(
  "name" text index using fulltext with (analyzer = 'myAnalyzer'),
  "test" varchar(8)
);

insert into x.x(name, test)
values('John','John'),('JOHN','JOHN'),('jOhn','jOhn'),('john','john');

select _score, name from x.x where name = 'john' limit 100; -- NO HIT
select _score, name from x.x where name = 'John' limit 100; -- NO HIT
select _score, name from x.x where name = 'JOHN' limit 100; -- NO HIT

In each select returns NO HIT but as you can see we have this records in the table and in each select I should get all four rows.

According to this:

The lowercase token filter lowercases the entries in the index. So you would need to search for the lower case

My first query select _score, name from x.x where name = 'john' limit 100; -- NO HIT should return 1 row, but nothing. This is fully misconfused me.

Ah … I overlooked that you used a ngram tokenizer.
Then you should use a MATCH instead Fulltext search — CrateDB: Reference

with

CREATE TABLE IF NOT EXISTS "x"."x" (
"name" TEXT INDEX using plain,
   INDEX "name_ft" USING FULLTEXT WITH (analyzer = 'myAnalyzer'),
"test" varchar(8)
)

you could do

select _score, name from x.x where name = 'john' limit 100; 

or

select _score, name from x.x where MATCH(name_ft, 'john') limit 100; 

Thank you, but I still not understand how lowercase works and why it works like this. According to this:

The lowercase token filter lowercases the entries in the index. So you would need to search for the lower case

In index it will be stored in lowercase but when I will use WHERE condition we will not look into index when will use =(equal) operator. =(equal) operator will look only in fields values, not in index, right? Or is it only for fulltext search?

1 Like