ANALYZER with two and more TOKENIZER/TOKEN_FILTER

Hello guys,

Could someone help me please with next question, I created custom analyzer:

create ANALYZER bankruptcies_ngram_and_synonym (
TOKENIZER CustomTokenizer with (type='ngram', min_gram=2, max_gram=2, token_chars=['letter']),
TOKEN_FILTERS (my_synonyms WITH (type='synonym', synonyms_path='synonyms.txt'), lowercase, kstem)
);

but it is not work well, when I populated and run this:

select * from <my table> WHERE MATCH (firstname,'WILLIMA') AND STATE = 'NY' limit 100;

or this:

select * from bankruptcies.bankruptcies WHERE MATCH (firstname,'william') AND STATE = 'NY' limit 100;

I got no hit, but this data 100% exist in this table and this looks like lowercase and ngram not work. Maybe I did this analyzer wrong or I should to do additional steps or something else?

Is your table defined to use the analyzer in a fulltext index?

create table <my table> (
  firstname TEXT INDEX using fulltext with (analyzer = 'bankruptcies_ngram_and_synonym')
);

or using a separate fulltext index:

create table <my table> (
  firstname TEXT,
 INDEX firstname_ft using fulltext(firstname) with (analyzer = 'bankruptcies_ngram_and_synonym')
);

Second variant:

CREATE TABLE IF NOT EXISTS "bankruptcies"."test" (
   "bankruptciesid" BIGINT,
   "firstname" TEXT,
   "middlename" TEXT,
   "lastname" TEXT,
   PRIMARY KEY ("bankruptciesid"),
  INDEX firstname_ft USING FULLTEXT (firstname) WITH (analyzer = 'bankruptcies_ngram_and_synonym')
  , INDEX middlename_ft USING FULLTEXT (middlename) WITH (analyzer = 'bankruptcies_ngram_and_synonym')
  , INDEX lastname_ft USING FULLTEXT (lastname) WITH (analyzer = 'bankruptcies_ngram_and_synonym')
)

P.S. I right now tried create as first variant and it works, but what difference between these ones?

If you define separate indexes you need to use them in the query i.e.:

select * from bankruptcies.bankruptcies WHERE MATCH (firstname_ft,'william') AND STATE = 'NY' limit 100;
1 Like