I noticed that the Order by function doesn't work correctly in Turkish in the TEXT field

Hello,

I’m new to CrateDb and trying to work on a project. I noticed that the Order by function doesn’t work correctly in Turkish in the TEXT field.

Is there a way to fix this?

My Test;

create table test(
test_field text INDEX using fulltext with (analyzer = ‘turkish’)
);

insert into test(test_field) values(‘A’);
insert into test(test_field) values(‘Ş’);
insert into test(test_field) values(‘Z’);

select * from test order by test_field limit 100;

Result;
A
Z
Ş

According to Turkish, it should have been as follows;
A
Ş
Z

I’ll be happy if you can help me.

Thanks

Good morning Murat,
Specifying a language analyzer for a text field in CrateDB has an effect on matching but it does not have any effect on sorting at the moment.
If you need localised sorting this could either be done client side or with an UDF, if you want to implement this within CrateDB with UDFs 2 options come to mind, array_agg of the resut set to UDF then sort with localeCompare then unnest, or some sort of UDF that would return a weight based on summing weights for each letter with a dictionary mapping. There may be better approaches these are just 2 ideas I got now.
I know this is not what you were hoping for but I hope it helps.
Thank you.

Hi again Murat,
An alternative approach here that just occurred to me is to store a “collated” version of the string in a separate field for sorting purposes, something along the lines of:

CREATE FUNCTION collatestring(TEXT) 
RETURNS TEXT
LANGUAGE JAVASCRIPT
AS 'function collatestring(str) {  
		return str.normalize("NFD").replace(/[\u0300-\u036f]/g, "");}';

CREATE TABLE test(
	test_field TEXT INDEX USING FULLTEXT WITH (analyzer = 'turkish'),
	test_field_collated TEXT GENERATED ALWAYS AS collatestring(test_field)
);

INSERT INTO test(test_field) VALUES ('A'),('Ş'),('Z');

SELECT test_field FROM test ORDER BY test_field_collated;

Thank you Hernan. Even though it’s not the exact solution I wanted, I will give it a try.