COPY FROM csv fails successfully

inviridi · September 28, 2021, 9:45pm

Hello everyone,

I try to import a csv file into CrateDB like this:

COPY imp5 FROM 'file:///opt/MIB/test.csv';

and get the result:

COPY OK, 0 records affected (1.437 seconds)

There is apparently no issue with the CrateDB column types and yet there are no records imported. The table “imp5” is empty before and after attempting to copy the csv file.

Any suggestions how to find the error?

Thanks in advance!

Cheers,
Frank

jayeff · September 29, 2021, 6:30am

Hello Frank,

without additional information it’s difficult to determine what the issue could be.

Are you able to share the table schema and csv file?

Some pointers on what to check or do:

Look at our documentation for Importing data into CrateDB
Does your CSV file have a header column with comma (",") separated values with column names?
Values aren’t validated. You need to make sure they fit the schema
Especially look out for additional commas in your values (are they escaped?) and mixed case column names (are they quoted?)
Try a divide and conquer approach on identifying the source of the problem (e.g. import just a single line, half number of columns to import,…). Once you have a working example build back up until it stops working and so forth.

I hope this helps. If you figure it out I would be curious to learn what tripped up the import

inviridi · September 29, 2021, 9:35pm

Swift as usual, @jayeff!

I am able to share table schema and csv file. It is not fancy at all.

Looked at the documentation, of course, and thought this would be simple to implement. Oh well.
Yes. See attached file.
They should fit, I generate the schema in Python when generating a new table for the import. Writing data serially to CrateDB works fine with the schema / data types.
Commas are not an issue but I have mixed case columns. Is this a problem? The documentation says:

Furthermore, column names in your data are considered case sensitive (as if they were quoted in a SQL statement).

No more conquering tonight. Maybe tomorrow.
Alternatively, it would be fine to import the data via pandas’ dataframe.to_sql() command into CrateDB. Have not tried this extensively but the first two attempts proved unsuccessful.

test.txt (12.1 KB)

column_name	data_type
avg_i	real
i_l1	real
i_l2	real
i_l3	real
p_l1	real
p_l2	real
p_l3	real
tot_p	real
ts_mi	timestamp without time zone

jayeff · September 30, 2021, 8:13am

Good Morning @inviridi,

Swift as usual, @jayeff!

I looked at the file you provided. There are couple things to note:

When format (json / csv) isn’t specified CrateDB tries to guess from the file extension. If this is not possible CrateDB falls back to JSON. In your case it tries to import the records from test.txt as JSON which will not work.

You can either rename test.txt to test.csv so that CrateDB can derive the format from the extension or explicitly define the format on the COPY FROM with the format option:

COPY <tablename> FROM 'file:///path/to/test.txt'
  WITH (format = 'csv')
  RETURN SUMMARY;

Imported data will not be validated or casted to the correct type. Therefore for imports CrateDB can only handle unix timestamp in milliseconds. This means instead of a ISO8601/RFC3339 timestamp (e.g. “2021-07-06 18:56:05”) you need to use the unix epoch time in milliseconds (e.g. “1625597765000”).

Commas are not an issue but I have mixed case columns. Is this a problem?

Make sure that casing of columns match between CrateDB and csv file. E.g. when you have “ts_mi,I_L1” in csv you should have columns “I_L1” and “ts_mi” defined for the CrateDB table. When using "“ts_mi,i_l1” you should have columns “i_l1” and “ts_mi” in CrateDB.

Also note the RETURN SUMMARY at the end of the COPY FROM. This will return additional information and error messages which helps with debugging an import.

I hope this solves your importing woes

inviridi · September 30, 2021, 10:16pm

Good evening @jayeff,

well, your answer has good points for debugging this further.

The file-to-be-imported has a .csv suffix in my setup, I just could not upload it with this extension.

I saw that data is not validated nor casted but did not think much of it. I would have dealt with this later. But Unix vs ISO time makes sense then… perhaps that would be a useful addition to the documentation?

The return summary addition is nice. There it says that the date is problematic guiding the user to the right spot. But then, why is copy successful returned? This is even the case if no file is found. Barely a success

I’ll convert the timestamps and look at uppercase vs lowercase tomorrow and, of course, report back. Thanks a lot for your help!

jayeff · October 1, 2021, 7:33am

Yes, I wrote this on my ToDo list

CrateDB and it’s documentation is all open source. Community contributions to either are always appreciated. If you want to add something yourself you can find an “Edit on Github” link at the end of all the documentation pages (COPY FROM — CrateDB: Reference)

inviridi · October 1, 2021, 8:32pm

I donate a few Euros to projects I like everynow and then, but have not actively contributed yet. Can we make a deal? I’ll add a few sentences to the documentation and you look into improving the success message… except of course, if you think the current behaviour is great

Another statement I would expand on in the docs is this:

Furthermore CrateDB will only check for primary key duplicates …

What if no primary key is defined? What are the consequences if duplicates are found?

jayeff · October 4, 2021, 11:17am

What if no primary key is defined? What are the consequences if duplicates are found?

This statement in the docs refers to Column Constraints. Primary Key constraint is applied but other constraints (namely NOT NULL and CHECK) are not.

Without primary key if you have duplicated entries in your COPY FROM target file these will be inserted into your table.

Topic		Replies	Views
Help with Data Ingestion SQL	3	502	April 6, 2022
Importing Nulls from a CSV with COPY FROM CrateDB	6	910	September 23, 2020
Using COPY TO for CSV CrateDB	4	510	August 25, 2022
Error occurs while using COPY FROM json file when sequence of the records in a file mismatch CrateDB sql	4	149	December 13, 2023
Upload of wide CSV (200'000 columns) CrateDB	4	189	November 4, 2023

COPY FROM csv fails successfully

Related Topics