COPY FROM csv fails successfully

Hello everyone,

I try to import a csv file into CrateDB like this:

COPY imp5 FROM 'file:///opt/MIB/test.csv';

and get the result:

COPY OK, 0 records affected (1.437 seconds)

There is apparently no issue with the CrateDB column types and yet there are no records imported. The table “imp5” is empty before and after attempting to copy the csv file.

Any suggestions how to find the error?

Thanks in advance!

Cheers,
Frank

Hello Frank,

without additional information it’s difficult to determine what the issue could be.

Are you able to share the table schema and csv file?

Some pointers on what to check or do:

  • Look at our documentation for Importing data into CrateDB
  • Does your CSV file have a header column with comma (",") separated values with column names?
  • Values aren’t validated. You need to make sure they fit the schema
  • Especially look out for additional commas in your values (are they escaped?) and mixed case column names (are they quoted?)
  • Try a divide and conquer approach on identifying the source of the problem (e.g. import just a single line, half number of columns to import,…). Once you have a working example build back up until it stops working and so forth.

I hope this helps. If you figure it out I would be curious to learn what tripped up the import :blush:

Swift as usual, @jayeff! :smiley:

I am able to share table schema and csv file. It is not fancy at all.

  • Looked at the documentation, of course, and thought this would be simple to implement. Oh well.
  • Yes. See attached file.
  • They should fit, I generate the schema in Python when generating a new table for the import. Writing data serially to CrateDB works fine with the schema / data types.
  • Commas are not an issue but I have mixed case columns. Is this a problem? The documentation says:

Furthermore, column names in your data are considered case sensitive (as if they were quoted in a SQL statement).

  • No more conquering tonight. Maybe tomorrow. :smiley:
  • Alternatively, it would be fine to import the data via pandas’ dataframe.to_sql() command into CrateDB. Have not tried this extensively but the first two attempts proved unsuccessful.

test.txt (12.1 KB)

column_name data_type
avg_i real
i_l1 real
i_l2 real
i_l3 real
p_l1 real
p_l2 real
p_l3 real
tot_p real
ts_mi timestamp without time zone

Good Morning @inviridi,

Swift as usual, @jayeff! :smiley:

:grinning_face_with_smiling_eyes:

I looked at the file you provided. There are couple things to note:

When format (json / csv) isn’t specified CrateDB tries to guess from the file extension. If this is not possible CrateDB falls back to JSON. In your case it tries to import the records from test.txt as JSON which will not work.

You can either rename test.txt to test.csv so that CrateDB can derive the format from the extension or explicitly define the format on the COPY FROM with the format option:

COPY <tablename> FROM 'file:///path/to/test.txt'
  WITH (format = 'csv')
  RETURN SUMMARY;

Imported data will not be validated or casted to the correct type. Therefore for imports CrateDB can only handle unix timestamp in milliseconds. This means instead of a ISO8601/RFC3339 timestamp (e.g. “2021-07-06 18:56:05”) you need to use the unix epoch time in milliseconds (e.g. “1625597765000”).

Commas are not an issue but I have mixed case columns. Is this a problem?

Make sure that casing of columns match between CrateDB and csv file. E.g. when you have “ts_mi,I_L1” in csv you should have columns “I_L1” and “ts_mi” defined for the CrateDB table. When using "“ts_mi,i_l1” you should have columns “i_l1” and “ts_mi” in CrateDB.

Also note the RETURN SUMMARY at the end of the COPY FROM. This will return additional information and error messages which helps with debugging an import.

I hope this solves your importing woes :blush:

1 Like

Good evening @jayeff,

well, your answer has good points for debugging this further.

The file-to-be-imported has a .csv suffix in my setup, I just could not upload it with this extension.

I saw that data is not validated nor casted but did not think much of it. I would have dealt with this later. But Unix vs ISO time makes sense then… perhaps that would be a useful addition to the documentation?

The return summary addition is nice. There it says that the date is problematic guiding the user to the right spot. But then, why is copy successful returned? This is even the case if no file is found. Barely a success :thinking:

I’ll convert the timestamps and look at uppercase vs lowercase tomorrow and, of course, report back. Thanks a lot for your help! :+1:

Yes, I wrote this on my ToDo list :+1:

CrateDB and it’s documentation is all open source. Community contributions to either are always appreciated. If you want to add something yourself you can find an “Edit on Github” link at the end of all the documentation pages (COPY FROM — CrateDB: Reference) :blush:

1 Like

I donate a few Euros to projects I like everynow and then, but have not actively contributed yet. Can we make a deal? I’ll add a few sentences to the documentation and you look into improving the success message… except of course, if you think the current behaviour is great :slight_smile:

Another statement I would expand on in the docs is this:

Furthermore CrateDB will only check for primary key duplicates …

What if no primary key is defined? What are the consequences if duplicates are found?

1 Like

What if no primary key is defined? What are the consequences if duplicates are found?

This statement in the docs refers to Column Constraints. Primary Key constraint is applied but other constraints (namely NOT NULL and CHECK) are not.

Without primary key if you have duplicated entries in your COPY FROM target file these will be inserted into your table.

2 Likes