Issue with COPY FROM a CSV with ~2M records

Hello there, I’m currently evaluating CrateDB.

I have a CSV file with ~2M records.

First, I used the try.crate.io script to try it out. I successfully imported the file above using the COPY FROM command.

Then, I went ahead and used the provided docker-compose file to create a cluster with 3 nodes.

Then, I entered the Admin UI and attempted the same COPY FROM command.
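The statement looked roughly like this (table name and file path are placeholders, not the exact ones I used):

```sql
COPY followups FROM 'file:///path/to/data.csv' WITH (format = 'csv');
```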

However, I’m never able to fully import it. At some random point, it fails with:

  • JobKilledException[Job killed. Participating node=cratedb02 disconnected.]
  • JobKilledException[Job killed. Participating node=cratedb03 disconnected.]

When I run SELECT COUNT(*), some rows have been inserted, but not all of them. The number of inserted rows varies each time: 20k, 50k, 200k, 500k, etc.

I tried increasing CRATE_HEAP_SIZE from 2g to 8g, but it made no difference.
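For reference, I'm setting the heap per node via the environment in the compose file; a trimmed, illustrative fragment (service name and value are just examples):

```yaml
services:
  cratedb01:
    image: crate
    environment:
      # Heap per node; with three nodes, three heaps plus off-heap overhead
      # all have to fit inside the Docker VM's total memory limit
      - CRATE_HEAP_SIZE=2g
```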

Am I doing something wrong?

Here are the node logs: Master Node: crate02 · GitHub

Thanks

I’m running some tests using the CrateDB Python client.

I’m testing inserting 100k records.

I tested with a batch size of 5k, then 8k, then 10k, which is when it failed.

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1377, in getresponse
    response.begin()
  File "/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 289, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/crate/client/http.py", line 473, in _request
    response = self.server_pool[next_server].request(
  File "/usr/local/lib/python3.9/site-packages/crate/client/http.py", line 154, in request
    return self.pool.urlopen(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.9/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1377, in getresponse
    response.begin()
  File "/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 289, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/rafa/repos/crate/test.py", line 21, in <module>
    cratecur.executemany("""
  File "/usr/local/lib/python3.9/site-packages/crate/client/cursor.py", line 67, in executemany
    self.execute(sql, bulk_parameters=seq_of_parameters)
  File "/usr/local/lib/python3.9/site-packages/crate/client/cursor.py", line 53, in execute
    self._result = self.connection.client.sql(sql, parameters,
  File "/usr/local/lib/python3.9/site-packages/crate/client/http.py", line 396, in sql
    content = self._json_request('POST', self.path, data=data)
  File "/usr/local/lib/python3.9/site-packages/crate/client/http.py", line 522, in _json_request
    response = self._request(method, path, data=data)
  File "/usr/local/lib/python3.9/site-packages/crate/client/http.py", line 513, in _request
    self._drop_server(next_server, ex_message)
  File "/usr/local/lib/python3.9/site-packages/crate/client/http.py", line 581, in _drop_server
    raise ConnectionError(
crate.client.exceptions.ConnectionError: No more Servers available, exception from last server: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

The script is very simple: it reads data from a MySQL database and inserts it into CrateDB:

from crate import client
import mysql.connector

mysqlconn = mysql.connector.connect(host="127.0.0.1", port=3306, user="root", password="admin", database="dev_db")
mysqlcur = mysqlconn.cursor()

crateconn = client.connect(['http://localhost:4201/'], username='crate', password='')
cratecur = crateconn.cursor()

mysqlcur.execute("""
  SELECT application_id, follow_up_number, follow_up_date, follow_up_date_string, follow_up_result, resulted_by, title_name
  FROM untitled_name
  LIMIT 100000""")

batch_size = 10000
while True:
  rows = mysqlcur.fetchmany(batch_size)
  if len(rows) == 0:
    break

  cratecur.executemany("""
    INSERT INTO followups (application_id, follow_up_number, follow_up_date, follow_up_date_string, follow_up_result, resulted_by, title_name)
    VALUES (?, ?, ?, ?, ?, ?, ?)""",
    rows)

mysqlconn.close()
crateconn.close()
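In case it helps, a more defensive version of the loop I might try next would chunk the rows and retry failed batches on connection errors. This is only a sketch: `insert_batch` stands in for a call to `cratecur.executemany`, and I catch the builtin `ConnectionError` here to keep it self-contained (in practice it would be `crate.client.exceptions.ConnectionError`):

```python
import time
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

def insert_with_retry(insert_batch, rows, retries=3, delay=1.0):
    """Call `insert_batch(rows)`, retrying on ConnectionError with linear backoff."""
    for attempt in range(retries):
        try:
            return insert_batch(rows)
        except ConnectionError:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(delay * (attempt + 1))
```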

Hi @rafbgarcia,

The NullPointerException happening on crate02 is strange. I believe it leads to the JobKilledException.

What version of CrateDB are you using?

Can you give some specs (CPU, RAM, disk) for the system you use to evaluate CrateDB?
This could be the issue, as running MySQL, three CrateDB nodes, and the ingest script at the same time is potentially overloading your test setup. It could also explain why your script worked fine on a single-node CrateDB cluster but fails when you use the 3-node cluster.

Hi @jayeff,

CrateDB version: 4.8.0.

System info:

  Model Name:	MacBook Pro
  Model Identifier:	MacBookPro16,1
  Processor Name:	6-Core Intel Core i7
  Processor Speed:	2.6 GHz
  Number of Processors:	1
  Total Number of Cores:	6
  L2 Cache (per Core):	256 KB
  L3 Cache:	12 MB
  Hyper-Threading Technology:	Enabled
  Memory:	16 GB

Disk: 319GB available Flash Storage

Note that it also fails when I attempt to COPY FROM a CSV file.

Let me know if you need anything else.

Thanks!

Can you please also share the CrateDB node logs when you run your Crate Python client for testing inserts?

@jayeff here it is: crate01.md · GitHub

The last file contains the logs for the Python script execution.

Removed server http://localhost:4201 from active pool
...

Let me know if you need something else. I’m up to Zoom if you think it’d be helpful.

No logs, it just crashed.
The logs below are from after its reinitialization (they seem to be start-up logs only).

Yes, those look like start-up logs only.

Maybe you can inspect logs with docker logs after the node crashed (check docker ps --all to find container ids for your crashed containers).

Regarding the java.lang.NullPointerException: Cannot invoke "String.length()" because "str" is null: A fix for this is merged and will be included in the next patch release. A testing build for 4.8.1 will likely go out in a couple of days.

If you want to continue testing before then, you could use the latest nightly Docker build.

Can you please re-run your COPY FROM import and insert tests with this fix included and let us know if you still see any exceptions? If you do, please share them with us, including logs.


@jayeff same problem.

Version:
5.0.0-SNAPSHOT-ce8e3c7

PS: I believe my posts that include external links to gist.github.com were automatically hidden. I’m getting this message when I attempt to post a gist link:

Sorry you cannot post a link to that host.

Sorry, this is some automatic flagging from Discourse when new users post multiple links. I restored your posts.

Too bad :disappointed:

Can you remove…

      restart_policy:
        condition: on-failure

from your docker-compose file (on all three CrateDB nodes)?

With the automatic restart, the old logs are gone.

I believe removing it should allow you to inspect the logs with docker logs after a node crashes. Can you test this, please?

@jayeff I updated my Docker settings to allow a maximum of 12 CPUs instead of 6, and now the COPY FROM completes successfully.

Is this expected?

Yes.

@jayeff I changed the CPUs back to 6, removed the restart_policy, and ran the COPY FROM again; the same error happened.

I inspected the dead container; it didn’t produce any logs about the crash.

It crashes with Code 137, though:

Exit Code 137: Indicates failure as container received SIGKILL (Manual intervention or 'oom-killer' [OUT-OF-MEMORY])
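For completeness, this is how I checked it, using the standard Docker CLI (`<container-id>` is a placeholder for the exited container’s ID):

```
# List all containers, including exited ones, to find the crashed node
docker ps --all

# Ask Docker whether the kernel OOM-killer terminated the container
# (the OOMKilled state flag prints true/false)
docker inspect --format '{{.State.OOMKilled}}' <container-id>
```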

@jayeff sorry, I had also bumped the memory from 8g to 10g in the Docker settings for that test.

I just tested again. Changing only the memory from 8g to 10g is enough to make it work.

I’m not very familiar with Docker either. My apologies for all of this trouble and thanks for your help!


You are welcome and good that you could solve this :+1:
