Hi,
I have installed a “cluster” with docker-compose on one node – this works well.
But then I wanted to connect one more node (on another physical machine!) to this cluster, and that failed with the following warning:
completed handshake with [{node02}{5-0wKhgIR-ebx2t2I5oOQA}{kBJLysioRqS3IDkso3tdrQ}{172.30.0.3}{172.30.0.3:4300}{http_address=172.30.0.3:4200}] but followup connection failed
It seems that the first two nodes are sending back the IP addresses of the Docker-internal network, and those are not reachable from the second Docker host.
I know this is not a “real-world” scenario, but for testing purposes it would be great if this worked somehow.
Below are my docker-compose.yml files …
Maybe someone can give me a hint how I can get this up and running.
My first host's IP is 192.168.1.3.
First docker host:
version: '3.8'
services:
  node01:
    image: crate:5.1.1
    ports:
      - "4201:4200"
      - "4301:4300"
      - "5434:5432"
    volumes:
      - /u04/data/crate/01:/data
    command: ["crate",
              "-Ccluster.name=crate-docker-cluster",
              "-Cdiscovery.seed_hosts=node02",
              "-Cnode.name=node01",
              "-Cnode.data=true",
              "-Cnetwork.host=_site_",
              "-Cgateway.expected_data_nodes=2",
              "-Cgateway.recover_after_data_nodes=2",
              "-Ccluster.initial_master_nodes=node01,node02"]
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
    environment:
      - CRATE_HEAP_SIZE=4g
  node02:
    image: crate:5.1.1
    ports:
      - "4202:4200"
      - "4302:4300"
    volumes:
      - /u04/data/crate/02:/data
    command: ["crate",
              "-Ccluster.name=crate-docker-cluster",
              "-Cdiscovery.seed_hosts=node01",
              "-Cnode.name=node02",
              "-Cnode.data=true",
              "-Cnetwork.host=_site_",
              "-Cgateway.expected_data_nodes=2",
              "-Cgateway.recover_after_data_nodes=2",
              "-Ccluster.initial_master_nodes=node01,node02"]
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
    environment:
      - CRATE_HEAP_SIZE=4g
Second docker host:
version: '3.3'
services:
  node03:
    image: crate:5.1.1
    ports:
      - "4201:4200"
      - "4301:4300"
      - "5434:5432"
    volumes:
      - /u04/data/crate/01:/data
    command: ["crate",
              "-Ccluster.name=crate-docker-cluster",
              "-Cdiscovery.seed_hosts=192.168.1.3:4301,192.168.1.3:4302",
              "-Cnode.name=node03",
              "-Cnode.data=true",
              "-Cnetwork.host=_site_",
              "-Cgateway.expected_data_nodes=2",
              "-Cgateway.recover_after_data_nodes=2",
              "-Ccluster.initial_master_nodes=node01,node02"]
    environment:
      - CRATE_HEAP_SIZE=4g
Errors when starting the node on the second host:
node03_1 | [2022-11-22T12:20:36,134][INFO ][o.e.n.Node ] [node03] initialized
node03_1 | [2022-11-22T12:20:36,134][INFO ][o.e.n.Node ] [node03] starting ...
node03_1 | [2022-11-22T12:20:36,181][INFO ][psql ] [node03] publish_address {172.18.0.2:5432}, bound_addresses {172.18.0.2:5432}
node03_1 | [2022-11-22T12:20:36,187][INFO ][o.e.h.n.Netty4HttpServerTransport] [node03] publish_address {172.18.0.2:4200}, bound_addresses {172.18.0.2:4200}
node03_1 | [2022-11-22T12:20:36,197][INFO ][o.e.t.TransportService ] [node03] publish_address {172.18.0.2:4300}, bound_addresses {172.18.0.2:4300}
node03_1 | [2022-11-22T12:20:36,359][INFO ][o.e.b.BootstrapChecks ] [node03] bound or publishing to a non-loopback address, enforcing bootstrap checks
node03_1 | [2022-11-22T12:20:36,369][INFO ][o.e.c.c.ClusterBootstrapService] [node03] skipping cluster bootstrapping as local node does not match bootstrap requirements: [node01, node02]
node03_1 | [2022-11-22T12:20:46,371][WARN ][o.e.c.c.ClusterFormationFailureHelper] [node03] master not discovered yet, this node has not previously joined a bootstrapped (v5+) cluster, and this node must discover master-eligible nodes [node01, node02] to bootstrap a cluster: have discovered [{node03}{qHBmOTkZQMuYNSP1Fn1dIw}{yrwcZWrLTu2TjGUxp5f2lg}{172.18.0.2}{172.18.0.2:4300}{http_address=172.18.0.2:4200}]; discovery will continue using [192.168.1.3:4301, 192.168.1.3:4302] from hosts providers and [{node03}{qHBmOTkZQMuYNSP1Fn1dIw}{yrwcZWrLTu2TjGUxp5f2lg}{172.18.0.2}{172.18.0.2:4300}{http_address=172.18.0.2:4200}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
node03_1 | [2022-11-22T12:20:56,372][WARN ][o.e.c.c.ClusterFormationFailureHelper] [node03] master not discovered yet, this node has not previously joined a bootstrapped (v5+) cluster, and this node must discover master-eligible nodes [node01, node02] to bootstrap a cluster: have discovered [{node03}{qHBmOTkZQMuYNSP1Fn1dIw}{yrwcZWrLTu2TjGUxp5f2lg}{172.18.0.2}{172.18.0.2:4300}{http_address=172.18.0.2:4200}]; discovery will continue using [192.168.1.3:4301, 192.168.1.3:4302] from hosts providers and [{node03}{qHBmOTkZQMuYNSP1Fn1dIw}{yrwcZWrLTu2TjGUxp5f2lg}{172.18.0.2}{172.18.0.2:4300}{http_address=172.18.0.2:4200}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
node03_1 | [2022-11-22T12:21:06,374][WARN ][o.e.c.c.ClusterFormationFailureHelper] [node03] master not discovered yet, this node has not previously joined a bootstrapped (v5+) cluster, and this node must discover master-eligible nodes [node01, node02] to bootstrap a cluster: have discovered [{node03}{qHBmOTkZQMuYNSP1Fn1dIw}{yrwcZWrLTu2TjGUxp5f2lg}{172.18.0.2}{172.18.0.2:4300}{http_address=172.18.0.2:4200}]; discovery will continue using [192.168.1.3:4301, 192.168.1.3:4302] from hosts providers and [{node03}{qHBmOTkZQMuYNSP1Fn1dIw}{yrwcZWrLTu2TjGUxp5f2lg}{172.18.0.2}{172.18.0.2:4300}{http_address=172.18.0.2:4200}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
node03_1 | [2022-11-22T12:21:06,375][WARN ][o.e.n.Node ] [node03] timed out while waiting for initial discovery state - timeout: 30s
node03_1 | [2022-11-22T12:21:06,376][INFO ][o.e.n.Node ] [node03] started
node03_1 | [2022-11-22T12:21:06,458][WARN ][o.e.d.HandshakingTransportAddressConnector] [node03] [connectToRemoteMasterNode[192.168.1.3:4302]] completed handshake with [{node02}{5-0wKhgIR-ebx2t2I5oOQA}{kBJLysioRqS3IDkso3tdrQ}{172.30.0.3}{172.30.0.3:4300}{http_address=172.30.0.3:4200}] but followup connection failed
node03_1 | org.elasticsearch.transport.ConnectTransportException: [node02][172.30.0.3:4300] connect_timeout[30s]
node03_1 | at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:967) ~[crate-server.jar:?]
node03_1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
node03_1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
node03_1 | at java.lang.Thread.run(Thread.java:833) ~[?:?]
node03_1 | [2022-11-22T12:21:06,459][WARN ][o.e.d.HandshakingTransportAddressConnector] [node03] [connectToRemoteMasterNode[192.168.1.3:4301]] completed handshake with [{node01}{Xu-cZFv5SAyzyCy-jK0I6A}{rAwDTwC8TQmlID7fr0qt6Q}{172.30.0.2}{172.30.0.2:4300}{http_address=172.30.0.2:4200}] but followup connection failed
node03_1 | org.elasticsearch.transport.ConnectTransportException: [node01][172.30.0.2:4300] connect_timeout[30s]
node03_1 | at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:967) ~[crate-server.jar:?]
node03_1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
node03_1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
node03_1 | at java.lang.Thread.run(Thread.java:833) ~[?:?]
node03_1 | [2022-11-22T12:21:16,375][WARN ][o.e.c.c.ClusterFormationFailureHelper] [node03] master not discovered yet, this node has not previously joined a bootstrapped (v5+) cluster, and this node must discover master-eligible nodes [node01, node02] to bootstrap a cluster: have discovered [{node03}{qHBmOTkZQMuYNSP1Fn1dIw}{yrwcZWrLTu2TjGUxp5f2lg}{172.18.0.2}{172.18.0.2:4300}{http_address=172.18.0.2:4200}]; discovery will continue using [192.168.1.3:4301, 192.168.1.3:4302] from hosts providers and [{node03}{qHBmOTkZQMuYNSP1Fn1dIw}{yrwcZWrLTu2TjGUxp5f2lg}{172.18.0.2}{172.18.0.2:4300}{http_address=172.18.0.2:4200}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
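The handshake succeeds on 192.168.1.3:4301 and 192.168.1.3:4302, but the follow-up connection goes to the 172.30.0.x addresses that node01 and node02 publish. One direction that might help is to let each node publish an address that is reachable from the other machine. The following is only an untested sketch for node01. It assumes that CrateDB's `network.publish_host` and `transport.publish_port` settings can be used this way behind Docker's port mapping, and that the containers on the first host can also reach each other via 192.168.1.3.

```yaml
# Untested sketch for node01 on the first host (192.168.1.3).
# Only the publish-related options are new; everything else is taken from the file above.
services:
  node01:
    image: crate:5.1.1
    ports:
      - "4201:4200"
      - "4301:4300"   # host port 4301 forwards to transport port 4300 in the container
      - "5434:5432"
    volumes:
      - /u04/data/crate/01:/data
    command:
      - "crate"
      - "-Ccluster.name=crate-docker-cluster"
      - "-Cdiscovery.seed_hosts=node02"
      - "-Cnode.name=node01"
      - "-Cnode.data=true"
      - "-Cnetwork.host=_site_"
      # Assumption: advertise the Docker host's LAN address instead of 172.30.0.x.
      - "-Cnetwork.publish_host=192.168.1.3"
      # Assumption: advertise the host-side mapped transport port.
      - "-Ctransport.publish_port=4301"
      - "-Cgateway.expected_data_nodes=2"
      - "-Cgateway.recover_after_data_nodes=2"
      - "-Ccluster.initial_master_nodes=node01,node02"
    environment:
      - CRATE_HEAP_SIZE=4g
```

Under the same assumptions, node02 would publish port 4302, and node03 would publish the LAN address of the second host (not stated in this post) together with its mapped port 4301.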