Errors working with Istio and a CrateDB cluster

When we set up a CrateDB cluster in k8s with X nodes and Istio enabled in the namespace, there is no way to connect to the CrateDB service defined in the cluster. Whenever we try to access the service, we get this error:

upstream connect error or disconnect/reset before headers. reset reason: connection failure

We have set the network.publish_host variable to _site_. If we set it to _local_ instead, we can access the UI, but the cluster is not formed: only one node is in the cluster. After investigating, we found that the problem is that the Istio sidecar container works against localhost, whereas with network.publish_host set to _site_ the service on port 4200 binds to the container IP. Does anyone know a solution or workaround for this problem?

Hi @fjglira

I am a bit confused about how you set up CrateDB.
Could you share the StatefulSet? Have you set up an internal service for port 4200?
The Docker container's default configuration binds to both local and site:
network.host: _local_,_site_

Sure:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cratedb
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cratedb 
  serviceName: crate-svc
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: cratedb 
      annotations:
        sidecar.istio.io/inject: "false"  
    spec:
      containers:
      - name: cratedb
        image: "{{.Values.global.Versions.crate}}"
        env:
        - name: CRATE_HEAP_SIZE
          value: 8g
        - name: CRATE_NODES
          value: "3"
        - name: CRATE_REPO_FS
          value: "1"
        - name: NETWORK_HOST
          value: "{{.Values.global.NETWORK_HOST}}"
        - name: JAVA_OPTS
          value: "-XX:ActiveProcessorCount=8 -Dlog4j2.formatMsgNoLookups=true"
        - name: CRATE_MAX_SHARDS
          value: "{{.Values.global.CRATE_MAX_SHARDS}}"
        volumeMounts:
          - mountPath: /peers
            name: peers
        resources:
          requests:
            memory: 4Gi
          limits:
            memory: 16Gi
        livenessProbe:
          httpGet:
            path: /
            port: 4200
          initialDelaySeconds: 120
          periodSeconds: 20
        readinessProbe:
          httpGet:
            path: /
            port: 4200
          initialDelaySeconds: 120
          periodSeconds: 20
          failureThreshold: 15 
        ports:
        - name: crate
          containerPort: 4200
          protocol: TCP
        - name: elastic
          containerPort: 4300
          protocol: TCP
        - name: sql
          containerPort: 5432
          protocol: TCP
      initContainers:
      - name: peer-finder
        image: "{{.Values.global.Versions.peerfinder}}"
        args:
        - -service
        - crate-svc
        - -workload
        - stateful
        - -on-start
        - cat /peers/peers.env
        volumeMounts:
        - mountPath: /peers
          name: peers
        - mountPath: /peers-template
          name: peers-template
      volumes:
      - emptyDir: {}
        name: peers
      - configMap:
          defaultMode: 256
          items:
          - key: peers.txt
            path: peers.txt
          name: crate
          optional: false
        name: peers-template
      {{- if eq .Values.global.VOLUMES_PERSISTENCE true}}
      - name: cratedb-backup
        persistentVolumeClaim:
          claimName: cratedb-backup
      {{- end}}
  # These are converted to volume claims by the controller
  # and mounted at the paths mentioned above.
  {{- if eq .Values.global.VOLUMES_PERSISTENCE true}}
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            {{ if eq (.Values.global.HA | toString) "No"}}
            storage: 5Gi
            {{else}}
            storage: 10Gi
            {{end}}
  {{- end}}
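
For context, the NETWORK_HOST value referenced in the template above would come from the chart's values. A minimal sketch, assuming a values.yaml whose layout matches the .Values.global.NETWORK_HOST reference (the file itself is not shown in this thread, so the layout is an assumption):

```yaml
# values.yaml (hypothetical layout, inferred from the template references above)
global:
  # Bind to both the loopback and the site-local (pod IP) interfaces.
  # Quoted so the value is always passed through as a plain string.
  NETWORK_HOST: "_local_,_site_"
```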

We use a custom image and the variable .Values.global.NETWORK_HOST. If we set it to _site_ or, as you said, to _local_,_site_, the cluster is not formed (with or without Istio): each node is isolated and I see these errors:

[2022-03-04T08:42:55,839][WARN ][o.e.t.n.Netty4Transport  ] [cratedb-2.crate-svc.ccoc.svc.cluster.local] exception caught on transport layer [Netty4TcpChannel{localAddress=/10.42.4.108:38204, remoteAddress=cratedb-1.crate-svc.ccoc.svc.cluster.local/10.42.2.39:4300}], closing connection
io.netty.handler.codec.DecoderException: java.io.StreamCorruptedException: received HTTP response on transport port, ensure that transport port (not HTTP port) of a remote node is specified in the configuration
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:477) ~[netty-codec-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:404) ~[netty-codec-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:371) ~[netty-codec-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:354) ~[netty-codec-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.handler.logging.LoggingHandler.channelInactive(LoggingHandler.java:197) [netty-handler-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:831) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) [netty-transport-native-epoll-4.1.65.Final-linux-x86_64.jar:4.1.65.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.65.Final.jar:4.1.65.Final]
	at java.lang.Thread.run(Thread.java:831) [?:?]
Caused by: java.io.StreamCorruptedException: received HTTP response on transport port, ensure that transport port (not HTTP port) of a remote node is specified in the configuration
	at org.elasticsearch.transport.TcpTransport.readHeaderBuffer(TcpTransport.java:784) ~[crate-server.jar:?]
	at org.elasticsearch.transport.TcpTransport.readMessageLength(TcpTransport.java:774) ~[crate-server.jar:?]
	at org.elasticsearch.transport.netty4.Netty4SizeHeaderFrameDecoder.decode(Netty4SizeHeaderFrameDecoder.java:45) ~[crate-server.jar:?]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) ~[netty-codec-4.1.65.Final.jar:4.1.65.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ~[netty-codec-4.1.65.Final.jar:4.1.65.Final]

I checked the configuration printed at startup and it seems to be OK:

Found peers: cratedb-2.crate-svc.ccoc.svc.cluster.local,cratedb-1.crate-svc.ccoc.svc.cluster.local
CRATE_NODES_LOCAL:3
NETWORK_HOST:_site_,_local_
CRATE_NODES_MINIMUM:2
NODES_ALL:cratedb-2.crate-svc.ccoc.svc.cluster.local,cratedb-1.crate-svc.ccoc.svc.cluster.local
CRATE_MAX_SHARDS_LOCAL:2000
CLUSTER_CONFIG:-Ccluster.name=ccoc -Cdiscovery.seed_hosts=cratedb-2.crate-svc.ccoc.svc.cluster.local,cratedb-1.crate-svc.ccoc.svc.cluster.local -Ccluster.initial_master_nodes=cratedb-2.crate-svc.ccoc.svc.cluster.local,cratedb-1.crate-svc.ccoc.svc.cluster.local -Cnode.master=true -Cgateway.expected_nodes=3 -Cgateway.recover_after_nodes=2 -Ccluster.max_shards_per_node=2000
Calling to execute-runtime-startup-tasks.sh....
HOSTNAME:cratedb-2.crate-svc.ccoc.svc.cluster.local

The positive point is that with _local_,_site_ configured, CrateDB is accessible with Istio, but the cluster is still not formed. These are the startup logs:

Call to entrypoint...
[2022-03-04T08:42:56,563][INFO ][o.e.e.NodeEnvironment    ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] using [1] data paths, mounts [[/data (/dev/sdb)]], net usable_space [8.4gb], net total_space [48.9gb], types [ext4]
[2022-03-04T08:42:56,565][INFO ][o.e.e.NodeEnvironment    ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] heap size [8gb], compressed ordinary object pointers [true]
[2022-03-04T08:42:56,744][INFO ][o.e.n.Node               ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] node name [cratedb-1.crate-svc.ccoc.svc.cluster.local], node ID [JKhWMycPS5OjFChzNUKzcg], cluster name [ccoc]
[2022-03-04T08:42:56,761][INFO ][o.e.n.Node               ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] version[4.6.7], pid[1], build[ccf852a/2022-01-19T13:35:05Z], OS[Linux/5.4.92-flatcar/amd64], JVM[Eclipse Foundation/OpenJDK 64-Bit Server VM/16.0.2/16.0.2+7]
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[2022-03-04T08:42:57,996][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] no modules loaded
[2022-03-04T08:42:57,997][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] loaded plugin [crate-azure-discovery]
[2022-03-04T08:42:57,998][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] loaded plugin [crate-functions]
[2022-03-04T08:42:57,998][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] loaded plugin [crate-jmx-monitoring]
[2022-03-04T08:42:57,998][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] loaded plugin [crate-lang-js]
[2022-03-04T08:42:57,998][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] loaded plugin [es-analysis-common]
[2022-03-04T08:42:57,998][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] loaded plugin [es-analysis-phonetic]
[2022-03-04T08:42:57,998][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] loaded plugin [es-repository-azure]
[2022-03-04T08:42:57,998][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] loaded plugin [es-repository-hdfs]
[2022-03-04T08:42:57,998][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] loaded plugin [io.crate.plugin.BlobPlugin]
[2022-03-04T08:42:57,998][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] loaded plugin [io.crate.plugin.SQLPlugin]
[2022-03-04T08:42:57,998][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] loaded plugin [io.crate.plugin.SrvPlugin]
[2022-03-04T08:42:57,998][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] loaded plugin [io.crate.udc.plugin.UDCPlugin]
[2022-03-04T08:42:57,998][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] loaded plugin [org.elasticsearch.discovery.ec2.Ec2DiscoveryPlugin]
[2022-03-04T08:42:57,998][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] loaded plugin [org.elasticsearch.plugin.repository.url.URLRepositoryPlugin]
[2022-03-04T08:42:57,999][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] loaded plugin [org.elasticsearch.repositories.s3.S3RepositoryPlugin]
[2022-03-04T08:42:57,999][INFO ][o.e.p.PluginsService     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] loaded plugin [org.elasticsearch.transport.Netty4Plugin]
[2022-03-04T08:42:59,351][INFO ][o.e.d.DiscoveryModule    ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] using discovery type [zen] and seed hosts providers [settings]
[2022-03-04T08:42:59,984][INFO ][psql                     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] PSQL SSL support is disabled.
[2022-03-04T08:43:00,086][INFO ][i.c.p.PipelineRegistry   ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] HTTP SSL support is disabled.
[2022-03-04T08:43:00,303][INFO ][o.e.n.Node               ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] initialized
[2022-03-04T08:43:00,303][INFO ][o.e.n.Node               ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] starting ...
[2022-03-04T08:43:00,428][INFO ][psql                     ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] publish_address {10.42.2.39:5432}, bound_addresses {127.0.0.1:5432}, {10.42.2.39:5432}
[2022-03-04T08:43:00,439][INFO ][o.e.h.n.Netty4HttpServerTransport] [cratedb-1.crate-svc.ccoc.svc.cluster.local] publish_address {10.42.2.39:4200}, bound_addresses {127.0.0.1:4200}, {10.42.2.39:4200}
[2022-03-04T08:43:00,451][INFO ][o.e.t.TransportService   ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] publish_address {10.42.2.39:4300}, bound_addresses {127.0.0.1:4300}, {10.42.2.39:4300}
[2022-03-04T08:43:00,605][INFO ][o.e.b.BootstrapChecks    ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2022-03-04T08:43:00,607][INFO ][o.e.c.c.Coordinator      ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] cluster UUID [56h9HExZQbyqUGCcSt5xKw]
[2022-03-04T08:43:00,683][INFO ][o.e.c.s.MasterService    ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] elected-as-master ([1] nodes joined)[{cratedb-1.crate-svc.ccoc.svc.cluster.local}{JKhWMycPS5OjFChzNUKzcg}{JjGASc3kSGuCWOw06cFEYw}{10.42.2.39}{10.42.2.39:4300}{http_address=10.42.2.39:4200} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 12, version: 473692, reason: master node changed {previous [], current [{cratedb-1.crate-svc.ccoc.svc.cluster.local}{JKhWMycPS5OjFChzNUKzcg}{JjGASc3kSGuCWOw06cFEYw}{10.42.2.39}{10.42.2.39:4300}{http_address=10.42.2.39:4200}]}
[2022-03-04T08:43:00,681][WARN ][o.e.t.n.Netty4Transport  ] [cratedb-1.crate-svc.ccoc.svc.cluster.local] exception caught on transport layer [Netty4TcpChannel{localAddress=/10.42.2.39:39504, remoteAddress=cratedb-2.crate-svc.ccoc.svc.cluster.local/10.42.4.108:4300}], closing connection

Thanks a lot. After analyzing and testing today, setting both values and fixing the service definition in the Helm YAML made it work correctly with Istio. The port name in our YAML declared the wrong protocol.
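
For anyone hitting the same error, here is a sketch of what a corrected Service could look like. The actual Service is not shown in this thread, so the port names below are illustrative assumptions; the service name, selector, and port numbers are taken from the StatefulSet above. Istio derives the protocol from the port name prefix (or from appProtocol), so a transport port that is declared as HTTP would plausibly explain the "received HTTP response on transport port" exception.

```yaml
# Hypothetical sketch: the real Service definition is not part of this thread.
apiVersion: v1
kind: Service
metadata:
  name: crate-svc          # matches serviceName in the StatefulSet
spec:
  clusterIP: None          # headless, so each pod gets its own DNS record
  selector:
    app: cratedb
  ports:
  - name: http-crate       # "http" prefix: Istio handles 4200 as HTTP
    port: 4200
    targetPort: 4200
  - name: tcp-transport    # "tcp" prefix: 4300 is passed through as plain TCP
    port: 4300
    targetPort: 4300
  - name: tcp-psql         # PostgreSQL wire protocol, also plain TCP
    port: 5432
    targetPort: 5432
```

The key point is that only the HTTP port carries an HTTP protocol prefix; the node-to-node transport port and the PostgreSQL port are declared as TCP so the sidecar does not try to parse them as HTTP.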
