Execution Queue Is Full

We are running a Ceramic node connected to the Clay testnet and our own IPFS cluster inside Kubernetes (k8s), and it is periodically dying and becoming unresponsive. It restarts very often (13 times so far today), but when it goes completely unresponsive, the pattern in the logs looks like this:

(node:1) MaxListenersExceededWarning: Possible EventTarget memory leak detected. 11 abort listeners added to [AbortSignal]. Use events.setMaxListeners() to increase limit
(Use node --trace-warnings ... to show where the warning was created)
(node:1) MaxListenersExceededWarning: Possible EventTarget memory leak detected. 11 abort listeners added to [AbortSignal]. Use events.setMaxListeners() to increase limit

WARNING: execution queue is full, over 500 pending requests found
(Repeats every 0.1 sec or less)

Then:

WARNING: Anchor failed for commit bagcqcerarpk5bcpo7vgbdmlwiqcydgzjrgczfjm6t4sdfjpen4kmcvwwdrbq of stream k2t6wyfsu4pfwtn5uev4on3u9tk4p9b3uuhf416a2wwzvje37snvju8brkaic1: HTTP request to 'https://cas-clay.3boxlabs.com/api/v0/requests/bagcqcerarpk5bcpo7vgbdmlwiqcydgzjrgczfjm6t4sdfjpen4kmcvwwdrbq' failed with status 'Bad Gateway':

Then:

WARNING: Error loading stream kjzl6cwe1jw145s38h43vemjf49yl2soe4vgm2vc8v6rch3booyykul3wj3tvod at time undefined as part of a multiQuery request: Error: Timeout after 7000ms
(Repeats every 0.1 sec or less)
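(For completeness, the events.setMaxListeners() call that the MaxListenersExceededWarning at the top points at is the snippet below. As far as I can tell it only raises the cap so the warning goes quiet rather than fixing whatever is leaking abort listeners, and it assumes you can get a snippet to run inside the daemon process at startup, e.g. from a custom entry point.)

```ts
// A minimal sketch of the workaround the warning itself suggests. It assumes you can
// run this in the daemon process at startup; it only raises the listener cap so the
// warning stops firing, it does not fix the underlying listener leak.
import { setMaxListeners } from "node:events";

// The default cap is 10 listeners per EventEmitter/EventTarget; raise it globally.
setMaxListeners(30);
```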

Our k8s healthcheck does not restart the pod when it gets into this state, and our attached clients just get errors of the form “Cannot get stream”. Manually rebooting the node seems to restore it, but I did that this morning and had to do it again just now. I have not found anything in the logs that indicates errors prior to this pattern, but the logs are, how should I say, “voluminous”.
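In case it helps, here is the kind of deeper liveness probe I could wire up so k8s restarts the pod for me instead of me doing it by hand: it loads a known stream over the HTTP API and fails on a timeout, which is exactly the path that wedges. Rough sketch only; the env var names and the 5s budget are placeholders, and it assumes Node 18+ for global fetch and AbortSignal.timeout:

```ts
// liveness-probe.ts — a rough sketch of a deeper liveness check, not a drop-in fix.
// Assumes Node 18+ and that KNOWN_STREAM_ID points at a stream this node should
// always be able to serve; both env var names and the 5s budget are made up here.
const CERAMIC_API = process.env.CERAMIC_API ?? "http://127.0.0.1:7007";
const KNOWN_STREAM_ID = process.env.KNOWN_STREAM_ID ?? "";

async function check(): Promise<void> {
  // Our existing healthcheck stays green while the node is wedged, so probe
  // something that exercises the same path the clients use: loading a stream.
  const res = await fetch(`${CERAMIC_API}/api/v0/streams/${KNOWN_STREAM_ID}`, {
    signal: AbortSignal.timeout(5_000),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
}

check()
  .then(() => process.exit(0)) // node answered and actually served a stream
  .catch((err) => {
    // process is up but stream loads hang or fail: fail the probe so k8s restarts the pod
    console.error("liveness probe failed:", err);
    process.exit(1);
  });
```

Wired in as an exec livenessProbe, this would at least turn the manual reboots into automatic restarts.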

Is there anything I can look for or check? The only other logs I am getting are the startup logs, ending with: IMPORTANT: Ceramic API running on 0.0.0.0:7007
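One easy thing to check from outside the node is whether CAS itself is answering, by re-issuing the request the node logs as failing (the URL and commit CID below are the ones from the anchor warning above; assumes Node 18+ for global fetch and an ESM context for top-level await):

```ts
// Re-issue the same CAS request the node logs as failing, just to see whether
// cas-clay.3boxlabs.com answers at all or keeps returning Bad Gateway.
const url =
  "https://cas-clay.3boxlabs.com/api/v0/requests/bagcqcerarpk5bcpo7vgbdmlwiqcydgzjrgczfjm6t4sdfjpen4kmcvwwdrbq";

const res = await fetch(url);
console.log("status:", res.status, res.statusText);
console.log(await res.text());
```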

After the restart just now, it did not emit the MaxListenersExceededWarning; it went directly to the Error loading stream kjzl6cwe1jw145s38h43vemjf49yl2soe4vgm2vc8v6rch3booyykul3wj3tvod errors.

Kind of stumped here. Appreciate any help!

Hey @SnickerChar, thanks for your post. This is a problem we are aware of and actively investigating. Please stay tuned for updates :pray:t4:

Hey, a new error message has just started popping up:

ERROR: Cannot publish query message to pubsub because we have exceeded the maximum allowed rate. Cannot have more than 100 queued queries.

Our node is currently in a k8s CrashLoopBackOff and I can’t get it back into operation. We’re unfortunately having to table Ceramic as our storage layer for the time being.
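For anyone who hits the same wall: one generic mitigation on the client side is to cap how many stream loads your app has in flight at once, so a slow or wedged node does not get buried under even more queries. A rough sketch only; the cap of 5 is arbitrary, and loadStream stands in for whatever client call you actually make:

```ts
// throttle-loads.ts — a generic client-side guard, sketched under the assumption
// that bursts of parallel stream loads from the app add to the node's backlog.
type Task<T> = () => Promise<T>;

function makeLimiter(maxInFlight: number) {
  let inFlight = 0;
  const waiting: Array<() => void> = [];

  const acquire = (): Promise<void> => {
    if (inFlight < maxInFlight) {
      inFlight++;
      return Promise.resolve();
    }
    // Park the caller until a slot frees up instead of piling more queries on.
    return new Promise<void>((resolve) => waiting.push(resolve));
  };

  const release = (): void => {
    const next = waiting.shift();
    if (next) {
      next(); // hand the slot straight to the next waiter; inFlight stays the same
    } else {
      inFlight--;
    }
  };

  return async function run<T>(task: Task<T>): Promise<T> {
    await acquire();
    try {
      return await task();
    } finally {
      release();
    }
  };
}

// Usage sketch: wrap every stream load so no more than 5 are in flight at once.
// const limit = makeLimiter(5);
// const doc = await limit(() => ceramic.loadStream(streamId));
```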