Slow oplog tailing on ATLAS (reactivated issue) #11578

Open
derwaldgeist opened this issue Aug 10, 2021 · 4 comments

Comments

@derwaldgeist derwaldgeist commented Aug 10, 2021

Hi, I'd like to reopen issue #10808, because a couple of other folks (like me) are facing similar issues and I just became aware of that thread. Please also have a look at the bottom of this thread in the forum:

https://forums.meteor.com/t/horrible-degradation-of-performance-using-galaxy-and-mongodb-atlas/56435

I am getting these weird "Query Targeting: Scanned Objects / Returned has gone above 1000" warning emails from ATLAS nearly every day. I already contacted Mongo support, but they couldn't help me so far. I also haven't found time to dig too deep into it yet, because the app itself was running fine despite these warnings; it's just pretty annoying to receive the warning mails, sometimes 10 or more each day.
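
For context, the alert refers to the ratio of documents scanned to documents returned. A rough way to check a single suspicious query against that threshold is to take the `totalDocsExamined` and `nReturned` fields from the output of `explain("executionStats")` and divide them. The sketch below assumes that shape; the function name `queryTargetingRatio` and the sample numbers are made up for illustration, and the Atlas alert itself is computed across operations rather than per query.

```javascript
// Sketch: ratio of documents scanned to documents returned for one query.
// `executionStats` is assumed to have the shape of the `executionStats`
// section produced by MongoDB's explain("executionStats").
function queryTargetingRatio(executionStats) {
  const { totalDocsExamined, nReturned } = executionStats;
  if (nReturned === 0) {
    // Nothing returned: the ratio is undefined, so report 0 or Infinity.
    return totalDocsExamined === 0 ? 0 : Infinity;
  }
  return totalDocsExamined / nReturned;
}

// Made-up numbers for illustration only.
const exampleStats = { totalDocsExamined: 50000, nReturned: 20 };
const ratio = queryTargetingRatio(exampleStats);
console.log(ratio > 1000 ? "above the alert threshold" : "ok", ratio);
```

A high ratio on one query usually points to a missing or unused index, which is a different problem from oplog tailing itself.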

What actually alerted me in particular when I was reading through #10808 (and caused me to re-open it) is that it also mentions two other weird behaviors that I have been facing with Meteor on ATLAS for quite some time:

  1. Someone mentioned that sometimes Meteor method callbacks are not invoked if a subscription for the same collection was running. This might be related to this thread:
    https://forums.meteor.com/t/meteor-login-method-never-invokes-callback-blocks-all-future-rpc-calls/36305
    I was able to resolve this problem by setting up a new ATLAS cluster and moving all my data to it, but never got to the actual root-cause of this.

  2. Another commenter mentions that sometimes update DDP messages are not being sent to the client. This is a behavior I noticed in my Unity frontend, which uses a low-level DDP protocol. Especially on Android, it happens quite often that the update is not received after a login message has been sent. Unfortunately, this makes the whole login process pretty unstable. In my case, I had to implement weird timeout-and-retry workarounds to actually log the user in reliably.
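
To illustrate what a timeout-and-retry workaround of this kind can look like, here is a minimal sketch in JavaScript. It is not the code from the Unity client mentioned above; the helper names `withTimeout` and `retryWithTimeout` and the default values are assumptions chosen for the example.

```javascript
// Hypothetical helper: reject if `promise` does not settle within `ms`.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical helper: call `attempt()` up to `retries + 1` times, treating
// a missing answer (timeout) the same as a failure.
async function retryWithTimeout(attempt, { timeoutMs = 5000, retries = 3 } = {}) {
  let lastError;
  for (let i = 0; i <= retries; i++) {
    try {
      return await withTimeout(attempt(i), timeoutMs);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```

Here `attempt` would be whatever function sends the login message and resolves once the expected reply arrives. Note that retrying is only safe when the repeated call is idempotent on the server side.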

I'm not sure whether all of these things are actually related, but I think overall this whole issue is worth looking into.

@derwaldgeist

This comment was marked as off-topic.

@StorytellerCZ StorytellerCZ changed the title from "Sow oplog tailing on ATLAS (reactivated issue)" to "Slow oplog tailing on ATLAS (reactivated issue)" on Aug 11, 2021

@renanccastro renanccastro (Contributor) commented Aug 12, 2021

Hi, @derwaldgeist. There are, unfortunately, many possible causes for this. One is a scarcity of read/write tickets, which would cause a lot of issues in your DB. This happens when you have a lot of observers.

Regarding this issue specifically, it seems very hard to reproduce. We would need a more isolated reproduction, without external services, to try to fix it, especially since I have also had issues with MongoDB Atlas performance suddenly dropping.

Also, redis-oplog is a strong workaround for this issue. With it, you won't tail the oplog anymore, and in theory, you would solve all issues reported.

We also plan to port our default strategy from oplog tailing to change streams, which should give us better performance and more reliability when scaling.
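
For readers unfamiliar with change streams, the sketch below shows roughly what watching a single collection looks like with the official MongoDB Node.js driver. It is not Meteor's implementation or its planned design; `watchCollection` and the shape of the object passed to `onChange` are assumptions made for the example, and `collection` is assumed to be a driver `Collection` on a replica set.

```javascript
// Sketch only: subscribe to changes on one collection instead of tailing
// the whole oplog. `collection` is assumed to be a Collection from the
// official MongoDB Node.js driver.
function watchCollection(collection, onChange) {
  // An empty pipeline means "all changes"; "updateLookup" asks the server
  // to include the current full document for update events.
  const changeStream = collection.watch([], { fullDocument: "updateLookup" });
  changeStream.on("change", (event) => {
    onChange({
      type: event.operationType, // e.g. "insert", "update", "delete"
      id: event.documentKey && event.documentKey._id,
      doc: event.fullDocument, // not present for deletes
    });
  });
  return changeStream; // the caller should close() it when done
}
```

The main difference from oplog tailing is that the server filters events per collection (or per pipeline), so the application no longer has to read every write in the database.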

If you could try redis-oplog and see whether it solves the reported issues, that would be good, so we know we are on the right path.

@HoptimizeME HoptimizeME commented Aug 12, 2021

Hi, we're also experiencing issues and were wondering which versions of mongo / mongo driver / meteor are affected by this.

Thanks 😊

@derwaldgeist derwaldgeist (Author) commented Oct 19, 2021

@renanccastro Thanks for sharing your insights. I can totally understand this is hard to fix. I tried to get to the root cause of this myself, but to no avail. The Mongo support team wasn't very helpful either. Haven't done the move to Redis yet, as it requires another server. Will most likely consider the switch in the future, once traffic on our platform increases.
