Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

$in and $or too slow for large database (doesn´t use index) #5189

Open
zmara opened this issue Aug 21, 2024 · 3 comments
Open

$in and $or too slow for large database (doesn´t use index) #5189

zmara opened this issue Aug 21, 2024 · 3 comments

Comments

@zmara
Copy link

zmara commented Aug 21, 2024

My database has 4.7GB of data (3191269 docs)
Attribute _id has implicit index.

{
 "type": "special",
 "def": {
  "fields": [
   {
    "_id": "asc"
   }
  ]
 }
}

This selector works well:

{
   "selector": {
      "_id": {
         "$eq": "CustomerElastic_00001280-7a18-466b-9e9c-da5a2d903bda"
      }
   }
}

This selector is pending and never finished (the same when I use $or):

{
   "selector": {
      "_id": {
         "$in": ["CustomerElastic_00000a29-329f-477f-8d4a-8244a9c738aa", "CustomerElastic_00001280-7a18-466b-9e9c-da5a2d903bda"]
      }
   }
}

Why?

Steps to Reproduce

Use fauxton and mango query.

Expected Behaviour

It should work the same as $eq

Your Environment

{"couchdb":"Welcome","version":"3.3.2","git_sha":"11a234070","uuid":"ff2fae1b3089d86fd741e84a6e532eda","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}

  • Browser name and version: Chrome latest version
  • Operating system and version: Windows 11
@nickva
Copy link
Contributor

nickva commented Aug 22, 2024

Thanks for your report @zmara. Yeah it's a known deficiency that$or doesn't use an index. It would have to effectively use use the two or indexes and merge them appropriately, and that part isn't implemented currently.

@zmara
Copy link
Author

zmara commented Aug 22, 2024

Thanks for explanation. What about that $in operator? I am using it on one field, so what is the issue there?

@rnewson
Copy link
Member

rnewson commented Aug 23, 2024

I recommend building a map-reduce view directly (queried via _view not _find). This is an actual index (on the emitted key), and any query you can do to _view will be done efficiently (startkey/endkey or key or keys) .

All that _find does is look at all the indexes and tries to find the most efficient one for the selector. If it doesn't find one it just reads the whole database and filters out non-matches.

In your case you could do the much more efficient /dbname/_all_docs?keys=["CustomerElastic_00000a29-329f-477f-8d4a-8244a9c738aa", "CustomerElastic_00001280-7a18-466b-9e9c-da5a2d903bda"] instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants