Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add occurrence count in the json output for search engine #1076

Merged
merged 8 commits into from
Jul 16, 2024

Conversation

panglesd
Copy link
Collaborator

Since #972 there is a compile-index command which outputs a json index containing all searchable entries. The intended purpose of this json index is to pass it to a search engine.

Since #976 there is also a count-occurrences command which creates a table containing the number of occurrences for each identifier.

This PR makes the compile-index be able to use the information from count-occurrence. It adds a --occurrences argument to get the path to the table.

The first commit isolate the occurrence code in its own library.
The second commit implements the --occurrences argument of ``compile-index`.

@panglesd
Copy link
Collaborator Author

Note that there will be a conflict with #1067 which we will need to resolve (in whichever PR gets merged last).

panglesd added a commit to panglesd/odoc that referenced this pull request Jan 31, 2024
Copy link
Contributor

@gpetiot gpetiot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment about the displaying of the result, otherwise looks good to me!

("direct", `Float (float_of_int occ.Odoc_occurrences.Table.direct));
("indirect", `Float (float_of_int occ.indirect));
]
| None -> `Null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit-picking but it would look more consistent to me if we had 0 for both counters here (see the output in the tests).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "no 'occurrences' field at all"?
I'm not very keen on choosing a wrong value when it was not possible to compute one because the table was not given...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, at least if we don't put this field at all it reduces the diff of this PR!

Copy link
Collaborator Author

@panglesd panglesd Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With 6205f97, if the occurrence table is not given, the field is not included.
However, if the occurrence table is given, but does not contain the occurrence, a use of 0 is counted.

Indeed, that removed that the diff for search without occurrences.

Thanks for the comment, I think that's better now!

panglesd added a commit to panglesd/odoc that referenced this pull request Feb 19, 2024
doc/driver.mld Outdated Show resolved Hide resolved
@EmileTrotignon
Copy link
Collaborator

What's the status of this ?

@panglesd
Copy link
Collaborator Author

panglesd commented Jul 9, 2024

It was ready for review. Now there are many conflicting files.

panglesd added a commit to panglesd/odoc that referenced this pull request Jul 12, 2024
panglesd added a commit to panglesd/odoc that referenced this pull request Jul 12, 2024
@panglesd
Copy link
Collaborator Author

panglesd commented Jul 12, 2024

It is now rebased and ready for review.

Copy link
Collaborator

@Julow Julow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Finally a way to consume the occurrences data.

Comment on lines 214 to 216
(* We don't want to include the [sub] field of occurrence tables. We use
a "polymorphic record" to avoid defining a type, but still get named
fields! *)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is original but I find it confusing and heavy in syntax. What about defining a record with just that instead ?

This record could even replace the item type in Table as the sub field is never used outside of this module.
Here's an other idea: Julow@9cafc01

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that the sub field could be useful to some application of the occurrence table.

However, there are no such application. I removed sub from the exposed type in the last commit, and use that type instead of this weird tuple. If an application requires exposing the sub field, we'll do it at this moment.

panglesd and others added 8 commits July 15, 2024 18:21
This adds the occurrence count to each element of the json index

Signed-off-by: Paul-Elliot <[email protected]>
Co-authored-by: Guillaume "Liam" Petiot <[email protected]>
Copy link
Collaborator

@Julow Julow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, let's merge!

@panglesd panglesd merged commit 58a4cb3 into ocaml:master Jul 16, 2024
13 checks passed
panglesd added a commit that referenced this pull request Jul 16, 2024
Signed-off-by: Paul-Elliot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants