Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit events from the Contents Service #954

Merged
merged 4 commits into from
Aug 31, 2022

Conversation

Zsailer
Copy link
Member

@Zsailer Zsailer commented Aug 29, 2022

Uses the event logger in #862 to emit events from the Contents Manager.

Borrows the contents schema from #364

@Zsailer
Copy link
Member Author

Zsailer commented Aug 29, 2022

Pinging @dlqqq for feedback for your use-case in the File ID service proposed in #921.

@Zsailer Zsailer added this to the 2.0 milestone Aug 29, 2022
@Zsailer
Copy link
Member Author

Zsailer commented Aug 29, 2022

@dlqqq

Here's an example snippet of an extension "listening" to events emitted by the contents manager in this PR:

class MyExtension(ExtensionApp):

    ... 

    def initialize_settings(self):
        event_logger = self.serverapp.event_logger

        async def my_listener(logger: EventLogger, schema_id: str, data: dict) -> None:
            print(f"THIS WAS SEEN!: {data}")

        event_logger.add_listener(
            schema_id="https://events.jupyter.org/jupyter_server/contents_service/v1", listener=my_listener)

        ...

Let me know what you think!

@codecov-commenter
Copy link

codecov-commenter commented Aug 29, 2022

Codecov Report

Merging #954 (f0ba8db) into main (63bb2a9) will increase coverage by 0.13%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #954      +/-   ##
==========================================
+ Coverage   72.07%   72.20%   +0.13%     
==========================================
  Files          64       64              
  Lines        8031     8063      +32     
  Branches     1340     1342       +2     
==========================================
+ Hits         5788     5822      +34     
+ Misses       1837     1836       -1     
+ Partials      406      405       -1     
Impacted Files Coverage Δ
jupyter_server/__init__.py 70.58% <100.00%> (+6.30%) ⬆️
jupyter_server/serverapp.py 65.76% <100.00%> (+0.15%) ⬆️
jupyter_server/services/contents/filemanager.py 71.62% <100.00%> (+0.22%) ⬆️
jupyter_server/services/contents/manager.py 83.52% <100.00%> (+0.80%) ⬆️
jupyter_server/services/kernels/handlers.py 59.07% <0.00%> (+0.42%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@dlqqq
Copy link
Contributor

dlqqq commented Aug 29, 2022

@Zsailer Wow! You beat me to it, I was just playing around with my own custom implementation. This looks great.

@@ -395,6 +395,7 @@ def get(self, path, content=True, type=None, format=None):
if type == "directory":
raise web.HTTPError(400, "%s is not a directory" % path, reason="bad type")
model = self._file_model(path, content=content, format=format)
self.emit(data={"action": "get", "path": path})
Copy link
Contributor

@dlqqq dlqqq Aug 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a vague performance concern about emitting an event on every get and save. These are by far the most common ContentsManager actions called by clients. In my opinion, we should only emit events that have demonstrated a developer need. For my File ID manager, I do not need these events. If there are no known uses for these events, I suggest we remove these events entirely rather than attempt to anticipate future uses at the cost of performance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, we should only emit events that have demonstrated a developer need

I think this is too stringent of a condition.

Events certainly have use-cases outside of just developers. Operators/admins look to use this event system to audit user activity—in fact, this was the original demand driving the jupyter events/telemetry efforts mentioned in this Jupyter Telemetry enhancement proposal (jupyter/enhancement-proposals#41).

If this work is causing a noticeable performance degradation, we should address it rather than limit the usage of the event system for operators.

Also, keep in mind, in scenarios where there are no handlers or listeners, the .emit method immediately returns: https://github.com/jupyter/jupyter_events/blob/50746633a2adc7e41e2e0e1b0631db10ffb165db/jupyter_events/logger.py#L337-L342

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think we actually are in agreement; I forgot Jupyter maintains a distinction between developers and operators, and I was using the word "developer" to mean both. Yes, we should definitely be conscious of performance when building this, though I'm sure it's quite difficult to measure.

Also, keep in mind, in scenarios where there are no handlers or listeners, the .emit method immediately returns:

I took a look at that logic, and I'm not sure if that's sufficient. That checks if there is are any handlers/listeners attached to the current event logger, meaning if I add a listener to another schema or another event type, then a Contents Manager event still performs a deep copy, validation, and building an event capsule only to do nothing with it.

Of course, this is out of scope for the PR. I'll make an issue of this in Jupyter Events for further discussion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think we actually are in agreement; I forgot Jupyter maintains a distinction between developers and operators, and I was using the word "developer" to mean both. Yes, we should definitely be conscious of performance when building this, though I'm sure it's quite difficult to measure.

I think the main point is that Jupyter Server should reach a future state where it provides high volumes of fine-grained, structured event data from most areas across Jupyter Server, allowing operators to get detailed information about what's happening in a running server. In many deployment scenarios, it's required that operators keep a detailed log of everything happening in the application.

If performance is the concern, the answer isn't to avoid adding events; it's to improve performance. I think we can do it! 😎

I took a look at that logic, and I'm not sure if that's sufficient. That checks if there is are any handlers/listeners attached to the current event logger, meaning if I add a listener to another schema or another event type, then a Contents Manager event still performs a deep copy, validation, and building an event capsule only to do nothing with it.

Yeah, that's right. I think we should update jupyter_events to check if the specific event is being watched by any listeners.

This is a little trickier (not impossible) to do with handlers, since we don't keep a mapping of handlers to their specific events (consequence of using Python's logging libraries for the main EventLogger). All handlers listen to all events, unless someone adds a logging.Filter object to look for specific events. The challenge is that we can't inspect the filter objects easily. I have some ideas on how we can solve this issue in jupyter_events, but this shouldn't block the PR here.

@dlqqq
Copy link
Contributor

dlqqq commented Aug 29, 2022

@Zsailer Completed my review. 👀

@Zsailer
Copy link
Member Author

Zsailer commented Aug 30, 2022

Thank you for the review @dlqqq! I appreciate it!

It's probably worth doing some benchmarking tests to see how much event emission affects the server's performance. I'll try to find time to build these tests later this week.

Copy link
Contributor

@dlqqq dlqqq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far. Great work! Not sure if you wanted to tackle performance before/after merging this. LMK your thoughts.

@Zsailer
Copy link
Member Author

Zsailer commented Aug 30, 2022

Looks good so far. Great work! Not sure if you wanted to tackle performance before/after merging this. LMK your thoughts.

Thanks @dlqqq! I think all of the performance issues can/should be solved in the jupyter_events library itself. I think we can merge this PR here separately from those changes and just bump the version of jupyter_events here when it's ready.

@Zsailer Zsailer marked this pull request as ready for review August 31, 2022 20:21
@Zsailer Zsailer merged commit 6dc7d53 into jupyter-server:main Aug 31, 2022
@Zsailer Zsailer deleted the contents-service-events branch January 16, 2024 21:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants