Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize a2o utils #655

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

peichins
Copy link

a2o_utils had some values hardcoded to only work with the A2O and not other instances of the Bioacoustic Workbench (BAW) instances, such as Ecosounds.

A2O utils that can apply to other BAW instances have been generalized to allow this, and been moved to a new baw_utils module. A2O-specific functionality has stayed in a2o_utils. No changes to the agile_monitoring notebook are needed, since, if no baw_domain is supplied to the BootstrapState constructor, it defaults to the a2o baw_domain.

Also modifies the utility function that parses the audio_recording_id out of a filename to work with filetypes besides flac.

Copy link

google-cla bot commented May 15, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Copy link
Collaborator

@sdenton4 sdenton4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I branched into a new change in order to handle notebook compatibility and some invisible bookkeeping stuff; hope that's ok. There's still a relevant question about the preferred domain, though.

# Extract the recording UID. Example:
# 'site_0277/20210428T100000+1000_Five-Rivers-Dry-A_909057.flac' -> 909057
# 'site_0277/20210428T100000+1000_Five-Rivers-Dry-A_909057.wav' -> 909057
pattern = re.compile(r'.*_(\d+)\.[^\.]+$')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's upgrade this to a file-level compiled pattern; it will then compile once on import, saving a few cycles every time we construct a url. Something like:

FILE_ID_TO_UID_PATTERN = re.compile(r'.*_(\d+).[^\.]+$')

yield ex
finally:
session.close()


def get_a2o_embeddings_config() -> config_dict.ConfigDict:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is thin enough that we might as well move this last function into baw_utils as well.



def make_baw_audio_url_from_file_id(
file_id: str, offset_s: float, window_size_s: float, baw_domain: str = "data.acousticsobervatory.org"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a typo here, which threw me for a loop: data.acousticsobervatory.org

Is there a reason to prefer the data subdomain vs the api subdomain we were using before?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah no, my mistake. I will fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants