Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JP-3721: Simplify ModelContainer #8831

Draft
wants to merge 89 commits into
base: main
Choose a base branch
from
Draft

Conversation

emolter
Copy link
Collaborator

@emolter emolter commented Sep 26, 2024

Resolves JP-3721
Resolves JP-2038

Closes #8738
Closes #5950

With the addition of ModelLibrary, ModelContainer can be substantially simplified. ModelLibrary certainly renders the return_open and save_open options obsolete, as well as the get_sections method, since if memory usage is a concern then ModelLibrary should be used instead. Some additional discussion about how the ModelContainer used to perform way to many tasks can be found on this innerspace page.

During discussions related to JP-3715, it has become increasingly clear that the strictness of ModelLibrary and the additional borrow/shelve code required to access models is not necessary/desired for many use-cases. For example, the calwebb_spec3 pipeline currently makes extensive use of ModelContainer but does not have the same memory issues as calwebb_image3 does. Learning to use ModelLibrary may also be a nuisance for users manipulating relatively small datasets.

Therefore, ModelContainer should not be removed, but instead become a lightweight, easy-to-use class for loading a list of models and association metadata. This PR proposed a container satisfying the following constraints:

  • Loads association metadata dictionary
  • Can be manipulated as if it were a list
  • Loads all datamodels into memory (i.e., use ModelLibrary instead if we care about memory usage)
  • Is the default data structure loaded by datamodels.open() for asn-type data but is no longer itself a datamodel
  • Can be used as a context manager, i.e., with datamodels.open(asn.json) as container has the expected behavior

This PR should also fix a long-standing issue, previously closed as won't-fix, that filenames inside associations containing directories (e.g., "test/file.fits") don't work in the level3 pipelines (but do work in the level2 pipelines).

Linked PR in stdatamodels: spacetelescope/stdatamodels#330
Linked PR in stpipe: spacetelescope/stpipe#190

Tasks

  • request a review from someone specific, to avoid making the maintainers review every PR
  • add a build milestone, i.e. Build 11.3 (use the latest build if not sure)
  • Does this PR change user-facing code / API? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see below for change types)
    • update or add relevant tests
    • update relevant docstrings and / or docs/ page
    • start a regression test and include a link to the running job (click here for instructions)
      • Do truth files need to be updated ("okified")?
        • after the reviewer has approved these changes, run okify_regtests to update the truth files
  • if a JIRA ticket exists, make sure it is resolved properly
news fragment change types...
  • changes/<PR#>.general.rst: infrastructure or miscellaneous change
  • changes/<PR#>.docs.rst
  • changes/<PR#>.stpipe.rst
  • changes/<PR#>.datamodels.rst
  • changes/<PR#>.scripts.rst
  • changes/<PR#>.fits_generator.rst
  • changes/<PR#>.set_telescope_pointing.rst
  • changes/<PR#>.pipeline.rst

stage 1

  • changes/<PR#>.group_scale.rst
  • changes/<PR#>.dq_init.rst
  • changes/<PR#>.emicorr.rst
  • changes/<PR#>.saturation.rst
  • changes/<PR#>.ipc.rst
  • changes/<PR#>.firstframe.rst
  • changes/<PR#>.lastframe.rst
  • changes/<PR#>.reset.rst
  • changes/<PR#>.superbias.rst
  • changes/<PR#>.refpix.rst
  • changes/<PR#>.linearity.rst
  • changes/<PR#>.rscd.rst
  • changes/<PR#>.persistence.rst
  • changes/<PR#>.dark_current.rst
  • changes/<PR#>.charge_migration.rst
  • changes/<PR#>.jump.rst
  • changes/<PR#>.clean_flicker_noise.rst
  • changes/<PR#>.ramp_fitting.rst
  • changes/<PR#>.gain_scale.rst

stage 2

  • changes/<PR#>.assign_wcs.rst
  • changes/<PR#>.badpix_selfcal.rst
  • changes/<PR#>.msaflagopen.rst
  • changes/<PR#>.nsclean.rst
  • changes/<PR#>.imprint.rst
  • changes/<PR#>.background.rst
  • changes/<PR#>.extract_2d.rst
  • changes/<PR#>.master_background.rst
  • changes/<PR#>.wavecorr.rst
  • changes/<PR#>.srctype.rst
  • changes/<PR#>.straylight.rst
  • changes/<PR#>.wfss_contam.rst
  • changes/<PR#>.flatfield.rst
  • changes/<PR#>.fringe.rst
  • changes/<PR#>.pathloss.rst
  • changes/<PR#>.barshadow.rst
  • changes/<PR#>.photom.rst
  • changes/<PR#>.pixel_replace.rst
  • changes/<PR#>.resample_spec.rst
  • changes/<PR#>.residual_fringe.rst
  • changes/<PR#>.cube_build.rst
  • changes/<PR#>.extract_1d.rst
  • changes/<PR#>.resample.rst

stage 3

  • changes/<PR#>.assign_mtwcs.rst
  • changes/<PR#>.mrs_imatch.rst
  • changes/<PR#>.tweakreg.rst
  • changes/<PR#>.skymatch.rst
  • changes/<PR#>.exp_to_source.rst
  • changes/<PR#>.outlier_detection.rst
  • changes/<PR#>.tso_photometry.rst
  • changes/<PR#>.stack_refs.rst
  • changes/<PR#>.align_refs.rst
  • changes/<PR#>.klip.rst
  • changes/<PR#>.spectral_leak.rst
  • changes/<PR#>.source_catalog.rst
  • changes/<PR#>.combine_1d.rst
  • changes/<PR#>.ami.rst

other

  • changes/<PR#>.wfs_combine.rst
  • changes/<PR#>.white_light.rst
  • changes/<PR#>.cube_skymatch.rst
  • changes/<PR#>.engdb_tools.rst
  • changes/<PR#>.guider_cds.rst

braingram and others added 30 commits July 29, 2024 16:07
Copy link

codecov bot commented Sep 26, 2024

Codecov Report

Attention: Patch coverage is 2.94118% with 66 lines in your changes missing coverage. Please review.

Project coverage is 12.46%. Comparing base (e860360) to head (a893d5e).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
jwst/datamodels/container.py 2.94% 33 Missing ⚠️
jwst/master_background/master_background_step.py 0.00% 7 Missing ⚠️
jwst/pipeline/calwebb_spec3.py 0.00% 5 Missing ⚠️
jwst/pipeline/calwebb_coron3.py 0.00% 4 Missing ⚠️
jwst/pipeline/calwebb_tso3.py 0.00% 4 Missing ⚠️
jwst/badpix_selfcal/badpix_selfcal_step.py 25.00% 3 Missing ⚠️
jwst/extract_1d/extract_1d_step.py 0.00% 3 Missing ⚠️
jwst/extract_1d/extract.py 0.00% 2 Missing ⚠️
jwst/pixel_replace/pixel_replace_step.py 0.00% 2 Missing ⚠️
jwst/cube_build/data_types.py 0.00% 1 Missing ⚠️
... and 2 more
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #8831       +/-   ##
===========================================
- Coverage   61.86%   12.46%   -49.40%     
===========================================
  Files         377      377               
  Lines       38911    38814       -97     
===========================================
- Hits        24071     4840    -19231     
- Misses      14840    33974    +19134     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@emolter
Copy link
Collaborator Author

emolter commented Sep 26, 2024

Initial round of regression tests still showing several failures to track down

@emolter
Copy link
Collaborator Author

emolter commented Sep 27, 2024

The most recent changes fixed most of those failures. There is still a lot to do, including docs updates and the like, but I think it's time to get some feedback on the overall idea behind the new ModelContainer, so I'm requesting a review from some folks now even though this is staying in draft mode.

Copy link
Contributor

@perrygreenfield perrygreenfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only had minor comments so far. In the end documentation on what the new ModelContainer class does and supports with reference to the changes relative to the older version should be provided.

@@ -13,12 +13,12 @@
"members": [
{
"exptype": "SCIENCE",
"expname": "test.fits",
"expname": "test.fits"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is one of the linting tools requiring this now. In them olden days, this sort of thing was sometimes recommended for eliminating errors when someone added new items to a list or dictionary (yes, it would then lead to a syntax error). I'm curious if this is now discouraged.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my json linter, at least, yes. I don't think these are being removed in Python

else:
val = None

setattr(m.meta, attr, val)
self._models.append(m)

except IOError:
self.close()
raise

# Pull the whole association table into meta.asn_table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment now appears to be incorrect after the change.

@@ -2859,6 +2856,7 @@ def do_extract1d(
if hasattr(meta_source, "int_times"):
output_model.int_times = meta_source.int_times.copy()

print(meta_source)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably this will be removed in the end.

self.output_file = output_file

print("self.output_file = ", self.output_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be removed?

@@ -82,6 +86,7 @@ def process(self, input, selfcal_list=None, bkg_list=None):
i.e., true self-calibration.
"""
input_sci, selfcal_list, bkg_list = _parse_inputs(input, selfcal_list, bkg_list)
print(input_sci, selfcal_list, bkg_list)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be removed?

asn_data = json.load(f)
assert asn_data["products"][0]["members"][0]["expname"].split("/") == [DATADIR, "test.fits"]

# cehck that this can be opened
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

@emolter
Copy link
Collaborator Author

emolter commented Oct 1, 2024

I only had minor comments so far. In the end documentation on what the new ModelContainer class does and supports with reference to the changes relative to the older version should be provided.

Thanks for the review Perry. I agree we should add some documentation. I'm still wondering what people think of the new-look container overall - is it complementary to ModelLibrary, does it make sense as a data structure, do any methods still need to be removed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplify ModelContainer Inconsistent assumed file locations in association files
3 participants