feat: named axis for `ak.Array` #3238

pfackeldey · 2024-09-12T17:52:50Z

Proposal for named axis

This PR addresses #2596.

References for other named axis implementations:

Motivation

As argued at PyHEP.dev 2023 and by the Harvard NLP group in their "Tensor Considered Harmful" write-up, named axis can be a powerful tool to make code more readable and less error-prone.

Design

`ak.Array` with named axis

Named axis are implemented through a mapping from named axis to positional axis.
Named axis are hashables (mostly strings); integers are excluded because they are reserved for positional axis.

import typing

AxisName: typing.TypeAlias = typing.Hashable
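The rule above (any hashable except integers) can be sketched as a small check. This is only an illustration, not the PR's implementation: the helper name `is_valid_axis_name` is hypothetical, and rejecting `None` is an assumption made here because `None` marks an unnamed axis in the tuple form used later in this description.

```python
from collections.abc import Hashable


def is_valid_axis_name(name: object) -> bool:
    """Return True if `name` could serve as an axis name under the stated rule."""
    # Integers are reserved for positional axis (bool is a subclass of int).
    if isinstance(name, int):
        return False
    # Assumption: None is reserved to mark an unnamed axis in the tuple form.
    if name is None:
        return False
    # Anything else only needs to be hashable so it can act as a dict key.
    return isinstance(name, Hashable)
```

With this sketch, `"events"` and `("det", 1)` are accepted, while `0`, `-1`, and a list such as `["jets"]` are rejected.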

By default an ak.Array uses positional axis, but named axis can be added to the array in the following ways:

import awkward as ak

# tuple:
#   positional axis: (0, 1)
#   named axis: ("events", "jets")
array = ak.Array([[1, 2], [3], [], [4, 5, 6]], named_axis=("events", "jets"))

# dict:
#   positional axis: (0, 1)
#   named axis: ("events", "jets")
array = ak.Array([[1, 2], [3], [], [4, 5, 6]], named_axis={"events": 0, "jets": 1})

# the dict interface allows naming a single axis, including via a negative positional axis
array = ak.Array([[1, 2], [3], [], [4, 5, 6]], named_axis={"jets": -1})

# attach axis naming to an existing array
array = ak.Array([[1, 2], [3], [], [4, 5, 6]])
array = ak.with_named_axis(array, ("events", "jets"))
# or
array = ak.with_named_axis(array, {"events": 0, "jets": 1})

The named_axis argument of the constructor of an ak.Array is a tuple of AxisName, or a dict of AxisName to integers.
It is stored in the .attrs attribute of the array with a reserved key "__named_axis__" of type dict[AxisName, int].
The two types of axis can be accessed through the named_axis and positional_axis properties (always represented as a tuple):

import awkward as ak

array = ak.Array([[1, 2], [3], [], [4, 5, 6]], named_axis=("events", "jets"))
array.named_axis
>>> ("events", "jets")
array.positional_axis
>>> (0, 1)

array = ak.Array([[1, 2], [3], [], [4, 5, 6]], named_axis={"jets": -1})
array.named_axis
>>> (None, "jets")
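The relation between the stored `dict[AxisName, int]` and the tuple view shown above can be sketched in plain Python. This is a hedged illustration only: the function names are hypothetical, and it assumes the number of dimensions `ndim` is known and fixed, which sidesteps branched structures.

```python
from typing import Dict, Hashable, Optional, Tuple


def named_axis_to_tuple(
    mapping: Dict[Hashable, int], ndim: int
) -> Tuple[Optional[Hashable], ...]:
    """Expand a name -> position mapping into a tuple with None for unnamed axis."""
    names = [None] * ndim
    for name, axis in mapping.items():
        # Negative positions count from the innermost axis, as in the dict form above.
        pos = axis + ndim if axis < 0 else axis
        if not 0 <= pos < ndim:
            raise ValueError(f"axis {axis} is out of range for ndim={ndim}")
        if names[pos] is not None:
            raise ValueError(f"axis {pos} was given two names")
        names[pos] = name
    return tuple(names)


def tuple_to_named_axis(names: Tuple[Optional[Hashable], ...]) -> Dict[Hashable, int]:
    """Collapse the tuple form back into a name -> position mapping."""
    return {name: pos for pos, name in enumerate(names) if name is not None}
```

For example, `named_axis_to_tuple({"jets": -1}, ndim=2)` gives `(None, "jets")`, matching the property output above.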

Named axis in high-level functions

Named axis can be used by all high-level functions, e.g. ak.sum, ak.max, etc.:

import awkward as ak

array = ak.Array([[1, 2], [3], [], [4, 5, 6]], named_axis=("events", "jets"))

# sum over the "jets" axis
sum_jets = ak.sum(array, axis="jets")
>>> ak.Array([3, 3, 0, 15])
sum_jets.named_axis
>>> ("events",)

# the `keepdims=True` argument keeps the named axis
sum_jets = ak.sum(array, axis="jets", keepdims=True)
>>> ak.Array([[3], [3], [0], [15]])
sum_jets.named_axis
>>> ("events", "jets")
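How a reduction could update the axis names can be sketched on the tuple form alone. This is a simplified model of the behaviour shown above, not the code in this PR; `reduce_named_axis` is a hypothetical helper that works on tuples of names and ignores the array data.

```python
from typing import Hashable, Optional, Tuple, Union

Names = Tuple[Optional[Hashable], ...]


def reduce_named_axis(names: Names, axis: Union[int, Hashable], keepdims: bool = False) -> Names:
    """Return the axis names of the result of reducing over `axis`."""
    if isinstance(axis, int):
        # Positional axis: normalise negative values.
        pos = axis + len(names) if axis < 0 else axis
        if not 0 <= pos < len(names):
            raise ValueError(f"axis {axis} is out of range")
    else:
        # Named axis: look up its position (raises ValueError if unknown).
        pos = names.index(axis)
    if keepdims:
        # The reduced axis survives with length 1, so its name is kept.
        return tuple(names)
    # Otherwise the reduced axis disappears together with its name.
    return tuple(name for i, name in enumerate(names) if i != pos)
```

Here `reduce_named_axis(("events", "jets"), "jets")` gives `("events",)`, and passing `keepdims=True` gives `("events", "jets")`.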

There are different scenarios for how named axis are propagated to the resulting array:

Nothing changes: The named axis are kept in the resulting array, e.g. ak.sum(array, axis="jets", keepdims=True) or array ** 2.
Named axis are removed: The named axis are removed from the resulting array, e.g. ak.sum(array, axis="jets").
Named axis are unified from binary operations of two ak.Array:

import awkward as ak

array1 = ak.Array([[1, 2], [3, 4]], named_axis=("In", None))
array2 = ak.Array([[5, 6], [7, 8]], named_axis=(None, "Out"))

(array1 + array2).named_axis
>>> ("In", "Out")

Here, checks for matching named axis are possible. The rules are:

ak.Array([1], named_axis=("foo",)) + ak.Array([1], named_axis=("foo",))    # OK
ak.Array([1], named_axis=("foo",)) + ak.Array([1], named_axis=(None,))     # OK
ak.Array([1], named_axis=("foo",)) + ak.Array([1], named_axis=("bar",))    # raise Exception
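The three rules above can be written as a short unification routine over the tuple form. This is a sketch under the assumption that both operands have the same number of dimensions; `unify_named_axis` is a hypothetical name, and broadcasting between arrays of different depth is not covered.

```python
from typing import Hashable, Optional, Tuple

Names = Tuple[Optional[Hashable], ...]


def unify_named_axis(left: Names, right: Names) -> Names:
    """Combine the axis names of two operands of a binary operation."""
    if len(left) != len(right):
        raise ValueError("this sketch only handles operands of equal depth")
    unified = []
    for a, b in zip(left, right):
        if a is None:
            # Unnamed on the left: take whatever the right side has (possibly None).
            unified.append(b)
        elif b is None or a == b:
            # Unnamed on the right, or both sides agree: keep the left name.
            unified.append(a)
        else:
            raise ValueError(f"named axis mismatch: {a!r} vs {b!r}")
    return tuple(unified)
```

With this, `unify_named_axis(("In", None), (None, "Out"))` gives `("In", "Out")`, and `("foo",)` against `("bar",)` raises a `ValueError`.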

Named axis are collapsed into a new one:

import awkward as ak

array = ak.ones((1, 2, 3), named_axis=("x", "y", "z"))

# does this even make sense/exist?
ak.flatten(array, axis=("y", "z")).named_axis
>>> ("x", None)

ak.flatten(array, axis=None).named_axis
>>> (None,)

no use-case exists currently / not possible: Named axis permuted: The named axis of the resulting array are permuted, e.g.:

import awkward as ak

array = ak.Array([[1], [2]], named_axis=("x", "y"))
array.named_axis
>>> ("x", "y")

array.T.named_axis
>>> ("y", "x")

no use-case exists currently / not possible: Named axis are contracted away:

import awkward as ak

array1 = ak.Array([[1, 2], [3, 4]], named_axis=("In", "Foo"))
array2 = ak.Array([[5, 6], [7, 8]], named_axis=("Foo", "Out"))

(array1 @ array2).named_axis
>>> ("In", "Out")

Named axis in indexing

In addition, named axis can be used to select data:

import awkward as ak

array = ak.Array([[1, 2], [3], [], [4, 5, 6]], named_axis=("events", "jets"))

# select the first event
first_event = array[{"events": 0, "jets": slice(None)}]
>>> ak.Array([1, 2], named_axis=("jets",))

# select the first jet of each event
first_jet = array[{"events": slice(None), "jets": slice(0, 1)}]
>>> ak.Array([[1], [3], [], [4]], named_axis=("events", "jets"))
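One way to think about dict-based indexing is as a translation into an ordinary positional index tuple, followed by bookkeeping of which names survive. The sketch below illustrates that idea on the tuple form only; both helper names are hypothetical, and the rule that integer indices drop an axis while slices keep it is inferred from the two examples above.

```python
from typing import Any, Dict, Hashable, Optional, Tuple

Names = Tuple[Optional[Hashable], ...]


def named_index_to_positional(index: Dict[Hashable, Any], names: Names) -> Tuple[Any, ...]:
    """Translate {name: item} into a positional index tuple."""
    # Axis that are not mentioned are selected in full.
    positional = [slice(None)] * len(names)
    for name, item in index.items():
        if name is None or name not in names:
            raise KeyError(f"unknown named axis: {name!r}")
        positional[names.index(name)] = item
    return tuple(positional)


def names_after_indexing(positional: Tuple[Any, ...], names: Names) -> Names:
    """Integer indices remove their axis (and its name); slices keep it."""
    return tuple(name for item, name in zip(positional, names) if not isinstance(item, int))
```

For `names = ("events", "jets")`, the index `{"events": 0, "jets": slice(None)}` becomes `(0, slice(None))` and leaves `("jets",)`, in line with the first example above.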

As syntactic sugar, ak.slice is added:

import awkward as ak

array = ak.Array([[1, 2], [3], [], [4, 5, 6]], named_axis=("events", "jets"))

# select the first jet of each event
first_jet = array[{"events": ak.slice[...], "jets": ak.slice[0:1]}]
>>> ak.Array([[1], [3], [], [4]], named_axis=("events", "jets"))

# or mixed with positional axis
first_jet = array[..., {"jets": ak.slice[0:1]}]
>>> ak.Array([[1], [3], [], [4]], named_axis=("events", "jets"))
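Sugar of this kind is commonly built from an object whose `__getitem__` simply returns its key, the same trick that `numpy.s_` uses. The following is a generic sketch of that pattern, not the `ak.slice` implementation from this PR; the names `_Slicer` and `slicer` are hypothetical.

```python
class _Slicer:
    """Return whatever is written inside the square brackets."""

    def __getitem__(self, key):
        # Python has already turned `0:1` into slice(0, 1, None)
        # and `...` into Ellipsis by the time this method is called.
        return key


slicer = _Slicer()

first = slicer[0:1]      # slice(0, 1, None)
everything = slicer[...]  # Ellipsis
mixed = slicer[1:, 0]    # (slice(1, None, None), 0)
```

Note that in this sketch `slicer[...]` yields `Ellipsis` rather than `slice(None)`, so treating it as "select the whole axis" in a named-axis dict would need an extra interpretation step that is not shown here.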

This PR has to touch a lot of code and needs to add custom named axis propagation to each high-level operation. Thus, this PR is currently in draft mode.

Looking forward to ideas, thoughts, feedback on this effort!

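pfackeldey · 2024-09-13T19:09:37Z

Progress

Progress is tracked under the following headings: general; slicing; unary and binary operations; high-level functions; and, independent of named axis, improvements / bugs found that are fixed by this PR as well.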

jpivarski · 2024-09-13T19:13:10Z

And all the data types that can be passed into square brackets with __getitem__.

codecov · 2024-09-13T20:47:05Z

Codecov Report

Attention: Patch coverage is 91.18497% with 61 lines in your changes missing coverage. Please review.

Project coverage is 82.26%. Comparing base (b749e49) to head (3bb8efa).
Report is 162 commits behind head on main.

Files with missing lines	Patch %	Lines
src/awkward/_namedaxis.py	85.40%	20 Missing ⚠️
src/awkward/_operators.py	76.66%	7 Missing ⚠️
src/awkward/operations/ak_pad_none.py	75.00%	2 Missing ⚠️
src/awkward/operations/ak_with_named_axis.py	92.00%	2 Missing ⚠️
src/awkward/operations/ak_without_named_axis.py	86.66%	2 Missing ⚠️
src/awkward/_layout.py	92.30%	1 Missing ⚠️
src/awkward/_typing.py	50.00%	1 Missing ⚠️
src/awkward/contents/content.py	97.77%	1 Missing ⚠️
src/awkward/operations/ak_all.py	92.85%	1 Missing ⚠️
src/awkward/operations/ak_any.py	92.85%	1 Missing ⚠️
... and 23 more

Additional details and impacted files

Files with missing lines	Coverage Δ
src/awkward/_nplikes/array_like.py	`97.14% <ø> (+27.75%)`	⬆️
src/awkward/_nplikes/typetracer.py	`75.05% <ø> (+0.19%)`	⬆️
src/awkward/_regularize.py	`87.87% <100.00%> (+0.37%)`	⬆️
src/awkward/contents/numpyarray.py	`90.50% <100.00%> (-1.01%)`	⬇️
src/awkward/highlevel.py	`77.16% <100.00%> (+0.49%)`	⬆️
src/awkward/operations/__init__.py	`100.00% <100.00%> (ø)`
src/awkward/operations/ak_almost_equal.py	`93.75% <100.00%> (+0.56%)`	⬆️
src/awkward/operations/ak_argcombinations.py	`88.00% <ø> (ø)`
src/awkward/operations/ak_array_equal.py	`100.00% <ø> (ø)`
src/awkward/operations/ak_cartesian.py	`91.89% <100.00%> (+0.89%)`	⬆️
... and 53 more

... and 85 files with indirect coverage changes

pfackeldey and others added 4 commits September 12, 2024 13:24

start implementing named axis for awkward array

eb5c188

style: pre-commit fixes

3978f43

add support for named axis for first batch of highlevel functions

556362e

formatting

16a119b

pfackeldey changed the title ~~Feat: named axis for ak.Array~~ feat: named axis for ak.Array Sep 12, 2024

next batch of high-level functions

c96f61e

pfackeldey and others added 3 commits September 13, 2024 16:38

fix type hints & safer control flow when checking for named axis

7f7db85

style: pre-commit fixes

572051e

(hopefully) fix old (<3.10) python type annotation syntax

e57e2c3

pfackeldey and others added 16 commits September 13, 2024 16:47

(hopefully) fix old (<3.10) python type annotation syntax

e338332

next batch of highlevel functions

a07adfa

next batch of highlevel functions

4c51e6b

style: pre-commit fixes

d05517a

update named axis implementation to not use tuples at all; start indexing named axis propagation

a705981

add named axis propagation for binary ops, some highlevel ops, and fix named axis propagation in indexing for type tracers

1ded71b

Merge remote-tracking branch 'upstream/main' into feat/named_axes

7158d07

fix keepdims in ak.covar & ak.corr; properly propagate named axis in ak.mean; remove inplace addition of arrays from test

00669a3

add ak.std & ak.var; fix bug in indexing where == comparisons against Ellipsis failed

bb97999

add ak.(arg)combinations and ak.(arg)cartesian; make named axis compatible with branched structures;fix regularize_axis in all highlevel ops

8720a05

avoid touching shape too much for purelist_depth, minmax_depth, and branch_depth using inner_shape property

1af4376

ak.without_named_axis: allow ak.Records

01b459c

ak.with_named_axis: add check to validate the given named axis mapping

c970d06

fix doc strings and remove obsolete functions in _namedaxis.py module

f5f9495

update Slicer doc string

51bc6d6

docs: add documentation page for named axes

3bb8efa
