HEP Tables
hep-tables aims to join different parts of the data query pipeline together:
- Data is fetched from ServiceX
- That data is processed by coffea and similar tools
- The result fits into the same ecosystem that tools like pyhf and cabinetry inhabit
Further, it does this with a fairly straightforward array-like syntax:
- Everything from the initial dataset to the final histogram is specified in a coherent and unified way.
- Syntax is inspired by pandas and numpy array syntax.
Features
- Basic array processing features
- Lambda variable capture to allow for multi-object relationships
- Basic histogramming
- Uses ServiceX and awkward as back ends
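To illustrate what "lambda variable capture" for multi-object relationships means, here is a minimal sketch in plain Python. It does not use the hep-tables API: the `event` dictionary, the `delta_r` helper, and the field names are all hypothetical stand-ins. The point is only that the inner lambda captures the electron `e` bound by the outer one, so each electron can be related to the jets in the same event.

```python
import math

# Hypothetical toy event: plain dicts stand in for arrays of physics objects.
event = {
    "electrons": [{"eta": 0.1, "phi": 0.2}, {"eta": -1.5, "phi": 2.9}],
    "jets": [
        {"eta": 0.0, "phi": 0.3, "pt": 45.0},
        {"eta": -1.4, "phi": 3.0, "pt": 60.0},
        {"eta": 2.0, "phi": -1.0, "pt": 30.0},
    ],
}


def delta_r(a, b):
    """Angular distance between two objects, with phi wrapped into [-pi, pi]."""
    dphi = math.remainder(a["phi"] - b["phi"], 2 * math.pi)
    return math.hypot(a["eta"] - b["eta"], dphi)


# The outer lambda binds one electron `e`; the inner lambda captures `e`
# so it can compare that electron against every jet in the event.
nearest_jets = list(
    map(
        lambda e: min(event["jets"], key=lambda j: delta_r(e, j)),
        event["electrons"],
    )
)

nearest_jet_pt = [j["pt"] for j in nearest_jets]
print(nearest_jet_pt)  # [45.0, 60.0]
```

In an array-style front end the same pattern would be written against whole columns rather than Python lists, but the capture of the outer variable by the inner lambda is the same idea.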
Road map
It is best to check the repositories mentioned above for the most recent status. Some planned future projects:
- Integration of coffea as a backend processor to default to multi-CPU/processor work.
- Ability to run numba (or numba-like) code
- Ability to run C++ code
- Running in a facility with the user having a very simple light-weight python front-end package.
Status
At the moment this is a prototype package. Its development is being driven by the requirements of an analysis in ATLAS.
- Some initial documentation exists in the form of a tour to show off what it can do.
- Three packages make up this project currently:
  - dataframe_expressions - User-facing API; converts array expressions into ASTs. Other packages then interpret these ASTs in order to execute or act on the user's request. Includes support for leaf referencing, slicing, lambda functions, and numpy integration.
  - hep_tables - Interprets a dataframe expression and converts it to func_adl to be executed on a ServiceXDatasetSource. It can only interpret as much as func_adl (or ServiceX) can do: return data from the service.
  - hl_tables - Plug-in architecture allows multiple back ends for execution. Currently supports hep_tables to run data fetch and basic queries, and also an immediate awkward array processor. The array processor can generate histograms, among other things.
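The split between a front end that records array expressions as ASTs and back ends that interpret them can be sketched with the Python standard library alone. This is a toy illustration and not the dataframe_expressions implementation: the `Expr` class, the `evaluate` function, and the sample data are hypothetical, and only attribute access, `>`, and mask-style indexing are handled.

```python
import ast


class Expr:
    """Hypothetical front-end object: records operations instead of running them."""

    def __init__(self, node):
        self.node = node

    def __getattr__(self, name):
        # Leaf reference: df.jets.pt becomes nested ast.Attribute nodes.
        if name.startswith("__"):
            raise AttributeError(name)
        return Expr(ast.Attribute(value=self.node, attr=name, ctx=ast.Load()))

    def __gt__(self, other):
        return Expr(
            ast.Compare(left=self.node, ops=[ast.Gt()], comparators=[_as_node(other)])
        )

    def __getitem__(self, key):
        # Mask-style slicing: values[mask] becomes an ast.Subscript node.
        return Expr(ast.Subscript(value=self.node, slice=_as_node(key), ctx=ast.Load()))


def _as_node(value):
    return value.node if isinstance(value, Expr) else ast.Constant(value=value)


def evaluate(node, data):
    """Hypothetical immediate back end: walks the AST over plain Python lists."""
    if isinstance(node, ast.Name):
        return data
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Attribute):
        return evaluate(node.value, data)[node.attr]
    if isinstance(node, ast.Compare) and isinstance(node.ops[0], ast.Gt):
        threshold = evaluate(node.comparators[0], data)
        return [x > threshold for x in evaluate(node.left, data)]
    if isinstance(node, ast.Subscript):
        values = evaluate(node.value, data)
        mask = evaluate(node.slice, data)
        return [v for v, keep in zip(values, mask) if keep]
    raise NotImplementedError(type(node).__name__)


# Front end: nothing is computed here, only an AST is built.
df = Expr(ast.Name(id="df", ctx=ast.Load()))
good_pt = df.jets.pt[df.jets.pt > 30]
text = ast.unparse(good_pt.node)  # requires Python 3.9+
print(text)  # df.jets.pt[df.jets.pt > 30]

# Back end: the same AST is interpreted against some data.
selected = evaluate(good_pt.node, {"jets": {"pt": [12.0, 45.0, 80.0]}})
print(selected)  # [45.0, 80.0]
```

Because the expression is held as an AST rather than evaluated eagerly, a different interpreter could translate the same tree into a remote query instead of running it locally, which is the role the list above assigns to hep_tables and hl_tables.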
Team
Publications
- hep_tables: Heterogeneous Array Programming for HEP, Watts, Gordon, arXiv 2103.11525 (21 Mar 2021).