systematically handling column names and indexes of transformed dataframes #1021

Open
jeromedockes opened this issue Jul 31, 2024 · 1 comment


@jeromedockes
Member

When we transform a dataframe, we want to make sure that in the output:

  • the column names are always the same (and unique)
  • if it is a pandas dataframe, the index is preserved
  • possibly other checks performed by CheckInputDataFrame

See, for example, this comment.

I'm opening this now just so we don't forget about it.
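Below is a minimal sketch of the first two output checks listed above, assuming pandas inputs. check_transformed_frame is a hypothetical helper and is not part of skrub. The expected_columns argument stands for the column names a transformer would record when it is fitted; that is also an assumption.

```python
import pandas as pd


def check_transformed_frame(X_in, X_out, expected_columns):
    """Hypothetical post-transform check (not a skrub API).

    Raises if the output columns contain duplicates or differ from the
    columns recorded at fit time, or if a pandas input's index was lost.
    """
    columns = list(X_out.columns)
    # Column names must be unique...
    if len(set(columns)) != len(columns):
        raise ValueError(f"duplicate column names in output: {columns}")
    # ...and identical (same names, same order) to those seen during fit.
    if columns != list(expected_columns):
        raise ValueError(
            f"output columns {columns} differ from fitted columns "
            f"{list(expected_columns)}"
        )
    # For a pandas input, the output must be pandas and carry the same index.
    if isinstance(X_in, pd.DataFrame):
        if not isinstance(X_out, pd.DataFrame):
            raise TypeError("pandas input produced a non-pandas output")
        if not X_in.index.equals(X_out.index):
            raise ValueError("output index does not match the input index")
    return X_out


X = pd.DataFrame({"a": [1, 2]}, index=[10, 20])
out = X.assign(b=X["a"] * 2)  # assign keeps the original index
check_transformed_frame(X, out, ["a", "b"])  # passes
# check_transformed_frame(X, out.reset_index(drop=True), ["a", "b"]) raises ValueError
```

This sketch compares against the fitted column list instead of only checking uniqueness. That way it also catches a transformer whose output columns change between fit and transform.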

@TheooJ
Contributor

TheooJ commented Aug 2, 2024

Agreed!

It would also be useful to check that the main and aux dataframes are of the same type. For now, I believe only AggJoiner checks that both have the same type (in X, self._aux_table = self._check_dataframes(X, self.aux_table)), but we probably want this in the other joiners too.

We could use something like:

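# Validate the auxiliary table
self._aux_check_input = CheckInputDataFrame()
self._aux_table = self._aux_check_input.fit_transform(self.aux_table)

# Validate the main table
self._main_check_input = CheckInputDataFrame()
main = self._main_check_input.fit_transform(main)

# module_name_ records which dataframe library each input comes from
if self._main_check_input.module_name_ != self._aux_check_input.module_name_:
    ...  # e.g. raise an error: main and aux have different types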
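The comparison in the snippet above relies on skrub's CheckInputDataFrame. The same idea can be sketched outside skrub. This version takes the top-level package that defines the dataframe's class (for example pandas or polars) as the library name. dataframe_module_name and check_same_module are hypothetical helpers written for illustration. Raising a TypeError is an assumed way to fill the "..." branch, not something the snippet specifies.

```python
import pandas as pd


def dataframe_module_name(df):
    """Hypothetical helper: top-level package that defines df's class."""
    # e.g. "pandas.core.frame" -> "pandas", "polars.dataframe.frame" -> "polars"
    return type(df).__module__.split(".")[0]


def check_same_module(main, aux):
    """Raise TypeError if main and aux come from different dataframe libraries."""
    main_module = dataframe_module_name(main)
    aux_module = dataframe_module_name(aux)
    if main_module != aux_module:
        # Assumed behaviour for the "..." branch: fail early with a clear message.
        raise TypeError(
            f"main is a {main_module} dataframe but aux is a {aux_module} "
            "dataframe; both tables must come from the same library"
        )
    return main_module


main = pd.DataFrame({"id": [1, 2], "x": [0.5, 1.5]})
aux = pd.DataFrame({"id": [1, 2], "y": ["a", "b"]})
print(check_same_module(main, aux))  # pandas
```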

For now,

  • the Joiner uses CheckInputDataFrame for main and aux, but doesn't check that their types match.
  • the InterpolationJoiner doesn't use CheckInputDataFrame. Note that in this case, main might not be known at fit time.
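For the InterpolationJoiner, main may only be available at transform time. One option is to record the aux table's library during fit and compare main against it later. DeferredTypeCheck below is a hypothetical, simplified stand-in, not the InterpolationJoiner's actual code. Its fit and transform do nothing except this type check.

```python
import pandas as pd


class DeferredTypeCheck:
    """Hypothetical sketch: store aux's library at fit, check main at transform."""

    def fit(self, aux_table):
        # Only aux is known here; remember which library it comes from.
        self.aux_module_name_ = type(aux_table).__module__.split(".")[0]
        return self

    def transform(self, main_table):
        # main is first seen here, so the comparison has to happen now.
        main_module = type(main_table).__module__.split(".")[0]
        if main_module != self.aux_module_name_:
            raise TypeError(
                f"main is a {main_module} dataframe but aux was a "
                f"{self.aux_module_name_} dataframe at fit time"
            )
        return main_table


checker = DeferredTypeCheck().fit(pd.DataFrame({"lat": [0.0], "temp": [12.3]}))
main = pd.DataFrame({"lat": [0.1, 0.2]})
checker.transform(main)  # passes: both are pandas dataframes
```

The trade-off is that a type mismatch only surfaces at transform, not at fit.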
