I'm missing the possibility to run `import-db` as part of my pipeline. Consider the following scenario: `ingest` consumes my converted input data, applies some transformations and populates the application database. The state of the database is persisted via `import-db`, which is the data dependency for running the application.
Right now, `dvc repro` throws the following error with this config:
Running stage 'ingest':
> dvc import-db --table ingested_data --conn pgsql -o data/database
ERROR: output 'data/database' is already specified in stage: 'ingest'.
Use `dvc remove ingest` to stop tracking the overlapping output.
ERROR: failed to reproduce 'ingest': failed to run: dvc import-db --table ingested_data --conn pgsql -o data/database, exited with 255
It would be great to have a flag telling `dvc import-db` that it is part of a pipeline, so that overlapping outputs are not an issue.
Does it make sense to extend pipeline stages to support DB imports (or other imports)? wdyt @skshetry?
For now I would recommend running the query directly via a Python script. You can take a look at the DbDependency implementation and borrow some SQL wrapper code from it (it should not be very complicated, I think).
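A minimal sketch of that workaround, not taken from DVC's DbDependency code: a script that dumps one table to a CSV file through a plain DB-API connection. The function name `export_table` and the output path are hypothetical, and `sqlite3` stands in here for the PostgreSQL connection from the issue; a driver such as psycopg2 would presumably slot in the same way.

```python
import csv
import sqlite3
from pathlib import Path


def export_table(conn, table, out_path):
    """Dump every row of `table` to a CSV file and return the row count.

    `conn` is any DB-API 2.0 connection (assumption: sqlite3 is used below
    only as a stand-in for the real database).
    """
    # Table names cannot be bound as query parameters, so accept only
    # plain identifiers before interpolating into the SQL string.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")

    cur = conn.cursor()
    cur.execute(f"SELECT * FROM {table}")
    rows = cur.fetchall()

    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        # Header row taken from the cursor's column metadata.
        writer.writerow([col[0] for col in cur.description])
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Hypothetical demo data; a real stage would connect to the app database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ingested_data (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO ingested_data VALUES (1, 'example')")
    print(export_table(conn, "ingested_data", "data/database.csv"))
```

Since the script is an ordinary command, it could be the `cmd` of a regular stage, with the CSV file as that stage's own output, so no second stage claims the same path.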