add/commit: use hardlink to transfer files to the cache when cache.type is hardlink and on relink mode #10557

Merged: 1 commit, Sep 17, 2024

@skshetry skshetry commented Sep 16, 2024

When the cache_type is hardlink, we can simply hardlink the file from the workspace to the cache, and we don't need to relink back from the cache to the workspace.
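
A minimal sketch of that idea, not DVC's actual implementation: `transfer_to_cache` is a hypothetical helper, and the SHA-256 naming and two-level directory layout are assumptions chosen only for illustration. It links the workspace file into the cache so both paths share one inode, which is why no relink back to the workspace is needed.

```python
import hashlib
import os
import tempfile


def transfer_to_cache(workspace_path: str, cache_dir: str) -> str:
    """Hypothetical helper: hardlink a workspace file into a content-addressed cache.

    The hash algorithm and directory layout here are illustrative assumptions.
    """
    with open(workspace_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()

    cache_path = os.path.join(cache_dir, digest[:2], digest[2:])
    os.makedirs(os.path.dirname(cache_path), exist_ok=True)

    # No bytes are copied: the cache entry is a second name for the same inode.
    if not os.path.exists(cache_path):
        os.link(workspace_path, cache_path)
    return cache_path


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    src = os.path.join(root, "data.bin")
    with open(src, "wb") as f:
        f.write(b"some data")

    cached = transfer_to_cache(src, os.path.join(root, "cache"))
    # The workspace file is already linked to the cache entry at this point.
    print(os.path.samefile(src, cached))
```

Because the workspace path and the cache path already refer to the same file after `os.link`, a second pass that replaces the workspace file with a link from the cache would be redundant work.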

| Name | Time (in s) |
| --- | --- |
| test_add_hardlink-add[hardlink-relink] | 17.5393 (1.0) |
| test_add_hardlink-add[main] | 32.9904 (1.88) |

Note that this is a bit unsafe: if dvc add fails midway, a file may already have been hardlinked to the cache without yet having been protected (made read-only). Modifying such a file in the workspace could then corrupt the cache, since both paths point to the same data. But this is what we were doing before #10513, and there is always a chance of corruption with hardlinks anyway.
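
The failure window described above can be reproduced with plain hardlinks. This is only a sketch under assumptions: `protect` is a hypothetical stand-in for the protection step (it just clears the write bits), and the file names are made up.

```python
import os
import stat
import tempfile


def protect(path: str) -> None:
    """Hypothetical stand-in for the protect step: clear all write bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~0o222)


def has_write_bits(path: str) -> bool:
    """Return True if any write permission bit is set on the file."""
    return bool(stat.S_IMODE(os.stat(path).st_mode) & 0o222)


def demo_unprotected_hardlink():
    """Link a file into a fake cache, then edit it before it is protected."""
    root = tempfile.mkdtemp()
    workspace = os.path.join(root, "data.txt")
    cached = os.path.join(root, "cache_entry")

    with open(workspace, "w") as f:
        f.write("original")
    os.link(workspace, cached)

    # Simulated failure window: the file is linked but still writable.
    # An in-place write through the workspace path changes the cached data too.
    with open(workspace, "w") as f:
        f.write("edited")

    with open(cached) as f:
        return f.read(), workspace, cached


if __name__ == "__main__":
    content, workspace, cached = demo_unprotected_hardlink()
    print("cache entry now contains:", content)

    # Permissions live on the inode, so protecting one path covers both.
    protect(cached)
    print("workspace still has write bits:", has_write_bits(workspace))
```

Clearing the write bits narrows the window but does not make hardlinks fully safe: a privileged user, or a tool that restores write permission, can still modify the shared data.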

Also note that before #10513, on relink mode, we were hardlinking on all link types, but in this PR, I have limited it to just hardlink cache types.
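
The difference in scope can be written as two predicates. Both functions are hypothetical and only restate the sentence above; how DVC actually represents and compares the configured cache type is an assumption here.

```python
def hardlink_transfer_before_10513(relink: bool, cache_type: str) -> bool:
    """Assumed earlier behaviour: on relink mode, hardlink for every link type."""
    return relink


def hardlink_transfer_in_this_pr(relink: bool, cache_type: str) -> bool:
    """Assumed behaviour of this PR: hardlink only when the cache type is hardlink."""
    return relink and cache_type == "hardlink"


if __name__ == "__main__":
    for cache_type in ("hardlink", "symlink", "copy"):
        print(
            cache_type,
            hardlink_transfer_before_10513(True, cache_type),
            hardlink_transfer_in_this_pr(True, cache_type),
        )
```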

@skshetry skshetry added the performance improvement over resource / time consuming tasks label Sep 16, 2024

codecov bot commented Sep 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.83%. Comparing base (f56343d) to head (9ef9024).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10557      +/-   ##
==========================================
+ Coverage   90.61%   90.83%   +0.22%     
==========================================
  Files         504      504              
  Lines       39582    39584       +2     
  Branches     5716     5716              
==========================================
+ Hits        35868    35958      +90     
+ Misses       3032     2965      -67     
+ Partials      682      661      -21     


@skshetry skshetry merged commit b8266ed into main Sep 17, 2024
27 checks passed
@skshetry skshetry deleted the hardlink-relink branch September 17, 2024 07:24