This repository has been archived by the owner on May 12, 2021. It is now read-only.

Performance optimizations #67

Open
jpmckinney opened this issue May 5, 2021 · 0 comments


I/O options

  • Write the LZ4 files alongside the directories. If an LZ4 file exists, skip writing.
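  • Calculating the checksum and byte count while the tarfile is being created, rather than in a separate pass over the finished file, might be faster.
  • Streaming the tarfile directly to boto3's socket, rather than writing it to disk first, will be faster, if possible.
  • A single PUT, which can be used for files up to 5 GB, will be better for I/O but maybe worse for network than a multipart upload, since the parts of a multipart upload can be sent in parallel.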
  • I don't know if boto3 calculates ContentMD5. MD5 is slow (though there is a SIMD version). If we can skip ContentMD5, we can instead check integrity using the ETag.
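
To illustrate the second bullet, here is a minimal sketch of hashing and counting bytes while the tarfile is written, so the archive never has to be re-read. `HashingWriter` and `write_tar` are hypothetical names, SHA-256 is an arbitrary choice of checksum, and LZ4 compression is left out to keep the example stdlib-only.

```python
import hashlib
import io
import tarfile


class HashingWriter:
    """File-like wrapper that hashes and counts every byte written through it."""

    def __init__(self, raw, algorithm="sha256"):
        self.raw = raw  # underlying binary file object
        self.digest = hashlib.new(algorithm)
        self.size = 0

    def write(self, data):
        self.digest.update(data)
        self.size += len(data)
        return self.raw.write(data)


def write_tar(source_dir, raw):
    """Archive source_dir into raw and return (hex digest, byte count)."""
    writer = HashingWriter(raw)
    # "w|" is tarfile's stream mode: output is written sequentially with no
    # seeking, so the wrapper sees each byte of the archive once, in order.
    with tarfile.open(fileobj=writer, mode="w|") as tar:
        tar.add(source_dir, arcname=".")
    return writer.digest.hexdigest(), writer.size
```

Because the wrapper only needs a `write` method on the underlying object, `raw` could be a file on disk or, in principle, any other writable stream.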

Note that if we need to use multipart uploads, we need to handle incomplete uploads: parts that were already uploaded stay in the bucket until the upload is completed or aborted.
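
One way to handle this is sketched below, assuming `client` is a boto3 S3 client (`boto3.client("s3")`). The function name `abort_stale_uploads` and the one-day threshold are hypothetical. It lists the bucket's in-progress multipart uploads and aborts those older than the threshold.

```python
from datetime import datetime, timedelta, timezone


def abort_stale_uploads(client, bucket, max_age=timedelta(days=1), now=None):
    """Abort multipart uploads in bucket that were initiated more than max_age ago.

    Returns the (key, upload_id) pairs that were aborted.
    """
    now = now or datetime.now(timezone.utc)
    aborted = []
    paginator = client.get_paginator("list_multipart_uploads")
    for page in paginator.paginate(Bucket=bucket):
        # "Uploads" is absent when a page has no in-progress uploads.
        for upload in page.get("Uploads", []):
            if now - upload["Initiated"] > max_age:
                client.abort_multipart_upload(
                    Bucket=bucket,
                    Key=upload["Key"],
                    UploadId=upload["UploadId"],
                )
                aborted.append((upload["Key"], upload["UploadId"]))
    return aborted
```

An alternative that needs no code in the uploader is a bucket lifecycle rule with `AbortIncompleteMultipartUpload`, which has S3 clean up stale uploads itself.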

Network options

To measure performance, we can report the running time and the bytes read, written, transferred (and cached).
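
A minimal sketch of such a report, assuming the counters are incremented by the code that does the reading, writing and uploading. `Metrics` and `measure` are hypothetical names, not part of any existing module.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Metrics:
    seconds: float = 0.0
    bytes_read: int = 0
    bytes_written: int = 0
    bytes_transferred: int = 0
    bytes_cached: int = 0

    def report(self):
        return (
            f"time={self.seconds:.3f}s read={self.bytes_read} "
            f"written={self.bytes_written} "
            f"transferred={self.bytes_transferred} cached={self.bytes_cached}"
        )


@contextmanager
def measure(metrics):
    """Add the wall-clock time spent inside the with-block to metrics.seconds."""
    start = time.perf_counter()
    try:
        yield metrics
    finally:
        metrics.seconds += time.perf_counter() - start
```

Usage would look like `with measure(metrics): ...`, with the I/O code adding to the byte counters as it goes and `metrics.report()` printed at the end of the run.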
