Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization: cache results between TypeScript projects #182

Merged
merged 3 commits into from
Oct 18, 2022
Merged

Conversation

olafurpg
Copy link
Member

Towards #175

Previously, scip-typescript didn't cache anything at all between TypeScript projects. This commit implements an optimization so that we now cache the results of loading source files and parsing options. Benchmarks against the sourcegraph/sourcegraph repo indicate this optimization consistently speeds up the index command in all three multi-project repositories that I tested it with.

  • sourcegraph/sourcegraph: ~30% from ~100s to ~70s
  • nextautjs/next-auth: ~40% from 6.5s to 3.9
  • xtermjs/xterm.js: ~45% from 7.3s to 4.1s

For every repo, I additionally validated that the resulting index.scip has identical checksum before and after applying this optimization. Given these promising results, this new optimization is enabled by default, but can be disabled with the option --no-global-cache.

Test plan

Manually tested by running scip-typescript index tsconfig.all.json in the sourcegraph/sourcegraph repository. To benchmark the difference for this PR:

  • Checkout the code
  • Run yarn tsc -b
  • Go to the directory of your project
  • Run node PATH_TO_SCIP_TYPESCRIPT/dist/src/main.js
  • Copy the "optimized" index.scip with cp index.scip index-withcache.scip
  • Run node PATH_TO_SCIP_TYPESCRIPT/dist/src/main.js --no-global-caches
  • Validate the checksum is identical from the optimized output shasum -a 256 *.scip

Towards #175

Previously, scip-typescript didn't cache anything at all between TypeScript projects. This commit implements an optimization so that we now cache the results of loading source files and parsing options. Benchmarks against the sourcegraph/sourcegraph repo indicate this optimization consistently speeds up the `index` command in all three multi-project repositories that I tested it with.

- sourcegraph/sourcegraph: ~30% from ~100s to ~70s
- nextautjs/next-auth: ~40% from 6.5s to 3.9
- xtermjs/xterm.js: ~45% from 7.3s to 4.1s

For every repo, I additionally validated that the resulting index.scip has identical checksum before and after applying this optimization. Given these promising results, this new optimization is enabled by default, but can be disabled with the option `--no-global-cache`.

*Test plan*

Manually tested by running `scip-typescript index tsconfig.all.json` in the sourcegraph/sourcegraph repository. To benchmark the difference for this PR:

- Checkout the code
- Run `yarn tsc -b`
- Go to the directory of your project
- Run `node PATH_TO_SCIP_TYPESCRIPT/dist/src/main.js`
- Copy the "optimized" index.scip with `cp index.scip index-withcache.scip`
- Run `node PATH_TO_SCIP_TYPESCRIPT/dist/src/main.js --no-global-caches`
- Validate the checksum is identical from the optimized output `shasum -a 256 *.scip`
Comment on lines 25 to 30
if (!hostCopy.getParsedCommandLine) {
return undefined
}
if (cache.parsedCommandLines.has(fileName)) {
return cache.parsedCommandLines.get(fileName)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand all the cases here; is it possible that the cache has a hit for this even though hostCopy.getParsedCommandLine is missing? (when is getParsedCommandLine missing?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misunderstanding your comment, but we guard against hostCopy.getParsedCommandLine at the top, so even if the cache has a hit we return undefined when hostCopy.getParsedCommandLine is undefined.

Comment on lines 38 to 46
const fromCache = cache.sources.get(fileName)
if (fromCache !== undefined) {
return fromCache
}
const result = hostCopy.getSourceFile(fileName, languageVersion)
if (result !== undefined) {
cache.sources.set(fileName, result)
}
return result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Maybe add a comment here or near the cache definition which says "It is safe use the file name as a key here because we assume that a source file only ever belongs to a single project, and is only ever type-checked with a single language version".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! I added an additional check to additionally only cache when languageVersion is unchanged (excluding a setExternalModuleIndicator field that is a function that is never the same. I also added a comment explaining in more details. I also discovered that getSourceFile has two optional parameters that we're now properly forwarding.

@varungandhi-src
Copy link
Contributor

varungandhi-src commented Oct 17, 2022

Very nice! How did you recognize that this approach of caching the SourceFile values would help speed up type-checking? (Nvm, saw that you wrote this in the issue.)

@@ -24,7 +24,6 @@ function checkIndexParser(

// defaults
checkIndexParser([], {
progressBar: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this get removed? Shouldn't there be an extra globalCaches: true here instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to mention it in the description but this PR also disables the fancy progress bar by default. I didn't measure any performance difference with it enabled/disabled, but I found it distracting while testing projects. scip-typescript runs >99% of the time in CI where you should disable it by default anyways and I'm guessing most people are not doing it.

Copy link
Member Author

@olafurpg olafurpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review! 🙏🏻

@@ -24,7 +24,6 @@ function checkIndexParser(

// defaults
checkIndexParser([], {
progressBar: true,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to mention it in the description but this PR also disables the fancy progress bar by default. I didn't measure any performance difference with it enabled/disabled, but I found it distracting while testing projects. scip-typescript runs >99% of the time in CI where you should disable it by default anyways and I'm guessing most people are not doing it.

Comment on lines 25 to 30
if (!hostCopy.getParsedCommandLine) {
return undefined
}
if (cache.parsedCommandLines.has(fileName)) {
return cache.parsedCommandLines.get(fileName)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misunderstanding your comment, but we guard against hostCopy.getParsedCommandLine at the top, so even if the cache has a hit we return undefined when hostCopy.getParsedCommandLine is undefined.

Comment on lines 38 to 46
const fromCache = cache.sources.get(fileName)
if (fromCache !== undefined) {
return fromCache
}
const result = hostCopy.getSourceFile(fileName, languageVersion)
if (result !== undefined) {
cache.sources.set(fileName, result)
}
return result
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! I added an additional check to additionally only cache when languageVersion is unchanged (excluding a setExternalModuleIndicator field that is a function that is never the same. I also added a comment explaining in more details. I also discovered that getSourceFile has two optional parameters that we're now properly forwarding.

@olafurpg olafurpg merged commit 21f18f1 into main Oct 18, 2022
@olafurpg olafurpg deleted the olafurpg/perf branch October 18, 2022 06:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants