Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fetchRepo runs GitHub API v3 and v4 concurrently #22

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

wowawiwa
Copy link
Contributor

@wowawiwa wowawiwa commented Oct 11, 2018

What this PR does:

  • In order to increase the rate limit, enables fetchRepos to run using both v3 and v4 of GitHub API concurrently.
  • Removes the spinner and use console.log instead (spinner doesn't work well with concurrent code)
  • Fixes/Improves v4 driver
    • Fixes typo pusher_at
    • Fixes response structure for owner field
    • Handles the case when there is no primary language
    • Returns expected error codes when the GraphQL endpoint returns an error

Differences v4 driver compared to v3:

  • There is no webflow user
  • In the repo query, the organization field is just an empty object instead of containing information. When the owner is not an organization, there is no organization field at all.

@wowawiwa wowawiwa changed the title Ghapiv4 2 run concurrently Run GitHub API v3 and v4 concurrently Oct 11, 2018
@brillout
Copy link

Awesome 😍

@wowawiwa wowawiwa changed the title Run GitHub API v3 and v4 concurrently fetchRepo runs GitHub API v3 and v4 concurrently Oct 11, 2018
@lourot lourot self-assigned this Oct 20, 2018
@@ -59,14 +55,12 @@ optional arguments:
const user = new DbFile(path.join(data.users, file));
if (!user.ghuser_deleted_because && !user.removed_from_github) {
users.push(user);
spinner.text = `${spinnerText} [${users.length}]`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was actually still quite useful to me, when I SSH to the machine running the fetchBot. This kind of loops can take several minutes and it's nice to see that the script isn't hanging.

But as you know, I intend to refactor all these big loops everywhere. So I suggest you don't worry about it and I'll push some refactoring in this branch on top of your work, and everything will get merged together at the end. Is it OK for you?


for (const repoFullName in repoPaths) {
const repo = new DbFile(repoPaths[repoFullName].repo);
full_names = [...referencedRepos];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for myself: I need to validate this kind of lines on the full DB, because we have more than 100k repos and this might be expensive.

full_names = [...referencedRepos];
await Promise.all([
loopDo(full_names, async (full_name) => { await fetchRepo(ghclV3, full_name, firsttime)}),
loopDo(full_names, async (full_name) => { await fetchRepo(ghclV4, full_name, firsttime)}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll disable the v4-worker for now and as long as the front end isn't ready to deal with an empty repo.organization object. But it still has high-prio for me.

const repo = async (oraSpinner, errCodes, repoFullName, repoFetchedAtDate) => {
`;
// TODO: This sets "organization" field if appropriate, but as an empty object.
const repo = async (errCodes, repoFullName, repoFetchedAtDate) => {
// TODO: Use repoFetchedAtDate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that's necessary (and actually the argument name isn't the best). If we don't do conditional requests, we'll consume the API credits 10x faster (maybe even 100x, I wouldn't be surprised).

The argument should be called ifModifiedSince at this level.

@lourot
Copy link
Member

lourot commented Oct 20, 2018

Nice @wowawiwa, thanks, we're getting close to using GraphQL in production :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants