Make the crawler concurrent #91

Open · rajivharlalka opened this issue Sep 19, 2024 · 14 comments

@rajivharlalka (Member)

Is your feature request related to a problem? Please describe.
Currently the crawler sequentially fetches each paper's details, parses them, and downloads the paper. This could be made much faster using goroutines.

@shikharish (Member)

This was implemented and then removed because it led to the library website dropping requests.

@rajivharlalka (Member, Author)

Did the implementation have an upper bound on the number of parallel requests being made? AFAIR, no. IMO, using waitgroups to cap the number of concurrent workers at 2-3 should improve performance significantly.
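A minimal sketch of that bounded-worker idea, using a buffered channel as a semaphore alongside the waitgroup; `fetchAndDownload` and `paperURLs` are hypothetical stand-ins for the crawler's actual fetch/parse/download logic:

```go
package main

import "sync"

// crawlAll fetches all papers with at most 2 requests in flight.
func crawlAll(paperURLs []string, fetchAndDownload func(url string)) {
	var wg sync.WaitGroup
	sem := make(chan struct{}, 2) // buffered channel caps concurrency at 2

	for _, url := range paperURLs {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks while 2 workers are busy
		go func(u string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			fetchAndDownload(u)
		}(url)
	}
	wg.Wait() // wait for all downloads to finish
}
```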

@shikharish (Member) commented Sep 19, 2024

I don't remember exactly.
BTW, we won't need to implement goroutines ourselves: colly has an option to enable async requests and also limit them. Can test that.
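Roughly like this, going by colly's documented `Async` option and `LimitRule` (a sketch assuming colly v2, not tested against the library site; the URLs are placeholders):

```go
package main

import "github.com/gocolly/colly/v2"

func main() {
	// Async(true) makes Visit non-blocking; requests run concurrently.
	c := colly.NewCollector(colly.Async(true))

	// Cap in-flight requests for any matching domain at 2.
	c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 2})

	// Placeholder URLs; the real crawler would queue the paper links here.
	for _, url := range []string{"https://example.com/a", "https://example.com/b"} {
		c.Visit(url)
	}
	c.Wait() // block until all queued async requests complete
}
```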

@proffapt (Member)

@rajivharlalka or @harshkhandeparkar, please update the state of this issue so it is reflected on the kanban.

@harshkhandeparkar (Member)

@shikharish what should be the status of this?

@shikharish (Member)

It is not needed as of now. We only need to run the crawler once or twice a semester, so it's very low priority.

@harshkhandeparkar (Member)

Is it hard to do?

@shikharish (Member)

Not at all

@harshkhandeparkar (Member)

Then just finish it off maybe?

@harshkhandeparkar (Member)

No point in keeping hanging issues if they can be solved in a few minutes.

@shikharish self-assigned this Sep 27, 2024

@proffapt (Member)

@shikharish updates?

@shikharish (Member)

Did some testing, and it turns out even using 2 goroutines leads to 1-2 dropped requests. Increasing to 6 goroutines raises that to 3-4 dropped requests.

Should we skip this one for now?

@proffapt (Member)

Try implementing a retry function. Also, how many requests are you able to make concurrently? Even if it is more than one, that's a win.
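One possible shape for that retry, building on the colly setup sketched earlier: re-issue failed requests from the `OnError` hook via `Request.Retry()`. The attempts map and the cap of 3 retries are assumptions, not existing crawler code.

```go
package main

import (
	"sync"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector(colly.Async(true))
	c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 2})

	// Track attempts per URL; async callbacks can fire concurrently,
	// so the map is guarded by a mutex.
	retries := make(map[string]int)
	var mu sync.Mutex

	c.OnError(func(r *colly.Response, err error) {
		mu.Lock()
		defer mu.Unlock()
		url := r.Request.URL.String()
		if retries[url] < 3 { // retry each dropped request up to 3 times
			retries[url]++
			r.Request.Retry() // re-issue the same request
		}
	})

	c.Visit("https://example.com/papers") // placeholder URL
	c.Wait()
}
```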

@proffapt (Member)

Halting this until we have time to look at it comfortably.
