39k Pfizer dataset (#188) #189
@skearnes @connorcoley @qai222
All reactions pass validations and spot checks show no obvious problems.
Thanks @bdeadman!
@bdeadman I just realized that the dataset name and description are empty; can you submit a PR to update them?
39k reaction dataset from https://doi.org/10.1038/s41557-023-01393-w. This is a Pfizer dataset which was previously proprietary but was published earlier in 2024. This dataset includes additional labelling of solvents and reagents which was not provided in the Nature paper.
Original dataset preparation by @emmaking-smith. @bdeadman extracted names from the solvents, reagent1 and reagent2 fields, split mixtures into their components where possible, and added SMILES strings.
data and generator notebook.zip
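For readers who want a feel for the mixture-splitting step described above, here is a minimal sketch. It is not the code from the attached generator notebook: the separator characters, the `NAME_TO_SMILES` table, and the helper names `split_mixture` and `label_components` are all assumptions made for illustration.

```python
import re

# Hypothetical name-to-SMILES lookup; the mapping actually used for the
# dataset is not shown in this PR.
NAME_TO_SMILES = {
    "thf": "C1CCOC1",
    "water": "O",
    "dmf": "CN(C)C=O",
    "methanol": "CO",
}

# Assumed separators between mixture components, e.g. "THF/water" or
# "DMF + methanol". The real fields may use other conventions.
_SEPARATORS = re.compile(r"\s*(?:/|\+|;)\s*")


def split_mixture(field):
    """Split a raw solvent or reagent field into component names."""
    if not field or not field.strip():
        return []
    return [part for part in _SEPARATORS.split(field.strip()) if part]


def label_components(field):
    """Return (name, smiles) pairs; smiles is None when the name is unknown."""
    return [
        (name, NAME_TO_SMILES.get(name.lower()))
        for name in split_mixture(field)
    ]
```

Calling `label_components("THF/water")` with this table yields one pair per component, and names missing from the table come back with `None` so they can be reviewed by hand rather than guessed. A real pipeline would also need to handle names that legitimately contain a separator character.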