Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a SQL (preferably a SQLAlchemy dialect) interface to query the REST API #31

Open
devraj opened this issue Mar 27, 2024 · 4 comments
Assignees

Comments

@devraj
Copy link
Member

devraj commented Mar 27, 2024

As our ambition around the library grows, the sync feature #9 looks to be a lot more useful than I previously imagined. One of my thoughts was to treat this as if they were two databases sources and use a layer like SQLAlchemy to keep them in sync.

Sync is a difficult problem (see also SQL Sync for Schema with SQLAlchemy
python-sync-db) at the best of times. My initial research lead me down the path of writing a SQLAlchemy dialect primarily thinking about if it's possible to wrap a REST service as a SQL source.

I found a number of articles and discussions which leads to believe this is a possible way forward:

Digging through the list of external dialects I found betodealmeida / gsheets-db-api which provides that we can converse with a REST API via a SQLAlchemy dialect. This project links to Shillelagh which is a library is an implementation of the Python DB API 2.0 based on SQLite (using the APSW library)

Other resources:

  • Dolt version controlled MySQL database

The requirement is thus to research the above resources and outline the possibility of using a SQLALchemy dialect to write the sync module with the view of being able to use the REST client to communicate with the Gallagher proxy.

Why SQL?

One of the major questions around this is Why SQL?, we could head down the route of using an object database. The question to consider is if the end user / customer would benefit from the data being available in an object database?

With a foundation of writing to SQL backend we also stand the advantage of writing to / reading from corporate database backends.

@devraj devraj self-assigned this Mar 27, 2024
@devraj devraj added this to the beta-1 milestone Apr 14, 2024
@devraj
Copy link
Member Author

devraj commented Apr 15, 2024

From my initial reading, the way we should approach this is to write a customer adapter for shillelagh which also allows for a custom sqlalchemy dialect.

The references the following

Sharks have been around for a long time. They’re older than trees and the rings of Saturn, actually! The reason they haven’t changed that much in hundreds of millions of years is because they’re really good at what they do.

SQL has been around for some 50 years for the same reason: it’s really good at what it does.

as their philosophy for Why SQL.

devraj added a commit that referenced this issue Apr 18, 2024
adds shillelagh to attempt implementing a sql layer on top of the
library and see if it helps with #9

todo items for me are:
- see if pyndatic can help define sqlalchemy schema
- as per #31 i want to have a sqlclahemy dialect
- be able to sync two sources one being the rest endpoints using
sqlalchemy
@betodealmeida
Copy link

This is awesome! Thanks for the heads up, and let me know if you have any questions, I'm happy to help.

@devraj
Copy link
Member Author

devraj commented May 14, 2024

Quick note for anyone else working with shillelagh and attempting to debug their adapter, my general strategy for development is:

  • Start a virtualenv
  • Install shillelagh to use the interactive console
  • Install my adapter using pip in editable mode from the local folder e.g pip install -e ~/Work/Lib/gallagher
  • Use the console to test as I build the adapter

I figured it would be easier to this before I get into building a SQLAlchemy dialect #9

At first attempt it all looked well and when I ran:

🍀> SELECT * FROM "https://commandcentre-api-au.security.gallagher.cloud/api/cardholders";
Unsupported table: https://commandcentre-api-au.security.gallagher.cloud/api/cardholders

I got the unsupported table error and could not figure out what was wrong. For obvious reasons the console suppresses log messages, I decided to enable logging and set it to DEBUG, this was added to the top of the file containing my adapter.

import logging
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(levelname)s %(message)s')

upon restarting the console, I saw the following, and found that my adapter wasn't registering properly:

🍀>                                                                                                                               
GoodBye!
(shillelagh) ➜  ~ shillelagh 
2024-05-15 07:51:36,627 ERROR test
2024-05-15 07:51:36,628 DEBUG No module named 'gallagher.ext.shillelagh.GallagherCommandCentreAPI'
2024-05-15 07:51:36,641 DEBUG Using selector: KqueueSelector

Upon inspecting my pyproject.toml I found that I had a syntax error where I was referring to the module as gallagher.ext.shillelagh.GallagherCommandCentreAPI instead of gallagher.ext.shillelagh:GallagherCommandCentreAPI, the fixed version is:

[tool.poetry.plugins."shillelagh.adapter"]
gacc = "gallagher.ext.shillelagh:GallagherCommandCentreAPI"

And given that it returns the required rows of data, the logs should make it easier to debug the adapter.

TypeError: missing a required argument: 'api_key'
(shillelagh) ➜  ~ shillelagh 
2024-05-15 07:55:03,649 ERROR test
2024-05-15 07:55:03,666 DEBUG Using selector: KqueueSelector
🍀> SELECT * FROM "https://commandcentre-api-au.security.gallagher.cloud/api/cardholders";
2024-05-15 07:55:09,149 DEBUG Instantiating adapter with deserialized arguments: ['https://commandcentre-api-au.security.gallagher.cloud/api/cardholders', 'api_key']
authorised    first_name      id  last_name
------------  ------------  ----  -----------
True          Dev              1  Mukherjee
(1 row in 0.05s)

2024-05-15 07:55:09,150 DEBUG Using selector: KqueueSelector

Make sure that this isn't enabled when the adapter ships, even better find a way to dynamically configure the logger

devraj added a commit that referenced this issue May 14, 2024
adds a few lines of logging to discover that the shillelagh adapter
wasn't register due to a syntax error in pyproject, this change verifies
that we can return rows of data back via the shillelagh adapter

note this commit does not query the api for results, this is to come
in subsequent commits, read the following comment
#31 (comment)

for how we got to this solution
Refs #31
@devraj devraj modified the milestones: beta-1, alpha-3 May 14, 2024
devraj added a commit that referenced this issue May 29, 2024
initially i was implementing a sql interface within the dto objects, as
the solution grew it made more sense to be able to consolidate this into
the shillelagh extension, ensuring that if the sql interface isn't in use
the normal dto isn't burdened by imports

i've kept some helper methods to concat dictionaries of annotations, this
should not affect the workings of the solution

refs #31
devraj added a commit that referenced this issue May 29, 2024
using the detail dto_retrieve model to parse the sql responses returned
to the shillelagh, things to check post this commit

- can we get the detail fields when the user requests a model via where clause
- is this the right thing to do (moving away from using the sql_model configuration)

removes various TODO markers as appropriate where the issues have been resolved

refs #31
@devraj
Copy link
Member Author

devraj commented May 30, 2024

utils.py in the dto package currently uses the py3.9 __annotation__ property to access the annotations:

    @classmethod
    def _accumulated_annotations(cls) -> dict:
        """Return a dictionary of all annotations

        This method is used to return a dictionary of all the
        annotations from itself and it's parent classes.

        It is intended for use by the shillelagh extension
        """
        annotations = cls.__annotations__.copy() # TODO: should we  make copies?
        for base in cls.__bases__:
            if issubclass(base, BaseModel):
                # Copy annotations from classes that are pydantic models
                # they are the only things that form part of the response
                annotations.update(base.__annotations__)
        return annotations.items()

This recommendation has changes in py3.10 see this article.

Looking at the inspect package get_annotations returns the annotations of the class itself, where as typing.get_type_hints returns typing definitions from itself and parent classes.

>>> import typing
>>> typing.get_type_honest(CardholderSummary)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'typing' has no attribute 'get_type_honest'. Did you mean: 'get_type_hints'?
>>> typing.get_type_hints(CardholderSummary)
{'href': <class 'pydantic_core._pydantic_core.Url'>, 'id': <class 'str'>, '_good_known_since': typing.Optional[datetime.datetime], 'first_name': <class 'str'>, 'last_name': <class 'str'>, 'short_name': typing.Optional[str], 'description': typing.Optional[str], 'authorised': <class 'bool'>}

We should refactor this method to use a py3.10+ solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants