CSV output issues #1374

Closed
bwbroersma opened this issue Aug 8, 2019 · 7 comments · Fixed by #2825

@bwbroersma

bwbroersma commented Aug 8, 2019

CSV output (and input) is wrong in multiple ways.

Main issues on the output side:

  • the header is constructed via row_to_json -> json_object_keys -> string_agg, without a " being added around each key or doubled as the CSV escape; see the code and [1]
    asCsvHeaderF =
    "(SELECT coalesce(string_agg(a.k, ','), '')" <>
    " FROM (" <>
    " SELECT json_object_keys(r)::TEXT as k" <>
    " FROM ( " <>
    " SELECT row_to_json(hh) as r from " <> sourceCTEName <> " as hh limit 1" <>
    " ) s" <>
    " ) a" <>
    ")"
  • the body is constructed by the hacky substring(composite_type::text, 2, length - 2), but the composite text output != CSV output, since \ is escaped to \\; see the code and [2] (a small standalone demonstration of both problems follows after this list)
    asCsvBodyF = "coalesce(string_agg(substring(_postgrest_t::text, 2, length(_postgrest_t::text) - 2), '\n'), '')"
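
A small standalone demonstration of both problems, using a hypothetical demo table with a double quote in a column name and a backslash in a value (not PostgREST code, just a sketch to reproduce the escaping mismatch):

-- hypothetical demo table, only for illustration
CREATE TEMP TABLE demo ("weird""name" text);
INSERT INTO demo VALUES (E'back\\slash and "quote"');

-- header: mimics asCsvHeaderF; the key is emitted verbatim, so the embedded "
-- is neither wrapped in quotes nor doubled
SELECT coalesce(string_agg(a.k, ','), '')
FROM (
  SELECT json_object_keys(r)::text AS k
  FROM (SELECT row_to_json(hh) AS r FROM demo AS hh LIMIT 1) s
) a;
-- => weird"name                          (invalid CSV header)

-- body: mimics asCsvBodyF; composite text output doubles the backslash,
-- which CSV does not expect
SELECT substring(d::text, 2, length(d::text) - 2) FROM demo d;
-- => "back\\slash and ""quote"""         (CSV would be "back\slash and ""quote""")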

Minor issues:

Possible (body) solution start:

SELECT (
  SELECT array_to_string(array_agg('"' || REPLACE(val, '"', '""') || '"'), ',')
  FROM unnest(arr) val
)
FROM (
  SELECT ARRAY[.. columns ..]
  FROM ..
) alias(arr);

Probably faster than using JSON functions, but the NULL output issue and the change from composite to array will need a code change (see the filled-in sketch below).
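
For instance, a filled-in version of that template against a hypothetical projects(id, name, client_id) table. It also shows the NULL problem: array_to_string skips NULL array elements, so a NULL column drops the field entirely instead of leaving it empty:

SELECT (
  SELECT array_to_string(array_agg('"' || REPLACE(val, '"', '""') || '"'), ',')
  FROM unnest(arr) val
)
FROM (
  SELECT ARRAY[id::text, name, client_id::text]  -- everything has to be cast to text
  FROM projects
) alias(arr);
-- a row (1, NULL, 3) comes out as "1","3" instead of "1",,"3"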

Another try with row_to_json:

SELECT (
  SELECT string_agg(
    CASE
      WHEN json_typeof(val.value) = 'null' THEN ''
      ELSE '"' || REPLACE(val.value #>> '{}', '"', '""') || '"'
    END,
    ',')
  FROM json_each(row_to_json(r)) val
)
FROM (
  SELECT (.. composite ..)
  FROM ...
) alias(r);

A side effect is that non-strings (booleans and numbers) also get quoted, which is not ideal for type detection, but that can be fixed with an extra WHEN branch (provided the delimiter/separator is not in the ::text value, of course). Also, this does not yet fix resource embedding. A filled-in version follows below.
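
A filled-in version of this sketch against the same hypothetical projects table (note that row is a reserved word, so the inner column is named r); the output shows numbers being quoted too:

SELECT (
  SELECT string_agg(
    CASE
      WHEN json_typeof(val.value) = 'null' THEN ''
      ELSE '"' || REPLACE(val.value #>> '{}', '"', '""') || '"'
    END,
    ',')
  FROM json_each(row_to_json(r)) val
)
FROM (
  SELECT p  -- whole-row composite
  FROM projects p
) alias(r);
-- a row (1, 'Windows 7', 1) comes out as "1","Windows 7","1"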

[1]: See the PostgreSQL Identifiers syntax

Quoted identifiers can contain any character, except the character with code zero.

[2]: See the record_out code or the PostgreSQL docs on Composite Type Input and Output Syntax

@steve-chavez
Member

steve-chavez commented Aug 8, 2019

Definitely a lot of shortcomings with CSV right now. Ideally we would solve all of these with pg COPY in a more elegant (it already supports things like custom delimiters, null strings, etc.) and performant way, but we're missing COPY support in our lib: nikita-volkov/hasql#1.
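
For reference, a standalone sketch of what COPY gives for free (plain server-side SQL; the projects table and column names are just assumptions for illustration, and getting the result back into the client needs the COPY sub-protocol that hasql lacks):

COPY (
  SELECT id, name, client_id
  FROM projects
  ORDER BY id
) TO STDOUT WITH (FORMAT csv, HEADER true, DELIMITER ';', NULL 'NULL');
-- proper quoting/escaping, a header row, a custom delimiter and a custom
-- NULL string, all handled by the server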

@bwbroersma
Author

Ah, I was wrongly ignoring COPY .. TO .. because I thought it had to write to a file, but STDOUT can be used too, of course. So indeed COPY .. TO STDOUT would be best.

The stalling of basic PostgreSQL features in hasql (namely notifications, nikita-volkov/hasql#43, for a DDL NOTIFY, and COPY, nikita-volkov/hasql#1) is holding back progress in PostgREST. The same goes for the lack of OpenAPI 3.0, which is an upstream problem, GetShopTV/swagger2#105 (comment) (there is a v3 version, but it looks unmaintained). These issues / PRs have been open for years; I'm afraid we have to do some work upstream to fix them.

@steve-chavez
Member

Regarding OpenAPI, we have the alternative of generating the spec with SQL.

I made this proposal here: #790 (comment), and it was added in #1317 (the root-spec config is still not documented).

Then we could test the spec with this pg extension: https://github.com/gavinwahl/postgres-json-schema (we're already using it for testing the PostgREST OpenAPI spec).
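
A hedged example of how such a check could look, assuming the extension installs under the name postgres-json-schema and exposes validate_json_schema(schema, data); openapi_spec() is a hypothetical function standing in for the SQL-generated spec:

CREATE EXTENSION IF NOT EXISTS "postgres-json-schema";

SELECT validate_json_schema(
  '{"type": "object", "required": ["paths"]}'::jsonb,
  openapi_spec()  -- hypothetical: returns the generated spec as jsonb
);
-- returns true when the generated document has the required top-level keys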

@bwbroersma
Author

bwbroersma commented Aug 9, 2019

Ok wow, this is really powerful! Thanks for pointing me to it. It kind of removes the need for NOTIFY too, since we can use a live view during development (and optionally use a ddl_command_end event trigger to refresh a materialized view to speed it up); a sketch of that follows below.
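
A minimal sketch of that idea, assuming a hypothetical openapi_spec() function and PostgreSQL 11+ (older versions use EXECUTE PROCEDURE instead of EXECUTE FUNCTION):

CREATE MATERIALIZED VIEW openapi_cache AS
  SELECT openapi_spec() AS spec;   -- hypothetical spec-generating function

CREATE FUNCTION refresh_openapi_cache() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
  REFRESH MATERIALIZED VIEW openapi_cache;
END;
$$;

-- re-generate the cached spec after every DDL command
CREATE EVENT TRIGGER refresh_openapi_on_ddl
  ON ddl_command_end
  EXECUTE FUNCTION refresh_openapi_cache();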

The next step would be using the OpenAPI spec, or more likely a PostgREST schema endpoint, for internal use; that way unauthorized requests can also be blocked before trying to execute, and more meaningful errors can be returned.

Off topic: I also completely agree that Swagger UI is overrated; ReDoc seems much cleaner (and can generate static HTML) but lacks 'Try it'. The best thing would be a custom front-end tool for PostgREST to show the API, handling all the input operators, enum arrays, embedding, and so on.

@ppKrauss
Contributor

See this reference on CSV conventions and how to express dialect alternatives in JSON (CSV schema metadata):
https://frictionlessdata.io/specs/csv-dialect/

@wolfgangwalther
Member

> Definitely a lot of shortcomings with CSV right now. Ideally we would solve all of these with pg COPY in a more elegant (it already supports things like custom delimiters, null strings, etc.) and performant way, but we're missing COPY support in our lib: nikita-volkov/hasql#1.

Even if we had COPY support in hasql, we still couldn't really use it for text/csv output, because the JSON body is only one of several columns we return right now. We don't want our whole query to return CSV; we only want the body in one of the columns to contain CSV. A rough sketch of that query shape follows below.
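
A rough sketch of the shape I mean (hypothetical column names, heavily simplified, not the actual generated query). The body is a single column next to the counts, so COPY ... TO STDOUT, which turns the whole result set into CSV, doesn't fit:

WITH pgrst_source AS (
  SELECT * FROM projects
)
SELECT
  count(*)                          AS page_total,
  coalesce(json_agg(s), '[]'::json) AS body   -- only this column would need to become CSV
FROM pgrst_source s;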

@wolfgangwalther
Member

There's one more problem with csv headers when returning a composite result from an RPC right now:

    it "should respond with correct, aliased CSV headers to 'text/csv' request on RPC" $
      request methodGet "/rpc/getallprojects?select=client:client_id,name,project:id&limit=2&order=id"
              (acceptHdrs "text/csv") ""
        `shouldRespondWith` "client,project,name\n1,1,Windows 7\n1,2,Windows 10"
        { matchStatus  = 200
        , matchHeaders = ["Content-Type" <:> "text/csv; charset=utf-8"]
        }

returns

       body mismatch:
         expected: "client,project,name\n1,1,Windows 7\n1,2,Windows 10"
         but got:  "client_id,id,name\n1,\"Windows 7\",1\n1,\"Windows 10\",2"

The aliases don't work and the headers are returned in the wrong order. This is because asCsvF selects the headers from sourceCTEName but the data from _postgrest_t (which is where the renaming and reordering happen in this case); see the simplified illustration below.
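
A simplified illustration of the mismatch (hypothetical, not the real generated SQL): the header is taken from the source CTE, while the rows are taken from the aliased/reordered selection:

WITH pgrst_source AS (
  SELECT client_id, id, name FROM projects         -- original names and order
), _postgrest_t AS (
  SELECT client_id AS client, name, id AS project  -- aliasing/reordering happens here
  FROM pgrst_source
)
SELECT
  -- asCsvHeaderF reads keys from pgrst_source => client_id,id,name
  -- asCsvBodyF reads rows from _postgrest_t   => values in client,name,project order
  string_agg(substring(t::text, 2, length(t::text) - 2), E'\n')
FROM _postgrest_t t;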

I'm hoping for #1582 to allow defining a CSV aggregate (a rough plain-SQL sketch of the idea is below). Unfortunately, #2701 won't help yet, because I need to return JSON or CSV from the same RPC dynamically, depending on the Accept header.
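
For what it's worth, a plain-SQL sketch of what a body-only (no header) CSV aggregate could look like; this only illustrates the idea and is not the API proposed in #1582:

-- fold rows into CSV lines, quoting every value and emitting '' for NULL
CREATE FUNCTION csv_agg_transfn(acc text, rec anyelement)
RETURNS text LANGUAGE sql AS $$
  SELECT coalesce(acc || E'\n', '') ||
         (SELECT string_agg(
                   CASE WHEN v.value IS NULL THEN ''
                        ELSE '"' || replace(v.value, '"', '""') || '"' END,
                   ',')
            FROM json_each_text(to_json(rec)) v)
$$;

CREATE AGGREGATE csv_agg(anyelement) (
  SFUNC = csv_agg_transfn,
  STYPE = text
);

-- usage: SELECT csv_agg(p) FROM projects p;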
