Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse errors are not user friendly... #26

Open
dimitri opened this issue Jun 6, 2014 · 14 comments
Open

Parse errors are not user friendly... #26

dimitri opened this issue Jun 6, 2014 · 14 comments

Comments

@dimitri
Copy link

dimitri commented Jun 6, 2014

As seen in dimitri/pgloader#56 we have problems with users trying to decipher error messages. Is it possible to do something about it? Do you have plans to improve error reporting?

@scymtym
Copy link
Contributor

scymtym commented Jun 6, 2014

Hi,

note that the esrap repository from which quicklisp fetches is scymtym/esrap.

Regarding your question: I have been working on parse error reports like

At

  t a { a b:double a b { a@string a b } { a b > } 
                         ^ (Line 1, Column 30, Position 30)

Could not parse NAME. Expected:

     any character satisfying ALPHANUMERICP
  or the character { (LEFT_CURLY_BRACKET)

Reached via

     STATEMENT
  -> (+ (OR SCOPE DECLARATION USE))
  -> (OR SCOPE DECLARATION USE)
  -> USE
  -> NAME
  -> (+ (ALPHANUMERICP CHARACTER))
  -> (ALPHANUMERICP CHARACTER)

Do you think that would help?

@nikodemus: would do you think about this?

@dimitri
Copy link
Author

dimitri commented Jun 6, 2014

It would help tremendously, yes!

@scymtym
Copy link
Contributor

scymtym commented Jun 7, 2014

OK, the code will need some work before it can go into master, but I will try to get it done as soon as possible.

@scymtym
Copy link
Contributor

scymtym commented Jul 10, 2014

@dimitri, I pushed a draft of the improved parse error reporting to scymtym/esrap:wip-better-errors. In addition to the error position, parse errors report

  • the most specific rule that failed to apply
  • a list of characters, strings, etc. expected at the failure position
  • a derivation explaining how the failing rule was reached

I tested the code by hacking an Emacs completion backend that displays the expected strings or characters:
screenshot

Could you test whether it would improve the situation for you? I would be interested to hear whether it works for you and whether error messages are too verbose/too terse/just right.

After some cleanup, the code could go into esrap master, I think.

PS: The completion hack above uncovered the following potential bug in the pgloader parser: the unix: scheme visible in the screenshot leads to the following error:

:UNKNOWN fell through ECASE expression.
Wanted one of (:MYSQL :POSTGRESQL).
   [Condition of type SB-KERNEL:CASE-FAILURE]

Restarts:
 0: [*ABORT] Return to SLIME's top level.
 1: [ABORT] Abort thread (#<THREAD "worker" RUNNING {128E6B01}>)

Backtrace:
  0: (SB-KERNEL:CASE-FAILURE ECASE :UNKNOWN (:MYSQL :POSTGRESQL))
  1: ((LAMBDA (PGLOADER.PARSER::URI #:START398 #:END399) :IN "/home/jmoringe/.local/share/common-lisp/quicklisp/dists/quicklisp/software/pgloader-3.0.99/src/parser.lisp") ((:TYPE :UNKNOWN) (:USER "a" :PASS..
      Locals:
        PGLOADER.PARSER::DBNAME = "bla"
        PGLOADER.PARSER::HOST = (:HOST "c")
        PGLOADER.PARSER::PASSWORD = "b"
        PGLOADER.PARSER::PORT = NIL
        PGLOADER.PARSER::TABLE-NAME = NIL
        TYPE#1 = :UNKNOWN
        PGLOADER.PARSER::URI = ((:TYPE :UNKNOWN) (:USER "a" :PASSWORD "b") (:HOST (:HOST "c") :PORT NIL) (:DBNAME "bla") NIL)
        PGLOADER.PARSER::USER = "a"
        PGLOADER.PARSER::USER#1 = "a"
        #:WHOLE400 = (:TYPE :UNKNOWN :USER "a" :PASSWORD "b" ...)
  2: ((LAMBDA NIL :IN ESRAP::COMPILE-RULE))
  3: (ESRAP::RESULT-PRODUCTION #S(ESRAP::RESULT :%PRODUCTION #<CLOSURE (LAMBDA NIL :IN ESRAP::COMPILE-RULE) {12997B55}> :POSITION 46))
  4: ((LAMBDA NIL :IN ESRAP::COMPILE-SEQUENCE))

The case failure is in line 342 of parser.lisp (Quicklisp version of pgloader).

@dimitri
Copy link
Author

dimitri commented Jul 11, 2014

Hi,

Jan Moringen [email protected] writes:

@dimitri, I pushed a draft of the improved parse error reporting to
scymtym/esrap:wip-better-errors. In addition to the error position,
parse errors report

Thanks a lot for your work on that, it's going to be awesome!

I did git clone your project and branch and am trying it by injecting
random errors in my test load files and seeing what happens with the
following command:

(pgloader::with-monitor ()
(pgloader.parser::parse-commands-from-file
"/Users/dim/dev/pgloader/test/csv.load"))

As far as the error message goes, it's already way better, but as far as
understanding what's wrong in the source file goes, I think we need more
improvements.

Could you please try removing a comma separator in the file (I tested
just after the truncate keyword), or maybe removing the d letter from
the “enclosed” keyword, etc, and see what happens?

It seems that all I can get from those tests is the following error
message:

At end of input


  ^ (Line 51, Column 1, Position 1400)

Could not parse DOUBLED-COLON. Expected:

  the string "::"

Reached via

     DOUBLED-COLON
  -> (AND "::")
  -> "::"

Could not parse DOUBLED-AT-SIGN. Expected:

  the string "@@"

Reached via

     DOUBLED-AT-SIGN
  -> (AND "@@")
  -> "@@"

Could not parse DSN-USER-PASSWORD. Expected:

  the character @ (COMMERCIAL_AT)

Reached via

     DSN-USER-PASSWORD
  -> (AND USERNAME (? (AND ":" (? PASSWORD))) "@")
  -> "@"

It might that my esrap grammar is just confused, tho.

After some cleanup, the code could go into esrap master, I think.

Yes please! ;-)

PS: The completion hack above uncovered the following potential bug in
the pgloader parser: the unix: scheme visible in the screenshot
leads to the following error:

I will see about that, thanks for reporting!

:UNKNOWN fell through ECASE expression.

My understanding is that using ecase prevents the parser to do proper
backtracking here, and so we won't get to your nicer error messages in
case of typo, right?

Regards,

dim

@scymtym
Copy link
Contributor

scymtym commented Jul 16, 2014

Could you please try removing a comma separator in the file (I tested just after the truncate keyword), or maybe removing the d letter from the “enclosed” keyword, etc, and see what happens?
[...]
It might that my esrap grammar is just confused, tho.

This is indeed caused by a problem in the grammar. I debugged this by doing (esrap:trace-rule 'commands :recursive t). Among many other things, the output contained

7: PGLOADER.PARSER::USERNAME 440-1367 -> "/pgloader?csv (a, b, d, c)

     WITH truncate,
          skip header = 1,
          fields optionally enclosed by '\"',
          fields escaped by double-quote,
          fields terminated by ','

      SET client_encoding to 'latin1',
          work_mem to '12MB'
          standard_conforming_strings to 'on'

   BEFORE LOAD DO
    $$ drop table if exists csv; $$,
    $$ create table csv (
        a bigint,
        b bigint,
        c char(2),
        d text
       );
    $$;



Stupid useless header with a © sign
\"2.6.190.56\",\"2.6.190.63\",\"33996344\",\"33996351\",\"GB\",\"United Kingdom\"
\"3.0.0.0\",\"4.17.135.31\",\"50331648\",\"68257567\",\"US\",\"United States\"
\"4.17.135.32\",\"4.17.135.63\",\"68257568\",\"68257599\",\"CA\",\"Canada\"
\"4.17.135.64\",\"4.17.142.255\",\"68257600\",\"68259583\",\"US\",\"United States\"
\"4.17.143.0\",\"4.17.143.15\",\"68259584\",\"68259599\",\"CA\",\"Canada\"
\"4.17.143.16\",\"4.18.32.71\",\"68259600\",\"68296775\",\"US\",\"United States\"
"

i.e. the username rule consumed all remaining input (that's why your report mentions failing at "end of input"). The username rule is defined as

(defrule username (+ (or (not (or ":" "@" )) doubled-at-sign doubled-colon))
  (:text t))

meaning that it can consume anything as long as it is neither ":" nor "@". Restricting username (e.g. disallowing whitespace or whatever is appropriate) should solve the problem.

As far as the error message goes, it's already way better, but as far as understanding what's wrong in the source file goes, I think we need more improvements.

Fixing the above problem should make the error messages much more useful. One problem will probably remain, though: in typical grammars, whitespace and comments are permitted almost everywhere. However, it is not helpful if virtually every error messages contains "you know, you could just insert whitespace or a comment at the problematic position". So I suppose we need a mechanism for excluding certain rules from error messages.

My understanding is that using ecase prevents the parser to do proper backtracking here, and so we won't get to your nicer error messages in case of typo, right?

Kind of. "Normally" the parser first applies rules without calling result producing code (:lambda, :destructure, etc.). In case the parse succeeds, result producing code is called to, well, produce parse results. This implies that it is not possible to influence the parse from within :lambda, :destructure, etc. Signaling an error in one of these will simply signal an error in the result production phase, but only after a successful parse.

The parse vs. result production phase separation is not absolute, though. "Semantic predicates" can be used to steer the parse based on intermediate results. A simple example would be

(defrule integer ...) ; returns parsed integer
(defrule odd-integer (oddp integer))

When odd-integer is tried, oddp is called with the production of the integer rule and odd-integer succeeds if oddp returns true.

@dimitri
Copy link
Author

dimitri commented Jul 16, 2014

I did as you propose in dimitri/pgloader@0f3103d and I really like the error messages I get now. It's a huge improvement, thanks a lot for it!

Please consider merging that into main esrap already, even if some improvements can be think of for later revisions, such as avoiding mentioning some parts of the syntax tree such as the comments.

dimitri added a commit to dimitri/pgloader that referenced this issue Jul 16, 2014
As per grammar review at  nikodemus/esrap#26,
improve the situation.
@dimitri
Copy link
Author

dimitri commented Jul 16, 2014

I also fixed (I think) the dsn prefix parsing bits thanks to the patch linked here.

@scymtym
Copy link
Contributor

scymtym commented Jul 18, 2014

I will try to clean up and merge the code as quickly as I can. No promises though.

@dimitri
Copy link
Author

dimitri commented Jul 18, 2014

Thanks for your efforts and interest (and code) ;-)

@scymtym
Copy link
Contributor

scymtym commented Aug 11, 2014

@dimitri Just to keep you informed: most of the improved error reporting code is approaching acceptable quality with tests and all. However, there is one design decision (scraping the required result information from the cache, "post-mortem" vs. storing more information in results while parsing) which needs at least benchmarking and potentially more thought.

Sorry for dragging this out like this but I would hate to sacrifice (much) performance for the improved error reporting. I think that is how @nikodemus would feel about this as well.

@dimitri
Copy link
Author

dimitri commented Aug 11, 2014

Thanks a lot for the update, really. I understand that some design decisions are more tricky than others and time is needed. I'm still waiting for debian packaging work to release pgloader 3.1.0, I hope that with some luck we will be able to benefit from the new error reporting as soon as 3.1.0, let's see how it goes ;-)

@fade
Copy link

fade commented Oct 14, 2014

The better errors branch has already helped me dig myself out of a PEBCAK problem with pgloader, so thanks for the work on this. IMO, it's very valuable.

@scymtym
Copy link
Contributor

scymtym commented Jan 30, 2016

A much improved version of this code is in https://github.com/scymtym/esrap master (which is the maintained upstream repository)

dimitri added a commit to dimitri/pgloader that referenced this issue Mar 26, 2016
The WIP branch about better error messages made its way through the main
code, so switch back to the mainline as available directly in Quicklisp.

See nikodemus/esrap#26.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants