Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code marked as comment due to Umlauts #134

Closed
NielsNet opened this issue Sep 4, 2017 · 10 comments
Closed

Code marked as comment due to Umlauts #134

NielsNet opened this issue Sep 4, 2017 · 10 comments
Labels

Comments

@NielsNet
Copy link

NielsNet commented Sep 4, 2017

There seems to bee an issue with Textmate and umlauts.
As shown in the issue angelozerr/typescript.java#203 as soon as two or more letters with an umlaut are used the following lines of code are marked as comment even though they should not.
The following screenshot shows different cases where the behaviour appears.
grafik

@angelozerr
Copy link
Contributor

@fabioz
Copy link
Contributor

fabioz commented Sep 16, 2017

I already have some utilities for converting to/from bytes/multibyte I used in LiClipseText... (https://github.com/fabioz/LiClipseText/blob/master/plugins/org.brainwy.liclipsetext.editor/src/org/brainwy/liclipsetext/editor/partitioning/Utf8WithCharLen.java).

I'll take a look on porting them to OnigString (but will leave it up to you to where to place the calls, because it's not just convert one way, the result has to be converted back too...).

Maybe better would be just using always utf-8 internally and convert just the final result? Anyway, will provide a pull request for the conversions shortly, but will leave it up to you where to do the convertUtf16OffsetToUtf8 and convertUtf8OffsetToUtf16 calls.

@angelozerr
Copy link
Contributor

@NielsNet could you reinstall http://download.eclipse.org/tm4e/snapshots/ and tell me if it fixes your problem. Thanks!

@fabioz
Copy link
Contributor

fabioz commented Sep 19, 2017

@angelozerr With this change I have the same behavior I had with the approach I was using which just used everything with utf-8 offsets internally and just converting the final result from utf-16 to utf-8.

(i.e.: fabioz/LiClipseText@1d8f88c -- the interesting bit is Token.java and Grammar.java -- mainly, no conversion is needed internally and just the final result is converted).

So not sure... I must say I'm more inclined toward the patch which keeps everything utf-8 internally and just changes offsets when returning to grammar callers (it should be a bit faster as it doesn't have to convert all the time internally which may happen more often and makes internal handling simpler as everything is utf-8 internally as oniguruma has to work on bytes anyways).

If you want I can provide a patch for that...

@angelozerr
Copy link
Contributor

@fabioz I have tried to translate node onigurama code and it seems it fixes this issue and #136 (I have only tested in Windows OS).

Im' waiting for users feedback to know if it fixes encoding problem with other OS.

If you want I can provide a patch for that...

I'm aware with any patch if it improves performance and fixes any problem.

Please note that I would like to consume tokenizeLine2 #38 like VSCode does but for you it will change nothing if you wish to use again tokenizeLine

@NielsNet
Copy link
Author

@angelozerr Sorry I did not notice your comment. I tried to install the snapshot but it doesn't work. I get an Cannot perform operation. Computing alternate solutions... as the installation fails.

@angelozerr
Copy link
Contributor

@NielsNet please try to reinstall typescript.java http://oss.opensagres.fr/typescript.ide/snapshots/ which install last version of tm4e

@NielsNet
Copy link
Author

The problem is partally gone:
image
However:
image
image

@Omcsesz
Copy link

Omcsesz commented Apr 7, 2022

@angelozerr After installing the latest version of typescript.java, the problem fully is gone:
kep

@mickaelistria
Copy link
Contributor

Thanks, let's close then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants