Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport unicode fix from piotr1212@17e23ef #2643

Merged
merged 7 commits into from
Oct 22, 2020
Merged

Backport unicode fix from piotr1212@17e23ef #2643

merged 7 commits into from
Oct 22, 2020

Conversation

deniszh
Copy link
Member

@deniszh deniszh commented Oct 19, 2020

@@ -93,7 +105,8 @@ def setRaw(s, loc, toks):
).setParseAction(setRaw)('call')

# Metric pattern (aka. pathExpression)
validMetricChars = ''.join((set(printables) - set(symbols)))
unicodePrintables = u''.join(unichr(c) for c in xrange(sys.maxunicode) if not unichr(c).isspace())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 concerns here:

  1. This is a huge range
  2. Should we be excluding more characters than just spaces, there are many unicode categories https://en.wikipedia.org/wiki/Unicode_character_property#General_Category

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes, I missed that point, thanks! I'll probably put it behind switch then - if user want UTF-8 in metrics, it can enable that. Would it be better?
  2. I think only spaces is good for start, not sure should we add anything later

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Dan, this is basically the reason why I didn't create the PR back then, I didn't think this was an elegant solution and didn't know how to do it properly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, TIL pyparsing 2.3.0 and newer giving you pyparsing_unicode.printables !
But it's better to hide it behind switch anyway

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah pyparsing_unicode.printables is much better! Still not sure if this is the right way to do this but as long it is behind the switch it is good enough for me.

@deniszh
Copy link
Member Author

deniszh commented Oct 22, 2020

Still need to test is it really fix #2641

@deniszh
Copy link
Member Author

deniszh commented Oct 22, 2020

Yes, it's helping with #2641, if UTF8_METRICS is enabled ofc. Merging.

@deniszh deniszh merged commit 9b7c761 into graphite-project:master Oct 22, 2020
@deniszh deniszh deleted the DZ-UTF-8-fix branch October 22, 2020 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants