Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up lv_i18n_get_text_core by clever optimization. #53

Open
bubeck opened this issue Nov 16, 2022 · 10 comments · May be fixed by #55
Open

Speed up lv_i18n_get_text_core by clever optimization. #53

bubeck opened this issue Nov 16, 2022 · 10 comments · May be fixed by #55

Comments

@bubeck
Copy link

bubeck commented Nov 16, 2022

I am willing to improve

static const char * __lv_i18n_get_text_core(lv_i18n_phrase_t * trans, const char * msg_id)

for speed. Before issuing a PR I would like to validate my idea here
and ask wether you are willing to accept or discuss this.

Currently __lv_i18n_get_text_core is O(n) which means, that it takes
longer if the list of strings gets longer. In my case this leads to
inacceptable long run time to find the translation. My change leads to
O(1), that its much faster.

The basic idea is illustrated by the follwing example:

// German translation of "Hello World" and "Good Bye":
static char *de_DE_singulars[] = { 
  "Hallo Welt",     // Index 0
  "Tschuess"        // Index 1
};

printf ("%s\n", _("Hello World"));

This gets translated at compile time to:

  char *_1 = lv_i18n_get_text (1);
  __builtin_puts (_1);

where lv_i18n_get_text(1) then simply does:

char *
lv_i18n_get_text (int index)
{
  if (index == 0)
    return NULL;
  return de_DE_singulars[index - 1];
}

The magic part is, how the compiler comes from _("Hello World") to the
index position "1" (which is the index position of this
string). _("Good Bye") would give "2". Here is the solution:

#define IDX(str1) (!strcmp(str1, "Hello World") ? 1 : (!strcmp(str1, "Good Bye") ? 2 : 0))
#define _(str) (IDX(str) == 0 ? str : lv_i18n_get_text(IDX(str)))
  printf ("%s\n", _("Hello World"));

The CPP expands this to:

  printf ("%s\n", ((!strcmp("Hello World", "Hello World") ? 1 : (!strcmp("Hello World", "Good Bye") ? 2 : 0)) == 0 ? "Hello World" : lv_i18n_get_text((!strcmp("Hello World", "Hello World") ? 1 : (!strcmp("Hello World", "Good Bye") ? 2 : 0)))));

which then gets compiled (!) to:

  _1 = lv_i18n_get_text (1);
  __builtin_puts (_1);

The important thing is, that IDX is optimized by the compiler at
compile time to the index without the use of any strcmp executed at
run time. This is done, as all strings are constant and the compiler
has a knowledge of strcmp as well. It therefore eavluates the
constants at compile time to the corresponding index.

This optimization is certainly compiler dependant. It is done by GCC
since version 4 (at least) available since around 8 years. It is also
true for clang and probably most other compilers. It does not need "-O" or anthing else. It is done by standard compiler options in any case. It is also done with "-g" and also with "-O0".

Here is the complete example.

#include <stdio.h>
#include <string.h>

static char *de_DE_singulars[] = { 
  "Hallo Welt", 
  "Tschuess" 
};

char *
lv_i18n_get_text (int index)
{
  printf("lv_i18n(%d)\n", index);
  if (index == 0)
    return NULL;
  return de_DE_singulars[index - 1];
}

#define IDX(str1) (!strcmp(str1, "Hello World") ? 1 : (!strcmp(str1, "Good Bye") ? 2 : 0))
#define _(str) (IDX(str) == 0 ? str : lv_i18n_get_text(IDX(str)))

int
main ()
{
  int index;
  printf ("%d\n", IDX ("Hello World"));
  printf ("%d\n", IDX ("Good Bye"));
  printf ("%d\n", IDX ("Unknown"));
  printf ("%s\n", _("Hello World"));
  printf ("%s\n", _("Unknown"));

  printf("Division by zero will follow (because of dumb compiler)\n");
  index = 10 / IDX("Unknown");
  return 0;
}

This gets translated to

main ()
{
  int index;
  int D.2357;
  char * D.2356;
  char * _1;
  int _3;

  <bb 2>:
  printf ("%d\n", 1);
  printf ("%d\n", 2);
  printf ("%d\n", 0);
  _1 = lv_i18n_get_text (1);
  __builtin_puts (_1);
  __builtin_puts ("Unknown");
  __builtin_puts (&"Division by zero will follow (because of dumb compiler)"[0]);
  index_2 = 10 / 0;
  _3 = 0;

<L0>:
  return _3;

}

My change would go for "lv_i18n compile" adding a flag like
"--optimize". When given, then the above code would be produced,
instead of the previous one. This means especially that IDX will be generated with all translationKeys.keys. Without that flag, everything will stay
unchanged. Developers can then decide, if their toolchain has the
necessary optimization and then use "--optimize" or not.

What do you think? If you agree on this idea, I will implement this and do a PR

@kisvegabor
Copy link
Member

Sounds like an interesting idea!

Just to be sure I understand correctly

  1. If there are 1000 "tags" then IDX will have be 999 level nested ? : expression, right? I'm not sure it's a problem, just good to e aware of it.
  2. It doesn't apply to plurals . Also not a problem, as probably there are much less plural translations.

@bubeck
Copy link
Author

bubeck commented Nov 21, 2022

Sounds like an interesting idea!

Thanks! I also like it. ;)

1. If there are 1000 "tags" then `IDX` will have be 999 level nested  `? :`  expression, right? I'm not sure it's a problem, just good to e aware of it.

Yes, thats right. The imporant fact here is, that IDX get evaluated by the compiler at compile time to the right index (e.g. 18). so there is no runtime penalty for finding it. I tried it on clang and gcc with 3000 tags. It worked as well. The only small problem is, that compile time slightly increases, but really not big. I already made a dirty version for our own project and it works like a charm: https://github.com/fhorinek/BB.

2. It doesn't apply to plurals . Also not a problem, as probably there are much less plural translations.

The idea can also be applied to plurals, no problem. Only the PoC is only for singular, but the idea is valid for plural as well.

You can find a mostly dirty and hacked version for our own project under https://github.com/fhorinek/BB/tree/master/Utilities/lv_i18n/node_modules/lv_i18n. I will improve to make it more clean and general usable.

I am willing to implement it as a PR but only if the idea gets a chance to get included. Therefore I am asking here for your feedback if I should continue.

@kisvegabor
Copy link
Member

Sounds good to me.

Just one thing: apply logarithmic search could be an option too. It'd reduce 1000 searches to 10 (1%) and would be a more standard approach. What do you think about it?

@bubeck
Copy link
Author

bubeck commented Nov 22, 2022

Yes, goods point.

Thanks for your ideas and help. I will finish the PR in the next days.

bubeck added a commit to bubeck/lv_i18n that referenced this issue Nov 30, 2022
@puzrin
Copy link
Collaborator

puzrin commented Dec 1, 2022

Phrase search was done "quick and dirty" since phrases count in embedded is usually small. We decided to improve it later, if required.

So, my opinion is:

  1. Better search should be landed (via halving and so on).
  2. I'd avoid to introduce new options if possible. IMO package should "just work".

Consider halving search as "good enough with minimal efforts" (but I do not insist and open to any suggestions)

@bubeck
Copy link
Author

bubeck commented Dec 2, 2022

In our case (https://github.com/fhorinek/BB/tree/master/BB3/App/lv_i18n), we have around 500 keys (will probably increase to around 700) and the implementation was definitevly too slow. This is why I came up with this proposal. You are right, that it could be improved by binary search but still we would need around 9 strcmp to find the right key with 500 to look through. As this is done quiet often, it is too slow for our project.

The above solution reduces this to 1 integer compare, regardless of the number of keys, which is substantially faster.

I introduces an additional option "--optimize" to make sure, that the code base is unchanged, as long as we stabilize and test the idea. If you want, you can make "--optimize" the default if we all are sure, it works well.

As the PR is nearly finished (needs some polishing), I will finialize it and send to you. It is then definitely up to you to decide how to proceed. I am certainly willing to adopt to your propsals and wishes. :)

@puzrin
Copy link
Collaborator

puzrin commented Dec 2, 2022

I understand general idea. It needs data rework, because macro also needs internal state (locale), and compile-time index has no access to that.

Probably, this can be achieved by align all sigulars & plurals between different languages, at cost of some minor flash size (insert empty values or markers for not-existing translations).

https://github.com/lvgl/lv_i18n/blob/master/src/lv_i18n.template.c - if you could manually rewrite this template to new format, I will start thinking how to arrange that in JS. Check points are:

  • languages should have different number of forms in plurals
  • translations should have holes (missed phrases)

Ideally, it would be fine to leave just your optimization, without options. Alternative - C macros (#ifdef) to activate fallback for ancient compilers. If both impossible - then will consider --option. Can not say anything more certain until see new data structs.

@bryghtlabs-richard
Copy link
Contributor

Another slightly slower O(1) approach would be minimal perfect hashmap like gperf generates. Each translation would be a different sourceString->translatedString map. This would not require changing the lv_i18n_get_text() API.

@bryghtlabs-richard
Copy link
Contributor

I thought I was facing the same issue, but we had something else causing us to retranslate the current screen too often.

Here's a rough implementation of the binary search method that assumes Node.js and strcmp() sort the same way.

@teaalltr
Copy link

Hi @puzrin @bubeck @bryghtlabs-richard is there any news on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants