Speed up lv_i18n_get_text_core by clever optimization. #53

bubeck · 2022-11-16T15:43:12Z

I am willing to improve

lv_i18n/src/lv_i18n.template.c

Line 175 in f871e05

    
           static const char * __lv_i18n_get_text_core(lv_i18n_phrase_t * trans, const char * msg_id)

for speed. Before issuing a PR I would like to validate my idea here
and ask wether you are willing to accept or discuss this.

Currently __lv_i18n_get_text_core is O(n) which means, that it takes
longer if the list of strings gets longer. In my case this leads to
inacceptable long run time to find the translation. My change leads to
O(1), that its much faster.

The basic idea is illustrated by the follwing example:

// German translation of "Hello World" and "Good Bye":
static char *de_DE_singulars[] = { 
  "Hallo Welt",     // Index 0
  "Tschuess"        // Index 1
};

printf ("%s\n", _("Hello World"));

This gets translated at compile time to:

  char *_1 = lv_i18n_get_text (1);
  __builtin_puts (_1);

where lv_i18n_get_text(1) then simply does:

char *
lv_i18n_get_text (int index)
{
  if (index == 0)
    return NULL;
  return de_DE_singulars[index - 1];
}

The magic part is, how the compiler comes from _("Hello World") to the
index position "1" (which is the index position of this
string). _("Good Bye") would give "2". Here is the solution:

#define IDX(str1) (!strcmp(str1, "Hello World") ? 1 : (!strcmp(str1, "Good Bye") ? 2 : 0))
#define _(str) (IDX(str) == 0 ? str : lv_i18n_get_text(IDX(str)))
  printf ("%s\n", _("Hello World"));

The CPP expands this to:

  printf ("%s\n", ((!strcmp("Hello World", "Hello World") ? 1 : (!strcmp("Hello World", "Good Bye") ? 2 : 0)) == 0 ? "Hello World" : lv_i18n_get_text((!strcmp("Hello World", "Hello World") ? 1 : (!strcmp("Hello World", "Good Bye") ? 2 : 0)))));

which then gets compiled (!) to:

  _1 = lv_i18n_get_text (1);
  __builtin_puts (_1);

The important thing is, that IDX is optimized by the compiler at
compile time to the index without the use of any strcmp executed at
run time. This is done, as all strings are constant and the compiler
has a knowledge of strcmp as well. It therefore eavluates the
constants at compile time to the corresponding index.

This optimization is certainly compiler dependant. It is done by GCC
since version 4 (at least) available since around 8 years. It is also
true for clang and probably most other compilers. It does not need "-O" or anthing else. It is done by standard compiler options in any case. It is also done with "-g" and also with "-O0".

Here is the complete example.

#include <stdio.h>
#include <string.h>

static char *de_DE_singulars[] = { 
  "Hallo Welt", 
  "Tschuess" 
};

char *
lv_i18n_get_text (int index)
{
  printf("lv_i18n(%d)\n", index);
  if (index == 0)
    return NULL;
  return de_DE_singulars[index - 1];
}

#define IDX(str1) (!strcmp(str1, "Hello World") ? 1 : (!strcmp(str1, "Good Bye") ? 2 : 0))
#define _(str) (IDX(str) == 0 ? str : lv_i18n_get_text(IDX(str)))

int
main ()
{
  int index;
  printf ("%d\n", IDX ("Hello World"));
  printf ("%d\n", IDX ("Good Bye"));
  printf ("%d\n", IDX ("Unknown"));
  printf ("%s\n", _("Hello World"));
  printf ("%s\n", _("Unknown"));

  printf("Division by zero will follow (because of dumb compiler)\n");
  index = 10 / IDX("Unknown");
  return 0;
}

This gets translated to

main ()
{
  int index;
  int D.2357;
  char * D.2356;
  char * _1;
  int _3;

  <bb 2>:
  printf ("%d\n", 1);
  printf ("%d\n", 2);
  printf ("%d\n", 0);
  _1 = lv_i18n_get_text (1);
  __builtin_puts (_1);
  __builtin_puts ("Unknown");
  __builtin_puts (&"Division by zero will follow (because of dumb compiler)"[0]);
  index_2 = 10 / 0;
  _3 = 0;

<L0>:
  return _3;

}

My change would go for "lv_i18n compile" adding a flag like
"--optimize". When given, then the above code would be produced,
instead of the previous one. This means especially that IDX will be generated with all translationKeys.keys. Without that flag, everything will stay
unchanged. Developers can then decide, if their toolchain has the
necessary optimization and then use "--optimize" or not.

What do you think? If you agree on this idea, I will implement this and do a PR

kisvegabor · 2022-11-20T17:47:09Z

Sounds like an interesting idea!

Just to be sure I understand correctly

If there are 1000 "tags" then IDX will have be 999 level nested ? : expression, right? I'm not sure it's a problem, just good to e aware of it.
It doesn't apply to plurals . Also not a problem, as probably there are much less plural translations.

bubeck · 2022-11-21T06:57:41Z

Sounds like an interesting idea!

Thanks! I also like it. ;)

1. If there are 1000 "tags" then `IDX` will have be 999 level nested  `? :`  expression, right? I'm not sure it's a problem, just good to e aware of it.

Yes, thats right. The imporant fact here is, that IDX get evaluated by the compiler at compile time to the right index (e.g. 18). so there is no runtime penalty for finding it. I tried it on clang and gcc with 3000 tags. It worked as well. The only small problem is, that compile time slightly increases, but really not big. I already made a dirty version for our own project and it works like a charm: https://github.com/fhorinek/BB.

2. It doesn't apply to plurals . Also not a problem, as probably there are much less plural translations.

The idea can also be applied to plurals, no problem. Only the PoC is only for singular, but the idea is valid for plural as well.

You can find a mostly dirty and hacked version for our own project under https://github.com/fhorinek/BB/tree/master/Utilities/lv_i18n/node_modules/lv_i18n. I will improve to make it more clean and general usable.

I am willing to implement it as a PR but only if the idea gets a chance to get included. Therefore I am asking here for your feedback if I should continue.

kisvegabor · 2022-11-21T21:14:29Z

Sounds good to me.

Just one thing: apply logarithmic search could be an option too. It'd reduce 1000 searches to 10 (1%) and would be a more standard approach. What do you think about it?

bubeck · 2022-11-22T09:10:53Z

Yes, goods point.

Thanks for your ideas and help. I will finish the PR in the next days.

Fixes lvgl#53

puzrin · 2022-12-01T17:59:35Z

Phrase search was done "quick and dirty" since phrases count in embedded is usually small. We decided to improve it later, if required.

So, my opinion is:

Better search should be landed (via halving and so on).
I'd avoid to introduce new options if possible. IMO package should "just work".

Consider halving search as "good enough with minimal efforts" (but I do not insist and open to any suggestions)

bubeck · 2022-12-02T07:31:22Z

In our case (https://github.com/fhorinek/BB/tree/master/BB3/App/lv_i18n), we have around 500 keys (will probably increase to around 700) and the implementation was definitevly too slow. This is why I came up with this proposal. You are right, that it could be improved by binary search but still we would need around 9 strcmp to find the right key with 500 to look through. As this is done quiet often, it is too slow for our project.

The above solution reduces this to 1 integer compare, regardless of the number of keys, which is substantially faster.

I introduces an additional option "--optimize" to make sure, that the code base is unchanged, as long as we stabilize and test the idea. If you want, you can make "--optimize" the default if we all are sure, it works well.

As the PR is nearly finished (needs some polishing), I will finialize it and send to you. It is then definitely up to you to decide how to proceed. I am certainly willing to adopt to your propsals and wishes. :)

puzrin · 2022-12-02T08:22:45Z

I understand general idea. It needs data rework, because macro also needs internal state (locale), and compile-time index has no access to that.

Probably, this can be achieved by align all sigulars & plurals between different languages, at cost of some minor flash size (insert empty values or markers for not-existing translations).

https://github.com/lvgl/lv_i18n/blob/master/src/lv_i18n.template.c - if you could manually rewrite this template to new format, I will start thinking how to arrange that in JS. Check points are:

languages should have different number of forms in plurals
translations should have holes (missed phrases)

Ideally, it would be fine to leave just your optimization, without options. Alternative - C macros (#ifdef) to activate fallback for ancient compilers. If both impossible - then will consider --option. Can not say anything more certain until see new data structs.

bryghtlabs-richard · 2024-07-10T14:10:03Z

Another slightly slower O(1) approach would be minimal perfect hashmap like gperf generates. Each translation would be a different sourceString->translatedString map. This would not require changing the lv_i18n_get_text() API.

bryghtlabs-richard · 2024-07-12T21:25:37Z

I thought I was facing the same issue, but we had something else causing us to retranslate the current screen too often.

Here's a rough implementation of the binary search method that assumes Node.js and strcmp() sort the same way.

teaalltr · 2024-09-18T16:31:47Z

Hi @puzrin @bubeck @bryghtlabs-richard is there any news on this?

bubeck added a commit to bubeck/lv_i18n that referenced this issue Nov 30, 2022

use --optimize to generate faster code without strcmp at runtime

491e27e

Fixes lvgl#53

bubeck linked a pull request Nov 30, 2022 that will close this issue

use --optimize to generate faster code without strcmp at runtime #55

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up lv_i18n_get_text_core by clever optimization. #53

Speed up lv_i18n_get_text_core by clever optimization. #53

bubeck commented Nov 16, 2022 •

edited

Loading

kisvegabor commented Nov 20, 2022

bubeck commented Nov 21, 2022 •

edited

Loading

kisvegabor commented Nov 21, 2022

bubeck commented Nov 22, 2022

puzrin commented Dec 1, 2022

bubeck commented Dec 2, 2022

puzrin commented Dec 2, 2022

bryghtlabs-richard commented Jul 10, 2024

bryghtlabs-richard commented Jul 12, 2024

teaalltr commented Sep 18, 2024

Speed up lv_i18n_get_text_core by clever optimization. #53

Speed up lv_i18n_get_text_core by clever optimization. #53

Comments

bubeck commented Nov 16, 2022 • edited Loading

kisvegabor commented Nov 20, 2022

bubeck commented Nov 21, 2022 • edited Loading

kisvegabor commented Nov 21, 2022

bubeck commented Nov 22, 2022

puzrin commented Dec 1, 2022

bubeck commented Dec 2, 2022

puzrin commented Dec 2, 2022

bryghtlabs-richard commented Jul 10, 2024

bryghtlabs-richard commented Jul 12, 2024

teaalltr commented Sep 18, 2024

bubeck commented Nov 16, 2022 •

edited

Loading

bubeck commented Nov 21, 2022 •

edited

Loading