-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Display optimizations (between 2x00 and 8x00 times faster) (ignore, superseded by #160) #158
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Since 0.8.1 pyte does not support Python 2.x anymore so it makes sense to upgrade one of its dev dependencies, pyperf.
Receive via environ the geometry of the screen to test with a default of 24 lines by 80 columns. Add this and the input file into Runner's metadata so it is preserved in the log file (if any)
Implement three more benchmark scenarios for testing screen.display, screen.reset and screen.resize. For the standard 24x80 geometry, these methods have a negligible cost however of larger geometries, they can be up to 100 times slower than stream.feed so benchmarking them is important. Changed how the metadata is stored so on each bench_func call we encode which scenario are we testing, with which screen class and geometry.
A shell script to test all the captured input files and run them under different terminal geometries (24x80, 240x800, 2400x8000, 24x8000 and 2400x80). These settings aim to stress pyte with larger and larger screens (by a 10 factor on both dimensions and on each dimension separately).
The input files in the tests/captured must be loaded with ByteStream and not Stream, otherwise the \r are lost and the benchmark results may not reflect real scenarios.
eldipa
force-pushed
the
Display-Optimizations
branch
from
July 2, 2022 22:02
b37bfaf
to
011dcb6
Compare
The former `for x in range(...)` implementation iterated over the all the possibly indexes (for columns and lines) wasting cyclies because some of those indexes (and in some cases most) pointed to non-existing entries. These non-existing entries were faked and a default character was returned in place. This commit instead makes display to iterate over the existing entries. When gaps between to entries are detected, the gap is filled with the same default character without having to pay for indexing non-entries. Note: I found that in the current implementation of screen, screen.buffer may have entries (chars in a line) outside of the width of the screen. At the display method those are filtered out however I'm not sure if this is not a real bug that was uncovered because never we iterated over the data entries. If this is true, we may be wasting space as we keep in memory chars that are outside of the screen.
Python generators (yield) and function calls are slower then normal for-loops. Improve screen.display by x1 to x1.8 times faster by inlining the code.
The assert that checks the width of each char is removed from screen.display and put it into the tests. This ensures that our test suite maintains the same quality and at the same time we make screen.display ~x1.7 faster.
Instead of computing it on each screen.display, compute the width of the char once on screen.draw and store it in the Char tuple. This makes screen.display ~x1.10 to ~x1.20 faster and it makes stream.feed only ~x1.01 slower in the worst case. This negative impact is due the change on screen.draw but measurements on my lab show inconsistent results (stream.feed didn't show a consistent performance regression and ~x1.01 slower was the worst value that I've got).
eldipa
force-pushed
the
Display-Optimizations
branch
from
July 4, 2022 22:26
011dcb6
to
020fce6
Compare
eldipa
changed the title
Display optimizations (between 2x00 and 8x00 times faster)
Display optimizations (between 2x00 and 8x00 times faster) (ignore, superseed by #160)
Jul 14, 2022
Closed, superseded by #160 |
eldipa
changed the title
Display optimizations (between 2x00 and 8x00 times faster) (ignore, superseed by #160)
Display optimizations (between 2x00 and 8x00 times faster) (ignore, superseded by #160)
Jul 14, 2022
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Context
While the runtime of a general application using
pyte
is dominated bystream.feed
for the standard geometry (24x80), the runtime ofscreen.display
gets dominant for larger geometries (240x800, 2400x80, 24x8000).This is because
screen.display
does not use the fact thatscreen.buffer
is sparse and iterates over the whole range of possible coordinates(x,y)
in the screen, wasting time accessing non-existing entries inscreen.buffer
.Proposal
This PR does a series of changes to the
screen.display
method to make it faster with 4 changes:screen.display
aware thatscreen.buffer
is sparse and iterate over the real existing chars and not over the range of coordinates (bfeab39)for
-loop: generators coded in Python (not in C) have a lower performance than traditionalfor
-loop so a change is an easy win ( 5b32e25)assert
that was called for every single char: the corresponding check was moved to the tests so we don't loose coverage (13ee784)wcwidth
on each char: whilewcwidth
is already a function with a cache (thanks tofunctools
), callingwcwidth
still requires to do a call. We can avoid that storing the results ofwcwidth
on the char during thescreen.draw
and reuse it later inscreen.display
(c298bd3)Results
For the standard geometry of 24x80 we got the following improvement on
screen.display
:For larger geometries we made
screen.display
x10, x100 and almost x1000 faster.For
stream.feed
we got a minimal improvement and a minimal regression (*)(*) I don't thing that the results of
stream.feed
are meaningful and the discrepancies look like more due the noise. In a separated analysis aboutpyperf
(the tool that we use for the benchmark), it seems that it uses the average instead of the minimum of the samples so this will make the results slightly unstable)Full results are in
benchmark_results/
: one file has the performance for0.8.1
while the other includes the optimizations. These benchmark were executed with the auxiliary scriptfullbenchmark
.