Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve texture memory usage, by unloading tiles, partly resolving #12906 #12924

Merged
merged 3 commits into from
Dec 4, 2023

Conversation

tristan-morris
Copy link
Contributor

@tristan-morris tristan-morris commented Oct 8, 2023

This partly PR resolves #12906, and likely contributes to a few other issues relating to loss of WebGL context related to excessive GPU memory.

The root cause is textures aren't being unloaded from GL memory when they're no longer needed. This is most pronounced when moving around the map, with flyTo being the most pathological case. From what I can tell, this impacts all platforms but particularly on resources constrained devices. Safari on iOS and macOS enforces lower memory limits, contingent on the device being used, and seems to manifest more often.

The root cause of this problem was really difficult to identify originally. For me, this showed up initially in my app as WkWebView being terminated and raising webgl context lost errors. If you follow the device logs (not the app logs in XCode) it's possible to see warnings related to excessive memory usage, but the excessive memory use isn't attributed to VRAM usage! (I might raise this on their Bugzilla.) I've been able to replicate this issue for weeks with the pathological flyto (zipping between Australia and USA), but understanding the cause was another problem. Anyway... This PR is somewhat complete, but it needs some input from MB.

A few notes

  • I've added a debug artefact debug/pathological-flyto.html. This seems to really help identifying what is going on with GPU memory allocations.
  • I've updated the type references to Tile.texture so it's known by its concrete types throughout the codebase. Using an any type here likely contributed to the missing derefs.
  • This problem is not constrained to raster sources -- geojson and vector sources appear to exhibit the same issues -- their textures are not unloaded on pan/zoom and will continually grow until all vram is exhausted/limit reached. I loaded about 800 points, after zooming/panning in and out on a few areas, vram use is ~800mb+.

Status
My poking around so far has been to consolidate the destroy codepaths, aggressively drop textures and ensure the map continues to work. I am super keen to get some feedback from MB devs who know the codebase a lot better. Otherwise, I'm happy to continue poking around and liberate all of the lost vrams.

There's one remaining path which seems to cause some significant memory retention - if you zoom into a location with lots of geojson loaded, and then zoom out the tiles remain loaded. And can use a lot of VRAM! See src/source/source_cache.js I think...

Before
Top left shows an overlay with predicted vram usage in MB. By 40 seconds, Safari has terminated the tab, so ~7 pans/zooms.

Before.mp4

After
Max GL memory reaches ~210MB, with satellite raster and DEM. Safai seems happy.

After.mp4

Launch Checklist

  • [X ] briefly describe the changes in this PR
  • [ X] include before/after visuals or gifs if this PR includes visual changes
  • write tests for all new functionality
  • document any changes to public APIs
  • [ ? kind of] post benchmark scores
  • [X ] manually test the debug page
  • tagged @mapbox/map-design-team @mapbox/static-apis if this PR includes style spec API or visual changes
  • tagged @mapbox/gl-native if this PR includes shader changes or needs a native port
  • apply changelog label ('bug', 'feature', 'docs', etc) or use the label 'skip changelog'
  • [X ] add an entry inside this element for inclusion in the mapbox-gl-js changelog: <changelog>Fix an issue with excessive video memory usage with raster sources, as textures weren't being unloaded from video memory when no longer needed. </changelog>
    @

@tristan-morris tristan-morris requested a review from a team as a code owner October 8, 2023 09:17
@CLAassistant
Copy link

CLAassistant commented Oct 8, 2023

CLA assistant check
All committers have signed the CLA.

@tristan-morris tristan-morris changed the title Fix texture memory usage, by unloading tiles, resolving #12906 Improve texture memory usage, by unloading tiles, partly resolving #12906 Oct 10, 2023
@mourner
Copy link
Member

mourner commented Oct 11, 2023

@tristan-morris this is really impressive work, thanks so much for taking on this! Our memory handling definitely needs this kind of tightening up, and it's been a while since we did such a thorough hunt for memory leaks.

For a little bit of background on texture retention — originally it was introduced 9 years ago (!) in #597 for performance reasons described in #591 — to have a pool of common size textures to reuse instead of continuously creating and deleting textures. Before we remove this mechanism, let's investigate to understand why exactly it leads to unbounded memory growth — in theory, if I remember correctly, the number of saved textures should be bounded by what's seen on screen, and any newly visible tiles should reuse textures left in the pool for those that went offscreen, thus balancing the total amount. Did we break that balance at some point? Or is the amount retained too much even if it's bounded, and it's better for performance to always delete and create new textures without reuse?

@tristan-morris
Copy link
Contributor Author

tristan-morris commented Oct 12, 2023

Thanks @mourner. More than happy to help and it's been a good opportunity to understand the codebase in a lot more detail. The background is really useful and makes a lot of sense.

My takeaway is the texture handling has evolved over time and for the different concerns you've outlined, this has somewhat dropped off, and become quite fragmented. This can be seen in tile.js in particular with all of the different variables utilising Texture--all super important I bet, but the particular texture caching path I removed was only used by the raster layer source.

My thinking so far is to find the last bits of code which are holding onto textures quite aggressively, and then re-optimise texture allocations with reuse across everything utilising a texture? There might be a bit of a resource trade-off here between CPU/GPU and VRAM allocation.

As a comparison, if I load the pathological-flyto.html sample which utilises the MB Earthquake example, the texture usage is ~72MB and then in my app it's ~170MB. Similar enough, but I'm loading sig more data across ~7-8 sources. If I zoom in and out a few times on the same location with geojson loaded, this balloons to 1GB and 223MB, respectively. This high point is indicative of the texture caches being fully hydrated (tiles being retained due to current viewport & maxTileCacheSize setting). So overall, maybe slightly too much is being retained in VRAM but I guess this limit could be dynamic (ideally; or perhaps more dynamic) and performance characteristics calculated per device.

@tristan-morris
Copy link
Contributor Author

I think this is ready for another look @mourner.

It looks like different gl unload codepaths weren't releasing their resources consistently, which meant the texture/raster tile cache could continually grow. There was an unloadTileData and an unloadTile call which would have pushed twice into the cache, the same texture I guess but then it would have been popped once. Then again (eventually) perhaps causing some weird texture changes(?) but I wonder if this quirk is worked around elsewhere so it's not obvious, or an unload check. Either way, it causes the cache to grow.

The solution I implemented was to add types, and then destroy everything with the same Tile.destroy() method. The tile cache has remained pretty stable since (e.g. instrumenting the cache to check growth). I did have a fixed cap, but I think if it's all correctly pushed/popped, as you said, it should be fine. Feels slightly diiirty without a hard cap but ok. I refactored other cleanup logic so things aren't as easily missed.

This becomes most problematic on Webkit, iOS, with dem or raster and doing more with the library (consuming higher mems). Higher dpi devices (iPhone etc.) also fetching the 2x textures chew more allocation too. So you kind of end up with the absolute worst case user - an iPhone target, with an allocator that behaves like a potato.

With the changes:
Main ~50MB stable.
Workers 16MB stable.
GPU 300MB with Satellite, terrain etc. stable.

I'll write up a separate issue with larger geojson data sources running on a potato.

@mourner
Copy link
Member

mourner commented Oct 23, 2023

@tristan-morris this is fantastic work, thanks so much for working further on this! I'm pretty sure we'll try to land this before v3 final — let me spend some time reviewing some details, but overall this looks great.

if (tile.texture) {
painter.saveTileTexture(tile.texture);
}
// Texture caching on unload occurs in unloadTile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stepankuzmin can you please check the logic here re CustomSource? Looks like the way this is currently wired, after the change, it would stop caching textures — what's the best way to accommodate the new approach while making sure nothing regresses there?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey all! I've checked the CustomSource caching, and it was affected by this PR. I've made a fix for that in #12993. Since this PR was made from a fork, I couldn't target my changes on top of it. We can merge them sequentially. /cc @mourner

tile.destroy(true);

// Save the texture to the cache
if (tile.texture && tile.texture instanceof Texture) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: redundant check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that too! Seems that Flow picks up that tile may have been modified

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case you can usually placate Flow by separating the condition (or a texture reference) into a variable.

} else {
tile.destroy();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering — would it simplify things if we moved this whole logic to tile.destroy instead of having a conditional parameter? It might be better for consistency, since the counterpart getTileTexture logic of reusing textures is inside tile.js (setTexture method).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that too. My thinking against was, if you call destroy() on something you think it should be destroyed. Or at least provide a mechanism which makes sense. I also didn't want to move the calls around too much :)

src/terrain/draw_terrain_raster.js Outdated Show resolved Hide resolved
src/render/program/line_program.js Outdated Show resolved Hide resolved
Co-authored-by: Volodymyr Agafonkin <[email protected]>
Copy link
Member

@mourner mourner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 given #12993 will be merged afterwards to address CustomSource caching.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Safari on macOS and iOS memory overuse from user movement
4 participants