Hi all! Captain Jeremy here. I’m excited to talk about some serious rendering improvements that we've been working on after the Update 1 release. And by "serious", I mean up to 2x the FPS on large factories! Keep reading to learn how we were able to achieve this amazing speedup.
We’ve been very impressed with the factories people have managed to create since Update 1, but the larger the factory, the more work it is for the CPU and GPU to keep frame rates high. For this reason, performance optimization is a never-ending process. Readers of previous Captain’s Diaries will know that switching our rendering over from Unity’s built-in renderers to a custom “instanced” rendering system has consistently been beneficial for performance. For example, we’ve used this technique for ports (1.4x speedup), terrain (9x speedup), and transport pillars (2x speedup). Next up are the factory structures themselves.
One major reason we’ve not tackled these structures up to this point is animations. The majority of our structures become animated when active: from simple things like spinning fans all the way to large buckets of molten material flying through the air. In the past, all these animations were driven by the CPU: every frame, the rotation, translation, and scale of each sub-object in the structure were updated and those updates were sent to the GPU. This constant updating by the CPU is incompatible with efficient instanced rendering, so we had to change the way we animate objects before we could start considering them for instanced rendering.
The solution we went for is known as the Vertex Animation Texture (VAT) technique. As all our animations are predefined from start to finish, we can preprocess them to generate an image (or texture) where colors in the image represent the position of each vertex at a given time. The x-axis is time and the y-axis is “vertex ID”. Interpolating between those positions along the x-axis gives us smooth transitions across time.
Below we show the VAT technique used to morph a square. At runtime, the GPU loads the texture, which is one pixel high per vertex and one pixel wide per “keyframe” (a time where everything is defined). If the current time is between two keyframes, we linearly interpolate between the previous and next keyframe. For example, if it’s 25% of the way through, we take 75% of the previous keyframe and 25% of the next. This color is then converted into an offset for the vertex, and we transform the shape so that the position is the original position + this offset.
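The interpolation step described above can be sketched in a few lines. This is a minimal CPU-side illustration in Python, not the game's shader code: the names sample_vat and animate_vertex are hypothetical, the texture is assumed to already hold decoded offsets rather than colors, and time is assumed to be measured in keyframe units.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def sample_vat(texture: List[List[Vec3]], vertex_id: int, time: float) -> Vec3:
    """Return the interpolated offset for one vertex.

    texture[vertex_id][keyframe] is the offset of that vertex at that keyframe
    (rows are vertices, columns are keyframes). Assumption: time is expressed
    in keyframe units, so 0.25 means 25% of the way from keyframe 0 to 1.
    """
    row = texture[vertex_id]
    last = len(row) - 1
    t = min(max(time, 0.0), float(last))  # clamp to the animation range
    prev_idx = min(int(t), last)
    next_idx = min(prev_idx + 1, last)
    frac = t - prev_idx                   # how far we are towards the next keyframe
    prev, nxt = row[prev_idx], row[next_idx]
    # Linear interpolation: (1 - frac) of the previous keyframe plus frac of the next.
    return tuple((1.0 - frac) * p + frac * n for p, n in zip(prev, nxt))


def animate_vertex(base: Vec3, texture: List[List[Vec3]], vertex_id: int, time: float) -> Vec3:
    """Final position is the original position plus the sampled offset."""
    offset = sample_vat(texture, vertex_id, time)
    return tuple(b + o for b, o in zip(base, offset))


# One vertex, two keyframes: no offset at keyframe 0, offset of 4 along x at keyframe 1.
texture = [[(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]]
quarter_way = animate_vertex((1.0, 1.0, 0.0), texture, vertex_id=0, time=0.25)
```

On a GPU the same blend is typically obtained by sampling the texture between two pixels, so the per-vertex cost stays small even with hundreds of keyframes.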
In-game, our textures are much larger than in this simple example: we’ll often have hundreds of keyframes and thousands of animated vertices. Once the base method is implemented, however, the rest is just a matter of scale. Below is an example of how the gold furnace is animated using this method.
With the animations now computed on the GPU, we can use instanced rendering for our structures, treating them the same as any other instanced entity. Furthermore, we free up CPU cycles that would otherwise be spent computing animation positions and instead place that work on the much faster GPU. While the performance uplift depends on the exact scene being viewed, we’ve seen up to 4x faster rendering of structures, leading to up to a 2x increase in FPS.
It is worth highlighting that this optimization is most effective for very large late-game factories. As the game progresses, different aspects of the GPU and CPU load scale differently, so it’s unlikely that you’ll see much improvement in early and mid-game factories (e.g. Factory C).
In tandem with the work on instanced animated buildings, our artist has been improving Level of Detail (LOD) support for buildings. Rather than just having a single mesh per building, many buildings now change meshes dynamically depending on their distance from the player’s view. Buildings that are further away from the player can be rendered with fewer triangles, making long views significantly cheaper for the GPU to render.
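Distance-based LOD selection can be sketched as below. This is an illustration only: the function name select_lod and the switch distances are assumptions, and a real engine may add refinements that are not covered here.

```python
import bisect
import math
from typing import Sequence


def select_lod(distance: float, switch_distances: Sequence[float]) -> int:
    """Pick a mesh index for a building: 0 is the most detailed mesh.

    switch_distances must be sorted ascending; each entry is the distance at
    which we switch to the next, cheaper mesh. The values are hypothetical.
    """
    return bisect.bisect_right(switch_distances, distance)


camera = (0.0, 0.0, 0.0)
building = (30.0, 40.0, 0.0)
switch_distances = [25.0, 100.0]  # LOD0 below 25, LOD1 below 100, LOD2 beyond

lod = select_lod(math.dist(camera, building), switch_distances)
```

Here the building is 50 units from the camera, so it falls into the middle band and is drawn with the second mesh.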
Simulation improvements
Captain Filip is here to talk about improvements in the simulation loop. These changes are already in the game as part of Update 1, but we never had the opportunity to share them with you, and now seems like a good time.
Improving simulation performance is critical for large end-game factories, especially if the game is played at increased speed. Our game simulation runs ten times per second at the slowest speed. The simulation is responsible for calculating many things, such as updating machines, moving products on belts, finding jobs for vehicles, path-finding, electricity distribution, and more.
Before Update 1, we noticed that the simulation needed improvement. The most significant slowdown factor in our simulation was memory jumps. Every time the CPU needs to jump to a different object, it has to go to a different address in memory and fetch its data, which is slow. Initially, it was natural to have a granular object-oriented design, but that got costly as the demands grew.
Our ports implementation (ports between machines and belts) had performance inefficiencies due to frequent memory jumps. In the old design, each port was backed by an individual class, IoPort. Each port had a reference to the port it was connected to. So exchanging products between two ports required calling methods on two IoPort instances, which would then hop onto the connected entity, such as a machine, to try to send it some products. This was a nice abstraction, but it had its cost. If you look at the diagram below, you can see that trying to send products via 4 ports required 14 memory jumps in total.
To improve this, we have changed the API so that each entity caches a direct reference to the entity it is connected to using a struct called IoPortData and just provides an identifier wrapped as IoPortToken to identify which specific port on an entity is being used. The outcome is that instead of 14 memory jumps, we now perform only 5 jumps (per 4 ports). And it can’t be much better because we can’t avoid jumping to a connected entity to see if it can accept a product, and we also have to pay 1 jump for storing ports in an array.
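The shape of the new API can be sketched as follows. IoPortData and IoPortToken are the names used above, but every field and method here is a hypothetical stand-in, and Python cannot show the memory layout that makes the real version fast. The sketch only illustrates that the sender holds a direct reference to the connected entity plus a small token, with no intermediate port objects to hop through.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IoPortToken:
    """Identifies which port on the receiving entity is being used."""
    index: int


@dataclass
class IoPortData:
    """Cached on the sending entity: who we are connected to, and via which port."""
    connected_entity: Optional["Entity"]
    connected_token: IoPortToken


class Entity:
    """Hypothetical machine or belt with a simple capacity-limited store."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self.stored = 0
        self.ports: List[IoPortData] = []  # ports kept in an array on the entity

    def receive(self, token: IoPortToken, quantity: int) -> int:
        """Accept as much as fits; return the accepted quantity."""
        accepted = min(quantity, self.capacity - self.stored)
        self.stored += accepted
        return accepted

    def send(self, port_index: int, quantity: int) -> int:
        """Go straight to the connected entity instead of through port objects."""
        port = self.ports[port_index]
        if port.connected_entity is None:
            return 0
        return port.connected_entity.receive(port.connected_token, quantity)


belt = Entity("belt", capacity=5)
machine = Entity("machine", capacity=10)
machine.ports.append(IoPortData(belt, IoPortToken(0)))
first = machine.send(0, 8)   # belt only has room for 5
second = machine.send(0, 1)  # belt is now full
```

The cost mentioned above is visible even here: if the belt is removed or reconnected, every cached IoPortData pointing at it has to be updated.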
The ports refactoring brought a nice performance improvement in the average duration of a simulation update. The figure below shows results from a benchmark of 4000 ports actively trying to send products to each other. As you can see, the average case got much better.
However, these optimizations come at a cost. They require very careful management of all the caches, which makes the code more error-prone. The API also becomes less flexible. Having a decent set of automated tests really pays off during these rewrites.
Machine optimizations
Machines became very complex throughout the development of COI. Initially, we thought machines should be general and versatile and combined from standalone parts so we could swap their behaviors and reuse them for different entities, which never happened. So we created a fairly complex set of objects, as shown below.
To improve on this, we flattened most of these classes into just three: Machine, MachineInputBuffer, and MachineOutputBuffer. That has saved us lots of memory jumps. It also allowed us to introduce caches; for instance, for each recipe, we store a direct reference to its buffers, which eliminated the use of dictionaries. Replacing dictionaries with cached references paid off very well in the hot spots.
Also, on every update, we used to iterate over all the output buffers to see if we could send some products to the output ports (checking each buffer involves a memory jump). That was a waste of CPU time. So now we skip that if we know that all the output buffers were empty during the previous attempt. These optimizations are beneficial, but again, they are much more error-prone, as we have to rely on caches that introduce multiple sources of truth.
The figure above shows a benchmark of the simulation of 1500 busy machines.
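The output-buffer skip described above can be sketched like this. The class and field names are hypothetical, and the sketch assumes that output ports always accept everything, which the real game does not guarantee.

```python
from typing import List


class BusyMachine:
    """Hypothetical machine that remembers when its outputs were all empty."""

    def __init__(self, buffer_count: int) -> None:
        self.output_buffers: List[int] = [0] * buffer_count
        self.outputs_known_empty = False  # cached result of the last scan
        self.scans = 0                    # counts full buffer scans, for illustration

    def add_output(self, index: int, quantity: int) -> None:
        self.output_buffers[index] += quantity
        # Any change to a buffer must invalidate the cache; forgetting this
        # is exactly the kind of bug that makes such caches error-prone.
        self.outputs_known_empty = False

    def try_send_outputs(self) -> int:
        """Send everything to the ports; skip the scan if nothing can be there."""
        if self.outputs_known_empty:
            return 0
        self.scans += 1
        was_empty = all(quantity == 0 for quantity in self.output_buffers)
        sent = sum(self.output_buffers)
        self.output_buffers = [0] * len(self.output_buffers)
        self.outputs_known_empty = was_empty
        return sent


machine = BusyMachine(buffer_count=3)
machine.add_output(1, 4)
results = [machine.try_send_outputs() for _ in range(4)]
```

The first call sends the products, the second scans and finds everything empty, and later calls return immediately without touching the buffers.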
If a machine had no work to do, it would always scan all its recipes to see if it could run one. That was very expensive, and it meant that idle machines had significant overhead. So we introduced an optimization: if the last recipe search failed and no quantity (input or output) has changed since, we don’t search again.
The figure above shows benchmarks of the simulation of 1500 idle machines. As you can see, even the minimal duration is now 7 times better.
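A minimal sketch of the idle-machine optimization, under simplifying assumptions: a recipe is reduced to a name and a required input quantity, only input changes are modelled, and all names are hypothetical.

```python
from typing import List, Optional, Tuple


class IdleMachine:
    """Hypothetical machine that avoids repeating a recipe search that just failed."""

    def __init__(self, recipes: List[Tuple[str, int]]) -> None:
        self.recipes = recipes      # (recipe name, required input quantity)
        self.input_quantity = 0
        self.search_failed = False  # True while nothing has changed since a failed search
        self.searches = 0           # counts full recipe scans, for illustration

    def add_input(self, quantity: int) -> None:
        self.input_quantity += quantity
        self.search_failed = False  # a quantity changed, so a new search may succeed

    def find_recipe(self) -> Optional[str]:
        if self.search_failed:
            return None             # nothing changed since the last failure
        self.searches += 1
        for name, required in self.recipes:
            if self.input_quantity >= required:
                return name
        self.search_failed = True
        return None


idle = IdleMachine([("iron", 5), ("steel", 8)])
idle_results = [idle.find_recipe() for _ in range(3)]  # only the first call scans
searches_while_idle = idle.searches
idle.add_input(5)
found = idle.find_recipe()
```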
After we realized how significantly all the memory jumps affect performance, we also started caching information such as whether an entity is enabled or has workers. This would normally be verified in every sim update by going into individual managers. But now we just check the value cached directly on the object, so we don’t have to jump anywhere. That brought another 10-20% speedup.
Another area we focused on was the vehicle jobs scheduler. It works by scanning all the product buffers on your island, caching the data, and then performing pairing to find a job. Previously, we did this for every vehicle separately. The reasoning was that, in theory, it shouldn’t matter much whether we do it per vehicle or for all the vehicles at once, since we still loop over each vehicle for each buffer pair. But it turned out that memory access again played its role: the initial scanning of buffers, which is just O(n), was more expensive than the pairing. Also, most buffer pairs end up rejected before we even try to find a corresponding truck, which means that we no longer have to reject each buffer for each vehicle individually. The result is that finding jobs for 10 trucks takes almost the same time as for a single truck. It also allowed us to prioritize trucks based on their distance and capacity, which wasn’t possible before. This optimization is a nice example of a case where we didn’t know how it would pan out until we tried it and measured.
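The batching idea can be sketched as follows. This is an assumption-heavy illustration: the Buffer type, the sign convention for quantities, and the simple first-come assignment are all hypothetical, and the distance and capacity prioritization mentioned above is left out.

```python
from typing import Dict, List, NamedTuple, Tuple


class Buffer(NamedTuple):
    name: str
    product: str
    quantity: int  # assumption: positive = surplus to pick up, negative = demand


def find_pairs(buffers: List[Buffer]) -> List[Tuple[Buffer, Buffer]]:
    """Scan all buffers once and pair each source with sinks of the same product."""
    sources = [b for b in buffers if b.quantity > 0]
    sinks = [b for b in buffers if b.quantity < 0]
    return [(s, d) for s in sources for d in sinks if s.product == d.product]


def assign_jobs(vehicles: List[str], buffers: List[Buffer]) -> Dict[str, Tuple[str, str]]:
    """Do the expensive scan and pairing once, then hand pairs out to vehicles."""
    pairs = find_pairs(buffers)  # shared by all vehicles instead of repeated per vehicle
    return {vehicle: (src.name, dst.name) for vehicle, (src, dst) in zip(vehicles, pairs)}


buffers = [
    Buffer("mine", "ore", 10),
    Buffer("smelter", "ore", -5),
    Buffer("farm", "wheat", 3),
    Buffer("mill", "wheat", -3),
    Buffer("store", "coal", -2),  # no coal source, so it never forms a pair
]
jobs = assign_jobs(["truck1", "truck2", "truck3"], buffers)
```

With three trucks and only two valid pairs, the third truck simply gets no job; the scan cost is paid once regardless of how many trucks are waiting.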
To close this off: if you are developing a game, you probably shouldn’t focus much on excessive caching, as your biggest enemy is time, and working on code that is simple and easy to rewrite saves it. However, when designing game mechanics, you should consider how they affect performance. Going with simple game entities can give you better performance than universal objects that are just doing too much. Sometimes this pitfall can occur naturally just by overusing object inheritance. Also, performance requirements can actually drive what features you can or cannot provide to your players. But this is often hard to decide upfront.
What we are working on
This month we have been focusing on performance improvements in rendering as Jeremy covered. We also smashed many bugs filed on our GitHub; thank you to everyone who filed reports!
We are also advancing on the design of the upcoming features. As we have promised, we are working on the map editor. Another area we are looking into is improving game difficulties, making them more interesting, and also introducing a more entry-level difficulty that will, for instance, have no death spirals and be more forgiving. We will also squeeze in a few smaller surprises (as always :))
Kickstarter
As some of you might know, we had a great crowdfunding campaign on our website and Kickstarter, and we owe t-shirts and posters to some of our backers. We know it is taking us quite some time, and I would like to apologize for that. We got delayed as we worked on the updates and also because we are really trying to deliver something nice. For instance, we made 4 different designs of the t-shirt over the last year and a half until we were happy with what we have. So we really appreciate your patience, and trust me, we also want to get it out of the door as soon as possible, as the fulfillment and shipping costs have increased significantly over the last two years. But please don’t worry; we will get you your merch!
Full patch notes [v0.5.4 - v0.5.4e]
Performance
Rendering optimizations yielding up to 2x the FPS on large factories.
New animation system that uses vertex animation texture (VAT) technique.
Added LODs for the most common entities.
Reduced LOD switching distance for transport pillars.
Improvements
Migrated to the latest Unity version 2022.3 LTS which should reduce rare and unexpected crashes. This also caused a large patch download size due to differences in game files.
Improved text rendering that results in sharper fonts.
Increased font size for languages that use complex symbols such as Japanese, Chinese, or Korean to improve readability.
Blueprints can now be reordered using drag & drop and the window size was increased.
Small-quantity jobs no longer skip high-priority jobs in the queue just because there is not enough quantity for the high-priority ones.
Farm recipes were added into the recipe book.
Allow transport connectors to be placed on a single-tile transport between two entities.
Farm no longer notifies about lost crops when it is paused.
Added camera field-of-view configuration to the video settings. Vertical FOV is now configurable from 50-70 degrees (default is 60).
Dumping is no longer allowed under entities that have underground structures such as fly-wheel or launch pad.
Bug fixes
Fixed concrete ground tiles that were sometimes left behind when an entity was deconstructed.
Transport pillars are no longer left at the ground level when transport does not need them.
Fixed missing cargo visualization on the haul truck after recovery.
Vehicle depot’s quick-delivery is now possible even when the vehicle cap is reached.
Fixed that designations could be placed too close to the terrain limits, making mining and dumping unfulfillable.
Fixed deconstruction speed exploit via quick remove and toggle of deconstruction; quick remove now also reduces the time needed for deconstruction.
Fixed product stats leak occurring when buffer outputs were forcibly cleared while a recipe was running.
Fixed nuclear reactor that was incorrectly tracking fuel after upgrade.
Fixed issue when molten metal channel height was changed.
Cargo depot modules no longer consume power when idle.
Transport path-finding port-avoidance heuristic now ignores ports for connectors.
Transport connectors are now properly removed when a destroyed connected building was the last reason to keep them.
Fixed upgrade window button that was showing incorrect cost in some cases.
Fixed trucks randomly stopping during deliveries to storages.
Fixed entity highlighting that was not tracking the animation for some entities.
Fixed nuclear reactor UI after upgrade to the higher tier.
Fixed research UI that had wrong size in some situations.
Fixed settlement UI that would sometimes show needs that were not available yet.
Fixed farm yield estimates that did not take farm yield edicts into account.
Fixed incorrect “cannot deliver” notification for already discarded product.
Fixed empty fuel station notification that was sometimes stuck even if the station was full.
Fixed that ship sometimes was not allowed to sail to a destination despite having enough fuel.
Fixed transport pillars that were left not constructed when constructing/deconstructing overlapping transports.
Fixed transport path-finding that was not able to connect a straight-ramp combination in some cases.
Fixed UI dialogs movement that was in some cases not synced with the mouse movement properly.
Fixed an unwanted transport reverse when constructing transport that ends with a connector.
Fixed an unwanted transport connector removal in cases where the resulting transport would have to be reversed in order to connect properly. This was causing unexpected transport reversals.
Fixed entity icons that were sometimes not showing correctly after operations such as cut/paste.
Fixed issues with buried rocks that were blocking vehicles while being invisible to the player.
Icons such as pause for transports now sit closer to the transport.
Improved visibility of mine tower work areas.
Fixed auto-save that was not showing an error on failure in some cases.