In-game UI architecture

I'm working on a system to support the in-game UI for my game engine. Below is a list of the scope and key features I intend to provide:

  • Rendered on a 2D canvas on top of all gameplay graphics
    • In other words, I want to restrict this to absolute basics. I wouldn't (yet) go into the weeds with “diegetic UI” that needs to be projected into 3D space (e.g. what Doom 3 or Dead Space have)
  • Uses basic primitives (triangles, quads, etc.) and textures
    • I already have a rendering system to which I can submit draw commands for primitives, so I just need the higher-level logic to turn UI elements into batches of primitives
  • Supports basic user interface widgets for things like menus: frames, buttons, labels, etc.
    • This includes both rendering and event handling, potentially with state changes based on events (e.g. a button having a “hovered” and “clicked” state)
    • Parent-child logic to allow for composite UI elements (e.g window with buttons) that can be modified in bulk (show/hide, etc.)
    • No need for a fully fleshed out window management system! I aim to restrict the UI to “one active window/frame at any one time” (menu pages, etc.). Said windows may have internal child elements (buttons, labels, lists, etc.) but I wouldn't need anything more complex, like having to manage Z ordering and focus for multiple windows.
    • The one exception to the above: modal dialog (i.e. popup/notification) logic which is drawn on top and consumes all inputs until the user acknowledges it
  • In-game UI elements: HUD, inventory, etc.
    • Besides barebones functionality, I'm hoping to add some features that fit the aesthetic and add to the game experience (animations, sound FX, etc.)
  • Convenient client-side API
    • The actual projects using this system get a convenient API to create and manage UI elements: interfaces to get/set widget data (e.g. getting the text from an input field), plus the ability to subscribe to UI events and modify the UI state in response

What are some good case studies for an architecture to use to accomplish this, ideally using modern programming paradigms?

Some of the main dilemmas I currently have:

  • What does “UI implemented through composition” even look like?
    • If I go down an ECS-like route of widgets being “bundles of components”, the systems that handle events, input, rendering, etc. become much less straightforward.
  • How to handle relationships (parent-child, one-to-many, etc.)?
    • This is trivial (if inefficient) in OOP, but less so when it's implemented via IDs and component pools
  • What do the systems look like?
    • To pick rendering as an example: I no longer have a virtual render function to call, and my logic may in fact depend on what components are present. I also need to deal with Z order and other considerations.
  • What does the user-facing API look like?
    • It's convenient for users if things are encapsulated in a base class pointer, and this convenience should not be ignored. If they have to start looking up individual components by ID, it could make the API much more of a slog to deal with.
  • Generally speaking, assuming OOP is not the way to go, how should one break out of that mindset w.r.t. UI?
    • UI is the rare case where it seems to work decently enough, and many existing APIs (Qt, wxWidgets) still use this approach, though they are built for entire GUI applications, not for UI embedded in an existing game

Below are the approaches I've already considered:

OOP

The most straightforward approach. Have something like a Widget base class; all widget types inherit from it, which allows for parent-child relationships and a common interface for all the necessary logic. Client code can subclass the relevant widgets if needed, and it can instantiate and combine widgets to create the UI.
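
A minimal sketch of the kind of thing I mean (placeholder names, not my actual code):

#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Widget {
public:
    virtual ~Widget() = default;

    // Children are owned by their parent, giving the parent-child hierarchy.
    Widget* addChild(std::unique_ptr<Widget> child) {
        children.push_back(std::move(child));
        return children.back().get();
    }
    void setVisible(bool v) { visible = v; }

    // Every widget type overrides the parts of the interface it needs.
    virtual void render() const {
        if (!visible) return;
        for (const auto& c : children) c->render();
    }

protected:
    bool visible = true;
    std::vector<std::unique_ptr<Widget>> children;
};

class Button : public Widget {
public:
    explicit Button(std::string text) : label(std::move(text)) {}
    std::function<void()> onPressed;  // client code hooks its logic in here

    void render() const override { /* draw background + label (omitted) */ }

private:
    std::string label;
};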

I used this for my initial attempt, where I created a very simplified copy of the widgets in Qt. This does get the job done, but it feels clunky, and I figured I could do a lot better. A big advantage of OOP is how it hides the “messy bits” from the client: they just get a friendly interface to interact with (show/hide, subscribe to events, etc.), the lifetime of a widget starts and ends with the Widget object, and the backend is left to deal with all the hassle of resources and rendering logic.

The downside, however, is that it gets very complicated to achieve modularity through interfaces and inheritance. To give one example, it would be nice to be able to separate the logical aspects of buttons (a thing the user can activate to trigger some functionality) from the visuals (since it can be anything from a colored rectangle to an animated sprite). This gets very messy with OOP, and composition would obviously be better, but I haven't found any good case studies.

Dear ImGui

I already use this for developer/debug UI, and the draw lists would solve the rendering part. The logic for generating the UI is also relatively simple and straightforward.

However, I'm not sure immediate mode is good for in-game UI overall. To get things like fancy layouts and whatnot, it seems like it would add a lot of unwanted complexity, both to the API and the backend. It seems more suited for when the priority is functionality, and is less ideal for when the UI is “part of the game experience”.

Web-UI

Would allow using HTML, the contents of webpages cover pretty much all the requirements I have, and there is plenty of existing material to learn from.

However, it would also introduce a huge load on memory and processing, practically “running an engine within the engine”. Feels like overkill.

I'd suggest using OOP for this case. I use it in my GUI system and it has never caused any issues. In some ways, OOP was created as a paradigm to structure GUI systems. It's actually well-suited to the task.

yah-nosh said:
it would be nice to be able to separate the logical aspects of buttons (a thing the user can activate to trigger some functionality) from the visuals (since it can be anything from a colored rectangle to an animated sprite). This gets very messy with OOP

For this case, one option is to override virtual functions (e.g. you can implement an AnimatedSpriteButton by replacing its render() function). It's not that messy in practice, at least in my engine/editor.

Another option is to use delegate functions. This is important in general for clean UI design. Each widget type can have its own type of delegate (a struct containing a std::function or equivalent for each type of event that can be produced). When creating the widget, the delegate gets hooked up to the code that handles the logic. For a button, there might be an “onPress” callback, an “onHover” callback, and a “render” callback. The button calls those functions (if they are not null) inside its base implementation. The delegates allow extending the functionality of widgets with less boilerplate than overriding functions. I'd strongly suggest using this pattern. It's worked very well for my editor.
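
As a rough sketch of the pattern (not my exact code; the names are just for illustration):

#include <functional>

// One delegate type per widget type, holding one callback per event.
struct ButtonDelegate {
    std::function<void()> onPress;      // called when the button is activated
    std::function<void(bool)> onHover;  // called when the hover state changes
    std::function<void()> render;       // optional: replaces the default drawing
};

class Button {
public:
    ButtonDelegate delegate;

    void setHovered(bool h) {
        if (h != hovered && delegate.onHover) delegate.onHover(h);
        hovered = h;
    }
    void press() {
        if (delegate.onPress) delegate.onPress();
    }
    void render() const {
        if (delegate.render) delegate.render();  // custom visuals
        else { /* default button drawing (omitted) */ }
    }

private:
    bool hovered = false;
};

void example() {
    // Hooking up the logic when the widget is created:
    Button startButton;
    startButton.delegate.onPress = [] { /* start the game */ };
    startButton.press();  // invokes the callback
}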

Classical monolithic widgets are not the right solution to the UI problem; the best system I've worked with so far was one where the widgets were constructed from components. In general you don't care about gameplay data all that much, and you are better off polling for the data from the UI side than pushing it in from the other side; you need to translate that data anyway. In a very generalised sense of how this system worked, you could see a screen as a collection of widgets, and each widget as a collection of behaviours; this still allows you to have a top “screen” layer where you feed in input and such things. You don't have to do a classical ECS system here. This system also defined the way the UI looked from states in a UI state machine, and that state machine (not the code of the widgets or the screens) also ruled which screens connected to each other. You never asked for widgets or behaviours by ID in this system, and there is almost never a need for one behaviour to know about another. As an example of how you could define a widget from composition (this describes a button, more or less):

<widget>
   <Behaviour <!--behaviour attributes-->/>
   <BehaviourSelectRenderObject <!--behaviour attributes-->/>
   <BehaviourAnimate <!--behaviour attributes-->/>
   <BehaviourText <!--behaviour attributes-->/>
   <BehaviourSelect <!--behaviour attributes-->/>
</widget>

One other thing I would strongly avoid in either case is having widgets draw themselves; that is asking for inheritance trees you don't want, because something always needs to do something slightly different. You want one system that mostly deals with activating, updating and navigating the logical side of the UI, and another system that deals with drawing it. This gets you out of the problem of having to inherit from a component just to override one minimal thing: you now have a specific visual object that can be the specific thing you want it to be, whilst still being driven by the same logic.

The behaviours actually get the input state and react to it if they need to react to input; they all have an update call in which you deal with the behaviour, and in some cases they do nothing.
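
In very rough C++ terms you could picture it something like this (just a sketch of the idea, not the actual system; all the names are made up):

#include <memory>
#include <vector>

struct InputState { float cursorX = 0, cursorY = 0; bool select = false; };

class Widget;  // forward declaration

class Behaviour {
public:
    virtual ~Behaviour() = default;
    // Every behaviour gets the input state each frame; many do nothing with it.
    virtual void update(Widget& owner, const InputState& input) = 0;
};

class Widget {
public:
    // The set of behaviours is defined in data (like the XML above).
    std::vector<std::unique_ptr<Behaviour>> behaviours;

    void update(const InputState& input) {
        for (auto& b : behaviours) b->update(*this, input);
    }
};

class Screen {
public:
    std::vector<Widget> widgets;  // a screen is a collection of widgets

    void update(const InputState& input) {
        for (auto& w : widgets) w.update(input);
    }
};

// A separate render system walks the screen and issues the draw calls;
// the behaviours themselves never draw anything, as described above.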

The user-facing API should mostly be about defining how the state machine works and about the ability to write new composable behaviours.

You are always going to need a layer between the UI system and the gameplay system; a translation between the two sets of data is usually needed anyway, and this layer also gives you a place to add localisation to the data.

Think about data-driving the creation and layout of the UI, and also how the UI flows from one screen to the next. Making this data-driven allows you to offload that work to another system and simplifies how your runtime deals with updating a screen.
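
For example, the screen flow can just be a transition table that you load from data (again only a sketch of the idea, not how any particular system does it):

#include <map>
#include <string>
#include <utility>

enum class UIEvent { Confirm, Back, OpenOptions };

// Filled from a data file (XML/JSON), never hardcoded in widget or screen code.
using ScreenFlow = std::map<std::pair<std::string, UIEvent>, std::string>;

std::string nextScreen(const ScreenFlow& flow,
                       const std::string& current, UIEvent e) {
    auto it = flow.find({current, e});
    return it != flow.end() ? it->second : current;  // stay put if no transition
}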

The worst systems I worked with, btw, forced you to manually code item selection on screens and activation of selected items. The problem with not automating that stuff is that you have to do it for each screen, and it's messy, fragile logic that has to change with every update you make to the screen.

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion

@NightCreature83 Thank you for your comment, this looks quite promising. Do you know of a codebase which has this kind of system, so I could use it as reference?

That codebase is very much proprietary, since this was at a company I used to work for. So I can't point you at code, only at how it works, without too many details.

I can tell you I looked at several HTML UIs and they all have the same issue: performance (there are solutions, but you are basically rewriting the browser engine). The thing is that switching from Scaleform/Flash-based UI to HTML5 actually isn't any different and has the exact same drawbacks Flash had. The reason it's happening is that it's easier to find people who know that technology and are comfortable working in it.

It's really hard to find good UI people in games, btw, so studios look for outside tech to bring in, in the hope of finding people willing to switch to games UI.

If you create something, what should be foremost in your mind is iteration speed; that is the thing you have to get right in these systems, since sadly UI changes all the time during the development of a game. The less costly and painful it is to change things, the better. This is why people like imgui so much for debug UI: it's super easy.

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion

I'm not convinced the approach suggested by NightCreature83 is the right one for complex GUI systems. Maybe it's OK for a simple game UI. The benefit of separating out pieces of widgets into behavior components doesn't seem to outweigh the additional complexity this adds. It seems like you would get an exponential explosion of possible combinations of components that would make it hard to sufficiently test all code paths. What happens when you create a widget from components that don't work well together? On the other hand, a monolithic widget can result in clean readable code that is easy to reason about. If you use delegates like I talked about in my first reply, this can also be very extensible with minimal boilerplate.

I also don't like the idea of constantly polling the UI for data. This will break down in performance with a large UI containing hundreds of widgets. A core advantage of retained-mode GUI is that everything can be event-driven and therefore much faster.

One thing I agree with NightCreature83 on is that there should be no global event handling or looking up widgets by ID. In my GUI, I provide mouse/keyboard/dragdrop events to the top-level widget on the screen. These events then get propagated down through the GUI hierarchy until they are handled by a widget somewhere. Each event handler returns to its parent whether or not it handled the event, so that the parent can decide if the event should be propagated further. Each widget is responsible for passing events to its children and doing any necessary transformation so that the events are in the child's coordinate system. There are only a handful of non-leaf widgets: GridView, ListView, TreeView, SplitView, ScrollView, CanvasView. Most of the GUI is structured using GridView, which is similar to HTML tables. Leaf widgets can be pretty simple: to handle a click, a button only needs to check whether the mouse position is inside its boundary and then call the delegate function to handle the event.
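
In sketch form (simplified from what I actually do, with illustrative names):

#include <functional>
#include <vector>

struct MouseEvent { float x = 0, y = 0; bool pressed = false; };

class Widget {
public:
    virtual ~Widget() = default;

    // Returns true if the event was consumed somewhere in this subtree,
    // so the parent can decide whether to keep propagating it.
    virtual bool handleMouse(const MouseEvent& e) {
        for (Widget* child : children) {
            MouseEvent local = e;       // transform into the child's
            local.x -= child->offsetX;  // coordinate system before
            local.y -= child->offsetY;  // passing the event down
            if (child->handleMouse(local)) return true;
        }
        return false;
    }

protected:
    std::vector<Widget*> children;  // non-owning, for brevity
    float offsetX = 0, offsetY = 0;
};

class Button : public Widget {
public:
    std::function<void()> onPress;  // delegate, as described earlier
    float width = 0, height = 0;

    bool handleMouse(const MouseEvent& e) override {
        const bool inside = e.x >= 0 && e.y >= 0 && e.x < width && e.y < height;
        if (inside && e.pressed) {
            if (onPress) onPress();
            return true;  // consumed; stops further propagation
        }
        return false;
    }
};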

As for widgets rendering themselves, it works in a similar way to events. I pass a GUIRenderer object into the top-level widget's virtual render() function. The widget then uses high-level functions of GUIRenderer to specify how it is drawn, and calls render() on any children. This might be something like renderer.drawBorderedRectangle(…) or renderer.drawText(...). The renderer then translates these high-level draw operations into low-level primitives and places them into a vertex buffer which is shared between all widgets and rebuilt every frame. The draw calls are placed into a queue which is submitted to the graphics API when the top-level widget returns from its render() function. The base widgets themselves are flexible enough (e.g. with custom style shaders, images, border/background colors and gradients) that I rarely find a need to override the rendering functionality.
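
Roughly like this (heavily simplified; the function names are only meant to illustrate the idea):

#include <string>
#include <vector>

// The renderer exposes high-level operations and batches them into a shared
// vertex buffer plus a queue of draw calls, submitted once per frame.
class GUIRenderer {
public:
    void drawBorderedRectangle(float x, float y, float w, float h) {
        /* append quad + border vertices to the shared vertex buffer (omitted) */
    }
    void drawText(const std::string& text, float x, float y) {
        /* append glyph quads for the text (omitted) */
    }
    void submit() { /* flush the queued draw calls to the graphics API (omitted) */ }
};

class Widget {
public:
    virtual ~Widget() = default;
    virtual void render(GUIRenderer& r) const {
        for (const Widget* c : children) c->render(r);
    }
protected:
    std::vector<const Widget*> children;
};

class Button : public Widget {
public:
    void render(GUIRenderer& r) const override {
        r.drawBorderedRectangle(x, y, width, height);
        r.drawText(label, x + padding, y + padding);
        Widget::render(r);  // then render any children
    }
private:
    std::string label = "OK";
    float x = 0, y = 0, width = 80, height = 24, padding = 4;
};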

Here is an example of audio recording software I built using this GUI system, implemented entirely in C++. The image has around 1,000 widgets visible, to give some idea of the complexity it can handle.

@Aressera If I understand @nightcreature83 correctly, the principle is “separation of concerns”. All widgets are just “bundles” of various components, and the components themselves are added to their own well-defined systems.

To give a crude example, I add a Clickable component to a widget, which will then add it to a system that only cares about this component, plus the ones it needs for click handling logic (parent/child relationships and the bounding box). The system can then listen for click events, test all the clickable widgets, and run a user-provided callback stored in the component. There aren't really that many different ways something can be “clickable”, so the responsibilities and implementation of the system should be fairly straightforward. The same can be done for rendering, where it's either specific primitives (mostly boxes, optionally with textures), or we outsource it to the user.
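
In rough C++ terms, something like this (purely illustrative; the names are made up, and the parent/child handling is omitted):

#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

using WidgetId = std::uint32_t;

struct BoundingBox { float x = 0, y = 0, w = 0, h = 0; };
struct Clickable   { std::function<void()> onClick; };

// A system that only cares about Clickable, plus the components it needs
// for hit testing (here just the bounding box pool).
class ClickSystem {
public:
    explicit ClickSystem(const std::unordered_map<WidgetId, BoundingBox>& boxPool)
        : boxes(boxPool) {}

    void add(WidgetId id, Clickable c) { clickables[id] = std::move(c); }

    // Called by the backend when a mouse click event arrives.
    void onMouseClick(float mx, float my) const {
        for (const auto& [id, click] : clickables) {
            auto it = boxes.find(id);
            if (it == boxes.end()) continue;  // no box, nothing to hit-test
            const BoundingBox& b = it->second;
            if (mx >= b.x && my >= b.y && mx < b.x + b.w && my < b.y + b.h) {
                if (click.onClick) click.onClick();  // user-provided callback
            }
        }
    }

private:
    const std::unordered_map<WidgetId, BoundingBox>& boxes;  // shared component pool
    std::unordered_map<WidgetId, Clickable> clickables;
};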

I can see lots of benefits for the backend in this approach, as I know that a modest number of well-defined components should give me the feature set that I need, and it's easier to split the code for different parts of the logic. It also becomes easier to create custom widgets, since features can be added through components, or the widget itself is actually a composite of dedicated child widgets. I think Unity's UI API behaves something like this as well?

While this is promising, there are a couple of hurdles, mainly to do with the user-facing API:

  1. Composition is nice, but it will turn iteration into a slog if clients keep having to bundle up all the components they need for every single widget. I could use wrapper classes which do this automatically, but then I risk being back to OOP hell again.
  2. Following from above, what would it look like if client code needs to access data stored in specific components? In UI programming, it's hard to deny the convenience of just having a straightforward interface to interact with, rather than having to query components. Then again, I may be stuck in the “Qt mindset”, and there is a way to make this more user-friendly.

I think the main problem is that a lot of existing, tried-and-tested APIs (e.g. Qt) are also very opaque. The API presents a very straightforward interface, making it seem like a “Button” is a single object that has everything we need, when in reality it may be several different objects bundled together, with the API hiding this from the client. This makes sense for convenience, but it distorts the perspective on how a UI framework should be designed in the first place, as the frontend gets confused with the backend (i.e. developers believe the object hierarchy presented to the user is also what the backend uses for event handling, rendering, etc.)

I think the part I can agree on is the separation of concerns. We can always have a view that is ready to be mapped to a data model behind the scenes, which is processed by the business logic. This view can really be in any form, even a separate file rather than code. The easiest example is HTML/CSS. HTML/CSS's concern is really just how things look and are represented. Whether you want it modified (say, only part of it inserted into a certain part of the HTML, or a certain text updated, or a color highlighted by modifying/adding a CSS style), that's JavaScript's job (and the browser engine's renderer, internally).

What I'm saying is you can actually combine both opinions just fine. How you represent the view as mere aesthetic data and how your system modifies it can be two completely different things. I think UDK 3 (Unreal Engine 3) used Flash at some point, via Scaleform, before moving to UMG in UE4. I assume that is closer to what nightcreature83 is talking about. How the result is mapped into the engine and then rendered is something else entirely, which I assume is more what Aressera is concerned with.

@xrbtrx Data-driving is definitely a goal, as it would make authoring a lot easier and remove any hardcoded UI creation logic. I'm aware of Scaleform, but Flash is basically extinct, and the method itself feels like the web-engine approach (which, as I mentioned, I would prefer to avoid).

There's another point that I forgot to add to my previous comment: while for certain features, it's enough to have a single component contain all the necessary data, what if something needs to interact with multiple components? For example, click handling might necessitate accessing the Clickable component and the BoundingBox component, and if we go for an ECS-like approach, that means a lot of extra queries from different containers. It gets even more messy with widget hierarchies, since it's rare that things can be accessed in neat sequential loops. More often than not, it will end up requiring a lot of random access.

This is not to say that this approach is bad, just wondering if there exist ways to mitigate this issue (or if the issue is worth worrying about at all). And as mentioned, this also makes the systems less intuitive. OOP is appealing because all the required logic is encapsulated, but this can easily sacrifice modularity.

Put very crudely, my main issue is that I have a hard time fully visualizing the necessary systems, i.e. how the API looks on the client side and how the backend should be structured. Qt appears very friendly on the client side, although the backend is a bit of a nightmare (I spent quite a bit of time diving into the source code). HTML/CSS-like approaches do hint at how things should be data-driven, but there's almost no material on what the backend of that looks like.

@yah-nosh Yes, that is what I meant. It also allows you to write really specific one-off behaviours that have no business messing with the other code you would have in a monolithic widget design. You data-drive the widget composition; we never wrote out the actual composition I showed earlier by hand, there were macros in the data processing that worked like the C++ preprocessor to do that for you.
A lot of people are stuck in the mindset that game UI has to be like application UI, and it just is not the same. The requirements for a game UI system are far less complex than for a normal application.

@xrbtrx You can mix ideas from both approaches, but trust me, the component one that splits off all the concerns is far easier to create a UI with. Scaleform was a nightmare to work with; I was forced to use Flash on the last couple of games I worked on and it was not fun, though it wasn't as bad as the Snake (Python)-based UI system. The real reason everyone was using Scaleform is that you could attract people who had worked in Flash before. Scaleform is basically a slimmed-down version of Flash (you don't have the fancy features the full version has), and the same is true of most HTML-based UI systems too, btw.

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion
