Eliminating virtual fn calls from game loop: is it worth it?

5 comments, last by dp304 6 years, 2 months ago

Hello!

As far as I understand, the traditional approach to the architecture of a game with different states or "screens" (such as a menu screen, a screen where you fly your ship in space, another screen where you walk around on the surface of a planet etc.) is to make some sort of FSM with virtual update/render methods in the state classes, which in turn are called in the game loop; something similar to this:


struct State {
        virtual void update()=0;
        virtual void render()=0;
        virtual ~State() {}
};

struct MenuState:State {
        void update() override { /*...*/ }
        void render() override { /*...*/ }
};

struct FreeSpaceState:State {
        void update() override { /*...*/ }
        void render() override { /*...*/ }
};

struct PlanetSurfaceState:State {
        void update() override { /*...*/ }
        void render() override { /*...*/ }
};

MenuState menu;
FreeSpaceState freespace;
PlanetSurfaceState planet;
State * states[] = {&menu, &freespace, &planet};
int currentState = 0;
bool exiting = false;  /* set to true elsewhere to leave the game loop */

void loop() {
        while (!exiting) {
                /* Handle input, time etc. here */
                states[currentState]->update();
                states[currentState]->render();
        }
}

int main() {
        loop();
}

My problem here is that if the state changes only rarely, say every couple of minutes, then the very same update/render methods are called over and over during that period: about 100 times per second in a 100 FPS game. That seems to make dynamic dispatch, which carries some performance penalty, rather pointless. Of course, one may argue that a couple of hundred virtual function calls per second is nothing even for a not-so-modern computer, and especially nothing compared to the cost of the render/update functions in a real-life scenario. But I am not quite sure. Anyway, I may have become a bit too paranoid about virtual functions, so I wanted to somehow move the virtual function calls out of the game loop, so that the only time a virtual function is called is when the game enters a new state. This is what I had in mind:


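bool exiting = false;       /* set to true elsewhere to quit the game */
bool stateChanged = false;  /* set to true whenever currentState is changed */
int currentState = 0;

template<class TState>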
void loop(TState * state) {
	while (!exiting && !stateChanged) {
		/* Handle input, time etc. here */
		state->update();
		state->render();
	}
}

struct State {
	/* No update or render function declared here! */
	virtual void run()=0;
	virtual ~State() {}
};

struct MenuState:State {
	void update() { /*...*/ }
	void render() { /*...*/ }
	void run() override { loop<MenuState>(this); }
};

struct FreeSpaceState:State {
	void update() { /*...*/ }
	void render() { /*...*/ }
	void run() override { loop<FreeSpaceState>(this); }
};

struct PlanetSurfaceState:State {
	void update() { /*...*/ }
	void render() { /*...*/ }
	void run() override { loop<PlanetSurfaceState>(this); }
};

MenuState menu;
FreeSpaceState freespace;
PlanetSurfaceState planet;
State * states[] = {&menu, &freespace, &planet};

void run() {
	while (!exiting) {
		stateChanged = false;
		states[currentState]->run();  /* Runs until next state change */
	}
}

int main() {
	run();
}

The game loop is basically the same as the one before, except that it now exits in case of a state change as well, and the containing loop() function has become a function template.

Instead of loop() being called directly by main(), it is now called by the run() method of the concrete state subclasses, each instantiating the function template with the appropriate type. The loop runs until the state changes, in which case the run() method shall be called again for the new state. This is the task of the global run() function, called by main().

There are two negative consequences. First, the code has become slightly more complicated and harder to maintain than the version above; but only SLIGHTLY, as far as I can tell from this simple example. Second, the code for the game loop will be duplicated for each concrete state; but that should not be a big problem, as the game loop in a real game should not be much more complicated than in this example.

My question: Is this a good idea at all? Does anybody else do anything like this, either in a scenario like this, or for completely different purposes? Any feedback is appreciated!


No, it's not a good idea to make your code convoluted to reduce 100 virtual function calls per second. If it were 1998, it might be different, but it's not 1998.

When you have tens of thousands of virtual function calls per second, then it might be worth looking at.

 

Thanks for the quick answer! I was suspecting something like that, but looking at the trend of reducing the amount of inheritance and virtual function calls in games (e.g. by using Entity-Component-System frameworks instead), I became unsure; although probably the trend is due to those tens of thousands of virtual calls (involving game objects).

(But if I understand correctly, there used to be a time when even a couple of virtual calls per frame was a concern that needed to be optimised - there were games developed in C++ as early as 1998, weren't there? If I ever target a very low-end platform, on the level of a Pentium-II - which I probably won't - should I refactor virtual calls? Or in that case should I forget about C++ altogether?)

36 minutes ago, dp304 said:

there were games developed in C++ as early as 1998, weren't there?

C++ compiler optimisations were also a lot worse in 1998.

38 minutes ago, dp304 said:

If I ever target a very low-end platform, on the level of a Pentium-II - which I probably won't

A Raspberry Pi (or equivalent mobile device) is somewhere on the order of an old Pentium II. So it's not outside the realm of possibility.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

3 hours ago, dp304 said:

there used to be a time when even a couple of virtual calls per frame was a concern that needed to be optimised - there were games developed in C++ as early as 1998, weren't there?

No... There were games in 1998 doing thousands of virtual calls per frame (or equivalents, such as calling tables of function pointers).

3 hours ago, dp304 said:

...looking at the trend of reducing the amount of inheritance and virtual function calls in games (e.g. by using Entity-Component-System frameworks instead...

There are lots of ECS frameworks that do things that are way slower than a simple virtual function call -- e.g. the whole "parent->GetComponent<Foo>()->DoStuff()" pattern.
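To make that comparison concrete, here is a rough sketch of how a lookup-based GetComponent might be written. It is not taken from any real framework: Entity, Component, AddComponent, Health and Damage are all hypothetical names invented for illustration, and the hash-map storage is just one assumed implementation.

```cpp
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>

// Hypothetical component base class (assumption, not a real framework's API).
struct Component {
    virtual ~Component() = default;
};

// Hypothetical entity that stores its components in a hash map keyed by type.
struct Entity {
    std::unordered_map<std::type_index, std::unique_ptr<Component>> components;

    template <class T, class... Args>
    T& AddComponent(Args&&... args) {
        auto ptr = std::make_unique<T>(std::forward<Args>(args)...);
        T& ref = *ptr;
        components[std::type_index(typeid(T))] = std::move(ptr);
        return ref;
    }

    template <class T>
    T* GetComponent() {
        // Every call hashes the type key and searches the map, then follows
        // a heap pointer: several dependent memory accesses before the
        // component's own function is even reached.
        auto it = components.find(std::type_index(typeid(T)));
        return it == components.end() ? nullptr
                                      : static_cast<T*>(it->second.get());
    }
};

// Hypothetical component with an ordinary (non-virtual) member function.
struct Health : Component {
    int hp = 100;
    void Damage(int amount) { hp -= amount; }
};
```

With this sketch, a call such as entity.GetComponent<Health>()->Damage(30) pays for a hash lookup plus two pointer dereferences on every use, whereas a plain virtual call pays for one vtable load and one indirect call.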

It's worthwhile learning how to implement virtual functions yourself so you can intuit how they work under the hood. Say you've got a simple example, like:



	#include <cstdio>

	class Base
	{
	public:
		virtual void DoStuff(int a, int b) = 0;
	};
	class Derived : public Base
	{
	public:
		void DoStuff(int a, int b) override
		{
			printf("%d, %d", a, b);
		}
	};

This is roughly equivalent to this manually written version:


#include <cstdio>

typedef void (FnDoStuff)(void*, int, int);
struct BaseVTable
{
	FnDoStuff* doStuff;
};
struct Base
{
	BaseVTable* vtable;
	void DoStuff(int a, int b)
	{
		vtable->doStuff(this, a, b);
	}
};

extern BaseVTable g_DerivedVtable;
struct Derived
{
	Base base;

	Derived()
	{
		base.vtable = &g_DerivedVtable;
	}
	void DoStuff(int a, int b)
	{
		printf("%d, %d", a, b);
	}

	static void DoStuffThunk(void* self, int a, int b)
	{
		((Derived*)self)->DoStuff(a, b);
	}
};
BaseVTable g_DerivedVtable = { Derived::DoStuffThunk };

The base class has one member: a pointer to a structure containing one function pointer for each virtual function. Each derived type defines a global instance of this structure, containing pointers to its versions of the virtual functions, and the constructor for the derived type initializes the base "vtable" pointer. When you call the virtual function on the base class, it uses the vtable pointer to fetch the function pointer (any variable access via a pointer is potentially a cache miss -- maybe basically no cost, or maybe a few hundred CPU cycles), and then it calls that function via the function pointer (which costs either the same as a regular function call or, if the CPU mispredicts the branch, maybe a penalty of a couple of dozen CPU cycles).

It's basically the same cost as any other part of your code that uses pointers -- maybe free, or maybe some cost in hundreds of nanoseconds.
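To watch the mechanism pick the right function at run time, here is a small self-contained variation on the hand-written vtable. It assumes two hypothetical derived types, Adder and Multiplier (not from the example above), which return a value instead of printing so that the result is easy to check. CallThroughBase is also a made-up helper.

```cpp
struct Base;  // forward declaration so the function type can mention Base*

// Signature shared by every implementation of the "virtual" function.
typedef int (FnCompute)(Base*, int, int);

struct BaseVTable {
    FnCompute* compute;  // one slot per virtual function
};

struct Base {
    const BaseVTable* vtable;  // filled in by the derived type's constructor
    int Compute(int a, int b) {
        // Load the vtable pointer, load the slot, then call indirectly.
        return vtable->compute(this, a, b);
    }
};

// Hypothetical derived type #1.
struct Adder {
    Base base;  // first member, so an Adder and its Base share one address
    inline Adder();
    int Compute(int a, int b) { return a + b; }
    static int ComputeThunk(Base* self, int a, int b) {
        return reinterpret_cast<Adder*>(self)->Compute(a, b);
    }
};
const BaseVTable g_AdderVtable = { Adder::ComputeThunk };
inline Adder::Adder() { base.vtable = &g_AdderVtable; }

// Hypothetical derived type #2.
struct Multiplier {
    Base base;
    inline Multiplier();
    int Compute(int a, int b) { return a * b; }
    static int ComputeThunk(Base* self, int a, int b) {
        return reinterpret_cast<Multiplier*>(self)->Compute(a, b);
    }
};
const BaseVTable g_MultiplierVtable = { Multiplier::ComputeThunk };
inline Multiplier::Multiplier() { base.vtable = &g_MultiplierVtable; }

// The caller only sees a Base*, just like code calling a real virtual function.
int CallThroughBase(Base* object, int a, int b) {
    return object->Compute(a, b);
}
```

Given an Adder named adder, CallThroughBase(&adder.base, 3, 4) ends up in Adder::Compute and returns 7; the same call on a Multiplier ends up in Multiplier::Compute and returns 12. The only per-call work beyond a normal function call is the two pointer loads inside Base::Compute.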

 

The way that game optimization usually works is that you have a target frame time for a given HW configuration -- say 16.66ms (60Hz) -- and if the game is over that frame time, you try and find bits of code that you can speed up. Usually, you're several milliseconds over, so you're looking at things that cost at least ~500μs as optimization targets to try and get that time back and get yourself under budget.

If you have one virtual function call per frame as part of your main loop, it's a potential 100ns (0.1μs / 0.0001ms) saving that you could address. That's not going to make it onto anyone's list of things to spend time on! :D
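The arithmetic behind that estimate can be written down as a few compile-time constants. This is only a sketch built on the rough figures quoted above (60Hz frame, 100ns worst-case call, 500μs threshold); none of them are measurements, and the names are made up for illustration.

```cpp
// Assumed figures, taken from the rough estimates in this thread.
constexpr double kFrameBudgetNs      = 1e9 / 60.0;  // one 60Hz frame, ~16.66ms
constexpr double kVirtualCallCostNs  = 100.0;       // pessimistic cache-miss case
constexpr double kWorthwhileTargetNs = 500e3;       // ~500us optimization target

// Fraction of one frame consumed by something that costs costNs.
constexpr double FractionOfFrame(double costNs) {
    return costNs / kFrameBudgetNs;
}

// How many calls of perCallNs each are needed to add up to targetNs.
constexpr double CallsToReach(double targetNs, double perCallNs) {
    return targetNs / perCallNs;
}

// One worst-case call is well under a thousandth of a percent of the frame.
static_assert(FractionOfFrame(kVirtualCallCostNs) < 1e-5,
              "a single call is a negligible share of the frame");
```

Under these assumptions a single worst-case virtual call is roughly 0.0006% of the frame, and CallsToReach(kWorthwhileTargetNs, kVirtualCallCostNs) comes out at 5000: you would need thousands of cache-missing virtual calls per frame before they added up to something worth chasing.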

 

Your answers have been very helpful. Especially the last paragraph with the timing estimates. Probably I won't bother optimising my game loop (at least not the way I described). Not even when working on a computer from '98. Thanks again!

