Unheard Engine: A 2-Month Journey of Vulkan Learning

Published November 20, 2022

Long article warning! If you're the type who prefers to read the codebase directly, refer to the project link at the bottom of this post.

All content below is forwarded from my WordPress post.

==================================================================================================================================================================

Or to be more specific, I spent 57 days building this small, lovely Vulkan engine :)

Since I mostly worked with Unity and Unreal before, I randomly picked a word that starts with un-. This is it! My Unheard Engine. (I'll simply use the term UHE in the following paragraphs.)

Also, I'm just a graphics guy; I haven't planned to implement other subsystems like AI, physics, or audio.

If you haven't touched any Vulkan code yet, I suggest starting with the Vulkan Tutorial website. Hopefully this article can help those who are seeking more advanced applications.

Environment

Like many Vulkan novices, I started learning from the Vulkan Tutorial, and of course the Vulkan specification. You probably expect UHE to be a combination of Vulkan + GLFW + GLM.

But no! Since I had been working with DirectX 12 before, I chose the environment I'm familiar with. Also, I didn't want to just copy-paste from the tutorial this time; I hoped to do this in my own style. So UHE is a combination of Vulkan + Win32 window + DirectX math + HLSL + WIC texture loader + FBX importer.
Almost everything is the same as in my previous works; only Vulkan and the FBX SDK are new to me.

A full list of UHE dependencies:
● IDE: Visual Studio 2022
● C++ Version: C++17
● Vulkan SDK version: 1.3.224.1
● FBX SDK version: 2020.0.1
● Windows SDK Version: 10.0.20348.0
● Platform Toolset: v143
● DXC Compiler: 2022_07_18 release on their GitHub
Be sure to satisfy these requirements if you want to run UHE.

The Viking Huts scene is downloaded from CGTrader:
https://www.cgtrader.com/free-3d-models/exterior/house/viking-hut

Skybox textures from:
https://opengameart.org/content/sky-box-sunny-day

I really appreciate the artists who provide free resources for testing :)
My development machine: i7-11375H, RTX 3060 Laptop, 16GB RAM

Debug vs Release Build

I differentiated a lot of the design between debug and release builds, using a custom definition WITH_DEBUG.
Features like editor UIs, raw asset importing, GPU labeling, and Vulkan debug layer validation are only available in the debug build.
There is no need to compile editor-only features into the release build. Also, the debug build takes a significant performance hit.
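As a sketch of how such a guard works (the function name here is illustrative, not an actual UHE declaration):

	#define WITH_DEBUG 1

	#if WITH_DEBUG
	static void CreateEditorUI() { /* editor-only window setup would go here */ }
	#endif

	int main()
	{
	#if WITH_DEBUG
		// editor UIs, importers, GPU labels, and validation only exist in this build
		CreateEditorUI();
	#endif
		// release builds compile none of the editor code above
		return 0;
	}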

False Errors

There are a few false-positive errors that I don't know how to deal with at the moment.
None of them cause rendering issues, however.

VUID-VkSwapchainCreateInfoKHR-imageFormat-01778 (ERROR / SPEC)
VUID-VkImageViewCreateInfo-usage-02275 (ERROR / SPEC)
I get these only when RTSS is enabled. I guess other FPS overlay tools might have a similar issue. If you use any kind of overlay tool, try excluding UHE in the tool.

Microsoft C++ exception: MONZA::IgcThreadingContext::msg_end
This exception shows up when I call vkCreateSwapchainKHR(); that is, when resizing, toggling full screen, or restoring from a minimized window.
But again, this won't interrupt the engine, and rendering won't be messed up.

I simply ignore these issues at the moment. If someone can point out how to solve these false errors, I would really appreciate it!

Code Arrangement

Despite this being an unpaid, learning-purpose project, I took it a bit seriously and implemented lots of high-level classes instead of calling Vulkan functions everywhere.

For example, all files are implemented from scratch, though I mirrored some utilities from my previous works.
For implementations that follow a reference, I put the source link in the comments.

If you want to trace the UHE codebase / assets, I roughly separate them into these folders:
● Runtime: Contains Classes/Components/Engine/Renderer
Engine: Core files like Engine, Graphic, Input, Config...etc
Classes: Shared class implementations like Texture2D, Scene, Material, Mesh...etc
Components: Transform, Camera, Light...etc; all components have a unique ID in UHE.
Renderer: Rendering implementations; also contains a ShaderClass subfolder for shader class implementations.

● Editor: FBX, texture, and shader importers are here, as well as the editor UIs.
● Shaders: HLSL shaders, with subfolders like "post processing" and "ray tracing".
● RawAssets: Raw FBX files or image files for textures.
● Assets: UHE-generated assets.
● AssetCaches: Asset caches.
● Game: Game logic lives here. Currently it only has the UHDemoScript.cpp I use for initializing the demo scene.

Current Features

I think I've done enough introduction in the paragraphs above, so now let's proceed to the features.

-Rendering Pipeline

Here is the current rendering pipeline of UHE (Deferred Rendering):

Anything that I didn't mention here is either WIP or not done yet. So don't ask me why UHE doesn't render translucent objects, why there are no shadow passes, or why ray tracing is only used for shadows. I think the first stage of UHE is done and it's time to share :)

Despite being small at the moment, it covers all the basic usages like object rendering, light/sky rendering, post processing, and even ray tracing rendering.

-Simple Editor

Only available in the debug build. The UIs are simply created with C++ resources. Yes, that old-school C++ window stuff lol.

The File menu only has "exit" now, and the Help menu only has "about".
View Mode: Contains rendering buffer visualization for now. The shader used for the debug view is a simple pass-through shader; that is, even if the buffer format is RG, it outputs RGBA anyway. So you will likely see more red/yellow/green colors in the motion vector view mode.

Settings: Under Window->Settings, you can toggle settings here. Note that some settings need a restart to take effect. Settings also sync to the UHESettings.ini file after closing UHE properly; that is, hitting the "Stop Debugging" button in Visual Studio won't trigger a file save. I shall add a save feature afterward.
Oh, and settings with a shortcut key hint can be toggled with key input too.

-Profiling

To be honest, I haven't done any profiling yet besides a simple FPS display.
Note that UHE profiling is debug-only; you won't see the FPS in the window caption in release mode.

For now, I simply use FPS overlay tools. I'll add detailed profiling in the future.

-Asset Importing

As I said above, all raw assets put under RawAssets/ are loaded automatically; wrong formats are ignored. If you see a few dummy.txt files in my raw assets folder, those are for testing. A cache simply records the raw asset source path, the UH asset path, and the last modified time of the source file. Different asset types might store different data.

Anyway, when a raw asset is considered cached, it won't be loaded from the raw file again. Instead, the generated UH asset files are loaded.

When importing textures, only image files supported by WICTextureLoader will be loaded. So I didn't use old-school .dds textures for cube mapping.

As for importing FBX meshes, UHE loads not only the mesh data but also the transformation info, material properties, and texture usages. Since UHE doesn't have a "scene editor" yet, I have to rely on the scene info stored in the FBX.

-Shader Compiling

Using HLSL is possible in Vulkan. I'd say the DirectXShaderCompiler team are the true heroes!

\path\to\dxc.exe -spirv -T <target-profile> -E <entry-point>
                 <hlsl-src-file> -Fo <spirv-bin-file> 

Simply pass -spirv and DXC will do the magic for you. If something goes wrong, you should be able to catch errors from CreateProcess() or vkCreateShaderModule().
In UHE, more command line arguments are sent to DXC.exe:

std::string CompileCmd = " -spirv -T " + ProfileName + " -E " + EntryName + " "
		+ std::filesystem::absolute(InSource).string()
		+ " -Fo " + std::filesystem::absolute(OutputShaderPath).string()
		+ " -fvk-use-dx-layout "
		+ " -fvk-use-dx-position-w "
		+ " -fspv-target-env=vulkan1.1spirv1.4 ";

	// add define command lines
	for (const std::string& Define : Defines)
	{
		CompileCmd += " -D " + Define;
	} 

I preserved the DirectX layout and position-W functionality. I also changed the target to Vulkan 1.1 + SPIR-V 1.4 for ray tracing; without the target env, it defaults to Vulkan 1.0 and you can't implement ray tracing!

Another argument worth mentioning is the shader defines. UHE uses a simple keyword management for shaders. Consider the following code in BasePixelShader.hlsl:

#if WITH_DIFFUSE
	float3 BaseColor = DiffuseTex.Sample(DiffuseSampler, Vin.UV0).rgb * Material.DiffuseColor.rgb;
#else
	float3 BaseColor = Material.DiffuseColor.rgb;
#endif 

I don't want that sample call when a mesh is rendered without a diffuse texture, so keyword control is important. I implemented a simple shader variant workflow for compiling shaders:

So the same shader can be compiled multiple times, depending on the usage of shader defines. I generate a hash code (DJB2) for each define combination.
Shader compiling is also cached; UHE only compiles modified shaders.
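For the curious, here is a minimal sketch of hashing a define combination with DJB2 (the sorting step is my own assumption to make the hash order-independent, and the function name is illustrative):

	#include <algorithm>
	#include <cstdint>
	#include <string>
	#include <vector>

	uint32_t HashShaderDefines(std::vector<std::string> Defines)
	{
		// sort so {"WITH_DIFFUSE", "WITH_OPACITY"} hashes the same in any order
		std::sort(Defines.begin(), Defines.end());

		uint32_t Hash = 5381;
		for (const std::string& Define : Defines)
		{
			for (const char C : Define)
			{
				Hash = ((Hash << 5) + Hash) + static_cast<uint32_t>(C); // Hash * 33 + C
			}
		}

		return Hash;
	}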

-Culling

Oh, to be honest: no culling is done in UHE at the moment, not even the most basic frustum culling. There is also no draw batching.
Another detail I want to mention is that UHE uses a reversed infinite-Z depth for the best depth precision. So an object will never be clipped by a far plane once it's rendered on the GPU.
I'll do culling / optimization in the future.
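For reference, a reversed infinite-Z projection can be built like this with DirectX math (a sketch assuming the usual DirectXMath row-vector convention; UHE's actual matrix may differ):

	#include <DirectXMath.h>
	#include <cmath>
	using namespace DirectX;

	// z_clip = NearZ and w_clip = z_view, so depth is 1 at the near plane
	// and approaches 0 at infinity, with no far plane to clip against
	XMMATRIX MakeReversedInfiniteZProj(float FovY, float Aspect, float NearZ)
	{
		const float YScale = 1.0f / tanf(FovY * 0.5f);
		const float XScale = YScale / Aspect;

		return XMMATRIX(
			XScale, 0.0f,   0.0f,  0.0f,
			0.0f,   YScale, 0.0f,  0.0f,
			0.0f,   0.0f,   0.0f,  1.0f,
			0.0f,   0.0f,   NearZ, 0.0f);
	}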

Implementation Details

To be honest, this might be the longest section I've ever written by far. If you're the type who prefers to read the codebase directly, refer to the GitHub link I provided at the bottom of this article.

-The Game loop

All game engines in the world have a game loop, which is the most basic unit in a game app.

while (msg.message != WM_QUIT)
    {
        if (PeekMessage(&msg, 0, 0, 0, PM_REMOVE))
        {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
        else
        {
            // call the game loop
            if (GUnheardEngine)
            {
            #if WITH_DEBUG
                GUnheardEngine->GetEditor()->OnEditorUpdate();
            #endif

                // update despite being minimized (can be opted out in the future)
                GUnheardEngine->Update();

                // only render when it's not minimized
                if (!GIsMinimized)
                {
                    GUnheardEngine->RenderLoop();
                }
            }
        }
    } 

It's really simple at the moment; there's only Update + Rendering in UHE.
Note that I didn't pause the Update function, only the RenderLoop. So the logic is still updated when the window is offscreen.

EditorUpdate(): This merely syncs the state between the editor UI controls and the runtime.
Update(): Game timer ticking, game script calls, rendering feature toggling, scene updates, renderer updates, raw input updates, and the frame counter are here.
RenderLoop(): Notifies the rendering thread.

The entry point file also contains the window message callbacks. UHE deals with messages like WM_INPUT, WM_DESTROY, WM_COMMAND, and WM_SIZE.
In the following, I'll only mention the rendering parts of UHE.

-Graphic Interface Initialization

Similar to the Vulkan tutorial, it's a long piece of work creating the instance, physical device, logical device, window surface, queue family, swap chain...etc.

	// extension defines, hard code for now
	InstanceExtensions = { "VK_KHR_surface"
		, "VK_KHR_win32_surface"
		, "VK_KHR_get_surface_capabilities2"
		, "VK_KHR_get_physical_device_properties2" };

	DeviceExtensions = { VK_KHR_SWAPCHAIN_EXTENSION_NAME
		, "VK_EXT_full_screen_exclusive"
		, "VK_KHR_spirv_1_4"
		, "VK_KHR_shader_float_controls" };

	RayTracingExtenstions = { "VK_KHR_deferred_host_operations"
		, "VK_KHR_acceleration_structure"
		, "VK_KHR_ray_tracing_pipeline"
		, "VK_KHR_ray_query"
		, "VK_KHR_pipeline_library" };

	// push ray tracing extension
	if (GEnableRayTracing)
	{
		DeviceExtensions.insert(DeviceExtensions.end(), RayTracingExtenstions.begin(), RayTracingExtenstions.end());
	} 

The list above shows the extensions used in UHE.
Since the ray tracing feature is here, I made the create info more verbose than usual:

	// define features, enable what I need in UH
	VkPhysicalDeviceFeatures DeviceFeatures{};
	DeviceFeatures.samplerAnisotropy = true;
	DeviceFeatures.fullDrawIndexUint32 = true;

	// check ray tracing & AS & ray query feature
	VkPhysicalDeviceAccelerationStructureFeaturesKHR ASFeatures{};
	ASFeatures.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ACCELERATION_STRUCTURE_FEATURES_KHR;

	VkPhysicalDeviceRayQueryFeaturesKHR RQFeatures{};
	RQFeatures.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_RAY_QUERY_FEATURES_KHR;
	RQFeatures.pNext = &ASFeatures;

	VkPhysicalDeviceRayTracingPipelineFeaturesKHR RTFeatures{};
	RTFeatures.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_RAY_TRACING_PIPELINE_FEATURES_KHR;
	RTFeatures.pNext = &RQFeatures;

	// 1_2 runtime features
	VkPhysicalDeviceVulkan12Features Vk12Features{};
	Vk12Features.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VULKAN_1_2_FEATURES;
	Vk12Features.pNext = &RTFeatures;

	// device features need to be assigned in features 2
	VkPhysicalDeviceFeatures2 PhyFeatures{};
	PhyFeatures.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
	PhyFeatures.features = DeviceFeatures;
	PhyFeatures.pNext = &Vk12Features;

	vkGetPhysicalDeviceFeatures2(PhysicalDevice, &PhyFeatures);
	if (!RTFeatures.rayTracingPipeline)
	{
		UHE_LOG(L"Ray tracing pipeline not supported. System won't render ray tracing effects.\n");
		GEnableRayTracing = false;
	}

	// get RT feature props
	VkPhysicalDeviceRayTracingPipelinePropertiesKHR RTPropsFeatures{};
	RTPropsFeatures.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_RAY_TRACING_PIPELINE_PROPERTIES_KHR;

	VkPhysicalDeviceProperties2 Props2{};
	Props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
	Props2.pNext = &RTPropsFeatures;
	vkGetPhysicalDeviceProperties2(PhysicalDevice, &Props2);
	ShaderRecordSize = RTPropsFeatures.shaderGroupHandleSize;

	// device create info, pass raytracing feature to pNext of create info
	VkDeviceCreateInfo CreateInfo{};
	CreateInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
	CreateInfo.pQueueCreateInfos = &QueueCreateInfo[0];
	CreateInfo.queueCreateInfoCount = 2;
	CreateInfo.pEnabledFeatures = nullptr;
	CreateInfo.pNext = &PhyFeatures; 

It's just all kinds of pNext setup: query the features you need and enable them.

-Rendering Updates

This happens in UHDeferredShadingRenderer::Update(). For now it's just uploading the constant or storage buffers used for rendering. A dirty-flag workflow is used when uploading a buffer:

	if (Light->IsRenderDirty(CurrentFrame))
	{
		UHDirectionalLightConstants DirLightC = Light->GetConstants();
		DirectionalLightBuffer[CurrentFrame]->UploadData(&DirLightC, Light->GetBufferDataIndex());
		Light->SetRenderDirty(false, CurrentFrame);
	}

The same workflow also applies to the system buffer, material buffer, and object buffer.
You don't have to upload everything every frame, which is best for performance.
Setting the dirty flag is up to the component implementation; for example, renderers and lights are uploaded when their transforms change.
Only the system buffer is uploaded every frame for now.

Also, all buffers that get uploaded are created as so-called "uploading buffers".
That is, an uploading buffer is mapped right after creation and only unmapped when destroyed. This is more efficient than calling vkMapMemory()/vkUnmapMemory() every frame!

The only note is that uploading buffers must be carefully synchronized with fences. For now, UHE defines GMaxFrameInFlight as 2, so every upload buffer has 2 instances.
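A minimal sketch of that idea (an illustrative struct, not the actual UHE class; it assumes the memory was allocated with HOST_VISIBLE | HOST_COHERENT properties):

	#include <vulkan/vulkan.h>
	#include <cstring>

	struct UploadBufferSketch
	{
		VkDevice       Device = VK_NULL_HANDLE;
		VkDeviceMemory Memory = VK_NULL_HANDLE;
		void*          Mapped = nullptr;
		VkDeviceSize   Stride = 0;

		// map once right after creation instead of every frame
		void MapOnce(VkDeviceSize TotalSize)
		{
			vkMapMemory(Device, Memory, 0, TotalSize, 0, &Mapped);
		}

		// a per-frame upload is then a plain memcpy; HOST_COHERENT spares the flush
		void UploadData(const void* Src, VkDeviceSize ElementIndex)
		{
			memcpy(static_cast<char*>(Mapped) + ElementIndex * Stride, Src, Stride);
		}

		// unmap only when the buffer is destroyed
		void Release()
		{
			vkUnmapMemory(Device, Memory);
		}
	};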

-Rendering Initialization

This happens in UHDeferredShadingRenderer::Initialize(). As the name says, it's for initialization.

Mesh initialization

By the time UHE enters Initialize(), all meshes are already loaded on the CPU side. This step simply creates the GPU buffers and uploads to them.

	// create mesh buffer
	// VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT is necessary for buffer address access
	VkBufferUsageFlags VBFlags = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT;
	VkBufferUsageFlags IBFlags = VK_BUFFER_USAGE_INDEX_BUFFER_BIT;

	if (GEnableRayTracing)
	{
		VBFlags |= VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_BUILD_INPUT_READ_ONLY_BIT_KHR | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT | VK_BUFFER_USAGE_STORAGE_BUFFER_BIT;
		IBFlags |= VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_BUILD_INPUT_READ_ONLY_BIT_KHR | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT | VK_BUFFER_USAGE_STORAGE_BUFFER_BIT;
	}

	VertexBuffer->CreaetBuffer(GetVertexCount(), VBFlags);
	IndexBuffer->CreaetBuffer(GetIndicesCount(), IBFlags); 

Besides VK_BUFFER_USAGE_VERTEX_BUFFER_BIT and VK_BUFFER_USAGE_INDEX_BUFFER_BIT, a few more buffer usages are set for ray tracing, because I need to build acceleration structures from these buffers and use them in the ray tracing shaders later.
Inside the buffer creation function, it's the regular vkCreateBuffer, vkAllocateMemory, vkBindBufferMemory sequence. Just be aware that if the buffer has the VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT usage, a VkMemoryAllocateFlagsInfo with VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT must be set in the pNext of VkMemoryAllocateInfo.

        VkMemoryAllocateInfo AllocInfo{};
        AllocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
        AllocInfo.allocationSize = MemRequirements.size;
        AllocInfo.memoryTypeIndex = UHUtilities::FindMemoryTypes(&DeviceMemoryProperties, MemRequirements.memoryTypeBits, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);

        VkMemoryAllocateFlagsInfo MemFlagInfo{};
        MemFlagInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_FLAGS_INFO;
        MemFlagInfo.flags = VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT;

        if (bIsShaderDeviceAddress)
        {
            // put the VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT flag if buffer usage has VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT
            AllocInfo.pNext = &MemFlagInfo;
        } 

Bottom Level Acceleration Structure Initialization

To do ray tracing, acceleration structures need to be built. (I'll use AS for short.) Refer to DirectX Raytracing (DXR) for more details about RT if you haven't experienced it before.

In brief, the bottom level AS is for geometries, that is, your meshes. It contains all the triangles that are going to be traced. Let's start from the code:

	// filling geometry info, always assume Opaque bit here, I'll override it in top-level AS when necessary
	uint32_t MaxPrimitiveCounts = InMesh->GetIndicesCount() / 3;
	VkAccelerationStructureGeometryKHR GeometryKHR{};
	GeometryKHR.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_KHR;
	GeometryKHR.geometryType = VK_GEOMETRY_TYPE_TRIANGLES_KHR;
	GeometryKHR.flags = VK_GEOMETRY_OPAQUE_BIT_KHR;

	// filling triangles VB/IB infos
	GeometryKHR.geometry.triangles = VkAccelerationStructureGeometryTrianglesDataKHR{};
	GeometryKHR.geometry.triangles.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_TRIANGLES_DATA_KHR;

	// set format for Vertex position, which is float3
	// with proper stride, system should fetch vertex pos properly
	GeometryKHR.geometry.triangles.vertexFormat = VK_FORMAT_R32G32B32_SFLOAT;
	GeometryKHR.geometry.triangles.vertexStride = InMesh->GetVertexBuffer()->GetBufferStride();
	GeometryKHR.geometry.triangles.vertexData.deviceAddress = GetDeviceAddress(InMesh->GetVertexBuffer()->GetBuffer());
	GeometryKHR.geometry.triangles.maxVertex = InMesh->GetHighestIndex();
	GeometryKHR.geometry.triangles.indexType = (InMesh->GetIndexBuffer()->GetBufferStride() == 4) ? VK_INDEX_TYPE_UINT32 : VK_INDEX_TYPE_UINT16;
	GeometryKHR.geometry.triangles.indexData.deviceAddress = GetDeviceAddress(InMesh->GetIndexBuffer()->GetBuffer()); 

VkAccelerationStructureGeometryKHR is the first structure needed. As shown in the code, I set the geometry type to VK_GEOMETRY_TYPE_TRIANGLES_KHR; I don't plan to ray trace other geometry types at the moment. The flag is also set to VK_GEOMETRY_OPAQUE_BIT_KHR. As a practice, always mark the bottom level AS as opaque; if translucency or alpha testing is needed and you really need any hit shaders, you can override this flag when building the top level AS.

One parameter that might confuse you is geometry.triangles.vertexFormat. It's impossible to find a format which covers all mesh properties; this format is only for the position data, which is usually the first data in a vertex. Since my position is defined as a float3, I assign VK_FORMAT_R32G32B32_SFLOAT and the proper stride, and the ray tracing system will fetch vertex positions properly.

As for geometry.triangles.vertexData.deviceAddress and indexData.deviceAddress, I must get the device address of each buffer; that's why I create the VB/IB with the VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT usage. Getting a device address is done through vkGetBufferDeviceAddress:

VkDeviceAddress UHAccelerationStructure::GetDeviceAddress(VkBuffer InBuffer)
{
	VkBufferDeviceAddressInfo AddressInfo{};
	AddressInfo.sType = VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO;
	AddressInfo.buffer = InBuffer;

	return vkGetBufferDeviceAddress(LogicalDevice, &AddressInfo);
} 

After GeometryKHR is filled, you can proceed to VkAccelerationStructureBuildGeometryInfoKHR.

	// filling geometry info
	VkAccelerationStructureBuildGeometryInfoKHR GeometryInfo{};
	GeometryInfo.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_BUILD_GEOMETRY_INFO_KHR;
	GeometryInfo.type = VK_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL_KHR;
	GeometryInfo.mode = VK_BUILD_ACCELERATION_STRUCTURE_MODE_BUILD_KHR;
	GeometryInfo.flags = VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR;
	GeometryInfo.geometryCount = 1;
	GeometryInfo.pGeometries = &GeometryKHR; 

This part fills the build type, AS type, build mode, and flags. Since I create one bottom level AS per mesh, I set geometryCount to 1 and point to the GeometryKHR.
If you'd like to build multiple meshes into one bottom level AS, just fill in more GeometryKHRs.

Now I have the information for the AS. The next thing to do is query the size information for creating the AS, and then create it.

	// fetch the size info before creating the AS based on the geometry info
	PFN_vkGetAccelerationStructureBuildSizesKHR GetASBuildSizesKHR = (PFN_vkGetAccelerationStructureBuildSizesKHR)vkGetInstanceProcAddr(VulkanInstance, "vkGetAccelerationStructureBuildSizesKHR");
	VkAccelerationStructureBuildSizesInfoKHR SizeInfo{};
	SizeInfo.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_BUILD_SIZES_INFO_KHR;
	GetASBuildSizesKHR(LogicalDevice, VK_ACCELERATION_STRUCTURE_BUILD_TYPE_DEVICE_KHR, &GeometryInfo, &MaxPrimitiveCounts, &SizeInfo);

	// build bottom-level AS after getting proper sizes
	AccelerationStructureBuffer = GfxCache->RequestRenderBuffer<BYTE>();
	AccelerationStructureBuffer->CreaetBuffer(SizeInfo.accelerationStructureSize, VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_STORAGE_BIT_KHR);

	VkAccelerationStructureCreateInfoKHR CreateInfo{};
	CreateInfo.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_CREATE_INFO_KHR;
	CreateInfo.type = VK_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL_KHR;
	CreateInfo.buffer = AccelerationStructureBuffer->GetBuffer();
	CreateInfo.size = SizeInfo.accelerationStructureSize;

	PFN_vkCreateAccelerationStructureKHR CreateASKHR = (PFN_vkCreateAccelerationStructureKHR)vkGetInstanceProcAddr(VulkanInstance, "vkCreateAccelerationStructureKHR");
	if (CreateASKHR(LogicalDevice, &CreateInfo, nullptr, &AccelerationStructure) != VK_SUCCESS)
	{
		UHE_LOG(L"Failed to create bottom level AS!\n");
	} 

Another scary piece, but it's actually straightforward:
● Call vkGetAccelerationStructureBuildSizesKHR() to fill VkAccelerationStructureBuildSizesInfoKHR.
● Create a buffer with the same size as the accelerationStructureSize returned by the device.
● Fill VkAccelerationStructureCreateInfoKHR and call vkCreateAccelerationStructureKHR().

It's not finished yet! vkCreateAccelerationStructureKHR() only allocates the VkAccelerationStructureKHR. We also need to call vkCmdBuildAccelerationStructuresKHR() to build the AS on the GPU.

	// allocate scratch buffer as well, this buffer is for initialization
	ScratchBuffer = GfxCache->RequestRenderBuffer<BYTE>();
	ScratchBuffer->CreaetBuffer(SizeInfo.buildScratchSize, VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT);
	GeometryInfo.scratchData.deviceAddress = GetDeviceAddress(ScratchBuffer->GetBuffer());

	// actually build AS, this needs to push command
	VkAccelerationStructureBuildRangeInfoKHR RangeInfo{};
	RangeInfo.primitiveCount = MaxPrimitiveCounts;
	const VkAccelerationStructureBuildRangeInfoKHR* RangeInfos[1] = { &RangeInfo };
	GeometryInfo.dstAccelerationStructure = AccelerationStructure;

	PFN_vkCmdBuildAccelerationStructuresKHR CmdBuildASKHR = (PFN_vkCmdBuildAccelerationStructuresKHR)vkGetInstanceProcAddr(VulkanInstance, "vkCmdBuildAccelerationStructuresKHR");
	CmdBuildASKHR(InBuffer, 1, &GeometryInfo, RangeInfos); 

Before the build, a scratch buffer must be created and assigned to GeometryInfo. This scratch buffer is a temporary buffer used during the build only and can be released after the build is done.

And finally, bottom level AS is built.

Top Level Acceleration Structure Initialization

Similar to bottom level AS building. You fill all kinds of information structures and build it.
Top level AS building starts with VkAccelerationStructureInstanceKHR.

		// hit everything for now
		VkAccelerationStructureInstanceKHR InstanceKHR{};
		InstanceKHR.mask = 0xff;

		// set bottom level address
		VkAccelerationStructureKHR BottomLevelAS = InRenderers[Idx]->GetMesh()->GetBottomLevelAS()->GetAS();
		InstanceKHR.accelerationStructureReference = GetDeviceAddress(BottomLevelAS);

		// copy transform3x4
		XMFLOAT3X4 Transform3x4 = MathHelpers::MatrixTo3x4(InRenderers[Idx]->GetWorldMatrix());
		std::copy(&Transform3x4.m[0][0], &Transform3x4.m[0][0] + 12, &InstanceKHR.transform.matrix[0][0]);

		// two-sided flag
		if (Mat->GetCullMode() == VK_CULL_MODE_NONE)
		{
			InstanceKHR.flags |= VK_GEOMETRY_INSTANCE_TRIANGLE_FACING_CULL_DISABLE_BIT_KHR;
		}

		// non-opaque flag, cutoff is treated as translucent as well so I can ignore the hit on culled pixel
		if (Mat->GetBlendMode() > UHBlendMode::Opaque)
		{
			InstanceKHR.flags |= VK_GEOMETRY_INSTANCE_FORCE_NO_OPAQUE_BIT_KHR;
		}

		// set material buffer data index as instance id, so I can fetch material data from StructuredBuffer in hit group shader
		InstanceKHR.instanceCustomIndex = Mat->GetBufferDataIndex();

		InstanceKHRs.push_back(InstanceKHR); 

● mask: Can be used for culling during ray tracing; for now I set it to 0xff.
● accelerationStructureReference: The address of the bottom level AS. The relationship is like a renderer and a mesh; you can reuse the same mesh with different renderers.
● transform: The world transform for the instance. Be sure to convert it to a 3x4 matrix.
● flags: For now I check the cull mode and decide whether to disable triangle face culling during ray tracing. You might ask: what about back/front culling? Those are specified in the shader, not here.
Another flag is VK_GEOMETRY_INSTANCE_FORCE_NO_OPAQUE_BIT_KHR, which is for alpha-tested or translucent objects; the any hit shader will be called for such instances.
● instanceCustomIndex: A user-defined index. For now I assign the material buffer data index, so I can call InstanceID() in the ray tracing shader and utilize it.

The second step of the top level AS is to create an AS instance buffer and upload all the instance info added above.

	// create instance KHR buffer for later use
	ASInstanceBuffer = GfxCache->RequestRenderBuffer<VkAccelerationStructureInstanceKHR>();
	ASInstanceBuffer->CreaetBuffer(InstanceCount, VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_BUILD_INPUT_READ_ONLY_BIT_KHR | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT);
	ASInstanceBuffer->UploadAllData(InstanceKHRs.data());

	// setup instance type
	VkAccelerationStructureGeometryKHR GeometryKHR{};
	GeometryKHR.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_KHR;
	GeometryKHR.geometryType = VK_GEOMETRY_TYPE_INSTANCES_KHR;
	GeometryKHR.geometry.instances = VkAccelerationStructureGeometryInstancesDataKHR{};
	GeometryKHR.geometry.instances.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_INSTANCES_DATA_KHR;
	GeometryKHR.geometry.instances.data.deviceAddress = GetDeviceAddress(ASInstanceBuffer->GetBuffer());

	// geometry count must be 1 when it's top level
	VkAccelerationStructureBuildGeometryInfoKHR GeometryInfo{};
	GeometryInfo.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_BUILD_GEOMETRY_INFO_KHR;
	GeometryInfo.type = VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR;
	GeometryInfo.mode = VK_BUILD_ACCELERATION_STRUCTURE_MODE_BUILD_KHR;
	GeometryInfo.flags = VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR | VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_KHR;
	GeometryInfo.geometryCount = 1;
	GeometryInfo.pGeometries = &GeometryKHR; 

The remaining parts are similar to the bottom level AS: get the size info, allocate the VkAccelerationStructureKHR, and call vkCmdBuildAccelerationStructuresKHR at the end.
Note that I also set the VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_KHR flag so I can update the top level AS later for dynamic objects.

The code piece for mesh initialization:

	// needs the cmd buffer
	VkCommandBuffer CreationCmd = GraphicInterface->BeginOneTimeCmd();

	for (UHMeshRendererComponent* Renderer : Renderers)
	{
		UHMesh* Mesh = Renderer->GetMesh();
		UHMaterial* Mat = Renderer->GetMaterial();
		Mesh->CreateGPUBuffers(GraphicInterface);

		if (!Mat->IsSkybox())
		{
			Mesh->CreateBottomLevelAS(GraphicInterface, CreationCmd);
		}
	}
	GraphicInterface->EndOneTimeCmd(CreationCmd);

	// create top level AS after bottom level AS is done
	// can't be recorded in the same command buffer!! All bottom level AS must be built before building the top level AS
	CreationCmd = GraphicInterface->BeginOneTimeCmd();
	for (int32_t Idx = 0; Idx < GMaxFrameInFlight; Idx++)
	{
		TopLevelAS[Idx] = GraphicInterface->RequestAccelerationStructure();
		RTInstanceCount = TopLevelAS[Idx]->CreateTopAS(Renderers, CreationCmd);
	}
	GraphicInterface->EndOneTimeCmd(CreationCmd); 

All bottom level AS must be created (and finish uploading to the GPU) before initializing the top level AS! Otherwise weird things will happen during ray tracing, such as missing triangles or wrong transformations...

Texture Initialization

Step 1: Upload all textures which are really used for rendering.
Step 2: Generate mipmaps for all uploaded textures.
Step 3: Build all cubemaps in use.
The mipmap generation is based on the Vulkan Tutorial one, so I'll skip it here.
As for building cube maps, I've mentioned that I didn't import dds files. So a cube map in UHE must be built from Texture2D objects, that is, six separate images for a cube. Here is the building code:

	// simply copy all slices into cube map
	for (int32_t Idx = 0; Idx < 6; Idx++)
	{
		// if texture slices isn't built yet, build it
		Slices[Idx]->UploadToGPU(InGfx, InCmd, InGraphBuilder);
		Slices[Idx]->GenerateMipMaps(InGfx, InCmd, InGraphBuilder);

		// transition and copy, all mip maps need to be copied
		for (uint32_t Mdx = 0; Mdx < Slices[Idx]->GetMipMapCount(); Mdx++)
		{
			InGraphBuilder.ResourceBarrier(Slices[Idx], VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, Mdx);
			InGraphBuilder.ResourceBarrier(this, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, Mdx, Idx);
			InGraphBuilder.CopyTexture(Slices[Idx], this, Mdx, Idx);
			InGraphBuilder.ResourceBarrier(this, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, Mdx, Idx);
			InGraphBuilder.ResourceBarrier(Slices[Idx], VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, Mdx);
		}
	} 

The ResourceBarrier function is just a high-level wrapper around vkCmdPipelineBarrier(), and CopyTexture() wraps vkCmdCopyImage(). I designed these functions with mip level and layer slice support, so they can copy not only a whole image but also a single mip slice.
Refer to UHGraphicBuilder::CopyTexture() to see how the copying is done.
The order of the textures sent to the cube map needs to be: +X/-X/+Y/-Y/+Z/-Z.

When creating a cube map, the image type is still VK_IMAGE_TYPE_2D, since it's an image array with exactly 6 slices.

// during image creation
if (InInfo.ViewType == VK_IMAGE_VIEW_TYPE_CUBE)
{
	CreateInfo.arrayLayers = 6;
	CreateInfo.flags |= VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT;
}

// during image view creation
if (InViewType == VK_IMAGE_VIEW_TYPE_CUBE)
{
	CreateInfo.subresourceRange.layerCount = 6;
}

In both image and image view creation, you must specify the layer count as 6 and set VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT.

Other initialization

As for creating the render pass, framebuffer, swap chain, and descriptors: they're similar to the Vulkan tutorials, so I won't mention them here. They're all wrapped in high-level classes anyway, and it would be too verbose to paste all of them here.

I only want to mention that I don't call vkUpdateDescriptorSets() every frame. If the resource binding hasn't changed, you don't have to update the descriptor set. I only call it during initialization, when a resize event occurs, when the top level AS is rebuilt, and in post processing.
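The idea, as a sketch (an illustrative helper; the real UHE shader classes handle this differently in detail):

	// skip vkUpdateDescriptorSets() when the bound view hasn't changed
	void BindImageIfChanged(VkDevice Device, VkDescriptorSet Set, uint32_t Binding
		, VkImageView NewView, VkImageView& CachedView)
	{
		if (NewView == CachedView)
		{
			return; // binding unchanged, no descriptor write needed this frame
		}

		VkDescriptorImageInfo ImageInfo{};
		ImageInfo.imageView = NewView;
		ImageInfo.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;

		VkWriteDescriptorSet Write{};
		Write.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
		Write.dstSet = Set;
		Write.dstBinding = Binding;
		Write.descriptorCount = 1;
		Write.descriptorType = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE;
		Write.pImageInfo = &ImageInfo;

		vkUpdateDescriptorSets(Device, 1, &Write, 0, nullptr);
		CachedView = NewView;
	}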

-Rendering Loop

Finally, this article enters the last part of UHE: the rendering loop.
For now, the rendering is completely done in a render thread; the main thread is for updating logic. The following is the code piece of the rendering loop:

while (true)
	{
		// wait until main thread notify
		std::unique_lock<std::mutex> Lock(RenderMutex);
		WaitRenderThread.wait(Lock, [this] {return !bIsThisFrameRenderedShared; });

		if (bIsRenderThreadDoneShared)
		{
			break;
		}

		// prepare necessary variable
		UHGraphicBuilder GraphBuilder(GraphicInterface, MainCommandBuffers[CurrentFrame]);

		// similar to D3D12 fence wait/reset
		GraphBuilder.WaitFence(MainFences[CurrentFrame]);
		GraphBuilder.ResetFence(MainFences[CurrentFrame]);

		// begin command buffer, it will reset command buffer inline
		GraphBuilder.BeginCommandBuffer();
		GraphicInterface->BeginCmdDebug(GraphBuilder.GetCmdList(), "Drawing UHDeferredShadingRenderer");
	
		// ****************************** start scene rendering
		RenderBasePass(GraphBuilder);
		BuildTopLevelAS(GraphBuilder);
		DispatchRayPass(GraphBuilder);
		RenderLightPass(GraphBuilder);
		RenderSkyPass(GraphBuilder);
		RenderMotionPass(GraphBuilder);
		RenderPostProcessing(GraphBuilder);

		// blit scene to swap chain
		uint32_t PresentIndex = RenderSceneToSwapChain(GraphBuilder);

		// ****************************** end scene rendering
		GraphicInterface->EndCmdDebug(GraphBuilder.GetCmdList());
		GraphBuilder.EndCommandBuffer();
		GraphBuilder.ExecuteCmd(MainFences[CurrentFrame], SwapChainAvailableSemaphores[CurrentFrame], RenderFinishedSemaphores[CurrentFrame]);

		// present
		bIsResetNeededShared = !GraphBuilder.Present(RenderFinishedSemaphores[CurrentFrame], PresentIndex);

		// advance frame
		CurrentFrame = (CurrentFrame + 1) % GMaxFrameInFlight;

		// tell main thread to continue
		bIsThisFrameRenderedShared = true;
		Lock.unlock();
		RenderThreadFinished.notify_one();
	} 

This rendering loop waits until the notification from the main thread arrives, and the main thread in turn waits for the render thread after notifying it.
After rendering is done, the render thread notifies the main thread. Such synchronization is needed in case I want to toggle full screen, toggle vsync...etc, and also for rendering up-to-date results.
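The main-thread side of this handshake would look roughly like this (a sketch reusing the variable names above; the real UHE code may differ in detail):

	#include <condition_variable>
	#include <mutex>

	std::mutex RenderMutex;
	std::condition_variable WaitRenderThread;     // the render thread waits on this
	std::condition_variable RenderThreadFinished; // the main thread waits on this
	bool bIsThisFrameRenderedShared = true;

	void NotifyRenderThreadAndWait()
	{
		{
			// hand a new frame to the render thread
			std::lock_guard<std::mutex> Lock(RenderMutex);
			bIsThisFrameRenderedShared = false;
		}
		WaitRenderThread.notify_one();

		// block until the render thread flips the flag back
		std::unique_lock<std::mutex> Lock(RenderMutex);
		RenderThreadFinished.wait(Lock, [] { return bIsThisFrameRenderedShared; });
	}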

Shader & Descriptor Management

All descriptors used in rendering are defined in a custom shader class. I'll use my UHSkyPassShader as an example here:

// Skypass shader implementation
class UHSkyPassShader : public UHShaderClass
{
public:
	UHSkyPassShader() {}
	UHSkyPassShader(UHGraphic* InGfx, std::string Name, VkRenderPass InRenderPass)
		: UHShaderClass(InGfx, Name, typeid(UHSkyPassShader))
	{
		// Sky pass: bind system/object layout and sky cube texture/sampler layout
		AddLayoutBinding(1, VK_SHADER_STAGE_VERTEX_BIT | VK_SHADER_STAGE_FRAGMENT_BIT, VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER);
		AddLayoutBinding(1, VK_SHADER_STAGE_VERTEX_BIT, VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER);
		AddLayoutBinding(1, VK_SHADER_STAGE_FRAGMENT_BIT, VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE);
		AddLayoutBinding(1, VK_SHADER_STAGE_FRAGMENT_BIT, VK_DESCRIPTOR_TYPE_SAMPLER);

		CreateDescriptor();
		ShaderVS = InGfx->RequestShader("SkyboxVertexShader", "Shaders/SkyboxVertexShader.hlsl", "SkyboxVS", "vs_6_0");
		ShaderPS = InGfx->RequestShader("SkyboxPixelShader", "Shaders/SkyboxPixelShader.hlsl", "SkyboxPS", "ps_6_0");

		// states
		UHRenderPassInfo Info = UHRenderPassInfo(InRenderPass, UHDepthInfo(true, false, VK_COMPARE_OP_GREATER_OR_EQUAL)
			, VK_CULL_MODE_FRONT_BIT
			, UHBlendMode::Opaque
			, ShaderVS
			, ShaderPS
			, 1
			, PipelineLayout);

		GraphicState = InGfx->RequestGraphicState(Info);
	}
}; 

Managing descriptors can be a nightmare, so I suggest wrapping them in a high-level implementation. In UHE, they're managed in a shader class; since the descriptor bindings are reflected in the shader, I think they deserve to be put together.
However, I have not implemented a shader reflection system yet, so for now the descriptor layout bindings must be declared in the same order as in the shader.
The shader class is also where the raw shaders and graphic states are requested.

Pool Management

You will see lots of InGfx->RequestXXXX calls in UHE. They're for reusing all kinds of Vulkan objects. Taking this Viking Huts scene as an example: despite having 747 renderers, UHE only has the following numbers of objects:
● 5 samplers
● 16 graphic states
The graphic interface will try to find an already-created state and return it when possible.
If I just created graphic states for all renderers, it would be doomsday for performance.
Try to reuse all states when possible!
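The pooling idea, sketched with illustrative types and a stub hash (the actual UHE pool lives in the graphic interface):

	#include <memory>
	#include <utility>
	#include <vector>

	struct UHRenderPassInfo { /* formats, blend mode, cull mode, shaders... */ };
	struct UHGraphicState { explicit UHGraphicState(const UHRenderPassInfo&) {} };

	// hash every field that affects pipeline creation; trivial stub for this sketch
	size_t HashRenderPassInfo(const UHRenderPassInfo&) { return 0; }

	class UHGraphicStatePool
	{
	public:
		UHGraphicState* Request(const UHRenderPassInfo& Info)
		{
			// return an existing state when the hash matches: 747 renderers, 16 states
			const size_t Hash = HashRenderPassInfo(Info);
			for (const auto& Entry : States)
			{
				if (Entry.first == Hash)
				{
					return Entry.second.get();
				}
			}

			States.emplace_back(Hash, std::make_unique<UHGraphicState>(Info));
			return States.back().second.get();
		}

	private:
		std::vector<std::pair<size_t, std::unique_ptr<UHGraphicState>>> States;
	};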

Base Pass Rendering

A PBR-based calculation is used, but there are not many metallic objects in the Viking Huts scene, so only a few shields and the metal parts of the barrels show metallic effects, as in the top image. For now, 5 GBuffers are used in UHE:
● Scene Diffuse: R8G8B8A8_SRGB, the A channel stores the occlusion value.
● Scene Normal: R10G10B10A2_UNORM, the A channel is not used at the moment.
● Scene PBR: R8G8B8A8_UNORM, RGB for specular, A for roughness.
● Scene Result: R16G16B16A16_SFLOAT, both emissive and indirect specular are added to it too.
● Scene Mip: R16_SFLOAT, stores the "DeltaMax" of the ddx/ddy of the mesh UV, so I can calculate the mipmap level outside the pixel shader.

Texture2D DiffuseTex : register(t3);
SamplerState DiffuseSampler : register(s4);

Texture2D OcclusionTex : register(t5);
SamplerState OcclusionSampler : register(s6);

Texture2D SpecularTex : register(t7);
SamplerState SpecularSampler : register(s8);

Texture2D NormalTex : register(t9);
SamplerState NormalSampler : register(s10);

Texture2D OpacityTex : register(t11);
SamplerState OpacitySampler : register(s12);

TextureCube EnvCube : register(t13);
SamplerState EnvSampler : register(s14);

Texture2D MetallicTex : register(t15);
SamplerState MetallicSampler : register(s16);

Texture2D RoughnessTex : register(t17);
SamplerState RoughnessSampler : register(s18);

#if WITH_OPACITY
	float Opacity = OpacityTex.Sample(OpacitySampler, Vin.UV0).r * Material.DiffuseColor.a;
	clip(Opacity - Material.Cutoff);
#endif 

If you have a look at BasePixelShader.hlsl, those descriptor defines look scary. But as I said, I've tried to reuse all state objects in UHE, so there is a chance all the SamplerStates here point to the same object. Performance won't be an issue for now, unless all the SamplerStates are different.
In practice, we'd like to combine these textures, for example putting diffuse and opacity together as RGBA, putting the specular and metallic textures together...etc.
Since UHE doesn't have an editor and relies on the FBX information, they're all separate at the moment.

If you're following the Vulkan tutorial, you might be interested in how to bind multiple render targets (MRT), since it only mentions single RT creation.

	std::vector<VkAttachmentDescription> ColorAttachments;
	std::vector<VkAttachmentReference> ColorAttachmentRefs;

	ColorAttachments.resize(RTCount);
	ColorAttachmentRefs.resize(RTCount);

	for (size_t Idx = 0; Idx < RTCount; Idx++)
	{
		// create color attachment, this part decides how the RT is going to be used
		VkAttachmentDescription ColorAttachment{};
		ColorAttachment.format = InFormat[Idx];
		ColorAttachment.samples = VK_SAMPLE_COUNT_1_BIT;
		ColorAttachment.loadOp = InTransitionInfo.LoadOp;
		ColorAttachment.storeOp = InTransitionInfo.StoreOp;
		ColorAttachment.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
		ColorAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
		ColorAttachment.initialLayout = InTransitionInfo.InitialLayout;
		ColorAttachment.finalLayout = InTransitionInfo.FinalLayout;
		ColorAttachments[Idx] = ColorAttachment;

		// define attachment ref for color attachment
		VkAttachmentReference ColorAttachmentRef{};
		ColorAttachmentRef.attachment = static_cast<uint32_t>(Idx);
		ColorAttachmentRef.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
		ColorAttachmentRefs[Idx] = ColorAttachmentRef;
	}


	VkFramebufferCreateInfo FramebufferInfo{};
	FramebufferInfo.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
	FramebufferInfo.renderPass = InRenderPass;
	FramebufferInfo.attachmentCount = static_cast<uint32_t>(InImageView.size());
	FramebufferInfo.pAttachments = InImageView.data(); 

The key is to push multiple VkAttachmentDescriptions and VkAttachmentReferences.
Also don't forget to set FramebufferInfo.attachmentCount. Pretty straightforward :).
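The remaining piece that ties those refs to the render pass is the subpass description (standard Vulkan, continuing the variable names from the snippet above):

	VkSubpassDescription Subpass{};
	Subpass.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
	Subpass.colorAttachmentCount = static_cast<uint32_t>(ColorAttachmentRefs.size());
	Subpass.pColorAttachments = ColorAttachmentRefs.data();

	VkRenderPassCreateInfo RenderPassInfo{};
	RenderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
	RenderPassInfo.attachmentCount = static_cast<uint32_t>(ColorAttachments.size());
	RenderPassInfo.pAttachments = ColorAttachments.data();
	RenderPassInfo.subpassCount = 1;
	RenderPassInfo.pSubpasses = &Subpass;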

Update Top Level AS

Updating the top level AS is simple:

	// upload changed data to ASInstanceBuffer
	bool bNeedUpdate = false;
	for (size_t Idx = 0; Idx < InstanceKHRs.size(); Idx++)
	{
		if (RendererCache[Idx]->IsRayTracingDirty(CurrentFrame))
		{
			// copy transform3x4
			XMFLOAT3X4 Transform3x4 = MathHelpers::MatrixTo3x4(RendererCache[Idx]->GetWorldMatrix());
			std::copy(&Transform3x4.m[0][0], &Transform3x4.m[0][0] + 12, &InstanceKHRs[Idx].transform.matrix[0][0]);

			ASInstanceBuffer->UploadData(&InstanceKHRs[Idx], Idx);
			RendererCache[Idx]->SetRayTracingDirty(false, CurrentFrame);
			bNeedUpdate = true;
		}
	}

	if (bNeedUpdate)
	{
		// update it 
		GeometryInfoCache.mode = VK_BUILD_ACCELERATION_STRUCTURE_MODE_UPDATE_KHR;
		GeometryInfoCache.srcAccelerationStructure = GeometryInfoCache.dstAccelerationStructure;
		GeometryInfoCache.pGeometries = &GeometryKHRCache;
		const VkAccelerationStructureBuildRangeInfoKHR* RangeInfos[1] = { &RangeInfoCache };

		PFN_vkCmdBuildAccelerationStructuresKHR CmdBuildASKHR = (PFN_vkCmdBuildAccelerationStructuresKHR)vkGetInstanceProcAddr(VulkanInstance, "vkCmdBuildAccelerationStructuresKHR");
		CmdBuildASKHR(InBuffer, 1, &GeometryInfoCache, RangeInfos);
	} 

It's just calling vkCmdBuildAccelerationStructuresKHR() again with the new instance information.
Since the parameters passed to GeometryInfo during initialization are just address references, I don't have to rebuild everything here; I only update what is needed.

Ray Tracing Shadows

Another exciting part of UHE. I've mentioned how to build the AS for ray tracing, but you still need to set up the ray generation, closest hit, or any hit shaders, and create an RT graphic state.

	VkRayTracingPipelineCreateInfoKHR CreateInfo{};
	CreateInfo.sType = VK_STRUCTURE_TYPE_RAY_TRACING_PIPELINE_CREATE_INFO_KHR;
	CreateInfo.maxPipelineRayRecursionDepth = InInfo.MaxRecursionDepth;
	CreateInfo.layout = InInfo.PipelineLayout;

	// set RG shader
	std::string RGEntryName = InInfo.RayGenShader->GetEntryName();
	VkPipelineShaderStageCreateInfo RGStageInfo{};
	RGStageInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
	RGStageInfo.stage = VK_SHADER_STAGE_RAYGEN_BIT_KHR;
	RGStageInfo.module = InInfo.RayGenShader->GetShader();
	RGStageInfo.pName = RGEntryName.c_str();

	// set closest hit shader
	std::string CHGEntryName = InInfo.ClosestHitShader->GetEntryName();
	VkPipelineShaderStageCreateInfo CHGStageInfo{};
	CHGStageInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
	CHGStageInfo.stage = VK_SHADER_STAGE_CLOSEST_HIT_BIT_KHR;
	CHGStageInfo.module = InInfo.ClosestHitShader->GetShader();
	CHGStageInfo.pName = CHGEntryName.c_str();

	// set any hit shader (if there is one)
	std::string AHGEntryName;
	VkPipelineShaderStageCreateInfo AHGStageInfo{};
	if (InInfo.AnyHitShader != nullptr)
	{
		// fetch the entry name only when the shader exists, to avoid a null dereference
		AHGEntryName = InInfo.AnyHitShader->GetEntryName();
		AHGStageInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
		AHGStageInfo.stage = VK_SHADER_STAGE_ANY_HIT_BIT_KHR;
		AHGStageInfo.module = InInfo.AnyHitShader->GetShader();
		AHGStageInfo.pName = AHGEntryName.c_str();
	}

	VkPipelineShaderStageCreateInfo ShaderStages[] = { RGStageInfo, CHGStageInfo, AHGStageInfo };
	CreateInfo.stageCount = 2;
	if (InInfo.AnyHitShader != nullptr)
	{
		CreateInfo.stageCount = 3;
	}
	CreateInfo.pStages = ShaderStages;

	// setup group info for both RG and HG
	VkRayTracingShaderGroupCreateInfoKHR RGGroupInfo{};
	RGGroupInfo.sType = VK_STRUCTURE_TYPE_RAY_TRACING_SHADER_GROUP_CREATE_INFO_KHR;
	RGGroupInfo.type = VK_RAY_TRACING_SHADER_GROUP_TYPE_GENERAL_KHR;
	RGGroupInfo.closestHitShader = VK_SHADER_UNUSED_KHR;
	RGGroupInfo.anyHitShader = VK_SHADER_UNUSED_KHR;
	RGGroupInfo.intersectionShader = VK_SHADER_UNUSED_KHR;
	RGGroupInfo.generalShader = 0;

	VkRayTracingShaderGroupCreateInfoKHR HGGroupInfo{};
	HGGroupInfo.sType = VK_STRUCTURE_TYPE_RAY_TRACING_SHADER_GROUP_CREATE_INFO_KHR;
	HGGroupInfo.type = VK_RAY_TRACING_SHADER_GROUP_TYPE_TRIANGLES_HIT_GROUP_KHR;
	HGGroupInfo.closestHitShader = 1;
	HGGroupInfo.anyHitShader = 2;
	HGGroupInfo.intersectionShader = VK_SHADER_UNUSED_KHR;
	HGGroupInfo.generalShader = VK_SHADER_UNUSED_KHR;

	VkRayTracingShaderGroupCreateInfoKHR GroupInfos[] = { RGGroupInfo , HGGroupInfo };
	CreateInfo.groupCount = 2;
	CreateInfo.pGroups = GroupInfos;

	// set payload size
	VkRayTracingPipelineInterfaceCreateInfoKHR PipelineInterfaceInfo{};
	PipelineInterfaceInfo.sType = VK_STRUCTURE_TYPE_RAY_TRACING_PIPELINE_INTERFACE_CREATE_INFO_KHR;
	PipelineInterfaceInfo.maxPipelineRayPayloadSize = InInfo.PayloadSize;
	PipelineInterfaceInfo.maxPipelineRayHitAttributeSize = InInfo.AttributeSize;
	CreateInfo.pLibraryInterface = &PipelineInterfaceInfo;

	// create state for ray tracing pipeline
	PFN_vkCreateRayTracingPipelinesKHR CreateRTPipeline = 
		(PFN_vkCreateRayTracingPipelinesKHR)vkGetInstanceProcAddr(VulkanInstance, "vkCreateRayTracingPipelinesKHR");
	
	VkResult Result = CreateRTPipeline(LogicalDevice, VK_NULL_HANDLE, VK_NULL_HANDLE, 1, &CreateInfo, nullptr, &RTPipeline); 

So, the ray generation shader is the main entry point of a ray tracing pass; it's where the first TraceRay() call happens. You will have different ray generation shaders for different RT renderings. The other shader you need is the hit group shader, which decides the behavior after a hit. For now, there are 3 shaders for RT shadows: one ray generation shader, plus a closest hit shader for opaque objects and an any hit shader for alpha-tested objects.
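One piece the snippet above doesn't show is the shader binding table that the ray dispatch consumes. A hedged sketch of that step (ShaderRecordSize is the shaderGroupHandleSize stored earlier; the table buffer and its address variables are illustrative, and real records must respect shaderGroupBaseAlignment):

	// fetch the handles of the 2 groups (ray generation + hit group) created above
	const uint32_t GroupCount = 2;
	std::vector<uint8_t> Handles(GroupCount * ShaderRecordSize);

	PFN_vkGetRayTracingShaderGroupHandlesKHR GetHandlesKHR = (PFN_vkGetRayTracingShaderGroupHandlesKHR)vkGetInstanceProcAddr(VulkanInstance, "vkGetRayTracingShaderGroupHandlesKHR");
	GetHandlesKHR(LogicalDevice, RTPipeline, 0, GroupCount, Handles.size(), Handles.data());

	// the handles then go into a buffer created with
	// VK_BUFFER_USAGE_SHADER_BINDING_TABLE_BIT_KHR | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT,
	// and each region points into that buffer
	VkStridedDeviceAddressRegionKHR RayGenRegion{};
	RayGenRegion.deviceAddress = RayGenTableAddress; // device address of the raygen record
	RayGenRegion.stride = ShaderRecordSize;
	RayGenRegion.size = ShaderRecordSize;

	VkStridedDeviceAddressRegionKHR HitRegion{};
	HitRegion.deviceAddress = HitTableAddress; // device address of the hit group record
	HitRegion.stride = ShaderRecordSize;
	HitRegion.size = ShaderRecordSize;

	VkStridedDeviceAddressRegionKHR MissRegion{}; // can stay empty when no miss shader is used
	VkStridedDeviceAddressRegionKHR CallableRegion{};

	PFN_vkCmdTraceRaysKHR CmdTraceRaysKHR = (PFN_vkCmdTraceRaysKHR)vkGetInstanceProcAddr(VulkanInstance, "vkCmdTraceRaysKHR");
	CmdTraceRaysKHR(Cmd, &RayGenRegion, &MissRegion, &HitRegion, &CallableRegion, Width, Height, 1);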

Code pieces from RayTracingShadow.hlsl:

// defined in UHRTCommon.hlsli
struct UHDefaultPayload
{
	bool IsHit() 
	{ 
		return HitT > 0; 
	}

	float HitT;
};


	float MaxDist = 0;
	float ShadowStrength = 0;
	float HitCount = 0;
	for (uint Ldx = 0; Ldx < UHNumDirLights; Ldx++)
	{
		// shoot ray from world pos to light dir
		UHDirectionalLight DirLight = UHDirLights[Ldx];

		// give a little gap for preventing self-shadowing, along the vertex normal direction
		// distant pixel needs higher TMin
		float Gap = lerp(0.01f, 0.5f, saturate(MipRate * RT_MIPRATESCALE));

		RayDesc ShadowRay = (RayDesc)0;
		ShadowRay.Origin = WorldPos + WorldNormal * Gap;
		ShadowRay.Direction = -DirLight.Dir;

		ShadowRay.TMin = Gap;
		ShadowRay.TMax = float(1 << 20);

		UHDefaultPayload Payload = (UHDefaultPayload)0;
		TraceRay(TLAS, 0, 0xff, 0, 0, 0, ShadowRay, Payload);

		// store the max hit T to the result, system will perform PCSS later
		// also output shadow strength (Color.a)
		if (Payload.IsHit())
		{
			MaxDist = max(MaxDist, Payload.HitT);
			ShadowStrength += DirLight.Color.a;
			HitCount++;
		}
	}

	ShadowStrength *= 1.0f / max(HitCount, 1.0f);
	Result[PixelCoord] = float2(MaxDist, ShadowStrength); 

It shoots a ray from the world position reconstructed from the depth buffer toward the light direction. I didn't shoot a "test" ray just for getting the world position; I want to keep the number of rays traced minimal. After a hit, I record the max hit distance for the PCSS filter and average the shadow strength, so tracing shadows from multiple lights won't be a problem.

As shown in the image, shadows from different lights will be blended.

At last, there is still a puzzle to solve: how do I know the material and mesh information of the hit object? This is done in the hit group shader; here's a code piece from RayTracingHitGroup.hlsl:

// texture/sampler tables, access this with the index from material struct
Texture2D UHTextureTable[] : register(t0, space1);
SamplerState UHSamplerTable[] : register(t0, space2);

// VB & IB data, access them with InstanceIndex()
StructuredBuffer<VertexInput> UHVertexTable[] : register(t0, space3);
ByteAddressBuffer UHIndicesTable[] : register(t0, space4); 

In D3D12, there is a feature called the "local root signature" which allows the user to bind local descriptors. Vulkan doesn't seem to have an equivalent, so I can only put my resources in the descriptor arrays here.
When fetching the material, I use the InstanceID() defined when creating the top level AS.
When fetching VB/IBs, I use the InstanceIndex() generated by the RT system; for instance 0 this is 0, for instance 1 it's 1...etc. Different instances may use the same mesh, but I still have to make the descriptor arrays as long as the instance count, since that's how InstanceIndex() works.
Hopefully Vulkan adds a local descriptor feature afterward!

And here, register(tX, spaceY) will be bound as binding slot X in descriptor set Y.
That is, multiple descriptor sets are needed:

	// bind descriptors and RT states
	std::vector<VkDescriptorSet> DescriptorSets = { RTShadowShader.GetDescriptorSet(CurrentFrame), RTTextureTable.GetDescriptorSet(CurrentFrame), RTSamplerTable.GetDescriptorSet(CurrentFrame)
		, RTVertexTable.GetDescriptorSet(CurrentFrame), RTIndicesTable.GetDescriptorSet(CurrentFrame) };
	GraphBuilder.BindRTDescriptorSet(RTShadowShader.GetPipelineLayout(), DescriptorSets);
	GraphBuilder.BindRTState(RTShadowShader.GetRTState()); 

After these bindings, I can fetch the vertex and material information in the hit group shader properly!

Light Pass Rendering

The lighting BRDF and indirect lighting are calculated here based on the GBuffers.
Indirect lighting now simply comes from ground color + saturate(normal.y * sky color); I'll consider a more realistic formula like SH9 in the future.

On the other hand, a PCSS filter is applied when sampling the RT shadows. For now it's hard-coded as 5x5 PCSS in the shader. I adjust the penumbra value based on the distance to the blocker, the mipmap rate, and the depth difference.
I don't want distant pixels to have a high penumbra. Also, the ray traced shadow is done in screen space, so there is a chance of sampling the wrong neighbor pixels, which causes some artifacts on object edges.

With these checks, RT shadows work fine for me now. In the image of the PCSS effect, pixels which are close to the blocker are sharp, and soft when away from the blocker.
Refer to LightPixelShader.hlsl for programming details. I've only implemented directional lights currently.

Sky Pass Rendering

The easiest rendering in UHE(?) It just renders a skybox.

Motion Vector Rendering

Since temporal AA is used in UHE, I need the motion vectors for dealing with some artifacts of the temporal method.

In brief, this has two passes on the same RT:
CameraMotionPS: This simply builds motion by reconstructing the world position from depth and calculating with the current and previous frame's ViewProj matrices. Suitable for static objects.
CameraObjectPS: The camera motion, however, can't deal with moving objects; if the camera isn't moving, the camera motion will always be 0. So object motion needs to be rendered for moving objects. This pass only renders objects which are marked "motion dirty", for the best performance. I don't have to render object motion for static objects!
Refer to MotionVectorShader.hlsl for details.

Post Processing Rendering

In UHE, another PostProcessRT is prepared for this and used alternately with SceneResultRT. That is, the two RTs keep blitting to each other, accumulating the post processing results. So I created 2 render pass and framebuffer instances for the proper image transition behavior.

Tone Mapping

UHE uses Stephen Hill (@self_shadow)'s ACES fit; he deserves all the credit for coming up with this fit and implementing it!

In brief, he condensed all the conversions needed for ACES into only 2 matrices plus a rational fit.
After tone mapping, the scene won't be as bright, as shown in the image.
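For reference, here is that fit transcribed to C++ (the widely shared BakingLab version; a sketch, not UHE's exact shader code):

	#include <algorithm>

	struct Float3 { float X, Y, Z; };

	static Float3 Mul3x3(const float M[3][3], Float3 V)
	{
		return { M[0][0] * V.X + M[0][1] * V.Y + M[0][2] * V.Z,
			M[1][0] * V.X + M[1][1] * V.Y + M[1][2] * V.Z,
			M[2][0] * V.X + M[2][1] * V.Y + M[2][2] * V.Z };
	}

	// sRGB => XYZ => D65_2_D60 => AP1 => RRT_SAT
	static const float ACESInputMat[3][3] =
	{
		{ 0.59719f, 0.35458f, 0.04823f },
		{ 0.07600f, 0.90834f, 0.01566f },
		{ 0.02840f, 0.13383f, 0.83777f }
	};

	// ODT_SAT => XYZ => D60_2_D65 => sRGB
	static const float ACESOutputMat[3][3] =
	{
		{  1.60475f, -0.53108f, -0.07367f },
		{ -0.10208f,  1.10813f, -0.00605f },
		{ -0.00327f, -0.07276f,  1.07602f }
	};

	// the rational fit that replaces the full RRT + ODT evaluation
	static float RRTAndODTFit(float V)
	{
		const float A = V * (V + 0.0245786f) - 0.000090537f;
		const float B = V * (0.983729f * V + 0.4329510f) + 0.238081f;
		return A / B;
	}

	Float3 ACESFitted(Float3 Color)
	{
		Color = Mul3x3(ACESInputMat, Color);
		Color = { RRTAndODTFit(Color.X), RRTAndODTFit(Color.Y), RRTAndODTFit(Color.Z) };
		Color = Mul3x3(ACESOutputMat, Color);

		// clamp to [0, 1]
		Color.X = std::clamp(Color.X, 0.0f, 1.0f);
		Color.Y = std::clamp(Color.Y, 0.0f, 1.0f);
		Color.Z = std::clamp(Color.Z, 0.0f, 1.0f);
		return Color;
	}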

Temporal AA

Details are in TemporalAAPixelShader.hlsl. It's a common AA method in modern games, despite the fact that it can blur the scene. In brief, it samples the current frame and the previous frame with a jitter sequence. This introduces some issues:
Ghosting: This can be easily solved by adding the motion to the sample UV.
Disocclusion: This happens when an object is occluded in the previous frame but shows up in the current frame. I solved it with a "motion rejection" method: if the motion difference between the current frame and the previous frame is larger than a threshold, it won't sample the history RT.
Missing depth: For example, the edge between an object and skybox pixels, which have no depth. This causes another ghosting problem. For this, I use a 3x3 mask to find the max depth within the neighboring pixels; if the depth value is still 0, it won't sample the history RT either.

Refer to TemporalAAPixelShader.hlsl, and to this awesome TAA introduction website if you're not familiar with the technique.

Since the temporal AA here has only 2 samples, there would still be some aliasing in the scene. Usually this method is combined with FXAA, which isn't implemented yet; I'll consider adding FXAA in the future (and try to apply FXAA to the "edges" only). The AA comparison:

Swap Chain Presentation

Finally, the last part of the rendering! It blits the result of post processing to the swap chain, and vkCmdBlitImage() takes care of the linear-sRGB conversion for me.
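The blit itself is a standard call (the variable names here are illustrative). VK_FILTER_LINEAR also lets it rescale, which matters for the next paragraph:

	VkImageBlit BlitInfo{};
	BlitInfo.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
	BlitInfo.srcSubresource.layerCount = 1;
	BlitInfo.srcOffsets[1] = { (int32_t)RenderWidth, (int32_t)RenderHeight, 1 };
	BlitInfo.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
	BlitInfo.dstSubresource.layerCount = 1;
	BlitInfo.dstOffsets[1] = { (int32_t)WindowWidth, (int32_t)WindowHeight, 1 };

	// source: the post processing result; destination: the acquired swap chain image
	vkCmdBlitImage(Cmd
		, SceneResult, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL
		, SwapChainImage, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
		, 1, &BlitInfo, VK_FILTER_LINEAR);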

There is also one more fact: in UHE, the swap chain resolution is different from the rendering resolution.
The swap chain size follows the window size, and the render resolution is based on the settings.
Why am I doing this? Because I found that full screen toggling is quite different from DX12.
vkAcquireFullScreenExclusiveModeEXT() doesn't really go full screen like IDXGISwapChain::SetFullscreenState() does; it just acquires exclusive mode, as the name says.

After hitting alt+enter (or through the settings window), I resize the window to the same resolution as the desktop, then call vkAcquireFullScreenExclusiveModeEXT(). That is, full screen in UHE is more like a borderless full screen (or windowed full screen). This is the main reason I separated the swap chain and rendering resolutions: I want to be able to change the resolution in the full screen state too!

Conclusions

Finally, this article is coming to an end :D. If you really read all these verbose paragraphs, hold my beer! Hopefully this can help Vulkan learners!

Personal thoughts on Vulkan:
● It's more verbose than D3D12, especially when it comes to object management. Every Vulkan object needs a call to a vkDestroyXXXX() function, while most D3D12 interfaces inherit from IUnknown, which implements Release(). That makes management easier.
● Vulkan doesn't seem to have a thread-safe VkQueue; that is, only one thread can call vkQueueSubmit at once. Also, a VkCommandBuffer can't be reset and reused immediately after calling vkEndCommandBuffer, while D3D12 allows ID3D12CommandQueue::ExecuteCommandLists to be called from different threads, and an ID3D12CommandList can be reset and reused immediately after execution. This can really affect the design of parallel submission a lot. I'll study this in the future.
● Vulkan doesn't seem to have local descriptors like D3D12. But this can be worked around, so it's fine for me.
● Overall, Vulkan is equally as powerful as D3D12 :). It provides everything an engine developer needs. I guess it's more powerful on the Linux platform.

The GitHub Link (code only):
https://github.com/EasyJellySniper/Unheard-Engine

The Full Project Link (including assets): https://mega.nz/file/p0ICFJbJ#TL5Vdu6FEyCFdCwd_7rhT3N_UaLfb4HkG1XMP9cYMzA

I recommend the full project link, since it contains the assets.
It might have issues if run with the GitHub code only.
Also, the DXC and FBX SDK DLLs are not included on GitHub.
