A rendering engine takes geometric primitives, transforms them into projection space, and finally maps that projected data to your screen (this is what you see on your 2D screen as the rendered scene). What appears on screen is the final presentation stage of all the computations a rendering engine has to go through.
Here is an example of what happens in the DirectX case:
DirectX is what the engine uses to stream commands to the graphics adapter; without it (or a comparable graphics API) there would be no way of talking to the adapter. So a typical engine sets things up through DirectX API functions: loading model data, requesting memory from the GPU for intermediate computations, and setting up your shaders.
For a 3D rendering engine, the typical job is to take 3D data such as models and convert it to 2D pixels on screen. This is usually done by setting up a matrix that transforms every vertex of the 3D model into 2D screen coordinates. So a large part of the rendering engine's work is setup, done with the help of DirectX. You will also need to learn both a shader language (HLSL, in DirectX's case) and C++ if you are going to use DirectX.
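To make the matrix step concrete, here is a minimal CPU-side sketch in plain C++ of how a single vertex could end up as a pixel position. It is not DirectX code: the names (`Vec4`, `Mat4`, `perspectiveFovLH`, `projectToScreen`) are made up for illustration, and it assumes the left-handed, row-vector convention commonly used with Direct3D, with the vertex already in view (camera) space. In a real engine this work would normally happen in a vertex shader and the fixed-function viewport stage.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper types for this sketch (not part of DirectX).
struct Vec2 { float x, y; };
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };

// Builds a left-handed perspective projection matrix for row vectors (v * M).
// Depth is mapped to [0, 1]; fovY is the vertical field of view in radians.
Mat4 perspectiveFovLH(float fovY, float aspect, float zNear, float zFar) {
    const float yScale = 1.0f / std::tan(fovY * 0.5f);
    const float xScale = yScale / aspect;
    const float q = zFar / (zFar - zNear);
    Mat4 p = {};            // all entries start at zero
    p.m[0][0] = xScale;
    p.m[1][1] = yScale;
    p.m[2][2] = q;
    p.m[2][3] = 1.0f;       // copies view-space z into w for the divide
    p.m[3][2] = -zNear * q;
    return p;
}

// Multiplies a row vector by a matrix: result = v * a.
Vec4 transform(const Vec4& v, const Mat4& a) {
    return {
        v.x * a.m[0][0] + v.y * a.m[1][0] + v.z * a.m[2][0] + v.w * a.m[3][0],
        v.x * a.m[0][1] + v.y * a.m[1][1] + v.z * a.m[2][1] + v.w * a.m[3][1],
        v.x * a.m[0][2] + v.y * a.m[1][2] + v.z * a.m[2][2] + v.w * a.m[3][2],
        v.x * a.m[0][3] + v.y * a.m[1][3] + v.z * a.m[2][3] + v.w * a.m[3][3]
    };
}

// Perspective divide followed by the viewport mapping.
// Screen space has its origin at the top-left, so y is flipped.
Vec2 toScreen(const Vec4& clip, float width, float height) {
    assert(clip.w != 0.0f);  // a vertex at the camera position cannot be projected
    const float ndcX = clip.x / clip.w;
    const float ndcY = clip.y / clip.w;
    return { (ndcX * 0.5f + 0.5f) * width,
             (0.5f - ndcY * 0.5f) * height };
}

// View-space position -> pixel coordinates.
Vec2 projectToScreen(const Vec4& viewPos, const Mat4& proj,
                     float width, float height) {
    return toScreen(transform(viewPos, proj), width, height);
}
```

With a projection built by `perspectiveFovLH`, a point straight ahead of the camera, such as `{0, 0, 5, 1}`, lands in the middle of the viewport. A full engine would first multiply by world and view matrices, which are left out here to keep the sketch short.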
I don't have any resources on rendering engines specifically, but you could look at a DirectX 11 book by Frank Luna.