I think the real question you need to solve first is “how does authority work when players are in different areas/zones/maps?” Everything else will probably follow from that.
You may also want to separate out “server” versus “host.” If I'm a player hosting a “server” on my local machine, that may very well also be the “client process” that I'm playing in, and thus it's both a “rendering client process” and a “server” (both topologically and authoritatively.) The alternative is to run two processes on the hosting player's machine, one for topologically being the server, and one for doing the client/presenting.
If you run multiple processes on the hosting client's machine, you may as well run more than two: one per area/map/whatever that has a player in it, plus one small process that listens to the network and routes packets to/from the various clients.
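To make the routing idea concrete, here is a rough sketch of what that small front-end process might keep track of. Everything in it is made up for illustration (the `Router` class, the area names, the in-memory queues standing in for real sockets or pipes); it only shows the bookkeeping, not actual networking.

```python
from collections import defaultdict, deque


class Router:
    """Hypothetical front-end process: it owns the network connection and
    only needs to know which area each client is currently in."""

    def __init__(self):
        self.client_area = {}                  # client id -> area name
        self.area_inbox = defaultdict(deque)   # area name -> packets queued for that area's process

    def move_client(self, client_id, area):
        # Called when a client joins or travels to another area/map.
        self.client_area[client_id] = area

    def route(self, client_id, payload):
        # Forward a packet to whichever area process is authoritative for this client.
        area = self.client_area.get(client_id)
        if area is None:
            return False                       # unknown client: drop the packet
        self.area_inbox[area].append((client_id, payload))
        return True


router = Router()
router.move_client("alice", "town")
router.move_client("bob", "cave")
router.route("alice", b"move north")   # lands in the "town" inbox
router.route("bob", b"attack")         # lands in the "cave" inbox
```

In a real setup each inbox would be a pipe or local socket to a separate area process, and moving a client between areas would also need a handoff of that player's state.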
If you want to use root motion as the movement authority for simulated entities, you can write a script that extracts only the motion part of the animation and use that on the server, if you feel that loading the full skeletal animation there uses too much memory or whatever.
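A minimal sketch of that kind of extraction script, under some loud assumptions: the animation is already exported as a list of `(time, {bone_name: (x, y, z)})` samples, the root bone is called `"root"`, and only translation matters (rotation is ignored here). None of this matches any particular engine's format; the function names are hypothetical.

```python
from bisect import bisect_right


def extract_root_track(frames, root_bone="root"):
    """Keep only the root bone's position per keyframe; drop every other bone."""
    return [(t, pose[root_bone]) for t, pose in frames]


def sample(track, t):
    """Linearly interpolate the root position at time t, clamped to the clip's range."""
    times = [key[0] for key in track]
    if t <= times[0]:
        return track[0][1]
    if t >= times[-1]:
        return track[-1][1]
    i = bisect_right(times, t)
    (t0, p0), (t1, p1) = track[i - 1], track[i]
    a = (t - t0) / (t1 - t0)
    return tuple(u + a * (v - u) for u, v in zip(p0, p1))


def displacement(track, t_from, t_to):
    """How far the root moved between two times; this is what the server would apply."""
    start, end = sample(track, t_from), sample(track, t_to)
    return tuple(e - s for s, e in zip(start, end))


# Assumed input: two keyframes of a full pose; only "root" survives extraction.
frames = [
    (0.0, {"root": (0.0, 0.0, 0.0), "spine": (0.0, 1.0, 0.0)}),
    (1.0, {"root": (2.0, 0.0, 0.0), "spine": (2.0, 1.0, 0.0)}),
]
track = extract_root_track(frames)
step = displacement(track, 0.25, 0.75)
```

The server then only loads the small root track per animation and advances entities by `displacement` each tick, while clients still play the full skeletal animation for presentation.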