For my persistent "dynamic" buffers I like to have a "CPUWritable" flag that selects between two different behaviors. If that flag is set, the buffer is allocated out of an UPLOAD heap and can be written to directly by the CPU. To make sure that the CPU doesn't overwrite something that the GPU is reading, the buffer is internally double-buffered, and the buffers are swapped when the contents are changed by the CPU. With this setup you can flip the buffer at most once per frame (where a "frame" is denoted by a fenced submission of multiple command lists to the DIRECT queue, followed by a Present), so I have an assert that tracks the frame in which the buffer was last updated.
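To make the flip-once-per-frame rule concrete, here is a minimal CPU-only sketch. It is not the code from my repo: the class name CpuWritableBuffer, its methods, and the use of plain vectors in place of persistently mapped UPLOAD memory are all made up for illustration. It also assumes the GPU has finished the frame before last by the time you update, which is what makes two copies enough.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a persistently mapped UPLOAD-heap buffer that
// keeps two copies of its contents. No D3D12 calls are made here; the two
// vectors play the role of the mapped memory.
class CpuWritableBuffer {
public:
    explicit CpuWritableBuffer(std::size_t sizeInBytes) {
        storage_[0].resize(sizeInBytes);
        storage_[1].resize(sizeInBytes);
    }

    // True if Update() may be called during `frame` without tripping the assert.
    bool CanUpdate(std::uint64_t frame) const {
        return !everUpdated_ || frame != lastUpdateFrame_;
    }

    // Flips to the copy the GPU is not reading and writes the new contents there.
    void Update(const void* data, std::size_t size, std::uint64_t frame) {
        assert(size <= storage_[0].size());
        assert(CanUpdate(frame) && "buffer was already updated this frame");
        current_ = 1 - current_;
        std::memcpy(storage_[current_].data(), data, size);
        lastUpdateFrame_ = frame;
        everUpdated_ = true;
    }

    std::size_t CurrentIndex() const { return current_; }
    const std::uint8_t* CurrentData() const { return storage_[current_].data(); }
    const std::uint8_t* PreviousData() const { return storage_[1 - current_].data(); }

private:
    std::vector<std::uint8_t> storage_[2];
    std::size_t current_ = 0;
    std::uint64_t lastUpdateFrame_ = 0;
    bool everUpdated_ = false;
};
```

In a real implementation the two copies would be two regions of one mapped resource (or two resources), and CurrentIndex() would decide which GPU address or descriptor gets bound for the frame.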
If the CPUWritable flag is false, then the contents have to be updated by writing to temporary UPLOAD memory first, and then copying that to the actual buffer memory in a DEFAULT heap. However, I do it a little differently than you're proposing, since I use a COPY queue to do the copy instead of a DIRECT queue. Doing it on the COPY queue is trickier since you have multi-queue synchronization involved, but the upside is that the copy can potentially start earlier and run alongside other graphics work (which you usually want to do for initializing static resources). Again, to avoid writing to something that the GPU is reading from, I also double-buffer in this case and allow at most one update per frame. For the temporary memory from an UPLOAD heap that's used as a staging area, I have a ring buffer that tracks fences to know when it can move the start pointer forward.
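The fence-tracked ring buffer for staging memory can be sketched roughly like this. Again this is a simplified CPU-only illustration and not my actual upload queue code: UploadRing, Allocate, and Retire are hypothetical names, offsets stand in for addresses inside a mapped UPLOAD resource, and it assumes fence values are handed out in increasing order. In real code the value passed to Retire would come from something like ID3D12Fence::GetCompletedValue on the fence that the queue doing the copies signals.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical ring allocator over a staging region of `capacity` bytes.
class UploadRing {
public:
    explicit UploadRing(std::uint64_t capacity) : capacity_(capacity) {}

    // Reserves `size` contiguous bytes and returns their offset, or nullopt if
    // the ring is too full. `fenceValue` is the value the queue will signal
    // once the copy that reads this region has finished.
    std::optional<std::uint64_t> Allocate(std::uint64_t size, std::uint64_t fenceValue) {
        if (size == 0 || size > capacity_) return std::nullopt;
        const std::uint64_t end = (start_ + used_) % capacity_;
        // If the request would run off the end, skip the tail so it stays contiguous.
        const std::uint64_t padding = (end + size > capacity_) ? capacity_ - end : 0;
        if (used_ + padding + size > capacity_) return std::nullopt;
        pending_.push_back({padding + size, fenceValue});
        used_ += padding + size;
        return (end + padding) % capacity_;
    }

    // Moves the start pointer past every allocation whose fence has completed.
    void Retire(std::uint64_t completedFence) {
        while (!pending_.empty() && pending_.front().fenceValue <= completedFence) {
            start_ = (start_ + pending_.front().size) % capacity_;
            used_ -= pending_.front().size;
            pending_.pop_front();
        }
    }

    std::uint64_t Used() const { return used_; }

private:
    struct Pending {
        std::uint64_t size;       // bytes consumed, including any skipped tail
        std::uint64_t fenceValue; // fence value that frees this allocation
    };
    std::uint64_t capacity_;
    std::uint64_t start_ = 0;
    std::uint64_t used_ = 0;
    std::deque<Pending> pending_;
};
```

When Allocate returns nullopt, the caller has to either wait on the fence and call Retire again, or fall back to a separate allocation. Which one is better depends on how much stalling you can tolerate.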
With your approach of doing the copy on the DIRECT queue, the nice part would be that it will be synchronized with the graphics work on the GPU timeline. This means that you don't need to double-buffer, or do any synchronization beyond your barriers. But the downside is that the copy will happen synchronously with your graphics work, instead of "hiding" in other work. You'll also have to track your fence on the DIRECT queue to know when to free your chunk from the UPLOAD heap.
As for whether to keep your buffer in UPLOAD memory or copy it into DEFAULT memory, the best choice most likely depends on how you access the data. If the data is small and you're not going to do repeated random accesses to it, UPLOAD is probably fine (this covers a lot of constant buffers). If the data is larger and you access it multiple times, then it's probably worth copying it to DEFAULT so that you get full access speeds on the GPU (something like a StructuredBuffer full of lights for a forward+ renderer would probably fall into this category).
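If you wanted to encode that rule of thumb, it could look something like the sketch below. Everything in it is an assumption for illustration: the BufferUsage fields, the ChooseHeap name, and especially the 64 KiB cutoff, which is a placeholder and not a measured number. It also sends the in-between cases (small but heavily re-read, or large but read once) to DEFAULT, which is a guess rather than something I'd claim is always right. Profile on your target hardware before trusting anything like this.

```cpp
#include <cassert>
#include <cstddef>

enum class HeapChoice { Upload, Default };

// Hypothetical description of how a buffer is used; the fields are illustrative.
struct BufferUsage {
    std::size_t sizeInBytes;
    unsigned gpuReadsPerElement; // rough estimate of reads per element per frame
    bool randomAccess;           // true if shaders index it unpredictably
};

// Placeholder cutoff for "small". This is an assumption, not a measured value.
constexpr std::size_t kSmallBufferBytes = 64 * 1024;

// Small data without repeated random access stays in UPLOAD;
// everything else gets copied to DEFAULT.
inline HeapChoice ChooseHeap(const BufferUsage& usage) {
    const bool isSmall = usage.sizeInBytes <= kSmallBufferBytes;
    const bool repeatedRandom = usage.randomAccess && usage.gpuReadsPerElement > 1;
    return (isSmall && !repeatedRandom) ? HeapChoice::Upload : HeapChoice::Default;
}
```

With these placeholder numbers, a 256-byte constant buffer read once per draw would come out as Upload, and a multi-megabyte light list that every pixel indexes into would come out as Default.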
Anyway, I just wanted to share what I'm doing to give you a few ideas. I'm not claiming to have the best possible approaches here, so feel free to do what works best for you and your engine.
EDIT: I forgot to add some links to my code for reference. You can find the buffer code here, and the upload queue code here. Just be aware that the descriptor management is a bit complicated since that code uses persistent bindless descriptor indices, so there's some jumping through hoops to make sure that the descriptor index doesn't have to change when the buffer is updated.