It's been decades since any but the most rudimentary microcontroller has executed instructions serially.
Pipelining, for example, is a technique in which several consecutive instructions are in different phases of interpretation and execution simultaneously. While one instruction is being loaded, the previous one is being decoded and the one before that is being executed. The pipeline can be several stages deep.
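To make the overlap concrete, here is a minimal sketch of an idealized three-stage pipeline. The stage names, the `pipeline_schedule` helper and the instruction labels are all made up for illustration; a real pipeline has more stages and can stall, which this ignores.

```python
# Hypothetical, idealized pipeline: one instruction enters per cycle, no stalls.
STAGES = ["fetch", "decode", "execute"]


def pipeline_schedule(instructions, stages=STAGES):
    """Return one dict per clock cycle mapping stage name -> instruction in it."""
    n_cycles = len(instructions) + len(stages) - 1
    schedule = []
    for cycle in range(n_cycles):
        occupancy = {}
        for depth, stage in enumerate(stages):
            # The instruction that entered the pipeline `depth` cycles ago
            # has advanced to the stage at position `depth`.
            idx = cycle - depth
            if 0 <= idx < len(instructions):
                occupancy[stage] = instructions[idx]
        schedule.append(occupancy)
    return schedule


if __name__ == "__main__":
    for cycle, occupancy in enumerate(pipeline_schedule(["i1", "i2", "i3", "i4"])):
        print(cycle, occupancy)
```

From the third cycle on, every stage is busy with a different instruction, which is the whole point: four instructions finish in six cycles instead of twelve.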
Superscalar execution means several instructions can be in the same stage of the pipeline at the same time. Speculative execution goes a step further: instructions following a branch can be loaded, decoded and even executed before the actual result of the branch decision is known, and the results of the path that turns out not to be taken are discarded. This is the essence of the Spectre exploit that hit many CPUs in 2018: it took advantage of the cache loads left behind by speculatively executed code whose results were thrown away.
Coprocessors allow instructions to execute in parallel with the main CPU. For many years, all floating point operations were done by coprocessors and CPUs only had built-in integer arithmetic. The CPU had to wait for the result of a floating point operation, but could execute other instructions in the meantime until the coprocessor signalled its results with an interrupt. A floating-point division could take as long as dozens of CPU instructions. Today we have similar behaviour, only the math coprocessor has evolved into the embarrassingly parallel coprocessor known as the GPU, and the CPU itself often has built-in vector processing units that issue synchronously.
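Most CPUs today have multiple cores, and a good compiler can vectorize a lot of code so that a single instruction operates on several data elements at once. Code that has been parallelized across threads can additionally have multiple CPU cores execute the same instructions at the same time (on different parts of the data).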
To summarize, today's CPUs tend to execute multiple assembly instructions at the same time, either in different phases of execution or with several execution cores at the same time.
Control flow is still conceptually linear, however. Interpretation of assembly starts at the beginning and the instruction pointer is either incremented with each instruction read or adjusted by some kind of jump instruction. Each instruction conceptually has the CPU in an initial state, performs a transformation of that state, and leaves the CPU in a final state. One at a time. Conceptually.
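That conceptual model is small enough to sketch as a toy interpreter. The four-instruction set below (`load`, `add`, `dec`, `jnz`) and the `run` function are invented for illustration and do not correspond to any real CPU; the point is only that each step reads one instruction, transforms the register state, and either increments the instruction pointer or jumps.

```python
def run(program, registers=None):
    """Execute a toy program strictly one instruction at a time."""
    regs = dict(registers or {})
    ip = 0  # instruction pointer
    while ip < len(program):
        op, *args = program[ip]
        ip += 1  # default: fall through to the next instruction
        if op == "load":    # load reg, constant
            regs[args[0]] = args[1]
        elif op == "add":   # add dst, src  (dst += src)
            regs[args[0]] += regs[args[1]]
        elif op == "dec":   # dec reg       (reg -= 1)
            regs[args[0]] -= 1
        elif op == "jnz":   # jnz reg, target  (jump if reg != 0)
            if regs[args[0]] != 0:
                ip = args[1]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs


# Sums 3 + 2 + 1 by looping until n reaches zero.
PROGRAM = [
    ("load", "acc", 0),
    ("load", "n", 3),
    ("add", "acc", "n"),
    ("dec", "n"),
    ("jnz", "n", 2),
]

if __name__ == "__main__":
    print(run(PROGRAM))
```

However much a real CPU overlaps, reorders or speculates under the hood, it has to produce the same final state as this kind of one-at-a-time loop would.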