
ASM program flow

Started by October 16, 2020 04:28 PM
18 comments, last by Gnollrunner 4 years ago

I'm trying to understand the sequence in which ASM instructions are executed. Does execution take place in the order the instructions appear in the text editor? Does it work the same as a C++ program, with the difference being that there are no operators and scope brackets?

So if it works like a C++ program (with commands getting executed top to bottom, row by row), what happens to the registers? Does one CPU cycle execute one ASM code row?

My project's facebook page is “DreamLand Page”

In broad terms, ASM is executed top to bottom. Of course you are going to jump around for function calls, control flow and loops (represented by jumps to labels). There is also a lot of nuance to what happens between the C++ you see, the ASM you see and what the CPU actually does, for instance pipelining (which, very broadly, means the CPU tries to execute multiple instructions concurrently). Not sure which aspects you're interested in getting more details on.

Calin said:
Does one CPU cycle execute one ASM code row?

From my understanding, not at all. Let's keep pipelining aside, but even so, instructions have different latencies; some can take multiple CPU cycles to execute. Then there is the whole memory/caching ordeal, where the CPU will have to burn cycles waiting on some result from memory, making the number of cycles an instruction takes very unpredictable on modern systems.


Thanks for sharing, Juliean

My project's facebook page is “DreamLand Page”

Back in the '90s, it depended on what CPU you were running on and the version of ASM…

Say if you were running on an Intel 386SX, the ASM instructions on those used to execute one after the other;

then when superscalar CPUs arrived, starting with the Pentium (if I remember well), you could write ASM instructions one after the other which were executed in parallel at the same time, because they did not have dependencies on each other: this was called instruction pairing on the Pentium's dual pipelines, and compilers such as Borland C++ in those days did not schedule for it, so you had to write the ASM code yourself -boom-

// retro pseudo

void shade_tri(int b, ...)
{
  int a = b + 3;
  asm
  {
  ....
  	// hardcoding unknown and forgettable magical numbers was allowed
    mov eax, 987641 	// this could run in the first pipe (TMapping)
    mov edx, 7	 		// this one in the second pipe (Gouraud lighting)
    jmp home 			// good ol' jump :-)
  ....
  }
  return a;
}

Pairing executed your code fast, and prior to (for example) hardware texture-mapping support on video cards, this was one way you could do perspective-correct texture mapping in pipe 1 and Gouraud lighting in pipe 2 at the same time, then mix the register results to shade your triangle…. mad days!

Today, let your compiler do it for you if you can… as for me, I'm home now.

That's it… all the best!

It's been decades since anything but the most rudimentary microcontrollers executed instructions strictly serially.

Pipelining, for example, is a technique in which several consecutive instructions are in different phases of interpretation and execution simultaneously. While one instruction is being loaded, the previous one is being decoded and the one before that is being executed. The pipeline can be several stages deep.

Superscalar execution means several instructions can be in the same stage of the pipeline at the same time. With speculative execution, both branches of a decision can be loaded, decoded and even executed in parallel until the actual result of the branch decision is known and the non-taken branch's results are discarded. This is the essence of the Spectre exploit that hit many CPUs recently: it took advantage of cache loads performed by speculatively executed, non-taken branches.

Coprocessors allow the simultaneous execution of multiple instructions in parallel. For many years, all floating-point operations were done by coprocessors and CPUs only had built-in integer arithmetic. The CPU had to wait for floating-point operations to complete, but could execute additional instructions in the meantime until the copro signalled its result with an interrupt: a floating-point division could take the same time as dozens of CPU instructions. Today we have similar behaviour, only the math copro has evolved into the embarrassingly parallel coprocessor known as the GPU, and the CPU itself often has built-in vector (SIMD) units that operate on several data elements per instruction.

Most CPUs today have multiple cores, and a good compiler can vectorize a lot of code for the SIMD units, while multithreaded code lets several cores execute the same instructions at the same time (on different parts of the data).

To summarize, today's CPUs tend to execute multiple assembly instructions at the same time, either in different phases of execution or with several execution cores at the same time.

Control flow is still conceptually linear, however. Interpretation of assembly starts at the beginning and the instruction pointer is either incremented with each instruction read or adjusted by some kind of jump instruction. Each instruction conceptually has the CPU in an initial state, performs a transformation of that state, and leaves the CPU in a final state. One at a time. Conceptually.

Stephen M. Webb
Professional Free Software Developer

Seriously, all this text wall to confuse the OP?

Whatever the processor does in the background is TOTALLY IRRELEVANT to your question.
Your assembly will appear to be processed serially, of course. If it isn't, the processor is BROKEN.

Fruny: Ftagn! Ia! Ia! std::time_put_byname! Mglui naflftagn std::codecvt<eY'ha-nthlei!,char,mbstate_t>


Endurion said:
Seriously, all this text wall to confuse the OP?

It doesn't confuse me, I think. It's useful to have a broad idea about what's hiding behind the word “pipelining” (introduction-level type of knowledge)

My project's facebook page is “DreamLand Page”

By the way @calin, I remembered this link; this site is well known for its optimization series for C++. I don't know if you know it, but have a look here: https://www.agner.org/optimize

There are 5 parts; parts 2 and 4 may be of particular interest to you… or all of them (free to download…).
The instruction tables show you latencies per instruction (in clock cycles);

Also, depending on your compiler's optimization settings for the targeted platform:

  • your C++ will be reordered by your compiler if necessary (new compilers can reorder your compiled C++ object code for the sake of parallel execution; this is also known as instruction scheduling) (in some cases, the compiler will add “code” suitable for an optimized step)
  • your ASM can be reordered by your CPU (today's CPUs can reorder instructions without the help of the compiler: out-of-order execution), but the compiler can make it easier for the CPU
  • there may be instruction code jumps taking place between CPU cores (if your processor has more than one core and you have coded your program to use more than one core)

You as a programmer can also do some reordering AND/OR reorganizing in your C++ code to tell the compiler what you want to see happen, for example:

  • you can reorder a struct's members to maximize how well it fits into the L1 cache of your CPU, …
  • you can, as another example, reorganize your C++ code's loops to take advantage of vectorization or loop unrolling

(Finally, maybe not so directly related to your OP but still good to know), the other thing you may be interested in is the C++ memory model. In short, it has been revamped. You can specify how memory accesses around atomic operations (to say the least, for example) are to be ordered; you can read more about it here: https://en.cppreference.com/w/cpp/atomic/memory_order

In layman's terms it's like saying: “I want to code this in 2 or 3 threads, I want this memory access kept intact, this other one not so much, but I'll let you -compiler- deal with it”, or not, etc.

These are good days to code; we've never been treated like this before!

Anyway, hope this clarifies it a bit more;

Until then!

Endurion said:
Seriously, all this text wall to confuse the OP?

Calin said:
It doesn`t confuse me I think

There is a lesson to be learned here. But I'm not getting banned by saying it.

🙂🙂🙂🙂🙂 ← The tone posse, ready for action.

Why would you get banned? This is a discussion forum after all.

Of course all kinds of pipelining and instruction reordering are done by the CPU and the compiler, but for the logic, none of these must make any difference. To your code, everything appears to run serially.

Since the question was posted in “For Beginners”, a simple yes or no would've been enough, followed up by a more thorough explanation. Maybe I just interpreted too much into the replies.
Glad the OP got the gist though.

Fruny: Ftagn! Ia! Ia! std::time_put_byname! Mglui naflftagn std::codecvt<eY'ha-nthlei!,char,mbstate_t>

This topic is closed to new replies.
