In Computer Science, when do you learn the fundamentals of high-level languages and the methods a compiler uses to generate assembly instructions? What is the primary book used for this course? For example, if you’re using IDA or Ghidra and trying to piece together what is happening during binary execution, I want to know which structures to look for at this basic level.

I’m asking about the simple stuff you find in Arduino sketches: variables, type declarations, branching, looping, booleans, flags, interrupts, etc. I’d also like to know how these might differ across architectures like CISC/RISC and Harvard/von Neumann, and across platform specifics like unique instruction set architecture implementations.

I have several microcontrollers with FlashForth running the threaded interpreter. I never learned to branch and loop in FF like I can in Bash, Arduino, or Python. I hope exploring the post topic will help me fill in the gap in my understanding using the good ol’ hacker’s chainsaw. If any of you can read between the lines of this inquiry and make an inference that might be helpful, please show me the shortcuts. I am a deeply intuitive learner who needs to build from a foundation of application rather than memorization or theory. TIA

  • @abhibeckert
    10 months ago

    “What is the primary book used”

    There isn’t one. Most people don’t learn this stuff by reading a book.

    The best way to learn is to look at actual assembly code and research what each instruction does. I wouldn’t start with compiler-generated code: being machine-generated, it’s often quite messy and, obviously, undocumented. It’s best to start with easier-to-read code like the two examples I’ve included below: a simple “print Hello World” in CISC (32-bit x86 Linux), then in RISC (32-bit ARM Linux).

    Notice CISC uses mov, int and xor, while RISC uses mov, ldr, and svc. You should look those up in a manual (plenty of free ones online) but in simple terms:

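    • mov: copies a value into a register. On x86 it can also copy between a register and memory; on ARM it only works with registers and constants, which is why ARM also needs ldr. RISC and CISC share the instruction name, but the instructions are not identical
    • int: means “interrupt” (a software interrupt); it pauses the program and hands execution over to other software, here the kernel
    • xor: a bitwise exclusive OR of two values. XORing a register with itself always gives zero, so xor ebx, ebx is a common way to set ebx to 0
    • ldr: is “load register”; it loads a value from memory into a register. Here it loads the address of the string
    • svc: means “supervisor call”, which is used in much the same way as int. The code is asking the kernel to do something (once to write to stdout, and once to terminate the program).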

    section .data
        helloWorld db 'Hello World',0xa  ; 'Hello World' string followed by a newline character
    
    section .text
        global _start
    
    _start:
        ; write(1, helloWorld, 12)
        mov eax, 4          ; system call number for sys_write
        mov ebx, 1          ; file descriptor 1 is stdout
        mov ecx, helloWorld ; pointer to the string to print
        mov edx, 12         ; length in bytes: 11 characters plus the newline
        int 0x80            ; call kernel
    
        ; exit(0)
        mov eax, 1          ; system call number for sys_exit
        xor ebx, ebx        ; exit status 0
        int 0x80            ; call kernel
    

    .section .data
    helloWorld:
        .asciz "Hello World\n"
    
    .section .text
    .global _start
    
    _start:
        @ write(1, helloWorld, 12)  (GNU as uses @ for comments on ARM, not ;)
        mov r0, #1                  @ file descriptor 1 is stdout
        ldr r1, =helloWorld         @ pointer to the string to print
        mov r2, #12                 @ length in bytes; the NUL added by .asciz is not written
        mov r7, #4                  @ system call number for sys_write
        svc 0                       @ make system call
    
        @ exit(0)
        mov r0, #0                  @ exit status 0
        mov r7, #1                  @ system call number for sys_exit
        svc 0                       @ make system call
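
If you think better in Python than in assembly, the x86 listing above can be mimicked with a toy model. This is only a sketch to build intuition, not a real emulator: the ToyCpu class, its method names, and the idea of putting the string at address 0 are all made up for illustration. The only things taken from the listing are the register names and the two syscall numbers (4 for write, 1 for exit). It shows the pattern to look for: load registers with mov, then trigger the kernel, which reads its arguments back out of those registers.

```python
# Toy model of the 32-bit x86 listing. NOT a real emulator: every name here
# (ToyCpu, int_0x80, run_hello) is hypothetical and exists only for illustration.
import io

SYS_EXIT, SYS_WRITE = 1, 4  # syscall numbers used in the listing


class ToyCpu:
    """Stand-in for a CPU plus the kernel side of `int 0x80`."""

    def __init__(self, memory, stdout):
        self.regs = {"eax": 0, "ebx": 0, "ecx": 0, "edx": 0}
        self.memory = memory      # flat bytes object standing in for RAM
        self.stdout = stdout      # where "file descriptor 1" ends up
        self.exit_status = None   # set once sys_exit has been requested

    def mov(self, dst, value):
        # mov copies a value into a register; the source is left unchanged.
        self.regs[dst] = value

    def xor(self, dst, src):
        # Bitwise XOR of two registers; XORing a register with itself gives 0.
        self.regs[dst] ^= self.regs[src]

    def int_0x80(self):
        # Software interrupt: the pretend kernel reads its arguments from registers.
        r = self.regs
        if r["eax"] == SYS_WRITE and r["ebx"] == 1:
            start = r["ecx"]
            data = self.memory[start:start + r["edx"]]
            self.stdout.write(data.decode("ascii"))
        elif r["eax"] == SYS_EXIT:
            self.exit_status = r["ebx"]


def run_hello():
    memory = b"Hello World\n"        # plays the role of the .data section, at address 0
    out = io.StringIO()
    cpu = ToyCpu(memory, out)

    cpu.mov("eax", SYS_WRITE)        # mov eax, 4
    cpu.mov("ebx", 1)                # mov ebx, 1
    cpu.mov("ecx", 0)                # mov ecx, helloWorld
    cpu.mov("edx", len(memory))      # mov edx, <string length>
    cpu.int_0x80()                   # int 0x80

    cpu.mov("eax", SYS_EXIT)         # mov eax, 1
    cpu.xor("ebx", "ebx")            # xor ebx, ebx
    cpu.int_0x80()                   # int 0x80
    return out.getvalue(), cpu.exit_status


if __name__ == "__main__":
    text, status = run_hello()
    print(repr(text), status)
```

The ARM version would follow the same shape with r0, r1, r2 and r7 in place of ebx, ecx, edx and eax, and svc in place of int.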