2008/03/25

Socket Overview

A TCP/IP socket is a bi-directional stream between two services.

 Server                Client
--------              --------
 socket                socket
 bind
 listen
   accept   ---------  connect
   | recv*  <--------  send*
   | send*  -------->  recv*
   | close  ---------  close
 close

socket
    Gets socket resources (file descriptor, buffers).
    Specifies the protocol family.
bind
    Gives the socket a local address (port, IP address).
listen
    Establishes that this socket will be a server socket (also known as a welcome socket) for accepting incoming connections.
    Sets the length of the listen queue (the maximum number of waiting incoming connections).
accept
    Receives an incoming connection.
    Returns a new socket connected to the client.
connect
    Connects a client socket to a remote host.
send / recv
    Send and receive data over the connected socket stream.
close
    Closes the socket.
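
The call sequence above can be sketched in a single process by connecting a client socket to a server socket over the loopback interface. This is only a sketch under assumptions: it uses the POSIX sockets API, the helper name loopback_roundtrip is hypothetical, and it relies on a short message arriving in a single recv call, which a real program should not assume.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: sends msg from a client socket to a server socket
 * over loopback and copies what the server side received into out.
 * Returns the number of bytes received, or -1 on any failure. */
long loopback_roundtrip(const char *msg, char *out, size_t out_len) {
    assert(msg != NULL && out != NULL);
    long received = -1;
    int client = -1, conn = -1;
    size_t n = strlen(msg);
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    int server = socket(AF_INET, SOCK_STREAM, 0);            /* socket */
    if (server < 0) return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                 /* 0 lets the OS pick a free port */

    if (bind(server, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;  /* bind */
    if (listen(server, 1) < 0) goto done;                                     /* listen */
    /* Ask which port the OS chose so the client knows where to connect. */
    if (getsockname(server, (struct sockaddr *)&addr, &len) < 0) goto done;

    client = socket(AF_INET, SOCK_STREAM, 0);                /* client socket */
    if (client < 0) goto done;
    if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done; /* connect */

    conn = accept(server, NULL, NULL);        /* accept returns a NEW socket */
    if (conn < 0) goto done;

    if (send(client, msg, n, 0) != (ssize_t)n) goto done;    /* send */
    received = (long)recv(conn, out, out_len, 0);            /* recv */

done:
    if (conn >= 0) close(conn);               /* close the connection socket */
    if (client >= 0) close(client);
    close(server);                            /* close the welcome socket */
    return received;
}
```

Calling loopback_roundtrip("hi", buf, sizeof(buf)) should leave the two bytes in buf. Note that accept hands back a second descriptor, so the server ends up closing two sockets, matching the two close calls in the server column of the diagram.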

See: Beej's Networking Guide (especially sections 4 and 5).

2008/03/18

Files in C

A file is a flat representation of data as an ordered sequence of bytes.

Files are written to and read from as a stream, but a file handle can have the concept of a file cursor. The file cursor represents the position in the file of the "next" byte to be read or written.

In Java, the use of the file cursor for random access to a file is discouraged (seeking is fairly expensive, and often leads to errors). Most file I/O happens strictly through the stream abstraction. In C, it's a lot easier to seek backwards and forwards. There are functions to move the file cursor to different locations in the file, but reading and writing still happen through the stream abstraction. When you read or write a byte, the file cursor automatically advances by one byte.
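
As a rough illustration of the file cursor, the sketch below writes six bytes to a temporary file, moves the cursor with fseek, and checks with ftell that a one-byte read advances it by one. The helper name byte_at and the contents "abcdef" are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "abcdef" to a temporary file, then uses the
 * file cursor to read back the byte at the given offset (0..5).
 * Returns the byte read, or -1 on failure. */
int byte_at(long offset) {
    assert(offset >= 0 && offset < 6);
    FILE *f = tmpfile();
    if (f == NULL) return -1;

    int result = -1;
    if (fputs("abcdef", f) != EOF
        && ftell(f) == 6                       /* writing advanced the cursor to 6 */
        && fseek(f, offset, SEEK_SET) == 0) {  /* move the cursor explicitly */
        int c = fgetc(f);                      /* reading advances it by one byte */
        if (ftell(f) == offset + 1) result = c;
    }
    fclose(f);
    return result;
}
```

For instance, byte_at(3) should return 'd'. The read itself still goes through the ordinary stream call (fgetc); only the position was changed by seeking.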

C's I/O libraries are set up to be type agnostic so that you can read or write any type to/from the stream without excessive overloading (overloading is actually not allowed in C). They work by giving the I/O function a pointer to a buffer of unknown type (a void*) and telling the function how large each element is and how many elements the buffer contains.

Reading and writing a single integer value looks like this:

int val;
fread(&val, sizeof(val), 1, f_in);
fwrite(&val, sizeof(val), 1, f_out);

Reading and writing a struct looks like this:

struct AStructType val;
fread(&val, sizeof(val), 1, f_in);
fwrite(&val, sizeof(val), 1, f_out);

Reading and writing an array looks like this:

int val[10];
fread(val, sizeof(int), 10, f_in);
fwrite(val, sizeof(int), 10, f_out);
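
The snippets above ignore the return values of fread and fwrite, which report how many elements were actually transferred. A minimal sketch of a checked round trip through a temporary file follows; the helper name roundtrip_ints is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes count ints to a temporary file, rewinds the
 * file cursor, and reads them back into dst.
 * Returns the number of ints read back (count on success). */
size_t roundtrip_ints(const int *src, int *dst, size_t count) {
    assert(src != NULL && dst != NULL);
    FILE *f = tmpfile();
    if (f == NULL) return 0;

    /* fwrite returns the number of whole elements written. */
    size_t written = fwrite(src, sizeof(int), count, f);
    rewind(f);  /* move the cursor back to the start before reading */
    size_t read_back = (written == count) ? fread(dst, sizeof(int), count, f) : 0;

    fclose(f);
    return read_back;
}
```

A return value smaller than count means the write or the read stopped early, so the caller should not trust the contents of dst in that case.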


Because the read and write functions take pointers to buffers, you cannot write a temporary value directly to a stream.

fwrite(atoi("7"), sizeof(int), 1, f);

The above example will generate a compile warning and segfault because the value returned from atoi() is the value 7 and not a pointer to the value 7. The compiler warns us that we are trying to convert an integer to a pointer without a cast, and it segfaults because fwrite tries to read what is stored at memory location 7 (which is not a valid pointer address).

fwrite(&atoi("7"), sizeof(int), 1, f);

The above example does not compile because we are trying to take the address of a temporary value. This value may not have an address at all (it may exist only in a register), so we cannot take its address.

int val = atoi("7");
fwrite(&val, sizeof(int), 1, f);

Storing the value in a variable first corrects the problem and works as intended.

See: stdio.h (cppreference.com, cplusplus.com)

Stream Abstraction

A common abstraction used for I/O is a "stream." A stream is a reliable connection between two components through which one component can send data to the other. Streams usually operate on bytes (and are called byte streams).

A stream gets its name because it is like a river of data. The data length is unknown, and you cannot randomly access elements in the stream. When data is added to the stream it flows down the river. You are only allowed to insert data at the start of the stream -- you cannot change it, remove it, or insert something ahead of it -- once it has gone into the stream, it is out of your control. When you remove data from a stream, you can only remove it from the end. You cannot "go back" in the stream (doing so would be the same as pushing a river backwards) and you cannot read a value later in the stream (without reading the data earlier in the stream). The data pours out of the end, and once it is removed, it is no longer part of the stream.

Imagine a stream as a river, and your program is using it to send/receive messages. Putting data in the stream is the same as putting a message into a bottle and floating it down the river. Taking data from the stream is the same as removing a bottle from the river. However, in this case the river is a perfect delivery mechanism. Bottles always arrive in the same order that they were sent, they are never lost (or missed), and there are no gaps between the bottles.

Output streams are occasionally called a "sink," whereas input streams are sometimes called a "source."

Streams are a useful abstraction because they hide the underlying transport mechanism used between components, and they hide the details of the component at the other end. The stream could be a network socket, a pipe to another program, a terminal, a keyboard, a disk, a printer, a modem, a billboard, or a buffer in memory. Programs written to use streams as their I/O can be easily adapted to substitute one implementation of a stream for another.
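
As a small sketch of this substitutability in C, the function below copies bytes between any two stdio streams without knowing what sits behind them. The name copy_stream is hypothetical; FILE*, fgetc, and fputc are the standard stdio stream interface.

```c
#include <assert.h>
#include <stdio.h>

/* Copies every remaining byte from in to out, one byte at a time.
 * Works the same whether the streams are files, pipes, or terminals.
 * Returns the number of bytes copied. */
long copy_stream(FILE *in, FILE *out) {
    assert(in != NULL && out != NULL);
    long copied = 0;
    int c;
    while ((c = fgetc(in)) != EOF) {       /* take the next byte from the source */
        if (fputc(c, out) == EOF) break;   /* put it into the sink */
        copied++;
    }
    return copied;
}
```

Calling copy_stream(stdin, stdout) would echo the terminal or a pipe, while passing two streams opened with fopen would copy a file; the function body does not change in either case.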

See: Java's InputStream, OutputStream, C++'s iostream, and C's I/O with streams (stdio.h).

2008/03/06

Using GDB

GDB (project, documentation) is a simple debugging tool for GNU command line systems.

You can Google around for lots of information about it. A short tutorial, a longer tutorial, and a reference card are pretty easy to find.

For GDB to be most effective, you should compile your code with the -g flag to leave debugging symbols in your executable. This will do things like leave variable names and line numbers in the executable so that GDB can show you what and where things are.

The commands that you will use the most are:

help
r arguments (run)
    r 5
file executable
bt (backtrace)
p expression (print)
    p argv[1]
b file:symbol (breakpoint)
    b main.c:5
    b main.c:parseArgs
d (delete breakpoints)
s (step) [step-into]
n (next) [step-over]
c (continue)
up
q (quit)

Segmentation Faults and Bus Errors

A segmentation fault occurs when your program tries to access a region of memory in a way that is not permitted. A bus error occurs when your program tries to access a region of memory that does not exist, or it tries to access an unaligned region. See the wiki pages for examples.

Segfaults and bus errors are runtime errors that come from accessing memory through pointers. Java has protected you from these kinds of errors by abstracting pointers away from the programming language. The only thing in Java that is similar is an ArrayIndexOutOfBoundsException, which is raised when you try to access an element that is not contained in an array. In C, buffer accesses are unchecked, so this may or may not raise a segmentation fault depending on how memory has been laid out by the compiler and OS. [Note: unchecked buffers are arguably the largest source of security bugs in C/C++.]

Consider the code:
int buff[3];
buff[3] = 8;

In Java, this would throw an ArrayIndexOutOfBoundsException. In C, it may or may not segfault.

If the buffer buff was placed in memory next to a memory page boundary, it has a greater chance of raising a segfault. The buffer buff ends at buff[2]; or rather, buff has been reserved space for 3 elements. If buff[3] (or more precisely, *(buff+3), the 4th element) was on a read-only page, then the assignment will generate a segfault (but a read would not). If it was on a page marked "no access," then either an attempt to read or to write would raise an error. If it was on a page that is marked as read/write, then no fault signal will be raised and the value 8 will overwrite whatever value was previously at that location. This may be empty memory (so nothing bad will happen), another variable (the old variable contents will be lost), part of the call stack (possibly causing the program to "return" to an arbitrary location), or even a piece of instruction code. If the value 8 came from user input, this bug could allow a user to execute arbitrary code within the program.

In Java, these errors are always fail-stop, so they are easier to find. In C, they may or may not be fail-stop depending on how memory is arranged, so they are harder to find.
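
One way to get fail-stop behavior in C is to do the bounds check by hand. The helper below is a hypothetical sketch, not a standard library function; it refuses an out-of-range store instead of silently overwriting a neighbor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores value at buf[index] only if index is within
 * the len elements reserved for buf. Returns false (and writes nothing)
 * when the index is out of bounds, roughly what Java checks on every access. */
bool checked_store(int *buf, size_t len, size_t index, int value) {
    assert(buf != NULL);
    if (index >= len) {
        return false;   /* buf[len] is one past the end, so it is rejected too */
    }
    buf[index] = value;
    return true;
}
```

With int buff[3], checked_store(buff, 3, 3, 8) returns false rather than touching whatever happens to follow the buffer in memory. The cost is that the caller must carry the length around, because a C pointer does not know how large its buffer is.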

Bonus Material: The libraries Electric Fence and Dmalloc change the way in which pages are allocated to ensure that each buffer (in the heap) ends at a page boundary and that the following page is "no access." The tool Valgrind translates and instruments code to look for similar (and more) errors, and Die Hard (acm, draft) takes a probabilistic approach for laying out memory to detect these errors.

2008/03/04

Bonus: Automated Testing

One of the principles of agile development is that your code should have completely automated unit tests. Your tests should require no interaction on your part, and they should require no interpretation; they should simply run and say pass or fail. If your tests require input, then they are not repeatable. If they require interpretation, then they are not automated.

By building an automated regression suite, you can use the tests as an enabler of change to allow you to test every change that you make to your code-base with little effort. Ideally, you would make running the tests part of your build process. In most agile environments, the automated execution of tests is at least part of the integration process (when you commit your code changes to the repository).

There are many testing tools available for agile developers. The most popular is the xUnit framework. (Java: JUnit, .NET: NUnit, C++: CppUnit). These tools may be too heavyweight to learn just for testing your cs213 assignment, so we're going to see how to write an automated testing framework using some more basic primitives. The C language includes a basic assertion package <assert.h>.

Consider the file:
testable.c:
#include <stdbool.h>
#include <assert.h>

int isBiggerThanFive(const int x) {
    return x > 5;
}

int main() {

    assert(true == isBiggerThanFive(10));
    assert(true == isBiggerThanFive(6));
    assert(false == isBiggerThanFive(5));
    assert(false == isBiggerThanFive(4));
    assert(false == isBiggerThanFive(-10));

    return 0;
}

Compiling and running this file with gcc -Wall testable.c -o testable; ./testable produces no output since all of the tests pass.

[Note: an assertion is a statement that something is believed to be true, so all of your assertions should pass.]

If we add the incorrect assertion assert(false == isBiggerThanFive(50)); after the last assertion in main, we get the output:

testable: testable.c:15: main: Assertion `0 == isBiggerThanFive(50)' failed.

This is telling us that for the executable testable, in the file testable.c, on line 15, in method main, our assertion that false == isBiggerThanFive(50) failed. In this case, it's because our test is invalid, but it could have just as easily been an error in our implementation.

If we add another incorrect assertion, assert(true == isBiggerThanFive(0));, after that one and rebuild, we still get the same output. The second assertion produces no output, even though it should fail. This is the expected behavior.

In unit testing, it's desired that a test harness fails on the first error for each test. There is no sense in continuing to run a test after a failure since it means that the unit under test (or the test itself) has entered an unknown state, and therefore we can no longer make assertions (our belief was wrong).

Failing an assert in C causes the program to abort execution. This has the unfortunate effect that only one unit test can fail per executable (subsequent tests will not run if there was a failure). The xUnit testing frameworks handle testing multiple units better, but assert is good enough for testing small programs.

If you don't want to use the assert library, you can write your test harnesses to use the printf function, but all that you should be printing is pass or fail [details]; don't print things that require you to read them to verify that they are correct, as this will destroy any chance of test automation.
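
A printf-based harness along those lines might look like the sketch below. The check and run_tests helpers are hypothetical names for this example; isBiggerThanFive is repeated from testable.c so that the block stands alone.

```c
#include <stdio.h>

static int failures = 0;

/* Function under test, repeated from testable.c. */
static int isBiggerThanFive(const int x) {
    return x > 5;
}

/* Records one check. Unlike assert, a failure does not abort the program,
 * so the remaining checks still run. Only failures print details. */
static void check(int condition, const char *description) {
    if (!condition) {
        failures++;
        printf("fail: %s\n", description);
    }
}

/* Runs every check and prints a single pass/fail verdict.
 * Returns the number of failed checks. */
int run_tests(void) {
    failures = 0;
    check(isBiggerThanFive(10), "10 is bigger than five");
    check(isBiggerThanFive(6), "6 is bigger than five");
    check(!isBiggerThanFive(5), "5 is not bigger than five");
    check(!isBiggerThanFive(4), "4 is not bigger than five");
    check(!isBiggerThanFive(-10), "-10 is not bigger than five");
    printf("%s\n", failures == 0 ? "pass" : "fail");
    return failures;
}
```

A main function could simply return run_tests() != 0, so that a script or makefile can tell pass from fail by the exit status without reading the output.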


2008/02/26

Make

A makefile consists of a set of make rules. Each rule has three parts: a target, a set of zero or more dependencies, and a list of zero or more commands.

target: dependencies
(tab) command1
(tab) command2

Make is invoked by specifying a target on the command line. If no target is given, then the first target in the file is taken as the default. To complete a target, all dependencies are evaluated first. If a dependency is another target in the makefile, then that target is evaluated first. If the dependency is not a target in the makefile, then the file system is checked for the existence of a file with the name of the dependency. If the target file does not exist, or if any dependency has been modified more recently than the file matching the target, then the commands for that target are run.
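
The timestamp rule can be modeled in a few lines of C. This is a simplified sketch of the decision only, not how make is actually implemented; the function name needs_rebuild is hypothetical, and the modification times are passed in directly rather than read from the file system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Simplified model of make's decision for one target.
 * target_exists / target_mtime describe the target file;
 * dep_mtimes holds the modification time of each dependency.
 * Returns true when the target's commands should be run. */
bool needs_rebuild(bool target_exists, time_t target_mtime,
                   const time_t *dep_mtimes, size_t dep_count) {
    assert(dep_count == 0 || dep_mtimes != NULL);
    if (!target_exists) {
        return true;                 /* nothing built yet */
    }
    for (size_t i = 0; i < dep_count; i++) {
        if (dep_mtimes[i] > target_mtime) {
            return true;             /* a dependency is newer than the target */
        }
    }
    return false;                    /* target is up to date */
}
```

In a fuller version the times would come from a call such as stat on each file. The model also shows why a target with no matching file always runs its commands.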

Consider the files:
main.c:
  #include "bar.h"
  int main(void) {
    bar();
    return 0;
  }
bar.c:
  #include "bar.h"
  void bar() {
  }
bar.h:
  void bar();
The corresponding naive makefile would be:
main.exe: main.o bar.o
(tab) gcc main.o bar.o -o main.exe

main.o: main.c bar.h
(tab) gcc -c main.c

bar.o: bar.c bar.h
(tab) gcc -c bar.c

clean:
(tab) rm *.o main.exe

Note: the "-o" flag specifies the name of the output file. The "-c" flag tells the compiler to stop after the "compile" stage and before the "link" stage, generating an object file as the output.

Typing "make" on the command line would build this project and produce an executable named "main.exe". Typing "make" again would do nothing, because none of the source files have changed and every target is already newer than its dependencies. Changing any of the source files and then typing "make" would cause only the required segments of the project to be recompiled. The make utility knows what files need to be rebuilt based on the dependencies that we have given it, so it is important that these dependencies are correct.

See also: GNU make documentation.

C: Declaration vs Definition

In computer science, there is a subtle but important distinction between a declaration and a definition.

A declaration is a statement that something exists and what its characteristics are (name and type). A definition says what that thing is.

A function declaration in C looks like this: void foo();. Note the semicolon. If we were to create a variable like that (int i;) we would declare that the variable exists without assigning it a value. When we declare a method without giving it a body, we are saying that this method exists somewhere, but we are not saying what it is. [Note: the variable in the example would still be "defined" since it was allocated space.] In C, these are referred to as function prototypes. In Java, they are called method signatures. They are functionally and conceptually the same.

To define a function, we must give it a body: void foo() {...}. This is saying what the function is and is a shorthand for the concept of saying: void foo() = {...}.

You can think of all of the symbols in a program as entries in a dictionary. The declarations are listing all of the words, and the definitions are assigning them meanings. Because programs must be unambiguous (so that they can be machine executed), each word that we declare must have exactly one definition. It would be ambiguous to have words with no definition (no meaning) or multiple definitions (multiple meanings) -- the machine would not know what to do.

Declarations can be used by the compiler to verify that function calls and function definitions match the exposed prototypes. C can use the function declarations (separated from their definitions) to achieve separate compilation.

Separate compilation allows the differing functional units of a program to be compiled independently. If the function "foo" calls the function "bar," the two of them can be compiled separately if they both know the prototype for the "bar" method. Bar does not have to know anything about foo, and foo only has to know the declaration for bar. The linker can link the definition of bar to the call to bar in the method foo.

Separate compilation allows for large projects to be compiled without having to recompile every function every time a change is made. In the previous example, a change to foo would not require bar to be recompiled, and a change to bar would not require foo to be recompiled. A change to the declaration of bar would require both foo and bar to be recompiled though.
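
A minimal sketch of the foo/bar relationship, kept in one file for brevity: in a real project the prototype would live in a header and the two definitions in separate .c files. The bodies and the int parameters are made up for illustration.

```c
/* Declaration (prototype): tells the compiler that bar exists,
 * takes an int, and returns an int. No body, note the semicolon. */
int bar(int x);

/* Definition of foo. It can be compiled knowing only bar's prototype. */
int foo(int x) {
    return bar(x) + 1;
}

/* Definition of bar. This could live in a different file; the linker
 * would connect the call inside foo to this body. */
int bar(int x) {
    return 2 * x;
}
```

If the definition of bar were removed, foo would still compile, and the missing body would only be noticed at link time.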

See also: Variable Definition vs Declaration

Stages of Compilation

Compilation in C has two major stages. The first stage, "compiling," consists of preprocessing, lexical, syntactic, and semantic analysis, followed by optimization and code generation. All of the compile errors that you are familiar with in Java come from this first stage.

The input to the first stage is your .c and .h source files, and the output is a .o object file. The object file consists of two parts: the machine code equivalent of the source file, and a symbol table which lists all of the symbols (functions, variables, and a few other things) that this object file provides and requires.

The second stage of C compilation is "linking." A linker takes in a set of object files and "links" the symbols in the symbol tables together. If one symbol table requires a "bar" function, the linker connects it to the object file that provides it. The output from the linker is an executable file.

In Java, there is very little static linking -- most is done dynamically at run-time, so this will be a new source of errors for you. Fortunately, the errors generated by the linker are usually very straightforward.

There are two major sources of errors from the linker. The first occurs when a symbol is required by one object file, but not provided by any other. In this case, the linker cannot make a required link, so it will generate a "symbol not defined" error. [The closest equivalent in Java is a "NoClassDefFoundError," a "NoSuchFieldError," and a "NoSuchMethodError" at runtime.] The other kind of error comes when a symbol has been provided by more than one object file. In this case, the linker does not know which symbol to link to, so it will generate a "multiply defined symbol" error.

Linking errors for methods come about because of the distinction between method declaration and method definition in C, which is the subject of the next post.

2008/02/12

Branches (While Loop - !0)

Consider the code block:

int x = 0;

while(x <= 10) {
  x++;

}
x += 4;

while(x > 10) {
  x--;
}
x -= 4;  

When we have a loop condition that involves a non-zero argument, we need to re-express that condition in terms of a zero argument. The code block above translates to:

int x = 0;

while(x -10 <= 0) {         // or [ 0 <= 10 - x]
  x++;
}
x += 4;

while(x -10 > 0) {          // or [ 0 > 10 - x]
  x--;
}
x -= 4;  

Converting to assembly gives:

0:   ld $0x0000, r0       # x = 0

# first loop, false conditional at top
6:   ld $0xfffffff6, r1   # r1 = -10
c:   mov r0, r2
     add r1, r2           # r2 = x - 10
     bgt r2, 0x0016       # if x-10 > 0, x > 10, !(x <= 10) goto +3

     inc r0               # x++
     br 0x000c            # goto -4

16:  add $4, r0           # x += 4

# second loop, true conditional at bottom
18:  ld $0xfffffff6, r1   # r1 = -10
1e:  br 0x0022            # goto +2 (evaluate the condition first)
20:  dec r0               # x--;

22:  mov r0, r2
     add r1, r2           # r2 = x - 10;
     bgt r2, 0x0020       # if x-10 > 0, x > 10 goto -3

     add $-4, r0          # x -= 4

Note: the condition being evaluated in each case is identical, it is just treated differently. In one block, it is used as a condition to exit the loop; in the other, it is used as a condition to continue the loop.
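
The two loop shapes can also be mirrored in C itself by writing them with goto, which maps almost one-to-one onto the branch instructions. This is only an illustrative sketch; the function names are hypothetical, and each function takes the starting value of x as a parameter.

```c
/* First loop: the test sits at the top and acts as an EXIT condition. */
int first_loop(int x) {
top:
    if (x - 10 > 0) goto done;   /* bgt: leave when !(x <= 10) */
    x++;
    goto top;                    /* br: back to the test */
done:
    return x + 4;                /* x += 4 */
}

/* Second loop: the test sits at the bottom and acts as a CONTINUE condition. */
int second_loop(int x) {
    goto test;                   /* br: prime the loop by jumping to the test */
body:
    x--;
test:
    if (x - 10 > 0) goto body;   /* bgt: keep looping while x > 10 */
    return x - 4;                /* x -= 4 */
}
```

Starting from 0, first_loop yields 15, and second_loop(15) yields 6, the same values the original C block produces. Both functions test x - 10 > 0; only the direction of the jump differs.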

2008/02/07

Bonus: Advanced Branching (Switch Statement)

Consider a switch statement:


switch(r0) {
  case 0: ~~~0;
          break;
  case 1: ~~~1;
          break;
  ...
  case 0xf: ~~~f;
            break;
  default:
            ~~~default;
}
~~~done

If / Else Naive Implementation

  0: move r1, r0        # r1 = r0
     beq r1, 0x??0      # goto "0" case
     dec r1             # r1 = r0 - 1
     beq r1, 0x??1      # goto "1" case
     ...
     dec r1             # r1 = r0 - f
     beq r1, 0x??f      # goto "f" case
     ~~~default         # execute "default" case
     br 0x???           # exit
??0: ~~~0               # execute "0" case
     br 0x???           # exit
??1: ~~~1               # execute "1" case
     br 0x???           # exit
...
??f: ~~~f               # execute "f" case
???: ~~~done            # outside the switch
Case     # conditions evaluated      # branches taken
  0                 1                      2
  1                 2                      2
 ..
  e                15                      2
  f                16                      1
 <0                17                      1
 >f                17                      1
Sum:              170                     33

Every additional case adds progressively more and more to the number of conditions evaluated. The number of branches taken remains constant for each execution path.

Switch / Jump Table Implementation

   0:  move r1, r0          # r1 = r0
       inc r1               # r1 = r0 + 1
       bgt r1 0x08          # r0 + 1 > 0, so r0 >= 0 (in range)
       br "def"             # if not, goto "default" case
   8:  move r1, r0          # r1 = r0
       ld $0xFFFFFFF1, r2   # r2 = -f
       add r2, r1           # r1 = r0 - f
       bgt r1 "def"         # goto "default" case if > f
       move r1, r0          # r1 = r0 (and r1 is in range)
       ld $0x00001000, r2   # r2 = & jump table
       jmp *(4*r1, r2)      # jump to the right case
"c0":  ~~~0                 # case "0"
       br "done"            # exit
"c1":  ~~~1                 # case "1"
       br "done"            # exit
...
"cf":  ~~~f                 # case "f"
       br "done"            # exit
"def":  ~~~default          # "default" case
"done": ~~~done             # outside the switch
...
1000:  &"c0"    # jump table, filled with addresses of cases
       &"c1"
       ...
       &"cf"
Case     # conditions evaluated      # branches taken
  0                 3                      3
  1                 3                      3
 ..
  e                 3                      3
  f                 3                      3
 <0                 1                      1
 >f                 2                      1
Sum:               51                     50

More cases do not grow the number of conditions evaluated per path, and the number of branches taken per path remains constant.

Special Note: the number of branches taken in the above example is higher than that of the if/else example due to the branch taken by the ">= 0" range check. This branch can be eliminated by reversing the test:

   0:  move r1, r0          # r1 = r0
       not r1               # 
       inc r1               # r1 = -r0
       bgt r1 "def"         # goto "default" if <0
   8:  move r1, r0
       ld $0xFFFFFFF1, r2   # r2 = -f
       add r2, r1           # r1 = r0 - f
       bgt r1 "def"         # goto "default" case if > f
       move r1, r0          # r1 = r0 (and r1 is in range)
       ld $0x00001000, r2   # r2 = & jump table
       jmp *(4*r1, r2)      # jump to the right case
...
Case     # conditions evaluated      # branches taken
  0                 3                      2
  1                 3                      2
 ..
  e                 3                      2
  f                 3                      2
 <0                 1                      1
 >f                 2                      1
Sum:               51                     34
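
The same idea can be sketched in C with an array of function pointers standing in for the jump table. Only three cases are shown, and the handler names and return values are invented for this example; a compiler is free to translate a real switch statement differently.

```c
/* Hypothetical case bodies; each returns a value so the result is visible. */
static int case_0(void) { return 10; }
static int case_1(void) { return 20; }
static int case_2(void) { return 30; }
static int case_default(void) { return -1; }

/* The jump table: entry i holds the address of the code for case i. */
static int (*const jump_table[])(void) = { case_0, case_1, case_2 };

/* Two range checks, then one indexed jump, regardless of the case count. */
int dispatch(int selector) {
    const int count = (int)(sizeof(jump_table) / sizeof(jump_table[0]));
    if (selector < 0 || selector >= count) {
        return case_default();        /* out of range: "default" case */
    }
    return jump_table[selector]();    /* indexed jump through the table */
}
```

Adding a fourth case means adding one table entry; the number of comparisons in dispatch stays at two.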

For more information, see Implementing Common Control Structures in Assembly Language and/or C to Assembly Translation III.

Branches (While Loop - false)

while(x != 0 ) {
  ~~~i
  ~~~i
  ~~~i
}
~~~o

Address  Machine  Assembly        Notes
                                  assume r0 = x
0:       9005     beq r0, 0x0a    branch to exit
2:                ~~~i            true case
                  ~~~i
                  ~~~i
8:       80FC     br 0x00         loop
a:                ~~~o            exited loop

Note 1: 2 branches total, 1 branch taken on critical path.

Note 2: conditional at top.

2008/02/05

Branches (While Loop - true)

while( x == 0 ) {
 ~~i
 ~~i
 ~~i
}
~~ o

Naive / Literal

Address  Machine  Assembly        Notes
                                  assume r0 = x
0:       9002     beq r0, 0x04    conditional
2:       8005     br 0x0c         exit
4:                ~~~i            true case
                  ~~~i
                  ~~~i
a:       80FB     br 0x00         loop
c:                ~~~o            exited loop

Note: 3 branches total, 2 branches in critical path.

Preferred

Address  Machine  Assembly        Notes
                                  assume r0 = x
0:       8004     br 0x08         prime (need to check condition)
2:                ~~~i            true case
                  ~~~i
                  ~~~i
8:       90FD     beq r0, 0x02    loop conditional
                  ~~~o            exited loop

Note 1: 2 branches total, 1 branch in critical path.

Note 2: conditional at bottom.

Branches (If Statements)

If without Else

if( x == 0 ) {
  x = 1;
}
x++;

Address  Machine         Assembly          Notes
                                           assume r0 = x
0:       9002            beq r0, 0x0004    if true, skip false
2:       8004            br 0x000a         false case, skip true
4:       0000 0000 0001  ld $0x0001, r0    true case
a:       6300            inc r0            after if block

Note: the false case comes first since we branch somewhere else when true.

If with Else

if( x == 0 ) {
  x = 1;
} else {
  x = 2;
}
x++;

Address  Machine         Assembly          Notes
                                           assume r0 = x
0:       9005            beq r0, 0x000a    if true, skip false
2:       0000 0000 0002  ld $0x0002, r0    false case
8:       8004            br 0x0010         skip true
a:       0000 0000 0001  ld $0x0001, r0    true case
10:      6300            inc r0            after if block

Structures in Machine Memory

typedef struct {
  int a;
  int b;
} T;

T foo;
T bar[2];
T* h = (T*) malloc( 3* sizeof(T) );  // reserve heap space

address  value      offset  c name     region
0:       ~~~~~                         program
...
1000:    ???? ????  00      foo.a      global variables
         ???? ????  01      foo.b
1008:    ???? ????  00      bar[0].a
         ???? ????  01      bar[0].b
         ???? ????  02      bar[1].a
         ???? ????  03      bar[1].b
1018:    0000 2000  00      h
...
2000:    ???? ????  00      h[0].a     heap
         ???? ????  01      h[0].b
         ???? ????  02      h[1].a
         ???? ????  03      h[1].b
         ???? ????  04      h[2].a
         ???? ????  05      h[2].b
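
The layout in the table can be checked from C with sizeof and offsetof. The sketch below repeats the struct T from above and computes the byte offset of the b field of a given array element; the helper name offset_of_b is hypothetical, and the exact numbers assume that the compiler adds no padding between the two int fields, which is typical but not something the sketch itself guarantees.

```c
#include <stddef.h>

typedef struct {
    int a;
    int b;
} T;

/* Byte offset, from the start of an array of T, of element[index].b.
 * Each element occupies sizeof(T) bytes, and b sits offsetof(T, b)
 * bytes into its element. */
size_t offset_of_b(size_t index) {
    return index * sizeof(T) + offsetof(T, b);
}
```

On a machine with 4-byte ints, offset_of_b(1) is 12 bytes, which corresponds to word offset 03 (bar[1].b) in the table.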

2008/02/04

Gold Machine's Branching Instructions

opcode                format           semantics                        example               example assembly (RISC)
branch                8-oo             pc = pc-2 + 2*o                  1000: 8006            br 0x100c
branch if equal       9roo             if r[r] == 0, pc = pc-2 + 2*o    1000: 9206            beq r2, 0x100c
branch if greater     aroo             if r[r] > 0, pc = pc-2 + 2*o     1000: a206            bgt r2, 0x100c
jump                  b--- aaaaaaaa    pc = a                           1000: b000 00008000   jmp 0x8000
get pc                6f-d             r[d] = pc-2                      6f01                  gpc r1
jump indirect         croo             pc = r[r] + 2*o                  c103                  jmp 0x3(r1)
jump indirect, b+o    droo             pc = m[ r[r] + 4*o ]             d103                  jmp *0x3(r1)
jump indirect, index  eir-             pc = m[ r[r] + 4*r[i] ]          e120                  jmp *(4*r1,r2)

Note 1: All "branch" machine language instructions are relative to the current PC (though their assembly versions are absolute). All "jump" instructions are absolute.

Note 2: the semantics for each branch instruction include a -2. This is to undo the +2 increment to PC during the fetch stage.

Corollary to Note 2: a branch of "0" should make the branch instruction branch to itself. So if you ever branch by 0, you will enter an infinite loop. This simplifies branching forwards and backwards.
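
The relative-branch arithmetic can be sketched as a small C function. It assumes, based on the examples in these notes (such as 80FC branching backwards), that the low byte oo is a signed 8-bit offset counted in 2-byte units from the address of the branch instruction itself. The function name branch_target is hypothetical.

```c
#include <stdint.h>

/* Computes the absolute target of a relative branch.
 * instruction_address: where the 2-byte branch instruction is stored.
 * instruction: the 16-bit machine word, e.g. 0x8006.
 * Assumption: the low byte is a signed offset in units of 2 bytes. */
uint32_t branch_target(uint32_t instruction_address, uint16_t instruction) {
    int32_t offset = instruction & 0xff;   /* extract oo */
    if (offset >= 0x80) {
        offset -= 0x100;                   /* sign-extend: 0xFC becomes -4 */
    }
    /* "pc-2" in the table is the address of the branch itself, because
     * fetching the instruction has already advanced pc by 2. */
    return (uint32_t)((int32_t)instruction_address + 2 * offset);
}
```

For the table's first example, branch_target(0x1000, 0x8006) gives 0x100c. An offset of 0 gives back the instruction's own address, which is the infinite loop described in the corollary.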

2008/01/22

Bitwise Operations

Given a number in hex, such as 0x1234, you need bitwise operations to extract an individual digit.

WLOG, extract the third lowest digit (third from the right):
int x = 0x1234;
int third = (x >> (2*4)) & 0xf;
assert (third == 0x2);

The ">>" shift operator moves the number to the right by the specified number of bits. So:

   01001000110100b          0x1234
>>              8        >> 0x0008
------------------       ---------
           010010b          0x0012

The "&" integer bitwise operator masks the number. So:

  00010010b           0x0012
& 00001111b         & 0x000f
-----------         --------
  00000010b           0x0002
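
The shift-then-mask pattern generalizes to any digit position. A minimal sketch, with a hypothetical helper name and digits numbered from 0 at the right:

```c
#include <assert.h>

/* Returns hex digit n of x, where digit 0 is the lowest (rightmost) digit.
 * Each hex digit is 4 bits wide, so shift right by 4*n bits and then
 * mask off everything except the lowest 4 bits. */
unsigned hex_digit(unsigned x, unsigned n) {
    assert(n < 2 * sizeof(x));   /* two hex digits per byte of x */
    return (x >> (4 * n)) & 0xfu;
}
```

The third digit from the right is digit number 2, so hex_digit(0x1234, 2) returns 0x2, matching the worked example above.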

Pointer Example

C              Gold Assembly      Gold Machine      Notes
int a = -1;                       1000: ffffffff    &a = 1000
int b = -1;                       1004: ffffffff    &b = 1004
int* p = &a;                      2000: 00001000    &p = 2000
void foo() {
  *p = 5;      ld $0x5, r0        0000 0000 0005    r0 ~ *p
               ld $0x2000, r1     0100 0000 2000    r1 = &p
               ld 0x0(r1), r2     1012              r2 = p
               st r0, 0x0(r2)     3002              *p = r0
  p = &b;      ld $0x1004, r2     0200 0000 1004    r2 = &b
               st r2, 0x0(r1)     3201              p = &b
  *p = 8;      ld $0x8, r0        0000 0000 0008    r0 ~ *p
               st r0, 0x0(r2)     3002              *p = r0
}
TODO: insert slide show

2008/01/15

Tracing Machine Execution

Short Example


[Slide show: 13 slides.]

Store Example


[Slide show: 7 slides.]

Mapping C to Assembly to Machine Language

Short Example

C              Gold Assembly    Gold Machine      Notes
void foo() {
  int a = 0;   ld $0x0, r1      0100 0000 0000    r1 ~ a
  a++;         inc r1           6301
  a *= 2;      add r1, r1       6111
}

Store Example

C              Gold Assembly     Gold Machine         Notes
int a = 2;                       1000: 0000 0002      &a = 1000
void foo() {
               ld $0x1000, r0    0: 0000 0000 1000    r0 = &a
               ld 0x0(r0), r1    1001                 r1 = a
  a++;         inc r1            6301
  a *= 2;      add r1, r1        6111
               st r1, 0x0(r0)    3001                 a = r1
}

Array Example

C                      Gold Assembly     Gold Machine         Notes
int a[] = {0,1,2,3};                     1000: 0000 0000      &a[] = 1000
                                               0000 0001
                                               0000 0002
                                               0000 0003
int b = -1;                              2000: ffff ffff      &b = 2000
void sum() {
  b = 0;               ld $0x0, r2       0: 0200 0000 0000    r2 ~ b
                       ld $0x1000, r0    0000 0000 1000       r0 = &a
  b += a[0];           ld 0x0(r0), r1    1001
                       add r1, r2        6112
  b += a[1];           ld 0x4(r0), r1    1101
                       add r1, r2        6112
  b += a[2];           ld 0x8(r0), r1    1201
                       add r1, r2        6112
  b += a[3];           ld 0xc(r0), r1    1301
                       add r1, r2        6112
                       ld $0x2000, r0    0000 0000 2000       r0 = &b
                       st r2, 0x0(r0)    3002                 b = r2
}