Coding Examples

The following sections show only the differences from the Intel386 ABI.

Architectural Constraints

The AMD64 architecture generally does not allow arbitrary 64-bit constants to be encoded as immediate operands of an instruction. Most instructions accept 32-bit immediates that are sign extended to 64 bits. Additionally, 32-bit operations with register destinations implicitly perform zero extension, making loads of 64-bit immediates with the upper half set to 0 even cheaper.

Branch instructions likewise accept 32-bit immediate operands that are sign extended and used to adjust the instruction pointer. Similarly, an instruction pointer relative addressing mode exists for data accesses with equivalent limitations.

In order to improve performance and reduce code size, it is desirable to use different code models depending on the requirements.

Code models define constraints for symbolic values that allow the compiler to generate better code. Basically code models differ in addressing (absolute versus position independent), code size, data size and address range. We define only a small number of code models that are of general interest:

Small code model
The virtual address of code executed is known at link time. Additionally all symbols are known to be located in the virtual addresses in the range from 0 to 2^31 - 2^10 - 1.

This allows the compiler to encode symbolic references with offsets in the range from -2^31 to 2^10 directly in the sign extended immediate operands, with offsets in the range from 0 to 2^31 + 2^10 in the zero extended immediate operands and use instruction pointer relative addressing for the symbols with offsets in the range -2^10 to 2^10.

This is the fastest code model and we expect it to be suitable for the vast majority of programs.
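
As a worked check of the small model ranges, the following C sketch tests whether a symbol address plus an offset still fits in a 32-bit immediate. The helper names and the SMALL_MODEL_MAX_ADDR macro are hypothetical and not part of the ABI. The sketch suggests why the upper symbol bound is 2^31 - 2^10 - 1: adding the largest permitted offset, 2^10, gives exactly 2^31 - 1, the largest sign extended 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Highest symbol address in the small code model: 2^31 - 2^10 - 1. */
#define SMALL_MODEL_MAX_ADDR \
    ((INT64_C(1) << 31) - (INT64_C(1) << 10) - 1)

/* Nonzero if value can be encoded as a sign extended 32-bit immediate. */
int fits_sign_extended_imm32(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}

/* Nonzero if value can be encoded as a zero extended 32-bit immediate. */
int fits_zero_extended_imm32(int64_t value)
{
    return value >= 0 && value <= (int64_t)UINT32_MAX;
}

/* Re-derives the bounds quoted in the text; aborts if one does not hold. */
void check_small_model_bounds(void)
{
    const int64_t k = INT64_C(1) << 10;   /* 2^10 */
    const int64_t g = INT64_C(1) << 31;   /* 2^31 */

    /* Highest symbol plus the largest sign extended offset. */
    assert(fits_sign_extended_imm32(SMALL_MODEL_MAX_ADDR + k));
    /* Lowest symbol (0) plus the smallest sign extended offset. */
    assert(fits_sign_extended_imm32(0 - g));
    /* Highest symbol plus the largest zero extended offset. */
    assert(fits_zero_extended_imm32(SMALL_MODEL_MAX_ADDR + g + k));
}
```

Calling check_small_model_bounds() completes silently when the three bounds hold; one byte beyond the first bound no longer fits a sign extended immediate.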

Kernel code model

The kernel of an operating system is usually rather small but runs in the negative half of the address space. So we define all symbols to be in the range from 2^64 - 2^31 to 2^64 - 2^10.

This code model has advantages similar to those of the small model, but allows encoding of zero extended symbolic references only for offsets from 2^31 to 2^31 + 2^10. The range of offsets for sign extended references changes to 0 to 2^31 + 2^10.

Medium code model

The medium code model does not make any assumptions about the range of symbolic references to data sections. Size and address of the text section have the same limits as the small code model.

This model requires the compiler to use movabs instructions to access static data and to load addresses into a register, but keeps the advantages of the small code model for manipulation of addresses to the text section (especially needed for branches).

Large code model

The large code model makes no assumptions about addresses and sizes of sections.

The compiler is required to use the movabs instruction, as in the medium code model, even for dealing with addresses inside the text section. Additionally, indirect branches are needed when branching to addresses whose offset from the current instruction pointer is unknown.

It is possible to avoid the limitation for the text section by breaking up the program into multiple shared libraries, so we do not expect this model to be needed in the foreseeable future.

Small position independent code model (PIC)

Unlike the previous models, the virtual addresses of instructions and data are not known until dynamic link time. So all addresses have to be relative to the instruction pointer.

Additionally the maximum distance between a symbol and the end of an instruction is limited to 2^31 - 2^10 - 1, allowing the compiler to use instruction pointer relative branches and addressing modes supported by the hardware for every symbol with an offset in the range -2^10 to 2^10.

Medium position independent code model (PIC)

This model is like the previous model, but makes no assumptions about the distance of symbols to the data section.

In the medium PIC model, instruction pointer relative addressing cannot be used directly for accessing static data, since the offset can exceed the limits on the size of the displacement field in the instruction. Instead, an unwind sequence consisting of movabs, lea and add needs to be used.

Large position independent code model (PIC)

This model is like the previous model, but makes no assumptions about the distance of symbols.

The large PIC model implies the same limitation as the medium PIC model regarding addressing of static data. Additionally, references to the global offset table and to the procedure linkage table and branch destinations need to be calculated in a similar way.

Position-Independent Function Prologue

AMD64 does not need any function prologue for calculating the global offset table address since it does not have an explicit GOT pointer.

Data Objects

This section describes only objects with static storage. Stack-resident objects are excluded since programs always compute their virtual address relative to the stack or frame pointers.

Because only the movabs instruction uses 64-bit addresses directly, depending on the code model either %rip-relative addressing or building addresses in registers and accessing the memory through the register has to be used.

For absolute addresses %rip-relative encoding can be used in the small model. In the medium model the movabs instruction has to be used for accessing addresses.

Position-independent code cannot contain absolute addresses. To access a global symbol, the address of the symbol has to be loaded from the Global Offset Table. The address of the entry in the GOT can be obtained with a %rip-relative instruction in the small model.

[Figure: Absolute Load and Store (Small Model)]

[Figure: Absolute Load and Store (Medium and Large Models)]

[Figure: Position-Independent Load and Store (Small and Medium PIC Model)]

Function Calls

[Figure: Position-Independent Direct Function Call (Small and Medium Model)]

[Figure: Position-Independent Indirect Function Call]

Branching

Not done yet.

Variable Argument Lists

Some otherwise portable C programs depend on the argument passing scheme, implicitly assuming that 1) all arguments are passed on the stack, and 2) arguments appear in increasing order on the stack. Programs that make these assumptions have never been portable, but they have worked on many implementations. However, they do not work on the AMD64 architecture because some arguments are passed in registers. Portable C programs must use the header file <stdarg.h> in order to handle variable argument lists.
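
A minimal sketch of the portable approach is shown here. The function name sum_ints is hypothetical. It reads its arguments only through the <stdarg.h> macros, so it does not depend on whether they arrive in registers or on the stack.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: returns the sum of `count` int arguments. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);            /* count is the last named argument */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next int argument */
    va_end(ap);
    return total;
}
```

For example, sum_ints(3, 1, 2, 3) evaluates to 6 without any assumption about the stack layout.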

When a function taking variable arguments is called, %rax must be set to the total number of floating point parameters passed to the function in SSE registers (see footnote 3.16).

[Figure: Parameter Passing Example with Variable-Argument List]

[Figure: Register Allocation Example for Variable-Argument List]


The Register Save Area

The prologue of a function taking a variable argument list and known to call the macro va_start is expected to save the argument registers to the register save area. Each argument register has a fixed offset in the register save area, as defined in the figure "Register Save Area".

Only registers that might be used to pass arguments need to be saved. Other registers are not accessed and can be used for other purposes. If a function is known to never accept arguments passed in registers (see footnote 3.17), the register save area may be omitted entirely.

The prologue should use %rax to avoid unnecessarily saving XMM registers. This is especially important for integer-only programs to prevent the initialization of the XMM unit.

[Figure: Register Save Area]


The va_list Type

The va_list type is an array containing a single element of one structure containing the necessary information to implement the va_arg macro. The C definition of the va_list type is given in the figure "va_list Type Declaration".

[Figure: va_list Type Declaration]


The va_start Macro

The va_start macro initializes the structure as follows:

reg_save_area
The element points to the start of the register save area.
overflow_arg_area
This pointer is used to fetch arguments passed on the stack. It is initialized with the address of the first argument passed on the stack, if any, and then always updated to point to the start of the next argument on the stack.
gp_offset
The element holds the offset in bytes from reg_save_area to the place where the next available general purpose argument register is saved. In case all argument registers have been exhausted, it is set to the value 48 (6*8).
fp_offset
The element holds the offset in bytes from reg_save_area to the place where the next available floating point argument register is saved. In case all argument registers have been exhausted, it is set to the value 304 (6*8+16*16).
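
The initialization described above can be sketched in C as follows. The struct va_model type and the function va_model_start are hypothetical stand-ins for illustration only, not the ABI's own declaration. The parameters named_gp and named_fp are assumed to be the numbers of general purpose and floating point argument registers consumed by the named arguments, and the constants 48 and 304 are taken directly from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the four elements initialized by va_start. */
struct va_model {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
};

/*
 * named_gp / named_fp: registers already used by named arguments
 * (an assumption of this sketch). first_stack_arg: address of the
 * first argument passed on the stack, if any.
 */
void va_model_start(struct va_model *l, void *reg_save_area,
                    void *first_stack_arg,
                    unsigned int named_gp, unsigned int named_fp)
{
    assert(l != NULL);
    l->reg_save_area = reg_save_area;
    l->overflow_arg_area = first_stack_arg;
    /* Six 8-byte general purpose slots; 48 means "exhausted". */
    l->gp_offset = named_gp >= 6 ? 48 : named_gp * 8;
    /* 16-byte floating point slots follow at offset 48; 304 means "exhausted". */
    l->fp_offset = named_fp >= 16 ? 304 : 48 + named_fp * 16;
}
```

With two named integer arguments and one named floating point argument, this sketch yields gp_offset = 16 and fp_offset = 64.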


The va_arg Macro

The algorithm for the generic va_arg(l, type) implementation is defined as follows:

  1. Determine whether type may be passed in the registers. If not, go to step 7.
  2. Compute num_gp to hold the number of general purpose registers needed to pass type and num_fp to hold the number of floating point registers needed.
  3. Verify whether the arguments fit into registers. If

    \begin{displaymath}\texttt{l->gp\_offset} > 48 - \texttt{num\_gp} \times 8\end{displaymath}

    or

    \begin{displaymath}\texttt{l->fp\_offset} > 304 - \texttt{num\_fp} \times 16\end{displaymath}

    go to step 7.
  4. Fetch type from l->reg_save_area with an offset of l->gp_offset and/or l->fp_offset. This may require copying to a temporary location in case the parameter is passed in different register classes or requires an alignment greater than 8 for general purpose registers and 16 for XMM registers.
  5. Set:

    \begin{displaymath}\texttt{l->gp\_offset} = \texttt{l->gp\_offset} + \texttt{num\_gp} \times 8\end{displaymath}

    \begin{displaymath}\texttt{l->fp\_offset} = \texttt{l->fp\_offset} + \texttt{num\_fp} \times 16\end{displaymath}

  6. Return the fetched type.
  7. Align l->overflow_arg_area upwards to a 16 byte boundary if alignment needed by type exceeds 8 byte boundary.
  8. Fetch type from l->overflow_arg_area.
  9. Set l->overflow_arg_area to:

    \begin{displaymath}\texttt{l->overflow\_arg\_area} + \texttt{sizeof}(\texttt{type})\end{displaymath}

  10. Align l->overflow_arg_area upwards to an 8 byte boundary.
  11. Return the fetched type.
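
The steps above can be sketched in C for two simple cases: an 8-byte integer (num_gp = 1, num_fp = 0) and a double (num_gp = 0, num_fp = 1). The struct va_model type and all function names are hypothetical, and the register save area is modeled as a plain byte buffer. Step 7 is omitted because neither type needs more than 8-byte alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the structure behind va_list. */
struct va_model {
    unsigned int gp_offset;
    unsigned int fp_offset;
    unsigned char *overflow_arg_area;
    unsigned char *reg_save_area;
};

/* Round p up to a multiple of align (align must be a power of two). */
static unsigned char *align_up(unsigned char *p, uintptr_t align)
{
    return (unsigned char *)(((uintptr_t)p + align - 1) & ~(align - 1));
}

/* Steps 8 to 10: fetch size bytes from the stack and advance. */
static void fetch_from_stack(struct va_model *l, void *out, size_t size)
{
    memcpy(out, l->overflow_arg_area, size);                   /* step 8 */
    l->overflow_arg_area += size;                              /* step 9 */
    l->overflow_arg_area = align_up(l->overflow_arg_area, 8);  /* step 10 */
}

/* va_arg for an 8-byte integer type. */
int64_t va_model_arg_i64(struct va_model *l)
{
    const unsigned int num_gp = 1;                /* step 2 */
    int64_t value;

    assert(l != NULL);
    if (l->gp_offset > 48 - num_gp * 8) {         /* step 3 */
        fetch_from_stack(l, &value, sizeof value);
        return value;                             /* step 11 */
    }
    memcpy(&value, l->reg_save_area + l->gp_offset, sizeof value); /* step 4 */
    l->gp_offset += num_gp * 8;                   /* step 5 */
    return value;                                 /* step 6 */
}

/* va_arg for double. */
double va_model_arg_double(struct va_model *l)
{
    const unsigned int num_fp = 1;                /* step 2 */
    double value;

    assert(l != NULL);
    if (l->fp_offset > 304 - num_fp * 16) {       /* step 3 */
        fetch_from_stack(l, &value, sizeof value);
        return value;                             /* step 11 */
    }
    memcpy(&value, l->reg_save_area + l->fp_offset, sizeof value); /* step 4 */
    l->fp_offset += num_fp * 16;                  /* step 5 */
    return value;                                 /* step 6 */
}
```

With gp_offset at 40, one more integer is fetched from the register save area. The next integer request finds gp_offset at 48 and falls through to the overflow area.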

The va_arg macro is usually implemented as a compiler builtin and expanded in simplified forms for each particular type. The figure "Sample Implementation of va_arg(l, int)" gives a sample implementation of the va_arg macro.

[Figure: Sample Implementation of va_arg(l, int)]



Footnotes

3.16
This implies that the only legal values for %rax when calling a function with variable-argument lists are 0 to 8 (inclusive).
3.17
This fact may be determined either by examining the types used by the va_arg macro, or from the fact that the named arguments have already exhausted the argument registers entirely.
Jan Hubicka 2003-05-04