Function Calling Sequence

This section describes the standard function calling sequence, including stack frame layout, register usage, and parameter passing.

The standard calling sequence requirements apply only to global functions. Local functions that are not reachable from other compilation units may use different conventions. Nevertheless, it is recommended that all functions use the standard calling sequence when possible.

Registers and the Stack Frame

The AMD64 architecture provides 16 general purpose 64-bit registers. In addition, the architecture provides 16 SSE registers, each 128 bits wide, and 8 x87 floating point registers, each 80 bits wide. Each of the x87 floating point registers may be referred to in MMX/3DNow! mode as a 64-bit register. All of these registers are global to all procedures in a running program.

This subsection discusses the usage of each register. Registers %rbp, %rbx and %r12 through %r15 "belong" to the calling function, and the called function is required to preserve their values. In other words, a called function must preserve these registers' values for its caller. The remaining registers "belong" to the called function.^3.5 If a calling function wants to preserve such a register value across a function call, it must save the value in its local stack frame.
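The ownership rule above can be sketched as a small lookup. This is only an illustration: the function name `must_preserve` and the register spelling without the `%` prefix are assumptions made for this example, and the table covers only the general purpose registers named in the paragraph above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* General purpose registers that "belong" to the caller according to the
   text above; a called function has to preserve them. */
static const char *const callee_saved[] = {
    "rbp", "rbx", "r12", "r13", "r14", "r15"
};

/* Hypothetical helper: returns true if a called function must preserve
   the named general purpose register. */
bool must_preserve(const char *reg)
{
    assert(reg != NULL);
    for (size_t i = 0; i < sizeof callee_saved / sizeof callee_saved[0]; i++) {
        if (strcmp(reg, callee_saved[i]) == 0)
            return true;
    }
    return false;
}
```

A caller that needs a value in a register for which `must_preserve` returns false, such as `rdi` or `r11`, has to save it in its own stack frame before the call.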

The CPU shall be in x87 mode upon entry to a function. Therefore, every function that uses the MMX registers is required to issue an emms or femms instruction after using the MMX registers, before returning or calling another function.^3.6 The direction flag in the %eflags register must be clear on function entry and on function return.

The Stack Frame

In addition to registers, each function has a frame on the run-time stack. This stack grows downwards from high addresses. The figure "Stack Frame with Base Pointer" shows the stack organization.

**Figure:** Stack Frame with Base Pointer

The end of the input argument area shall be aligned on a 16-byte boundary. In other words, the value (%rsp - 8) is always a multiple of 16 when control is transferred to the function entry point. The stack pointer, %rsp, always points to the end of the latest allocated stack frame.^3.7
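The alignment rule is plain modular arithmetic, and the sketch below spells it out. The function names are hypothetical. It also relies on the fact that the call instruction pushes an 8-byte return address, which is why the stack pointer is 16-byte aligned just before the call and off by 8 at the entry point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if rsp satisfies the entry-point rule: (%rsp - 8) is a multiple of 16. */
bool rsp_valid_at_entry(uint64_t rsp)
{
    return ((rsp - 8) % 16) == 0;
}

/* Value of %rsp in the caller just before the call instruction pushed
   the 8-byte return address. */
uint64_t rsp_before_call(uint64_t rsp_at_entry)
{
    assert(rsp_valid_at_entry(rsp_at_entry));
    return rsp_at_entry + 8;
}
```

Since 8 and -8 are congruent modulo 16, testing `(rsp + 8) % 16 == 0` gives the same answer as the form used in the text.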

The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers.^3.8 Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue.
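As a rough sketch of what "beyond the location pointed to by %rsp" means on a downward-growing stack, the helpers below treat the reserved area as the 128 bytes at addresses lower than %rsp. The names `red_zone_low`, `in_red_zone` and `leaf_frame_fits` are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RED_ZONE_BYTES 128u

/* Lowest address of the reserved area below the stack pointer. */
uint64_t red_zone_low(uint64_t rsp)
{
    assert(rsp >= RED_ZONE_BYTES);   /* guard against wrap-around */
    return rsp - RED_ZONE_BYTES;
}

/* True if addr lies in the 128 bytes directly below rsp. */
bool in_red_zone(uint64_t rsp, uint64_t addr)
{
    return addr >= red_zone_low(rsp) && addr < rsp;
}

/* True if a leaf function with this many bytes of locals could keep its
   whole frame in the reserved area without adjusting %rsp. */
bool leaf_frame_fits(uint64_t frame_bytes)
{
    return frame_bytes <= RED_ZONE_BYTES;
}
```

Data kept in this area does not survive a function call, because the callee's return address and frame are written to the same addresses.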

Parameter Passing

After the argument values have been computed, they are placed in registers or pushed on the stack. How the values are passed is described in the following sections.

Definitions

We first define a number of classes to classify arguments. The classes correspond to AMD64 register classes and are defined as:

INTEGER: This class consists of integral types that fit into one of the general purpose registers.
SSE: This class consists of types that fit into an SSE register.
SSEUP: This class consists of types that fit into an SSE register and can be passed and returned in the most significant half of it.
X87, X87UP: These classes consist of types that will be returned via the x87 FPU.
COMPLEX_X87: This class consists of types that will be returned via the x87 FPU.
NO_CLASS: This class is used as an initializer in the algorithms. It will be used for padding and for empty structures and unions.
MEMORY: This class consists of types that will be passed and returned in memory via the stack.

Classification

The size of each argument gets rounded up to eightbytes.^3.9
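The rounding is simple arithmetic. A minimal sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of eightbytes needed to hold an argument of the given size. */
size_t eightbytes_for(size_t size)
{
    assert(size <= SIZE_MAX - 7);   /* avoid overflow in the rounding */
    return (size + 7) / 8;
}

/* Size of the argument after rounding up to a whole number of eightbytes. */
size_t rounded_size(size_t size)
{
    return eightbytes_for(size) * 8;
}
```

For instance, a 12-byte structure occupies two eightbytes, so it is treated as 16 bytes when it is passed.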

The basic types are assigned their natural classes:

Arguments of types (signed and unsigned) _Bool, char, short, int, long, long long, and pointers are in the INTEGER class.
Arguments of types float, double and __m64 are in class SSE.
Arguments of types __float128 and __m128 are split into two halves. The least significant half belongs to class SSE, the most significant half to class SSEUP.
The 64-bit mantissa of arguments of type long double belongs to class X87; the 16-bit exponent plus 6 bytes of padding belongs to class X87UP.
Arguments of type __int128 offer the same operations as INTEGERs, yet they do not fit into one general purpose register but require two registers. For classification purposes __int128 is treated as if it were implemented as:
```
typedef struct {
  long low, high;
} __int128;
```
with the exception that arguments of type __int128 that are stored in memory must be aligned on a 16-byte boundary.
Arguments of type complex T, where T is one of the types float or double, are treated as if they were implemented as:
```
struct complexT {
  T real;
  T imag;
};
```
A variable of type complex long double is classified as type COMPLEX_X87.

The classification of aggregate (structures and arrays) and union types works as follows:

If the size of an object is larger than two eightbytes, or, in C++, it is a non-POD^3.10 structure or union type, or it contains unaligned fields, it has class MEMORY.^3.11
Both eightbytes get initialized to class NO_CLASS.
Each field of an object is classified recursively, and the classes are merged pairwise so that two fields are always considered at a time. The resulting class is calculated from the classes of the fields in the eightbyte:
1. If both classes are equal, this is the resulting class.
2. If one of the classes is NO_CLASS, the resulting class is the other class.
3. If one of the classes is MEMORY, the result is the MEMORY class.
4. If one of the classes is INTEGER, the result is the INTEGER class.
5. If one of the classes is X87, X87UP or COMPLEX_X87, MEMORY is used as the class.
6. Otherwise class SSE is used.
Then a post-merger cleanup is done:
1. If one of the classes is MEMORY, the whole argument is passed in memory.
2. If SSEUP is not preceded by SSE, it is converted to SSE.
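The merge rules and the cleanup step can be written out as a short sketch. The enum `arg_class` and the function names are assumptions made for this illustration. The rules are applied in the order listed, so a pair such as INTEGER and X87 is resolved by rule 4 before rule 5 is reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum arg_class {
    NO_CLASS, INTEGER, SSE, SSEUP, X87, X87UP, COMPLEX_X87, MEMORY
};

static bool is_x87_family(enum arg_class c)
{
    return c == X87 || c == X87UP || c == COMPLEX_X87;
}

/* Merge the classes of two fields that share an eightbyte. */
enum arg_class merge_classes(enum arg_class a, enum arg_class b)
{
    if (a == b) return a;                                    /* rule 1 */
    if (a == NO_CLASS) return b;                             /* rule 2 */
    if (b == NO_CLASS) return a;
    if (a == MEMORY || b == MEMORY) return MEMORY;           /* rule 3 */
    if (a == INTEGER || b == INTEGER) return INTEGER;        /* rule 4 */
    if (is_x87_family(a) || is_x87_family(b)) return MEMORY; /* rule 5 */
    return SSE;                                              /* rule 6 */
}

/* Post-merger cleanup over the two eightbytes of one argument. */
void post_merge_cleanup(enum arg_class cls[2])
{
    assert(cls != NULL);
    if (cls[0] == MEMORY || cls[1] == MEMORY) {
        cls[0] = cls[1] = MEMORY;   /* whole argument goes to memory */
        return;
    }
    if (cls[0] == SSEUP)            /* nothing precedes the first eightbyte */
        cls[0] = SSE;
    if (cls[1] == SSEUP && cls[0] != SSE)
        cls[1] = SSE;
}
```

For a structure holding an int followed by a float, both fields fall into the first eightbyte, and `merge_classes(INTEGER, SSE)` yields INTEGER, so the structure is handled as a single INTEGER eightbyte.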

Passing

Once arguments are classified, the registers get assigned (in left-to-right order) for passing as follows:

If the class is MEMORY, pass the argument on the stack.
If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used.^3.12
If the class is SSE, the next available SSE register is used; the registers are taken in order from %xmm0 to %xmm7.
If the class is SSEUP, the eightbyte is passed in the upper half of the last used SSE register.
If the class is X87, X87UP or COMPLEX_X87, it is passed in memory.

**Figure:** Register Usage

If no register is available for any eightbyte of an argument, the whole argument is passed on the stack. If registers have already been assigned for some eightbytes of this argument, those assignments are reverted.
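The assignment and the revert step can be modelled with two counters. The sketch below is an illustration under stated assumptions: the type and function names are invented here, and the x87 classes are expected to be mapped to `EB_MEMORY` by the caller of the helper, since they are passed in memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum eb_class { EB_INTEGER, EB_SSE, EB_SSEUP, EB_MEMORY };

#define MAX_INT_REGS 6   /* %rdi, %rsi, %rdx, %rcx, %r8, %r9 */
#define MAX_SSE_REGS 8   /* %xmm0 through %xmm7 */

struct reg_state {
    int int_used;   /* general purpose argument registers consumed so far */
    int sse_used;   /* SSE argument registers consumed so far */
};

/* Try to place one argument, given the classes of its n eightbytes.
   Returns true if it fits in registers and updates *st. Returns false if
   the whole argument goes on the stack; *st is then left unchanged, which
   models reverting any partial assignment. */
bool assign_argument(struct reg_state *st, const enum eb_class *cls, size_t n)
{
    assert(st != NULL && cls != NULL);
    struct reg_state trial = *st;   /* work on a copy so failure is free */

    for (size_t i = 0; i < n; i++) {
        switch (cls[i]) {
        case EB_INTEGER:
            if (trial.int_used == MAX_INT_REGS) return false;
            trial.int_used++;
            break;
        case EB_SSE:
            if (trial.sse_used == MAX_SSE_REGS) return false;
            trial.sse_used++;
            break;
        case EB_SSEUP:
            /* upper half of the last used SSE register: no new register */
            break;
        case EB_MEMORY:
            return false;
        }
    }
    *st = trial;
    return true;
}
```

With five general purpose registers already taken, a two-eightbyte INTEGER structure does not fit and goes on the stack as a whole, while a later single int can still take the sixth register.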

Once registers are assigned, the arguments passed in memory are pushed on the stack in reversed (right-to-left^3.13) order.

For calls that may call functions that use varargs or stdargs (prototype-less calls or calls to functions containing an ellipsis (...) in the declaration), %al^3.14 is used as a hidden argument to specify the number of SSE registers used. The contents of %al need not match the number of registers exactly, but must be an upper bound on the number of SSE registers used and must be in the range 0-8 inclusive.
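The constraint on %al amounts to a range check. The value itself is set by the compiler at the call site and cannot be observed from portable C, so the sketch below only models the rule; the helper name `al_is_valid` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* True if al is an acceptable hidden argument for a call that actually
   uses sse_regs_used SSE registers (at most 8, %xmm0 through %xmm7). */
bool al_is_valid(unsigned al, unsigned sse_regs_used)
{
    assert(sse_regs_used <= 8);
    return al <= 8 && al >= sse_regs_used;
}
```

A call that passes two doubles to a variadic function may therefore set %al to any value from 2 to 8.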

Returning of Values

The returning of values is done according to the following algorithm:

Classify the return type with the classification algorithm.
If the type has class MEMORY, then the caller provides space for the return value and passes the address of this storage in %rdi as if it were the first argument to the function. In effect, this address becomes a ``hidden'' first argument.
On return %rax will contain the address that has been passed in by the caller in %rdi.
If the class is INTEGER, the next available register of the sequence %rax, %rdx is used.
If the class is SSE, the next available SSE register of the sequence %xmm0, %xmm1 is used.
If the class is SSEUP, the eightbyte is returned in the upper half of the last used SSE register.
If the class is X87, the value is returned on the x87 stack in %st0 as an 80-bit x87 number.
If the class is X87UP, the value is returned together with the previous X87 value in %st0.
If the class is COMPLEX_X87, the real part of the value is returned in %st0 and the imaginary part in %st1.
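The return algorithm can be sketched as a mapping from eightbyte classes to locations. All type, constant and function names below are assumptions made for this illustration. The input is expected to be the output of the classification algorithm after cleanup, so an SSEUP entry always follows an SSE entry.

```c
#include <assert.h>
#include <stddef.h>

enum ret_class {
    RET_INTEGER, RET_SSE, RET_SSEUP, RET_X87, RET_X87UP,
    RET_COMPLEX_X87, RET_MEMORY
};

enum ret_loc {
    LOC_RAX, LOC_RDX, LOC_XMM0, LOC_XMM1,
    LOC_XMM0_HIGH, LOC_XMM1_HIGH,   /* upper half of the SSE register */
    LOC_ST0,                        /* x87 stack top */
    LOC_ST0_ST1,                    /* real part in %st0, imaginary in %st1 */
    LOC_MEMORY                      /* caller-provided storage */
};

/* For each of the n eightbytes of a return value, record where it is
   returned according to the rules above. */
void return_locations(const enum ret_class *cls, size_t n, enum ret_loc *out)
{
    size_t next_int = 0, next_sse = 0;

    assert(cls != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        switch (cls[i]) {
        case RET_INTEGER:
            assert(next_int < 2);
            out[i] = (next_int++ == 0) ? LOC_RAX : LOC_RDX;
            break;
        case RET_SSE:
            assert(next_sse < 2);
            out[i] = (next_sse++ == 0) ? LOC_XMM0 : LOC_XMM1;
            break;
        case RET_SSEUP:
            assert(next_sse > 0);   /* must follow an SSE eightbyte */
            out[i] = (next_sse == 1) ? LOC_XMM0_HIGH : LOC_XMM1_HIGH;
            break;
        case RET_X87:
        case RET_X87UP:             /* returned together with the X87 part */
            out[i] = LOC_ST0;
            break;
        case RET_COMPLEX_X87:
            out[i] = LOC_ST0_ST1;
            break;
        case RET_MEMORY:
            out[i] = LOC_MEMORY;
            break;
        }
    }
}
```

A structure containing a long followed by a double classifies as INTEGER and SSE, so under this model its first eightbyte comes back in %rax and its second in %xmm0.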

As an example of the register passing conventions, consider the declarations and the function call shown in the figure "Parameter Passing Example". The corresponding register allocation is given in the figure "Register Allocation Example"; the stack frame offset given shows the frame before calling the function.

**Figure:** Parameter Passing Example

**Figure:** Register Allocation Example

Footnotes

^3.5: Note that in contrast to the Intel386 ABI, %rdi and %rsi belong to the called function, not the caller.
^3.6: All x87 registers are caller-saved, so callees that make use of the MMX registers may use the faster femms instruction.
^3.7: The conventional use of %rbp as a frame pointer for the stack frame may be avoided by using %rsp (the stack pointer) to index into the stack frame. This technique saves two instructions in the prologue and epilogue and makes one additional general-purpose register (%rbp) available.
^3.8: Locations within 128 bytes can be addressed using one-byte displacements.
^3.9: Therefore the stack will always be eightbyte aligned.
^3.10: The term POD is from the ANSI/ISO C++ Standard, and stands for Plain Old Data. Although the exact definition is technical, a POD is essentially a structure or union that could have been written in C; there cannot be any member functions, base classes, or similar C++ extensions.
^3.11: A non-POD object cannot be passed in registers because such objects must have well-defined addresses; the address at which an object is constructed (by the caller) and the address at which the object is destroyed (by the callee) must be the same. Similar issues apply when returning a non-POD object from a function.
^3.12: Note that %r11 is neither required to be preserved, nor is it used to pass arguments. Making this register available as a scratch register means that code in the PLT need not spill any registers when computing the address to which control needs to be transferred. %rax is used to indicate the number of SSE arguments passed to a function requiring a variable number of arguments. %r10 is used for passing a function's static chain pointer.
^3.13: Right-to-left order on the stack makes the handling of functions that take a variable number of arguments simpler. The location of the first argument can always be computed statically, based on the type of that argument. It would be difficult to compute the address of the first argument if the arguments were pushed in left-to-right order.
^3.14: Note that the rest of %rax is undefined; only the contents of %al are defined.

Jan Hubicka 2003-05-04