How does sysdepCallMethod work?

Introduction

Kaffe, a free implementation of the Java Virtual Machine, is known to be very portable software. But there are several places where someone porting it to a new environment has to write some code. One of them is the sysdepCallMethod routine, which is usually implemented as a C macro. This memo describes what sysdepCallMethod is, how it works, and (finally) how you can write a new sysdepCallMethod for a new type of CPU.

Who calls sysdepCallMethod?

The sysdepCallMethod routine is called by callMethodA or callMethodV in the file "kaffe/kaffevm/support.c". These two routines handle calls to (Java) methods, including native methods. When a native method is called, sysdepCallMethod is invoked once all the arguments from the Java world have been formatted. The argument to sysdepCallMethod is a pointer to a callMethodInfo, a type defined in "kaffe/kaffevm/support.h". The members of this data type are also described in that file.

The reason why we cannot call a C function directly from the Java world should be obvious. But I should at least point out that the way arguments are passed between C functions differs greatly from one type of CPU to another. Some use only the stack to pass arguments, while others also use general registers. And of course, the packing and alignment of short data differ from CPU to CPU.

In the callMethodInfo data type, the caller (callMethodA or callMethodV) packs each (Java) argument into the args array, records its type in the calltype array, and stores its size in 4-byte words in the callsize array. Since a 4-byte word is the natural size for most types of CPU, a longer datum (a Java long or double) gets callsize[i] set to 2 and callsize[i+1] set to 0, so that no unpacking is needed later.
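
The packing rule can be sketched in C as follows. This is only an illustration: the struct lists just the fields that the macros later in this memo touch, with assumed types (the authoritative definition is the one in support.h), and fill_callsize with its one-letter type codes is a hypothetical helper, not a Kaffe function.

```c
#include <assert.h>

/* Simplified stand-in for one argument value; the layout is an assumption. */
typedef union {
    int       i;   /* 4-byte value */
    long long j;   /* 8-byte value (Java long) */
    float     f;
    double    d;
} slot_value;

/* Simplified stand-in for callMethodInfo; field types are assumptions. */
typedef struct {
    void       *function;  /* address of the function to call */
    slot_value *args;      /* one entry per 4-byte slot */
    slot_value *ret;       /* where the result is stored */
    int         nrargs;    /* number of slots, not of Java arguments */
    char        retsize;   /* result size in 4-byte words */
    char       *callsize;  /* size of each slot in 4-byte words */
} call_info_sketch;

/*
 * Hypothetical helper: fill callsize[] from a string of type codes
 * ('J' = long, 'D' = double, anything else = one 4-byte word).
 * Returns the number of slots used.
 */
static int fill_callsize(const char *types, char *callsize, int capacity)
{
    int n = 0;
    for (; *types != '\0'; ++types) {
        if (*types == 'J' || *types == 'D') {
            assert(n + 2 <= capacity);
            callsize[n++] = 2;  /* the value itself: two words */
            callsize[n++] = 0;  /* placeholder for its second word */
        } else {
            assert(n + 1 <= capacity);
            callsize[n++] = 1;
        }
    }
    return n;
}
```

For a method taking (int, long, double), fill_callsize("IJD", ...) uses five slots with the sizes 1, 2, 0, 2, 0.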

How is sysdepCallMethod expanded?

It is easier to describe what sysdepCallMethod actually does with an example, so I use the m68k port. This is partly because I did that port, but also because this machine uses the simplest calling convention.

In the case of m68k/{sunos4, netbsd1 or linux}, all arguments are pushed onto the stack. Even if a datum is shorter than 4 bytes, the stack pointer moves in multiples of 4 bytes, and hence the arguments are 4-byte aligned.

The following code is extracted from config/m68k/common.h, with slight modifications for clarity.

#define sysdepCallMethod(CALL) do {                             \
        int extraargs[(CALL)->nrargs];                          \
        register int d0 asm ("d0");                             \
        register int d1 asm ("d1");                             \
        int *res;                                               \
        int *args = extraargs;                                  \
        int argidx;                                             \
        for(argidx = 0; argidx < (CALL)->nrargs; ++argidx) {    \
                if ((CALL)->callsize[argidx])                   \
                        *args++ = (CALL)->args[argidx].i;       \
                else                                            \
                        *args++ = (CALL)->args[argidx-1].j;     \
        }                                                       \
        asm volatile (   "jsr    %2@\n"                         \
         : "=r" (d0), "=r" (d1)                                 \
         : "a" ((CALL)->function),                              \
           "r" ((CALL)->nrargs * sizeof(int))                   \
         : "cc", "memory");                                     \
        if ((CALL)->retsize != 0) {                             \
                res = (int *)(CALL)->ret;                       \
                res[1] = d1;                                    \
                res[0] = d0;                                    \
        }                                                       \
} while (0)

In this macro, only one line, containing a single instruction, is written in assembly language; everything else is written in C. Still, it depends heavily on how the compiler translates the C code into assembly language.

The first local variable declaration (extraargs) provides the storage for the arguments. In gcc, the first local variable occupies the lowest position in memory, so when the subroutine call occurs, this array is guaranteed to sit just above the point on the stack where the return address is stored.

The second and third declarations look as if they declare two new variables, but they actually give C names to CPU registers. This feature is available in GCC from 2.7.1, and it is used later to get the result back from the function.

Once you understand the magic of extraargs, the behavior of the 'for' loop is easy to follow. But keep in mind that for 8-byte data (two 4-byte words), the second callsize is 0, and the cast to int stores the lower 4 bytes on the stack.
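
The effect of the loop can be reproduced portably as below. This sketch is a model under stated assumptions, not the Kaffe code: it uses explicit shifts where the macro relies on the m68k being big-endian (there, reading the 4-byte member of an 8-byte value yields its upper word), and pack_stack_words and arg_slot are names made up for this memo.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one argument slot. */
typedef struct {
    int64_t value;   /* argument value; ignored when callsize is 0 */
} arg_slot;

/*
 * Copy the arguments into 4-byte stack words the way the m68k macro does:
 * a 2-word value puts its upper word in its own slot and its lower word
 * in the following (callsize == 0) slot.
 */
static void pack_stack_words(const arg_slot *args, const char *callsize,
                             int nrargs, uint32_t *words)
{
    int i;
    for (i = 0; i < nrargs; ++i) {
        if (callsize[i] == 2) {
            /* first half of an 8-byte value: upper word */
            words[i] = (uint32_t)((uint64_t)args[i].value >> 32);
        } else if (callsize[i] == 1) {
            words[i] = (uint32_t)args[i].value;
        } else {
            /* callsize 0: lower word of the previous slot's value */
            assert(i > 0 && callsize[i - 1] == 2);
            words[i] = (uint32_t)args[i - 1].value;
        }
    }
}
```

With the arguments (7, 0x1122334455667788) the three words come out as 7, 0x11223344 and 0x55667788, which is the big-endian memory image of the 8-byte value following the int.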

Then comes the one line of assembly. The 'jsr' instruction pushes the return address onto the stack and jumps to the specified address. In this case, the address is given by the 'function' field of callMethodInfo, which was prepared by the caller of sysdepCallMethod.

When the function returns, the result is in d0 and d1. On m68k, adjusting the stack pointer after a call is the caller's responsibility, but since the arguments are stored in a locally declared array, we do not need to adjust the stack pointer ourselves.

The last 'if' clause fetches the value from the general registers. As I mentioned earlier, the local variables d0 and d1 are actually aliases for hardware registers.
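
How the two register values form one 8-byte result can be modelled as follows. The sketch assumes, as the big-endian layout of res[0] and res[1] implies, that d0 carries the upper word and d1 the lower word; the function names are hypothetical and the registers are ordinary parameters here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the last 'if' clause: store both register values. */
static void store_result_words(uint32_t res[2], uint32_t d0, uint32_t d1)
{
    assert(res != NULL);
    res[1] = d1;
    res[0] = d0;
}

/* Read the two words back as a big-endian machine would see them. */
static int64_t read_big_endian_result(const uint32_t res[2])
{
    return (int64_t)(((uint64_t)res[0] << 32) | res[1]);
}
```

For a 4-byte result only res[0], that is d0, is meaningful; the macro stores d1 as well because doing so is harmless.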

In the case of RISC machines

On most types of RISC machine, some arguments are passed in general registers. This makes sysdepCallMethod harder and longer to write.

I use the arm port as an example of a RISC-type sysdepCallMethod. Here is the code.

#define sysdepCallMethod(CALL) do {                                     \
  int extraargs[((CALL)->nrargs>4)?((CALL)->nrargs-4):0];               \
  switch((CALL)->nrargs) {                                              \
    register int r0 asm("r0");                                          \
    register int r1 asm("r1");                                          \
    register int r2 asm("r2");                                          \
    register int r3 asm("r3");                                          \
    int *res;                                                           \
  default:                                                              \
    {                                                                   \
      int *args = extraargs;                                            \
      int argidx = 4;                                                   \
      if ((CALL)->callsize[3] == 2) args++;                             \
      for(; argidx < (CALL)->nrargs; ++argidx) {                        \
        if ((CALL)->callsize[argidx]) {                                 \
          *args++ = (CALL)->args[argidx].i;                             \
          if ((CALL)->callsize[argidx] == 2)                            \
            *args++ = ((CALL)->args[argidx].j) >> 32;                   \
        }                                                               \
      }                                                                 \
    }                                                                   \
  case 4:                                                               \
    if ((CALL)->callsize[3]) {                                          \
      r3 = (CALL)->args[3].i;                                           \
      if ((CALL)->callsize[3] == 2)                                     \
        *extraargs = ((CALL)->args[3].j) >> 32;                         \
    }                                                                   \
  case 3:                                                               \
    if ((CALL)->callsize[2]) {                                          \
      r2 = (CALL)->args[2].i;                                           \
      if ((CALL)->callsize[2] == 2)                                     \
        r3 = ((CALL)->args[2].j) >> 32;                                 \
    }                                                                   \
  case 2:                                                               \
    if ((CALL)->callsize[1]) {                                          \
      r1 = (CALL)->args[1].i;                                           \
      if ((CALL)->callsize[1] == 2)                                     \
        r2 = ((CALL)->args[1].j) >> 32;                                 \
    }                                                                   \
  case 1:                                                               \
    if ((CALL)->callsize[0]) {                                          \
      r0 = (CALL)->args[0].i;                                           \
      if ((CALL)->callsize[0] == 2)                                     \
        r1 = ((CALL)->args[0].j) >> 32;                                 \
    }                                                                   \
  case 0:                                                               \
    asm ("mov lr, pc\n                                                  \
          mov pc, %2\n"                                                 \
        : "=r" (r0), "=r" (r1)                                          \
        : "r" ((CALL)->function),                                       \
          "0" (r0), "1" (r1), "r" (r2), "r" (r3)                        \
        : "ip", "rfp", "sl", "fp", "lr"                                 \
        );                                                              \
    res = (int *)(CALL)->ret;                                           \
    res[0] = r0;                                                        \
    res[1] = r1;                                                        \
    break;                                                              \
  }                                                                     \
} while (0)

Although the macro looks long and complicated, its structure is straightforward. Since arm uses four registers, r0, r1, r2 and r3, to pass arguments, we have to handle the first four arguments specially. The remainder go onto the stack in the same manner as on a CISC machine, using the extraargs technique from the m68k port (in this case, they are exactly the 'extra' args).

As in the m68k port, we allocate enough memory on the stack with the extraargs variable, and create aliases for the registers used to pass arguments and return values. We are lucky that arm needs only four register declarations, compared with sparc (which uses 6 registers) or powerpc (which uses 8 general registers and some floating-point ones).

The switch statement, which encloses the register variable declarations, handles the first four arguments specially. I keep saying 'first four', but when an argument is a Java long or double, the calling convention uses two adjacent registers. The case labels have no break statements, so execution falls through from one to the next, which keeps the entire code simpler.

Once all the arguments are in their proper places, two lines of assembler code follow. Even though I am not familiar with arm assembler (actually, I have never used an arm machine...), it is easy to see that the first line saves the current program counter in lr, and the second line sets the program counter to the given function, or simply, jumps to the function. The return values are handled similarly to the m68k case.
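
The placement that the switch produces can be modelled in portable C as below. This is a simplified sketch, not the macro itself: registers are simulated by an array, the words are placed in one forward pass instead of by fall-through cases, and arm_placement, place_word and place_arm_args are hypothetical names. As in the macro, the lower word of an 8-byte value is placed first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_ARG_REGS 4    /* r0..r3 */
#define MAX_STACK_WORDS 16

typedef struct {
    uint32_t regs[NUM_ARG_REGS];      /* models r0..r3 */
    uint32_t stack[MAX_STACK_WORDS];  /* models extraargs */
    int      nstack;                  /* words placed on the stack */
} arm_placement;

/* Word number 'index' goes to a register if one is left, else to the stack. */
static void place_word(arm_placement *p, int index, uint32_t word)
{
    if (index < NUM_ARG_REGS) {
        p->regs[index] = word;
    } else {
        assert(index - NUM_ARG_REGS < MAX_STACK_WORDS);
        p->stack[index - NUM_ARG_REGS] = word;
        p->nstack = index - NUM_ARG_REGS + 1;
    }
}

static void place_arm_args(const int64_t *values, const char *callsize,
                           int nrargs, arm_placement *p)
{
    int i;
    memset(p, 0, sizeof *p);
    for (i = 0; i < nrargs; ++i) {
        if (callsize[i] == 0)
            continue;  /* second half of the previous value, already placed */
        place_word(p, i, (uint32_t)values[i]);  /* lower (or only) word */
        if (callsize[i] == 2)
            place_word(p, i + 1, (uint32_t)((uint64_t)values[i] >> 32));
    }
}
```

With the arguments (int, long, long), the int lands in r0, the first long in r1 and r2, and the second long is split: its lower word goes to r3 and its upper word becomes the first stack word, which is the case the macro handles with '*extraargs'.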

How to write new sysdepCallMethod

Of course, writing sysdepCallMethod requires some skill in assembly language, but if you have some, it is not so hard. You can start by looking at gcc's assembler output to see how functions are called. And if you have proper documentation of the calling convention, it is very helpful.

When I tried to write the mips version, I could not understand some very special cases until I got the ABI specification. This is one example of how useful such documents are.

How to debug sysdepCallMethod?

If you install GDB to debug Kaffe, it can help you check whether your port of sysdepCallMethod works properly.

The first step in checking the behavior of sysdepCallMethod is to make sure that the function called through sysdepCallMethod gets the proper arguments. Set breakpoints in the callMethodA and callMethodV functions in "kaffe/kaffevm/support.c". It is wise to use the interpreter rather than the jit for the first try, and a statically linked version is easier to work with.

When the program reaches the point where sysdepCallMethod is invoked, issue the gdb 'step' command several times. It will take you into the function called by sysdepCallMethod. Once you are in that function, check that each argument is correct. Since complicated systems pass parameters differently depending on the type of the data, it is better to test with several different parameter lists.

Start with integer or Java long (which is C's long long) arguments, and when they look OK, test that float and double values are passed properly.
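
One way to follow this advice is to route a call to a small function whose parameter list mixes these types and which records what it received, so that the values can be inspected from gdb or compared in code. The functions below are hypothetical probes written for this memo, not part of Kaffe; how you arrange for sysdepCallMethod to call probe_mixed depends on your port.

```c
#include <assert.h>

/* What the probe saw on its most recent call. */
typedef struct {
    int       i;
    long long j;
    float     f;
    double    d;
} probe_record;

static probe_record last_probe;

/* Hypothetical callee with a mixed parameter list; a good breakpoint target. */
int probe_mixed(int i, long long j, float f, double d)
{
    last_probe.i = i;
    last_probe.j = j;
    last_probe.f = f;
    last_probe.d = d;
    return i;
}

/* Returns 1 when the recorded values equal the expected ones, else 0. */
int probe_matches(int i, long long j, float f, double d)
{
    return last_probe.i == i && last_probe.j == j &&
           last_probe.f == f && last_probe.d == d;
}

/* Baseline: a direct C call must always pass, whatever the port does. */
void probe_self_check(void)
{
    probe_mixed(7, 0x1122334455667788LL, 1.5f, 2.25);
    assert(probe_matches(7, 0x1122334455667788LL, 1.5f, 2.25));
}
```

The test values are chosen to be exactly representable in float and double, so comparing with == is safe, and the 8-byte constant has distinct upper and lower words, which makes swapped halves easy to spot.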


Last modified 1998/7/27 ??:?? (JST).