The Tech Platform

LINUX – INSIDE THE BUILD PROCESS

The Executable and Linking Format (ELF) is the standard file format for executables, object files, shared libraries and core dumps in Linux/Unix. The debug information format used with ELF is DWARF, and its newest version is DWARF 5.


If you are not writing a new compiler or debugger, you do not need to understand every bit of those formats, but some concepts and tools can help you write and manage your code.

Let's take a simple code example and compile it:

#include <stdio.h>

int a1 = 10, a2 = 20;

void f2()
{
    printf("X");
}

void f1()
{
    int i;

    /* call f2() 100 times; increment a1 on every 20th iteration */
    for (i = 0; i < 100; i++)
    {
        if (i % 20 == 0)
            a1++;
        f2();
    }
}

int main(void)
{
    char *str = "hello have a good day.....";

    f1();
    puts(str);
    printf("hello %d\n", a1);
    return 0;
}

Compile it with the default options:

# gcc -o app ./test.c

Now we can obtain the ELF information using the readelf(1) tool:

# readelf -a ./app

We can see:

  • The ELF header – used by the file(1) command to display general information and the hardware architecture of the code

  • Section headers – code, data, strings and more

  • Program headers – segments for dynamic loading, the stack, etc. with their permissions (Read/Write/Execute)

  • Dynamic sections

  • and more
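
The first bytes of the ELF header are what file(1) and readelf inspect before anything else. As a rough illustration, here is a minimal sketch that reads only the identification bytes, assuming a Linux system that provides <elf.h>; the function names is_elf and print_elf_ident are made up for this example:

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the buffer starts with the 4-byte ELF magic (0x7f 'E' 'L' 'F'). */
int is_elf(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return len >= SELFMAG && memcmp(buf, ELFMAG, SELFMAG) == 0;
}

/* Reads the identification bytes of `path` and prints the ELF class and
 * byte order. Returns 0 on success, -1 if the file cannot be read or is
 * not an ELF file. */
int print_elf_ident(const char *path)
{
    unsigned char ident[EI_NIDENT];
    FILE *fp;
    size_t n;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    n = fread(ident, 1, sizeof ident, fp);
    fclose(fp);
    if (n != sizeof ident || !is_elf(ident, n))
        return -1;

    printf("class: %s\n", ident[EI_CLASS] == ELFCLASS64 ? "ELF64" :
                          ident[EI_CLASS] == ELFCLASS32 ? "ELF32" : "unknown");
    printf("data:  %s\n", ident[EI_DATA] == ELFDATA2LSB ? "little endian" :
                          ident[EI_DATA] == ELFDATA2MSB ? "big endian" : "unknown");
    return 0;
}
```

Calling print_elf_ident("./app") from main() would print the class and byte order of the binary built above; readelf -h reports the same fields along with the rest of the header.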


The Section Headers

Section Headers:
[Nr] Name Type  Address  Offset Size  EntSize  Flags  Link  Info  Align
 [ 0] NULL 0000000000000000 0000000 0000000000000000 0000000000000000 0 0 0
 [ 1] .interp           PROGBITS 0000000000400238 00000238 000000000000001c 0000000000000000 A 0 0 1
 [ 2] .note.ABI-tag     NOTE 0000000000400254 00000254 0000000000000020 0000000000000000 A 0 0 4
 [ 3] .note.gnu.build-i NOTE 0000000000400274 00000274 0000000000000024 0000000000000000 A 0 0 4
 [ 4] .gnu.hash         GNU_HASH 0000000000400298 00000298 000000000000001c 0000000000000000 A 5 0 8
 [ 5] .dynsym           DYNSYM 00000000004002b8 000002b8 0000000000000090 0000000000000018 A 6 1 8
 [ 6] .dynstr           STRTAB 0000000000400348 00000348 0000000000000062 0000000000000000 A 0 0 1
 [ 7] .gnu.version      VERSYM 00000000004003aa 000003aa 000000000000000c 0000000000000002 A 5 0 2
 [ 8] .gnu.version_r    VERNEED 00000000004003b8 000003b8 0000000000000030 0000000000000000 A 6 1 8
 [ 9] .rela.dyn         RELA 00000000004003e8 000003e8 0000000000000018 0000000000000018 A 5 0 8
 [10] .rela.plt         RELA 0000000000400400 00000400 0000000000000060 0000000000000018 AI 5 24 8
 [11] .init             PROGBITS 0000000000400460 00000460 000000000000001a 0000000000000000 AX 0 0 4
 [12] .plt              PROGBITS 0000000000400480 00000480 0000000000000050 0000000000000010 AX 0 0 16
 [13] .plt.got          PROGBITS 00000000004004d0 000004d0 0000000000000008 0000000000000000 AX 0 0 8
 [14] .text             PROGBITS 00000000004004e0 000004e0 00000000000002a2 0000000000000000 AX 0 0 16
 [15] .fini             PROGBITS 0000000000400784 00000784 0000000000000009 0000000000000000 AX 0 0 4
 [16] .rodata           PROGBITS 0000000000400790 00000790 000000000000001c 0000000000000000 A 0 0 4
 [17] .eh_frame_hdr     PROGBITS 00000000004007ac 000007ac 0000000000000054 0000000000000000 A 0 0 4
 [18] .eh_frame         PROGBITS 0000000000400800 00000800 0000000000000174 0000000000000000 A 0 0 8
 [19] .init_array       INIT_ARRAY 0000000000600e10 00000e10 0000000000000008 0000000000000000 WA 0 0 8
 [20] .fini_array       FINI_ARRAY 0000000000600e18 00000e18 0000000000000008 0000000000000000 WA 0 0 8
 [21] .jcr              PROGBITS 0000000000600e20 00000e20 0000000000000008 0000000000000000 WA 0 0 8
 [22] .dynamic          DYNAMIC 0000000000600e28 00000e28 00000000000001d0 0000000000000010 WA 6 0 8
 [23] .got              PROGBITS 0000000000600ff8 00000ff8 0000000000000008 0000000000000008 WA 0 0 8
 [24] .got.plt          PROGBITS 0000000000601000 00001000 0000000000000038 0000000000000008 WA 0 0 8
 [25] .data             PROGBITS 0000000000601038 00001038 0000000000000018 0000000000000000 WA 0 0 8
 [26] .bss              NOBITS 0000000000601050 00001050 0000000000000008 0000000000000000 WA 0 0 1
 [27] .comment          PROGBITS 0000000000000000 00001050 0000000000000034 0000000000000001 MS 0 0 1
 [28] .shstrtab         STRTAB 0000000000000000 00001084 00000000000000fc 0000000000000000 0 0 1

The build system places different entities in each section. The important sections are:

  • .text – the compiled code

  • .data – initialized data

  • .bss – uninitialized data (zero-filled when the program is loaded)


.bss and .data

To see the difference, let's look at a simple example:

int arr[1000];

int main(void)
{
    arr[0] = 1;
    arr[1] = 2;

    ....
}

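If we declare a global array without initialization, the build system places it in the .bss section. In the readelf output, .bss has the type NOBITS, which means it occupies no space in the file.

When the application is loaded, the .bss section is allocated in RAM and zero-filled, so the array does not add to the ELF file size.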



But if we add some values to initialize the array:

int arr[1024] = {1, 2};

int main(void)
{
    char *str = "hello have a good day.....";
    ...
}

The array is now placed in the .data section and the ELF file is bigger: the file contains the whole array, even though we initialized only 2 elements.


So if the same program declares the array with a size of 1000000:

int arr[1000000] = {1, 2};

int main(void)
{
    char *str = "hello have a good day.....";
    ...
}

the ELF file grows by about 4 MB: 1000000 elements of 4 bytes each (assuming a 4-byte int, as on x86-64 Linux), i.e. the file is now full of zeros.
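
One way to avoid shipping all those zeros is to leave the array without an initializer and set the few non-zero elements at run time. The following is a minimal sketch of that idea; ARR_LEN and init_arr are names chosen for this example, not part of the original program:

```c
#include <assert.h>
#include <stddef.h>

#define ARR_LEN 1000000

/* No initializer: the array is expected to land in .bss, so it costs no
 * bytes in the ELF file and is zero-filled when the program is loaded. */
int arr[ARR_LEN];

/* Copies `count` starting values into the array at run time instead of
 * storing them (and all the trailing zeros) in the file. */
void init_arr(const int *values, size_t count)
{
    assert(values != NULL && count <= ARR_LEN);
    for (size_t i = 0; i < count; i++)
        arr[i] = values[i];
}
```

Calling init_arr with the values {1, 2} at the start of main() gives the same array contents as the initialized version, while size(1) should report the array under bss rather than data.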

Debugging Information

Debugging info is placed inside the ELF file to tell the debugger which source line corresponds to which machine code. The debugger loads the program and uses the debug info to decide where to place breakpoints. To set one, it replaces the machine instruction with a trap (on x86, int 3), which makes the CPU raise an exception when execution reaches it; the debugger puts the original instruction back before the program resumes.
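
The byte swapping behind a breakpoint can be sketched in a few lines. This is only an illustration on a local byte buffer: a real debugger would read and write the traced process's memory (on Linux typically through ptrace(2)) and would also rewind the program counter after the trap. The struct and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCCu   /* one-byte x86 breakpoint instruction (int 3) */

/* Hypothetical bookkeeping for a single breakpoint. */
struct breakpoint {
    uint8_t saved_byte;     /* original first byte of the instruction */
    int     enabled;
};

/* Overwrites the byte at code[offset] with int 3 and remembers the original. */
void breakpoint_enable(struct breakpoint *bp, uint8_t *code, size_t offset)
{
    assert(bp != NULL && code != NULL);
    bp->saved_byte = code[offset];
    code[offset] = INT3_OPCODE;
    bp->enabled = 1;
}

/* Puts the original byte back so the real instruction can execute again. */
void breakpoint_disable(struct breakpoint *bp, uint8_t *code, size_t offset)
{
    assert(bp != NULL && code != NULL && bp->enabled);
    code[offset] = bp->saved_byte;
    bp->enabled = 0;
}
```

For example, enabling a breakpoint at offset 0 of the bytes 55 48 89 e5 (push %rbp; mov %rsp,%rbp) turns the first byte into cc, and disabling it restores 55.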


If we compile the code with debug info (gcc -g), the ELF file gets bigger, and readelf shows additional .debug_* sections such as .debug_info and .debug_line.

Other useful tools


nm – list the symbols with addresses:

developer@:~/testapp$ nm app2
0000000000601048 D a1
000000000060104c D a2
0000000000601050 B __bss_start
0000000000601050 b completed.7585
0000000000601038 D __data_start
0000000000601038 W data_start
.....

objdump – display information from object files:

  • headers (-x)

  • debugging information (-g)

  • disassembly (-d)

  • disassembly with source code (-S)

  • and more …

# objdump -S ./app2
...
void main()
{
  400626: 55                    push   %rbp
  400627: 48 89 e5              mov    %rsp,%rbp
  40062a: 48 83 ec 10           sub    $0x10,%rsp
char *str = "hello have a good day.....";
  40062e: 48 c7 45 f8 f4 06 40  movq   $0x4006f4,-0x8(%rbp)
  400635: 00
f1();
  400636: b8 00 00 00 00        mov    $0x0,%eax
  40063b: e8 87 ff ff ff        callq  4005c7 <f1>
puts(str);


addr2line – display the source code line number for a known address


This is very useful when a program crashes and a fault handler we wrote prints the faulting address (program counter). Run the tool on an ELF file that contains debug info; the program that actually crashed can be a stripped copy of it:

developer@:~/testapp$ addr2line -e app2 0x400630
/home/developer/testapp/./a.c:25
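
A fault handler of the kind mentioned above could look roughly like this. It is a minimal sketch, assuming x86-64 Linux with glibc (REG_RIP is specific to that platform); format_hex, on_fault and install_fault_handler are names made up for the example, and the handler sticks to write(2) and _exit(2) because most library calls are not safe inside a signal handler:

```c
#define _GNU_SOURCE           /* exposes REG_RIP in <ucontext.h> on glibc */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

/* Writes value as "0x..." into buf without using stdio.
 * Returns the number of characters written (excluding the '\0'). */
size_t format_hex(uintptr_t value, char *buf, size_t size)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof value];
    size_t n = 0, len = 0;

    assert(buf != NULL && size >= 2 * sizeof value + 3);
    do {                              /* collect digits, lowest first */
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);
    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)                     /* emit them highest first */
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}

/* SIGSEGV handler: prints the faulting program counter and exits. */
static void on_fault(int sig, siginfo_t *info, void *context)
{
    char line[32];
    uintptr_t pc = 0;
    size_t len;

    (void)sig;
    (void)info;
#if defined(__x86_64__) && defined(REG_RIP)
    pc = (uintptr_t)((ucontext_t *)context)->uc_mcontext.gregs[REG_RIP];
#else
    (void)context;                    /* other platforms: not covered here */
#endif
    len = format_hex(pc, line, sizeof line);
    line[len++] = '\n';
    if (write(STDERR_FILENO, line, len) < 0) {
        /* nothing sensible left to do */
    }
    _exit(1);
}

/* Installs the handler for SIGSEGV. Returns 0 on success, -1 on error. */
int install_fault_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling install_fault_handler() early in main() is enough; the printed address can then be passed to addr2line -e together with the binary that has debug info. For a position-independent executable the address includes the load base, which would have to be subtracted first.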

strip – remove symbols and debugging information from an ELF file


Before we deploy our app, we can remove all the symbols and debug info without the need to compile it again.

developer@:~/testapp$ strip -o appstripped ./app
developer@:~/testapp$ ls -l
total 44
-rw-rw-r-- 1 developer developer 290 dec 7 19:04 a.c
-rwxrwxr-x 1 developer developer 29864 dec 7 19:05 app
-rwxrwxr-x 1 developer developer 6336 dec 7 21:17 appstripped


objcopy – copy and translate object files


You can use objcopy to copy content from one ELF file to another. For example, to copy only the debug information to a separate file, use:

# gcc -g3 -o app ./a.c 
# objcopy --only-keep-debug app app.debug
# strip -s ./app

Later you can link the stripped binary to the separate debug file, so that gdb can find the debug info:

# objcopy --add-gnu-debuglink app.debug app
# gdb ./app

size – display section sizes

# size ./app
   text    data     bss     dec     hex filename
   1594     576       8    2178     882 ./app



Source: devarea

