PeerHood performance and memory usage analysis

Used tools

Google Performance Tools

Available for Linux. In Ubuntu, install the packages libgoogle-perftools0 and libgoogle-perftools-dev:

sudo apt-get install libgoogle-perftools0 libgoogle-perftools-dev

Analyses only heap usage, memory leaks and object allocations / function calls. It is very useful for seeing which objects were allocated and which function calls were made while the program was running.

Usage:

  • Link the program to be analyzed with the flags:
    -ltcmalloc -lpthread

    when compiling.

  • Running the heap profiler:
    HEAPPROFILE=/path/to/profile/name_of_the_profile program_executable
    • The HEAPPROFILE variable makes the profiler write a profile named name_of_the_profile.0001.heap into /path/to/profile after the program has been closed.
      • The '0001' means that this is the first part of the heap dump. By default the limit is 1 GB of reserved memory; this limit can be changed with a parameter, see the project home page for further details.


Analyzing the performance:

  • The created profile can be analysed with
    pprof

    using an appropriate viewer or output format, see

    man pprof

    for further details.

    • General format for pprof:
      pprof --<insert_viewer> program_executable /path/to/profile/name_of_the_profile.0001.heap
    • For example, using gv as the viewer requires that both gv and the graphviz library are installed.
    • By default pprof does not look up dynamic symbols when it reads the symbols of a file with nm. Dynamic symbol lookup can be enabled by editing the pprof script (written in Perl)
      /usr/bin/pprof

      and adding the -D switch to the nm call in

      open(NM, $nm -C -n $image ... )

      Change it to

      open(NM, $nm -D -C -n $image ...)
    • With additional flags for pprof the output type can be changed:


flag | meaning
--inuse_space | Display the number of in-use megabytes (i.e. space that has been allocated but not freed). This is the default.
--inuse_objects | Display the number of in-use objects (i.e. number of objects that have been allocated but not freed).
--alloc_space | Display the number of allocated megabytes. This includes the space that has since been de-allocated. Use this if you want to find the main allocation sites in the program.
--alloc_objects | Display the number of allocated objects. This includes the objects that have since been de-allocated. Use this if you want to find the main allocation sites in the program.


See the home page of Google Performance Tools for further information.

Exmap

Available for Linux, in Ubuntu:

sudo apt-get install exmap

Exmap can be used to analyze the real amount of memory used by different parts of a program and by the libraries it uses. It shows the amount of virtual, resident and mapped memory in use, as well as the effective (real) sizes of mapped and resident memory. It apparently shows the usage at the moment exmap was launched and does not follow the usage in real time. It also shows the symbols of the used files and libraries if they are found. See the documentation on the home page for further information. Exmap does not support exporting the analyzed data.

If the exmap kernel modules are not installed on the system, one solution is to:

  1. Install the package exmap-modules-source:
    sudo apt-get install exmap-modules-source
  2. Build the modules, for example with
    sudo module-assistant build exmap
  3. Load the modules into the kernel with
    sudo modprobe exmap


Usage:

  • Run (as root or super user)
    gexmap

    This opens a graphical window showing all running processes.

  • Clicking a process shows detailed information about the memory usage and the loaded libraries of that process.
  • Clicking a detail row or a loaded library shows the symbols found in its symbol table.

Preliminary results

Google Performance Tools

The test was performed only for the PeerHood daemon, without any program using it via the PeerHood library. Only the Bluetooth plugin was used in testing (no GPRS or WLAN installed yet). The PeerHood daemon was tested with and without debug information. The test runs did not last exactly the same amount of time and the number of surrounding Bluetooth devices was not known (the tests should perhaps be rerun in a “neutral” environment).

Memory leaks could not be analyzed because the PeerHood daemon daemonizes itself, and Google Performance Tools apparently does not notice when the program is terminated (by sending a TERM signal).

Debug | Heap size, total allocated (MB) | Total allocated objects / function calls | Of which related to debugging | In use objects / function calls | Of which related to debugging
ON | 2.3 | 721 | 367 | 73 | 6
OFF | 1.7 | 424 | 67 | 67 | 6

Profiling adds an overhead of 1.0 MB to the heap size and only a couple of additional objects / function calls.

Exmap

Exmap was run after the PeerHood daemon had been running for some hours with the Bluetooth plugin only and with no application using it via the library. The following table shows detailed information about the PeerHood related files, the heap and the part of the application that Exmap could not examine closely (anon).

Filename | Effective resident | Effective mapped | Writable | Virtual | Sole mapped | Mapped | Resident
anon | 30 K | 57 K | 26 K | 16513 K | 57 K | 57 K | 30 K
heap | 232 K | 452 K | 232 K | 1536 K | 452 K | 452 K | 232 K
phd | 48 K | 48 K | 4 K | 92 K | 48 K | 48 K | 48 K
btplugin | 76 K | 96 K | 4 K | 96 K | 96 K | 96 K | 76 K


No overhead was added by Exmap because it only examines the memory usage of processes that are already running, i.e. phd must be running when Exmap is started for its memory usage to be shown.

2nd test run results

Google Performance Tools

The test was performed only for the PeerHood daemon, without any program using it via the PeerHood library. Only the Bluetooth plugin was used in testing (no GPRS or WLAN installed yet). The PeerHood daemon was tested with and without debug information. The tests were run for approximately 30 minutes and there were 4 different Bluetooth devices in the neighborhood, of which apparently one was a PeerHood capable device.

Debug | Heap size (in use) | Heap size (total allocated) | In use objects / function calls | Debugging calls | Total allocated objects / function calls | Debugging calls | Responses from devices (total) | Requests from devices
ON | 1.4 MB | 16.5 MB | 21261 | 6 | 31138 | 3743 | 181 | 1
OFF | 1.1 MB | 3.4 MB | 3457 | 6 | 5377 | 47 | N/A | N/A


Profiling adds an overhead of 1.0 MB into heap size.

Profiling diagrams:

Exmap

Exmap was run after PeerHood daemon had been running for 30 minutes with 4 other Bluetooth devices in the neighborhood. No overhead was caused by Exmap.

Results: This test showed that roughly 16 megabytes of virtual memory is reserved by something, but only 65 kilobytes of the reserved memory is actually in use (resident memory). The heap size is roughly the same as it was in the previous tests. Other parts of the program (libraries, plugins etc.) did not reserve big chunks of memory.

Figure: phd_memory_usage_with_exmap.png.png

3rd test with debugger

The GDB debugger was used to follow and control the program execution step by step with the help of breakpoints. The test was run with the Bluetooth plugin only and no services were using the PeerHood daemon. PeerHood was built with debugging information (required by GDB).

Result:
The test showed that when a new thread (inquiry, advert) was started through the “dummy” function, roughly 8 megabytes of virtual memory was reserved. As the previous test showed, only a small portion of this reserved memory chunk is actually used.

 | Before starting plugins | Inquiry thread started | Advert thread started
VM size (KB) | 3300 | 11496 | 19696
Heap size (KB) | 1352 | 1368 | 1452


One possible explanation comes from the article "Memory Management in C++": according to it, in C++ every thread is given a private heap, i.e. a lock-free memory region reserved for that thread only. This should remove bottlenecks when multiple threads are in use, because there is no need to synchronize and wait for locks to be released.

Testing the memory usage of threads

This issue was tested with a smaller program that used multiple threads. When a thread was started, roughly 8 MB of memory was reserved for it, even though there was very little activity inside each thread (create a random number, print it and sleep for a while). When the number of threads was increased by X, the size of the reserved memory grew by X * 8 MB.

For each thread the pthread library reserves the amount of memory that is set as the stack size in the operating system. In Ubuntu Linux the default stack size seems to be 8 MB. When the default stack size was reduced to half, the memory consumption was also cut in half. The second test shows that the PeerHood daemon uses only a small portion of the reserved memory, so the size of the stack could be reduced greatly.

Solution

The stack size can be set at operating system level with:

ulimit -s <stack_size_in_kilobytes>

This is not a very good way to adjust the stack size for PeerHood, because the limit applies to the shell and to every process started from it, not only to the PeerHood daemon.

A better solution is to set the stack size separately for every thread (each thread can have a different stack size):

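// Required header and variables
#include <pthread.h>
size_t stack_size = xxxxxxx; // In bytes, at least PTHREAD_STACK_MIN
pthread_attr_t pt_attr;
pthread_t thread;
.
.
.
// Initialize the attribute object
pthread_attr_init(&pt_attr);
// Set the stack size (returns non-zero if the size is not accepted)
pthread_attr_setstacksize(&pt_attr, stack_size);
.
.
.
// When creating a thread, give the attribute object as the second parameter
pthread_create(&thread, &pt_attr, threadfunction, function_args);
// The attribute object is no longer needed after the thread has been created
pthread_attr_destroy(&pt_attr);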