Issue Discovery
An application in the production environment shows process memory usage that increases steadily; every 48 to 96 hours the server runs out of memory (OOM), and as a workaround it has to be restarted every day.
Environment Info:
- Java 8, Spring 4.3.12, Tomcat 7 with an exploded WAR
Heap memory is not the issue; it is cleaned up properly by GC.
Pressure Test
To reproduce the issue, we created a script to run load tests against the server in the staging environment and successfully reproduced the issue there.
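The actual load script is not shown in this write-up; purely as an illustration, a minimal Java sketch along these lines could keep sustained concurrent pressure on the service. The endpoint URL, concurrency, and class name are placeholders, not values from the real test.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical load generator: N worker threads hammer one endpoint in a loop.
// The URL and thread count are placeholders, not the real test configuration.
public class LoadTest {
    public static void main(String[] args) {
        int workers = 50; // arbitrary concurrency
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        HttpURLConnection conn = (HttpURLConnection)
                                new URL("http://staging-host:8080/app/endpoint").openConnection();
                        try (InputStream in = conn.getInputStream()) {
                            while (in.read() != -1) { /* drain the response */ }
                        }
                        conn.disconnect();
                    } catch (Exception e) {
                        // Ignore individual request failures; keep the pressure on.
                    }
                }
            });
        }
    }
}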
- Start of the Pressure Test: Process Memory 457MB
Native Memory Tracking:
Total: reserved=1966086KB, committed=696534KB
- Java Heap (reserved=524288KB, committed=524288KB) (mmap: reserved=524288KB, committed=524288KB)
- Class (reserved=1099919KB, committed=55655KB) (classes #7923) (malloc=6287KB #13507) (mmap: reserved=1093632KB, committed=49368KB)
- Thread (reserved=40426KB, committed=40426KB) (thread #40) (stack: reserved=40092KB, committed=40092KB) (malloc=127KB #201) (arena=207KB #79)
- Code (reserved=255135KB, committed=29847KB) (malloc=5535KB #6863) (mmap: reserved=249600KB, committed=24312KB)
- GC (reserved=24934KB, committed=24934KB) (malloc=5774KB #196) (mmap: reserved=19160KB, committed=19160KB)
- Compiler (reserved=203KB, committed=203KB) (malloc=72KB #288) (arena=131KB #3)
- Internal (reserved=6701KB, committed=6701KB) (malloc=6669KB #10234) (mmap: reserved=32KB, committed=32KB)
- Symbol (reserved=11753KB, committed=11753KB) (malloc=8613KB #87963) (arena=3141KB #1)
- Native Memory Tracking (reserved=2060KB, committed=2060KB) (malloc=155KB #2453) (tracking overhead=1905KB)
- Arena Chunk (reserved=667KB, committed=667KB) (malloc=667KB)
- 3 hours later: Process Memory 945MB
More than 200 MB of the process memory is not tracked by the JVM (945 MB of process memory vs. roughly 740 MB committed according to NMT below).
Native Memory Tracking:
Total: reserved=2000277KB, committed=757741KB
- Java Heap (reserved=524288KB, committed=524288KB) (mmap: reserved=524288KB, committed=524288KB)
- Class (reserved=1113008KB, committed=70280KB) (classes #9328) (malloc=7088KB #30407) (mmap: reserved=1105920KB, committed=63192KB)
- Thread (reserved=50942KB, committed=50942KB) (thread #50) (stack: reserved=50372KB, committed=50372KB) (malloc=160KB #251) (arena=410KB #99)
- Code (reserved=260743KB, committed=60935KB) (malloc=11143KB #13060) (mmap: reserved=249600KB, committed=49792KB)
- GC (reserved=24942KB, committed=24942KB) (malloc=5782KB #430) (mmap: reserved=19160KB, committed=19160KB)
- Compiler (reserved=265KB, committed=265KB) (malloc=134KB #430) (arena=131KB #3)
- Internal (reserved=9720KB, committed=9720KB) (malloc=9688KB #13441) (mmap: reserved=32KB, committed=32KB)
- Symbol (reserved=13389KB, committed=13389KB) (malloc=10120KB #104655) (arena=3268KB #1)
- Native Memory Tracking (reserved=2791KB, committed=2791KB) (malloc=197KB #3111) (tracking overhead=2594KB)
- Arena Chunk (reserved=189KB, committed=189KB) (malloc=189KB)
Analysis
This looks like the glibc 64 MB arena memory block problem.
Comparing the pmap results from 1 hour and 3 hours into the load test, several new consecutive 64 MiB (65536 KB) anonymous blocks showed up.
~/ diff pmap_1h.txt pmap_3h.txt
...
45,46c45,51
< 00007f4d00000000    6740    6740    6740 rw---   [ anon ]
< 00007f4d00695000   58796       0       0 -----   [ anon ]
---
> 00007f4cf0000000   17156   16992   16992 rw---   [ anon ]
> 00007f4cf10c1000   48380       0       0 -----   [ anon ]
> 00007f4cf8000000   65504   65504   65504 rw---   [ anon ]
> 00007f4cfbff8000      32       0       0 -----   [ anon ]
> 00007f4cfc000000   65492   65492   65492 rw---   [ anon ]
> 00007f4cffff5000      44       0       0 -----   [ anon ]
> 00007f4d00000000  131072  131072  131072 rw---   [ anon ]
...
Suspected Cause
When multiple threads contend for the allocator, glibc creates additional per-thread arenas, each 64 MiB in size.
The maximum number of arenas that can be created is
NUMBER_OF_CPU_CORES * (sizeof(long) == 4 ? 2 : 8)
If there is memory fragmentation, glibc does not return large freed chunks to the system immediately. Instead, it stores them in the arena's unsorted bin and hands them out again when the application requests new memory. As a result, the process's memory usage keeps increasing: arenas are created but never returned to the system.
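To make the mechanism concrete, here is a minimal sketch (not the application's actual workload) of the kind of pattern that triggers it: many threads performing native allocations concurrently, for example through java.util.zip.Deflater, whose internal buffers are allocated with native malloc by zlib. The class name, thread count, and payload size are illustrative assumptions; running something like this and watching pmap -x <pid> should show 64 MiB anonymous regions appearing as threads are assigned their own arenas.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.zip.Deflater;

// Minimal sketch: each worker thread repeatedly creates a Deflater, whose
// internal state is allocated with native malloc through zlib. Under glibc,
// threads that contend for the allocator are moved onto their own 64 MiB
// arenas, visible as anonymous 64 MiB regions in `pmap -x <pid>`.
public class ArenaGrowthDemo {
    public static void main(String[] args) throws InterruptedException {
        int workers = Runtime.getRuntime().availableProcessors() * 4; // arbitrary
        byte[] payload = new byte[256 * 1024];                        // arbitrary size
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                byte[] out = new byte[payload.length];
                while (!Thread.currentThread().isInterrupted()) {
                    Deflater deflater = new Deflater();
                    deflater.setInput(payload);
                    deflater.finish();
                    deflater.deflate(out);
                    deflater.end(); // native memory is freed, but retained inside the arena
                }
            });
        }
        Thread.sleep(Long.MAX_VALUE); // keep the process alive for inspection
    }
}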
Possible Source of Issue
According to several online articles, this problem seems to be related to the Java ServiceLoader and URLClassLoader classes interfering with each other. When caching is enabled, a race condition between the two classes causes file handles to leak. JDK 9 fixed this by disabling the cache inside URLClassLoader. Two JDK bug reports describe it:
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8156014
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8013099
However, Tomcat does not appear to be affected by this ServiceLoader/URLClassLoader interaction. Jetty was affected because it did not wrap the input stream reads in a try block; Tomcat does not have that problem.
Most of the glibc allocations come from the Inflater class, which calls its end() method when the garbage collector invokes finalize(), so no file handle leak should be created here.
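That said, when application code uses Inflater directly, the usual defensive pattern is to call end() explicitly instead of waiting for finalization, so the native zlib memory goes back to the allocator as soon as possible. A minimal sketch (the helper class and the fixed output-size assumption are illustrative, not taken from the application):

import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public final class InflateUtil {

    // Decompress a zlib-compressed buffer, releasing the Inflater's native
    // memory explicitly instead of waiting for the garbage collector to run
    // finalize() at some later point. Assumes the caller knows an upper
    // bound on the decompressed size.
    public static byte[] inflate(byte[] compressed, int expectedSize) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] result = new byte[expectedSize];
            int written = inflater.inflate(result);
            return Arrays.copyOf(result, written);
        } finally {
            inflater.end(); // free the native zlib state right away
        }
    }
}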
Conclusion
This problem seems to be a pure glibc issue.
For a non-main arena (per-thread arena), multiple heaps can be attached to one arena (in non-contiguous memory regions). glibc does not restrain the total size of per-thread arenas, so it keeps asking the kernel for virtual memory via mmap(), claiming 64 MB each time.
The main arena does not have multiple heaps and hence no heap_info structure. When the main arena runs out of space, the sbrk'd heap segment is extended (a contiguous region) until it bumps into the memory-mapping segment.
After a thread finishes using a block, the block is stored in the arena's unsorted bin and is never returned to the system because of memory fragmentation. Even when the fragmenting allocations are freed later, the block still stays inside the unsorted bin.
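A quick back-of-the-envelope calculation shows why this matters: on a 64-bit system the default limit is 8 arenas per core, each reserving 64 MiB of address space. The class name below is illustrative; the formula is the glibc default quoted above.

// Back-of-the-envelope estimate of the worst-case arena footprint on this
// host, using the glibc default limit of cores * 8 arenas on 64-bit systems
// (cores * 2 on 32-bit) and 64 MiB of reserved address space per arena.
public class ArenaBudget {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int maxArenas = cores * 8;   // 64-bit JVM assumed
        long arenaSizeMiB = 64;
        System.out.printf("cores=%d, max arenas=%d, worst-case arena space=%d MiB (%.1f GiB)%n",
                cores, maxArenas, maxArenas * arenaSizeMiB, maxArenas * arenaSizeMiB / 1024.0);
    }
}

On an 8-core host, for example, this works out to up to 64 arenas, or 4 GiB of reserved address space; resident usage is smaller, but chunks retained in the arenas keep resident memory growing as well.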
Solution
Set
export MALLOC_ARENA_MAX=1
as a system environment variable (for Tomcat this is typically exported in bin/setenv.sh).
Before the change, the difference between Native Memory Tracking (770553KB) and process memory usage (991204KB) was around 200 MB; the JVM does not track that part of the memory. After the change, the difference between the two values is negligible (Native Memory Tracking 768180KB, process memory usage 765832KB). There is no longer an untracked memory leak, since the JVM now accounts for essentially all of the process memory, and the performance impact on the application is negligible.
Switching the allocator to tcmalloc or jemalloc (e.g., preloaded via LD_PRELOAD) would also solve the issue.
References
- A blog post about this issue: https://blog.arkey.fr/drafts/2021/01/22/native-memory-fragmentation-with-glibc/