2023-12-10, 10:10–10:40 (Asia/Taipei), NYCU
Profiling is widely used to analyze a program’s runtime performance characteristics. It collects performance metrics such as CPU usage and function call frequencies, and generates reports, graphs, or visualizations that highlight hotspots. There are two kinds of profiling tools: non-intrusive and intrusive. The former gathers information about the program’s behavior by periodically observing the program’s state without modifying its code. The latter modifies the program’s code or behavior to gather performance data, for example through source code instrumentation or binary rewriting.
However, most existing profiling tools, such as Linux perf, are sampling-based, meaning that they cannot capture the precise hit count of each executed function.
In this proposal, I would like to talk about how the Linux perf tool gathers performance metrics and how one can benefit from its design to build a source-code-level profiling tool for a C++ application.
The optimization of software performance is a critical concern for developers, especially in HPC, where efficiency lays the foundation for an application’s success. To tune a program’s performance, one can use profiling to identify areas of code that consume significant resources, such as CPU time, memory, or I/O operations.
Sampling-based profilers offer a non-intrusive method for analyzing the performance of a software application. They generally have lower overhead than instrumentation-based profilers because they collect data only at regular intervals, minimizing the impact on the application’s runtime performance. Though easy to use, sampling-based profilers sometimes miss short-lived or infrequent events, so the profiling result can be imprecise even when the missed code segments are responsible for performance issues.
To make profiling more precise, one can implement a scope-based timer by placing a macro at the start of each function to record how many times the function is called and how long each call takes.
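As a rough illustration of the scope-based timer idea, the sketch below uses an RAII object whose constructor records the start time and whose destructor adds the elapsed time and one hit to a per-function table. All names here (`ProfileRegistry`, `ScopedTimer`, `PROFILE_FUNCTION`) are hypothetical and chosen for this example; they are not Modmesh's actual API, and the registry is not thread-safe.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical names throughout; this is not Modmesh's actual API.
struct FunctionStats
{
    std::uint64_t hit_count = 0;            // number of completed calls
    std::chrono::nanoseconds total_time{0}; // accumulated wall-clock time
};

// Process-wide table keyed by function name (single-threaded sketch).
class ProfileRegistry
{
public:
    static ProfileRegistry & instance()
    {
        static ProfileRegistry registry;
        return registry;
    }

    void add(std::string const & name, std::chrono::nanoseconds elapsed)
    {
        FunctionStats & stats = m_stats[name];
        ++stats.hit_count;
        stats.total_time += elapsed;
    }

    std::map<std::string, FunctionStats> const & stats() const { return m_stats; }

private:
    std::map<std::string, FunctionStats> m_stats;
};

// Starts timing on construction and reports to the registry on destruction,
// so every exit path of the enclosing scope is covered.
class ScopedTimer
{
public:
    explicit ScopedTimer(char const * name)
        : m_name(name), m_start(std::chrono::steady_clock::now())
    {
        assert(name != nullptr);
    }

    ~ScopedTimer()
    {
        auto const elapsed = std::chrono::steady_clock::now() - m_start;
        ProfileRegistry::instance().add(
            m_name, std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed));
    }

    ScopedTimer(ScopedTimer const &) = delete;
    ScopedTimer & operator=(ScopedTimer const &) = delete;

private:
    char const * m_name;
    std::chrono::steady_clock::time_point m_start;
};

// Place at the top of a function body; __func__ expands to the function name.
#define PROFILE_FUNCTION() ScopedTimer scoped_timer_instance(__func__)

int square(int x)
{
    PROFILE_FUNCTION();
    return x * x;
}

int sum_of_squares(int n)
{
    PROFILE_FUNCTION();
    int total = 0;
    for (int i = 1; i <= n; ++i)
    {
        total += square(i);
    }
    return total;
}
```

After calling `sum_of_squares(3)`, the registry holds an exact hit count for `square` and for `sum_of_squares`, which a sampling profiler cannot guarantee. Note that the time recorded for `sum_of_squares` includes the time spent in `square`.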
Modmesh, numerical software for solving partial differential equations, is an application written in C++ and Python. A scope-based profiler has been implemented along with the computing engine to record the execution time and hit count of each function.
However, the scope-based profiler implemented in Modmesh only supports function-level profiling; it does not support call-path-level profiling. A single function may be executed multiple times, but it may be called from different functions at run time, so a per-function total cannot show which caller accounts for the cost.
The key to achieving precise call path profiling is the data structure that stores call path information at runtime. The Linux perf tool already provides a good example of how to store the call stack of each sample. I will discuss how Linux perf manipulates different kinds of data structures to generate the call graph and how I borrow its ideas to implement call path profiling in Modmesh.
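One possible shape for such a data structure is a tree in which each node stands for a distinct call path rather than a distinct function. The sketch below is only an illustration under that assumption: `CallNode`, `CallTree`, `enter`, `leave`, and `hits` are hypothetical names, and the code is neither Linux perf's implementation nor Modmesh's.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch: one node per distinct call path, not per function.
struct CallNode
{
    std::string name;
    std::uint64_t hit_count = 0;
    CallNode * parent = nullptr;
    std::map<std::string, std::unique_ptr<CallNode>> children;
};

class CallTree
{
public:
    CallTree()
        : m_root(std::make_unique<CallNode>()), m_current(m_root.get())
    {
        m_root->name = "<root>";
    }

    // Called on function entry: descend into (or create) the child for `name`.
    void enter(std::string const & name)
    {
        std::unique_ptr<CallNode> & child = m_current->children[name];
        if (!child)
        {
            child = std::make_unique<CallNode>();
            child->name = name;
            child->parent = m_current;
        }
        ++child->hit_count;
        m_current = child.get();
    }

    // Called on function exit: climb back to the caller's node.
    void leave()
    {
        assert(m_current != m_root.get());
        m_current = m_current->parent;
    }

    // Hit count of one call path, outermost caller first; 0 if never seen.
    std::uint64_t hits(std::vector<std::string> const & path) const
    {
        CallNode const * node = m_root.get();
        for (std::string const & name : path)
        {
            auto it = node->children.find(name);
            if (it == node->children.end())
            {
                return 0;
            }
            node = it->second.get();
        }
        return node->hit_count;
    }

private:
    std::unique_ptr<CallNode> m_root;
    CallNode * m_current; // node of the function currently executing
};
```

With this layout, a function such as `norm` that is called from both `solve` and `report` gets two separate nodes, so `hits({"main", "solve", "norm"})` and `hits({"main", "report", "norm"})` are counted independently. In a real profiler, `enter` and `leave` would be driven by the constructor and destructor of a scope-based timer, and each node would also accumulate elapsed time.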
No previous knowledge expected
Language: Mandarin talk with English slides
Quentin Tsai received a Bachelor’s degree in Computer Science from National Yang Ming Chiao Tung University, Taiwan. In 2022, he completed his Master’s degree in Cybersecurity, also at National Yang Ming Chiao Tung University. His research interests include software testing, financial engineering, and high-performance computing.
Quentin is now working at Nvidia as a QA automation engineer.