Add torch profiler support #245
Code Review
This pull request introduces a TorchProfiler utility and integrates it into the engine and worker modules to enable distributed profiling. The engine's worker communication logic was refactored to support multi-rank responses. Feedback suggests improving the portability of the trace compression by checking for the gzip utility, ensuring returned file paths accurately reflect whether compression succeeded, and removing a redundant variable initialization in the worker loop.
    try:
        subprocess.Popen(["gzip", "-f", json_file])
        logger.info(f"[Rank {rank}] Triggered background compression for {json_file}")
        # Update variable to point to the eventual file
        json_file = f"{json_file}.gz"
    except Exception as compress_err:
        logger.warning(f"[Rank {rank}] Background gzip failed to start: {compress_err}")
The background compression relies on the system gzip utility, which may not be available in all environments (e.g., Windows or minimal containers). If gzip is missing, subprocess.Popen will raise a FileNotFoundError. Although this is caught, the return values of start and stop will still incorrectly indicate a .gz extension. Consider checking for gzip availability with shutil.which and adjusting the returned file path accordingly, or using the built-in gzip module in a separate thread for better portability.
- try:
-     subprocess.Popen(["gzip", "-f", json_file])
-     logger.info(f"[Rank {rank}] Triggered background compression for {json_file}")
-     # Update variable to point to the eventual file
-     json_file = f"{json_file}.gz"
- except Exception as compress_err:
-     logger.warning(f"[Rank {rank}] Background gzip failed to start: {compress_err}")
+ import shutil
+ if shutil.which("gzip"):
+     try:
+         subprocess.Popen(["gzip", "-f", json_file])
+         logger.info(f"[Rank {rank}] Triggered background compression for {json_file}")
+         json_file = f"{json_file}.gz"
+     except Exception as compress_err:
+         logger.warning(f"[Rank {rank}] Background gzip failed to start: {compress_err}")
+ else:
+     logger.warning(f"[Rank {rank}] gzip utility not found, skipping compression")
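For the other option mentioned in this comment (the built-in gzip module in a separate thread), a minimal sketch could look like the following. The helper name `compress_in_background` and the temporary-file naming are hypothetical and not part of this PR. It assumes the trace file is fully written before the helper is called.

```python
import gzip
import logging
import os
import shutil
import threading

logger = logging.getLogger(__name__)


def compress_in_background(json_file: str) -> threading.Thread:
    """Gzip json_file in a worker thread (hypothetical helper, not from the PR).

    The compressed file is written to a temporary name first and renamed at the
    end, so a reader never sees a partially written .gz file.
    """
    gz_file = f"{json_file}.gz"
    tmp_file = f"{gz_file}.tmp"

    def _worker() -> None:
        try:
            with open(json_file, "rb") as src, gzip.open(tmp_file, "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.replace(tmp_file, gz_file)  # publish the finished archive
            os.remove(json_file)           # mirror what `gzip -f` does to the source
        except OSError as err:
            # On failure the uncompressed trace is left in place.
            logger.warning("Background compression of %s failed: %s", json_file, err)
            if os.path.exists(tmp_file):
                os.remove(tmp_file)

    # Non-daemon thread: interpreter shutdown waits for the compression to finish.
    thread = threading.Thread(target=_worker, name="trace-gzip")
    thread.start()
    return thread
```

The caller can `join()` the returned thread before reporting the `.gz` path, or report the plain `.json` path if it does not want to wait. This avoids the dependency on an external binary, at the cost of doing the compression inside the worker process.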
    cls._profiler.start()

    # Return the expected final path
    return f"{trace_path_template}_rank{rank}.json.gz"
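Because this return happens before any compression has run, one way to keep the reported path honest is to decide the extension up front from whether gzip is available. The sketch below is only an illustration: `expected_trace_path` and its `gzip_available` parameter are hypothetical names, and the path layout is copied from the snippet above.

```python
import shutil
from typing import Optional


def expected_trace_path(
    trace_path_template: str,
    rank: int,
    gzip_available: Optional[bool] = None,
) -> str:
    """Return the path the trace is expected to end up at (hypothetical helper).

    If gzip_available is None, probe the system for a gzip executable.
    """
    if gzip_available is None:
        gzip_available = shutil.which("gzip") is not None
    base = f"{trace_path_template}_rank{rank}.json"
    # Only promise a .gz file when compression can actually be started.
    return f"{base}.gz" if gzip_available else base
```

Using the same helper in both start and stop would keep the two return values consistent with each other.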
    world_group = get_world_group()

    while True:
        should_reply = rank == 0