Skip to content

Add torch profiler support#245

Open
akaitsuki-ii wants to merge 1 commit into
v1from
dev/add-torch-profiler
Open

Add torch profiler support#245
akaitsuki-ii wants to merge 1 commit into
v1from
dev/add-torch-profiler

Conversation

@akaitsuki-ii
Copy link
Copy Markdown
Contributor

No description provided.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a TorchProfiler utility and integrates it into the engine and worker modules to enable distributed profiling. The engine's worker communication logic was refactored to support multi-rank responses. Feedback suggests improving the portability of the trace compression by checking for the gzip utility, ensuring returned file paths accurately reflect whether compression succeeded, and removing a redundant variable initialization in the worker loop.

Comment on lines +63 to +69
try:
subprocess.Popen(["gzip", "-f", json_file])
logger.info(f"[Rank {rank}] Triggered background compression for {json_file}")
# Update variable to point to the eventual file
json_file = f"{json_file}.gz"
except Exception as compress_err:
logger.warning(f"[Rank {rank}] Background gzip failed to start: {compress_err}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The background compression relies on the system gzip utility, which may not be available in all environments (e.g., Windows or minimal containers). If gzip is missing, subprocess.Popen will raise a FileNotFoundError. While this is caught, the return values of start and stop will still incorrectly indicate a .gz extension. Consider checking for gzip availability using shutil.which and adjusting the returned file path accordingly, or use the built-in gzip module in a separate thread for better portability.

Suggested change
try:
subprocess.Popen(["gzip", "-f", json_file])
logger.info(f"[Rank {rank}] Triggered background compression for {json_file}")
# Update variable to point to the eventual file
json_file = f"{json_file}.gz"
except Exception as compress_err:
logger.warning(f"[Rank {rank}] Background gzip failed to start: {compress_err}")
import shutil
if shutil.which("gzip"):
try:
subprocess.Popen(["gzip", "-f", json_file])
logger.info(f"[Rank {rank}] Triggered background compression for {json_file}")
json_file = f"{json_file}.gz"
except Exception as compress_err:
logger.warning(f"[Rank {rank}] Background gzip failed to start: {compress_err}")
else:
logger.warning(f"[Rank {rank}] gzip utility not found, skipping compression")

cls._profiler.start()

# Return the expected final path
return f"{trace_path_template}_rank{rank}.json.gz"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The method returns a .gz path regardless of whether compression actually started or succeeded. This can lead to issues if the caller expects the file to exist at the returned path. It would be more robust to return the actual path based on the success of the compression trigger.

world_group = get_world_group()

while True:
should_reply = rank == 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The variable should_reply is initialized here but is immediately overwritten a few lines later (line 118) before being used. This initialization is redundant and can be removed to improve clarity.

Suggested change
should_reply = rank == 0
try:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant