Hustler

Tuesday, December 12th, 2023 10:24 AM

Closed

Seeking Help with High-Performance Computing Issue

Hello Everyone,

I hope this message finds you well. I'm reaching out to the community because I'm currently facing an issue with my high-performance computing setup, and I'm seeking your expertise and advice.

Here's a brief overview of my situation:

I have been using a Lenovo high performance computing (HPC) system for the past 18 months, and everything has been running smoothly until recently. I primarily use it for running complex simulations using software like Lenovo Intelligent Computing Orchestration (LiCO). However, I've noticed a significant drop in performance, and I'm struggling to pinpoint the root cause.

Here are some details about my setup:

  • Hardware Configuration:

    • Server / CPU: Lenovo ThinkSystem SR650 with Intel Xeon Gold 6248R
    • GPU: NVIDIA A100 Tensor Core GPU
    • RAM: 128GB Lenovo TruDDR4
    • Storage: 2TB Lenovo PCIe NVMe SSD
  • Software Environment:

    • Operating System: Lenovo Cloud Enabled Intelligent Computing OS
    • Lenovo Intelligent Computing Orchestration (LiCO)

During my simulations, runtimes are now much longer than usual, and the system occasionally freezes. I've already tried basic troubleshooting steps, such as updating drivers and checking for malware and unwanted processes, but the problem persists.

Specifically, I'm interested in hearing about:

  1. Any common pitfalls or issues associated with high-performance computing setups.
  2. Recommended tools or methods for performance monitoring and debugging.
  3. Tips for optimizing performance of simulation workloads run through LiCO.

If anyone has any insights or suggestions, I would greatly appreciate your input. Your expertise could be immensely helpful in getting my high-performance computing setup back on track.

Thank you in advance for your time and assistance. I look forward to learning from the community's collective knowledge and experience.

Best regards,

Charlie

Accepted Solution

 Superstar

8 months ago

"I've already tried basic troubleshooting steps"

Did you try disabling or uninstalling TM to see whether it is related?

If you are not connected to the internet during your "high end" simulations, why do you need an antivirus active at that time?
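
One way to check whether the antivirus is involved is to time the same short, repeatable task with it enabled and then with it disabled, and compare the medians. Below is a minimal Python sketch of such a timing harness. `time_workload` and `sample_workload` are hypothetical names, and the workload is only a stand-in, so swap in a small piece of the real simulation.

```python
import statistics
import time


def time_workload(workload, repeats=5):
    """Run `workload` several times and return the median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def sample_workload():
    # Hypothetical stand-in for a short, repeatable piece of the real simulation.
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    median_s = time_workload(sample_workload)
    print(f"median runtime: {median_s:.4f} s")
```

Run it once with the antivirus active and once with it off. The median is used so that a single slow run does not skew the comparison.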

Brand User

Trend Security Expert

8 months ago

Hi @charliekthrn,

In an HPC system, it's crucial for the compute, storage, and networking components to run at consistent, matched speeds. If one of them falls behind, it holds back the others and the whole system underperforms. When setting up this kind of system, make sure the servers can ingest and process data as fast as storage delivers it, and that storage can feed data to the servers fast enough to keep them busy. Likewise, the networking components must provide high-speed data transfer between the other elements.
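
To make that concrete: the slowest stage bounds the throughput of the whole pipeline. Here is a small Python sketch that picks out that stage. The function name `find_bottleneck` and the rates are made up for illustration and are not measurements from your system.

```python
def find_bottleneck(throughput_gb_per_s):
    """Return (stage, rate) for the slowest stage in a data pipeline."""
    stage = min(throughput_gb_per_s, key=throughput_gb_per_s.get)
    return stage, throughput_gb_per_s[stage]


# Hypothetical sustained rates in GB/s, not measurements from this thread.
example_rates = {"storage": 3.0, "network": 1.2, "compute": 5.0}

if __name__ == "__main__":
    stage, rate = find_bottleneck(example_rates)
    print(f"slowest stage: {stage} at {rate} GB/s")
```

If you plug in rates you have actually measured for each component, the stage it reports is the first place to look.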

Hope this helps! Let me know if you have any other insights regarding this.
