HPC Systems Administrator Job at Vector Institute, Toronto, ON

  • Vector Institute
  • Toronto, ON

Job Description

Position Summary

The Vector Institute is seeking an HPC Systems Administrator to join our growing team in Toronto as we continue the work of making Canada a centre of expertise for AI in the world.

The incumbent in this role will participate in the building and maintenance of High-Performance Computing environments for world-class research in Machine Learning.

As a member of the Scientific Computing team, the role will share responsibility for managing servers, networks, storage, and security for the High-Performance Computing infrastructure, as well as provide support for the office local area network, servers and scientific computing workstations. The role will also perform installation and maintenance of server and AI & machine learning layered software to support our 1000+ researchers and affiliates.

We are seeking a highly motivated Systems Administrator with a hands-on, problem-solving approach to managing and troubleshooting high-tech environments. The role combines remote work, on-site work at the office, and work at our co-location facility as required.

Key Responsibilities

  • Support the Vector HPC systems, which comprise growing compute clusters of 250+ nodes, 10,000+ cores, and 1,200+ GPUs;
  • Support our GPU-enabled workstation office environment;
  • Provide guidance and support to our research community;
  • Develop and maintain solutions for automatic installation and configuration of infrastructure;
  • Perform hardware and software system upgrades and maintenance;
  • Install new scientific software and libraries on servers, workstations, or laptops across a variety of operating systems (Linux, macOS, Windows);
  • Support researchers in all their computing needs;
  • Maintain network infrastructure and assist users;
  • Maintain system security: firewall, IPS, system logs; and,
  • General enterprise IT operations.

Key Success Measures

  • Ensures the smooth functioning of the research systems, by undertaking troubleshooting, maintenance and installation tasks;
  • Researchers and the enterprise operations feel supported in all other computing needs;
  • Builds and maintains tools that facilitate the automated or direct administration of network and computing infrastructure, both locally and on the cloud.

Profile of the Ideal Candidate

  • Degree or diploma in computer science or engineering, or equivalent, or more than three (3) years of proven, hands-on experience in Linux/UNIX systems administration (e.g., Ubuntu, Red Hat, CentOS), preferably in a research environment;
  • Hands-on experience in managing an HPC grid, Slurm, or equivalent scheduler;
  • Proven programming/scripting skills as they pertain to systems administration;
  • Experience managing and troubleshooting environments built mostly on open-source software;
  • Demonstrated ability to learn quickly;
  • Demonstrated ability to prioritize tasks and resolve problems in a timely manner;
  • Ability to work autonomously, multi-task and work in a fast-paced and stressful environment;
  • A proactive approach, addressing potential problems before they occur;
  • Strong attention to detail;
  • A problem-solving outlook;
  • Excellent verbal and written communication skills.

Qualifications and Experience Considered an Asset

  • Hands-on experience in managing HPC workload management systems such as Slurm, SGE, Moab/Torque, or an equivalent scheduler;
  • Experience supporting large scale-out storage infrastructure technologies (SAN/NAS) and a good understanding of file systems such as ZFS and GPFS;
  • Good understanding of high-speed internetworking technologies such as 100GE, InfiniBand, and link aggregation;
  • Good understanding of and experience with data management at scale, including performance, backups, archive, and monitoring;
  • Experience maintaining application tools and databases (e.g., MySQL, PostgreSQL);
  • Experience with open-source infrastructure systems such as OpenLDAP, NFS, OpenZFS, and 2FA systems.

At the Vector Institute, we are committed to driving excellence and leadership in Canada’s knowledge, creation, and use of AI to foster economic growth and improve the lives of Canadians. We strive for greater inclusion in the programs and culture that we build by welcoming and encouraging applications from all qualified candidates. This includes, but is not limited to, applicants who are Indigenous, 2SLGBTQIA+, racialized persons/visible minorities, women, and people with disabilities.

If you require an accommodation at any point throughout the recruitment and selection process, please contact hr@vectorinstitute.ai and we will happily work with you to meet your needs.
