This is a crosspost from Computing – thinking out loud: works in progress and scattered thoughts, often about computers. See the original post here.

happy living close (-ish) to the metal

For various reasons, I’ve been doing a little bit of career introspection lately. One of the interesting realizations to come out of this is that, despite doing mostly software work in practice, I’ve been happiest when my work involved a strong awareness of the hardware I was running on.

I suppose it shouldn’t be a surprise, exactly, but I hadn’t really thought about it in those terms before! Before I got into computing, I got a bachelor’s degree in physics, and got through much of a PhD in materials science. While I wasn’t building computers directly, I was definitely working regularly on hardware, building experimental apparatus involving various combinations of vacuum chambers, lasers, exotic microscopes, custom electronics, and microfluidics.

In terms of my computing career, I’ve generally worked in the area of “high-performance computing”, a buzzword that means I’ve focused on building fast parallel systems aimed at researchers.

It’s a sub-field that lends itself to awareness of hardware: even as a baby sysadmin, I was staring at motherboard block diagrams and thinking about the performance differences between different PCIe topologies.

And because HPC is one of the areas that took the longest to embrace cloud computing, I spent a lot of years doing work in datacenters. Most of my work involved writing code, doing configuration management, and managing Linux systems… but on a regular basis I’d head into a big loud room full of air conditioners and server racks, carrying a screwdriver.

Amusingly, my relatively recent stint at a hyperscaler was the first time I had worked on computers without having my office in the same building as the machines I was running! Even there I was at least somewhat cognizant of hardware specifics, and one of my early projects was performance testing on the Bryce Canyon storage node, to see if it was ready for use in a large-scale distributed filesystem.

And these days, at NVIDIA, I’m enjoying being even closer to the metal. (At least conceptually; I still work remote…) I spend my days thinking about datacenter requirements, cable lengths, firmware upgrades, hardware health checks, and application performance tests on large clusters. And I love getting to play with these shiny toys.

Anyway, this is just a ramble. But a useful one. While I’d be the first to admit that cloud has its place, and I use it for some personal projects, I really enjoy understanding the hardware I run on. I have trouble thinking of computers as remote abstractions with no underlying detail. They are pleasingly physical in my mind, even if they’re thousands of miles away.