hpc.social


This is a crosspost from VanessaSaurus: dinosaurs, programming, and parsnips. See the original post here.

Things that are Hard

I saw a funny tweet the other night. Someone from a large consumer company was sharing their vision for “the next generation shopping experience”: a virtual person walking through a supermarket aisle and reaching out to pick up a bottle of wine. I can’t find the specific tweet, but it said something to the effect of:

Nobody asked for this. Stop making stuff to solve problems that people don’t have.

My dear reader, it me! 😲️ This message hit me really hard, because I am definitely one to build niche tools for use cases that likely don’t exist but seem fun or interesting to me. I also feel pretty disconnected from communities that are innovating and testing new ideas.

What is hard?

This is a problem that a lot of us have. We build things that nobody needs, when we should focus on the problems that people are actually facing. I would also scope that to developer workflows, which include automation, testing, and development. Since I have a nice view into my own mental space, here is my list of things that are hard.

  1. When I am trying to develop software and I can't open an interface with the code and environment I need
  2. That my main interaction with a resource is via SSH
  3. When a workflow or even container works in one place but not another
  4. When I need to develop, build in CI, push to a registry, and pull. One mistake? Start from scratch
  5. When I need to run a job and I have to interact with a job manager and it's hard and annoying
  6. Logging or monitoring means looking at text files with cryptic names
  7. Automated testing on HPC is not a thing. Build on GitHub and pray.
  8. When I can't easily navigate code or documentation, or it's completely missing
  9. When I set up everything the way I like it and I have to log in to a new system and do it all over again
  10. When I want to develop something that uses a cluster resource but there are no exposed APIs.
  11. When it's impossible to compare between systems because they are special snowflakes
  12. When I can't easily test across the systems that my software is intended for.
  13. To scale anything I have to use a job manager, wait hours, and then wait again if there is one mistake
  14. When it takes 2 hours or more to get a node allocated
  15. When I can't really make tools for HPC because I'm trying to find workarounds for all these problems

I’d also add a “thing that is annoying”: the obsessive focus on power and scale, in a competitive sense, and the race to be in the TOP500 and beat the other guy above all else. The constant need to rebuild clusters means we never focus on the details of how we use them. We do the same things over and over. I’ve mentioned these things before, possibly many times, but I need to point them out again.

Our current developer environments are more like handcuffs than places we are enabled to thrive.

The reality for me is that I tend to put myself in a new role or environment, and then think of lots of cool ways to extend a particular tool, or build something before it. This is why I’ve made a ton of visualizations, associated tools, or posts for spack - it’s literally just the thing that is right in front of me. Put something else in front of me, such as an entire infrastructure with APIs, and I’d do the same. So why can’t a nice set of developer tools be available for the resources I’m supposed to be using?

Develop based on specific problems

I think I want to focus more of my development on these problems. Don’t get me wrong - I’ll definitely keep making silly projects. But my vision for the future needs to be oriented toward these pains. These in particular are the problems that I think our community needs to look at, at least given this developer perspective. I say this because I’ve seen and used the dark side - having free rein of beautiful cloud APIs to let me automate to my heart’s content! Only now, without being part of some cloud or container cluster project, am I aware that I don’t have access to these development tools. What is my solution now? I largely don’t ssh into an HPC cluster until I absolutely have to - either to scale something, or to reproduce a workflow that works on GitHub Actions (but is then really challenging to get working on the cluster resource). Indeed, this entire thread resulted from a frustrating evening of exactly that.

What isn’t helpful? What isn’t helpful is telling me “This center / place / person has this thing that has solved this problem.” Can I easily access it, and can the entire research software engineering community? This kind of response shuts down the conversation and makes the developer (myself, for example) realize that the person I’m talking to is not interested in thinking about how to inspire change. I’ve been really frustrated recently with mentioning even an abstract idea and getting shut down with “Oh, that sounds like this other tool.” For a project to reach this “mention status” it needs to be easy to install or use, and not have a barrier of privilege where you have to work at a certain place or know certain people. Telling me that there is a solution that requires some convoluted steps and permissions implies not only that it is available only to those with privilege, but also that the solution is not adopted or shared widely enough to truly be a solution for our community.

Inspiring Vision

If we aren’t happy with the current state of the world, what are our options? Well, we could leave our current roles to find another state that is more similar to what we want. Personally speaking, I haven’t hit that point quite yet. I want to try my hardest to formulate a vision for how I want the world to be, and then find opportunities to work on it from where I am. The wisdom here is that no specific role is perfect, and optimally we should place ourselves somewhere where there are resources and open-mindedness for change. It’s up to us to extend our influence as best we can to help drive some possible future. If you try that and fail? At least you tried.

These are the things that are hard. I am going to try harder to have them be the focus of my thinking about the future. I want to make them easier. I’m starting to realize that possibly the reality is that I should think beyond the constraints of HPC, and more toward the kind of infrastructure that I want, and then figure out how to slowly integrate it as a part of our culture too. We can start with a core vision for a future that we want, and then slowly build up tooling and community around that.

Happy Friday, friends!