Senior Software Engineer - C++

  • Permanent
  • Full-time
  • Hybrid (08034, Barcelona, Catalonia, Spain)
  • SOFTWARE

The Role:

The Senior HPC Engineer at Openchip will design and implement high-performance infrastructure to accelerate the training and serving of AI models across heterogeneous hardware backends. This role is ideal for someone who understands the deep mechanics of modern compute systems (memory bandwidth, threading models, instruction pipelines) and knows how to squeeze every drop of performance out of them. You will work closely with the AI optimization and compiler teams to build the low-level CUDA and C++ components at the core of the AI OS runtime and toolchain. Your contributions will directly influence how efficiently we deploy large-scale models across GPUs, CPUs, and custom accelerators. You will report directly to the AI Model Optimization Lead and be a key driver in making secure AI practical, performant, and production-ready.


Key Responsibilities:

  • Design and implement high-throughput, low-latency components for AI model training and inference using CUDA and C++.
  • Profile, analyze, and eliminate bottlenecks in compute, memory, and threading across multi-GPU and CPU environments.
  • Collaborate with compiler, optimization, and runtime teams to build backend components that integrate with TVM, MLIR, and/or ONNX Runtime stacks.
  • Develop performant kernels, data movement routines, and scheduling mechanisms tailored to real-world AI workloads (e.g., LLMs, transformers, sparse operations).
  • Contribute to the development of internal performance libraries and execution runtimes that abstract heterogeneous hardware backends.
  • Drive improvements to build tooling (e.g., CMake), benchmarking pipelines, and system-level validation harnesses.


Qualifications:

  • More than 7 years of experience in C++ systems or HPC engineering; strong familiarity with C++14 or newer.
  • Deep experience writing and optimizing CUDA kernels; comfortable with warp-level programming and memory hierarchy tuning.
  • Strong knowledge of systems-level performance: cache coherence, memory paging, NUMA architectures, and multithreaded programming models (OpenMP, pthreads, etc.).
  • Demonstrated experience profiling and debugging performance issues with tools such as Nsight, nvprof, perf, or VTune.
  • Comfortable working close to hardware, including on Unix/Linux systems with low-level tooling and build systems (CMake, Make).
  • Experience with ML runtimes (e.g., TensorRT, TVM, XLA, or IREE) or scientific computing stacks (BLAS, cuDNN, etc.) is a plus.
  • Familiarity with Rust, or with Python FFI/bindings (e.g., pybind11, PyO3), is a plus.


Soft Skills:

  • Obsessed with performance and deeply motivated to push systems to their limits.
  • Collaborative engineer who thrives at the intersection of hardware and AI.
  • Comfortable owning technically complex systems and mentoring others in low-level software practices.
  • Communicates clearly across engineering disciplines, from compiler internals to model optimization.


What We Offer:

  • Join an innovative team and grow with the company.
  • We believe in investing in our employees and providing them with opportunities for growth and career development.
  • Work in a hybrid environment with flexible scheduling.
  • A competitive compensation package that reflects your experience.
  • The chance to work at one of the most transformative AI and silicon engineering companies in Europe.
  • The position is based in Barcelona (Spain).


We are looking for outstanding people who want to join our mission to change the silicon industry and help build a better world. If Openchip's mission resonates with you, please contact us.

At Openchip & Software Technologies S.L., we believe a diverse and inclusive team is the key to groundbreaking ideas. We foster a work environment where everyone feels valued, respected, and empowered to reach their full potential, regardless of race, gender, ethnicity, sexual orientation, or gender identity.