We are seeking an expert to drive RAS (Reliability, Availability, and Serviceability) strategies at the chip and system levels for our advanced accelerator product line.
About Openchip’s Technology Office:
The Technology Office team drives technological innovation and aligns the company’s technical strategy with its business objectives. We oversee research and development (R&D) of next-generation architectures, manage collaborations with foundries and ecosystem partners, and ensure efficient execution of the technology roadmap. The team also monitors emerging trends, explores experimental technologies, and leads efforts in process node advancement. Additionally, we set technical standards, guide cross-functional teams, and promote talent development, while ensuring that products are optimized for market needs, sustainability, and regulatory compliance, positioning Openchip as a technology leader.
Key Responsibilities:
- Define and implement RAS strategies for accelerator products at both chip and system levels, ensuring high reliability and availability.
- Develop fault-tolerant architectures, error-detection mechanisms, and recovery protocols.
- Collaborate with hardware, firmware, and software teams to design RAS features that integrate seamlessly across the stack.
- Lead failure analysis efforts and contribute to continuous improvement of RAS methodologies.
- Define metrics and performance indicators for reliability and availability, and drive optimization throughout the design and testing phases.
- Oversee the validation and verification of RAS features, ensuring compliance with industry standards.
- Provide technical guidance to cross-functional teams and mentor junior engineers in RAS practices.
- Monitor industry trends and contribute to the evolution of Openchip's RAS-related capabilities and roadmap.
Required Skills and Experience:
- Reliability Engineering Expertise: Proven experience in developing RAS features for high-performance computing systems or accelerators.
- Fault Tolerance and Error Management: Deep understanding of error detection, correction, and recovery techniques, including ECC (Error-Correcting Code) and parity mechanisms.
- System-Level RAS Design: Experience designing RAS solutions across hardware and software layers, with a focus on integration and scalability.
- Failure Analysis and Debugging: Strong skills in failure analysis, root-cause debugging, and corrective action development.
- Chip-Level Reliability: Experience with the effects of power, thermal, and process variability on reliability, including techniques such as wear-leveling and redundancy.
- Performance Metrics Definition: Ability to define, measure, and optimize reliability and availability metrics at chip and system levels.
- Standards Compliance: Familiarity with relevant industry standards, such as JEDEC specifications, ISO 26262, or similar.
- RISC-V Ecosystem Knowledge: Knowledge of RISC-V architecture and its implications for RAS design.
- Leadership and Collaboration: Strong ability to work with cross-functional teams, mentor engineers, and lead RAS initiatives.
- Tools and Methodologies: Proficiency in tools and techniques for reliability modeling, fault injection, and simulation.
Preferred Qualifications:
- Master’s or PhD in Electrical Engineering, Computer Engineering, or a related field.
- Experience in designing RAS features for AI/ML or data center workloads.
- Knowledge of secure RAS implementations, including cryptographic protections for fault and error recovery.
- Familiarity with advanced interconnect protocols such as InfiniBand or Omni-Path and their reliability implications.
Soft Skills:
- The ideal candidate brings a distinctive skill set: a self-starter who is self-motivated and humble, with excellent communication skills and outstanding personal qualities such as honesty, integrity, fellowship, generosity, and commitment to our mission to change the world.
- They are a natural team player who aims to make a positive impact on society.
What do we offer?
- Join an innovative team and be part of the company's growth.
- We believe in investing in our employees and providing them with the opportunities they need to grow and develop their careers.
- Enjoy a hybrid work environment.
- We also offer a flexible schedule.
- We offer compensation that reflects your experience.
- The position will preferably be based in Barcelona.
We are looking for outstanding people who want to join our mission to change this industry and help build a better world.
If you identify with Openchip's mission, please contact us. We offer a competitive compensation package and flexible working arrangements that help you keep a balance between your personal and professional life.
At Openchip & Software Technologies S.L., we believe a diverse and inclusive team is the key to groundbreaking ideas. We foster a work environment where everyone feels valued, respected, and empowered to reach their full potential, regardless of race, gender, ethnicity, sexual orientation, or gender identity.