Job Title: Principal Architect - LLM Agents & Multi-Agent Frameworks
Experience: 14+ years
Location: Hyderabad/Chennai
Summary: We are looking for a visionary Principal Architect to lead the design, development, and deployment of AI-powered solutions. This role is crucial in shaping our platform's future by architecting intelligent agents and multi-agent frameworks using the latest advancements in Large Language Models (LLMs). The ideal candidate will have over 14 years of software engineering experience, with at least 5 years focused on AI/ML. They will possess deep expertise across the full stack, including frontend frameworks like React and Angular, backend technologies like Node.js and Python, and cloud platforms such as AWS, Azure, or GCP. Responsibilities include leading architectural design, integrating AI agents with core systems, developing APIs, and collaborating with data scientists to fine-tune LLMs. The role also involves establishing ML Ops practices, designing data pipelines, and mentoring junior engineers. Strong problem-solving skills, excellent communication, and a passion for continuous learning are essential.
Responsibilities:
· Lead Architectural Design:
o Define and evolve the overall architecture for LLM-powered agents and multi-agent systems that optimize agent economics over time.
o Design and implement robust, scalable, and maintainable microservices architectures.
o Ensure seamless integration of AI Agents with other core systems and databases.
o Oversee the development of APIs and SDKs for internal and external consumption.
· Full-Stack Expertise:
o Possess deep expertise across the full stack, including:
o Frontend: React, Angular, Vue.js, or similar frameworks.
o Backend: Node.js, Python (with frameworks like Flask, Django, or FastAPI), Java, or other relevant languages.
o Databases: SQL (MySQL, PostgreSQL) and NoSQL (MongoDB, Cassandra), with experience in database design, optimization, and management.
o Cloud Platforms: Any two of AWS, Azure, and GCP (experience with serverless computing, containerization, and cloud-native technologies is a must).
· Model Building & Fine-tuning:
o Collaborate with data scientists and ML engineers to fine-tune and optimize LLMs for specific tasks and domains.
o Develop and implement robust model evaluation and monitoring systems using tools such as Arize and LangSmith.
o Stay abreast of the latest advancements in LLM research and development.
o Work hands-on with agent frameworks such as AutoGen, AWS Agent Framework, and LangGraph.
· Prompt Engineering & LLM Integration:
o Develop and refine effective prompting strategies to maximize the performance of LLMs.
o Design and implement mechanisms for safe and reliable LLM integration.
o Address challenges related to bias, hallucinations, and other potential LLM limitations.
· ML Ops & Observability:
o Establish and maintain robust ML Ops practices, including CI/CD pipelines, model versioning, and experiment tracking.
o Implement comprehensive monitoring and observability solutions to track model performance, identify anomalies, and ensure system stability.
· Data Engineering:
o Design and implement data pipelines for efficient data ingestion, transformation, and storage.
o Ensure data quality and security throughout the data lifecycle.
· Team Leadership & Mentorship:
o Guide and mentor junior engineers in best practices for development and deployment of agents.
o Foster a culture of innovation, collaboration, and continuous learning within the team.
Qualifications:
- Proven Experience: 14+ years of experience in software engineering, including at least 5 years with a strong focus on AI/ML.
- Deep LLM Expertise: Demonstrated expertise in building and deploying applications powered by LLMs (e.g., GPT, Claude 3.5).
- Full-Stack Proficiency: Strong hands-on experience with frontend, backend, and cloud technologies.
- Architectural Skills: Proven ability to design and implement complex, scalable, and maintainable architectures.
- Data Engineering Skills: Experience with data pipelines, data warehousing, and data analysis.
- ML Ops & Observability: Experience with ML Ops best practices, including CI/CD, model versioning, and monitoring.
- Communication & Collaboration: Excellent communication, collaboration, and leadership skills.
- Strong Problem-Solving & Analytical Skills: Ability to analyze complex problems and develop innovative solutions.
- Continuous Learning: Passion for learning and staying up to date with the latest advancements in AI/ML.