
Lead LLM Development Engineer

London, UK

Hybrid

About the Company


Our client is at the forefront of developing hyper-realistic, AI-driven content using advanced multi-modal language models. They focus on delivering seamless, real-time experiences to a diverse audience. Specialising in the creation and refinement of custom AI models, they enable highly personalised user interactions at scale by leveraging cutting-edge open-source technologies and continuously optimising for performance and reliability.


Our client is seeking a Lead LLM Development Engineer to take ownership of the technical roadmap and development of advanced large language models (LLMs). This role offers an exciting opportunity to lead cutting-edge AI projects, optimise large-scale systems, and drive innovation in multi-modal user interactions.

Job Description


This role involves overseeing the fine-tuning, optimisation, deployment, and integration of open-source models, with a focus on real-time performance across multiple modalities such as text, audio, and image.


  • Fine-tune and optimise large-scale LLMs using custom and synthetic datasets to ensure accuracy, responsiveness, and scalability.

  • Develop memory-efficient model deployments on GPU platforms, managing resources effectively.

  • Adapt and implement additional open-source models to extend capabilities, such as image generation, safety filtering, and reasoning.

  • Curate, clean, and create datasets for training and refining models, ensuring relevance and diversity.

  • Optimise models for low-latency performance in real-time applications, ensuring seamless user interactions.

  • Embed robust safety and moderation controls for responsible content management.

  • Establish monitoring and diagnostics tools to maintain high system reliability and continuity.

  • Create comprehensive documentation for processes, workflows, and protocols to support scalability and team knowledge sharing.
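
As an illustration of the fine-tuning and low-latency optimisation work described above, the sketch below shows a single mixed-precision training step in PyTorch. This is a minimal, hedged example: the toy embedding-plus-linear model stands in for a real LLM checkpoint, and the synthetic token batch stands in for a curated dataset; all sizes and names are illustrative.

```python
import torch
import torch.nn as nn

# Toy stand-in for an LLM: an embedding followed by a language-model head.
# In a real pipeline this would be a pretrained checkpoint loaded via Transformers.
vocab_size, hidden = 100, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, hidden),
    nn.Linear(hidden, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# One fine-tuning step on a synthetic batch: next-token prediction.
tokens = torch.randint(0, vocab_size, (4, 16))   # (batch, seq_len)
inputs, targets = tokens[:, :-1], tokens[:, 1:]

# Mixed precision via autocast (bfloat16 on CPU here; on GPU one would
# typically use float16 with a GradScaler instead).
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    logits = model(inputs)                        # (batch, seq_len - 1, vocab)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))

loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"loss: {loss.item():.3f}")
```

The same loop scales to real models by swapping in a pretrained checkpoint and a tokenised dataset; libraries such as Accelerate then handle device placement and distributed training.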

Requirements


  • Strong expertise in Python for model training, fine-tuning, and pipeline development. Familiarity with Bash scripting for automation.

  • Extensive experience with PyTorch and libraries such as Transformers and Accelerate for managing large-scale models.

  • Proven experience deploying models on GPU platforms like Runpod or AWS, including proficiency in CUDA and containerisation tools like Docker and Kubernetes.

  • Advanced skills in data preprocessing and manipulation using libraries like Pandas and Dask, with a focus on synthetic data generation.

  • Expertise in techniques like mixed-precision training, quantization, and memory-efficient strategies for real-time performance.

  • Practical experience embedding safety mechanisms such as token filters and controlled response generation.

  • Proficiency in tools like Prometheus, Grafana, and logging frameworks for performance monitoring.
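
To illustrate the quantization and memory-efficiency skills listed above, the sketch below applies PyTorch's dynamic quantization to a toy network and compares serialised sizes. This is an assumption-laden example: the two-layer network stands in for a real transformer, and the size comparison is only a rough proxy for deployment memory footprint.

```python
import io
import torch
import torch.nn as nn

# Toy stand-in for part of a transformer; real deployments quantize full LLMs.
model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
model.eval()

# Dynamic quantization: weights are stored as int8 and activations are
# quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_bytes(m: nn.Module) -> int:
    """Serialised size of a model's state dict, as a rough memory footprint."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.tell()

# Inference still works on the quantized model, at a fraction of the size.
x = torch.randn(1, 512)
with torch.no_grad():
    y = quantized(x)

print(f"fp32: {size_bytes(model)} bytes, int8: {size_bytes(quantized)} bytes")
```

In practice the same technique (or weight-only schemes such as 4-/8-bit loading) is combined with mixed precision and careful batching to hit real-time latency targets on GPU clusters.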

The ideal candidate will have deep technical expertise in LLM architecture, dataset management, and scalable deployment strategies on platforms like Runpod GPU clusters.
