
AI/ML Engineer - Search
At , we’re redefining how creative teams search, manage, and collaborate on content. While most companies apply AI to text, we’ve gone further: our platform embeds and indexes actual media files — video, image, design, and audio — enabling true semantic search across time-coded content and millions of creative assets.
Our flagship platform, Zoom MAM, is trusted by global broadcasters, agencies, and top brands, including Inter Milan FC, Merck, and Mercedes-Benz, to power their visual workflows. We’re now rebuilding our metadata and search architecture from the ground up, using Python-based LLM pipelines, vector embeddings, and a high-performance object store designed for AI-native media search.
This isn’t just “chat with your documents.” It’s AI that understands a scene, shot, logo, or layout — and finds the right clip, version, or layout at the speed of thought.
What You’ll Own
Build and extend backend services that power AI-driven media search and metadata enrichment
Develop, integrate, and deploy AI/ML inference pipelines (embeddings, vision/audio models, transcription, background removal, etc.)
Fine-tune and optimize computer vision and generative models (e.g., U²-Net, BiRefNet, CLIP, Whisper, YOLO, diffusion models)
Work with large datasets (100k–5M images): preprocessing, augmenting, and structuring for training/inference
Contribute to building pipelines for tasks like background removal, inpainting/outpainting, banner generation, logo/face detection, and multimodal embeddings
Integrate with vector databases (e.g., FAISS, Pinecone, Weaviate, Qdrant) for similarity and semantic search
Collaborate with the engineering team to deploy scalable AI inference endpoints (Docker + GPU/EC2/SageMaker)
Skills & Experience We Expect (2–3 Years)
Core Python (Required) – solid programming and debugging skills in production systems
AI/ML Libraries – hands-on experience with PyTorch and/or TensorFlow, plus NumPy, OpenCV, and Hugging Face Transformers
Model Training/Fine-Tuning – experience fine-tuning pre-trained models for vision, audio, or multimodal tasks
Data Handling – preprocessing and augmenting image/video datasets for training and evaluation
Vector Search – familiarity with FAISS, Pinecone, or similar for embeddings-based search
Multimodal Workflows – comfortable chaining or orchestrating multimodal inference workflows (e.g., image + audio + OCR → unified embedding)
Bonus Points If You…
Have worked with generative models (diffusion, inpainting, or outpainting)
Understand large-scale media workflows (video, design files, time-coded metadata)
Enjoy experimenting with new models and pushing them into production
Care about making AI useful in real-world creative pipelines