Rahul Somani

Full-Stack ML Practitioner

I build multimodal AI systems that help us understand visual storytelling.

About

I spent hours analyzing a single three-minute scene from Pulp Fiction for a film-analysis class in college. Wondering if machines could help with that kind of analysis is what pulled me into programming and ML in 2019.

Six years later, my work still circles the same question. How do you teach ML systems to capture what experts rarely articulate: how a cinematographer knows a shot "works," how an editor feels when a cut lands, how a great piece of cinema moves you in ways you can't quite explain?

In a past life, I competed on the junior tennis circuit, ranked in the world's top 2000. I've also acted in a couple of short films (Crumpled, Normal) and composed the soundtrack for the former.


Experience

2020 — 2022

Co-Founder / Head of ML @ OZU

Built systems for long-form narrative understanding and a platform to help people discover stories. My technical work here involved translating nuanced, subjective human understanding into concrete tasks we could teach ML systems. As part of this, we built custom datasets, fine-tuned hybrid multi-task CLIP/classifier models, pushed VLMs to their limits, and built the infrastructure to serve them to users in real time.

2020 — 2022

Founding Partner @ Special Circumstances

A boutique ML consultancy that built custom cinematography models and end-to-end systems for multimodal analysis for clients like YouTube, HBO, and Michael Kors. We were invited to present our ML work at the inaugural CVEU workshop in 2021.

2019 — 2020

Independent Researcher

Ever since spending 3 hours analyzing a 2-minute scene from Pulp Fiction for a class in college, I'd been curious about doing the same for all of cinema. Some of the choices that make great movies great are deliberate; many more live in the latent space of the brain. What's waiting on the other side of understanding cinema en masse?

Pursuing this question led me down the path of programming and machine learning.