Shot Type Classification

Independent Research - 2019

An introduction to the visual language of cinema: what it is, why it matters, and how neural networks can be used to analyse one specific aspect of it, shot framing.

The full technical write-up, with methodology, dataset breakdown, and results, is in the accompanying blog post.


Abstract

Analysing cinema is a time-consuming process. In cinematography alone, there are many factors to consider, such as shot scale, shot composition, camera movement, colour, and lighting. Whatever you shoot is in some way influenced by what you've watched. There's only so much one can watch, and even less that one can analyse thoroughly.

This is where neural networks show promise. They can recognise patterns in images in ways that were not possible until less than a decade ago, which could greatly speed up the analysis of cinema. I've developed a neural network that focuses on one fundamental element of visual grammar: shot types. It recognises 6 distinct shot types with ~91% accuracy.



Example frames, one per shot type:

Extreme Close-Up
Close-Up
Medium Close-Up
Medium Shot
Long Shot
Extreme Wide Shot