Distributed video processing at Facebook scale

Video Pipelining DAG


1. What will the text be about?

How Facebook ingests video in a scalable, available way

2. What questions am I looking to answer?
  • What kind of scale does Facebook experience?
  • How does it compare to other online video companies?
  • What unique solutions have they come up with that I am not yet familiar with?
  • What principles are they guided by?
3. What do I know about the topic already?

Facebook consumes a lot of video

4. Skim the text to get an idea for the structure of the text
  • Why does FB need to roll their own video processing?
  • What applications need to use this pipeline?
  • How FB used to process video
  • How FB processes video now

During reading

Convert headers and sub-headers into questions. After a meaningful chunk, take the time to summarize the major points

If I had to build a video processing pipeline, what would I do?
  • AWS - Elastic Transcoder, S3, and lambda in less than a day
  • GCP
  • Azure Media Services

Unless you are Facebook, handling tens of millions of uploads a day and growing fast, you may not have to RYO (roll your own)
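As a sanity check on the "less than a day" claim, a minimal sketch of the AWS option: an S3 upload event triggers a Lambda that submits an Elastic Transcoder job. The pipeline ID and preset ID below are placeholders, not real resources.

```python
# Hypothetical S3-triggered Lambda that submits an Elastic Transcoder job.
# PIPELINE_ID is a placeholder; PRESET_720P uses the format of a system preset ID.
PIPELINE_ID = "1111111111111-abcde1"   # placeholder pipeline
PRESET_720P = "1351620000001-000010"   # placeholder 720p preset ID

def build_job(input_key: str) -> dict:
    """Build the Elastic Transcoder job spec for an uploaded object."""
    return {
        "PipelineId": PIPELINE_ID,
        "Input": {"Key": input_key},
        "Outputs": [{"Key": f"transcoded/{input_key}", "PresetId": PRESET_720P}],
    }

def handler(event, context):
    """One transcode job per object in the S3 put event."""
    import boto3  # imported lazily so the spec-building logic runs without AWS
    client = boto3.client("elastictranscoder")
    for record in event["Records"]:
        client.create_job(**build_job(record["s3"]["object"]["key"]))
```

This covers the happy path only; the point of the rest of the notes is everything it does not cover (custom per-app pipelines, latency, overload).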

How does Facebook view the role of video in its core business?
  • FB handles 8 billion views per day on average
  • FB has ~15 applications that have a video component

    • Facebook video posts
    • Messenger videos
    • Instagram stories
    • 360 videos
  • Many of these apps also employ app-specific computer vision extraction and speech recognition
  • A video upload processing graph contains ~153 nodes
  • Messenger videos have the shortest processing pipeline to maximize real-time experience - 18 tasks
  • Instagram videos have 22 tasks
  • 360 videos have thousands of tasks
  • The metric these video teams hold high is time to share
  • Time to share - How long does it take from when a person uploads (starts uploading) a video, to when it is available for sharing?
  • This leads to three major requirements for the video processing pipeline at Facebook:

    1. Low latency
    2. Flexibility to support a range of different applications (with custom processing pipelines)
    3. Handle system overload and faults
How did Facebook process video before SVE?
  • Using MES - the monolithic encoding script
  • Batch oriented sequential processing pipeline
  • Very difficult to define dynamic pipelines of custom extraction and transformation per application
  • Poor time-to-share because all videos were treated the same
  • batch is the enemy of latency
What does SVE stand for?
  • SVE is Facebook's Streaming Video Engine
  • The central idea is to process tracks as streams (strictly speaking, mini-batches) in parallel
  • In what ways is SVE parallel?

    1. It overlaps uploading and processing because they need not be dependent
    2. It splits videos into many smaller chunks to transcode
    3. The storage - with replication for fault tolerance
  • Results in a 2.3-9.3x improvement in time-to-share depending on the application and other factors
  • Leverages the client device to address point 2 above
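A toy illustration of point 2, the chunked parallelism: split a video's frame range into fixed-size chunks and "transcode" them concurrently. In the real system the client cuts chunks at encoding boundaries during upload; here the chunk size and `transcode_chunk` are stand-ins.

```python
# Toy sketch of chunk-parallel transcoding; CHUNK_FRAMES is a made-up size.
from concurrent.futures import ThreadPoolExecutor

CHUNK_FRAMES = 250  # ~10 s at 25 fps; arbitrary for illustration

def split(total_frames: int, size: int = CHUNK_FRAMES):
    """Return (start, end) frame ranges covering the whole video."""
    return [(s, min(s + size, total_frames)) for s in range(0, total_frames, size)]

def transcode_chunk(chunk):
    start, end = chunk
    return f"encoded[{start}:{end}]"   # stand-in for real encoding work

def transcode(total_frames: int) -> list:
    """Transcode all chunks in parallel; map() preserves chunk order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(transcode_chunk, split(total_frames)))
```

Because chunks are independent, processing can start on early chunks while later ones are still uploading, which is exactly the upload/processing overlap of point 1.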
How can we design video processing pipelines based on DAGs?
  • By abstracting videos into tracks

    1. Video
    2. Audio
    3. Metadata
  • Dynamic, per-application video processing pipelines are difficult to define in AWS, GCP, and Azure
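A minimal sketch of the per-application-DAG idea, with invented app and task names: each application composes its own graph from reusable per-track stages, instead of every upload going through one monolithic script.

```python
# Hypothetical per-app pipeline definitions keyed by track; all names invented.
DAGS = {
    "messenger": {  # short pipeline to optimize time-to-share
        "video":    ["validate", "transcode_low_latency"],
        "audio":    ["validate", "encode_aac"],
        "metadata": ["extract"],
    },
    "instagram": {  # longer pipeline with app-specific stages
        "video":    ["validate", "transcode_ladder", "thumbnail"],
        "audio":    ["validate", "encode_aac"],
        "metadata": ["extract", "classify"],
    },
}

def tasks_for(app: str):
    """Flatten an app's DAG into (track, task) pairs; tracks can run in parallel."""
    return [(track, task)
            for track, tasks in DAGS[app].items()
            for task in tasks]
```

This also makes the task counts in the notes concrete: Messenger's real pipeline has 18 tasks, Instagram's 22, each just a different graph over the same track abstraction.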
How are fault tolerance and retries addressed?
I will just highlight the retry policy on failure: a failed task is retried up to 2 times locally on the same worker, then on up to 6 other workers, each again with 2 local retries, leading to up to (1 + 2) × (1 + 6) = 21 execution attempts before finally giving up.
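The retry arithmetic can be sketched directly, assuming each of the 7 workers (original plus 6 others) makes an initial attempt plus 2 local retries:

```python
# Sketch of the retry policy: (1 + 2 local retries) x (1 original + 6 other
# workers) = 21 execution attempts in the worst case.
LOCAL_RETRIES = 2
OTHER_WORKERS = 6

def run_with_retries(task):
    """Run task until it succeeds or every attempt is exhausted.

    Returns (succeeded, attempts_made)."""
    attempts = 0
    for _worker in range(1 + OTHER_WORKERS):      # original worker, then 6 more
        for _ in range(1 + LOCAL_RETRIES):        # initial attempt + 2 local retries
            attempts += 1
            if task():
                return True, attempts
    return False, attempts
```

An always-failing task exhausts all 21 attempts; a task that succeeds immediately uses one.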

Post reading

Generate a few questions that you'd ask someone else to gauge how well they comprehended this text

  1. What kind of scale does Facebook experience?
  2. How does it compare to other online video companies?
  3. What unique solutions have they come up with that I am not yet familiar with?
  4. What principles are they guided by?


original post

Next: What's wrong with SBT?