MP4 File Format Parsing


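Preface

After reading this article you will know:

How to read the basic information of an MP4 file.
The basic principle behind extracting the video data or audio data from an MP4 file.
How a player that seeks to a given time locates the corresponding media data in the MP4 file.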

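MP4 File Overview

An MP4 file usually consists of an audio part and a video part (some files also carry subtitles or custom data). Audio or video data is a sequence of samples related in time. For video, a sample is one frame of compressed picture data (for example an H.264/H.265 packet); for audio, it is the encoded data for a short span of sampled sound (for example an AAC packet). Such a sequence of samples is called a track, which gives us the terms video track and audio track.

A data protocol usually consists of two parts, HEADER + DATA, where the header has a fixed layout and describes the data that follows it. The familiar TCP/IP protocols work this way, and MP4 is no exception. The MP4 format defines a data structure called a Box, and an MP4 file is simply a series of boxes concatenated together. Parsing an MP4 file therefore amounts to reading and parsing boxes.
[Figure: an MP4 file laid out as a sequence of boxes]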

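Basic Structure of a Box

Basic Box

A Box itself also follows the HEADER + DATA pattern:
[Figure: Box header and data layout]
The first 4 bytes give the size of the Box (header plus data), and the next 4 bytes give the Box type. Because the size field is only 4 bytes wide, a large data part may exceed what it can represent. In that case size is set to 1, and 8 additional bytes (largesize) follow the type field to hold the real size. size can also be 0, which means the Box extends to the end of the file. Some boxes carry extra data in the header: a Box of type uuid has a further 16 bytes holding its extended type. The data part follows the header, and its length is the value of size (or largesize) minus the length of the header itself.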

Below is the syntax the standard gives for a Box:

aligned(8) class Box (unsigned int(32) boxtype, optional unsigned int(8)[16] extended_type) {
   unsigned int(32) size;
   unsigned int(32) type = boxtype;
   if (size==1) {
      unsigned int(64) largesize;
   }  else if (size==0) {
      // box extends to end of file
   }
   if (boxtype=='uuid') {
      unsigned int(8)[16] usertype = extended_type;
   }
}
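As a minimal sketch of the syntax above, the following Python function reads one Box header from a byte buffer. The function name `parse_box_header` and the sample buffers are made up for illustration; they are not part of the specification.

```python
import struct

def parse_box_header(buf, pos=0):
    """Return (box_type, header_len, box_size) for the box starting at pos.

    box_size is None when size == 0, i.e. the box runs to the end of the file.
    """
    size, box_type = struct.unpack_from(">I4s", buf, pos)
    header_len = 8
    if size == 1:
        # 64-bit largesize follows the type field
        (size,) = struct.unpack_from(">Q", buf, pos + 8)
        header_len += 8
    elif size == 0:
        size = None
    if box_type == b"uuid":
        header_len += 16  # 16-byte usertype
    return box_type.decode("latin-1"), header_len, size

# Hypothetical sample data: a 16-byte 'free' box and an 'mdat' box using largesize.
small = struct.pack(">I4s", 16, b"free") + bytes(8)
large = struct.pack(">I4sQ", 1, b"mdat", 24) + bytes(8)
print(parse_box_header(small))
print(parse_box_header(large))
```

The data part of the box then spans `pos + header_len` up to `pos + box_size`.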

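The MP4 specification defines many Box types. As the syntax above shows, apart from very large boxes and user-extension (uuid) boxes, most boxes have the simple structure shown below.
[Figure: common Box layout]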

Full Box

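Building on the Box structure above, the standard extends the header with two fields: a 1-byte version field and a 3-byte flags field. In this article a Box extended this way is called a Full Box, and a Box without these two fields is called a Basic Box. The standard defines a Full Box as follows: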

aligned(8) class FullBox(unsigned int(32) boxtype, unsigned int(8) v, bit(24) f)
   extends Box(boxtype) {
     unsigned int(8) version = v;
     bit(24) flags = f;
}
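A small sketch of how the two extra fields could be read, assuming `payload` holds the bytes that follow the 8-byte size/type header (the helper name is hypothetical):

```python
def parse_full_box_fields(payload):
    """Split the first 4 payload bytes of a Full Box into (version, flags)."""
    version = payload[0]
    flags = int.from_bytes(payload[1:4], "big")  # 24-bit big-endian value
    return version, flags

print(parse_full_box_fields(bytes([0, 0x00, 0x00, 0x07])))
```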

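[Figure: Full Box layout]

Note: Basic Box and Full Box are a classification by data structure, whereas the Box types discussed below are distinguished by the boxtype field. Every Box type is structurally either a Basic Box or a Full Box; do not confuse the two notions.

Although the MP4 specification defines many Box types, a given MP4 file needs only a handful of mandatory ones. The rest serve specific needs and can be added as required. Let us use a tool to get an overview of which boxes an MP4 file must contain.
[Figure: box tree of an MP4 file shown in an analysis tool]
The figure shows several Box types. An MP4 file needs only the ftyp, moov and mdat boxes; the uuid box is user-defined and not required. The moov box also looks different from the others in the figure: it can be expanded. For some boxes the data part consists of one or more other boxes. Such a Box is called a Container Box, and the boxes that make up its data are its child boxes.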
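The container idea can be sketched as a recursive walk. This is an illustration only: the `CONTAINERS` set lists just the plain container boxes discussed in this article, and the `box` helper and sample data are made up to keep the example self-contained.

```python
import struct

# Container boxes whose data part is itself a sequence of boxes (subset only).
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"dinf", b"stbl"}

def walk_boxes(buf, start=0, end=None, depth=0):
    """Yield (depth, type, offset, size) for every box in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        header_len = 8
        if size == 1:
            (size,) = struct.unpack_from(">Q", buf, pos + 8)
            header_len = 16
        elif size == 0:
            size = end - pos  # box extends to the end of the enclosing range
        if size < header_len:
            break  # malformed box, stop instead of looping forever
        yield depth, box_type.decode("latin-1"), pos, size
        if box_type in CONTAINERS:
            yield from walk_boxes(buf, pos + header_len, pos + size, depth + 1)
        pos += size

def box(box_type, payload=b""):
    """Build a box with a 32-bit size (helper for the sample data only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

data = (box(b"ftyp", b"isom" + bytes(4))
        + box(b"moov", box(b"mvhd", bytes(100)) + box(b"trak"))
        + box(b"mdat", bytes(4)))
for depth, name, offset, size in walk_boxes(data):
    print("  " * depth + name, offset, size)
```

For a real file the same function could be run over the bytes read from disk, although very large mdat boxes are better skipped with a seek than loaded into memory.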

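Now let us briefly analyze the mandatory boxes of an MP4 file: what each one does and how it is defined.

MP4 Box Analysis

File Type Box

Type: ftyp
Container: File (the Box is not nested; it sits directly at file level as a top-level box)
Mandatory: yes

Quantity: exactly one

aligned(8) class FileTypeBox extends Box('ftyp') {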
   unsigned int(32)  major_brand;
   unsigned int(32)  minor_version;
   unsigned int(32) compatible_brands[];
}

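major_brand: the brand the file is primarily intended for, i.e. the preferred specification to read it with. isom is a common general-purpose choice; a more specific brand can be used when a particular format is required.
minor_version: an informative integer giving the minor version of the major brand.
compatible_brands: like major_brand, a list of brands the file is compatible with, four bytes per entry. The entries often reflect additional formats carried in the MP4 file, such as AVC.

The ftyp box is usually located at the very start of an MP4 file. In the hex dump of one MP4 file, the box occupies 24 bytes and its type is ftyp. According to the specification ftyp is a Basic Box, so after the 8-byte header the remaining 16 bytes are its data, interpreted as follows: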
```
69 73 6F 6D (ASCII) ==> major_brand: isom
00 00 00 01 ==> minor_version: 1
compatible_brands array (4 bytes per item), so there are two items:
69 73 6F 6D ==> isom
61 76 63 31 ==> avc1
```
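The same 16 data bytes can be decoded with a few lines of Python. This is a sketch; the function name `parse_ftyp` is an assumption, and the input is the data part only, without the 8-byte header.

```python
import struct

def parse_ftyp(payload):
    """Decode the data part of an ftyp box into (major_brand, minor_version, brands)."""
    major, minor = struct.unpack_from(">4sI", payload, 0)
    # compatible_brands runs to the end of the box, 4 bytes per entry
    brands = [payload[i:i + 4].decode("ascii")
              for i in range(8, len(payload) - 3, 4)]
    return major.decode("ascii"), minor, brands

payload = bytes.fromhex("69736F6D 00000001 69736F6D 61766331")
print(parse_ftyp(payload))
```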

Movie Box

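Type: moov
Container: File
Mandatory: yes

Quantity: exactly one

aligned(8) class MovieBox extends Box('moov'){
}

The moov box is a very important box. It stores the metadata that describes the media data in the MP4 file, including the video width, height and total duration, the audio sample rate, channel count and total duration, and the information needed to locate the actual media data. moov is a container box whose content is spread across its child boxes:
[Figure: child boxes of moov]
moov has many kinds of child boxes; see the table at the end of the article. Only the mandatory mvhd and trak boxes are analyzed below. Others, such as udta, are generally added by whoever muxes the file for their own purposes and are left to the interested reader.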

Movie Header Box

Type: mvhd
Container: moov
Mandatory: yes

Quantity: exactly one

aligned(8) class MovieHeaderBox extends FullBox('mvhd', version, 0) {
 if (version==1) { 
 unsigned int(64) creation_time; 
 unsigned int(64) modification_time; 
 unsigned int(32) timescale; 
 unsigned int(64) duration; 
 } else { // version==0 
 unsigned int(32) creation_time; 
 unsigned int(32) modification_time; 
 unsigned int(32) timescale; 
 unsigned int(32) duration; 
 } 
 template int(32) rate = 0x00010000; // typically 1.0 
 template int(16) volume = 0x0100; // typically, full volume 
 const bit(16) reserved = 0; 
 const unsigned int(32)[2] reserved = 0; 
 template int(32)[9] matrix = 
 { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 }; 
 // Unity matrix 
 bit(32)[6] pre_defined = 0; 
 unsigned int(32) next_track_ID; 
}

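According to the standard, the mvhd box is a Full Box: besides size (4 bytes) and type (4 bytes), its header also contains version (1 byte) and flags (3 bytes).

Field	Bytes	Description
version	1	Version, 0 or 1; with version=1 some fields use wider storage
flags	3	0
creation_time	4/8	Integer, creation timestamp: seconds elapsed since 1904-01-01 (UTC)
modification_time	4/8	Integer, modification timestamp, interpreted as above
timescale	4	Integer, number of time units per second; 1000 means one second is divided into 1000 units. Used to interpret duration below
duration	4/8	Integer, total duration of the media in timescale units; divide by timescale to get seconds. For example, timescale = 1000 and duration = 5000 give a total duration of 5 seconds
rate	4	16.16 fixed point: the upper 16 bits are the integer part and the lower 16 bits the fraction. For example, 0x00010000 means rate = 1.0, normal playback speed
volume	2	8.8 fixed point: the upper 8 bits are the integer part and the lower 8 bits the fraction. 0x0100 = 1.0, full volume
reserved	2 + 2*4	Reserved
matrix	4*9	Video transformation matrix
pre_defined	4*6	Pre-defined, 0
next_track_ID	4	The track ID to use for the next track added to the MP4 file

The mvhd box gives an overall summary of the MP4 file; from it we can obtain the total playback duration.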
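To make the table concrete, here is a minimal sketch that pulls timescale, duration, rate and volume out of an mvhd data part. The function name and the sample payload are invented for illustration; `payload` is assumed to start right after the 8-byte size/type header.

```python
import struct

def parse_mvhd(payload):
    """Decode the leading fields of an mvhd box (version 0 or 1)."""
    version = payload[0]
    if version == 1:
        _ctime, _mtime, timescale, duration = struct.unpack_from(">QQIQ", payload, 4)
        pos = 4 + 28
    else:
        _ctime, _mtime, timescale, duration = struct.unpack_from(">IIII", payload, 4)
        pos = 4 + 16
    rate, volume = struct.unpack_from(">ih", payload, pos)
    return {
        "timescale": timescale,
        "duration": duration,
        "seconds": duration / timescale,
        "rate": rate / 65536,    # 16.16 fixed point
        "volume": volume / 256,  # 8.8 fixed point
    }

# Hypothetical version-0 payload: timescale 1000, duration 5000.
sample = (bytes(4)
          + struct.pack(">IIII", 0, 0, 1000, 5000)
          + struct.pack(">ih", 0x00010000, 0x0100)
          + bytes(10 + 36 + 24)      # reserved, matrix, pre_defined
          + struct.pack(">I", 3))    # next_track_ID
print(parse_mvhd(sample))
```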

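The figure below compares a manual analysis of the mvhd box's binary data with the tool's output:
[Figure: mvhd hex dump next to the fields parsed by the tool]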

Track Box

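Type: trak
Container: moov
Mandatory: yes
Quantity: one or more

aligned(8) class TrackBox extends Box('trak') {
}

[Figure: child boxes of trak]

An MP4 file is made up of one or more tracks, such as an audio track and a video track. Each track carries its own temporal and spatial information and is independent of the other tracks. A video track contains video samples and an audio track contains audio samples. A hint track is different: it describes how a streaming server should pack the file's media data into packets for a streaming protocol. For local playback hint tracks can be ignored; they matter only for streaming.

A track serves one of two purposes: (a) holding media data (media tracks); (b) holding packetization information for streaming protocols (hint tracks).

A standard MP4 file should contain at least one media track, and every media track that contributed to a hint track should remain in the file even if its media data is not referenced by the hint track. Hint tracks are not discussed further in this article.

As the figure above shows, trak is a container box whose content is carried by its child boxes. Again only the mandatory children, tkhd and mdia, are analyzed here; others such as edts are left to the interested reader.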

Track Header Box

Type: tkhd
Container: trak
Mandatory: yes

Quantity: exactly one

aligned(8) class TrackHeaderBox extends FullBox('tkhd', version, flags){
 if (version==1) { 
   unsigned int(64) creation_time; 
   unsigned int(64) modification_time; 
   unsigned int(32) track_ID; 
   const unsigned int(32) reserved = 0; 
   unsigned int(64) duration; 
 } else { // version==0 
   unsigned int(32) creation_time; 
   unsigned int(32) modification_time; 
   unsigned int(32) track_ID; 
   const unsigned int(32) reserved = 0; 
   unsigned int(32) duration; 
 } 
 const unsigned int(32)[2] reserved = 0; 
 template int(16) layer = 0; 
 template int(16) alternate_group = 0; 
 template int(16) volume = {if track_is_audio 0x0100 else 0}; 
 const unsigned int(16) reserved = 0; 
 template int(32)[9] matrix= { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 }; 
 // unity matrix 
 unsigned int(32) width; 
 unsigned int(32) height; 
}

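According to the standard, tkhd is a Full Box, so besides size (4 bytes) and type (4 bytes) its header also contains version (1 byte) and flags (3 bytes).

Field	Bytes	Meaning
size	4	Box size
type	4	Box type: tkhd
version	1	Box version, 0 or 1; the difference is the storage width of some fields. The sizes below assume version=0
flags	3	Bitwise OR of the following predefined values: 0x000001 Track_enabled, the track can be played (if the lowest bit is 0 it is not played); 0x000002 Track_in_movie, the track is used in the presentation; 0x000004 Track_in_preview, the track is used when previewing. The value is typically 7. If no track in the file sets Track_in_movie or Track_in_preview, all tracks are treated as if both were set. For hint tracks the value is 0
creation_time	4	Track creation time
modification_time	4	Track modification time
track_ID	4	Must be unique and must not be 0
reserved	4	Reserved
duration	4	Track duration, in units of the timescale in the mvhd box
reserved	2*4	Reserved
layer	2	Video layer, default 0; lower values are on top
alternate_group	2	Track grouping, default 0, meaning the track is not related to any other track
volume	2	8.8 fixed point, used by audio tracks: 1.0 (0x0100) is full volume; 0 (0x0000) for video tracks
reserved	2	Reserved
matrix	36	Video transformation matrix
width	4	Video width, stored in 16.16 fixed point; 0 for audio tracks
height	4	Video height, stored in 16.16 fixed point; 0 for audio tracks

From this box we can obtain the following:

Track duration in seconds: duration / mvhd->timescale
Video width and height: realWidth = width >> 16; realHeight = height >> 16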
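The two calculations above, together with the flag bits, can be sketched as follows. The function name and sample payload are assumptions for illustration; the field offsets follow the tkhd syntax shown earlier.

```python
import struct

def parse_tkhd(payload):
    """Decode flags, track_ID, duration, width and height from a tkhd data part."""
    version = payload[0]
    flags = int.from_bytes(payload[1:4], "big")
    if version == 1:
        (track_id,) = struct.unpack_from(">I", payload, 4 + 16)
        (duration,) = struct.unpack_from(">Q", payload, 4 + 24)
        pos = 4 + 32
    else:
        (track_id,) = struct.unpack_from(">I", payload, 4 + 8)
        (duration,) = struct.unpack_from(">I", payload, 4 + 16)
        pos = 4 + 20
    # Skip reserved(8) + layer(2) + alternate_group(2) + volume(2) + reserved(2) + matrix(36)
    width, height = struct.unpack_from(">II", payload, pos + 52)
    return {
        "track_id": track_id,
        "duration": duration,
        "enabled": bool(flags & 0x000001),
        "in_movie": bool(flags & 0x000002),
        "in_preview": bool(flags & 0x000004),
        "width": width >> 16,    # integer part of the 16.16 value
        "height": height >> 16,
    }

# Hypothetical version-0 video track header: flags = 7, 1920x1080.
sample = (bytes([0, 0, 0, 7])
          + struct.pack(">IIIII", 0, 0, 1, 0, 5000)
          + bytes(52)
          + struct.pack(">II", 1920 << 16, 1080 << 16))
print(parse_tkhd(sample))
```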

Track Media Structure (Media box)

Type: mdia
Container: trak
Mandatory: yes
Quantity: exactly one
```
aligned(8) class MediaBox extends Box('mdia') {
}
```
mdia is a very important child box of trak. It is also a container box whose content is carried by its child boxes; the mandatory children are mdhd, hdlr and minf.

Media Header Box

Type: mdhd
Container: mdia
Mandatory: yes

Quantity: exactly one

aligned(8) class MediaHeaderBox extends FullBox('mdhd', version, 0) {
 if (version==1) { 
 	unsigned int(64) creation_time; 
 	unsigned int(64) modification_time; 
	unsigned int(32) timescale; 
 	unsigned int(64) duration; 
 } else { // version==0 
 	unsigned int(32) creation_time; 
 	unsigned int(32) modification_time; 
	unsigned int(32) timescale; 
 	unsigned int(32) duration; 
 } 
 bit(1) pad = 0; 
 unsigned int(5)[3] language; // ISO-639-2/T language code 
 unsigned int(16) pre_defined = 0; 
}

The main information this box provides is the total duration of the individual video or audio track: duration / timescale.
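A minimal sketch of reading mdhd, assuming `payload` starts after the size/type header. The language field packs three 5-bit values, each being the ISO-639-2/T character code minus 0x60. The function name and sample values are invented.

```python
import struct

def parse_mdhd(payload):
    """Return (timescale, duration, language) from an mdhd data part."""
    version = payload[0]
    if version == 1:
        timescale, duration = struct.unpack_from(">IQ", payload, 4 + 16)
        pos = 4 + 28
    else:
        timescale, duration = struct.unpack_from(">II", payload, 4 + 8)
        pos = 4 + 16
    (packed,) = struct.unpack_from(">H", payload, pos)
    # 1 pad bit, then three 5-bit characters stored as (char - 0x60)
    language = "".join(chr(((packed >> shift) & 0x1F) + 0x60)
                       for shift in (10, 5, 0))
    return timescale, duration, language

# Hypothetical version-0 payload: timescale 50000, duration 100000, language "und".
sample = (bytes(4)
          + struct.pack(">IIII", 0, 0, 50000, 100000)
          + struct.pack(">HH", 0x55C4, 0))
timescale, duration, language = parse_mdhd(sample)
print(duration / timescale, language)
```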

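So far we can obtain the total duration of the media, the video width and height, and the duration of each individual audio or video track. But how do we tell whether the current track is a video track or an audio track? That job belongs to the hdlr box.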

Handler Reference Box

Type: hdlr
Container: mdia
Mandatory: yes
Quantity: exactly one
```
aligned(8) class HandlerBox extends FullBox('hdlr', version = 0, 0) {
 unsigned int(32) pre_defined = 0; 
 unsigned int(32) handler_type; 
 const unsigned int(32)[3] reserved = 0; 
 string name; 
}
```
handler_type: "vide" means the current track is a video track, "soun" means it is an audio track, and "hint" means it is a hint track.
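As a sketch, the handler type sits at a fixed position in the data part: 4 bytes of version/flags and 4 bytes of pre_defined come first. The function name and the sample name string below are made up.

```python
def parse_hdlr(payload):
    """Return (handler_type, name) from an hdlr data part."""
    handler_type = payload[8:12].decode("ascii")
    # 12 reserved bytes follow, then a null-terminated name string
    name = payload[24:].split(b"\x00", 1)[0].decode("utf-8", "replace")
    return handler_type, name

sample = bytes(8) + b"vide" + bytes(12) + b"VideoHandler\x00"
handler_type, name = parse_hdlr(sample)
kind = {"vide": "video track", "soun": "audio track", "hint": "hint track"}
print(handler_type, kind.get(handler_type, "other"), name)
```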

Once the track type is known, what remains is to find, for each media type, the codec information, the location of the media data, and the timestamps associated with each packet. All of this is contained in the Media Information Box, minf.

Media Information Box

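Type: minf
Container: mdia
Mandatory: yes
Quantity: exactly one
```
aligned(8) class MediaInformationBox extends Box('minf') {
}
```
As its name suggests, the minf box describes the media data of the current track. It is a container box whose content is expressed by its child boxes. Note that some child boxes differ between a video track and an audio track; compare the figure below.

[Figure: minf child boxes of a video track versus an audio track]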

Video media header

Type: vmhd
Container: minf
Mandatory: yes

Quantity: exactly one

aligned(8) class VideoMediaHeaderBox 
 extends FullBox('vmhd', version = 0, 1) {
 template unsigned int(16) graphicsmode = 0; // copy, see below 
 template unsigned int(16)[3] opcolor = {0, 0, 0}; 
}

Sound media header

Type: smhd
Container: minf
Mandatory: yes

Quantity: exactly one

aligned(8) class SoundMediaHeaderBox 
 extends FullBox('smhd', version = 0, 0) {
 template int(16) balance = 0; 
 const unsigned int(16) reserved = 0; 
}

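Data Information Box

Type: dinf
Container: minf or meta
Mandatory: required in minf, optional in meta

Quantity: exactly one

aligned(8) class DataInformationBox extends Box('dinf') {
}

dinf is a container box. Its content is carried by its child dref box (Data Reference Box), which in turn holds url or urn boxes.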

url or urn box

Type: url / urn
Container: dref
Mandatory: yes

Quantity: exactly one

aligned(8) class DataEntryUrlBox (bit(24) flags) extends FullBox('url ', version = 0, flags) {
 string location; 
} 
aligned(8) class DataEntryUrnBox (bit(24) flags) extends FullBox('urn ', version = 0, flags) {
 string name; 
 string location; 
} 
aligned(8) class DataReferenceBox extends FullBox('dref', version = 0, 0) {
 unsigned int(32) entry_count; 
 for (i=1; i <= entry_count; i++) { 
 DataEntryBox(entry_version, entry_flags) data_entry; 
 } 
}

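The physical placement of MP4 media data is not constrained by its temporal order, because the actual media data is located through the descriptive information in moov. Media data may live in one or more boxes of the same file, or even in other files, which the metadata references by URL. The way this media data is arranged, however, must be described entirely by the metadata of one main file. The other files need not be in MP4 format and may not contain boxes at all.

A track's media data can be divided into several segments, each fetched from the location its url or urn entry points to. The sample descriptions use the index of these entries to assemble the segments into a complete track. In the usual case, where all data is contained in the same file, the location string in the url or urn entry is empty.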

stco/co64

aligned(8) class ChunkOffsetBox extends FullBox('stco', version = 0, 0) {
 unsigned int(32) entry_count;
 for (i=1; i <= entry_count; i++) {
  unsigned int(32) chunk_offset;
 }
}
aligned(8) class ChunkLargeOffsetBox extends FullBox('co64', version = 0, 0) {
 unsigned int(32) entry_count;
 for (i=1; i <= entry_count; i++) {
  unsigned int(64) chunk_offset;
 }
}

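The media data of an MP4 file consists of one or more tracks; a track consists of one or more chunks; a chunk consists of one or more samples.
To read a sample we need its byte offset in the file, and to find that we first need the offset of the chunk that contains it. The stco box gives the offset of every chunk in the MP4 file, stored in the chunk_offset table (co64 is the same table with 64-bit offsets):
[Figure: chunk_offset table of an stco box]
By walking the chunk_offset table we can find the file offset of every chunk.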
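A short sketch of reading that table, assuming `payload` is the data part of the box starting at version/flags. The function name is hypothetical, and apart from 805293 (the first video chunk offset quoted later in this article) the sample offsets are made up.

```python
import struct

def parse_chunk_offsets(box_type, payload):
    """Return the chunk_offset table of an stco or co64 box as a list of ints."""
    (entry_count,) = struct.unpack_from(">I", payload, 4)
    code = "Q" if box_type == "co64" else "I"  # 64-bit vs 32-bit offsets
    return list(struct.unpack_from(">%d%s" % (entry_count, code), payload, 8))

stco = bytes(4) + struct.pack(">I3I", 3, 805293, 810000, 820000)
co64 = bytes(4) + struct.pack(">I1Q", 1, 5_000_000_000)
print(parse_chunk_offsets("stco", stco))
print(parse_chunk_offsets("co64", co64))
```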

stsc

aligned(8) class SampleToChunkBox extends FullBox('stsc', version = 0, 0) {
 unsigned int(32) entry_count;
 for (i=1; i <= entry_count; i++) {
  unsigned int(32) first_chunk;
  unsigned int(32) samples_per_chunk;
  unsigned int(32) sample_description_index;
 }
}

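Knowing where each chunk sits in the file, we also need the relationship between samples and chunks. The MP4 format collapses each run of consecutive chunks that share the same samples_per_chunk and sample_description_index into a single entry, and stores these entries in a table:
[Figure: stsc entry table]
first_chunk is the index of the first chunk of each entry, and samples_per_chunk is the number of samples in each chunk of that run. For the table in the figure:

Chunks 1-2: 2 samples each
Chunk 3: 1 sample
Chunk 4: 2 samples
Chunk 5: 2 samples
Chunks 6-11: 2 samples each
Chunk 13: 1 sample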
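Expanding the run-length entries back into a per-chunk sample count can be sketched like this. The entries below are hypothetical (they are not the table from the figure), and the total chunk count is assumed to come from the stco/co64 box.

```python
def samples_per_chunk(entries, chunk_count):
    """Expand stsc entries into a list with the sample count of every chunk.

    entries: list of (first_chunk, samples_per_chunk, sample_description_index),
             with 1-based first_chunk as in the box.
    """
    counts = []
    for i, (first, per_chunk, _desc) in enumerate(entries):
        # A run ends just before the next entry's first_chunk, or at the last chunk.
        last = entries[i + 1][0] - 1 if i + 1 < len(entries) else chunk_count
        counts.extend([per_chunk] * (last - first + 1))
    return counts

entries = [(1, 2, 1), (3, 1, 1), (4, 2, 1)]
print(samples_per_chunk(entries, chunk_count=6))
```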

stsz

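Combining the information from the stco and stsc boxes, we now know:

The total number of chunks in a given track of the MP4 file
The file offset of each chunk
How many samples each chunk contains

To read all the media data of the track, however, we also need the size of each sample. The stsz box records the size of every sample.

aligned(8) class SampleSizeBox extends FullBox('stsz', version = 0, 0) {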
 unsigned int(32) sample_size; 
 unsigned int(32) sample_count; 
 if (sample_size==0) { 
 		for (i=1; i <= sample_count; i++) { 
 			unsigned int(32) entry_size; 
 		} 
 } 
}

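If sample_size is non-zero, all sample_count samples have the size sample_size. If sample_size is 0, the samples have different sizes, which are stored in a table:

entry_size: the size of the sample at the corresponding index
[Figure: entry_size table of an stsz box]
In the figure, the first sample has size 2317, the second sample has size 573, and the total number of samples is 6551.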
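Both cases of the stsz syntax can be handled in a few lines. This is a sketch with an assumed function name; the first sample table reuses the two sizes quoted above and the second is made up.

```python
import struct

def parse_stsz(payload):
    """Return the size of every sample described by an stsz data part."""
    sample_size, sample_count = struct.unpack_from(">II", payload, 4)
    if sample_size != 0:
        # All samples share one size; no per-sample table is stored.
        return [sample_size] * sample_count
    return list(struct.unpack_from(">%dI" % sample_count, payload, 12))

variable = bytes(4) + struct.pack(">II", 0, 2) + struct.pack(">II", 2317, 573)
constant = bytes(4) + struct.pack(">II", 100, 3)
print(parse_stsz(variable))
print(parse_stsz(constant))
```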

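Reading MP4 Video Media Data

We now know how to read the video media data of the video track in an MP4 file:

Walk the chunk_offset table in the video track's stco/co64 box to find the file offset of each chunk.
Use the stsc box to determine how many samples each chunk contains.
Read the stsz table to find the size of each sample.
With this we can read all the data in a chunk; once the whole chunk_offset table has been walked, every sample of the track has been read.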
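The steps above can be sketched as a single function that combines the three tables. It assumes the tables have already been parsed into Python lists and relies on samples being stored back to back inside a chunk. Apart from 805293, 2317 and 573, which are quoted in this article, the numbers are invented.

```python
def sample_offsets(chunk_offsets, stsc_entries, sample_sizes):
    """Return a list of (file_offset, size), one tuple per sample in sample order."""
    result = []
    sample_index = 0
    entry = 0
    for chunk_number, offset in enumerate(chunk_offsets, start=1):
        # Advance to the stsc entry whose run covers this chunk.
        while (entry + 1 < len(stsc_entries)
               and stsc_entries[entry + 1][0] <= chunk_number):
            entry += 1
        for _ in range(stsc_entries[entry][1]):
            size = sample_sizes[sample_index]
            result.append((offset, size))
            offset += size  # next sample follows immediately in the same chunk
            sample_index += 1
    return result

chunk_offsets = [805293, 900000]          # from stco
stsc_entries = [(1, 2, 1), (2, 1, 1)]     # (first_chunk, samples_per_chunk, desc_index)
sample_sizes = [2317, 573, 1000]          # from stsz
print(sample_offsets(chunk_offsets, stsc_entries, sample_sizes))
```

Each resulting pair can then be used to seek to `file_offset` and read `size` bytes.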

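Let us verify this with the first video sample. The first video frame is stored in the first sample of the first chunk, so the offset of the first chunk of the video track is simply the offset of the first video frame: 805293. Here we open a hex viewer and StreamEye to compare:
[Figure: hex viewer and StreamEye showing the first video sample at offset 805293]

The H.264 video in the MP4 file I used is packaged in AVCC form, so each NALU is preceded by a 4-byte header (the NALU length), and the actual NALU data follows it. The bytes match what the tool shows.

Being able to read all the media data of an MP4 file sequentially from the start is far from enough. We also need to start reading from a given time. How is that done?

The ultimate purpose of the MP4 container is playback, and a key task when playing an MP4 file is to find the PTS of every sample.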

stts

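DTS(n+1) = DTS(n) + STTS(n)
The stts box records the DTS interval from each sample to the next, its duration, in units of trak->mdia->mdhd->timescale. Every sample except the last should have a duration greater than 0, which keeps decoding time strictly increasing. The value can be regarded as the sample's duration.
[Figure: stts boxes of two MP4 files]
The figure shows the stts boxes of two MP4 files. In the first, from the first sample through sample 6551, the DTS interval is always 2000.

In the second:

The first sample has duration 4497
The second sample has duration 4499
The third and fourth samples have duration 4498
The fifth sample has duration 4500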
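Applying the recurrence DTS(n+1) = DTS(n) + STTS(n) to the second example can be sketched as below. It assumes each stts entry is a (sample_count, sample_delta) pair, as in the specification, and that the first DTS is 0.

```python
def decode_times(stts_entries, first_dts=0):
    """Expand (sample_count, sample_delta) entries into the DTS of every sample."""
    dts = []
    t = first_dts
    for count, delta in stts_entries:
        for _ in range(count):
            dts.append(t)
            t += delta
    return dts

entries = [(1, 4497), (1, 4499), (2, 4498), (1, 4500)]
print(decode_times(entries))
```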

ctts

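If the video contains no B-frames, the encoder outputs samples in the same order as the input pictures, so decoding order (DTS) and presentation order (PTS) agree and PTS equals DTS. With B-frames things differ: because a B-frame references pictures in both directions, the encoder encodes frame P5 first and then B2, B3 and B4, so the PTS values of the output samples are no longer in increasing order.
[Figure: frame reordering with B-frames]
So when B-frames are present, DTS can no longer stand in for PTS, and we need a way to compute the correct PTS of each sample. That is the job of ctts: it records the difference between each sample's PTS and DTS.
PTS(n) = DTS(n) + CTTS(n)

[Figure: ctts table]
For the ctts in the figure, PTS and DTS can be computed as follows. By convention the PTS of the first sample is taken to be 0: PTS(1) = 0.
Assume every stts entry is 2000 and trak->mdia->mdhd->timescale = 50000.
Then: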

PTS(1) = DTS(1) + CTTS(1), with PTS(1) = 0 and CTTS(1) = 2000
=> DTS(1) = - CTTS(1) 
=> DTS(1) = -2000;

// convert to microseconds
Given: trak->mdia->mdhd->timescale = 50000;
Then: DTS_us(1) = -2000/50000 * 1000000(us)
=> DTS_us(1) = -40000(us)

DTS(2) = DTS(1) + STTS(1)
=> DTS(2) = -2000 + 2000 = 0;
// in microseconds
DTS_us(2) = 0(us)


=>PTS(2) = DTS(2) + CTTS(2)
=>PTS(2) = 0 + 8000 = 8000
// in microseconds
PTS_us(2) = 8000/50000 * 1000000 = 160000(us)
...
Continuing in the same way gives the PTS of every sample.
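The same calculation as a sketch in Python. Only the first two ctts values (2000 and 8000) come from the calculation above; the remaining three are assumed so that the sequence matches an I, P5, B2, B3, B4 coding order.

```python
def presentation_times(stts_deltas, ctts_offsets, timescale):
    """Return (dts, pts, pts_us) per sample, anchoring PTS(1) at 0."""
    dts = -ctts_offsets[0]  # DTS(1) = -CTTS(1) so that PTS(1) = 0
    rows = []
    for delta, offset in zip(stts_deltas, ctts_offsets):
        pts = dts + offset                      # PTS(n) = DTS(n) + CTTS(n)
        rows.append((dts, pts, pts * 1_000_000 // timescale))
        dts += delta                            # DTS(n+1) = DTS(n) + STTS(n)
    return rows

rows = presentation_times([2000] * 5, [2000, 8000, 0, 0, 0], timescale=50000)
for dts, pts, pts_us in rows:
    print(dts, pts, pts_us)
```

Sorting the rows by pts gives the display order, while the list order is the decode order.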

stss

Specific to video tracks; records the sample numbers of the key frames.

aligned(8) class SyncSampleBox extends FullBox('stss', version = 0, 0) {
 	unsigned int(32) entry_count; 
 	int i; 
 	for (i=0; i < entry_count; i++) { 
 		unsigned int(32) sample_number; 
 	} 
}

[Figure: stss sample_number table]
entry_count is the number of key-frame samples; their sample numbers are stored in a table, as in the figure. The first video sample is a key frame, with sample number 1.
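Putting stts and stss together gives a rough sketch of seeking: map the target time to a sample through the stts durations, then step back to the nearest key frame at or before it so decoding can start cleanly. The function name and the key-frame table below are hypothetical, and the sketch works on decode time only, ignoring ctts.

```python
import bisect

def seek_sample(target_seconds, timescale, stts_entries, sync_samples):
    """Return (target_sample, key_sample) as 1-based sample numbers."""
    target = int(target_seconds * timescale)
    t = 0
    sample = 0
    found = None
    for count, delta in stts_entries:
        for _ in range(count):
            sample += 1
            t += delta
            if t > target:      # this sample's interval contains the target
                found = sample
                break
        if found is not None:
            break
    if found is None:
        found = sample          # target is past the end: clamp to the last sample
    # Largest key-frame sample number that is <= found
    index = bisect.bisect_right(sync_samples, found) - 1
    return found, sync_samples[max(index, 0)]

stts_entries = [(6551, 2000)]   # 6551 samples, 2000 units each
sync_samples = [1, 251, 501]    # made-up key-frame table
print(seek_sample(12.0, 50000, stts_entries, sync_samples))
```

From the key sample, the chunk and byte offset follow from the stsc, stco and stsz tables as described earlier.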

Box						Mandatory	Description
ftyp						√	file type and compatibility
pdin							progressive download information
moov						√	container for all the metadata
	mvhd					√	movie header, overall declarations
	trak					√	container for an individual track or stream
		tkhd				√	track header, overall information about the track
		tref					track reference container
		edts					edit list container
			elst				an edit list
		mdia				√	container for the media information in a track
			mdhd			√	media header, overall information about the media
			hdlr			√	handler, declares the media (handler) type
			minf			√	media information container
				vmhd			video media header, overall information (video track only)
				smhd			sound media header, overall information (sound track only)
				hmhd			hint media header, overall information (hint track only)
				nmhd			Null media header, overall information (some tracks only)
				dinf		√	data information box, container
					dref	√	data reference box, declares source(s) of media data in track
				stbl		√	sample table box, container for the time/space map
					stsd	√	sample descriptions (codec types, initialization etc.)
					stts	√	(decoding) time-to-sample
					ctts		(composition) time to sample
					stsc	√	sample-to-chunk, partial data-offset information
					stsz		sample sizes (framing)
					stz2		compact sample sizes (framing)
					stco	√	chunk offset, partial data-offset information
					co64		64-bit chunk offset
					stss		sync sample table (random access points)
					stsh		shadow sync sample table
					padb		sample padding bits
					stdp		sample degradation priority
					sdtp		independent and disposable samples
					sbgp		sample-to-group
					sgpd		sample group description
					subs		sub-sample information
	mvex						movie extends box
		mehd					movie extends header box
		trex				√	track extends defaults
	ipmc						IPMP Control Box
moof							movie fragment
	mfhd					√	movie fragment header
	traf						track fragment
		tfhd				√	track fragment header
		trun					track fragment run
		sdtp					independent and disposable samples
		sbgp					sample-to-group
		subs					sub-sample information
mfra							movie fragment random access
	tfra						track fragment random access
	mfro					√	movie fragment random access offset
mdat							media data container
free							free space
skip							free space
	udta						user-data
		cprt					copyright etc.
meta							metadata
	hdlr					√	handler, declares the metadata (handler) type
	dinf						data information box, container
		dref					data reference box, declares source(s) of metadata items
	ipmc						IPMP Control Box
	iloc						item location
	ipro						item protection
		sinf					protection scheme information box
			frma				original format box
			imif				IPMP Information box
			schm				scheme type box
			schi				scheme information box
	iinf						item information
	xml						XML container
	bxml						binary XML container
	pitm						primary item reference
	fiin						file delivery item information
		paen					partition entry
			fpar				file partition
			fecr				FEC reservoir
		segr					file delivery session group
		gitn					group id to name
		tsel					track selection
meco							additional metadata container
	mere						metabox relation

Appendix

Online MP4 analysis tools:
MP4box.js: https://gpac.github.io/mp4box.js/test/filereader.html
mp4parser: https://www.onlinemp4parser.com/
