This article is shared from the Huawei Cloud community post 《爆圈Sora橫空出世,AGI通用人工智能時(shí)代真的要來了嗎?一鍵Run帶你體驗(yàn)擴(kuò)散模型的魅力!》, by 碼上開花_Lancer.
The explosive Sora news of the past few days has electrified everyone working in AI, along with anyone interested in its applications; even CCTV has been running segments on it, a level of buzz comparable to the ChatGPT frenzy of early 2023. So why exactly is it such a big deal?
I. What is Sora?
Sora is OpenAI's newly released text-to-video model. It can generate videos up to one minute long while faithfully following the user's prompt and maintaining visual quality.
OpenAI is thinking very big here: it wants to build World Simulators and pursue AGI, not just produce text, images, or video; it wants to help people solve problems that require interacting with the real world. This ambition is plain from the technical report OpenAI published alongside Sora:
The report's abstract (shown in the image above) reads:
Video generation models as world simulators. We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high fidelity video. Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world.
In video creation, frame-to-frame stability is critical: producing polished results normally demands real video-editing skill and groundwork. Against that backdrop, Sora's showing is astonishing. From a simple text description, it generates long videos that stay visually stable and demonstrate strong comprehension of the prompt.
Sora's technical approach is also distinctive, leaving traditional methods far behind. Rather than modeling changes in 2D pixels, it focuses on changes in semantic understanding, shifting from generating video frames to generating narrative logic. It is a jaw-dropping illustration of what this technology can do.
II. Speculation on How Sora Works
According to the technical report OpenAI released, the text-to-video model behind Sora is a Diffusion Transformer: a model that combines the Transformer architecture with a diffusion model and is used to generate images, videos, and other data.
In other words, Sora is a Transformer-based diffusion model. Models of this kind are not only theoretically novel but have also shown real strength in practice: DiT (the likely foundation of Sora) and GenTron have both achieved major success in image and video generation. Since the details of Sora have not been made public, everyone has their own guesses. Saining Xie, lead author of DiT, speculated:
1) Sora is probably built on top of DiT, the diffusion Transformer.
2) Sora may have roughly 3 billion parameters (extrapolating from the paper's 0.13B model at 32× the compute).
3) Training data is the single most critical factor in Sora's success.
4) The main challenge is controlling error accumulation and maintaining quality/consistency over time.
DiT: a diffusion model from Meta built entirely on the Transformer architecture. It not only applied Transformers to diffusion models successfully, but also studied how well the architecture scales in that setting.
GenTron: another Transformer-based diffusion model. In human evaluations against SDXL, GenTron achieved a 51.1% win rate on visual quality (19.8% tie rate) and a 42.3% win rate on text alignment (42.9% tie rate).
The DiT paper, Scalable Diffusion Models with Transformers, introduces Diffusion Transformers (DiTs) and studies the relationships among the architecture's design space, scaling behavior, network complexity, and sample quality. The results show that simply scaling DiT up with a high-capacity backbone achieves a state-of-the-art FID of 2.27 on the class-conditional 256×256 ImageNet generation benchmark. Compared with pixel-space diffusion models, DiTs need only a fraction of the Gflops, making them highly compute-efficient. DiTs can also be applied in pixel space, turning the image-generation pipeline into a hybrid approach that uses off-the-shelf convolutional VAEs together with Transformer-based DDPMs.
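To make the DiT recipe concrete, here is a minimal, self-contained PyTorch sketch of its two key ingredients: turning a latent image into a sequence of patch tokens, and modulating a Transformer block with the diffusion timestep (adaLN-style). Every name and size below is illustrative, not DiT's actual code:

import torch
import torch.nn as nn

class MiniDiTBlock(nn.Module):
    """Illustrative DiT-style block: patch tokens modulated by a timestep embedding."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ada = nn.Linear(dim, 2 * dim)  # timestep embedding -> scale and shift

    def forward(self, x, t_emb):
        scale, shift = self.ada(t_emb).unsqueeze(1).chunk(2, dim=-1)
        h = self.norm(x) * (1 + scale) + shift   # adaLN: condition tokens on the timestep
        attn_out, _ = self.attn(h, h, h)
        return x + attn_out                      # residual connection

# Patchify: a latent image (B, C, H, W) becomes a sequence of patch tokens (B, N, dim).
B, C, H, W, p, dim = 2, 4, 32, 32, 4, 64
patchify = nn.Conv2d(C, dim, kernel_size=p, stride=p)    # one token per p x p patch
latent = torch.randn(B, C, H, W)
tokens = patchify(latent).flatten(2).transpose(1, 2)     # (2, 64 tokens, 64)

t_emb = torch.randn(B, dim)   # stand-in for a sinusoidal timestep embedding
out = MiniDiTBlock(dim)(tokens, t_emb)
print(out.shape)              # torch.Size([2, 64, 64])

In the full model, a stack of such blocks replaces the U-Net entirely and the output tokens are projected back into patches at the end; Sora's reported "spacetime patches" extend the same idea from 2D image patches to 3D video patches.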
It brings a standard Transformer-class design into diffusion models in place of the traditional U-Net, providing a new architectural option.
It also builds on latent diffusion models (LDMs), which compress images into a smaller spatial representation and train the diffusion model on those representations, sidestepping the computational cost of training diffusion models directly in high-resolution pixel space.
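The forward (noising) half of such a model is simple enough to show in a few runnable lines. In the sketch below, the toy 4-channel tensor stands in for a VAE latent and the linear beta schedule is an assumption for illustration; the closed-form step x_t = sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε is what the denoising network is trained to invert:

import torch

# A toy latent standing in for a VAE encoding (LDMs diffuse here, not in pixel space).
x0 = torch.randn(1, 4, 32, 32)

# Linear beta schedule and its cumulative products (alpha_bar).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Closed-form forward process: noise x0 directly to step t."""
    eps = torch.randn_like(x0)
    return alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps

# Early steps barely perturb the latent; by the last step it is almost pure noise.
for t in (0, 500, 999):
    print(t, q_sample(x0, t).std().item())

Training then amounts to asking a network (classically a U-Net, a Transformer in DiT) to predict ε from the noisy x_t and t; sampling runs the process in reverse.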
So where can we developers go to experience the fun of generative video for ourselves? Today I'll introduce Stable Video Diffusion (SVD), which you can try out with one click on Huawei Cloud:
III. Hands-On: Image-to-Video Generation with the Stable Video Diffusion (SVD) Model
1. Case overview
Stable Video Diffusion (SVD) is a diffusion model that takes a still image as a conditioning frame and generates a video from it. (A conceptual sketch of this sampling process follows the notes below.)
⚠️ This case requires a Pytorch-1.8 / GPU-V100 flavor or higher to run.
⚠️ Clicking Run in ModelArts takes you to ModelArts CodeLab. You will need to sign in with a Huawei Cloud account; if you don't have one, register and complete real-name verification (see 《ModelArts準(zhǔn)備工作_簡易版》 for a walkthrough of both). After signing in, wait a moment and you will land in the CodeLab runtime.
⚠️ If you hit Out Of Memory, check whether your parameter settings are too high; lower them and restart the kernel, or switch to a higher-spec resource.
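Before running the real pipeline, the toy sketch below shows the rough shape of what SVD does at sampling time: start from a noisy latent "video" tensor and iteratively denoise it while every frame is steered by the still image's latent. The denoiser here is a dummy and the schedule is made up; the real model's network, sampler, and conditioning (the paper describes channel-wise concatenation of the conditioning frame's latent plus a CLIP image embedding) are far more sophisticated:

import torch

F, C, H, W = 14, 4, 8, 8           # frames x latent channels x spatial size (toy numbers)
cond = torch.randn(1, C, H, W)     # latent of the conditioning still image

def dummy_denoiser(x_noisy, cond, sigma):
    """Stand-in for SVD's video network: nudges every frame toward the image latent."""
    cond_video = cond.unsqueeze(2).expand_as(x_noisy)  # broadcast the image across frames
    return (x_noisy - cond_video) / (sigma + 1.0)      # fake noise prediction

x = torch.randn(1, C, F, H, W)          # start from pure noise
sigmas = torch.linspace(25.0, 0.0, 26)  # toy descending noise schedule
for i in range(len(sigmas) - 1):
    eps_hat = dummy_denoiser(x, cond, sigmas[i])
    x = x - (sigmas[i] - sigmas[i + 1]) / (sigmas[i] + 1.0) * eps_hat  # crude Euler step

# In the real pipeline, the VAE would now decode x frame by frame into the output video.
print(x.shape)  # torch.Size([1, 4, 14, 8, 8])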
2. Download the code and model
!git clone https://github.com/Stability-AI/generative-models.git
Cloning into 'generative-models'...
remote: Enumerating objects: 860, done.
remote: Counting objects: 100% (489/489), done.
remote: Compressing objects: 100% (222/222), done.
remote: Total 860 (delta 368), reused 267 (delta 267), pack-reused 371
Receiving objects: 100% (860/860), 42.67 MiB | 462.00 KiB/s, done.
Resolving deltas: 100% (445/445), done.
import moxing as mox

# Copy the patched encoder modules, model files, and checkpoints from OBS.
mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/case_zoo/Stable_Video_Diffusion/file/modify_file/generative-models/sgm/modules/encoders', 'generative-models/sgm/modules/encoders')
mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/case_zoo/Stable_Video_Diffusion/file/models', 'generative-models/models')
mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/case_zoo/Stable_Video_Diffusion/file/checkpoints', 'generative-models/checkpoints')
INFO:root:Using MoXing-v2.1.0.5d9c87c8-5d9c87c8
INFO:root:Using OBS-Python-SDK-3.20.9.1
3. Set up the runtime environment
This case requires Python 3.10.10 or later, so we first create a virtual environment:
!/home/ma-user/anaconda3/bin/conda create -n python-3.10.10 python=3.10.10 -y --override-channels --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
!/home/ma-user/anaconda3/envs/python-3.10.10/bin/pip install ipykernel
/home/ma-user/anaconda3/lib/python3.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done
import json
import os

# Jupyter kernel spec for the new environment.
data = {
    "display_name": "python-3.10.10",
    "env": {
        "PATH": "/home/ma-user/anaconda3/envs/python-3.10.10/bin:/home/ma-user/anaconda3/envs/python-3.7.10/bin:/modelarts/authoring/notebook-conda/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/ma-user/modelarts/ma-cli/bin:/home/ma-user/modelarts/ma-cli/bin:/home/ma-user/anaconda3/envs/PyTorch-1.8/bin"
    },
    "language": "python",
    "argv": [
        "/home/ma-user/anaconda3/envs/python-3.10.10/bin/python",
        "-m",
        "ipykernel",
        "-f",
        "{connection_file}"
    ]
}

# Register the kernel spec so the environment appears in the notebook's kernel list.
if not os.path.exists("/home/ma-user/anaconda3/share/jupyter/kernels/python-3.10.10/"):
    os.mkdir("/home/ma-user/anaconda3/share/jupyter/kernels/python-3.10.10/")

with open('/home/ma-user/anaconda3/share/jupyter/kernels/python-3.10.10/kernel.json', 'w') as f:
    json.dump(data, f, indent=4)
Once this is done, wait a moment or refresh the page, then click the kernel selector in the upper right and choose python-3.10.10.
!pip install torch==2.0.1 torchvision==0.15.2
!pip install MoviePy
Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Collecting torch==2.0.1
  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/8c/4d/17e07377c9c3d1a0c4eb3fde1c7c16b5a0ce6133ddbabc08ceef6b7f2645/torch-2.0.1-cp310-cp310-manylinux1_x86_64.whl (619.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 619.9/619.9 MB 5.6 MB/s eta 0:00:00
......
  Uninstalling decorator-5.1.1:
    Successfully uninstalled decorator-5.1.1
Successfully installed MoviePy-1.0.3 decorator-4.4.2 imageio-2.34.0 imageio_ffmpeg-0.4.9 proglog-0.1.10 tqdm-4.66.2
%cd generative-models
/home/ma-user/work/stable-video-diffusion/generative-models
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: using dhist requires you to install the `pickleshare` library.
  self.shell.db['dhist'] = compress_dhist(dhist)[-100:]
!pip install -r requirements/pt2.txt
Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Collecting clip@ git+https://github.com/openai/CLIP.git (from -r requirements/pt2.txt (line 3))
  Cloning https://github.com/openai/CLIP.git to /tmp/pip-install-_vzv4vq_/clip_4273bc4d2cba4d6486a222a5093fbe4b
......
Collecting transformers==4.19.1 (from -r requirements/pt2.txt (line 33))
......
  Successfully uninstalled urllib3-2.2.1
Successfully installed PyWavelets-1.5.0 aiohttp-3.9.3 aiosignal-1.3.1 altair-5.2.0 antlr4-python3-runtime-4.9.3 appdirs-1.4.4 async-timeout-4.0.3 attrs-23.2.0 black-23.7.0 blinker-1.7.0 braceexpand-0.1.7 cachetools-5.3.2 chardet-5.1.0 click-8.1.7 clip-1.0 contourpy-1.2.0 cycler-0.12.1 docker-pycreds-0.4.0 einops-0.7.0 fairscale-0.4.13 fire-0.5.0 fonttools-4.49.0 frozenlist-1.4.1 fsspec-2024.2.0 ftfy-6.1.3 gitdb-4.0.11 gitpython-3.1.42 huggingface-hub-0.20.3 importlib-metadata-7.0.1 invisible-watermark-0.2.0 jsonschema-4.21.1 jsonschema-specifications-2023.12.1 kiwisolver-1.4.5 kornia-0.6.9 lightning-utilities-0.10.1 markdown-it-py-3.0.0 matplotlib-3.8.3 mdurl-0.1.2 multidict-6.0.5 mypy-extensions-1.0.0 natsort-8.4.0 ninja-1.11.1.1 omegaconf-2.3.0 open-clip-torch-2.24.0 opencv-python-4.6.0.66 pandas-2.2.0 pathspec-0.12.1 protobuf-3.20.3 pudb-2024.1 pyarrow-15.0.0 pydeck-0.8.1b0 pyparsing-3.1.1 pytorch-lightning-2.0.1 pytz-2024.1 pyyaml-6.0.1 referencing-0.33.0 regex-2023.12.25 rich-13.7.0 rpds-py-0.18.0 safetensors-0.4.2 scipy-1.12.0 sentencepiece-0.2.0 sentry-sdk-1.40.5 setproctitle-1.3.3 smmap-5.0.1 streamlit-1.31.1 streamlit-keyup-0.2.0 tenacity-8.2.3 tensorboardx-2.6 termcolor-2.4.0 timm-0.9.16 tokenizers-0.12.1 toml-0.10.2 tomli-2.0.1 toolz-0.12.1 torchaudio-2.0.2 torchdata-0.6.1 torchmetrics-1.3.1 transformers-4.19.1 tzdata-2024.1 tzlocal-5.2 urllib3-1.26.18 urwid-2.6.4 urwid-readline-0.13 validators-0.22.0 wandb-0.16.3 watchdog-4.0.0 webdataset-0.2.86 xformers-0.0.22 yarl-1.9.4 zipp-3.17.0
!pip install .
Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Processing /home/ma-user/work/stable-video-diffusion/generative-models
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: sgm
  Building wheel for sgm (pyproject.toml) ... done
  Created wheel for sgm: filename=sgm-0.1.0-py3-none-any.whl size=127368 sha256=0f9ff6913b03b2e0354cd1962ecb2fc03df36dea90d14b27dc46620e6eafc9a0
  Stored in directory: /home/ma-user/.cache/pip/wheels/a9/b8/f4/e84140beaf1762b37f5268788964d58d91394ee17de04b3f9a
Successfully built sgm
Installing collected packages: sgm
Successfully installed sgm-0.1.0
4. Generate the video
The video is written to the outputs folder by default. The --decoding_t 1 flag below decodes one frame at a time, which keeps GPU memory usage down.
!python scripts/sampling/simple_video_sample.py --decoding_t 1 --input_path 'assets/test_image.png'
/home/ma-user/work/stable-video-diffusion/generative-models
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
......
Initialized embedder #0: FrozenOpenCLIPImagePredictionEmbedder with 683800065 params. Trainable: False
Initialized embedder #1: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder #2: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder #3: VideoPredictionEmbedderWithEncoder with 83653863 params. Trainable: False
Initialized embedder #4: ConcatTimestepEmbedderND with 0 params. Trainable: False
Restored from checkpoints/svd.safetensors with 0 missing and 0 unexpected keys
100%|███████████████████████████████████████| 890M/890M [00:50<00:00, 18.5MiB/s]
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
# Convert the generated video into an animated GIF for inline display
from moviepy.editor import *

# Path of the generated video
video_path = "outputs/simple_video_sample/svd/000000.mp4"

# Load the video
clip = VideoFileClip(video_path)

# GIF output settings (resolution, duration, etc.)
output_file = "output_animation.gif"
fps = 10  # frames per second in the GIF

# Render and save the GIF
clip.write_gif(output_file, fps=fps)
MoviePy - Building file output_animation.gif with imageio.
from IPython.display import Image
Image(open('output_animation.gif', 'rb').read())
Go try out the fun of generative video for yourself!
Follow us to be the first to learn about Huawei Cloud's latest technologies!