AIGC: The Text-to-Image Model Stable Diffusion

Author: 智慧醫療探索者 | Category: Toy Blog | Published 2 years ago

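1 Introduction to Stable Diffusion

Stable Diffusion is a text-to-image model developed jointly by CompVis, Stability AI, and LAION. It was trained on a large number of 512x512 image-text pairs from a subset of LAION-5B. Given a short piece of text, Stable Diffusion quickly turns it into an image. It can also take an image or video as input and process it under the guidance of a text prompt.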

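The release of Stable Diffusion is a milestone in the development of AI image generation: it gave the general public a usable, high-performance model. The generated images are of very high quality, the model runs fast, and its compute and memory requirements are relatively low. A generated image is shown below:

[Figure: sample image generated by Stable Diffusion]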

Stable Diffusion Demo: demo

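1.1 Components of Stable Diffusion

Stable Diffusion is not one monolithic model. It is made up of several components and models.

Text-understanding component: converts the text into a numeric representation that captures the ideas in the text.
Image generator: works in two stages, the image information creator and the image decoder.

The image information creator runs for multiple steps to generate the image information. This is the steps parameter in Stable Diffusion interfaces and libraries, which usually defaults to 50 or 100. The image information creator works entirely in the image information space (the latent space), which makes it faster than diffusion models that work in pixel space.

The image decoder paints the picture from the information it receives from the image information creator. It runs only once, at the end of the process, to produce the final image.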

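[Figure: Stable Diffusion pipeline diagram]

The figure above is a flow diagram of Stable Diffusion. It contains the three components described above, each with its own neural network.

Text-understanding component: ClipText is the text encoder. Input: text, as 77 tokens. Output: 77 token embedding vectors, each with 768 dimensions.
Image information creator: UNet + Scheduler, which gradually processes (diffuses) information in the latent space. Input: the text embeddings and a starting multi-dimensional array made up of noise. Output: a processed information array.
Image decoder: an autoencoder decoder, which paints the final image from the processed information array. Input: the processed information array of dimensions 4 × 64 × 64. Output: an image of size 3 × 512 × 512.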
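The array sizes quoted above are related by a fixed downsampling factor: 512 / 64 = 8. The following is a minimal sketch of that arithmetic, not code from the Stable Diffusion repository. The function name `latent_shape` is hypothetical, and the parameters mirror the `--H`, `--W`, `--C` and `--f` options listed in the txt2img.py help in Section 3.1. The check for exact divisibility is an added assumption for clarity.

```python
def latent_shape(height=512, width=512, channels=4, factor=8):
    """Return the (channels, height / factor, width / factor) latent array shape.

    The defaults reproduce the 4 x 64 x 64 array that the image decoder
    turns into a 3 x 512 x 512 image.
    """
    if height % factor or width % factor:
        # Assumption for this sketch: only sizes that divide evenly are accepted.
        raise ValueError("height and width must be multiples of the downsampling factor")
    return (channels, height // factor, width // factor)


TEXT_EMBEDDING_SHAPE = (77, 768)  # 77 token embeddings, 768 dimensions each
IMAGE_SHAPE = (3, 512, 512)       # RGB image produced by the image decoder

print(latent_shape())          # (4, 64, 64)
print(latent_shape(768, 512))  # (4, 96, 64)
```

Working on a 4 × 64 × 64 array instead of a 3 × 512 × 512 one means each diffusion step touches 16,384 values rather than 786,432, which is the reason the latent-space approach is faster than pixel-space diffusion.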

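1.2 What Is Diffusion

We described the role of the "image information creator" component above: it takes the text embeddings and a starting multi-dimensional array made up of noise as input, and outputs the information array that the image decoder uses to paint the final image. Diffusion is the process that takes place inside the pink "image information creator" component in the figure below.

[Figure: the image information creator component, shown in pink, within the pipeline]

Diffusion proceeds step by step, and every step adds more relevant information. It takes place over multiple steps. Each step operates on an input latents array and produces another latents array that better resembles the input text and all the visual information the model picked up from the images it was trained on. The figure below feeds the latents array produced at each step into the image decoder to visualize what information each step adds. The diffusion in the figure runs for 50 steps. As the number of steps increases, the image decoded from the latents array becomes clearer and clearer.

[Figure: images decoded from the latents array at successive diffusion steps]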

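1.3 How Does Diffusion Work?

The main idea behind generating images with diffusion models builds on the fact that powerful computer vision models already exist. As long as the dataset is large enough, such a model can learn complex operations.

Suppose we have an image. We generate some random noise, pick a random amount of that noise, and add it to the image. This gives one training sample. Repeating the same procedure produces a large number of training samples that form the training set, which is then used to train the noise predictor (UNet). After training, we have a high-performance noise predictor that creates images when run in a specific configuration.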
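The sample-construction procedure described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the actual training code: the image is a small nested list of floats, the noise is added as `pixel + amount * noise` (real diffusion models use a noise schedule instead of this simple scaling), and the name `make_training_sample` is hypothetical.

```python
import random


def make_training_sample(image, rng):
    """Build one (noisy_image, amount, noise) training sample from a clean image."""
    amount = rng.uniform(0.0, 1.0)  # randomly chosen noise amount
    noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in image]
    noisy = [
        [pixel + amount * n for pixel, n in zip(row, noise_row)]
        for row, noise_row in zip(image, noise)
    ]
    return noisy, amount, noise


rng = random.Random(0)             # fixed seed so the sketch is repeatable
image = [[0.0, 0.5], [1.0, 0.25]]  # stand-in for a real picture
dataset = [make_training_sample(image, rng) for _ in range(4)]
print(len(dataset))
```

In each sample, the noisy image and the noise amount are the inputs to the noise predictor, and the noise array is the target it learns to predict.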

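1.4 Painting Images by Removing Noise

A noise predictor trained on the noise dataset built as described above can produce a noise image, that is, a prediction of the noise contained in its input. If we subtract this predicted noise from the noisy image, we get an image that is as close as possible to the samples the model was trained on. "Close" here means close in distribution: for example, the sky is usually blue and people have two eyes. The style of the generated images tends toward the styles present in the training samples.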
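The subtraction step can be illustrated with a toy example. The sketch below is an assumption-laden simplification: it uses the same `pixel + amount * noise` mixing as a stand-in for the real noise schedule, and it replaces the trained UNet with a perfect "oracle" prediction so that the effect of the subtraction is easy to see. The function names are hypothetical.

```python
import random


def add_noise(image, noise, amount):
    """Mix noise into a clean image (simplified: pixel + amount * noise)."""
    return [
        [pixel + amount * n for pixel, n in zip(row, noise_row)]
        for row, noise_row in zip(image, noise)
    ]


def subtract_noise(noisy, predicted_noise, amount):
    """Remove the predicted noise from a noisy image."""
    return [
        [pixel - amount * n for pixel, n in zip(row, noise_row)]
        for row, noise_row in zip(noisy, predicted_noise)
    ]


rng = random.Random(1)
clean = [[0.2, 0.8], [0.5, 0.1]]
noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in clean]
amount = 0.7

noisy = add_noise(clean, noise, amount)
predicted = noise  # stand-in for the trained noise predictor (a perfect guess)
restored = subtract_noise(noisy, predicted, amount)

max_error = max(
    abs(r - c)
    for restored_row, clean_row in zip(restored, clean)
    for r, c in zip(restored_row, clean_row)
)
print(max_error < 1e-12)
```

A real predictor is never perfect, which is one reason the actual process removes noise gradually over many steps rather than in a single subtraction.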

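1.5 Adding Text Information to the Image Generator

The diffusion process described so far generates images without using any text. The input to the image generator, however, includes the text embeddings as well as the starting multi-dimensional array of noise, so the noise predictor has to be adapted to take text into account. After training on a large amount of data, this yields the image generator. The chosen text encoder together with the trained image generator makes up the complete Stable Diffusion model: given a descriptive sentence, the model can generate a matching picture.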

2 Setting Up the Runtime Environment

2.1 Installing conda

For preparing the conda environment, see: Anaconda

2.2 Preparing the Runtime Environment

git clone https://github.com/CompVis/stable-diffusion.git

cd stable-diffusion

conda env create -f environment.yaml

conda activate ldm

pip install diffusers==0.12.1

2.3 Downloading the Models

(1) Download the model file "sd-v1-4.ckpt"

Model location: model

After the download completes, run the following commands:

mkdir -p models/ldm/stable-diffusion-v1/

mv sd-v1-4.ckpt model.ckpt

mv model.ckpt models/ldm/stable-diffusion-v1/

(2) Download the checkpoint_liberty_with_aug.pth model

Model location: model

After the download completes, put the model in the cache folder:

mv checkpoint_liberty_with_aug.pth ~/.cache/torch/hub/checkpoints/

(3) Download the clip-vit-large-patch14 model

Model location: model

The model files to download are as follows:

[Screenshot: list of clip-vit-large-patch14 model files to download]

Create the directory that will hold the model:

mkdir -p openai/clip-vit-large-patch14

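After the download completes, move the downloaded files into the directory above.

(4) Download the safety_checker model

Model location: model

The model files to download are as follows:

[Screenshot: list of safety_checker model files to download]

Create the directory that will hold the model files: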

mkdir -p CompVis/stable-diffusion-safety-checker

After the download completes, move the downloaded files into the directory above.

Move preprocessor_config.json from step (3) into the current model directory:

mv openai/clip-vit-large-patch14/preprocessor_config.json CompVis/stable-diffusion-safety-checker/
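Because the steps above spread files over several directories, it can help to check the layout before running the scripts. The following is a minimal sketch of such a check. The helper names are hypothetical, and it assumes that the `openai/` and `CompVis/` directories were created inside the cloned `stable-diffusion` directory, which is where the commands above would put them if they were run from there.

```python
from pathlib import Path


def expected_model_paths(repo_root, home=None):
    """List the files and directories that steps (1) to (4) should have created."""
    repo_root = Path(repo_root)
    home = Path(home) if home is not None else Path.home()
    return [
        repo_root / "models/ldm/stable-diffusion-v1/model.ckpt",
        home / ".cache/torch/hub/checkpoints/checkpoint_liberty_with_aug.pth",
        repo_root / "openai/clip-vit-large-patch14",
        repo_root / "CompVis/stable-diffusion-safety-checker/preprocessor_config.json",
    ]


def missing_model_paths(repo_root, home=None):
    """Return the expected paths that do not exist yet."""
    return [p for p in expected_model_paths(repo_root, home) if not p.exists()]


if __name__ == "__main__":
    # Run from inside the cloned stable-diffusion directory.
    for path in missing_model_paths("."):
        print("missing:", path)
```

If the script prints nothing, all four locations are in place. The check only looks at paths, not at whether the downloaded files are complete.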

3 Running the Model

3.1 Running Text-to-Image

python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms

Result:

[Figure: image generated for the prompt above]

txt2img.py arguments

usage: txt2img.py [-h] [--prompt [PROMPT]] [--outdir [OUTDIR]] [--skip_grid] [--skip_save] [--ddim_steps DDIM_STEPS] [--plms] [--laion400m] [--fixed_code] [--ddim_eta DDIM_ETA]
                  [--n_iter N_ITER] [--H H] [--W W] [--C C] [--f F] [--n_samples N_SAMPLES] [--n_rows N_ROWS] [--scale SCALE] [--from-file FROM_FILE] [--config CONFIG] [--ckpt CKPT]
                  [--seed SEED] [--precision {full,autocast}]

optional arguments:
  -h, --help            show this help message and exit
  --prompt [PROMPT]     the prompt to render
  --outdir [OUTDIR]     dir to write results to
  --skip_grid           do not save a grid, only individual samples. Helpful when evaluating lots of samples
  --skip_save           do not save individual samples. For speed measurements.
  --ddim_steps DDIM_STEPS
                        number of ddim sampling steps
  --plms                use plms sampling
  --laion400m           uses the LAION400M model
  --fixed_code          if enabled, uses the same starting code across samples
  --ddim_eta DDIM_ETA   ddim eta (eta=0.0 corresponds to deterministic sampling
  --n_iter N_ITER       sample this often
  --H H                 image height, in pixel space
  --W W                 image width, in pixel space
  --C C                 latent channels
  --f F                 downsampling factor
  --n_samples N_SAMPLES
                        how many samples to produce for each given prompt. A.k.a. batch size
  --n_rows N_ROWS       rows in the grid (default: n_samples)
  --scale SCALE         unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
  --from-file FROM_FILE
                        if specified, load prompts from this file
  --config CONFIG       path to config which constructs model
  --ckpt CKPT           path to checkpoint of model
  --seed SEED           the seed (for reproducible sampling)
  --precision {full,autocast}
                        evaluate at this precision
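The `--scale` entry in the help text above states the guidance formula: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty)). The sketch below applies that formula element by element to two short lists of numbers. It is only an illustration of the arithmetic. The function name `guided_noise` is hypothetical, and in the real model the two inputs are noise predictions from the UNet for an empty prompt and for the actual prompt.

```python
def guided_noise(eps_empty, eps_cond, scale):
    """Combine unconditional and text-conditioned noise predictions.

    Implements eps = eps_empty + scale * (eps_cond - eps_empty), element-wise.
    """
    return [e + scale * (c - e) for e, c in zip(eps_empty, eps_cond)]


eps_empty = [0.0, 1.0]  # prediction for the empty prompt
eps_cond = [1.0, 3.0]   # prediction for the actual prompt

print(guided_noise(eps_empty, eps_cond, 0.0))  # [0.0, 1.0], the prompt is ignored
print(guided_noise(eps_empty, eps_cond, 1.0))  # [1.0, 3.0], the conditioned prediction
print(guided_noise(eps_empty, eps_cond, 2.0))  # [2.0, 5.0], pushed further toward the prompt
```

Values of `scale` above 1 move the result past the conditioned prediction, away from the unconditional one, which makes the output follow the prompt more strongly.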

3.2 Running Image-to-Image

Run the following command:

python scripts/img2img.py --prompt "A fantasy landscape, trending on artstation" --init-img assets/stable-samples/img2img/mountains-1.png --strength 0.8

[Figure: image-to-image result for the command above]
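The `--strength 0.8` option controls how far the input image is pushed toward noise before it is denoised again with the prompt. The sketch below shows the arithmetic under an assumption that is not stated in this article: that the script converts the strength into a number of noising steps as `int(strength * ddim_steps)`, with 50 sampling steps by default. The function name `encode_steps` is hypothetical, so treat this as an illustration and check scripts/img2img.py for the actual behavior.

```python
def encode_steps(strength, ddim_steps=50):
    """Number of sampling steps used to noise the input image.

    Assumption for this sketch: steps = int(strength * ddim_steps),
    with strength restricted to the range [0.0, 1.0].
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return int(strength * ddim_steps)


for strength in (0.0, 0.5, 0.8, 1.0):
    print(strength, encode_steps(strength))
```

Under this assumption, a low strength keeps the result close to the input image, while a strength of 1.0 discards almost all of it.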

4 Troubleshooting

4.1 Fixing the SAFE_WEIGHTS_NAME Error

Running txt2img produces the following error:

(ldm) [root@localhost stable-diffusion]# python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms 
Traceback (most recent call last):
  File "scripts/txt2img.py", line 22, in <module>
    from diffusers.pipelines.stable_diffusion.safety_checker import StableDiffusionSafetyChecker
  File "/root/anaconda3/envs/ldm/lib/python3.8/site-packages/diffusers/__init__.py", line 29, in <module>
    from .pipelines import OnnxRuntimeModel
  File "/root/anaconda3/envs/ldm/lib/python3.8/site-packages/diffusers/pipelines/__init__.py", line 19, in <module>
    from .dance_diffusion import DanceDiffusionPipeline
  File "/root/anaconda3/envs/ldm/lib/python3.8/site-packages/diffusers/pipelines/dance_diffusion/__init__.py", line 1, in <module>
    from .pipeline_dance_diffusion import DanceDiffusionPipeline
  File "/root/anaconda3/envs/ldm/lib/python3.8/site-packages/diffusers/pipelines/dance_diffusion/pipeline_dance_diffusion.py", line 21, in <module>
    from ..pipeline_utils import AudioPipelineOutput, DiffusionPipeline
  File "/root/anaconda3/envs/ldm/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 67, in <module>
    from transformers.utils import SAFE_WEIGHTS_NAME as TRANSFORMERS_SAFE_WEIGHTS_NAME
ImportError: cannot import name 'SAFE_WEIGHTS_NAME' from 'transformers.utils' (/root/anaconda3/envs/ldm/lib/python3.8/site-packages/transformers/utils/__init__.py)

The error is resolved by changing the version of the diffusers package with the following command:

pip install diffusers==0.12.1
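To confirm that the pinned version is the one actually installed in the active environment, the version can be read from the package metadata without importing diffusers itself (importing is what fails in the traceback above). This is a small sketch using the standard library's `importlib.metadata`, available from Python 3.8. The helper names are hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(package, pinned):
    """Check whether the installed version equals the pinned version string."""
    return installed_version(package) == pinned


if __name__ == "__main__":
    print("diffusers:", installed_version("diffusers"))
    print("matches 0.12.1:", matches_pin("diffusers", "0.12.1"))
```

Run it inside the `ldm` environment. If the second line prints `False`, the `pip install` above was probably applied to a different environment.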

4.2 Fixing the Failure to Connect to huggingface.co

 python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms 
Traceback (most recent call last):
  File "/root/anaconda3/envs/ldm/lib/python3.8/site-packages/transformers/feature_extraction_utils.py", line 403, in get_feature_extractor_dict
    resolved_feature_extractor_file = cached_path(
  File "/root/anaconda3/envs/ldm/lib/python3.8/site-packages/transformers/utils/hub.py", line 282, in cached_path
    output_path = get_from_cache(
  File "/root/anaconda3/envs/ldm/lib/python3.8/site-packages/transformers/utils/hub.py", line 545, in get_from_cache
    raise ValueError(
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "scripts/txt2img.py", line 28, in <module>
    safety_feature_extractor = AutoFeatureExtractor.from_pretrained(safety_model_id)
  File "/root/anaconda3/envs/ldm/lib/python3.8/site-packages/transformers/models/auto/feature_extraction_auto.py", line 270, in from_pretrained
    config_dict, _ = FeatureExtractionMixin.get_feature_extractor_dict(pretrained_model_name_or_path, **kwargs)
  File "/root/anaconda3/envs/ldm/lib/python3.8/site-packages/transformers/feature_extraction_utils.py", line 436, in get_feature_extractor_dict
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like CompVis/stable-diffusion-safety-checker is not the path to a directory containing a preprocessor_config.json file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

Solution:

Download the models to the local machine. See Section 2.3 for the procedure.
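The error message above also points to the offline mode of the transformers library. Once the model files are available locally, offline mode can be switched on so that the library does not try to reach huggingface.co at all. The sketch below sets the `TRANSFORMERS_OFFLINE` environment variable from Python. The function name is hypothetical, and this is an optional addition to the local download, not a replacement for it. The variable has to be set before transformers is imported.

```python
import os


def enable_offline_mode(env=None):
    """Ask transformers to use only local files.

    Pass a dict to modify a copy of the environment, or leave env as None
    to change the environment of the current process.
    """
    env = os.environ if env is None else env
    env["TRANSFORMERS_OFFLINE"] = "1"
    return env


if __name__ == "__main__":
    enable_offline_mode()
    print(os.environ["TRANSFORMERS_OFFLINE"])
```

The same effect can be had from the shell by prefixing the command, for example `TRANSFORMERS_OFFLINE=1 python scripts/txt2img.py ...`.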
