
Using Stable Diffusion with Hugging Face: Diffusers, Transformers, Accelerate, Pipelines, and the VAE


Diffusers

A library that offers an implementation of various diffusion models, including text-to-image models.

A library that provides implementations of various diffusion models with very concise code. The main issue for users in mainland China is that huggingface.co requires a proxy to access.

Transformers

A Hugging Face library that provides pre-trained deep learning models for natural language processing tasks.

It provides pre-trained deep learning models; in this article it supplies the CLIP tokenizer and text encoder.

Accelerate

This library, also from Hugging Face, simplifies the execution of deep learning models on multiple devices, such as multiple CPUs, GPUs, or even TPUs.

An acceleration library that speeds up model execution on different hardware: CPUs, GPUs, or TPUs. A minimal sketch of its core API follows.
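
The snippet below is a minimal sketch of the Accelerator API; the training-loop usage is only indicated in comments, since this article focuses on inference.

from accelerate import Accelerator

accelerator = Accelerator()   # detects the available hardware (CPU / GPU / TPU)
device = accelerator.device   # the device that models and tensors should be moved to

# In a training script, model, optimizer and dataloader are wrapped once, e.g.:
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)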

Invisible_watermark

A package that allows embedding invisible watermarks in images. It is not used directly in the code shown, but could be useful for marking generated images.

Invisible watermarking, which can be used to watermark generated images.
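
As a hedged sketch (the package is not used elsewhere in this article), the invisible-watermark package is typically used roughly like this; pil_image is a placeholder for any PIL image:

import cv2
import numpy as np
from imwatermark import WatermarkEncoder

wm_encoder = WatermarkEncoder()
wm_encoder.set_watermark('bytes', 'my-watermark'.encode('utf-8'))  # placeholder payload

bgr = cv2.cvtColor(np.array(pil_image), cv2.COLOR_RGB2BGR)  # pil_image: any PIL image
bgr_marked = wm_encoder.encode(bgr, 'dwtDct')               # embed the invisible watermark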

Mediapy

A library that allows you to display and manipulate images and videos in a Jupyter notebook.
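
A minimal sketch of how mediapy is typically used in a notebook; the image variables here are placeholders:

import numpy as np
import mediapy as media

media.show_image(np.array(img))                               # img: a PIL image or numpy array
media.show_images([np.array(i) for i in images], columns=2)   # a grid of several images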

Pipelines

Pipelines provide a simple way to run state-of-the-art diffusion models in inference. Most diffusion systems consist of multiple independently-trained models and highly adaptable scheduler components - all of which are needed to have a functioning end-to-end diffusion system.

For example, Stable Diffusion is made up of several independently trained models and components:

  • a conditional U-Net
  • a CLIP text encoder
  • a scheduler component
  • a CLIPFeatureExtractor
  • a safety checker

All of these components are necessary to run Stable Diffusion in inference, even though they were trained or created independently from each other. They can be inspected directly on a loaded pipeline, as sketched below.
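
A hedged sketch, assuming the standard attribute names exposed by diffusers' StableDiffusionPipeline:

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained('CompVis/stable-diffusion-v1-4')

# Each sub-model is exposed as an attribute of the pipeline
print(type(pipe.unet).__name__)              # UNet2DConditionModel
print(type(pipe.vae).__name__)               # AutoencoderKL
print(type(pipe.text_encoder).__name__)      # CLIPTextModel
print(type(pipe.scheduler).__name__)         # the scheduler, e.g. PNDMScheduler
print(type(pipe.feature_extractor).__name__) # CLIPFeatureExtractor / CLIPImageProcessor
print(type(pipe.safety_checker).__name__)    # StableDiffusionSafetyChecker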

Stable diffusion using Hugging Face

The simplest possible call:

from diffusers import StableDiffusionPipeline


pipe = StableDiffusionPipeline.from_pretrained('CompVis/stable-diffusion-v1-4').to('cuda')

# Initialize a prompt
prompt = "a dog wearing hat"

# Pass the prompt in the pipeline
pipe(prompt).images[0]

[Generated image: a dog wearing a hat]
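
A slightly more practical variant, as a sketch: half precision saves GPU memory and a fixed generator makes runs repeatable. The argument names below are standard diffusers pipeline arguments.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    'CompVis/stable-diffusion-v1-4', torch_dtype=torch.float16
).to('cuda')

generator = torch.Generator('cuda').manual_seed(42)   # fixed seed for reproducibility
image = pipe("a dog wearing hat", num_inference_steps=50,
             guidance_scale=7.5, generator=generator).images[0]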

Understanding the core modules

The text-to-image flow above uses a diffusion model; the Stable Diffusion model is a Latent Diffusion Model (LDM). For the underlying concepts, see the Zhihu article 深入淺出講解Stable Diffusion原理,新手也能看明白.

There are three important parts in latent diffusion:

  1. A text encoder, in this case a CLIP text encoder
  2. An autoencoder, in this case a Variational Auto-Encoder (VAE)
  3. A U-Net

CLIP Text Encoder

Concept

CLIP (Contrastive Language-Image Pre-training) takes text as input and stores the resulting vector in its embedding output. A CLIP model can embed images and text into the same latent feature space.


No machine learning model can read natural language directly; the text first has to be converted into numbers the model can understand, called embeddings. This conversion happens in two steps:

1. Tokenizer - splits the text into sub-words and converts them into numbers using a lookup table
2. Token_To_Embedding Encoder - converts those numerical sub-words into a representation that carries the meaning of the text

Code

import torch, logging
## disable warnings
logging.disable(logging.WARNING)  
## Import the CLIP artifacts 
from transformers import CLIPTextModel, CLIPTokenizer
## Initiating tokenizer and encoder.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.float16)
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.float16).to("cuda")

prompt = ["a dog wearing hat"]
tok = tokenizer(prompt, padding="max_length", max_length=tokenizer.model_max_length, truncation=True, return_tensors="pt")
print(tok.input_ids.shape)
tok


The dictionary returned by the tokenizer contains two entries:

1. input_ids - A tensor of size 1x77, since a single prompt was passed and padded to the maximum length of 77. 49406 is the start-of-text token, 320 is "a", 1929 is "dog", 3309 is "wearing", 3801 is "hat", and 49407 is the end-of-text token, repeated up to the pad length of 77.
2. attention_mask - 1 marks an embedded value and 0 marks padding.

for token in list(tok.input_ids[0,:7]): 
    print(f"{token}:{tokenizer.convert_ids_to_tokens(int(token))}")


Next comes the Token_To_Embedding Encoder, which converts the input_ids into embeddings:

emb = text_encoder(tok.input_ids.to("cuda"))[0].half()
print(f"Shape of embedding : {emb.shape}")
emb


We can see that each 1x77 tokenized input is converted into a 1x77x768 embedding: every input token is represented as a point in a 768-dimensional space.

Role in the Stable Diffusion pipeline


Stable Diffusion only uses the CLIP-trained text encoder to transform the input text, and its output becomes one of the inputs to the U-Net. More broadly, CLIP trains an image encoder and a text encoder so that matching image-text pairs end up with similar embeddings in the latent space; this notion of similarity is made precise by the contrastive objective.
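
To illustrate that shared latent space (this is not part of the Stable Diffusion pipeline, just a hedged sketch using the full CLIP model from transformers), you can score how well captions match an image; "dog.jpg" is a hypothetical local file:

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

image = Image.open("dog.jpg")  # hypothetical local image
inputs = processor(text=["a dog wearing hat", "a bowl of soup"],
                   images=image, return_tensors="pt", padding=True)
outputs = clip(**inputs)
print(outputs.logits_per_image.softmax(dim=-1))  # how strongly each caption matches the image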

VAE (Variational Auto-Encoder)

Concept

[Figure: an autoencoder, with an encoder that compresses the image into a latent and a decoder that reconstructs it]

An autoencoder has two parts:
1. Encoder - takes an image as input and converts it into a low-dimensional latent representation
2. Decoder - takes the latent representation and converts it back into an image

As the figure shows, the encoder acts like a shredder that breaks the image into pieces, and the decoder reassembles the original image from those pieces.

Code

## To import an image from a URL 
from fastdownload import FastDownload  
## Imaging  library 
from PIL import Image 
from torchvision import transforms as tfms  
## Basic libraries 
import torch   # needed below for torch.float16
import numpy as np 
import matplotlib.pyplot as plt 
%matplotlib inline  
## Loading a VAE model 
from diffusers import AutoencoderKL 
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae", torch_dtype=torch.float16).to("cuda")
def load_image(p):
    '''
    Function to load images from a defined path
    '''
    return Image.open(p).convert('RGB').resize((512,512))
def pil_to_latents(image):
    '''     
    Function to convert image to latents     
    '''     
    init_image = tfms.ToTensor()(image).unsqueeze(0) * 2.0 - 1.0   
    init_image = init_image.to(device="cuda", dtype=torch.float16)
    init_latent_dist = vae.encode(init_image).latent_dist.sample() * 0.18215     
    return init_latent_dist  
def latents_to_pil(latents):     
    '''     
    Function to convert latents to images     
    '''     
    latents = (1 / 0.18215) * latents     
    with torch.no_grad():         
        image = vae.decode(latents).sample     
    
    image = (image / 2 + 0.5).clamp(0, 1)     
    image = image.detach().cpu().permute(0, 2, 3, 1).numpy()      
    images = (image * 255).round().astype("uint8")     
    pil_images = [Image.fromarray(image) for image in images]        
    return pil_images
p = FastDownload().download('https://lafeber.com/pet-birds/wp-content/uploads/2018/06/Scarlet-Macaw-2.jpg')
img = load_image(p)
print(f"Dimension of this image: {np.array(img).shape}")
img

[Image: the downloaded test photo (a scarlet macaw), resized to 512x512]

Now compress the image with the VAE encoder:

latent_img = pil_to_latents(img)
print(f"Dimension of this latent representation: {latent_img.shape}")


We can see that the VAE compresses a 3x512x512 image down to a 4x64x64 latent: from 3*512*512 = 786,432 values to 4*64*64 = 16,384, roughly a 48x compression. Let's look at the four latent channels:

fig, axs = plt.subplots(1, 4, figsize=(16, 4))
for c in range(4):
    axs[c].imshow(latent_img[0][c].detach().cpu(), cmap='Greys')

[Output: the four latent channels visualized as grayscale images]

In theory, these four channels retain much of the information in the original image. Next, we use the decoder to decompress it back.

decoded_img = latents_to_pil(latent_img)
decoded_img[0]

[Output: the image reconstructed by the VAE decoder]

We can see that the VAE decoder recovers the original image from the 48x-compressed latent representation.

Note the eyes in the two images: there are subtle differences, so the whole round trip is not lossless.
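
One way to quantify that loss, as a small sketch reusing the img and decoded_img variables defined above:

import numpy as np

orig = np.array(img, dtype=np.float32)               # original 512x512 image
recon = np.array(decoded_img[0], dtype=np.float32)   # VAE round-trip reconstruction
print(f"Mean absolute pixel error: {np.abs(orig - recon).mean():.2f} (0-255 scale)")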

Role in the Stable Diffusion pipeline

Stable Diffusion could work without a VAE, but using one greatly reduces the computation needed to generate high-resolution images. The latent diffusion model performs diffusion in the latent space produced by the VAE encoder, and once the diffusion process has produced the desired latents, the VAE decoder converts them back into a high-resolution image. To get a better intuitive understanding of variational autoencoders and how they are trained, read this blog by Irhum Shafkat.

U-Net model

概念

The U-Net model takes two inputs:
1. Noisy latent or noise - noisy latents are produced by the VAE encoder (when an initial image is provided) with noise added, or the input can be pure noise when we want to create a new image from a text description alone
2. Text embeddings - CLIP-based embeddings generated from the input text prompt


The output of the U-Net model is the predicted noise residual contained in the noisy input latent. In other words, it predicts the noise, which is then subtracted from the noisy latent to return the de-noised latent.

代碼

from diffusers import UNet2DConditionModel, LMSDiscreteScheduler
## Initializing a scheduler
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
## Setting number of sampling steps
scheduler.set_timesteps(51)
## Initializing the U-Net model
unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet", torch_dtype=torch.float16).to("cuda")

The code imports the unet and also sets up a scheduler. The scheduler determines how much noise is added to the latent at a given step of the diffusion process.

[Figure: the noise level over the diffusion steps]

As the figure shows, the noise level is high at the start of the diffusion process and gradually decreases.
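
You can see this decay directly in the scheduler's sigma values (sigmas is an attribute of LMSDiscreteScheduler, populated once set_timesteps has been called):

print(scheduler.sigmas[:5])    # largest noise levels, used at the start of sampling
print(scheduler.sigmas[-5:])   # smallest noise levels, used at the end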

noise = torch.randn_like(latent_img) # Random noise
fig, axs = plt.subplots(2, 3, figsize=(16, 12))
for c, sampling_step in enumerate(range(0,51,10)):
    encoded_and_noised = scheduler.add_noise(latent_img, noise, timesteps=torch.tensor([scheduler.timesteps[sampling_step]]))
    axs[c//3][c%3].imshow(latents_to_pil(encoded_and_noised)[0])
    axs[c//3][c%3].set_title(f"Step - {sampling_step}")

[Output: the latent noised at sampling steps 0, 10, 20, 30, 40 and 50]

Let's see how the U-Net removes noise from the image. First, add some noise:

encoded_and_noised = scheduler.add_noise(latent_img, noise, timesteps=torch.tensor([scheduler.timesteps[40]]))
latents_to_pil(encoded_and_noised)[0]

[Output: the noised latent at step 40, decoded to an image]

Now run the U-Net and try to de-noise it:

## Unconditional textual prompt
prompt = [""]
## Using clip model to get embeddings
text_input = tokenizer(prompt, padding="max_length", max_length=tokenizer.model_max_length, truncation=True, return_tensors="pt")
with torch.no_grad(): 
    text_embeddings = text_encoder(
        text_input.input_ids.to("cuda")
    )[0]
    
## Using U-Net to predict noise    
latent_model_input = torch.cat([encoded_and_noised.to("cuda").float()]).half()
with torch.no_grad():
    noise_pred = unet(
        latent_model_input, 40, encoder_hidden_states=text_embeddings
    )["sample"]
## Visualize after subtracting noise 
latents_to_pil(encoded_and_noised - noise_pred)[0]

[Output: the latent after subtracting the predicted noise, decoded to an image]

As the image above shows, a good amount of the noise has been removed.

Role in the pipeline

Latent diffusion uses the U-Net to gradually de-noise the latents over multiple steps until the desired output is reached. At each step, the amount of noise in the latents is reduced until we arrive at the final de-noised output. The U-Net architecture was first introduced in this paper. It consists of an encoder and a decoder built from ResNet blocks. The Stable Diffusion U-Net additionally has cross-attention layers, which give it the ability to condition the output on the input text. These cross-attention layers are added to both the encoder and the decoder parts of the U-Net, usually between ResNet blocks. You can learn more about this U-Net architecture here.
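
A quick, hedged way to check these properties on the loaded unet is to look at its config; the values in the comments are what Stable Diffusion v1.x typically reports:

print(unet.config.in_channels)          # 4   - the U-Net operates on 4-channel latents
print(unet.config.sample_size)          # 64  - spatial size of the latent grid
print(unet.config.cross_attention_dim)  # 768 - matches the CLIP text embedding width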

Putting it all together

Now let's combine the CLIP text encoder, the VAE, and the U-Net and walk through the text-to-image process end to end.

Recap: the diffusion process

The Stable Diffusion model takes a text prompt and a seed as input. The text is converted by CLIP into a 77x768 array, and the seed is used to generate 4x64x64 Gaussian noise, which becomes the first latent image representation.

Note - You will notice an additional dimension (1x) in the figure, such as 1x77x768 for the text embedding; that is because it represents a batch size of 1.


Next, the U-Net iteratively denoises the random latent image representation while conditioning on the text embeddings. The output of the U-Net is the predicted noise residual, which is then used to compute a denoised latent via the scheduler algorithm. This process of denoising and text conditioning is repeated N times (we will use 50) to obtain a better latent image representation.

Once this process is complete, the latent image representation (4x64x64) is decoded by the VAE decoder to retrieve the final output image (3x512x512).

Note — This iterative denoising is an important step for getting a good output image. Typical steps are in the range of 30–80. However, there are?recent papers?that claim to reduce it to 4–5 steps by using distillation techniques.

Code

import torch, logging
## disable warnings
logging.disable(logging.WARNING)  
## Imaging  library
from PIL import Image
from torchvision import transforms as tfms
## Basic libraries
import numpy as np
from tqdm.auto import tqdm
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import display
import shutil
import os
## For video display
from IPython.display import HTML
from base64 import b64encode

## Import the CLIP artifacts 
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, UNet2DConditionModel, LMSDiscreteScheduler
## Initiating tokenizer and encoder.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.float16)
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.float16).to("cuda")
## Initiating the VAE
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae", torch_dtype=torch.float16).to("cuda")
## Initializing a scheduler and Setting number of sampling steps
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
scheduler.set_timesteps(50)
## Initializing the U-Net model
unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet", torch_dtype=torch.float16).to("cuda")
## Helper functions
def load_image(p):
    '''
    Function to load images from a defined path
    '''
    return Image.open(p).convert('RGB').resize((512,512))
def pil_to_latents(image):
    '''
    Function to convert image to latents
    '''
    init_image = tfms.ToTensor()(image).unsqueeze(0) * 2.0 - 1.0
    init_image = init_image.to(device="cuda", dtype=torch.float16) 
    init_latent_dist = vae.encode(init_image).latent_dist.sample() * 0.18215
    return init_latent_dist
def latents_to_pil(latents):
    '''
    Function to convert latents to images
    '''
    latents = (1 / 0.18215) * latents
    with torch.no_grad():
        image = vae.decode(latents).sample
    image = (image / 2 + 0.5).clamp(0, 1)
    image = image.detach().cpu().permute(0, 2, 3, 1).numpy()
    images = (image * 255).round().astype("uint8")
    pil_images = [Image.fromarray(image) for image in images]
    return pil_images
def text_enc(prompts, maxlen=None):
    '''
    A function to take a textual prompt and convert it into embeddings
    '''
    if maxlen is None: maxlen = tokenizer.model_max_length
    inp = tokenizer(prompts, padding="max_length", max_length=maxlen, truncation=True, return_tensors="pt") 
    return text_encoder(inp.input_ids.to("cuda"))[0].half()

The code that follows is a simplified version of what the StableDiffusionPipeline does internally; it is meant to show the process.

def prompt_2_img(prompts, g=7.5, seed=100, steps=70, dim=512, save_int=False):
    """
    Diffusion process to convert prompt to image
    """
    
    # Defining batch size
    bs = len(prompts) 
    
    # Converting textual prompts to embedding
    text = text_enc(prompts) 
    
    # Adding an unconditional prompt , helps in the generation process
    uncond =  text_enc([""] * bs, text.shape[1])
    emb = torch.cat([uncond, text])
    
    # Setting the seed
    if seed: torch.manual_seed(seed)
    
    # Initiating random noise
    latents = torch.randn((bs, unet.in_channels, dim//8, dim//8))
    
    # Setting number of steps in scheduler
    scheduler.set_timesteps(steps)
    
    # Adding noise to the latents 
    latents = latents.to("cuda").half() * scheduler.init_noise_sigma
    
    # Iterating through defined steps
    for i,ts in enumerate(tqdm(scheduler.timesteps)):
        # We need to scale the i/p latents to match the variance
        inp = scheduler.scale_model_input(torch.cat([latents] * 2), ts)
        
        # Predicting noise residual using U-Net
        with torch.no_grad(): u,t = unet(inp, ts, encoder_hidden_states=emb).sample.chunk(2)
            
        # Performing Guidance
        pred = u + g*(t-u)
        
        # Conditioning  the latents
        latents = scheduler.step(pred, ts, latents).prev_sample
        
        # Saving intermediate images
        if save_int: 
            if not os.path.exists(f'./steps'):
                os.mkdir(f'./steps')
            latents_to_pil(latents)[0].save(f'steps/{i:04}.jpeg')
            
    # Returning the latent representation to output an image of 3x512x512
    return latents_to_pil(latents)

Finally, use it:

images = prompt_2_img(["A dog wearing a hat", "a photograph of an astronaut riding a horse"], save_int=False)
for img in images:display(img)

[Generated images: "A dog wearing a hat" and "a photograph of an astronaut riding a horse"]

def prompt_2_img(prompts, g=7.5, seed=100, steps=70, dim=512, save_int=False):

Parameter explanation:
1. prompts - the text prompt(s) to turn into images
2. g or guidance scale - a value that determines how closely the image should follow the textual prompt. It is related to a technique called classifier-free guidance, which improves the quality of the generated images: the higher the guidance scale, the closer the output stays to the textual prompt (see the usage sketch after this list)
3. seed - sets the seed from which the initial Gaussian noisy latents are generated
4. steps - the number of de-noising steps taken to generate the final latents
5. dim - the dimension of the image; for simplicity we currently generate square images, so only one value is needed
6. save_int - optional boolean flag; if set, intermediate latent images are saved, which helps with visualization
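
In the code above, guidance is applied as pred = u + g*(t - u), i.e. the unconditional prediction pushed toward the text-conditioned one. A small sketch of how you might explore the effect of g with the prompt_2_img function defined above:

# Same prompt and seed, three different guidance scales
for g in (3, 7.5, 12):
    img = prompt_2_img(["A dog wearing a hat"], g=g, seed=100, steps=50)[0]
    display(img)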

You can also compare these with the corresponding settings in the Stable Diffusion web UI:

[Screenshot: the corresponding settings in the Stable Diffusion web UI]

Visualizing the whole process:

[Image: visualization of the intermediate denoising steps]

References

https://towardsdatascience.com/stable-diffusion-using-hugging-face-501d8dbdd8

https://huggingface.co/blog/stable_diffusion
