
A Homemade, Open-Source “Spell” Tool for Midjourney and Stable Diffusion Image Generation

Posted 2 years ago · Author: 畢設小程序軟件程序猿 · Category: Toy Blog


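This post walks through how to use Docker and about eighty lines of Python to build a prompt tool similar to Describe, Midjourney's official image-to-prompt feature.

With it, you no longer have to scratch your head over writing prompts when playing with models like Midjourney and Stable Diffusion.

Before We Begin

This post provides two versions of the tool, one for CPU inference and one for GPU inference. If you have a graphics card with more than 8GB of VRAM, you can happily use every feature; if you only have a CPU, you can still use the CPU version to save yourself some effort.

The code for this post is on GitHub at soulteary/docker-prompt-generator[1]; help yourself.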

[Figure: The open-source code for this article]

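Last night, while playing with Midjourney, I was scratching my head trying to come up with a prompt. Being lazy, I had an idea: could a model generate the prompt for me? I would type in a few keywords or a sentence and let the program fill in the complete prompt (in plain terms: text-to-text). So I started a new side project, created the open-source repository above, did a quick feasibility check, and went to catch up on sleep.

When I woke up, a colleague who shares the same hobby had forwarded an article: Midjourney had released a new feature, “describe”, which parses an image into several different prompt texts and lets you go on to generate images from them (in plain terms: image-to-text, then text-to-image).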

[Figure: Midjourney's official image-to-text feature: describe]

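Compared with the little thing I tinkered with last night, this feature is clearly the more productive one (and, for a lazy person, a great experience).

Unfortunately, after looking around online, I found that the official feature is not open source. So I decided to implement one myself.

Using the “Image Spell Generator”

To get started with the tool quickly, we first need to set up the environment.

Preparing the Application and Docker Environment

In several earlier posts[2], I described the development environment I personally use and recommend: a deep learning environment based on Docker and Nvidia's official base containers. I won't repeat the details here; if you're interested, see for example “Docker-Based Deep Learning Environment: Getting Started”[3]. Returning readers should already be familiar with it.

Since this post also includes a part that runs on CPU only, you can alternatively follow “Playing with Stable Diffusion Models on MacBook Devices with M1 and M2 Chips”[4] from a few months ago to set up your environment.

Once the Docker environment is ready, we can move on.

Pick any suitable directory and download the code of the “Docker Prompt Generator” project, either with git clone or as a Zip archive.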

git clone https://github.com/soulteary/docker-prompt-generator.git
# or
curl -sL -o docker-prompt-generator.zip https://github.com/soulteary/docker-prompt-generator/archive/refs/heads/main.zip

Next, enter the project directory and build the base environment on top of Nvidia's official PyTorch Docker base image. Compared with pulling a prebuilt image from DockerHub, building it yourself can save a lot of time.

Run the following commands in the project directory to build the model application images:

# Build the base image
docker build -t soulteary/prompt-generator:base . -f docker/Dockerfile.base

# Build the CPU application
docker build -t soulteary/prompt-generator:cpu . -f docker/Dockerfile.cpu

# Build the GPU application
docker build -t soulteary/prompt-generator:gpu . -f docker/Dockerfile.gpu

Then, depending on your hardware, run one of the commands below to start the model application with its Web UI.

# Run the CPU image (no --gpus flag, so it also starts on hosts without an Nvidia GPU)
docker run --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -p 7860:7860 soulteary/prompt-generator:cpu

# Run the GPU image
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -p 7860:7860 soulteary/prompt-generator:gpu

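Open the IP address of the host running the container in a browser, on port 7860 as mapped by the docker run command (for example http://127.0.0.1:7860 when running locally), and you can start using the tool.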
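
If the page does not load right away, the models may still be loading. Here is a minimal sketch for checking whether the Web UI port accepts connections, using only the Python standard library. The helper name wait_for_port is an assumption for illustration and is not part of the project; 7860 is the port mapped by the docker run commands.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port("127.0.0.1", 7860) on the machine that runs the container would then block until the Gradio server is listening, or give up after a minute.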

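Using the Tool

The tool is simple to use and has two modes: generating a description from an image, and generating a description from text.

I took an image previously generated by a model, fed it to the program, and clicked the button to get a text description of the image.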

[Figure: Parsing an image into a text description]

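We can use this text directly in Midjourney or Stable Diffusion to generate more images, or pass it to the “generate from text” tab to expand it so that it better suits applications like Midjourney.

To show off the tool's Chinese translation and continuation abilities, let's write a short description in Chinese: “一只小鳥立梢頭，一輪明月當空照，一片黃葉鋪枝頭” (a little bird perched on a treetop, a bright moon hanging in the sky, a yellow leaf resting on the branch).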

[Figure: Generating an image “spell” (description) from Chinese input]

As you can see, the tool generated quite a lot of different texts from our input.

To check whether the text matches the original meaning, we can paste it into Midjourney and test it.

[Figure: Images generated from the two texts above]

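Because the models are stochastic, getting better results still takes further tuning of the description. Even so, the description the tool produced from the image appears to work out of the box, and the text generated from our few words also yielded images that match the request.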

[Figure: A relatively good result from this experiment]

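That covers the basic usage of the tool.

Implementing the Model Application

Below are the implementation process and the thinking behind the tool. If you want to learn how to build your own AI container application on top of open-source model projects, read on.

Application Feature Design

Before getting our hands dirty, we need to define the features and consider which technologies will support each of them.

In my day-to-day use of Stable Diffusion and Midjourney, three scenarios often leave me scratching my head: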

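1. I only have a few keywords and have to use my imagination to string them together before feeding them to the model. If the description is not good enough, or the keywords are only loosely related, the generated image will not be great.
2. I have an image and want the model to create something new around what it contains, such as its composition, certain elements, or its mood, rather than simply swapping elements in the image.
3. I am more comfortable describing things in Chinese than in English, but for now good results require English prompts. Constantly switching to a translation tool in another window or web page is a hassle.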

For the first problem, GPT-2, the predecessor of the predecessor of the recently famous GPT-4, is already enough: it can quickly continue a piece of content (a sentence or a few keywords). Compared with GPT-3 / GPT-4, it needs no network connection and no payment, its model files are small, and it runs on a CPU.

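For the second problem, we can use the CLIP neural network model[5] that OpenAI released in 2021, together with BLIP[6] from Salesforce. Together they extract a fitting text description from an image, which we can then use in new AIGC image generation tasks. With a little tuning, about 6 to 8GB of VRAM is enough to run the models for this feature.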

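For the third problem, we can use the OPUS MT models[7] open-sourced by the University of Helsinki to translate Chinese into English. This saves further effort and also works around the fact that the two kinds of models above do not accept Chinese input.

Since the models for the first two scenarios do not support Chinese, and I am too lazy to type English prompts, let's solve the third problem first so that the rest of the implementation goes more smoothly.

Translating Chinese Prompts into English Prompts

The first lazy feature is to automatically produce English from the Chinese text the user enters, which requires a Chinese-to-English translation model. The University of Helsinki's NLP group has published a pretrained model on the HuggingFace hub: Helsinki-NLP/opus-mt-zh-en[8].

About fifteen lines of simple Python are enough to download the model files and translate Chinese into suitable English. In the example below, the program prints a translation of a famous line from Naruto: “青春不能回頭，所以青春沒有終點” (youth cannot turn back, so youth has no end).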

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-zh-en").eval()
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-zh-en")

def translate(text):
    with torch.no_grad():
        encoded = tokenizer([text], return_tensors="pt")
        sequences = model.generate(**encoded)
        return tokenizer.batch_decode(sequences, skip_special_tokens=True)[0]

# Named `text` rather than `input` so the built-in input() is not shadowed
text = "青春不能回頭，所以青春沒有終點。 ——《火影忍者》"
print(text, translate(text))

Save the code above as translate.py and run python translate.py. Once the model has downloaded, you should see something like this:

青春不能回頭，所以青春沒有終點。 Youth can't turn back, so there's no end to youth.

Not bad, right? This code is stored in the project at soulteary/docker-prompt-generator/app/translate.py[9].
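
Since the later stages only understand English, one could skip the translation step when the input is already English. Below is a small sketch, assuming that a check against the CJK Unified Ideographs range (U+4E00 to U+9FFF) is good enough. contains_chinese and to_english are hypothetical helpers that are not part of the project, and translate_fn stands for a translation function such as the translate function in translate.py.

```python
def contains_chinese(text):
    """Return True if any character falls in the CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def to_english(text, translate_fn):
    """Translate only when the text contains Chinese characters (assumed heuristic)."""
    if contains_chinese(text):
        return translate_fn(text)
    return text


if __name__ == "__main__":
    # A stand-in translator, so the sketch runs without downloading a model
    fake_translate = lambda s: "<translated>"
    print(to_english("a bird on a branch", fake_translate))
    print(to_english("一只小鳥立梢頭", fake_translate))
```

English input passes through untouched, which saves one model call per request.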

Next, let's implement the prompt “free refill” feature, that is, coherent continuation of a prompt.

Implementing MidJourney Prompt Continuation

Continuing from existing content is what generative models do best; the GPT model family behind ChatGPT, which everyone knows all too well by now, is an example.

Being lazy, I searched around online and found a GPT-2 model that a developer abroad, who left Google to found a startup, fine-tuned on 250,000 pieces of MidJourney data: succinctly/text2image-prompt-generator[10]. A quick try gave decent results, so let's use it for this feature. (The LLaMA models from my previous posts would also work; feel free to swap one in.)

As before, a simple program of fewer than 30 lines downloads the model automatically and uses it to generate new prompts suitable for Midjourney or Stable Diffusion from our input (the translation of the line quoted above):

from transformers import pipeline, set_seed
import random
import re

text_pipe = pipeline('text-generation', model='succinctly/text2image-prompt-generator')

def text_generate(input):
    seed = random.randint(100, 1000000)
    set_seed(seed)

    for count in range(6):    
        sequences = text_pipe(input, max_length=random.randint(60, 90), num_return_sequences=8)
        list = []
        for sequence in sequences:
            line = sequence['generated_text'].strip()
            if line != input and len(line) > (len(input) + 4) and line.endswith((":", "-", "—")) is False:
                list.append(line)

        result = "\n".join(list)
        result = re.sub(r'[^ ]+\.[^ ]+', '', result)  # raw string avoids an invalid-escape warning
        result = result.replace("<", "").replace(">", "")
        if result != "":
            return result
        if count == 5:
            return result

input = "Youth can't turn back, so there's no end to youth."
print(input, text_generate(input))

Save the code above as text-generation.py and run python text-generation.py. After a short wait you should see something like this:

# Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Youth can't turn back, so there's no end to youth. Youth can't turn back, so there's no end to youth. Young, handsome, confident, lonely boy sitting on his  can't turn back, so there's no end to youth. Where old yang waits, young man on the streets of Bangkok::10 film poster::10 photorealism, postprocessing, low angle::10 Trending on artstation::8 —ar 47:82
Youth can't turn back, so there's no end to youth. By Karel Thole and Mike Mignola --ar 2:3
Youth can't turn back, so there's no end to youth. And there is a bright hope about a future where there will be time.

The output looks decent. Entering it directly into Midjourney gives results like the following.

[Figure: Images generated in Midjourney from our generated prompts]

That counts as a pass. This code is stored in the project at soulteary/docker-prompt-generator/app/text-generation.py[11]; take it if you need it.
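
The filtering and clean-up inside text_generate can also be pulled out into a pure function, which makes it easy to try without loading the model. The sketch below is a variation, not the project's code. clean_candidates is a hypothetical name, and it applies the regular expression line by line, because [^ ] also matches a newline and the pattern can otherwise swallow a line break when it runs on the joined text.

```python
import re

# Tokens that contain a dot between non-space characters (URLs, file names)
DOTTED_TOKEN = re.compile(r"[^ ]+\.[^ ]+")


def clean_candidates(prompt, candidates):
    """Filter and tidy generated lines, mirroring the rules used in text_generate."""
    kept = []
    for candidate in candidates:
        line = candidate.strip()
        if line == prompt:
            continue  # the model only echoed the input
        if len(line) <= len(prompt) + 4:
            continue  # barely longer than the input
        if line.endswith((":", "-", "\u2014")):
            continue  # ends with ':', '-' or an em dash, so it looks cut off
        line = DOTTED_TOKEN.sub("", line)
        line = line.replace("<", "").replace(">", "")
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    samples = [
        "a cat",
        "a cat on a mat, see example.com for more",
        "a cat sitting on a red sofa:",
        "a cat <lora> in watercolor style.",
    ]
    print(clean_candidates("a cat", samples))
```

Keeping this step free of model calls also makes it simple to adjust the rules, for example to drop a different set of trailing characters.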

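With two features done, let's implement generating a prompt description from an image.

Implementing Prompt Generation from Images

The two features above run fine on a CPU, and generation is fast.

Generating a prompt from an image quickly, however, requires a graphics card. In my tests it needs only about 6 to 8GB of VRAM, so it is still fairly economical. (Without a graphics card, you can rent a pay-as-you-go cloud server and destroy it when you are done.)

Here again we write a short piece of Python, fewer than 30 lines, that downloads the models, loads the application, downloads an image, and converts the image into a prompt: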

from clip_interrogator import Config, Interrogator
import torch
config = Config()
config.device = 'cuda' if torch.cuda.is_available() else 'cpu'
config.blip_offload = False if torch.cuda.is_available() else True
config.chunk_size = 2048
config.flavor_intermediate_count = 512
config.blip_num_beams = 64
config.clip_model_name = "ViT-H-14/laion2b_s32b_b79k"
ci = Interrogator(config)

def get_prompt_from_image(image):
    return ci.interrogate(image.convert('RGB'))

import requests
import shutil
r = requests.get("https://pic1.zhimg.com/v2-6e056c49362bff9af1eb39ce530ac0c6_1440w.jpg?source=d16d100b", stream=True)
if r.status_code == 200:
    with open('./image.jpg', 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f) 

from PIL import Image
print(get_prompt_from_image(Image.open('./image.jpg')))

The image in the code is the cover image of the previous post in my column (also generated with Midjourney). Save the code above as clip.py and run python clip.py. After a short wait you should see something like this:

# WARNING:root:Pytorch pre-release version 1.14.0a0+410ce96 - assuming intent to test it
Loading BLIP model...
load checkpoint from https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_caption.pth
Loading CLIP model...
Loaded CLIP model and data in 8.29 seconds.
100%|██████████| 55/55 [00:00<00:00, 316.23it/s]
Flavor chain:  38%|███▊      | 12/32 [00:04<00:07,  2.74it/s]
100%|██████████| 55/55 [00:00<00:00, 441.49it/s]
100%|██████████| 6/6 [00:00<00:00, 346.74it/s]
100%|██████████| 50/50 [00:00<00:00, 457.84it/s]

a robot with a speech bubble on a blue background, highly detailed hyper real retro, artificial intelligence!!, toy photography, by Emma Andijewska, markings on robot, computer generated, blueish, delete, small gadget, animated, blue body, in retro colors

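Judging from the result, the description is fairly accurate. I saved this code in the project at soulteary/docker-midjourney-prompt-generator/app/clip.py[12].

So far, all three main features are implemented. Next, we use Docker and Gradio to build a Web UI and a model container application that runs with a single command.

Building the AI Application Container with Docker

Next, let's build the container for the AI application and write the related code.

As mentioned earlier, we will implement two versions of the application, supporting fast model inference on CPU and on GPU respectively. Because the latter is backward compatible with the former, we start with a base image that contains the first two features and runs on CPU alone.

Building the Application Image That Only Needs a CPU

Given the code above, the Dockerfile is not hard to write: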

FROM nvcr.io/nvidia/pytorch:22.12-py3
LABEL org.opencontainers.image.authors="soulteary@gmail.com"

RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && \
    pip install transformers sentencepiece sacremoses && \
    pip install gradio

WORKDIR /app

RUN cat > /get-models.py <<EOF
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
AutoModelForSeq2SeqLM.from_pretrained('Helsinki-NLP/opus-mt-zh-en')
AutoTokenizer.from_pretrained('Helsinki-NLP/opus-mt-zh-en')
pipeline('text-generation', model='succinctly/text2image-prompt-generator')
EOF

RUN python /get-models.py && \
    rm -rf /get-models.py

Save the content above as Dockerfile.base and run docker build -t soulteary/prompt-generator:base . -f Dockerfile.base. After a short wait, the base image containing the model files is ready.

[+] Building 189.5s (7/8)                                                                                                                                                                             
 => [internal] load .dockerignore                                                                                                                                                                0.0s
 => => transferring context: 2B                                                                                                                                                                  0.0s
 => [internal] load build definition from Dockerfile.base                                                                                                                                        0.0s
 => => transferring dockerfile: 692B                                                                                                                                                             0.0s
 => [internal] load metadata for nvcr.io/nvidia/pytorch:22.12-py3                                                                                                                                0.0s
 => [1/5] FROM nvcr.io/nvidia/pytorch:22.12-py3                                                                                                                                                  0.0s
 => CACHED [2/5] RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple &&     pip install transformers sentencepiece sacremoses &&     pip install gradio                 0.0s
 => CACHED [3/5] WORKDIR /app                                                                                                                                                                    0.0s
 => CACHED [4/5] RUN cat > /get-models.py <<EOF                                                                                                                                                  0.0s
 => [5/5] RUN python /get-models.py &&     rm -rf /get-models.py                                                                                                                               189.4s
 => => # Downloading (…)olve/main/source.spm: 100%|██████████| 805k/805k [00:06<00:00, 130kB/s]                                                                                                      
 => => # Downloading (…)olve/main/target.spm: 100%|██████████| 807k/807k [00:01<00:00, 440kB/s]                                                                                                      
 => => # Downloading (…)olve/main/vocab.json: 100%|██████████| 1.62M/1.62M [00:01<00:00, 1.21MB/s]                                                                                                   
 => => # Downloading (…)lve/main/config.json: 100%|██████████| 907/907 [00:00<00:00, 499kB/s]                                                                                                        
 => => # Downloading pytorch_model.bin: 100%|██████████| 665M/665M [00:11<00:00, 57.2MB/s]                                                                                                           
 => => # Downloading (…)okenizer_config.json: 100%|██████████| 255/255 [00:00<00:00, 81.9kB/s]

On my machine the build took about 5 minutes, so feel free to get up from your chair, stretch, and listen to a song.

Once the image is built, use the command below to enter a container that includes the models and the PyTorch environment. Inside it we can freely use the models for the first two features:

docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -p 7860:7860 soulteary/prompt-generator:base bash

With the environment in place, let's implement a simple Web UI for the lazy feature described above, where the model generates a prompt capable of producing high-quality images from the Chinese text we enter:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

model = AutoModelForSeq2SeqLM.from_pretrained('Helsinki-NLP/opus-mt-zh-en').eval()
tokenizer = AutoTokenizer.from_pretrained('Helsinki-NLP/opus-mt-zh-en')

def translate(text):
    with torch.no_grad():
        encoded = tokenizer([text], return_tensors='pt')
        sequences = model.generate(**encoded)
        return tokenizer.batch_decode(sequences, skip_special_tokens=True)[0]

from transformers import pipeline, set_seed
import random
import re

text_pipe = pipeline('text-generation', model='succinctly/text2image-prompt-generator')

def text_generate(input):
    seed = random.randint(100, 1000000)
    set_seed(seed)
    text_in_english = translate(input)
    for count in range(6):    
        sequences = text_pipe(text_in_english, max_length=random.randint(60, 90), num_return_sequences=8)
        list = []
        for sequence in sequences:
            line = sequence['generated_text'].strip()
            if line != text_in_english and len(line) > (len(text_in_english) + 4) and line.endswith((':', '-', '—')) is False:
                list.append(line)

        result = "\n".join(list)
        result = re.sub(r'[^ ]+\.[^ ]+', '', result)  # raw string avoids an invalid-escape warning
        result = result.replace('<', '').replace('>', '')
        if result != '':
            return result
        if count == 5:
            return result

import gradio as gr

with gr.Blocks() as block:
    with gr.Column():
        with gr.Tab('文本生成'):
            input = gr.Textbox(lines=6, label='你的想法', placeholder='在此輸入內容...')
            output = gr.Textbox(lines=6, label='生成的 Prompt')
            submit_btn = gr.Button('快給我編')

    submit_btn.click(
        fn=text_generate,
        inputs=input,
        outputs=output
    )

# queue() already enables queuing; the deprecated enable_queue launch argument is dropped
block.queue(max_size=64).launch(show_api=False, debug=True, share=False, server_name='0.0.0.0')

Create a file named webui.cpu.py inside the container and run python webui.cpu.py. You should see log output like this:

Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.

Then open the IP address of the machine running the container in a browser (if it is running locally, visit http://127.0.0.1:7860) to reach the web service.

[Figure: Type anything and it will keep making things up from there]

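Type something into the input box above and click the “快給我編” (roughly, “make something up for me”) button to get a batch of prompts made up by the model.

With “text-to-text” done, let's implement the “image-to-text” feature.

Building the Application Image That Needs a GPU

Following on from the above, the container environment for the GPU features is not hard either: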

FROM soulteary/prompt-generator:base
LABEL org.opencontainers.image.authors="soulteary@gmail.com"

RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && \
    pip install clip_interrogator git+https://github.com/pharmapsychotic/BLIP.git@lib#egg=blip

RUN cat > /get-models.py <<EOF
from clip_interrogator import Config, Interrogator
import torch
config = Config()
config.device = 'cuda' if torch.cuda.is_available() else 'cpu'
config.blip_offload = False if torch.cuda.is_available() else True
config.chunk_size = 2048
config.flavor_intermediate_count = 512
config.blip_num_beams = 64
config.clip_model_name = "ViT-H-14/laion2b_s32b_b79k"
ci = Interrogator(config)
EOF

RUN python /get-models.py && \
    rm -rf /get-models.py

Save the content above as Dockerfile.gpu and run docker build -t soulteary/prompt-generator:gpu . -f Dockerfile.gpu to build the image.

Wait patiently for the build to finish, then use the command below to enter a container that includes all three models and the PyTorch environment:

docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -p 7860:7860 soulteary/prompt-generator:gpu bash

Next, write the Python program that calls all three models:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

model = AutoModelForSeq2SeqLM.from_pretrained('Helsinki-NLP/opus-mt-zh-en').eval()
tokenizer = AutoTokenizer.from_pretrained('Helsinki-NLP/opus-mt-zh-en')

def translate(text):
    with torch.no_grad():
        encoded = tokenizer([text], return_tensors='pt')
        sequences = model.generate(**encoded)
        return tokenizer.batch_decode(sequences, skip_special_tokens=True)[0]

from transformers import pipeline, set_seed
import random
import re

text_pipe = pipeline('text-generation', model='succinctly/text2image-prompt-generator')

def text_generate(input):
    seed = random.randint(100, 1000000)
    set_seed(seed)
    text_in_english = translate(input)
    for count in range(6):    
        sequences = text_pipe(text_in_english, max_length=random.randint(60, 90), num_return_sequences=8)
        list = []
        for sequence in sequences:
            line = sequence['generated_text'].strip()
            if line != text_in_english and len(line) > (len(text_in_english) + 4) and line.endswith((':', '-', '—')) is False:
                list.append(line)

        result = "\n".join(list)
        result = re.sub(r'[^ ]+\.[^ ]+', '', result)  # raw string avoids an invalid-escape warning
        result = result.replace('<', '').replace('>', '')
        if result != '':
            return result
        if count == 5:
            return result

from clip_interrogator import Config, Interrogator
import torch
import gradio as gr

config = Config()
config.device = 'cuda' if torch.cuda.is_available() else 'cpu'
config.blip_offload = False if torch.cuda.is_available() else True
config.chunk_size = 2048
config.flavor_intermediate_count = 512
config.blip_num_beams = 64
config.clip_model_name = "ViT-H-14/laion2b_s32b_b79k"

ci = Interrogator(config)

def get_prompt_from_image(image, mode):
    image = image.convert('RGB')
    if mode == 'best':
        prompt = ci.interrogate(image)
    elif mode == 'classic':
        prompt = ci.interrogate_classic(image)
    elif mode == 'fast':
        prompt = ci.interrogate_fast(image)
    elif mode == 'negative':
        prompt = ci.interrogate_negative(image)
    return prompt

with gr.Blocks() as block:
    with gr.Column():
        gr.HTML('<h1>MidJourney / SD2 懶人工具</h1>')
        with gr.Tab('從圖片中生成'):
            with gr.Row():
                input_image = gr.Image(type='pil')
                with gr.Column():
                    input_mode = gr.Radio(['best', 'fast', 'classic', 'negative'], value='best', label='Mode')
            img_btn = gr.Button('這圖里有啥')
            output_image = gr.Textbox(lines=6, label='生成的 Prompt')

        with gr.Tab('從文本中生成'):
            input_text = gr.Textbox(lines=6, label='你的想法', placeholder='在此輸入內容...')
            output_text = gr.Textbox(lines=6, label='生成的 Prompt')
            text_btn = gr.Button('快給我編')

    img_btn.click(fn=get_prompt_from_image, inputs=[input_image, input_mode], outputs=output_image)
    text_btn.click(fn=text_generate, inputs=input_text, outputs=output_text)

# queue() already enables queuing; the deprecated enable_queue launch argument is dropped
block.queue(max_size=64).launch(show_api=False, debug=True, share=False, server_name='0.0.0.0')

Save the program above as webui.gpu.py and run python webui.gpu.py. You should see logs like this:

██████████| 44.0/44.0 [00:00<00:00, 31.5kB/s]
Downloading: 100%|██████████| 786k/786k [00:01<00:00, 772kB/s]
Downloading: 100%|██████████| 788k/788k [00:00<00:00, 863kB/s]
Downloading: 100%|██████████| 1.54M/1.54M [00:01<00:00, 1.29MB/s]
Downloading: 100%|██████████| 907/907 [00:00<00:00, 618kB/s]
Downloading: 100%|██████████| 634M/634M [00:27<00:00, 23.8MB/s]
Downloading: 100%|██████████| 255/255 [00:00<00:00, 172kB/s]
Downloading: 100%|██████████| 779k/779k [00:01<00:00, 757kB/s]
Downloading: 100%|██████████| 446k/446k [00:00<00:00, 556kB/s]
Downloading: 100%|██████████| 2.01M/2.01M [00:01<00:00, 1.60MB/s]
Downloading: 100%|██████████| 99.0/99.0 [00:00<00:00, 69.2kB/s]
I0405 12:50:42.798199 140240289830720 instantiator.py:21] Created a temporary directory at /tmp/tmpuvpi8s9q
I0405 12:50:42.798363 140240289830720 instantiator.py:76] Writing /tmp/tmpuvpi8s9q/_remote_module_non_scriptable.py
W0405 12:50:42.878760 140240289830720 version.py:27] Pytorch pre-release version 1.14.0a0+410ce96 - assuming intent to test it
I0405 12:50:43.373221 140240289830720 font_manager.py:1633] generated new fontManager
Loading BLIP model...
load checkpoint from https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_caption.pth
Loading CLIP model...
I0405 12:51:00.455630 140240289830720 factory.py:158] Loaded ViT-H-14 model config.
I0405 12:51:06.642275 140240289830720 factory.py:206] Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
Loaded CLIP model and data in 8.22 seconds.
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.

Once the log shows Running on local URL: http://0.0.0.0:7860, we can open the application in a browser.
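
One detail in get_prompt_from_image is worth noting: if an unexpected mode string arrives, none of the branches assigns prompt. Below is a minimal sketch of a table-driven variant that raises a clear error instead. StubInterrogator is a stand-in so the example runs without a GPU or a model download, and only the four method names are taken from the program above; everything else is an assumption for illustration.

```python
class StubInterrogator:
    """Stand-in for clip_interrogator.Interrogator (hypothetical, for this sketch only)."""

    def interrogate(self, image):
        return "best prompt"

    def interrogate_classic(self, image):
        return "classic prompt"

    def interrogate_fast(self, image):
        return "fast prompt"

    def interrogate_negative(self, image):
        return "negative prompt"


# Maps the Radio choices in the UI to the interrogator method names
MODE_TO_METHOD = {
    "best": "interrogate",
    "classic": "interrogate_classic",
    "fast": "interrogate_fast",
    "negative": "interrogate_negative",
}


def get_prompt_from_image(ci, image, mode="best"):
    """Dispatch to the interrogator method for `mode`, failing loudly on unknown modes."""
    method_name = MODE_TO_METHOD.get(mode)
    if method_name is None:
        raise ValueError(f"unknown mode: {mode!r}")
    return getattr(ci, method_name)(image)


if __name__ == "__main__":
    print(get_prompt_from_image(StubInterrogator(), image=None, mode="fast"))
```

In the real application the image would still need the convert('RGB') call before it is passed in, and ci would be the Interrogator instance created from the config.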

[Figure: Feeding it the image from earlier in the post]

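Feed it the image from earlier in the post and click the “這圖里有啥” (“what's in this image”) button. After a short wait you get some fairly reasonable prompts that you can use to generate images.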

[Figure: Feeding it text to expand the content]

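Of course, you can also feed the generated text back into it to get more prompts and make the images more varied.

Other: VRAM Usage

While the model was recognizing an image, I took a quick look at the application's VRAM usage; the peak was around 8GB.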

Wed Apr  5 21:00:09 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  Off |
| 31%   35C    P8    23W / 450W |   8111MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1286      G   /usr/lib/xorg/Xorg                  9MiB |
|    0   N/A  N/A      1504      G   /usr/bin/gnome-shell               10MiB |
|    0   N/A  N/A    115252      C   python                           8086MiB |
+-----------------------------------------------------------------------------+
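
To log this number instead of reading it off the table, the Memory-Usage column can be parsed with a regular expression. Below is a rough sketch, assuming the column keeps the "used MiB / total MiB" layout shown above. parse_memory_usage is a hypothetical helper, and the sample line is copied from the output above.

```python
import re

# Matches the "8111MiB / 24564MiB" pair in the Memory-Usage column
MEMORY_PATTERN = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def parse_memory_usage(smi_text):
    """Return (used_mib, total_mib) from nvidia-smi text, or None if not found."""
    match = MEMORY_PATTERN.search(smi_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    sample = "| 31%   35C    P8    23W / 450W |   8111MiB / 24564MiB |      0%      Default |"
    used, total = parse_memory_usage(sample)
    print(f"{used} MiB of {total} MiB ({used / total:.0%})")
```

The power column (23W / 450W) is ignored because the pattern requires the MiB suffix on both numbers.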

Finally

That's it for this post.

References

[1] soulteary/docker-prompt-generator: https://github.com/soulteary/docker-prompt-generator
[2] Earlier posts: https://soulteary.com/tags/python.html
[3] “Docker-Based Deep Learning Environment: Getting Started”: https://soulteary.com/2023/03/22/docker-based-deep-learning-environment-getting-started.html
[4] “Playing with Stable Diffusion Models on MacBook Devices with M1 and M2 Chips”: https://soulteary.com/2022/12/10/play-the-stable-diffusion-model-on-macbook-devices-with-m1-and-m2-chips.html
[5] CLIP neural network model: https://openai.com/research/clip
[6] BLIP from Salesforce: https://blog.salesforceairesearch.com/blip-bootstrapping-language-image-pretraining/
[7] OPUS MT models open-sourced by the University of Helsinki: https://github.com/Helsinki-NLP/OPUS-MT-train
[8] Helsinki-NLP/opus-mt-zh-en: https://huggingface.co/Helsinki-NLP/opus-mt-zh-en
[9] soulteary/docker-prompt-generator/app/translate.py: https://github.com/soulteary/docker-prompt-generator/blob/main/app/translate.py
[10] succinctly/text2image-prompt-generator: https://huggingface.co/succinctly/text2image-prompt-generator
[11] soulteary/docker-prompt-generator/app/text-generation.py: https://github.com/soulteary/docker-prompt-generator/blob/main/app/text-generation.py
[12] soulteary/docker-midjourney-prompt-generator/app/clip.py: https://github.com/soulteary/docker-midjourney-prompt-generator/blob/main/app/clip.py

If you found this useful, feel free to like it and share it with your friends. Thank you.

Reposted from: https://zhuanlan.zhihu.com/p/619702740
