Hi all, today I'm trying out diffusion models (the CV world's GPT, of sorts), except they need prompts, and the prompts I gave didn't produce great results. Examples below, with code and reference links.
1、img2img
Code:
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda"
model_id_or_path = "runwayml/stable-diffusion-v1-5"

# Load the img2img pipeline in half precision and move it to the GPU.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe = pipe.to(device)

# Load the init image and resize it to a size the model handles well.
img_path = "girl.jpeg"
init_image = Image.open(img_path).convert("RGB")
init_image = init_image.resize((512, 768))
# init_image.resize((int(init_image.size[0]*0.6), int(init_image.size[1]*0.6)))

prompt = "A beautiful hair"
# strength controls how much of the init image gets repainted;
# guidance_scale controls how strongly the prompt is followed.
images = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images
images[0].save("beauty.png")
The original image and the generated image are compared below (images found online; will take down on request):
Wow. It's just an image I found online, and it came out like this. I'm speechless.
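For what it's worth, strength is the knob that decides how much of the init image survives: low values mostly preserve it, high values mostly repaint it. A minimal sweep sketch, reusing the pipe and init_image defined above (the loop values and output filenames are my own, not from the original run):

# Sweep strength to see how far the result drifts from the init image.
# Lower strength keeps more of the original; higher strength repaints more.
for strength in (0.3, 0.5, 0.75):
    result = pipe(prompt="A beautiful hair", image=init_image,
                  strength=strength, guidance_scale=7.5).images[0]
    result.save(f"beauty_strength_{strength}.png")  # illustrative filenames

With strength=0.75 and a three-word prompt, the model has a lot of freedom, which likely explains the drift from the original photo.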
2、text2img
Code:
import torch
from diffusers import StableDiffusionPipeline

# Load the text2img pipeline in half precision and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a beautiful girl with blue eyes and long legs and little dress"
# "three girl,chesty"
image = pipe(prompt).images[0]
image.save("generator.png")
The eyes are all wrong. This thing can generate demons just fine; generating a normal human is the hard part.
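One thing that makes prompt debugging less maddening is pinning the random seed, so every prompt tweak is compared against the same starting noise. A minimal sketch assuming the pipe loaded above; the seed value 42 and the filename are arbitrary:

# Fix the seed so two runs with the same prompt produce the same image,
# making prompt changes directly comparable.
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(prompt=prompt, generator=generator, guidance_scale=7.5).images[0]
image.save("generator_seeded.png")

Without a fixed generator, two runs of the same prompt give different faces, so it's hard to tell whether a prompt change helped or the dice just rolled differently.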
3、depth2img with negative prompts
Supposedly, the more prompt words the better the drawing; otherwise the model gets rather "self-directed" and paints whatever it likes.
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

# Load the depth-conditioned img2img pipeline in half precision.
pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

img_path = "seg1.jpeg"  # still the first image from the web
init_image = Image.open(img_path)

prompt = "handsome, beautiful, long hair, big eyes, white face"
n_prompt = "bad, deformed, ugly, bad anatomy"
# negative_prompt tells the model what to steer away from.
image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0]
The result is decent, except the fingers are off; that calls for more negative prompt words.
Changing the negative prompt to the following produced the right image above. Passable: not exactly pretty, but at least nothing is deformed.
n_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, bad, cartoon, ugly, deformed"
>>> init_image = Image.open(img_path)
>>> init_image = init_image.resize((int(init_image.size[0]*0.6), int(init_image.size[1]*0.6)))
>>> image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:39<00:00, 1.14s/it]
>>> image.save("seg1_d.png")
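A side note on that 0.6x resize: Stable Diffusion works in a latent space downsampled by a factor of 8, so width and height should be multiples of 8, or the pipeline's image processor may quietly adjust the size. A small helper sketch (the function name is my own):

def resize_scaled_multiple_of_8(img, scale=0.6):
    """Scale both sides, then snap each down to the nearest multiple of 8."""
    w = int(img.size[0] * scale) // 8 * 8
    h = int(img.size[1] * scale) // 8 * 8
    return img.resize((w, h))

init_image = resize_scaled_multiple_of_8(Image.open(img_path))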
So I re-ran the text2img and img2img experiments above with these negative words added (positive prompts unchanged):
For text2img (right image above), you have to spell out the facial features and rule out every kind of deformity, feet and legs included, or the results are terrifying.
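The re-run code itself isn't shown above, so here is a minimal sketch of what it looks like. negative_prompt is accepted by the text2img, img2img, and depth2img pipelines alike; pipe below means the StableDiffusionPipeline from section 2, and prompt/n_prompt reuse the values defined above (the filename is illustrative):

# text2img with the long negative prompt; the img2img call is analogous,
# just with an extra image=init_image argument.
image = pipe(prompt=prompt, negative_prompt=n_prompt).images[0]
image.save("text2img_neg.png")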
Changing the negative prompt to the following made the head disappear entirely...
n_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, bad, cartoon, ugly, deformed, bad face, bad fingers, bad leg, bad shoes, bad feet, bad arm"
The right image above is relatively normal now, but the teeth are bad. Adding "bad teeth" to the negative prompt and trying again, the image below came out truncated.
This is just terrible. If a client saw this I'd be shown the door on the spot.
4、Super-Resolution
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

# Load the x4 upscaler in half precision and move it to the GPU.
model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(
    model_id, revision="fp16", torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")

# Use the local test image: shrink it hard first, then upscale 4x.
init_image = Image.open("seg1.jpeg")
init_image = init_image.resize((int(init_image.size[0]*0.1), int(init_image.size[1]*0.1)))

upscaled_image = pipeline(prompt="a beautiful Chinese girl", image=init_image).images[0]
upscaled_image.save("upsampled_cat.png")
Downscale it, then super-resolve it: why does every step fall apart on my end?
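Part of it may just be size arithmetic: the x4 upscaler multiplies each side by 4, so shrinking to 10% first means the output is only 40% of the original resolution, reconstructed from a tiny, detail-starved input. A sketch that shrinks to exactly 1/4 so the upscale lands back at the original size (the 1/4 factor and the filename are my choices, not from the run above):

# Downscale to exactly one quarter so 4x upscaling restores the original size.
init_image = Image.open("seg1.jpeg")
low_res = init_image.resize((init_image.size[0] // 4, init_image.size[1] // 4))
upscaled = pipeline(prompt="a beautiful Chinese girl", image=low_res).images[0]
upscaled.save("seg1_x4.png")

The prompt matters here too: the upscaler conditions on it, so a prompt that actually describes the image content tends to hallucinate less.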