のんくり日和 (Nonkuri Biyori): Examples from the image-generation AI "Stable Diffusion", shown with their spells (prompts)

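I tried the image-generation AI "Stable Diffusion", which has been in the news since it was recently released as open source, so I'm posting the results here together with their spells (prompts).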

Put simply, Stable Diffusion is an AI that generates an image from a piece of text you type in.
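Because it is open source, you can also install it on your own computer. However, you need suitable hardware (without a GPU, generation takes a very long time), so I first tried DreamStudio, the developer's web service, and it turned out to be a lot of fun.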
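While looking for a way to use it from my own PC, I found that it runs on Google Colab, so I tried that. Google Colab is a development environment that Google provides for machine-learning education and research. It runs your code on Google's cloud servers and can be used for free (paying gets you more powerful GPUs and more memory). If you use it too heavily, though, you hit usage limits.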

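For the setup steps I followed 【イラストAI】コピペ可！Stable Diffusionの簡単な使い方【コードつき】 ("[Illustration AI] Copy-paste ready! An easy guide to using Stable Diffusion [with code]") on おっさんゲーマーどっとねっと. If you know how to use Google Colab and the very basics of Python, it should take about 15 minutes.
Note that the pretrained model I load differs from the one on that reference site.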

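import torch
from diffusers import StableDiffusionPipeline

# Load Stable Diffusion v1.4 in full precision (float32).
# use_auth_token=True uses the Hugging Face access token you are logged in with.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
pipe = pipe.to("cuda")  # run the model on the Colab GPU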

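The reference site uses the line below, but I use the one above instead. Loading the half-precision (float16) model as below is recommended when the GPU has less than 10GB of memory, but on Google Colab the regular model can be used, so the version above worked fine.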

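# Half-precision (float16) weights. Newer diffusers releases write revision="fp16" as variant="fp16".
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16, use_auth_token=True)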
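Why half precision saves memory: each float32 parameter takes 4 bytes and each float16 parameter takes 2, so the weights need half the space. A rough sketch of the arithmetic, using the roughly 860 million parameters of the Stable Diffusion v1 UNet as an illustrative figure (the text encoder and VAE add more, and the activations computed during generation take memory on top of the weights). The function name is just for illustration:

```python
# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Memory taken by the model weights alone (activations are not counted)."""
    return num_params * BYTES_PER_PARAM[dtype]


# Illustrative figure: the Stable Diffusion v1 UNet has roughly 860 million parameters.
UNET_PARAMS = 860_000_000

for dtype in ("float32", "float16"):
    gb = weight_bytes(UNET_PARAMS, dtype) / 1e9
    print(f"{dtype}: {gb:.2f} GB")
# float32: 3.44 GB
# float16: 1.72 GB
```

Halving the weight memory is the main reason the float16 variant is the one suggested for smaller GPUs.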

For reference, I specify the options roughly as follows. The official service DreamStudio also lets you choose the sampler, but I haven't yet figured out how to do that on Google Colab.

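from torch import autocast
from IPython.display import display

prompt = "a cute cat in a hat"

# height and width set the image size in pixels (must be multiples of 8). The default is 512 for both.
#   If you make them too large, the GPU runs out of memory and generation fails.
# Below 512, image quality may drop.
# Above 512, regions of the image get repeated (the image as a whole loses coherence).
# For a non-square image, it is best to use 512 for one side and a larger value for the other.
#   Even so, extra arms and the like show up often.
# guidance_scale: how faithfully the image follows the prompt text. The default is 7.5;
#   values from about 7 to 8.5 are generally said to work well.
# num_inference_steps: more steps generally give better results but take longer. The default is 50.

with autocast("cuda"):
    # Current diffusers releases return the images in .images; early 2022 releases used ["sample"].
    image = pipe(prompt, height=768, width=512).images[0]
    display(image)
    image = pipe(prompt, height=768, width=512, guidance_scale=7, num_inference_steps=50).images[0]
    display(image)
    image = pipe(prompt, height=768, width=512, guidance_scale=7.5, num_inference_steps=100).images[0]
    display(image)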
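What guidance_scale actually does: in the diffusers Stable Diffusion pipeline, the model predicts the noise twice at each step, once with the prompt and once with an empty prompt, and combines the two with classifier-free guidance: guided = uncond + guidance_scale × (cond − uncond). A toy sketch with plain Python lists standing in for the noise tensors (the function name and numbers are made up for illustration):

```python
def classifier_free_guidance(uncond, cond, guidance_scale):
    """Push the prediction away from the unconditional one, toward the prompt.

    guided = uncond + guidance_scale * (cond - uncond), element by element.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Made-up noise predictions for two "pixels".
uncond = [0.0, 1.0]  # prediction without the prompt
cond = [1.0, 1.0]    # prediction with the prompt

for scale in (1.0, 7.5):
    print(scale, classifier_free_guidance(uncond, cond, scale))
# 1.0 [1.0, 1.0]
# 7.5 [7.5, 1.0]
```

With a scale of 1 you get the plain prompt-conditioned prediction; larger values exaggerate whatever difference the prompt makes, which is why a higher guidance_scale follows the prompt more closely, though pushing it too far tends to make images look less natural.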

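So here are my examples, each with its spell. Naturally, only the ones that came out relatively well.
Most of them have a nice atmosphere overall, but plenty to nitpick when you look closely.
By the way, DeepL Translator is handy for writing spells (prompts) in English.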

oil painting, highly detailed, an old wizard reading a book in study, dramatic lighting

The keyword "old wizard" seems to produce faces that mix features of the film versions of Dumbledore and Gandalf.

oil painting, highly detailed, a girl watching the sunset in the meadow, dramatic lighting

American-looking scenes mostly come out looking natural, perhaps because the training data contains a lot of them.

oil painting, highly detailed, children in the corn field, dramatic lighting

Like the previous one, this has an American feel.

beautiful concept art of, highly detailed, children in the corn field, dramatic lighting

Changing the keyword "oil painting" to "concept art" gives the image a cinematic feel.
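Most of the spells in this post follow one pattern: style, "highly detailed", subject, "dramatic lighting". A small helper makes it easy to swap only the style keyword, as in the oil painting vs. concept art comparison. `build_prompt` is a hypothetical name for illustration, not part of any library:

```python
def build_prompt(style: str, subject: str,
                 detail: str = "highly detailed",
                 lighting: str = "dramatic lighting") -> str:
    """Join the parts in the order used in this post: style, detail, subject, lighting."""
    return ", ".join([style, detail, subject, lighting])


subject = "children in the corn field"
for style in ("oil painting", "beautiful concept art of"):
    print(build_prompt(style, subject))
# oil painting, highly detailed, children in the corn field, dramatic lighting
# beautiful concept art of, highly detailed, children in the corn field, dramatic lighting
```

Each string can then be passed to `pipe(...)` as the prompt.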

beautiful concept art of, highly detailed, Asian girl standing on a street corner in a big city, dramatic lighting

It has a concept-art atmosphere.

an old wizard in study, Ukiyoe style

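I was aiming for Headmaster Dumbledore in ukiyo-e style, but could only get Chinese-looking old men. Maybe a Chinese xian (仙人, immortal sage) gets translated into English as "old wizard"?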

beautiful concept art, highly detailed, a witch with cat's face, moonlit night, dramatic lighting

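I was aiming for a witch with a cat's face, but it rarely came out as intended; after several tries, about three succeeded. For some reason a companion cat gets generated too, even though I didn't ask for one.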

illustration, highly detailed, a witch with cat's face, moonlit night, dramatic lighting

Changing "concept art" to "illustration" produces more monochrome-looking images.

color illustration, highly detailed, a witch with cat's face, moonlit night, dramatic lighting

Hoping for more color, I tried "color illustration", but for some reason more of the results were mediocre. Here is the best of them.

Ukiyoe style, highly detailed, a witch with cat's face, moonlit night, dramatic lighting

No matter how many times I tried, the ukiyo-e version of the cat-faced witch didn't work out. This was the best one.

Hetauma style, highly detailed, a witch with cat's face, moonlit night, dramatic lighting

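I was aiming for Hetauma style (ヘタウマ, art that is deliberately clumsy yet appealing), but for some reason got something artistic instead. I guess Hetauma is art after all, not just clumsy.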

beautiful concept art, highly detailed, an old lady crying at a rainy forest, dramatic lighting

I tried to generate an old woman crying in a rainy forest, and as a bonus it also generated an old woman embedded in a tree, crying...

beautiful concept art, highly detailed, a cat in a deerstalker hat, dramatic lighting

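I wanted a cat wearing a deerstalker hat (famous as Sherlock Holmes's hat, although apparently it never appears in the original stories), but no matter how many times I tried, not a single hat came out as a deerstalker.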

beautiful concept art, highly detailed, A beautiful silver-haired girl standing alone on a moonlit plain, dramatic lighting

A composition that looks cool, somehow.

beautiful concept art, highly detailed, A beautiful silver-haired girl standing alone on a moonlit plain, with a cat, dramatic lighting

A loose image like this makes it easier to hide the flaws. The left hand and so on are odd, though.

beautiful concept art, highly detailed, a silver-haired old lady with a cat, moonlit night, dramatic lighting

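This one is a landscape image, 512 tall by 768 wide. As mentioned earlier, when a side exceeds 512, regions of the image get repeated (the image as a whole loses coherence), so failures increase: extra hands, extra faces, overlapping figures. That makes a success all the more satisfying.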
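Why multiples of 8, and why 512 matters: Stable Diffusion v1 does not denoise the image itself but a latent that is 8 times smaller on each side with 4 channels, and v1.4 was trained on 512×512 images, that is, 64×64 latents. A small sketch that computes the latent size and applies the rules of thumb from the code comments earlier. The helper names are hypothetical, and the warnings are this post's heuristics, not checks that diffusers performs:

```python
def latent_shape(height: int, width: int) -> tuple:
    """(channels, height, width) of the latent the UNet works on for a given image size."""
    if height % 8 or width % 8:
        raise ValueError("height and width must be multiples of 8")
    # The VAE shrinks each side by a factor of 8 and uses 4 latent channels.
    return (4, height // 8, width // 8)


def size_warnings(height: int, width: int) -> list:
    """Rules of thumb from this post (not enforced by the library)."""
    warnings = []
    if min(height, width) < 512:
        warnings.append("a side below 512 may lower image quality")
    if max(height, width) > 512:
        warnings.append("a side above 512 may repeat regions (extra hands, faces)")
    return warnings


for h, w in ((512, 512), (512, 768), (768, 512)):
    print((h, w), latent_shape(h, w), size_warnings(h, w))
```

A 512×768 image therefore means a 64×96 latent: one side stays at the trained 64, and only the other is stretched, which is the idea behind keeping one side at 512.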

beautiful concept art, highly detailed, a witch reading a book in study, dramatic lighting

This one is a portrait image, 768 tall by 512 wide.

oil painting, highly detailed, King of cats, dramatic lighting

beautiful concept art, highly detailed, an old wizard sitting in study, with a cat, dramatic lighting

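I had in mind an old wizard relaxing in his study with a cat, but for some reason it generates a cat-faced old wizard relaxing in his study with a cat. In the image below there seem to be too many left hands, but they are probably feet. He is a cat, after all. Or it's ninin-baori (a comic act in which a second person hidden under the first one's coat supplies the arms).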

Saturday, August 27, 2022
