I tested OpenAI’s text-to-image generator to see how it would perform creating images using a variety of subjects and styles

Sample Renderings from DALL-E, Images by Author

I was very excited to get my invitation to OpenAI’s DALL-E [1] text-to-image generation beta program. Since signing up, I have created over 200 images with various text prompts from my earlier AI projects. As you can see from the sample images, the quality and resolution of the images generated by DALL-E are excellent.

I’ll start with some background info on DALL-E and then show the results compared to my early experiments. You can tell which images were created by DALL-E by checking for the small watermark in the lower-right corner of the renderings. (Note that it’s composed of muted shades of yellow, cyan, green, red, and blue. I guess magenta didn’t make the cut.)

DALL-E Watermark, Image by Author

DALL-E

OpenAI released its first version of the DALL-E text-to-image system in January 2021 [2]. The images in the paper looked great, even at 256x256 resolution, but the system was closed to the public. In April 2022, they released DALL-E 2, which can render images at 1024x1024 resolution, and they opened up access to the system via their beta program. More info is available on OpenAI’s website, where you can also join the waitlist for the beta program.

Using DALL-E

The system has three main functions for creating images:

  1. Create images from a text prompt
  2. Create variations of an image
  3. Edit an image, generating new parts from a prompt

Note that functions 2 and 3 work with either generated or uploaded images.
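
If you prefer scripting to the web UI, OpenAI’s Python library (the classic, pre-1.0 openai package) exposes the same three operations, assuming you have API access. Here’s a minimal sketch; the prompt and file names are just placeholders, and this isn’t part of the workflow I describe below.

```python
import os
import openai

# Classic (pre-1.0) openai package; expects OPENAI_API_KEY in your environment.
openai.api_key = os.environ["OPENAI_API_KEY"]

# 1. Create images from a text prompt (the beta UI returns four per request).
gen = openai.Image.create(
    prompt="An Impressionist oil painting of sunflowers in a magenta vase in a cyan room",
    n=4,
    size="1024x1024",
)

# 2. Create variations of an existing image (generated or uploaded, square PNG).
var = openai.Image.create_variation(
    image=open("sunflowers.png", "rb"),
    n=3,
    size="1024x1024",
)

# 3. Edit an image: transparent pixels in the mask mark the region to regenerate.
edit = openai.Image.create_edit(
    image=open("sunflowers.png", "rb"),
    mask=open("sunflowers_mask.png", "rb"),
    prompt="An Impressionist oil painting of sunflowers in a magenta vase in a cyan room",
    n=3,
    size="1024x1024",
)

print(gen["data"][0]["url"])  # each call returns a list of hosted image URLs
```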

Quick Walkthrough

I'll start by heading over to labs.openai.com and logging in. Next, I’ll enter the prompt, “An Impressionist oil painting of sunflowers in a magenta vase in a cyan room.” After 20 seconds or so, we see this.

DALL-E Image Creation, Image by Author

OK, these look pretty good. I like the one on the left. Clicking on the image zooms it up and shows more options.

DALL-E Image Selection, Image by Author

Let’s check out some variations. I’ll click on the Variations button to see what we get.

DALL-E Image Variations, Image by Author

Nice! I like the one on the right. Let’s see if we can change the shadow on the table with the Edit feature. Clicking on the image and then Edit brings up this window.

DALL-E Image Editing, Image by Author

I can use the eraser to rub out any part of the image, like the shadow on the table. And I can adjust the size of the eraser using the gadget on the right. Next, I’ll enter the prompt again and hit the Generate button. Here are the results.

OK, the painting on the right seems to have the best shadow. I’ll choose that one and download a full-res version.

DALL-E Rendering of “An Impressionist oil painting of sunflowers in a magenta vase in a cyan room,” Image by Author

Excellent! It’s suitable for printing and framing.

Costs

OpenAI gives you fifty “credits” when you first create an account. A credit is used each time you hit the Generate or Variations button, and they give you an additional 15 free credits each month.

You can purchase more credits at US$15 plus tax for 115 credits. This is about 13 or 14 cents per generation operation, depending on the sales tax in your state.

Note that you get four images when creating new images from a prompt and three images when using the Variations and Edit functions, and you get to keep all of the images you create. So each image costs roughly 3 to 4 cents.
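
To make the math concrete, here’s the quick arithmetic behind those estimates (before tax, so your actual cost will be a bit higher):

```python
pack_price_usd = 15.00   # price of a 115-credit pack, before tax
pack_credits = 115

cost_per_credit = pack_price_usd / pack_credits      # ~$0.13 per Generate/Variations click
cost_per_generated_image = cost_per_credit / 4       # Generate returns 4 images
cost_per_variation_image = cost_per_credit / 3       # Variations and Edit return 3 images

print(f"per credit:          ${cost_per_credit:.3f}")           # 0.130
print(f"per generated image: ${cost_per_generated_image:.3f}")  # 0.033
print(f"per variation image: ${cost_per_variation_image:.3f}")  # 0.043
```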

Commercial use of DALL-E

Unlike when OpenAI’s beta program for DALL-E first started, you can now create images and use them for commercial purposes.

… users get full usage rights to commercialize the images they create with DALL·E, including the right to reprint, sell, and merchandise. This includes images they generated during the research preview.

Users have told us that they are planning to use DALL·E images for commercial projects, like illustrations for children’s books, art for newsletters, concept art and characters for games, moodboards for design consulting, and storyboards for movies. — OpenAI

But note that OpenAI’s Terms of Use page says that using DALL-E is subject to compliance with the terms spelled out in their Content Policy, discussed in the next section.

Usage Limitations

Under OpenAI’s Content Policy, when using DALL-E, you agree to the following rules:

  • Only create images that are G-rated and don’t cause harm to anyone. They list a bunch of prohibited bad stuff, like hate speech, violent images, porn, etc.
  • Clearly indicate that images are AI-generated and give attribution to OpenAI.
  • Respect the rights of others: don’t upload or create images of public figures, and only upload images that you own, etc.

These seem like good, common sense rules intended to curb potential abuse of the system.

Societal Impact

Like other AI models that were trained on large amounts of public data, DALL-E has inherent societal biases. The authors address these concerns in the model card for their system [3] using the official model name, DALL·E 2.

Use of DALL·E 2 has the potential to harm individuals and groups by reinforcing stereotypes, erasing or denigrating them, providing them with disparately low quality performance, or by subjecting them to indignity. These behaviors reflect biases present in DALL·E 2 training data and the way in which the model is trained. While the deeply contextual nature of bias makes it difficult to measure and mitigate the actual downstream harms resulting from use of the DALL·E 2 Preview (i.e. beyond the point of generation), our intent is to provide concrete illustrations here that can inform users and affected non-users even at this very initial preview stage. — Pamela Mishkin, et al., from OpenAI

They show examples of bias in the system, like how “a builder” generates images of older white men wearing hard hats or how “a flight attendant” generates images of Asian women wearing airline uniforms. They discuss other societal issues with the model and mitigation through policies and reporting. Yes, there is a “Report” button in the UI.

DALL-E Report Image Dialog, Image by Author

Test Results

Here are the results from using DALL-E, comparing them to other AI experiments I have written about here on Medium.

Abstract Paintings

The first test compares the results of DALL-E to the output from my MachineRay project. MachineRay is the result of training StyleGAN 2 [4] on abstract paintings in the public domain from WikiArt.org. There’s no text prompt; the system just cranks out new abstract paintings.

Abstract Paintings from MachineRay (top), Variations from DALL-E (bottom), Images by Author

The top row shows three abstract paintings from MachineRay, and the bottom row shows variations of the painting generated by DALL-E. It’s interesting to see how each system effectively has its own “style.” The output from MachineRay has a more scruffy texture to the paint strokes, whereas the results from DALL-E seem more airbrushed in comparison.

The following three paintings were generated by the prompt “abstract art painting.”

DALL-E Renderings of “abstract art painting,” Images by Author

Hmm. These look like details of larger splatter paintings. I found that DALL-E needs more direction in the prompts to create better paintings. You’ll see this in the sections below.

Modern Art

The second test compares DALL-E to my MAGnet project, which used OpenAI’s image-and-text embedding system CLIP [5], SWAGAN [6], and a custom genetic algorithm to create modern paintings from text prompts. Here are the results.

MAGnet Renderings of “rolling farmland,” “an abstract painting with circles,” and “a cubist painting” (top), and DALL-E Variations (bottom), Images by Author

The top row shows the output from MAGnet for the prompts “rolling farmland,” “an abstract painting with circles,” and “a cubist painting.” The bottom row shows variations from DALL-E based on the MAGnet renderings. Note that when you use DALL-E to create variations, it doesn’t use the original text prompt. The system interprets the meaning of the input image using CLIP and then uses their new “unCLIP” model [1] to generate the variations from the CLIP embedding. It’s akin to a first artist describing a painting in words and a second artist using the words to create a new painting without looking at the original image. It’s an interesting effect. For example, I like how DALL-E picked up a face in the cubist painting.
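
Here’s a rough pseudocode sketch of that two-step pipeline. The function names and stub bodies are placeholders of mine for illustration, not real OpenAI APIs.

```python
# Stand-ins for OpenAI's internal models; the names are hypothetical.
def clip_image_encoder(image):
    """Map an image to a CLIP embedding (stub)."""
    return ("clip-embedding-of", image)

def unclip_decoder(embedding):
    """Generate a brand-new image conditioned only on a CLIP embedding (stub)."""
    return f"new image painted from {embedding}"

def make_variation(source_image):
    # Step 1: the "first artist" describes the picture, but as a CLIP image
    # embedding rather than words; the original pixels go no further.
    embedding = clip_image_encoder(source_image)
    # Step 2: the "second artist" (the unCLIP decoder) paints a new image from
    # that description, keeping the semantics and style but not the composition.
    return unclip_decoder(embedding)

print(make_variation("cubist_painting.png"))
```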

Here’s what DALL-E creates from the original text prompts.

DALL-E Renderings of “a painting of rolling farmland,” “an abstract painting with circles,” and “a cubist painting from 1920,” Images by Author

I know art appreciation is subjective, but I like these renditions much better than the first set of images.

Various Styles

The next test compares DALL-E's output to my GANshare project's output. GANshare uses CLIP and a VQGAN [7] model I trained on public domain images on WikiArt. Let’s see the results.

GANshare Renderings of “Geometric Painting with Diagonals in Sonic Silver Brown,” “Impressionist Painting with Prisms in Carolina Blue,” and “Futurist Painting of a City in Vivid Burgundy Brown” (top), and DALL-E Variations (bottom), Images by Author

The top row shows GANshare’s renderings from “Geometric Painting with Diagonals in Sonic Silver Brown,” “Impressionist Painting with Prisms in Carolina Blue,” and “Futurist Painting of a City in Vivid Burgundy Brown.” The bottom row shows the variations from DALL-E. It seems that DALL-E somehow zoomed up on the forms from GANshare. Also, we see more of the airbrushed look from DALL-E.

Here are new images using these prompts from DALL-E.

DALL-E Renderings of “Geometric painting with diagonals in sonic silver brown,” “Impressionist painting of a dining room with prisms in Carolina blue,” and “Futurist painting of a city in vivid burgundy brown,” Images by Author

First up, note how I tweaked the prompt for the second image. I added the “dining room” bit to make the image similar to the one from GANshare and the DALL-E variant. DALL-E seems freer to express the intent of the prompts when it renders images directly. For example, note the more straightforward design of the geometric painting. And note how the word “prism” yields figurative forms in the overhead lights but does not shift the image’s color scheme the way it did with VQGAN. I would probably need to add “with rainbow colors” to the prompt to get that effect with DALL-E.

Portraits

Up next is using DALL-E to create portraits of people. Although OpenAI’s Content Policy makes it clear that you can't use their system to “create images of public figures,” creating paintings of fictional people seems to be allowed.

For my GANfolk project, I trained two AI models, StyleGAN 2 and VQGAN, both driven by CLIP, to create portraits of fictional people from text prompts. Here are the results of DALL-E compared to GANfolk.

GANfolk Renderings of “Drawing of a Thoughtful Brazilian Girl,” “Painting of a Focused Portuguese Man,” and “Painting of a Concerned Korean Woman” (top), and DALL-E Variations (bottom), Images by Author

The top row shows how GANfolk rendered portraits from the prompts, “Drawing of a Thoughtful Brazilian Girl,” “Painting of a Focused Portuguese Man,” and “Painting of a Concerned Korean Woman.” The bottom row shows portraits rendered by DALL-E as variations of the images from GANfolk. Overall, the images from DALL-E seem very good. For example, the shading of the first two portraits is excellent.

Here are new portraits generated by DALL-E, with a slight twist: I added a phrase to the three prompts.

DALL-E renderings of “realistic charcoal drawing of a young Brazilian girl in sepia tones from 1920,” “realistic oil painting of a focused Portuguese man wearing a black t-shirt in front of a blue background from 1920,” and “realistic oil painting of a wide shot of a concerned Korean woman facing forward with a green background from 1920,” Images by Author

The original portraits by DALL-E for these prompts seemed very modern, as seen below. To match the general look of the portraits from GANfolk, which I trained on public domain paintings from the 19th and early 20th centuries, I added the phrase “from 1920” to the prompts. This gave the portraits from DALL-E an old-timey look that better matches the portraits generated by GANfolk.

DALL-E renderings of “realistic charcoal drawing of a young Brazilian girl in sepia tones,” “realistic oil painting of a focused Portuguese man wearing a black t-shirt in front of a blue background,” and “realistic oil painting of a wide shot of a concerned Korean woman facing forward with a green background,” Images by Author

I kinda like both sets of portraits from DALL-E. Next up is a look at landscape paintings.

Landscapes

For my GANscapes project, I trained StyleGAN 2 ADA [8] using 5,000 landscape paintings from WikiArt.org. The system used CLIP to generate images from text prompts. Here are the results of paintings generated by GANscapes and DALL-E.

GANscapes Renderings for “Impressionist painting of a house by a lake,” “Impressionist painting of New England foliage,” and “Impressionist painting of a bay in summer” (top), Variations from DALL-E (bottom), Images by Author

You can see how DALL-E picked up the main visual components from the GANscapes renderings and pulled them into aesthetically pleasing compositions. And you can see a cohesive painterly style across all three images.

Let’s see what DALL-E generates with prompts directly.

The first thing that pops out is the saturated colors. I guess DALL-E got the memo that the Impressionists used vibrant paints. This set of landscapes seems to have fewer details than the ones above, but overall, the compositions are very lovely.

Varying Aspect Ratios

You may have noticed that all of the images in this article are dead square. Many image generation models use a 1:1 aspect ratio, including DALL-E.

My most recent project, Expand-DALL-E, or E-DALL-E, changes the aspect ratio of pictures by generating new imagery. It uses an open-source text-to-image model called Craiyon (previously DALL-E Mini) [9] and VQGAN to “inpaint” the sides of images to change the aspect ratio.

The following examples show how 1:1 images generated by Craiyon are expanded to have a 16:9 aspect ratio.

Craiyon Rendering of “a painting of rolling farmland,” “an abstract painting with orange triangles,” and “a still life painting of a bowl of fruit” (top), E-DALL-E Expansion to 16:9 (middle), and DALL-E Expansion to 16:9 (bottom), Images by Author

The top row shows images created by Craiyon from the prompts “a painting of rolling farmland,” “an abstract painting with orange triangles,” and “a still life painting of a bowl of fruit.” The middle row shows how E-DALL-E expands the images to 16:9, filling in details on the sides. The bottom row shows how DALL-E performs the same function. Although DALL-E seems to do a better job with the expansion overall, E-DALL-E appears to be more creative and adds new forms to the expanded areas. More explicit prompts could be used to coax DALL-E to add additional forms.

As a final test, here is a DALL-E rendering from the prompt “a painting of rolling farmland” before and after the image expansion.

DALL-E Rendering of “a painting of rolling farmland,” Image by Author

DALL-E Expansion to 16:9, Image by Author

Here are the steps needed to use DALL-E to create images with non-square aspect ratios.

  1. Import your image into a photo editing system like Photoshop.
  2. Expand the canvas to the new size, keeping the original image in the center.
  3. Upload the expanded image to DALL-E, and Edit the image.
  4. Crop the image to the left, erase the blank part, and click Generate.
  5. Save your favorite version of the inpainting.
  6. Repeat steps 4 and 5 for the right side.
  7. Bring both halves into Photoshop and merge them together.

If you need help with these instructions, please let me know in the comments. I could either make a YouTube video demo or, better yet, write a Python program to split and merge the images.
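
In the meantime, here’s a minimal Pillow sketch of steps 2, 4, and 7: it expands the canvas to 16:9 with the original image centered, carves out the two square tiles you would upload to DALL-E’s Edit function, and merges the inpainted halves back together. The file names and sizes are just examples, not a finished tool.

```python
from PIL import Image

SIZE = 1024                  # DALL-E edits work on square 1024x1024 tiles
WIDE_W = SIZE * 16 // 9      # target 16:9 width (about 1820 pixels)

# Step 2: expand the canvas, keeping the original image centered.
original = Image.open("farmland.png").convert("RGBA").resize((SIZE, SIZE))
canvas = Image.new("RGBA", (WIDE_W, SIZE), (0, 0, 0, 0))   # transparent = blank
x_center = (WIDE_W - SIZE) // 2
canvas.paste(original, (x_center, 0))

# Steps 4 and 6: crop square tiles at the left and right edges; the transparent
# regions are what DALL-E's Edit function will inpaint from your prompt.
canvas.crop((0, 0, SIZE, SIZE)).save("left_tile.png")
canvas.crop((WIDE_W - SIZE, 0, WIDE_W, SIZE)).save("right_tile.png")

# Step 7: after downloading the inpainted tiles, merge everything back together.
merged = Image.new("RGB", (WIDE_W, SIZE))
merged.paste(Image.open("left_tile_inpainted.png").convert("RGB"), (0, 0))
merged.paste(Image.open("right_tile_inpainted.png").convert("RGB"), (WIDE_W - SIZE, 0))
merged.paste(original.convert("RGB"), (x_center, 0))  # keep the untouched center on top
merged.save("farmland_16x9.png")
```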

Discussion

The DALL-E system is quite impressive. In almost every test, it outperforms previously available models. The team at OpenAI is aware of the system's limitations and is monitoring its use for possible abuses. I am excited about exploring the system further, like adding images to accompany the text for memes, stories, or movie storyboards. Like this!

Acknowledgments

I want to thank Jennifer Lim for her help with this article.

References

[1] A. Ramesh et al., Hierarchical Text-Conditional Image Generation with CLIP Latents (2022)

[2] A. Ramesh et al., Zero-Shot Text-to-Image Generation (2021)

[3] P. Mishkin et al., DALL·E 2 Preview — Risks and Limitations (2022)

[4] T. Karras et al., Analyzing and Improving the Image Quality of StyleGAN (2020)

[5] A. Radford et al., Learning Transferable Visual Models From Natural Language Supervision (2021)

[6] R. Gal et al., SWAGAN: A Style-based Wavelet-driven Generative Model (2021)

[7] P. Esser, R. Rombach, and B. Ommer, Taming Transformers for High-Resolution Image Synthesis (2020)

[8] T. Karras et al., Training Generative Adversarial Networks with Limited Data (2020)

[9] B. Dayma and P. Cuenca, DALL·E mini — Generate Images from Any Text Prompt (2021)
