How InstantID works: the new AI-based image generation solution

Ana Cano Barrera

Conversational Designer & Communications

March 19, 2024

The generation of images with Artificial Intelligence technologies is bringing about a great revolution. The wave of text-to-image diffusion models that has emerged in recent months is unstoppable, and so is its enormous potential in sectors such as audiovisual production, advertising, and leisure and entertainment services. Many industries are turning to this type of technology to make a difference. Recently, a new diffusion-based solution has been unveiled that is attracting great interest thanks to its ease of use, performance and quality of results.

We are talking about InstantID, an innovative open-source tool for having fun and exploring new realities by creating images with surprising coherence and precision.

With this technology, a single facial photo is enough to generate new high-fidelity images in a few seconds, with no prior training required. All you have to do is upload a photo of the person you want to clone, add a short text prompt and choose a style. The possibilities are endless, and we encourage you to try everything InstantID has to offer. Shall we get started?

If you're new to AI image generation and unsure where to begin, InstantID is an excellent starting point. Here are five simple steps to help you make the most of this innovative solution:

1. Access the demo and upload the image

You can try InstantID through the demo hosted on Hugging Face Spaces: https://huggingface.co/spaces/InstantX/InstantID. The interface is very intuitive, and within a few seconds you will have everything ready to get your first results.

In the left box, upload the photo the new image should be based on. Use photos of a single person rather than a group (if the tool detects several faces, it uses the one that occupies the most space in the photo as the reference). Make sure the face is clearly visible and the image is sharp.
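The rule for group photos can be illustrated with a short Python sketch. It assumes a face detector has already returned one bounding box per face; the detector, the `(x1, y1, x2, y2)` box format and the function names are assumptions for illustration, not InstantID's actual code. The sketch simply keeps the box with the largest area.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]

def box_area(box: Box) -> float:
    """Area of an axis-aligned bounding box; degenerate boxes count as 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def pick_reference_face(boxes: List[Box]) -> Box:
    """Return the detected face that covers the largest area of the photo."""
    if not boxes:
        raise ValueError("no face detected; upload a photo with a clearly visible face")
    return max(boxes, key=box_area)

if __name__ == "__main__":
    faces = [(10, 10, 60, 70), (100, 40, 220, 190), (300, 50, 340, 100)]
    print(pick_reference_face(faces))  # (100, 40, 220, 190)
```

If the largest face is not the person you want, cropping the photo before uploading is the simplest way to control which face is used.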

If you want a specific pose, you can upload a reference image through the box on the right. This step is completely optional.

This secondary image can sometimes introduce noise or slightly reduce the likeness of the generated image, so consider whether you really need it. In many cases it is better to rely on the prompt alone, so that the text is the only instruction the tool receives.

However, if you know exactly what you are looking for but find it hard to describe in writing, the reference image can be very useful. Experiment with it, and keep or discard this option depending on the results.

2. Construct an effective prompt

Write a simple prompt that is rich in detail. Include specific instructions so that the result is as precise as possible, and avoid complex or ambiguous sentences that can cause confusion. Use natural, easily understandable language. Although you can write the prompt in any language, we suggest English to ensure it is understood better.

The AI has to interpret these text instructions correctly to understand what we are looking for, so the way the prompt is written is the key to steering the generation and getting a result close to our expectations.

Evaluate the result and, if you are not convinced, adjust the prompt: enrich your instructions with more details or replace some words with more precise terms.

Let's look at an example. A very basic prompt would be: A businesswoman in London.

A more detailed version describes her clothing, her attitude and the surroundings:

A businesswoman, dressed in a smart, professional suit, exuding confidence and determination, on a bustling avenue in a modern city full of skyscrapers.
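One way to think about the jump from the basic prompt to the detailed one is as filling in slots: a subject, some descriptive details, and a setting. The Python helper below is a hypothetical illustration of that habit (`build_prompt` is not part of InstantID); it rebuilds both prompts from this example.

```python
from typing import Sequence

def build_prompt(subject: str, details: Sequence[str], setting: str) -> str:
    """Compose a prompt as: subject, detail, detail, ..., setting."""
    # Drop empty details so stray blanks don't produce ", ," in the prompt.
    phrases = [subject] + [d.strip() for d in details if d.strip()]
    text = ", ".join(phrases)
    if setting:
        # A bare subject reads naturally without a comma ("A businesswoman in London").
        separator = ", " if len(phrases) > 1 else " "
        text += separator + setting
    return text + "."

basic = build_prompt("A businesswoman", [], "in London")
detailed = build_prompt(
    "A businesswoman",
    ["dressed in a smart, professional suit", "exuding confidence and determination"],
    "on a bustling avenue in a modern city full of skyscrapers",
)
print(basic)
print(detailed)
```

Keeping the slots separate makes iterating easier: you can swap one detail or the setting and regenerate without rewriting the whole prompt.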

Here are some examples of more sophisticated prompts:

An image of a casually dressed woman enjoying outdoor activities in a rural and picturesque setting surrounded by trees and mountains.

A blond-haired child with brown eyes is playing in a playground. In the background, there are colorful swings, trees, and wooden benches.

Photo by Andrew Lancaster on Unsplash.

3. Choose a style

If you're looking for a more artistic result, try one of the styles built into the tool. There are nine different styles, nine settings that will transport you to a new world of fantasy. InstantID pushes the boundaries of your imagination!

Same example as above, but applying the Watercolor style. Original image source: Rich Fury/Getty Images.

Same example as above, but applying the Film Noir style. Original image source: Rich Fury/Getty Images.

If you want to achieve a more realistic image, don't select any style.
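Many diffusion demos implement preset styles as prompt templates that wrap your text and add a matching negative prompt. Assuming InstantID's styles work in a similar way, the Python sketch below shows the idea; the `STYLES` dictionary, its template wording and `apply_style` are invented for illustration and are not copied from the demo. With no style selected, the prompt passes through unchanged, which fits the advice to skip styles for realistic results.

```python
from typing import Dict, Tuple

# Hypothetical style presets: (positive template, style negative prompt).
# "{prompt}" marks where the user's text is inserted. The wording is invented
# for illustration and is not copied from the InstantID demo.
STYLES: Dict[str, Tuple[str, str]] = {
    "(No style)": ("{prompt}", ""),
    "Watercolor": (
        "watercolor painting, {prompt}, soft washes of color, paper texture",
        "photo, realistic, sharp edges",
    ),
    "Film Noir": (
        "film noir style, {prompt}, black and white, dramatic shadows, high contrast",
        "colorful, bright",
    ),
}

def apply_style(style_name: str, prompt: str, negative: str = "") -> Tuple[str, str]:
    """Wrap the prompt in the chosen template and merge the negative prompts."""
    # Unknown names fall back to "(No style)", i.e. the prompt passes through unchanged.
    template, style_negative = STYLES.get(style_name, STYLES["(No style)"])
    positive = template.replace("{prompt}", prompt)
    merged_negative = ", ".join(part for part in (style_negative, negative) if part)
    return positive, merged_negative

print(apply_style("Film Noir", "a woman walking in the rain", "blurry"))
```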

4. Adjust the parameters

For more control over the generated image, you can explore the parameters available in the tool. The first two are 'IdentityNet strength' (how closely the result should resemble the face in the original photo) and 'Image adapter strength' (how strongly the identity features extracted from that same face photo guide the image alongside the text prompt).

Both sliders range from 0 to 1.5 and are set to 0.8 by default. Keep these tips in mind when experimenting with them:

Increasing them gives a higher degree of similarity to the original image.
However, the image may come out oversaturated. In that case, reduce the 'Image adapter strength'.
Using images as prompts complements text instructions considerably, especially for content that is hard to describe, so the 'Image adapter strength' is very useful in these cases. Be careful, though: if the result is not faithful enough to your text instructions, try reducing it.
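The two sliders and the tips above can be summarized as a small settings object. In this Python sketch, the `IdentitySettings` class, the problem labels and the 0.1 step size are hypothetical; only the 0 to 1.5 range, the 0.8 defaults and the direction of each adjustment come from the tips.

```python
from dataclasses import dataclass

MIN_STRENGTH, MAX_STRENGTH = 0.0, 1.5  # slider range described above

def clamp(value: float) -> float:
    """Keep a strength inside the slider range."""
    return min(MAX_STRENGTH, max(MIN_STRENGTH, value))

@dataclass
class IdentitySettings:
    identitynet_strength: float = 0.8  # resemblance to the original face
    adapter_strength: float = 0.8      # weight of the image prompt

    def __post_init__(self) -> None:
        self.identitynet_strength = clamp(self.identitynet_strength)
        self.adapter_strength = clamp(self.adapter_strength)

def adjust(s: IdentitySettings, problem: str, step: float = 0.1) -> IdentitySettings:
    """Return new settings nudged according to the tips (hypothetical labels)."""
    if problem == "not_similar_enough":
        # Tip: raising both strengths increases similarity to the original.
        return IdentitySettings(s.identitynet_strength + step, s.adapter_strength + step)
    if problem in ("oversaturated", "ignores_text"):
        # Tip: lower the image adapter strength for saturation or weak text fidelity.
        return IdentitySettings(s.identitynet_strength, s.adapter_strength - step)
    raise ValueError(f"unknown problem: {problem!r}")

settings = adjust(IdentitySettings(), "oversaturated")
print(settings)
```

Changing one slider at a time in small steps, as `adjust` does, makes it easier to tell which parameter caused a change in the result.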

In addition, InstantID offers ControlNet parameters to fine-tune the request further. There are three options, pose, canny and depth, each ranging from 0 to 1.5 with a default value of 0.4. Use 'pose' to guide the body skeleton, 'canny' to follow the edges detected in the image, and 'depth' to control the depth of the elements in the scene.

Finally, the advanced options let you set the number of sampling steps, enhance the non-facial regions of the image, or play with more complex parameters such as the negative prompt.
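Putting the ControlNet and advanced options together, a single generation request could be modeled as in the Python sketch below. The `GenerationRequest` class, its field names and the 30-step default are assumptions for illustration; only the pose, canny and depth choices, the 0 to 1.5 range and the 0.4 default come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict

CONTROL_TYPES = ("pose", "canny", "depth")

@dataclass
class GenerationRequest:
    prompt: str
    negative_prompt: str = ""
    num_steps: int = 30                  # hypothetical default
    enhance_non_face_region: bool = False
    controls: Dict[str, float] = field(default_factory=dict)  # control -> strength

    def __post_init__(self) -> None:
        if self.num_steps < 1:
            raise ValueError("num_steps must be at least 1")

    def enable_control(self, kind: str, strength: float = 0.4) -> None:
        """Turn on one ControlNet condition, clamping its strength to 0-1.5."""
        if kind not in CONTROL_TYPES:
            raise ValueError(f"unsupported control: {kind!r}")
        self.controls[kind] = min(1.5, max(0.0, strength))

request = GenerationRequest(
    prompt="A businesswoman on a bustling avenue in a modern city",
    negative_prompt="blurry, low quality",
)
request.enable_control("pose")        # default strength 0.4
request.enable_control("depth", 0.6)
print(request.controls)  # {'pose': 0.4, 'depth': 0.6}
```

Only the controls you explicitly enable are stored, which mirrors the idea that each ControlNet condition is optional.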

5. Generate the new image

Click the Submit button to start the image generation process. In a few seconds the result will appear in the box on the left (the larger one).

Surprising, isn't it? The results are impressive, but you may still want to refine them: download the image you have and keep experimenting.

Look at the generated image and think about where it could be improved. Is it not close enough to the original photo? Are you happy with the likeness but would like to change the pose? Are you hesitant about the chosen style? Would you like to keep the background of the first image? Experiment with the parameters and the prompt; the versatility of this solution is amazing.

Important! Please note that this demo version has usage restrictions, and you won't be able to use it without limits because of the available GPU capacity. From the same IP address, you can generate roughly 20 images a day.

As we mentioned at the beginning of this post, this is a revolutionary solution that makes it possible to clone the likeness of any person in a matter of seconds. If you are passionate about Artificial Intelligence and want to explore image creation, dive into InstantID and discover all its possibilities!

InstantID: Zero-shot Identity-Preserving Generation in Seconds

Degree in Audiovisual Communication. I believe in the strength of multidisciplinary teams and, above all, in the fundamental role played by humanistic profiles for a good human-machine understanding. To face technological innovation, communication is the key.