
Apple introduces ‘MGIE,’ an innovative AI model for image editing based on instructions

Apple has unveiled an AI model called “MGIE,” short for MLLM-Guided Image Editing, that edits images based on natural-language commands. Developed by Apple together with researchers from the University of California, Santa Barbara, the model is described in a paper presented at the International Conference on Learning Representations (ICLR) 2024, which reports improvements in both automatic metrics and human evaluation.

How does MGIE function?

MGIE uses multimodal large language models (MLLMs), which process text and images together, for instruction-based image editing. The MLLM interprets the user's instruction in the context of the image and produces guidance that steers pixel-level manipulation. The approach has two key components: first, deriving an expressive instruction from the user's often terse input to guide the editing process, and second, generating a "visual imagination" of the desired result to inform the pixel-level edit. MGIE is trained end to end, so the instruction derivation, visual imagination, and image editing modules are optimized jointly.
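The flow of data through those stages can be illustrated with a toy sketch. This is not Apple's code or the MGIE API: every function and class name below is hypothetical, a lookup table stands in for the MLLM, a single brightness offset stands in for the visual imagination, and the end-to-end training is not represented at all. It only shows how a terse command becomes an explicit instruction, then guidance, then a pixel-level change.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # toy grayscale image: rows of 0-255 pixel values


@dataclass
class Guidance:
    """Hypothetical stand-in for the 'visual imagination' given to the editor."""
    brightness_shift: int


def derive_expressive_instruction(instruction: str) -> str:
    """Stage 1 (toy): expand a terse command into an explicit one.

    MGIE uses an MLLM for this step; a lookup table plays that role here.
    """
    expansions = {
        "make it brighter": "increase overall brightness so details are easier to see",
        "make it darker": "decrease overall brightness for a dimmer look",
    }
    return expansions.get(instruction.strip().lower(), instruction)


def imagine_edit(expressive: str) -> Guidance:
    """Stage 2 (toy): turn the expressive instruction into editing guidance."""
    if "increase overall brightness" in expressive:
        return Guidance(brightness_shift=40)
    if "decrease overall brightness" in expressive:
        return Guidance(brightness_shift=-40)
    return Guidance(brightness_shift=0)  # unknown request: leave image unchanged


def apply_edit(image: Image, guidance: Guidance) -> Image:
    """Editing module (toy): pixel-level change driven by the guidance."""
    shift = guidance.brightness_shift
    return [[max(0, min(255, p + shift)) for p in row] for row in image]


def edit(image: Image, instruction: str) -> Image:
    """Chain the three toy stages in the order the article describes."""
    expressive = derive_expressive_instruction(instruction)
    guidance = imagine_edit(expressive)
    return apply_edit(image, guidance)


if __name__ == "__main__":
    print(edit([[10, 200], [120, 250]], "Make it brighter"))
    # prints [[50, 240], [160, 255]]
```

In the real model each stage is learned rather than hand-written, and the guidance is far richer than one number, but the ordering of the stages is the same as described above.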




What capabilities does MGIE offer?

MGIE is equipped to handle a diverse array of editing tasks, ranging from simple color adjustments to complex object manipulations. Its features include:

Expressive instruction-based editing: MGIE expands a user's brief command into a concise, explicit instruction that guides the editing process, improving edit quality.
Photoshop-style modification: The model performs common Photoshop-style edits such as cropping, resizing, rotating, flipping, and applying filters, along with more advanced edits like background changes and object manipulation.
Global photo optimization: MGIE optimizes overall photo quality by adjusting brightness, contrast, sharpness, and color balance, and offers artistic effects such as sketching, painting, and cartooning.
Local editing: MGIE enables targeted edits on specific regions or objects within an image, with options to modify attributes such as shape, size, color, texture, and style.
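The difference between global optimization and local editing can be shown on a toy image. The sketch below is only an illustration under stated assumptions, not how MGIE works internally: the helper names are hypothetical, the image is a nested list of grayscale values, and the region is passed in as an explicit box, whereas MGIE works out the target region from the instruction itself.

```python
from typing import List, Tuple

Image = List[List[int]]           # toy grayscale image, values 0-255
Box = Tuple[int, int, int, int]   # (top, left, bottom, right); bottom/right exclusive


def clamp(value: int) -> int:
    """Keep a pixel value inside the valid 0-255 range."""
    return max(0, min(255, value))


def adjust_global(image: Image, shift: int) -> Image:
    """Global edit: apply the same brightness shift to every pixel."""
    return [[clamp(p + shift) for p in row] for row in image]


def adjust_local(image: Image, box: Box, shift: int) -> Image:
    """Local edit: shift only the pixels inside the box, leave the rest alone."""
    top, left, bottom, right = box
    return [
        [clamp(p + shift) if top <= r < bottom and left <= c < right else p
         for c, p in enumerate(row)]
        for r, row in enumerate(image)
    ]


if __name__ == "__main__":
    img = [[100, 100, 100], [100, 100, 100]]
    print(adjust_global(img, 20))               # every pixel changes
    print(adjust_local(img, (0, 1, 2, 3), -30)) # only the two right-hand columns change
```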

How can MGIE be used?

The model is available as an open-source project on GitHub, with code, data, and pre-trained models. A demo notebook walks through various editing tasks, and a web demo hosted on Hugging Face Spaces lets users experiment online. Users describe the edit they want in natural language and can follow up with further instructions to refine the result or request changes. MGIE can also be customized and integrated into other applications or platforms that need image-editing functionality.

Why is MGIE significant?

MGIE represents a significant advancement in instruction-based image editing, bridging the gap between AI capabilities and human creativity. It serves as a practical tool for diverse scenarios, empowering users to create, modify, and optimize images for personal or professional use. Furthermore, MGIE underscores Apple’s commitment to AI research and development, showcasing the company’s expanding machine learning capabilities. While there’s room for improvement in multimodal AI systems, MGIE’s release heralds a promising future for assistive AI in creative endeavors.
