Meta announced the launch of an Artificial Intelligence (AI) model that creates content based on studies of the human senses. The ImageBind project is a system that analyzes multisensory data, such as vision and depth perception, in a way similar to the human brain, transforming this data into information that can drive an action.
READ ALSO: "Experts call for a halt to AI advancements"
READ ALSO: “4 AI platforms for task execution“
ImageBind, the first AI model capable of linking information from six modalities

The model learns a single embedding, or shared representation space, not only for text, image/video, and audio, but also for sensors that record depth (3D), thermal (infrared) radiation, and inertial measurement units (IMUs), which measure motion and position. ImageBind gives machines a holistic understanding that connects the objects in a photo with how they sound, their 3D shape, how hot or cold they are, and how they move.
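To make the idea of a shared embedding space concrete, here is a minimal sketch of cross-modal retrieval. It is illustrative only: the encoder is a random stand-in for ImageBind's real trained encoders, so the ranking it prints is arbitrary. What it shows is the mechanics the article describes: once every modality maps into one space of unit vectors, an audio clip can be scored against candidate images directly with cosine similarity.

```python
import numpy as np

DIM = 1024  # placeholder width; the real model's embedding size may differ

def fake_encode(modality: str, item: str) -> np.ndarray:
    """Stand-in for a per-modality encoder that maps any input
    (image, audio, depth, thermal, IMU, text) into one shared space.
    A real system would run a trained network here."""
    seed = abs(hash((modality, item))) % (2**32)
    v = np.random.default_rng(seed).normal(size=DIM)
    return v / np.linalg.norm(v)  # unit-normalize, as is typical for retrieval

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b)  # vectors are already unit length

# Because all modalities share one space, an audio clip can be compared
# against images with no paired audio-image data at query time.
dog_bark = fake_encode("audio", "dog_bark.wav")
photos = {name: fake_encode("image", name)
          for name in ["dog.jpg", "beach.jpg", "car.jpg"]}

best = max(photos, key=lambda name: cosine(dog_bark, photos[name]))
print("closest image to the bark:", best)
```

With trained encoders, semantically related inputs (a barking sound and a photo of a dog) would land near each other in the space, so the same nearest-neighbor lookup becomes meaningful.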

All this data is collected automatically and used to compute the next action, without human supervision. Meta's scientists believe that such a model could outperform the specialized, human-trained systems used in earlier tests, since its learning does not depend on external intervention.
ImageBind is part of Meta's effort to create multimodal AI systems that learn from all the types of data around them. As the number of modalities increases, ImageBind opens the door for researchers to develop new holistic systems, such as combining 3D sensors and IMUs to design or experience immersive virtual worlds. ImageBind could also provide a rich way to explore memories: searching for photos, videos, audio files, or text messages using a combination of text, audio, and image.
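One plausible way such a combined query could work, sketched below with hypothetical helpers (the `embed` function is a random placeholder, and a production system would use trained encoders plus an approximate-nearest-neighbor index), is to sum the unit embeddings of the text and audio queries, renormalize, and search the stored library with the result.

```python
import numpy as np

def embed(seed: int, dim: int = 1024) -> np.ndarray:
    """Placeholder for a trained per-modality encoder (random here)."""
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

# Hypothetical query: the text "birthday" plus a clip of people singing.
text_q = embed(1)   # embedding of the text query
audio_q = embed(2)  # embedding of the audio query

# A simple way to combine modalities: add the unit vectors and
# renormalize; the combined query still lives in the shared space.
query = text_q + audio_q
query /= np.linalg.norm(query)

# Stored memories (photos, videos, messages) indexed by their embeddings.
library = {f"memory_{i}": embed(100 + i) for i in range(5)}

ranked = sorted(library, key=lambda k: -(query @ library[k]))
print("top match:", ranked[0])
```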
ImageBind in practice
To better understand how ImageBind works, imagine a robot loading flammable liquids onto a train. A conventional machine would simply continue its task indefinitely, but a robot equipped with Meta's AI could alert human technicians after detecting a change in heat. The robot would associate that reading with the sound of an explosion and the visual detection of fire inside the train car to make a decision. Depending on how its algorithm was trained, it could decide the best course of action on its own. Would it simply move away? Grab a fire extinguisher? Cut the electrical circuit? Warn humans to evacuate the area? Every decision would rest on probabilistic and statistical calculations over the holistic data.
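A toy version of that decision logic might look like the sketch below. It is entirely illustrative, not Meta's method: the probabilities, likelihood ratios, thresholds, and actions are all invented, and real robots would use far richer policies. It shows one standard way to fuse independent per-modality evidence (thermal, audio, vision) into a single hazard probability with a naive-Bayes-style odds update, then map that probability to an action.

```python
# Illustrative sensor-fusion sketch; all numbers are invented.
from math import prod

def fuse(prior: float, likelihood_ratios: list) -> float:
    """Update P(hazard) given ratios P(obs | hazard) / P(obs | safe)."""
    odds = prior / (1 - prior) * prod(likelihood_ratios)
    return odds / (1 + odds)

prior = 0.01  # baseline chance of a fire event

# How much more likely each observation is under "hazard" than "safe":
ratios = [
    12.0,  # thermal camera: sharp temperature spike
    8.0,   # audio: sound consistent with an explosion
    20.0,  # vision: flames detected inside the car
]

p_hazard = fuse(prior, ratios)
print(f"P(hazard) = {p_hazard:.3f}")  # about 0.951 with these numbers

if p_hazard > 0.9:
    action = "warn humans to evacuate and cut the electrical circuit"
elif p_hazard > 0.5:
    action = "fetch the fire extinguisher"
else:
    action = "continue loading, keep monitoring"
print("chosen action:", action)
```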
Meta acknowledges that there is still much to learn about multimodal learning. The AI research community has yet to effectively quantify the scaling behaviors that appear only in larger models and to understand their applications. ImageBind is a step towards evaluating them rigorously and demonstrating new applications in image generation and retrieval.
SEE ALSO: "Humanized or automated service: which is the best option?"
SEE ALSO: LinkedIn will lay off 700,000 employees and shut down its app in China.