A Spotlight on Paris, London, Tel Aviv and Zurich
In the eight years since we established our FAIR hub in Paris, Meta has become one of the leading research organizations in the world, with pioneering work stemming from our tech hubs in Paris, London, Tel Aviv, and Zurich.
One of the most important decisions we made when we set up FAIR was to put exploratory research and open science at the center. We regularly collaborate with external researchers, because we have a strong hypothesis that this is the fastest and most responsible way to make progress.
“We have worked with institutions to develop generations of AI researchers, especially via our PhD programs,” said Naila Murray, head of FAIR EMEA. “Many of our PhD students have made important contributions to the field.”
Today, our teams in Paris, London, Tel Aviv, and Zurich are focused on a variety of interests, including self-supervised learning, reinforcement learning, speech and audio, computer vision, natural language modeling, responsible AI, machine learning theory, model efficiency, AR/VR, and more.
“Our research is driven by a unique mix of ambition and collegiality, and our team works tightly together across boundaries of expertise, seniority, location, and job role to make rapid research progress,” Murray said. “In this current era in AI research, seemingly each day brings a potential new research breakthrough, including from our EMEA team.”
Groundbreaking Large Language Model Research
Earlier this year, our researchers in Paris formed the team that built and deployed LLaMA (Large Language Model Meta AI) – a state-of-the-art foundational large language model designed to help researchers advance their work in this subfield of AI.
LLaMA works by taking a sequence of words as input and predicting the next word, repeating this step to recursively generate text. To train our model, we chose text from the 20 languages with the most speakers, focusing on those with Latin and Cyrillic alphabets. With capabilities to generate creative text, solve mathematical theorems, predict protein structures, answer reading comprehension questions, and more, large language models are one of the clearest cases of the substantial potential benefits AI can offer at scale to billions of people.
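To make the predict-then-append loop described above concrete, here is a minimal sketch in Python. It is not LLaMA and does not use any Meta code: a toy word-pair counter stands in for the neural network, and the function names `train_bigram` and `generate` are hypothetical. A real language model conditions on the whole sequence so far, while this toy looks only at the last word.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for every word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, prompt, max_new_words=5):
    """Greedy generation: predict one word, append it, and repeat."""
    sequence = prompt.split()
    for _ in range(max_new_words):
        # Toy assumption: the prediction depends only on the last word.
        candidates = follows.get(sequence[-1])
        if not candidates:
            break  # no known continuation for the last word
        next_word = candidates.most_common(1)[0][0]
        sequence.append(next_word)  # the output becomes part of the next input
    return " ".join(sequence)


model = train_bigram("the cat sat on the mat the cat sat down")
print(generate(model, "on", max_new_words=3))
```

The key point is the loop: each predicted word is appended to the sequence and fed back in as input for the next prediction.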
Self-supervised Computer Vision Research
Also based in Paris, our teams introduced two breakthroughs in computer vision research. In April, we unveiled DINOv2 – the first method for training computer vision models that uses self-supervised learning to achieve results that match or surpass the standard approach used in the field.
DINOv2 can discover and segment objects in an image or a video with absolutely no supervision and without being given a targeted objective. For example, DINOv2 can understand that an image contains a representation of a dog without ever being taught what a dog is in the first place. As part of this announcement, we shared a public demo that anyone can use to explore some of the capabilities of DINOv2.
We’re already using DINOv2 to learn more about the physical world. Meta recently collaborated with the World Resources Institute to use AI to map forests – tree by tree – across areas the size of continents. While our self-supervised model was trained on data from forests in North America, evaluations confirm that it generalizes well and delivers accurate maps in other locations around the world.
Our Paris team, in collaboration with colleagues in North America, also pioneered new research using SEER (SElf-SupERvised), Meta AI Research’s groundbreaking self-supervised computer vision model. SEER learns directly from any random collection of images — without the need for careful data curation and labeling that goes into conventional computer vision training — and then outputs an image embedding.
For our latest breakthrough, SEER10B, we use diverse datasets to enable better and fairer computer vision. Traditional computer vision systems are trained primarily on examples from the U.S. and wealthy countries in Europe, so they often don’t work well for images from other places with different socioeconomic characteristics. SEER delivers strong results for images from all around the globe – including non-U.S. and non-Europe regions with a wide range of income levels. SEER10B drastically improved performance on fairness benchmarks across gender, apparent skin tone, and age groups. Apart from its improved performance on fairness benchmarks, this model understands images from across the world well enough to localize them with unprecedented precision. We hope SEER will be an important building block as the AI community works to build systems that work well for everyone.
Advancements in 3D Modeling
In August 2022, researchers in London and Paris open-sourced the code for Implicitron, a modular framework within our open-source PyTorch3D library. Implicitron uses neural implicit representation, a computer vision technique that can seamlessly combine real and virtual objects in augmented reality, without requiring large amounts of data to learn from and without being limited to just a few points of view.
Implicitron learns a representation of a 3D object or scene using a sparse set of combined images of that object or scene from arbitrary viewpoints. Unlike traditional 3D representations such as meshes or point clouds, this newer approach represents objects as a continuous function, which allows for more accurate reconstruction of shapes with complex geometries as well as higher color reconstruction accuracy.
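The idea of representing an object as a continuous function, rather than as a finite list of points, can be illustrated with a small sketch. This is not Implicitron's API, and nothing here is learned from images: a hand-written signed distance function for a sphere stands in for the neural network, and the names `sphere_sdf` and `is_inside` are hypothetical.

```python
import math


def sphere_sdf(x, y, z, radius=1.0):
    """Signed distance from a point to a sphere centered at the origin.

    Negative inside the sphere, zero on its surface, positive outside.
    """
    return math.sqrt(x * x + y * y + z * z) - radius


def is_inside(x, y, z, radius=1.0):
    """The shape can be queried at any coordinate, not just stored samples."""
    return sphere_sdf(x, y, z, radius) < 0.0


# A point cloud stores only a fixed set of surface samples...
point_cloud = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# ...whereas the function answers for arbitrary points at any resolution.
print(is_inside(0.2, 0.3, 0.1))
print(sphere_sdf(3.0, 4.0, 0.0))
```

A neural implicit representation replaces the analytic formula with a trained network, but the interface is the same: a coordinate goes in and a property of the scene at that location comes out.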
Generative AI for Images and Video
Our team in Tel Aviv is focused on generative AI and has been at the forefront of some of Meta’s most recent advancements. In July 2022, our Tel Aviv researchers and collaborators around the world created a generative AI research model called Make-A-Scene. This multimodal generative AI method puts creative control in the hands of people who use it by allowing them to describe and illustrate their vision through both text descriptions and freeform sketches, resulting in surreal art, such as a hot dog flying through the sky and skyscrapers in the desert.
We followed up this work with Make-A-Video, an AI system that enables people to turn text prompts into brief, high-quality, one-of-a-kind video clips. The system can also create videos from images or take existing videos and create new ones that are similar.
The Metaverse and Beyond
We believe augmented and virtual reality, coupled with AI-powered interfaces, will constitute the next paradigm shift in human-oriented computing. While our other EMEA hubs are predominantly focused on the AI research that will help us get there, our team in Zurich is focused on advancing AR and VR.
Together, we are working on contextualized AI interfaces that could allow our devices to understand our context, our preferences, our history, and our goals. This supports our future vision where devices will act as partners rather than tools, surrounding us with technology that adapts to us and helps us to work the way we want.
Historically, different areas of AI research have been relatively isolated from one another, Murray said. However, the collaborative foundation FAIR was built upon has been an important catalyst for bringing different teams together and advancing research.
As head of the FAIR EMEA team, Murray said one of the best parts of her job is “sparking collaborations across researchers by pointing out connections between related research interests.”
“In recent months, there’s been an exciting confluence of multimodal perception, language understanding and generation, reinforcement learning, and human-machine interaction,” Murray said. “This confluence is getting us closer to the field’s long-held dream of building truly advanced intelligent systems, which is immensely exciting.”