Chameleon, a mixed-modal early-fusion foundation model
In a new paper, Meta announces Chameleon, a mixed-modal early-fusion foundation model. Unlike earlier multimodal models, which process the different modalities (text, image, audio, etc.) separately, mixed-modal early-fusion foundation models like Chameleon are end-to-end models. They ingest all modalities from the start and project them into one representational space. That permits integrating information across