Chameleon, a mixed-modal early-fusion foundation model
In a new paper, Meta announces Chameleon, a mixed-modal early-fusion foundation model. Unlike earlier multimodal models, which model each modality (text, image, audio, etc.) separately, mixed-modal early-fusion foundation models like Chameleon are end-to-end models. They ingest all modalities from the start and project them into a single shared representational space. That permits integrating information across