Machine Learning/Deep Learning with Us

These articles document my learning notes and code reproductions (when I have the time or capability), mainly in these fields:

  1. Natural Language Processing: Information Extraction and Retrieval
  2. Computer Vision: Object Detection, Segmentation and Synthesis
  3. Optimal Transport and Generative Models
  4. Reinforcement Learning and Game Theory

🧑🏿‍💻 Multimodal Representation Learning from both Text and Image

# Artificial Intelligence # text-image # multi-modality # long-read
Published On: February 8, 2024 (Last updated on: April 15, 2024)
1198 words · 6 min

For a long time, machine learning (deep learning) models could not understand more than one modality: a model knew either how to handle text-based tasks or how to work with images, but not both. Since natural intelligence is not limited to a single modality, researchers would like artificial intelligence models to be able to handle multimodal data as well, so that an AI can read and write text while also seeing images, watching videos and hearing audio.