haotian-liu/LLaVA — LLaVA develops and releases large language and vision m

Version	Commit	Size	Downloads	Date
latestLatest	HEAD	12.6 MB	4	1mo ago

🌋 LLaVA: Large Language and Vision Assistant

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.

[📢 LLaVA-NeXT Blog] [Project Page] [Demo] [Data] [Model Zoo]

🤝Community Contributions: [llama.cpp] [Colab] [🤗Space] [Replicate] [AutoGen] [BakLLaVA]

Improved Baselines with Visual Instruction Tuning [Paper] [HF]
Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee

Visual Instruction Tuning (NeurIPS 2023, Oral) [Paper] [HF]
Haotian Liu*, Chunyuan Li*, Qingyang Wu, Yong Jae Lee (*Equal Contribution)

Release

[2024/05/10] 🔥 LLaVA-NeXT (Stronger) models are released, stronger LMM with support of LLama-3 (8B) and Qwen-1.5 (72B/110B). [Blog] [Checkpoints] [Demo] [Code]
[2024/05/10] 🔥 LLaVA-NeXT (Video) is released. The image-only-trained LLaVA-NeXT model is surprisingly strong on video tasks with zero-shot modality transfer. DPO training with AI feedback on videos can yield significant improvement. [Blog] [Checkpoints] [Code]
[03/10] Releasing LMMs-Eval, a highly efficient evaluation pipeline we used when developing LLaVA-NeXT. It supports the evaluation of LMMs on dozens of public datasets and allows new dataset onboarding, making the dev of new LMMs much faster. [Blog] [Codebase]
[1/30] 🔥 LLaVA-NeXT (LLaVA-1.6) is out! With additional scaling to LLaVA-1.5, LLaVA-NeXT-34B outperforms Gemini Pro on some benchmarks. It can now process 4x more pixels and perform more tasks/applications than before. Check out the blog post, and explore the demo! Models are available in Model Zoo. Training/eval data and scripts coming soon.
[11/10] LLaVA-Plus is released: Learning to Use Tools for Creating Multimodal Agents, with LLaVA-Plus (LLaVA that Plug and Learn to Use Skills). [Project Page] [Demo] [Code] [Paper]
[11/2] LLaVA-Interactive is released: Experience the future of human-AI multimodal interaction with an all-in-one demo for Image Chat, Segmentation, Generation and Editing. [Project Page] [Demo] [Code] [Paper]
[10/26] 🔥 LLaVA-1.5 with LoRA achieves comparable performance as full-model finetuning, with a reduced GPU RAM requirement (ckpts, script). We also provide a doc on how to finetune LLaVA-1.5 on your own dataset with LoRA.
[10/12] Check out the Korean LLaVA (Ko-LLaVA), created by ETRI, who has generously supported our research! [🤗 Demo]

LLaVA

Quick Overview

What is this?

What problem does it solve?

Who should use it?

Pros

Cons

Scores

Trust Score

Maintenance

Popularity

Star History

Snapshot Versions

Alternatives

hermes-agent

prompts.chat

dify

open-webui

langchain

awesome-llm-apps

Community Reviews

README

🌋 LLaVA: Large Language and Vision Assistant

Release