What Makes for Good Visual Tokenizers for Large Language Models?
We empirically investigate proper pre-training methods to build good visual tokenizers, making Large Language Models (LLMs) powerful Multimodal Large Language Models (MLLMs). In our benchmark,...
https://arxiv.org/abs/2305.12223

Homogeneous Tokenizer Matters: Homogeneous Visual Tokenizer for...
The tokenizer, as one of the fundamental components of large models, has long been overlooked or even misunderstood in visual tasks. One key factor of the great comprehension power of the large...
https://arxiv.org/abs/2403.18593