We are a team committed to language-barrier-free exploration. Contact us: [email protected]
We are looking for engineers and interns who are passionate about multimodal language models, OCR, and PDF document translation to join our team.
- Develop and optimize multimodal language models (AI models combining image and text)
- Develop and improve OCR (Optical Character Recognition) technology to enhance text recognition accuracy across various document types
- Design and implement PDF document translation and processing systems
- Collaborate with team members to solve technical challenges and drive product innovation
- Stay current with the latest technological developments in the industry and apply them to practical projects
- Proficiency with deep learning frameworks (such as PyTorch)
- Fundamental knowledge of computer vision and natural language processing
- Understanding of OCR technology; prior project experience preferred
- Familiarity with multimodal models (such as CLIP or GPT-4V); prior project experience preferred
- Research or project experience in multimodal models, document processing, or OCR
- Contributions to open-source projects or GitHub repositories
- Experience with large-scale AI model training or fine-tuning
- Knowledge of PDF document structure and processing techniques