Commit e8264c3

Add support for TextNet

1 parent da2c1e9 commit e8264c3

6 files changed (+54, -0)

README.md (+1)
@@ -407,6 +407,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
 1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
 1. **[Table Transformer](https://huggingface.co/docs/transformers/model_doc/table-transformer)** (from Microsoft Research) released with the paper [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https://arxiv.org/abs/2110.00061) by Brandon Smock, Rohith Pesala, Robin Abraham.
+1. **[TextNet](https://huggingface.co/docs/transformers/model_doc/textnet)** released with the paper [FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation](https://arxiv.org/abs/2111.02394) by Zhe Chen, Jiahao Wang, Wenhai Wang, Guo Chen, Enze Xie, Ping Luo, Tong Lu.
 1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
 1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
 1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.

docs/snippets/6_supported-models.snippet (+1)
@@ -122,6 +122,7 @@
 1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
 1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
 1. **[Table Transformer](https://huggingface.co/docs/transformers/model_doc/table-transformer)** (from Microsoft Research) released with the paper [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https://arxiv.org/abs/2110.00061) by Brandon Smock, Rohith Pesala, Robin Abraham.
+1. **[TextNet](https://huggingface.co/docs/transformers/model_doc/textnet)** released with the paper [FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation](https://arxiv.org/abs/2111.02394) by Zhe Chen, Jiahao Wang, Wenhai Wang, Guo Chen, Enze Xie, Ping Luo, Tong Lu.
 1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
 1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
 1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.

src/models.js (+14)
@@ -4712,6 +4712,18 @@ export class ViTForImageClassification extends ViTPreTrainedModel {
 }
 //////////////////////////////////////////////////

+//////////////////////////////////////////////////
+export class TextNetPreTrainedModel extends PreTrainedModel { }
+export class TextNetModel extends TextNetPreTrainedModel { }
+export class TextNetForImageClassification extends TextNetPreTrainedModel {
+    /**
+     * @param {any} model_inputs
+     */
+    async _call(model_inputs) {
+        return new SequenceClassifierOutput(await super._call(model_inputs));
+    }
+}
+//////////////////////////////////////////////////

 //////////////////////////////////////////////////
 export class IJepaPreTrainedModel extends PreTrainedModel { }
@@ -7002,6 +7014,7 @@ const MODEL_MAPPING_NAMES_ENCODER_ONLY = new Map([
     ['rt_detr', ['RTDetrModel', RTDetrModel]],
     ['table-transformer', ['TableTransformerModel', TableTransformerModel]],
     ['vit', ['ViTModel', ViTModel]],
+    ['textnet', ['TextNetModel', TextNetModel]],
     ['ijepa', ['IJepaModel', IJepaModel]],
     ['pvt', ['PvtModel', PvtModel]],
     ['vit_msn', ['ViTMSNModel', ViTMSNModel]],
@@ -7251,6 +7264,7 @@ const MODEL_FOR_DOCUMENT_QUESTION_ANSWERING_MAPPING_NAMES = new Map([

 const MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES = new Map([
     ['vit', ['ViTForImageClassification', ViTForImageClassification]],
+    ['textnet', ['TextNetForImageClassification', TextNetForImageClassification]],
     ['ijepa', ['IJepaForImageClassification', IJepaForImageClassification]],
     ['pvt', ['PvtForImageClassification', PvtForImageClassification]],
     ['vit_msn', ['ViTMSNForImageClassification', ViTMSNForImageClassification]],
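For readers unfamiliar with how these Map entries are consumed, the following self-contained sketch mirrors the pattern: the `model_type` string from a checkpoint's config.json is looked up in the task mapping, and the `_call` override wraps the raw model output in a typed output class. `PreTrainedModel` and `SequenceClassifierOutput` here are simplified mock stand-ins, not the real transformers.js implementations (whose `_call` methods are async and run an ONNX session).

```javascript
// Mock sketch of the registration pattern above. Only the class/mapping
// wiring is illustrated; the real forward pass is replaced by dummy logits.

class SequenceClassifierOutput {
  constructor({ logits }) {
    this.logits = logits;
  }
}

class PreTrainedModel {
  // Stand-in for the real forward pass: just produce mock logits.
  _call(model_inputs) {
    return { logits: model_inputs.pixel_values.map((x) => x * 2) };
  }
}

class TextNetPreTrainedModel extends PreTrainedModel { }

class TextNetForImageClassification extends TextNetPreTrainedModel {
  // As in the diff: wrap the raw output in a typed output class.
  _call(model_inputs) {
    return new SequenceClassifierOutput(super._call(model_inputs));
  }
}

// A checkpoint's config.json `model_type` selects the class at load time.
const MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES = new Map([
  ['textnet', ['TextNetForImageClassification', TextNetForImageClassification]],
]);

const [className, ModelClass] = MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES.get('textnet');
const model = new ModelClass();
const output = model._call({ pixel_values: [0.5, 1.5] });
console.log(className, output.logits); // TextNetForImageClassification [ 1, 3 ]
```

This is why adding a model to transformers.js is mostly a matter of declaring the class hierarchy and registering the `model_type` key in the relevant task mappings.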

src/models/image_processors.js (+1)
@@ -32,6 +32,7 @@ export * from './sam/image_processing_sam.js'
 export * from './segformer/image_processing_segformer.js'
 export * from './siglip/image_processing_siglip.js'
 export * from './swin2sr/image_processing_swin2sr.js'
+export * from './textnet/image_processing_textnet.js'
 export * from './vit/image_processing_vit.js'
 export * from './vitmatte/image_processing_vitmatte.js'
 export * from './vitpose/image_processing_vitpose.js'
src/models/textnet/image_processing_textnet.js (new file, +6)
@@ -0,0 +1,6 @@
+import {
+    ImageProcessor,
+} from "../../base/image_processors_utils.js";
+
+export class TextNetImageProcessor extends ImageProcessor { }
+
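A note on why this new file can be a one-line subclass: in transformers.js, preprocessing behavior is data-driven, configured by the checkpoint's preprocessor_config.json rather than by per-model code. The sketch below uses a mock base class and illustrative config values (not TextNet's actual settings) to show the idea.

```javascript
// Mock sketch: an empty subclass inherits everything, because the base class
// pulls its parameters from the checkpoint's preprocessor_config.json.
// The field names and defaults below are illustrative placeholders.

class ImageProcessor {
  constructor(config = {}) {
    this.do_resize = config.do_resize ?? true;
    this.size = config.size ?? { shortest_edge: 640 };
    this.image_mean = config.image_mean ?? [0.5, 0.5, 0.5];
  }
}

// Mirrors the one-line definition added in the diff.
class TextNetImageProcessor extends ImageProcessor { }

// Hypothetical config, as if parsed from a checkpoint's preprocessor_config.json.
const processor = new TextNetImageProcessor({ size: { shortest_edge: 960 } });
console.log(processor instanceof ImageProcessor, processor.size.shortest_edge);
```

Model-specific processors only need overrides when a checkpoint's preprocessing cannot be expressed through configuration alone; TextNet's evidently can.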
New test file (+31)
@@ -0,0 +1,31 @@
+import { AutoImageProcessor, TextNetImageProcessor } from "../../../src/transformers.js";
+
+import { load_cached_image } from "../../asset_cache.js";
+import { MAX_PROCESSOR_LOAD_TIME, MAX_TEST_EXECUTION_TIME } from "../../init.js";
+
+export default () => {
+    describe("TextNetImageProcessor", () => {
+        const model_id = "onnx-community/textnet-tiny";
+
+        /** @type {TextNetImageProcessor} */
+        let processor;
+        beforeAll(async () => {
+            processor = await AutoImageProcessor.from_pretrained(model_id);
+        }, MAX_PROCESSOR_LOAD_TIME);
+
+        it(
+            "default",
+            async () => {
+                const image = await load_cached_image("receipt");
+                const { pixel_values, original_sizes, reshaped_input_sizes } = await processor(image);
+
+                expect(pixel_values.dims).toEqual([1, 3, 960, 640]);
+                expect(pixel_values.mean().item()).toBeCloseTo(0.8106788992881775, 6);
+
+                expect(original_sizes).toEqual([[864, 576]]);
+                expect(reshaped_input_sizes).toEqual([[960, 640]]);
+            },
+            MAX_TEST_EXECUTION_TIME,
+        );
+    });
+};
