# Recent Changes
### Dec 23, 2022 🎄☃
* Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
  * NOTE currently resizing is static on model creation, on-the-fly dynamic / train patch size sampling is a WIP
* Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
* More model pretrained tags and adjustments, some model names changed (working on deprecation translations, consider the main branch a DEV branch right now, use 0.6.x for stable use)
* More ImageNet-12k (subset of 22k) pretrain models popping up (a loading sketch follows this list):
  * `efficientnet_b5.in12k_ft_in1k` - 85.9 @ 448x448
  * `vit_medium_patch16_gap_384.in12k_ft_in1k` - 85.5 @ 384x384
  * `vit_medium_patch16_gap_256.in12k_ft_in1k` - 84.5 @ 256x256
  * `convnext_nano.in12k_ft_in1k` - 82.9 @ 288x288
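
These `model_arch.pretrained_tag` names can be passed straight to `timm.create_model`, with weights pulled from the HF hub where hosted. A minimal inference sketch, assuming a recent `--pre` install; the tag is one from the list above and `example.jpg` is just a placeholder image path:

```python
import torch
from PIL import Image

import timm
from timm.data import resolve_data_config, create_transform

# Create a model by its full `arch.pretrained_tag` name; pretrained=True downloads
# the matching weights (from the HF hub for the newer multi-weight models).
model = timm.create_model('convnext_nano.in12k_ft_in1k', pretrained=True)
model.eval()

# Build the eval transform that matches the weights' pretrained config.
config = resolve_data_config({}, model=model)
transform = create_transform(**config)

img = Image.open('example.jpg').convert('RGB')
with torch.no_grad():
    logits = model(transform(img).unsqueeze(0))
top5 = logits.softmax(dim=-1).topk(5)
print(top5.indices, top5.values)
```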

### Dec 8, 2022
* Add 'EVA l' to `vision_transformer.py`, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
  * original source: https://github.com/baaivision/EVA

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.2 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/BAAI/EVA) |
| eva_large_patch14_336.in22k_ft_in1k | 88.7 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/BAAI/EVA) |
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.6 | 304.1 | 61.6 | 63.5 | [link](https://huggingface.co/BAAI/EVA) |
| eva_large_patch14_196.in22k_ft_in1k | 87.9 | 304.1 | 61.6 | 63.5 | [link](https://huggingface.co/BAAI/EVA) |

### Dec 6, 2022
* Add 'EVA g', BEiT style ViT-g/14 model weights w/ both MIM pretrain and CLIP pretrain to `beit.py` (a short tag-selection sketch follows the table).
  * original source: https://github.com/baaivision/EVA
  * paper: https://arxiv.org/abs/2211.07636

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.8 | 1014.4 | 1906.8 | 2577.2 | [link](https://huggingface.co/BAAI/EVA) |
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.6 | 1013 | 620.6 | 550.7 | [link](https://huggingface.co/BAAI/EVA) |
| eva_giant_patch14_336.clip_ft_in1k | 89.4 | 1013 | 620.6 | 550.7 | [link](https://huggingface.co/BAAI/EVA) |
| eva_giant_patch14_224.clip_ft_in1k | 89.1 | 1012.6 | 267.2 | 192.6 | [link](https://huggingface.co/BAAI/EVA) |
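
Since these are registered under the multi-weight scheme, the same EVA architecture can be created with any of its pretrained tags; a short sketch using tag strings from the tables above (assumes the pre-release `timm` with multi-weight support):

```python
import timm

# Same eva_giant_patch14_336 architecture, two different weight sets,
# selected only by the pretrained tag suffix.
eva_mim = timm.create_model('eva_giant_patch14_336.m30m_ft_in22k_in1k', pretrained=True)
eva_clip = timm.create_model('eva_giant_patch14_336.clip_ft_in1k', pretrained=True)
```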

### Dec 5, 2022

* Pre-release (`0.8.0dev0`) of multi-weight support (`model_arch.pretrained_tag`). Install with `pip install --pre timm`
  * vision_transformer, maxvit, convnext are the first three model impl w/ support
  * model names are changing with this (previous _21k, etc. model entrypoints will merge), still sorting out deprecation handling
  * bugs are likely, but I need feedback so please try it out
  * if stability is needed, please use 0.6.x pypi releases or clone from [0.6.x branch](https://github.com/rwightman/pytorch-image-models/tree/0.6.x)
* Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark, use `--torchcompile` argument (a rough standalone equivalent is sketched after the table below)
* Inference script allows more control over output: select top-k for class index + probability, and choose JSON, CSV, or parquet output
* Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k | 88.6 | 632.5 | 391 | 407.5 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k) |
| vit_large_patch14_clip_336.openai_ft_in12k_in1k | 88.3 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.openai_ft_in12k_in1k) |
| vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k | 88.2 | 632 | 167.4 | 139.4 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k) |
| vit_large_patch14_clip_336.laion2b_ft_in12k_in1k | 88.2 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in12k_in1k) |
| vit_large_patch14_clip_224.openai_ft_in12k_in1k | 88.2 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.openai_ft_in12k_in1k) |
| vit_large_patch14_clip_224.laion2b_ft_in12k_in1k | 87.9 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.laion2b_ft_in12k_in1k) |
| vit_large_patch14_clip_224.openai_ft_in1k | 87.9 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.openai_ft_in1k) |
| vit_large_patch14_clip_336.laion2b_ft_in1k | 87.9 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in1k) |
| vit_huge_patch14_clip_224.laion2b_ft_in1k | 87.6 | 632 | 167.4 | 139.4 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in1k) |
| vit_large_patch14_clip_224.laion2b_ft_in1k | 87.3 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.laion2b_ft_in1k) |
| vit_base_patch16_clip_384.laion2b_ft_in12k_in1k | 87.2 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in12k_in1k) |
| vit_base_patch16_clip_384.openai_ft_in12k_in1k | 87 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in12k_in1k) |
| vit_base_patch16_clip_384.laion2b_ft_in1k | 86.6 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in1k) |
| vit_base_patch16_clip_384.openai_ft_in1k | 86.2 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in1k) |
| vit_base_patch16_clip_224.laion2b_ft_in12k_in1k | 86.2 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in12k_in1k) |
| vit_base_patch16_clip_224.openai_ft_in12k_in1k | 85.9 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in12k_in1k) |
| vit_base_patch32_clip_448.laion2b_ft_in12k_in1k | 85.8 | 88.3 | 17.9 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch32_clip_448.laion2b_ft_in12k_in1k) |
| vit_base_patch16_clip_224.laion2b_ft_in1k | 85.5 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in1k) |
| vit_base_patch32_clip_384.laion2b_ft_in12k_in1k | 85.4 | 88.3 | 13.1 | 16.5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_384.laion2b_ft_in12k_in1k) |
| vit_base_patch16_clip_224.openai_ft_in1k | 85.3 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in1k) |
| vit_base_patch32_clip_384.openai_ft_in12k_in1k | 85.2 | 88.3 | 13.1 | 16.5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_384.openai_ft_in12k_in1k) |
| vit_base_patch32_clip_224.laion2b_ft_in12k_in1k | 83.3 | 88.2 | 4.4 | 5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in12k_in1k) |
| vit_base_patch32_clip_224.laion2b_ft_in1k | 82.6 | 88.2 | 4.4 | 5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in1k) |
| vit_base_patch32_clip_224.openai_ft_in1k | 81.9 | 88.2 | 4.4 | 5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.openai_ft_in1k) |
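
For reference, the `--torchcompile` script flag mentioned above boils down to wrapping the created model with `torch.compile`; a rough standalone sketch (needs PyTorch 2.0, and the model name here is just one of the weights from the table):

```python
import torch
import timm

model = timm.create_model('vit_base_patch16_clip_224.laion2b_ft_in1k', pretrained=True)
model = torch.compile(model)  # roughly what --torchcompile enables in the scripts

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    out = model(x)  # first call triggers compilation, subsequent calls reuse it
print(out.shape)
```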

* Port of MaxViT Tensorflow Weights from official impl at https://github.com/google-research/maxvit
  * There were larger than expected drops for the upscaled 384/512 in21k fine-tune weights, possibly a detail is missing, but the 21k FT did seem sensitive to small preprocessing differences (a quick resolution-check sketch follows the table)

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| maxvit_xlarge_tf_512.in21k_ft_in1k | 88.5 | 475.8 | 534.1 | 1413.2 | [link](https://huggingface.co/timm/maxvit_xlarge_tf_512.in21k_ft_in1k) |
| maxvit_xlarge_tf_384.in21k_ft_in1k | 88.3 | 475.3 | 292.8 | 668.8 | [link](https://huggingface.co/timm/maxvit_xlarge_tf_384.in21k_ft_in1k) |
| maxvit_base_tf_512.in21k_ft_in1k | 88.2 | 119.9 | 138 | 704 | [link](https://huggingface.co/timm/maxvit_base_tf_512.in21k_ft_in1k) |
| maxvit_large_tf_512.in21k_ft_in1k | 88 | 212.3 | 244.8 | 942.2 | [link](https://huggingface.co/timm/maxvit_large_tf_512.in21k_ft_in1k) |
| maxvit_large_tf_384.in21k_ft_in1k | 88 | 212 | 132.6 | 445.8 | [link](https://huggingface.co/timm/maxvit_large_tf_384.in21k_ft_in1k) |
| maxvit_base_tf_384.in21k_ft_in1k | 87.9 | 119.6 | 73.8 | 332.9 | [link](https://huggingface.co/timm/maxvit_base_tf_384.in21k_ft_in1k) |
| maxvit_base_tf_512.in1k | 86.6 | 119.9 | 138 | 704 | [link](https://huggingface.co/timm/maxvit_base_tf_512.in1k) |
| maxvit_large_tf_512.in1k | 86.5 | 212.3 | 244.8 | 942.2 | [link](https://huggingface.co/timm/maxvit_large_tf_512.in1k) |
| maxvit_base_tf_384.in1k | 86.3 | 119.6 | 73.8 | 332.9 | [link](https://huggingface.co/timm/maxvit_base_tf_384.in1k) |
| maxvit_large_tf_384.in1k | 86.2 | 212 | 132.6 | 445.8 | [link](https://huggingface.co/timm/maxvit_large_tf_384.in1k) |
| maxvit_small_tf_512.in1k | 86.1 | 69.1 | 67.3 | 383.8 | [link](https://huggingface.co/timm/maxvit_small_tf_512.in1k) |
| maxvit_tiny_tf_512.in1k | 85.7 | 31 | 33.5 | 257.6 | [link](https://huggingface.co/timm/maxvit_tiny_tf_512.in1k) |
| maxvit_small_tf_384.in1k | 85.5 | 69 | 35.9 | 183.6 | [link](https://huggingface.co/timm/maxvit_small_tf_384.in1k) |
| maxvit_tiny_tf_384.in1k | 85.1 | 31 | 17.5 | 123.4 | [link](https://huggingface.co/timm/maxvit_tiny_tf_384.in1k) |
| maxvit_large_tf_224.in1k | 84.9 | 211.8 | 43.7 | 127.4 | [link](https://huggingface.co/timm/maxvit_large_tf_224.in1k) |
| maxvit_base_tf_224.in1k | 84.9 | 119.5 | 24 | 95 | [link](https://huggingface.co/timm/maxvit_base_tf_224.in1k) |
| maxvit_small_tf_224.in1k | 84.4 | 68.9 | 11.7 | 53.2 | [link](https://huggingface.co/timm/maxvit_small_tf_224.in1k) |
| maxvit_tiny_tf_224.in1k | 83.4 | 30.9 | 5.6 | 35.8 | [link](https://huggingface.co/timm/maxvit_tiny_tf_224.in1k) |
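
Given the preprocessing sensitivity noted above, it may be worth reading the expected eval settings off the weights rather than hard-coding them; a small sketch, assuming the `pretrained_cfg` attribute of recent `timm` versions (older releases expose the same dict as `default_cfg`):

```python
import timm

model = timm.create_model('maxvit_base_tf_512.in21k_ft_in1k', pretrained=True)

# Resolution, interpolation and crop fraction the ported weights were validated with.
cfg = model.pretrained_cfg
print(cfg['input_size'], cfg['interpolation'], cfg['crop_pct'])
```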

### Oct 15, 2022
* Train and validation script enhancements
* Non-GPU (ie CPU) device support
* SLURM compatibility for train script
* HF datasets support (via ReaderHfds)
* TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed wrt sample count estimate)
* in_chans != 3 support for scripts / loader
* Adan optimizer (a minimal usage sketch follows this list)
* Can enable per-step LR scheduling via args
* Dataset 'parsers' renamed to 'readers', more descriptive of purpose
* AMP args changed, APEX via `--amp-impl apex`, bfloat16 supported via `--amp-dtype bfloat16`
* main branch switched to 0.7.x version, 0.6.x forked for stable release of weight-only adds
* master -> main branch rename
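
The Adan optimizer is reachable through the usual optimizer factory, and the new bfloat16 AMP option corresponds to an autocast region in eager code; a minimal sketch, assuming `adan` is the registered opt string and using CPU autocast so it runs anywhere (lr / weight_decay values are placeholders):

```python
import torch
import timm
from timm.optim import create_optimizer_v2

model = timm.create_model('resnet50')  # any timm model

# 'adan' selects the newly added Adan optimizer via the factory string.
optimizer = create_optimizer_v2(model, opt='adan', lr=1e-3, weight_decay=0.02)

# Rough eager-mode analogue of `--amp-dtype bfloat16` in the scripts.
x = torch.randn(2, 3, 224, 224)
with torch.autocast(device_type='cpu', dtype=torch.bfloat16):
    loss = model(x).mean()
loss.backward()
optimizer.step()
```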

### Oct 10, 2022
* More weights in `maxxvit` series, incl first ConvNeXt block based `coatnext` and `maxxvit` experiments:
  * `coatnext_nano_rw_224` - 82.0 @ 224 (G) -- (uses ConvNeXt conv block, no BatchNorm)
  * `maxxvit_rmlp_nano_rw_256` - 83.0 @ 256, 83.7 @ 320 (G) (uses ConvNeXt conv block, no BN)
  * `maxvit_rmlp_small_rw_224` - 84.5 @ 224, 85.1 @ 320 (G)
  * `maxxvit_rmlp_small_rw_256` - 84.6 @ 256, 84.9 @ 288 (G) -- could be trained better, hparams need tuning (uses ConvNeXt block, no BN)
  * `coatnet_rmlp_2_rw_224` - 84.6 @ 224, 85 @ 320 (T)
  * NOTE: official MaxVit weights (in1k) have been released at https://github.com/google-research/maxvit -- some extra work is needed to port and adapt since my impl was created independently of theirs and has a few small differences + the whole TF same padding fun.

### Sept 23, 2022
* LAION-2B CLIP image towers supported as pretrained backbones for fine-tune or features (no classifier); a feature-extraction sketch follows this list
  * vit_base_patch32_224_clip_laion2b
  * vit_large_patch14_224_clip_laion2b
  * vit_huge_patch14_224_clip_laion2b
  * vit_giant_patch14_224_clip_laion2b
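
To use one of these towers as a headless feature backbone, `num_classes=0` through the standard factory returns the pooled embedding instead of classifier logits; a minimal sketch (model names as listed above, which later releases renamed under the multi-weight scheme):

```python
import torch
import timm

# LAION-2B CLIP image tower as a backbone; num_classes=0 removes the classifier head.
backbone = timm.create_model('vit_base_patch32_224_clip_laion2b', pretrained=True, num_classes=0)
backbone.eval()

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    features = backbone(x)  # pooled embedding, e.g. [1, 768] for this ViT-B tower
print(features.shape)
```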

### Sept 7, 2022
* Hugging Face [`timm` docs](https://huggingface.co/docs/hub/timm) home now exists, look for more here in the future
* Add BEiT-v2 weights for base and large 224x224 models from https://github.com/microsoft/unilm/tree/master/beit2
* Add more weights in `maxxvit` series incl a `pico` (7.5M params, 1.9 GMACs), two `tiny` variants:
  * `maxvit_rmlp_pico_rw_256` - 80.5 @ 256, 81.3 @ 320 (T)
  * `maxvit_tiny_rw_224` - 83.5 @ 224 (G)
  * `maxvit_rmlp_tiny_rw_256` - 84.2 @ 256, 84.8 @ 320 (T)

### Aug 29, 2022
* MaxVit window size scales with img_size by default. Add new RelPosMlp MaxViT weight that leverages this:
|