
Commit de4e3d3

merveenoyan (Merve Noyan) and osanseviero authored
Update task pages (#786)
---------

Co-authored-by: Merve Noyan <mervenoyan@Merve-MacBook-Pro.local>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
1 parent 068bbc3 commit de4e3d3

File tree

10 files changed: +110 −32 lines changed


packages/tasks/src/tasks/depth-estimation/about.md

Lines changed: 10 additions & 1 deletion
@@ -1,4 +1,5 @@
-## Use Cases
+## Use Cases
+
 Depth estimation models can be used to estimate the depth of different objects present in an image.

 ### Estimation of Volumetric Information
@@ -8,6 +9,14 @@ Depth estimation models are widely used to study volumetric formation of objects

 Depth estimation models can also be used to develop a 3D representation from a 2D image.

+## Depth Estimation Subtasks
+
+There are two depth estimation subtasks.
+
+- **Absolute depth estimation**: Absolute (or metric) depth estimation aims to provide exact depth measurements from the camera. Absolute depth estimation models output depth maps with real-world distances in meters or feet.
+
+- **Relative depth estimation**: Relative depth estimation aims to predict the depth order of objects or points in a scene without providing precise measurements.
+
 ## Inference

 With the `transformers` library, you can use the `depth-estimation` pipeline to infer with depth estimation models. You can initialize the pipeline with a model id from the Hub; if you do not provide one, it will initialize with [Intel/dpt-large](https://huggingface.co/Intel/dpt-large) by default. When calling the pipeline, you just need to specify a path, an HTTP link, or an image loaded in PIL. You can find a comprehensive list of depth estimation models at [this link](https://huggingface.co/models?pipeline_tag=depth-estimation).
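As a minimal sketch of that pipeline (the image path `image.jpg` is a hypothetical placeholder):

```python
from transformers import pipeline

# Omitting `model` falls back to Intel/dpt-large by default.
pipe = pipeline(task="depth-estimation", model="Intel/dpt-large")

# The input can be a local path, an HTTP link, or a PIL image.
result = pipe("image.jpg")
depth_map = result["depth"]  # PIL image holding the predicted depth map
```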

packages/tasks/src/tasks/depth-estimation/data.ts

Lines changed: 13 additions & 9 deletions
@@ -3,9 +3,13 @@ import type { TaskDataCustom } from "..";
 const taskData: TaskDataCustom = {
   datasets: [
     {
-      description: "NYU Depth V2 Dataset: Video dataset containing both RGB and depth sensor data",
+      description: "NYU Depth V2 Dataset: Video dataset containing both RGB and depth sensor data.",
       id: "sayakpaul/nyu_depth_v2",
     },
+    {
+      description: "Monocular depth estimation benchmark curated to be free of noise and errors.",
+      id: "depth-anything/DA-2K",
+    },
   ],
   demo: {
     inputs: [
@@ -24,26 +28,26 @@ const taskData: TaskDataCustom = {
   metrics: [],
   models: [
     {
-      description: "Strong Depth Estimation model trained on 1.4 million images.",
-      id: "Intel/dpt-large",
-    },
-    {
-      description: "Strong Depth Estimation model trained on a big compilation of datasets.",
-      id: "LiheYoung/depth-anything-large-hf",
+      description: "Cutting-edge depth estimation model.",
+      id: "depth-anything/Depth-Anything-V2-Large",
     },
     {
       description: "A strong monocular depth estimation model.",
       id: "Bingxin/Marigold",
     },
+    {
+      description: "A metric depth estimation model trained on the NYU dataset.",
+      id: "Intel/zoedepth-nyu",
+    },
   ],
   spaces: [
     {
       description: "An application that predicts the depth of an image and then reconstructs the 3D model as voxels.",
       id: "radames/dpt-depth-estimation-3d-voxels",
     },
     {
-      description: "An application to compare the outputs of different depth estimation models.",
-      id: "LiheYoung/Depth-Anything",
+      description: "An application showcasing cutting-edge depth estimation.",
+      id: "depth-anything/Depth-Anything-V2",
     },
     {
       description: "An application to try state-of-the-art depth estimation.",

packages/tasks/src/tasks/feature-extraction/about.md

Lines changed: 46 additions & 1 deletion
@@ -1,9 +1,21 @@
 ## Use Cases

+### Transfer Learning
+
 Models trained on a specific dataset can learn features about the data. For instance, a model trained on an English poetry dataset learns English grammar at a very high level. This information can be transferred to a new model that is going to be trained on tweets. This process of extracting features and transferring them to another model is called transfer learning. One can pass their dataset through a feature extraction pipeline and feed the result to a classifier.

+### Retrieval and Reranking
+
+Retrieval is the process of obtaining relevant documents or information based on a user's search query. In NLP, retrieval systems aim to find text passages or documents in a large corpus that match the user's query and return a set of results likely to be useful to the user. Reranking, in turn, improves the quality of retrieval results by reordering them based on their relevance to the query, as in the sketch below.
+
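For illustration, a minimal reranking sketch using the `CrossEncoder` class of `sentence-transformers` (the model id is one common choice, not one mandated by this page):

```python
from sentence_transformers import CrossEncoder

# A cross-encoder scores (query, passage) pairs directly, which suits reranking.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do I install the library?"
passages = [
    "Run pip install -U sentence-transformers to install it.",
    "The stadium was full on Sunday.",
]

scores = reranker.predict([(query, p) for p in passages])
ranked = sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)
print(ranked[0][0])  # the most relevant passage for the query
```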
+### Retrieval Augmented Generation
+
+Retrieval-augmented generation (RAG) is a technique in which user inputs to generative models are first queried against a knowledge base, and the most relevant information from that knowledge base is used to augment the prompt. Feature extraction models (primarily retrieval and reranking models) thus help ground the generative model and reduce hallucinations during generation, as sketched below.
+
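A minimal sketch of the retrieval step of RAG, assuming a toy in-memory corpus (the documents and model id are illustrative; in practice the corpus would live in a vector database):

```python
from sentence_transformers import SentenceTransformer, util

# Toy knowledge base standing in for a real document store.
docs = [
    "The Eiffel Tower is 330 metres tall.",
    "Python 3.12 was released in October 2023.",
]

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
doc_emb = model.encode(docs, convert_to_tensor=True)

query = "How tall is the Eiffel Tower?"
query_emb = model.encode(query, convert_to_tensor=True)

# Retrieve the single most relevant document and splice it into the prompt.
hit = util.semantic_search(query_emb, doc_emb, top_k=1)[0][0]
prompt = f"Context: {docs[hit['corpus_id']]}\n\nQuestion: {query}"
print(prompt)  # this augmented prompt is what a generative model would receive
```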
 ## Inference

+You can run feature extraction models with the `pipeline` of the `transformers` library.
+
 ```python
 from transformers import pipeline
 checkpoint = "facebook/bart-base"
@@ -22,6 +34,39 @@ feature_extractor(text,return_tensors = "pt")[0].numpy().mean(axis=0)
 [ 0.2520, -0.6869, -1.0582, ..., 0.5198, -2.2106, 0.4547]]])'''
 ```

+A very popular library for training similarity and search models is `sentence-transformers`. To get started, install the library.
+
+```bash
+pip install -U sentence-transformers
+```
+
+You can infer with `sentence-transformers` models as follows.
+
+```python
+from sentence_transformers import SentenceTransformer
+
+model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
+sentences = [
+    "The weather is lovely today.",
+    "It's so sunny outside!",
+    "He drove to the stadium.",
+]
+
+embeddings = model.encode(sentences)
+similarities = model.similarity(embeddings, embeddings)
+print(similarities)
+# tensor([[1.0000, 0.6660, 0.1046],
+#         [0.6660, 1.0000, 0.1411],
+#         [0.1046, 0.1411, 1.0000]])
+```
+
+### Text Embeddings Inference
+
+[Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference) is a toolkit for serving feature extraction models with only a few lines of code.
+
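As a sketch of a client call (assuming a TEI server is already running locally on port 8080, following the TEI README; the port and input are illustrative):

```python
import requests

# TEI exposes an /embed route; one embedding comes back per input string.
response = requests.post(
    "http://127.0.0.1:8080/embed",
    json={"inputs": ["What is deep learning?"]},
)
embedding = response.json()[0]
print(len(embedding))  # dimensionality of the returned vector
```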
 ## Useful resources

-- [Documentation for feature extractor of 🤗Transformers](https://huggingface.co/docs/transformers/main_classes/feature_extractor)
+- [Documentation for the feature extraction task in 🤗 Transformers](https://huggingface.co/docs/transformers/main_classes/feature_extractor)
+- [Introduction to the MTEB Benchmark](https://huggingface.co/blog/mteb)
+- [Cookbook: Simple RAG for GitHub issues using Hugging Face Zephyr and LangChain](https://huggingface.co/learn/cookbook/rag_zephyr_langchain)
+- [sentence-transformers organization on the Hugging Face Hub](https://huggingface.co/sentence-transformers)

packages/tasks/src/tasks/feature-extraction/data.ts

Lines changed: 9 additions & 4 deletions
@@ -33,14 +33,19 @@ const taskData: TaskDataCustom = {
   models: [
     {
       description: "A powerful feature extraction model for natural language processing tasks.",
-      id: "facebook/bart-base",
+      id: "thenlper/gte-large",
     },
     {
-      description: "A strong feature extraction model for coding tasks.",
-      id: "microsoft/codebert-base",
+      description: "A strong feature extraction model for retrieval.",
+      id: "Alibaba-NLP/gte-Qwen1.5-7B-instruct",
+    },
+  ],
+  spaces: [
+    {
+      description: "A leaderboard to rank the best feature extraction models.",
+      id: "mteb/leaderboard",
     },
   ],
-  spaces: [],
   summary: "Feature extraction is the task of extracting features learnt by a model.",
   widgetModels: ["facebook/bart-base"],
 };

packages/tasks/src/tasks/object-detection/data.ts

Lines changed: 13 additions & 6 deletions
@@ -3,10 +3,13 @@ import type { TaskDataCustom } from "..";
 const taskData: TaskDataCustom = {
   datasets: [
     {
-      // TODO write proper description
-      description: "Widely used benchmark dataset for multiple Vision tasks.",
+      description: "Widely used benchmark dataset for multiple vision tasks.",
       id: "merve/coco2017",
     },
+    {
+      description: "Multi-task computer vision benchmark.",
+      id: "merve/pascal-voc",
+    },
   ],
   demo: {
     inputs: [
@@ -47,16 +50,16 @@ const taskData: TaskDataCustom = {
       description: "Strong object detection model trained on the ImageNet-21k dataset.",
       id: "microsoft/beit-base-patch16-224-pt22k-ft22k",
     },
+    {
+      description: "Fast and accurate object detection model trained on the COCO dataset.",
+      id: "PekingU/rtdetr_r18vd_coco_o365",
+    },
   ],
   spaces: [
     {
       description: "Leaderboard to compare various object detection models across several metrics.",
       id: "hf-vision/object_detection_leaderboard",
     },
-    {
-      description: "An object detection application that can detect unseen objects out of the box.",
-      id: "merve/owlv2",
-    },
     {
       description: "An application that contains various object detection models to try.",
       id: "Gradio-Blocks/Object-Detection-With-DETR-and-YOLOS",
@@ -69,6 +72,10 @@ const taskData: TaskDataCustom = {
       description: "An object tracking, segmentation and inpainting application.",
       id: "VIPLab/Track-Anything",
     },
+    {
+      description: "A very fast object tracking application based on object detection.",
+      id: "merve/RT-DETR-tracking-coco",
+    },
   ],
   summary:
     "Object Detection models allow users to identify objects of certain defined classes. Object detection models receive an image as input and output the image with bounding boxes and labels on detected objects.",

packages/tasks/src/tasks/text-generation/data.ts

Lines changed: 1 addition & 1 deletion
@@ -82,7 +82,7 @@ const taskData: TaskDataCustom = {
   spaces: [
     {
       description: "A leaderboard to compare different open-source text generation models based on various benchmarks.",
-      id: "HuggingFaceH4/open_llm_leaderboard",
+      id: "open-llm-leaderboard/open_llm_leaderboard",
     },
     {
       description: "A text generation application based on a very powerful LLaMA2 model.",

packages/tasks/src/tasks/text-to-image/data.ts

Lines changed: 4 additions & 4 deletions
@@ -53,18 +53,18 @@ const taskData: TaskDataCustom = {
       id: "latent-consistency/lcm-lora-sdxl",
     },
     {
-      description: "A text-to-image model that can generate coherent text inside image.",
-      id: "DeepFloyd/IF-I-XL-v1.0",
+      description: "A very fast text-to-image model.",
+      id: "ByteDance/SDXL-Lightning",
     },
     {
       description: "A powerful text-to-image model.",
-      id: "kakaobrain/karlo-v1-alpha",
+      id: "stabilityai/stable-diffusion-3-medium-diffusers",
     },
   ],
   spaces: [
     {
       description: "A powerful text-to-image application.",
-      id: "stabilityai/stable-diffusion",
+      id: "stabilityai/stable-diffusion-3-medium",
     },
     {
       description: "A text-to-image application to generate comics.",

packages/tasks/src/tasks/zero-shot-image-classification/about.md

Lines changed: 2 additions & 3 deletions
@@ -68,9 +68,8 @@ The highest probability is 0.995 for the label cat and dog

 ## Useful Resources

-You can contribute useful resources about this task [here](https://github.com/huggingface/hub-docs/blob/main/tasks/src/zero-shot-image-classification/about.md).
-
-Check out [Zero-shot image classification task guide](https://huggingface.co/docs/transformers/tasks/zero_shot_image_classification).
+- [Zero-shot image classification task guide](https://huggingface.co/docs/transformers/tasks/zero_shot_image_classification)
+- [Image-text Similarity Search](https://huggingface.co/learn/cookbook/faiss_with_hf_datasets_and_clip)

 This page was made possible thanks to the efforts of [Shamima Hossain](https://huggingface.co/Shamima), [Haider Zaidi](https://huggingface.co/chefhaider) and [Paarth Bhatnagar](https://huggingface.co/Paarth).

packages/tasks/src/tasks/zero-shot-image-classification/data.ts

Lines changed: 4 additions & 0 deletions
@@ -55,6 +55,10 @@ const taskData: TaskDataCustom = {
       description: "Strong zero-shot image classification model.",
       id: "google/siglip-base-patch16-224",
     },
+    {
+      description: "Small yet powerful zero-shot image classification model that can run on edge devices.",
+      id: "apple/MobileCLIP-S1-OpenCLIP",
+    },
     {
       description: "Strong image classification model for the biomedical domain.",
       id: "microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224",

packages/tasks/src/tasks/zero-shot-object-detection/data.ts

Lines changed: 8 additions & 3 deletions
@@ -39,11 +39,11 @@ const taskData: TaskDataCustom = {
   ],
   models: [
     {
-      description: "Solid zero-shot object detection model that uses CLIP as backbone.",
-      id: "google/owlvit-base-patch32",
+      description: "Solid zero-shot object detection model.",
+      id: "IDEA-Research/grounding-dino-base",
     },
     {
-      description: "The improved version of the owlvit model.",
+      description: "Cutting-edge zero-shot object detection model.",
       id: "google/owlv2-base-patch16-ensemble",
     },
   ],
@@ -52,6 +52,11 @@ const taskData: TaskDataCustom = {
       description: "A demo to try the state-of-the-art zero-shot object detection model, OWLv2.",
       id: "merve/owlv2",
     },
+    {
+      description:
+        "A demo that combines a zero-shot object detection and mask generation model for zero-shot segmentation.",
+      id: "merve/OWLSAM",
+    },
   ],
   summary:
     "Zero-shot object detection is a computer vision task to detect objects and their classes in images, without any prior training or knowledge of the classes. Zero-shot object detection models receive an image as input, as well as a list of candidate classes, and output the bounding boxes and labels where the objects have been detected.",
