GET /extractor/method?url=URL

| Parameter | Description |
|---|---|
| extractor | name of the extractor to use |
| method | extraction method |
| url | the url to extract content from |

Available values for `extractor`:

| Name | Description |
|---|---|
| article | A full-text extractor which is tuned towards news articles. In this scenario it achieves higher accuracy than DefaultExtractor. |
| keepeverything | Treats everything as "content". Useful to track down SAX parsing errors. |
| keepeverythingwithminkwords | - |
| largestcontent | Like DefaultExtractor, but only keeps the largest content block. Good for non-article style texts with only one main content block. |
| numwordsrules | - |
| canola | - |
| default | A quite generic full-text extractor, but usually not as good as ArticleExtractor. |

Available values for `method`:

| Name | Description |
|---|---|
| text | Outputs the extracted main content as plain text. |
| images | - |
| html | Outputs the whole HTML document and highlights the extracted main content. |
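
As a rough illustration of how the `GET /extractor/method?url=URL` pattern might be called from a client, here is a minimal Python sketch. The base URL (`http://localhost:8080`), the helper names `build_request_url` and `extract`, and the choice to percent-encode the `url` parameter are assumptions made for this example, not details stated by the service. The response is simply decoded as text, since the exact response format of each method is not specified here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Extractor and method names copied from the tables above.
EXTRACTORS = {
    "article", "keepeverything", "keepeverythingwithminkwords",
    "largestcontent", "numwordsrules", "canola", "default",
}
METHODS = {"text", "images", "html"}


def build_request_url(base_url, extractor, method, page_url):
    """Build the request URL for GET /extractor/method?url=URL.

    base_url is hypothetical: wherever the service happens to be hosted.
    """
    if extractor not in EXTRACTORS:
        raise ValueError(f"unknown extractor: {extractor!r}")
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    # Percent-encode the target URL so its own '?' and '&' characters
    # do not leak into the outer query string (an assumption about
    # what the server expects).
    query = urlencode({"url": page_url})
    return f"{base_url.rstrip('/')}/{extractor}/{method}?{query}"


def extract(base_url, extractor, method, page_url, timeout=30):
    """Send the request and return the response body as a string."""
    request_url = build_request_url(base_url, extractor, method, page_url)
    with urlopen(request_url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

For example, `build_request_url("http://localhost:8080", "article", "text", "http://example.com/news?id=1")` produces a URL whose path is `/article/text` and whose `url` parameter carries the encoded page address; passing the same arguments to `extract` would then perform the request.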