From 03abc3f12306127098d6445d0adc58c7d5980e37 Mon Sep 17 00:00:00 2001
From: Samuel Huang
Date: Mon, 10 Nov 2025 14:45:05 -0500
Subject: [PATCH 1/2] Preserve images within figure elements

This CL fixes an issue where images nested within `` elements inside a `` tag were being incorrectly removed by Readability's cleaning process. Specifically, a structure like `` would first be transformed to ``, and then the outer `` (and its contents) would be erroneously identified as extraneous and removed.

The fix introduces a targeted exception within _cleanConditionally(). It prevents the removal of `` elements that meet the following criteria:

* The element is a ``.
* The `` is an ancestor of a `` element.
* The `` contains a single `` element (potentially nested).

Also add test case allrecipes-1 taken from:
https://www.allrecipes.com/hot-honey-brussels-sprouts-recipe-11832010
---
 .../allrecipes-1/expected-metadata.json    |   10 +
 test/test-pages/allrecipes-1/expected.html |  291 ++
 test/test-pages/allrecipes-1/source.html   | 3349 +++++++++++++++++
 3 files changed, 3650 insertions(+)
 create mode 100644 test/test-pages/allrecipes-1/expected-metadata.json
 create mode 100644 test/test-pages/allrecipes-1/expected.html
 create mode 100644 test/test-pages/allrecipes-1/source.html

diff --git a/test/test-pages/allrecipes-1/expected-metadata.json b/test/test-pages/allrecipes-1/expected-metadata.json
new file mode 100644
index 00000000..48d1f60e
--- /dev/null
+++ b/test/test-pages/allrecipes-1/expected-metadata.json
@@ -0,0 +1,10 @@
+{
+  "title": "Hot Honey Brussels Sprouts",
+  "byline": "Nicole Russell",
+  "dir": null,
+  "lang": "en",
+  "excerpt": "These hot honey Brussels sprouts are a simple side dish with all the elements you'll ever need in a side. They're sweet, spicy, crispy, and melt in your mouth.",
+  "siteName": "Allrecipes",
+  "publishedTime": null,
+  "readerable": true
+}
diff --git a/test/test-pages/allrecipes-1/expected.html b/test/test-pages/allrecipes-1/expected.html
new file mode 100644
index 00000000..2190e23d
--- /dev/null
+++ b/test/test-pages/allrecipes-1/expected.html
@@ -0,0 +1,291 @@
+
+
+
+
+
+

1 Photo

+
+

These hot honey Brussels sprouts are a simple side dish with all the elements you'll ever need in a side. They're sweet, spicy, crispy, and melt in your mouth. I like to pair this with pork chops but it goes great with chicken or wild game.

+
+
+

By

+
+

Nicole Russell

+
+
+

Nicole Russell

+
+

Nicole Russell is a prolific contributor to Allrecipes and an avid member of the Allrecipes Allstars.

+
+
+
+

Published on November 3, 2025

+
+
+
+
+
+

Dish of roasted brussels sprouts with crumbled cheese and garnish

+
+
+
+
+
+

Prep Time:

+

10 mins

+
+
+

Cook Time:

+

20 mins

+
+
+

Total Time:

+

30 mins

+
+
+

Servings:

+

4

+
+
+
+

Keep Screen Awake

+

+

Ingredients

+

+
+
+

+

+

+
+

Original recipe (1X) yields 4 servings

+
+
    +
  • +

    + 1 pound Brussels sprouts, trimmed and halved lengthwise +

    +
  • +
  • +

    + 1 tablespoon olive oil +

    +
  • +
  • +

    + 2 tablespoons hot honey, such as Mike's® Original Hot Honey +

    +
  • +
  • +

    + salt and freshly ground black pepper to taste +

    +
  • +
  • +

    + 1/4 cup crumbled feta cheese +

    +
  • +
  • +

    + 1 tablespoon chopped scallions +

    +
  • +
+
+
+

Directions

+
+
    +
  1. +

    Gather all ingredients. Preheat the oven to 400 degrees F (200 degrees C). Line a large baking sheet with aluminum foil.

    +
    +
    +

    Ingredients for a recipe including Brussels sprouts oil honey cheese and seasonings arranged in bowls on a marble surface

    +
    +
    +

    + Allrecipes / Julia Hartbeck +

    +
    +
    +
  2. +
  3. +

    Place Brussels sprouts in a bowl. Add oil, hot honey, salt, and pepper. Stir to combine and transfer to the baking sheet.

    +
    +
    +

    A baking sheet with halved Brussels sprouts arranged on foil prepared for cooking

    +
    +
    +

    + Allrecipes / Julia Hartbeck +

    +
    +
    +
  4. +
  5. +

    Roast in the preheated oven for 20 minutes.

    +
    +
    +

    Roasted Brussels sprouts on a foillined baking sheet

    +
    +
    +

    + Allrecipes / Julia Hartbeck +

    +
    +
    +
  6. +
  7. +

    Transfer sprouts to a bowl. Top with feta and scallions. Toss to combine. Serve immediately.

    +
    +
    +

    Bowl of roasted Brussels sprouts garnished with toppings and a spoon on the side

    +
    +
    +

    + Allrecipes / Julia Hartbeck +

    +
    +
    +
  8. +
+
+
+
+
+

Nutrition Facts (per serving) +

+ + + + + + + + + + + + + + + + + + + +
132 Calories
6g Fat
18g Carbs
5g Protein
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Nutrition Facts
+ Servings Per Recipe 4 +
+ Calories 132 +
% Daily Value *
+ Total Fat 6g + 8%
+ Saturated Fat 2g + 10%
+ Cholesterol 8mg + 3%
+ Sodium 185mg + 8%
+ Total Carbohydrate 18g + 7%
+ Dietary Fiber 3g + 11%
+ Total Sugars 12g +
+ Protein 5g + 9%
+ Vitamin C 98mg + 109%
+ Calcium 91mg + 7%
+ Iron 2mg + 9%
+ Potassium 414mg + 9%
+
+

* Percent Daily Values are based on a 2,000 calorie diet. Your daily values may be higher or lower depending on your calorie needs.

+

** Nutrient information is not available for all ingredients. Amount is based on available nutrient data.

+

(-) Information is not currently available for this nutrient. If you are following a medically restrictive diet, please consult your doctor or registered dietitian before preparing this recipe for personal consumption.

+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/test/test-pages/allrecipes-1/source.html b/test/test-pages/allrecipes-1/source.html
new file mode 100644
index 00000000..45e56ad5
--- /dev/null
+++ b/test/test-pages/allrecipes-1/source.html
@@ -0,0 +1,3349 @@
+ + + + + + + + + + + + + + + + + + + Hot Honey Brussels Sprouts Recipe + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+ +
+ +
+
+
+ +
+
+
+
+

+ Hot Honey Brussels Sprouts +

+
+ + +
+

+ These hot honey Brussels sprouts are a simple side dish with all the elements you'll ever need in a side. They're sweet, spicy, crispy, and melt in your mouth. I like to pair this with pork chops but it goes great with chicken or wild game. +

+ +
+
+
+
+ +
+
+
+ Dish of roasted brussels sprouts with crumbled cheese and garnish +
+
+
+
+ +
+
+
+
+
+
+ Prep Time: +
+
+ 10 mins +
+
+
+
+ Cook Time: +
+
+ 20 mins +
+
+
+
+ Total Time: +
+
+ 30 mins +
+
+
+
+ Servings: +
+
+ 4 +
+
+
+ +
+
+
+
+
+
+
+
+
+
+
+
+ Keep Screen Awake +
+
+

+ Ingredients +

+
+
+
+
+ +
+
+ +
+
+ +
+
+
+ +
+

+ Original recipe (1X) yields 4 servings +

+
+
    +
  • +

    + 1 pound Brussels sprouts, trimmed and halved lengthwise +

    +
  • +
  • +

    + 1 tablespoon olive oil +

    +
  • +
  • +

    + 2 tablespoons hot honey, such as Mike's® Original Hot Honey +

    +
  • +
  • +

    + salt and freshly ground black pepper to taste +

    +
  • +
  • +

    + 1/4 cup crumbled feta cheese +

    +
  • +
  • +

    + 1 tablespoon chopped scallions +

    +
  • +
+
+
+
+

+ Directions +

+
+
    +
  1. +

    + Gather all ingredients. Preheat the oven to 400 degrees F (200 degrees C). Line a large baking sheet with aluminum foil. +

    +
    +
    +
    + Ingredients for a recipe including Brussels sprouts oil honey cheese and seasonings arranged in bowls on a marble surface +
    +
    +
    +

    + Allrecipes / Julia Hartbeck +

    +
    +
    +
    +
  2. +
  3. +

    + Place Brussels sprouts in a bowl. Add oil, hot honey, salt, and pepper. Stir to combine and transfer to the baking sheet. +

    +
    +
    +
    + A baking sheet with halved Brussels sprouts arranged on foil prepared for cooking +
    +
    +
    +

    + Allrecipes / Julia Hartbeck +

    +
    +
    +
    +
  4. +
  5. +

    + Roast in the preheated oven for 20 minutes. +

    +
    +
    +
    + Roasted Brussels sprouts on a foillined baking sheet +
    +
    +
    +

    + Allrecipes / Julia Hartbeck +

    +
    +
    +
    +
  6. +
  7. +

    + Transfer sprouts to a bowl. Top with feta and scallions. Toss to combine. Serve immediately. +

    +
    +
    +
    + Bowl of roasted Brussels sprouts garnished with toppings and a spoon on the side +
    +
    +
    +

    + Allrecipes / Julia Hartbeck +

    +
    +
    +
    +
  8. +
+
+
+
+
+ +
+
+ +
+
+
+
+

+ Nutrition Facts (per serving) +

+ + + + + + + + + + + + + + + + + + + +
+ 132 + + Calories +
+ 6g + + Fat +
+ 18g + + Carbs +
+ 5g + + Protein +
+
+
+ +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ Nutrition Facts +
+ Servings Per Recipe 4 +
+ Calories 132 +
+ % Daily Value * +
+ Total Fat 6g + + 8% +
+ Saturated Fat 2g + + 10% +
+ Cholesterol 8mg + + 3% +
+ Sodium 185mg + + 8% +
+ Total Carbohydrate 18g + + 7% +
+ Dietary Fiber 3g + + 11% +
+ Total Sugars 12g +
+ Protein 5g + + 9% +
+ Vitamin C 98mg + + 109% +
+ Calcium 91mg + + 7% +
+ Iron 2mg + + 9% +
+ Potassium 414mg + + 9% +
+
+

+ * Percent Daily Values are based on a 2,000 calorie diet. Your daily values may be higher or lower depending on your calorie needs. +

+

+ ** Nutrient information is not available for all ingredients. Amount is based on available nutrient data. +

+

+ (-) Information is not currently available for this nutrient. If you are following a medically restrictive diet, please consult your doctor or registered dietitian before preparing this recipe for personal consumption. +

+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ +
+
+
+
+
+

+ You’ll Also Love +

+
+ + + + +
+
+
+ + + + + + + +
From 00dfc86eb9a24188085b6d73a2d9b7faa9923ec8 Mon Sep 17 00:00:00 2001
From: Gijs Kruitbosch
Date: Tue, 2 Dec 2025 23:13:14 +0000
Subject: [PATCH 2/2] Remove divs to improve noscript img/figure handling.

---
 Readability.js                              | 42 +++++++++++++++++++++
 test/test-pages/allrecipes-1/expected.html  | 24 +++---------
 test/test-pages/bug-1255978/expected.html   | 29 ++++----------
 test/test-pages/mozilla-1/expected.html     | 16 +++-----
 test/test-pages/simplyfound-1/expected.html |  2 +
 test/test-pages/yahoo-1/expected.html       |  2 +-
 6 files changed, 65 insertions(+), 50 deletions(-)

diff --git a/Readability.js b/Readability.js
index 5cff4540..aaa84482 100644
--- a/Readability.js
+++ b/Readability.js
@@ -1907,6 +1907,47 @@ Readability.prototype = {
     return false;
   },
 
+  _removeDeeplyNestedImageDivs() {
+    var doc = this._doc;
+    var nodes = Array.from(this._getAllNodesWithTag(doc, ["img"]));
+    for (var i = 0; i < nodes.length; i++) {
+      var node = nodes[i];
+      var parent = node.parentNode;
+      while (parent.tagName === "DIV" && !node.previousElementSibling) {
+        // If we've only got an image and potentially a noscript after it, with
+        // no other non-whitespace text content, we can unwrap the div.
+
+        // First check sibling elements. If there's a non-noscript el, or
+        // more stuff after that, we can't unwrap.
+        if (
+          node.nextElementSibling &&
+          (node.nextElementSibling.tagName !== "NOSCRIPT" ||
+            node.nextElementSibling.nextElementSibling)
+        ) {
+          break;
+        }
+        // Next, check for non-whitespace text content siblings.
+        let hasNoRealTextContent = !this._someNode(
+          parent.childNodes,
+          function (node) {
+            return (
+              node.nodeType === this.TEXT_NODE &&
+              this.REGEXPS.hasContent.test(node.textContent)
+            );
+          }
+        );
+        if (!hasNoRealTextContent) {
+          break;
+        }
+        while (parent.firstElementChild) {
+          parent.parentNode.insertBefore(parent.firstElementChild, parent);
+        }
+        parent.remove();
+        parent = node.parentNode;
+      }
+    }
+  },
+
   /**
    * Find all
-
-

Dish of roasted brussels sprouts with crumbled cheese and garnish

-
+ Dish of roasted brussels sprouts with crumbled cheese and garnish
@@ -106,9 +102,7 @@

Directions

  • Gather all ingredients. Preheat the oven to 400 degrees F (200 degrees C). Line a large baking sheet with aluminum foil.

    -
    -

    Ingredients for a recipe including Brussels sprouts oil honey cheese and seasonings arranged in bowls on a marble surface

    -
    + Ingredients for a recipe including Brussels sprouts oil honey cheese and seasonings arranged in bowls on a marble surface

    Allrecipes / Julia Hartbeck
@@ -119,9 +113,7 @@

    Directions

  • Place Brussels sprouts in a bowl. Add oil, hot honey, salt, and pepper. Stir to combine and transfer to the baking sheet.

    -
    -

    A baking sheet with halved Brussels sprouts arranged on foil prepared for cooking

    -
    + A baking sheet with halved Brussels sprouts arranged on foil prepared for cooking

    Allrecipes / Julia Hartbeck
@@ -132,9 +124,7 @@

    Directions

  • Roast in the preheated oven for 20 minutes.

    -
    -

    Roasted Brussels sprouts on a foillined baking sheet

    -
    + Roasted Brussels sprouts on a foillined baking sheet

    Allrecipes / Julia Hartbeck
@@ -145,9 +135,7 @@

    Directions

  • Transfer sprouts to a bowl. Top with feta and scallions. Toss to combine. Serve immediately.

    -
    -

    Bowl of roasted Brussels sprouts garnished with toppings and a spoon on the side

    -
    + Bowl of roasted Brussels sprouts garnished with toppings and a spoon on the side

    Allrecipes / Julia Hartbeck
diff --git a/test/test-pages/bug-1255978/expected.html b/test/test-pages/bug-1255978/expected.html
index 1a7eef1d..ff48580a 100644
--- a/test/test-pages/bug-1255978/expected.html
+++ b/test/test-pages/bug-1255978/expected.html
@@ -6,9 +6,7 @@

    But even luxury hotels aren’t always cleaned as often as they should be.

    Here are some of the secrets that the receptionist will never tell you when you check in, according to answers posted on Quora.

    -
    -

    bandb2.jpg

    -
    +

    bandb2.jpg

    Even posh hotels might not wash a blanket in between stays

    1. Take any blankets or duvets off the bed

@@ -28,10 +26,8 @@

    Duration Time 0:00

  • -

    Loaded: 0% -

    -

    Progress: 0% -

    +

    Loaded: 0%

    +

    Progress: 0%

    Remaining Time -0:00

@@ -43,25 +39,19 @@

    Video shows bed bug infestation at New York hotel

    -
    -

    hotel-door-getty.jpg

    -
    +

    hotel-door-getty.jpg

    Forrest Jones advised stuffing the peep hole with a strip of rolled up notepaper when not in use.

    2. Check the peep hole has not been tampered with

    This is not common, but can happen, Forrest Jones said. He advised stuffing the peep hole with a strip of rolled up notepaper when not in use. When someone knocks on the door, the paper can be removed to check who is there. If no one is visible, he recommends calling the front desk immediately. “I look forward to the day when I can tell you to choose only hotels where every employee who has access to guestroom keys is subjected to a complete public records background check, prior to hire, and every year or two thereafter. But for now, I can't,” he said.

    -
    -

    luggage-3.jpg

    -
    +

    luggage-3.jpg

    Put luggage on the floor

    3. Don’t use a wooden luggage rack

    Bedbugs love wood. Even though a wooden luggage rack might look nicer and more expensive than a metal one, it’s a breeding ground for bugs. Forrest Jones says guests should put the items they plan to take from bags on other pieces of furniture and leave the bag on the floor.

    -
    -

    Lifestyle-hotels.jpg

    -
    +

    Lifestyle-hotels.jpg

    The old rule of thumb is that for every 00 invested in a room, the hotel should charge in average daily rate

    4. Hotel rooms are priced according to how expensive they were to build

@@ -71,9 +61,7 @@

    5. Beware the wall-mounted hairdryer

    6. Mini bars almost always lose money

    Despite the snacks in the minibar seeming like the most overpriced food you have ever seen, hotel owners are still struggling to make a profit from those snacks. "Minibars almost always lose money, even when they charge $10 for a Diet Coke,” Sharon said.

    -
    -

    agenda7.jpg

    -
    +

    agenda7.jpg

    Towels should always be cleaned between stays

    7. Always made sure the hand towels are clean when you arrive

@@ -84,7 +72,6 @@

    6. Mini bars almost always lose money

  • Hotels
  • Hygiene
  • -

    Reuse content -

    +

    Reuse content

\ No newline at end of file
diff --git a/test/test-pages/mozilla-1/expected.html b/test/test-pages/mozilla-1/expected.html
index 5d66554d..4b771d57 100644
--- a/test/test-pages/mozilla-1/expected.html
+++ b/test/test-pages/mozilla-1/expected.html
@@ -10,8 +10,7 @@

    Designed to
    be redesigned

    Get fast and easy access to the features you use most in the new menu. Open the “Customize” panel to add, move or remove any button you want. Keep your favorite features — add-ons, private browsing, Sync and more — one quick click away.

    -

    -

    +

@@ -37,8 +36,8 @@

    Themes


    Learn more

    -

    Next

    -

    Preview of the currently selected theme +

    Next + Preview of the currently selected theme

@@ -55,19 +54,16 @@

    Add-ons


    Learn more

    -

    -

    +

    Awesome Bar

    Next

    The Awesome Bar learns as you browse to make your version of Firefox unique. Find and return to your favorite sites without having to remember a URL.

    -

    See what it can do for you -

    +

    See what it can do for you

    -

    Firefox Awesome Bar -

    +

    Firefox Awesome Bar

diff --git a/test/test-pages/simplyfound-1/expected.html b/test/test-pages/simplyfound-1/expected.html
index 15c9bf20..cc6ac81d 100644
--- a/test/test-pages/simplyfound-1/expected.html
+++ b/test/test-pages/simplyfound-1/expected.html
@@ -1,8 +1,10 @@

    The Raspberry Pi Foundation started by a handful of volunteers in 2012 when they released the original Raspberry Pi 256MB Model B without knowing what to expect.  In a short four-year period they have grown to over sixty full-time employees and have shipped over eight million units to-date.  Raspberry Pi has achieved new heights by being shipped to the International Space Station for research and by being an affordable computing platforms used by teachers throughout the world.  "It has become the all-time best-selling computer in the UK".

    +

    Raspberry Pi 3 - A credit card sized PC that only costs $35 - Image: Raspberry Pi Foundation

    Raspberry Pi Foundation is charity organization that pushes for a digital revolution with a mission to inspire kids to learn by creating computer-powered objects.  The foundation also helps teachers learn computing  skills through free training and readily available tutorials & example code for creating cool things such as music.

    +

    Raspberry Pi in educations - Image: Raspberry Pi Foundation

    In celebration of their 4th year anniversary, the foundation has released Raspberry Pi 3 with the same price tag of $35 USD.  The 3rd revision features a 1.2GHz 64-bit quad-core ARM CPU with integrated Bluetooth 4.1 and 802.11n wireless LAN chipsets.  The ARM Cortex-A53 CPU along with other architectural enhancements making it the fastest Raspberry Pi to-date.  The 3rd revision is reportedly about 50-60% times faster than its predecessor Raspberry Pi 2 and about 10 times faster then the original Raspberry PI.

    Raspberry Pi - Various Usage

diff --git a/test/test-pages/yahoo-1/expected.html b/test/test-pages/yahoo-1/expected.html
index a804e460..215a252b 100644
--- a/test/test-pages/yahoo-1/expected.html
+++ b/test/test-pages/yahoo-1/expected.html
@@ -2,7 +2,7 @@
    -

    The PlayStation VR

    + The PlayStation VR

    Sony’s PlayStation VR.
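The div-unwrapping rule that the second patch adds as `_removeDeeplyNestedImageDivs()` can be illustrated outside a browser. What follows is a minimal sketch, assuming a hypothetical plain-object tree in place of the DOM. `el`, `text`, `elementChildren`, `serialize` and `unwrapImageDivs` are illustrative names, not Readability APIs.

The sketch visits every image. While the image is the first element child of a `DIV`, it checks two conditions:

* At most one element follows the image, and that element is a `NOSCRIPT`.
* The `DIV` contains no non-whitespace text.

If both hold, the sketch hoists the `DIV`'s element children into the grandparent and drops the `DIV`. It differs from the patch in two ways. It tests text with `/\S/` rather than `REGEXPS.hasContent`. It also stops if a `DIV` has no parent.

```javascript
// Sketch only: a hypothetical plain-object tree stands in for the DOM.

// Build an element node and attach its children.
function el(tagName, ...children) {
  const node = { type: "element", tagName, children: [], parent: null };
  for (const child of children) {
    child.parent = node;
    node.children.push(child);
  }
  return node;
}

// Build a text node.
function text(value) {
  return { type: "text", value, parent: null };
}

// Element children only, like the DOM's `children` collection.
function elementChildren(node) {
  return node.children.filter((child) => child.type === "element");
}

// Unwrap DIVs around images, following the rule described above.
function unwrapImageDivs(root) {
  const images = [];
  (function collect(node) {
    if (node.type !== "element") return;
    if (node.tagName === "IMG") images.push(node);
    node.children.forEach(collect);
  })(root);

  for (const img of images) {
    let parent = img.parent;
    // Keep going while the image is the first element child of a DIV.
    while (parent && parent.tagName === "DIV" && elementChildren(parent)[0] === img) {
      const siblings = elementChildren(parent);
      const next = siblings[1];
      // At most one element may follow the image, and it must be a NOSCRIPT.
      if (next && (next.tagName !== "NOSCRIPT" || siblings.length > 2)) break;
      // Any non-whitespace text directly inside the DIV blocks unwrapping.
      if (parent.children.some((c) => c.type === "text" && /\S/.test(c.value))) break;
      const grandparent = parent.parent;
      if (!grandparent) break;
      // Replace the DIV with its element children; whitespace text goes with the DIV.
      const index = grandparent.children.indexOf(parent);
      for (const child of siblings) child.parent = grandparent;
      grandparent.children.splice(index, 1, ...siblings);
      parent = img.parent;
    }
  }
}

// Serialize for inspection (IMG treated as a void element).
function serialize(node) {
  if (node.type === "text") return node.value;
  const tag = node.tagName.toLowerCase();
  if (tag === "img") return "<img>";
  return `<${tag}>${node.children.map(serialize).join("")}</${tag}>`;
}

const nested = el("FIGURE", el("DIV", text("\n  "), el("DIV", el("IMG"), el("NOSCRIPT"))));
unwrapImageDivs(nested);
console.log(serialize(nested)); // <figure><img><noscript></noscript></figure>

const captioned = el("FIGURE", el("DIV", el("IMG"), text("Photo credit")));
unwrapImageDivs(captioned);
console.log(serialize(captioned)); // <figure><div><img>Photo credit</div></figure>
```

Only element children are moved, which mirrors the patch's `firstElementChild` loop. Whitespace-only text nodes inside an unwrapped `DIV` are discarded along with it.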