
Commit d0d4109

Merge pull request #2562 from redis/DOC-6122-toc-metadata
DOC-6122 page table-of-contents metadata
2 parents d9be4bd + f8fe60e commit d0d4109

File tree

8 files changed, with 380 additions and 2 deletions

Lines changed: 119 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,119 @@
1+
# Implementation Notes: Table of Contents Metadata
2+
3+
## Overview
4+
5+
This document captures lessons learned from implementing auto-generated table of contents (TOC) metadata for Redis documentation pages. These insights should help guide future metadata feature implementations.
6+
7+
## Key Lessons
8+
9+
### 1. Start with Hugo's Built-in Functions
10+
11+
**Lesson**: Always check what Hugo provides before building custom solutions.
12+
13+
**Context**: The first attempts extracted headings manually from page content using custom partials. This was complex and error-prone, and it required parsing HTML/Markdown.
14+
15+
**Solution**: Hugo's `.TableOfContents` method already generates an HTML TOC from the page headings. Using it as the source was much simpler and more reliable.
16+
17+
**Takeaway**: For future metadata features, audit Hugo's built-in methods first. They often cover most of the problem with minimal code.
18+
19+
### 2. Regex Substitution for Format Conversion
20+
21+
**Lesson**: Simple regex transformations can convert between formats more reliably than complex parsing.
22+
23+
**Context**: Converting HTML to JSON seemed like it would require a full HTML parser or complex state machine.
24+
25+
**Solution**: Break the conversion into small, sequential regex steps:
26+
1. Remove wrapper elements (`<nav>`, `</nav>`)
27+
2. Replace structural tags (`<ul>` → `[`, `</ul>` → `]`)
28+
3. Replace content tags (`<li><a href="#ID">TITLE</a>` → `{"id":"ID","title":"TITLE"`)
29+
4. Add structural elements (commas, nested arrays)
30+
31+
**Takeaway**: For format conversions, think in terms of sequential substitution patterns rather than parsing. This is often simpler and more maintainable.
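
The sequential-substitution idea can be tried outside Hugo. The following is a minimal Python sketch of the same steps, not the Hugo partial itself. The function name `toc_html_to_json` and the sample HTML are assumptions for illustration; the sample only approximates the shape of `.TableOfContents` output.

```python
import json
import re

# Ordered (pattern, replacement) pairs, applied one after another.
STEPS = [
    (r"</?nav[^>]*>", ""),                    # remove the <nav> wrapper
    (r"\n\s*", ""),                           # remove newlines and indentation
    (r"<ul>", "["),                           # lists become arrays
    (r"</ul>", "]"),
    (r'<li><a href="#([^"]+)">([^<]+)</a>',   # links become open objects
     r'{"id":"\1","title":"\2"'),
    (r"</li>", "}"),                          # </li> closes the object
    (r'"\[', '","children":['),               # a nested list becomes "children"
    (r"\}\{", "},{"),                         # commas between sibling objects
]


def toc_html_to_json(html: str) -> dict:
    """Convert a simple TOC fragment to a dict by plain regex substitution."""
    for pattern, replacement in STEPS:
        html = re.sub(pattern, replacement, html)
    # json.loads raises if the substitutions did not produce valid JSON.
    return json.loads('{"sections":' + html + "}")


# Hypothetical sample input, shaped like a two-level table of contents.
SAMPLE = """<nav id="TableOfContents">
  <ul>
    <li><a href="#data-types">Data types</a>
      <ul>
        <li><a href="#strings">Strings</a></li>
        <li><a href="#lists">Lists</a></li>
      </ul>
    </li>
    <li><a href="#time-series">Time series</a></li>
  </ul>
</nav>"""

print(json.dumps(toc_html_to_json(SAMPLE), indent=2))
```

Parsing the result with `json.loads` at the end is a cheap guard: a heading whose markup does not fit the patterns (for example, inline tags inside the link text) would surface as a parse error rather than as silently malformed metadata.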
32+
33+
### 3. Hugo Template Whitespace Matters
34+
35+
**Lesson**: Hugo template whitespace and comments generate output that affects final formatting.
36+
37+
**Context**: Generated JSON had many blank lines, making it less readable.
38+
39+
**Solution**: Use Hugo's whitespace trimming markers (`{{-` and `-}}`) to prevent unwanted newlines.
40+
41+
**Takeaway**: When generating structured output (JSON, YAML), always consider whitespace. Test the final output, not just the template logic.
42+
43+
### 4. Markdown Templates Have Different Processing Rules
44+
45+
**Lesson**: Hugo's markdown template processor (`.md` files) behaves differently from HTML templates.
46+
47+
**Context**: Initial attempts to include metadata in markdown output failed because the template processor treated code blocks as boundaries.
48+
49+
**Solution**: Place metadata generation in the template itself, not in content blocks. Use the `safeHTML` function to prevent HTML entity escaping.
50+
51+
**Takeaway**: When targeting multiple output formats, test each format separately. Markdown templates have unique constraints that HTML templates don't have.
52+
53+
### 5. Validate Against Schema Early
54+
55+
**Lesson**: Create the schema before implementation, or at the latest alongside it, rather than once the work is finished.
56+
57+
**Context**: Schema was created last, after implementation was complete.
58+
59+
**Better approach**: Define the schema first, then implement to match it. This:
60+
- Clarifies the target structure
61+
- Enables validation during development
62+
- Provides documentation for implementers
63+
- Helps catch structural issues early
64+
65+
**Takeaway**: For future metadata features, write the schema first as a specification.
66+
67+
### 6. Test Multiple Page Types
68+
69+
**Lesson**: Metadata features must work across different page types with different content.
70+
71+
**Context**: Implementation was tested on data types pages and command pages, which have different metadata fields.
72+
73+
**Takeaway**: Always test on at least 2-3 different page types to ensure the feature is robust and handles optional fields correctly.
74+
75+
## Implementation Checklist for Future Metadata Features
76+
77+
When implementing new metadata features, follow this order:
78+
79+
1. **Define the schema** (`static/schemas/feature-name.json`)
80+
- Specify required and optional fields
81+
- Use JSON Schema Draft 7
82+
- Include examples
83+
84+
2. **Create documentation** (`build/metadata_docs/FEATURE_NAME_FORMAT.md`)
85+
- Explain the purpose and structure
86+
- Show examples
87+
- Document embedding locations (HTML, Markdown)
88+
89+
3. **Implement the feature**
90+
- Create/modify Hugo partials
91+
- Test on multiple page types
92+
- Verify output in both HTML and Markdown formats
93+
94+
4. **Validate the output**
95+
- Write validation scripts
96+
- Test against the schema
97+
- Check whitespace and formatting
98+
99+
5. **Document implementation notes**
100+
- Capture lessons learned
101+
- Note any workarounds or gotchas
102+
- Provide guidance for future similar features
103+
104+
## Common Gotchas
105+
106+
- **HTML entity escaping**: Use the `safeHTML` function when outputting HTML/JSON in markdown templates
107+
- **Whitespace in templates**: Use `{{-` and `-}}` to trim whitespace
108+
- **Nested structures**: Test deeply nested content to ensure regex patterns handle all cases
109+
- **Optional fields**: Remember that not all pages have all metadata fields
110+
- **Markdown vs HTML**: Always test both output formats
111+
112+
## Tools and Techniques
113+
114+
- **Hugo filters**: `replaceRE`, `jsonify`, `safeHTML`
115+
- **Validation**: Python's `jsonschema` library for schema validation
116+
- **Testing**: Extract metadata from generated files and validate against schema
117+
- **Debugging**: Use `grep` and `head` to inspect generated output
118+
119+
Lines changed: 108 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,108 @@
1+
# Page Metadata Format
2+
3+
## Overview
4+
5+
Redis documentation pages include AI-friendly metadata that helps AI agents understand page structure, content, and navigation. This metadata is automatically generated during the Hugo build process and embedded in both HTML and Markdown output formats.
6+
7+
## Metadata Structure
8+
9+
### Core Fields (Required)
10+
11+
- **`title`** (string, required): The page title
12+
- **`description`** (string, required): A brief description of the page content
13+
14+
### Navigation Fields
15+
16+
- **`tableOfContents`** (object): Hierarchical structure of page sections
17+
- **`sections`** (array): Array of top-level sections
18+
- **`id`** (string): Unique identifier matching the heading anchor ID
19+
- **`title`** (string): Display title of the section
20+
- **`children`** (array, optional): Nested subsections with the same structure
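
Because `children` repeats the structure of `sections`, a structural check is naturally recursive. The sketch below is a hand-written stand-in for illustration, not the published JSON schema. The name `toc_errors` is hypothetical, and treating `id` and `title` as mandatory strings is an assumption drawn from the field list above.

```python
def toc_errors(sections, path="tableOfContents.sections"):
    """Return a list of human-readable problems found in a sections array."""
    if not isinstance(sections, list):
        return [f"{path}: expected an array"]
    errors = []
    for index, section in enumerate(sections):
        here = f"{path}[{index}]"
        if not isinstance(section, dict):
            errors.append(f"{here}: expected an object")
            continue
        # Assumption: every section carries a string id and a string title.
        for field in ("id", "title"):
            if not isinstance(section.get(field), str):
                errors.append(f"{here}.{field}: expected a string")
        # children is optional and has the same shape as sections.
        if "children" in section:
            errors.extend(toc_errors(section["children"], f"{here}.children"))
    return errors


good = [{"id": "data-types", "title": "Data types",
         "children": [{"id": "strings", "title": "Strings"}]}]
bad = [{"id": "data-types"}]

print(toc_errors(good))  # []
print(toc_errors(bad))   # ['tableOfContents.sections[0].title: expected a string']
```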
21+
22+
### Categorization Fields
23+
24+
- **`categories`** (array): Category tags for the page (e.g., `["docs", "develop", "stack"]`)
25+
- **`scope`** (string): Scope or domain of the page content
26+
- **`topics`** (array): Related topics
27+
- **`relatedPages`** (array): Links to related documentation pages
28+
29+
### Command Reference Fields (for `/commands/` pages)
30+
31+
- **`arguments`** (array): Command arguments
32+
- **`syntax_fmt`** (string): Command syntax format
33+
- **`complexity`** (string): Time complexity of the command
34+
- **`group`** (string): Command group
35+
- **`command_flags`** (array): Flags associated with the command
36+
- **`acl_categories`** (array): ACL categories for the command
37+
- **`since`** (string): Redis version when the command was introduced
38+
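- **`arity`** (integer): Command arity as reported by Redis: the number of arguments including the command name, where a negative value indicates a minimum rather than an exact count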
39+
- **`key_specs`** (array): Key specifications for the command
40+
41+
## Example
42+
43+
```json
44+
{
45+
"title": "Redis data types",
46+
"description": "Overview of data types supported by Redis",
47+
"categories": ["docs", "develop", "stack", "oss"],
48+
"tableOfContents": {
49+
"sections": [
50+
{
51+
"id": "data-types",
52+
"title": "Data types",
53+
"children": [
54+
{"id": "strings", "title": "Strings"},
55+
{"id": "lists", "title": "Lists"},
56+
{"id": "sets", "title": "Sets"}
57+
]
58+
},
59+
{
60+
"id": "time-series",
61+
"title": "Time series"
62+
}
63+
]
64+
}
65+
}
66+
```
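
A consumer of this metadata might want the hierarchy as a flat outline. The following sketch shows one possible way to walk `tableOfContents`; the function name `flatten_toc` is an assumption for illustration, and the input is a shortened copy of the example above.

```python
def flatten_toc(sections, depth=0):
    """Yield (depth, id, title) for every section, parents before children."""
    for section in sections:
        yield depth, section["id"], section["title"]
        # children is optional, so fall back to an empty list.
        yield from flatten_toc(section.get("children", []), depth + 1)


metadata = {
    "title": "Redis data types",
    "tableOfContents": {
        "sections": [
            {"id": "data-types", "title": "Data types",
             "children": [{"id": "strings", "title": "Strings"},
                          {"id": "lists", "title": "Lists"}]},
            {"id": "time-series", "title": "Time series"},
        ]
    },
}

for depth, anchor, title in flatten_toc(metadata["tableOfContents"]["sections"]):
    print("  " * depth + f"{title} (#{anchor})")
# Data types (#data-types)
#   Strings (#strings)
#   Lists (#lists)
# Time series (#time-series)
```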
67+
68+
## Embedding
69+
70+
### HTML Output
71+
72+
Metadata is embedded in a `<script>` tag in the page header:
73+
74+
```html
75+
<script type="application/json" data-ai-metadata>
76+
{...metadata...}
77+
</script>
78+
```
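
For testing or for an agent reading the page, the embedded JSON can be pulled back out of the generated HTML. This is a minimal sketch that assumes the tag is emitted exactly as shown above, with the same attribute order and no minification; `metadata_from_html` is a hypothetical helper name.

```python
import json
import re

# Matches the script tag exactly as documented above (an assumption about
# the generated output; a minifier could change the attributes).
SCRIPT_RE = re.compile(
    r'<script type="application/json" data-ai-metadata>(.*?)</script>',
    re.DOTALL,
)


def metadata_from_html(html: str):
    """Return the embedded metadata as a dict, or None if the tag is absent."""
    match = SCRIPT_RE.search(html)
    return json.loads(match.group(1)) if match else None


page = (
    '<head><script type="application/json" data-ai-metadata>\n'
    '{"title": "Redis data types", "description": "Overview"}\n'
    "</script></head>"
)
print(metadata_from_html(page)["title"])  # Redis data types
```

A regex is enough here because the tag has a fixed form; for arbitrary HTML, a real parser would be the safer choice.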
79+
80+
### Markdown Output (`.html.md`)
81+
82+
Metadata is embedded in a JSON code block at the top of the page:
83+
84+
````markdown
85+
```json metadata
86+
{...metadata...}
87+
```
88+
````
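
The Markdown variant can be read back in a similar way. The sketch below assumes the fence info string is exactly `json metadata` and that the closing fence sits on its own line; `metadata_from_markdown` is a hypothetical helper name.

```python
import json
import re

# Built indirectly so this example can itself sit inside a fenced block.
FENCE = "`" * 3

# Assumption: opening fence is followed by "json metadata", closing fence
# stands alone on a line.
FENCE_RE = re.compile(
    rf"^{FENCE}json metadata\n(.*?)\n{FENCE}$",
    re.DOTALL | re.MULTILINE,
)


def metadata_from_markdown(markdown: str):
    """Return the metadata block as a dict, or None if no block is found."""
    match = FENCE_RE.search(markdown)
    return json.loads(match.group(1)) if match else None


doc = (
    f"{FENCE}json metadata\n"
    '{\n  "title": "Redis data types"\n}\n'
    f"{FENCE}\n\n# Redis data types\n"
)
print(metadata_from_markdown(doc))  # {'title': 'Redis data types'}
```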
89+
90+
## Auto-Generation
91+
92+
The `tableOfContents` is automatically generated from page headings using Hugo's built-in `.TableOfContents` method. The HTML structure is converted to JSON using regex substitutions in the `layouts/partials/toc-json-regex.html` partial.
93+
94+
## Schema
95+
96+
The complete JSON schema is available at: `https://redis.io/schemas/page-metadata.json`
97+
98+
This schema enables:
99+
- Validation of metadata structure
100+
- IDE autocomplete and type checking
101+
- AI agent understanding of page structure
102+
- Consistent metadata across all pages
103+
104+
## Notes
105+
106+
- The in-page JSON metadata does **not** include a `$schema` reference. The schema is available separately for validation and documentation purposes.
107+
- The metadata is auto-generated during the Hugo build process and does not require manual maintenance.
108+

layouts/_default/section.md

Lines changed: 2 additions & 1 deletion
@@ -16,7 +16,8 @@
1616
"key_specs": {{ .Params.key_specs | jsonify }}{{ end }}{{ if .Params.topics }},
1717
"topics": {{ .Params.topics | jsonify }}{{ end }}{{ if .Params.relatedPages }},
1818
"relatedPages": {{ .Params.relatedPages | jsonify }}{{ end }}{{ if .Params.scope }},
19-
"scope": {{ .Params.scope | jsonify }}{{ end }}
19+
"scope": {{ .Params.scope | jsonify }}{{ end }},
20+
"tableOfContents": {{ partial "toc-json-regex.html" . }}
2021
}
2122
```
2223

layouts/_default/single.md

Lines changed: 2 additions & 1 deletion
@@ -16,7 +16,8 @@
1616
"key_specs": {{ .Params.key_specs | jsonify }}{{ end }}{{ if .Params.topics }},
1717
"topics": {{ .Params.topics | jsonify }}{{ end }}{{ if .Params.relatedPages }},
1818
"relatedPages": {{ .Params.relatedPages | jsonify }}{{ end }}{{ if .Params.scope }},
19-
"scope": {{ .Params.scope | jsonify }}{{ end }}
19+
"scope": {{ .Params.scope | jsonify }}{{ end }},
20+
"tableOfContents": {{ partial "toc-json-regex.html" . }}
2021
}
2122
```
2223

layouts/partials/ai-metadata-body.html

Lines changed: 3 additions & 0 deletions
@@ -37,5 +37,8 @@
3737
{{- $metadata = merge $metadata (dict "scope" .Params.scope) -}}
3838
{{- end -}}
3939
{{- $json := $metadata | jsonify -}}
40+
{{- $toc := partial "toc-json-regex.html" . -}}
41+
{{- /* Manually insert tableOfContents into JSON string */ -}}
42+
{{- $json = $json | replaceRE `}$` (printf `,"tableOfContents":%s}` $toc) -}}
4043
{{- printf `<div hidden data-redis-metadata="page">%s</div>` $json | safeHTML -}}
4144

layouts/partials/ai-metadata.html

Lines changed: 3 additions & 0 deletions
@@ -37,5 +37,8 @@
3737
{{- $metadata = merge $metadata (dict "scope" .Params.scope) -}}
3838
{{- end -}}
3939
{{- $json := $metadata | jsonify -}}
40+
{{- $toc := partial "toc-json-regex.html" . -}}
41+
{{- /* Manually insert tableOfContents into JSON string */ -}}
42+
{{- $json = $json | replaceRE `}$` (printf `,"tableOfContents":%s}` $toc) -}}
4043
{{- printf `<script type="application/json" data-ai-metadata>%s</script>` $json | safeHTML -}}
4144

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
{{- /* Convert Hugo's HTML table of contents to JSON using regex substitutions */ -}}
2+
{{- /* Input: page object */ -}}
3+
{{- /* Output: JSON structure representing the table of contents */ -}}
4+
{{- $toc := .TableOfContents -}}
5+
{{- /* Remove the nav wrapper and all newlines/extra whitespace */ -}}
6+
{{- $toc = $toc | replaceRE "<nav[^>]*>" "" -}}
7+
{{- $toc = $toc | replaceRE "</nav>" "" -}}
8+
{{- $toc = $toc | replaceRE "\\n\\s*" "" -}}
9+
{{- /* Step 1: Replace <ul> with opening bracket for children array */ -}}
10+
{{- $toc = $toc | replaceRE "<ul>" "[" -}}
11+
{{- $toc = $toc | replaceRE "</ul>" "]" -}}
12+
{{- /* Step 2: Replace <li><a href="#ID">TITLE</a> with {"id":"ID","title":"TITLE" */ -}}
13+
{{- $toc = $toc | replaceRE "<li><a href=\"#([^\"]+)\">([^<]+)</a>" "{\"id\":\"$1\",\"title\":\"$2\"" -}}
14+
{{- /* Step 3: Replace </li> with } */ -}}
15+
{{- $toc = $toc | replaceRE "</li>" "}" -}}
16+
{{- /* Step 4: Handle nested structure - replace ][ with ],[ (sibling arrays) */ -}}
17+
{{- $toc = $toc | replaceRE "\\]\\[" "],[" -}}
18+
{{- /* Step 4b: Handle nested structure - replace "[ (a title's closing quote followed by a child list) with ","children":[ */ -}}
19+
{{- $toc = $toc | replaceRE "\"\\[" "\",\"children\":[" -}}
20+
{{- /* Step 5: Add commas between sibling objects - replace }{ with },{ */ -}}
21+
{{- $toc = $toc | replaceRE "\\}\\{" "},{" -}}
22+
{{- /* Step 6: Wrap the entire structure */ -}}
23+
{{- $toc = print "{\"sections\":" $toc "}" -}}
24+
{{- /* Step 7: Remove all newlines and extra whitespace for clean output */ -}}
25+
{{- $toc = $toc | replaceRE "\\n\\s*" "" -}}
26+
{{- /* Output the JSON - use safeHTML to prevent quote escaping */ -}}
27+
{{- $toc | safeHTML -}}
28+
