collective.html2blocks.utils#
collective.html2blocks.utils.blocks#
Utility functions for Volto block detection and layout construction.
This module provides helpers for identifying Volto blocks and assembling block layout information for use in Volto-based content management systems.
Example
from collective.html2blocks.utils import blocks
block = {"@type": "slate", "value": [...], ...}
blocks.info_from_blocks([block])
- collective.html2blocks.utils.blocks.info_from_blocks(raw_blocks: list[VoltoBlock]) VoltoBlocksInfo[source]#
Construct Volto blocks info and layout from a list of blocks.
This function generates unique IDs for each block and assembles them into the Volto blocks structure, including the layout order.
- Parameters:
raw_blocks (list[VoltoBlock]) -- List of Volto blocks to include.
- Returns:
Dictionary with 'blocks' and 'blocks_layout' keys.
- Return type:
VoltoBlocksInfo
Example
>>> blocks = [{"@type": "slate", "value": []}]
>>> info = info_from_blocks(blocks)
>>> print(info)
{'blocks': {'...uuid...': {...}}, 'blocks_layout': {'items': ['...uuid...']}}
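The shape of the returned structure can be illustrated with a minimal sketch. `build_blocks_info` is a hypothetical stand-in, not the library's implementation, and it assumes the unique IDs are random UUID strings, as the `'...uuid...'` placeholders above suggest.

```python
from uuid import uuid4


def build_blocks_info(raw_blocks: list[dict]) -> dict:
    """Hypothetical stand-in for info_from_blocks (not the library's code)."""
    blocks: dict[str, dict] = {}
    layout: list[str] = []
    for block in raw_blocks:
        block_id = str(uuid4())  # assumption: IDs are random UUID strings
        blocks[block_id] = block
        layout.append(block_id)  # layout keeps the input order
    return {"blocks": blocks, "blocks_layout": {"items": layout}}


info = build_blocks_info([{"@type": "slate", "value": []}, {"@type": "image"}])
```

Every ID listed in `blocks_layout["items"]` is also a key of `blocks`, and the list order matches the order of the input blocks.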
- collective.html2blocks.utils.blocks.is_volto_block(block: VoltoBlock | SlateBlockItem) bool[source]#
Check if the given block is a Volto block.
A Volto block is identified by the presence of the @type key.
- Parameters:
block (VoltoBlock | SlateBlockItem) -- The block to check.
- Returns:
True if the block is a Volto block, False otherwise.
- Return type:
bool
Example
>>> is_volto_block({"@type": "slate", "value": []})
True
>>> is_volto_block({"type": "p", "children": []})
False
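Going by the description above, the check may amount to a key lookup. The helper below is a hypothetical sketch under that assumption, not the library's code.

```python
def looks_like_volto_block(block: dict) -> bool:
    """Hypothetical equivalent of is_volto_block: test for the '@type' key."""
    return "@type" in block


volto = looks_like_volto_block({"@type": "slate", "value": []})
slate_item = looks_like_volto_block({"type": "p", "children": []})
```

Slate items use a plain `type` key, so they are not mistaken for Volto blocks.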
collective.html2blocks.utils.generator#
Generator utility for iterating over Slate item generators.
This module provides helpers for consuming generator-based block conversion
functions, filtering out None values, and returning final results.
Example
from collective.html2blocks.utils.generator import item_generator
def my_gen():
    yield {'type': 'p', 'children': []}
    yield None
    yield {'type': 'h1', 'children': []}
    return 'done'
result = list(item_generator(my_gen(), filter_none=True))
- collective.html2blocks.utils.generator.item_generator(gen: Generator[VoltoBlock | None, None, SlateBlockItem | None], filter_none: bool = True) Generator[VoltoBlock | None, None, SlateBlockItem | None][source]#
Yield items from a SlateItemGenerator, optionally filtering out None values.
This function consumes a generator, yielding each item. If filter_none is True, None values are skipped. When the generator is exhausted, its return value (carried by the StopIteration exception) is returned.
- Parameters:
gen (SlateItemGenerator) -- The generator to consume.
filter_none (bool, optional) -- If True, skip None values. Defaults to True.
- Yields:
SlateBlockItem -- Each item produced by the generator.
- Returns:
The value returned by the generator when exhausted.
- Return type:
Any
Example
def my_gen():
    yield {'type': 'p', 'children': []}
    yield None
    yield {'type': 'h1', 'children': []}
    return 'done'

result = list(item_generator(my_gen(), filter_none=True))
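Wrapping the call in `list(...)` collects the yielded items but discards the generator's return value. The sketch below shows one way the filtering and return-value forwarding described above could work. `consume` is a hypothetical re-implementation for illustration, not the library's code.

```python
from collections.abc import Generator
from typing import Any


def consume(gen: Generator, filter_none: bool = True) -> Generator[Any, None, Any]:
    """Hypothetical sketch of item_generator: re-yield items, forward the return value."""
    while True:
        try:
            item = next(gen)
        except StopIteration as exc:
            return exc.value  # the inner generator's `return` value
        if item is None and filter_none:
            continue  # skip None placeholders
        yield item


def my_gen():
    yield {"type": "p", "children": []}
    yield None
    yield {"type": "h1", "children": []}
    return "done"


# Drive the wrapper by hand so the final return value can be captured.
items = []
runner = consume(my_gen())
while True:
    try:
        items.append(next(runner))
    except StopIteration as exc:
        final = exc.value
        break
```

A caller that is itself a generator can capture the same value more compactly with `final = yield from consume(my_gen())`.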
collective.html2blocks.utils.inline#
Constants for inline and empty HTML elements in collective.html2blocks.
This module defines tuples of tag names that are considered inline elements or allowed to be empty when converting HTML to Volto blocks.
Example
from collective.html2blocks.utils.inline import ALLOW_EMPTY_ELEMENTS
from collective.html2blocks.utils.inline import INLINE_ELEMENTS
if tag in INLINE_ELEMENTS:
    # do something...
if tag in ALLOW_EMPTY_ELEMENTS:
    # do something...
- collective.html2blocks.utils.inline.ALLOW_EMPTY_ELEMENTS = ('br', 'hr')#
Tuple of tag names allowed to be empty elements.
These elements are permitted to have no content when converting HTML.
Example
if tag in ALLOW_EMPTY_ELEMENTS:
    # do something...
- collective.html2blocks.utils.inline.INLINE_ELEMENTS = ('b', 'br', 'code', 'em', 'i', 'link', 's', 'strong', 'sub', 'sup', 'u')#
Tuple of tag names considered inline elements.
These elements are treated as inline when converting HTML to Volto blocks.
Example
if tag in INLINE_ELEMENTS:
    # do something...
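A small sketch of how the two tuples might be used together. The tuple values are copied from the listings above so the snippet runs without the package installed. `describe_tag` is a hypothetical helper, not part of the library.

```python
# Values copied from the documented constants.
INLINE_ELEMENTS = ("b", "br", "code", "em", "i", "link", "s", "strong", "sub", "sup", "u")
ALLOW_EMPTY_ELEMENTS = ("br", "hr")


def describe_tag(tag: str) -> dict[str, bool]:
    """Hypothetical helper: report how a tag name is classified."""
    return {
        "inline": tag in INLINE_ELEMENTS,
        "may_be_empty": tag in ALLOW_EMPTY_ELEMENTS,
    }


br_info = describe_tag("br")
hr_info = describe_tag("hr")
p_info = describe_tag("p")
```

Note that `br` appears in both tuples, while `hr` may be empty but is not treated as inline.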
collective.html2blocks.utils.markup#
HTML markup utilities for collective.html2blocks.
This module provides functions for parsing, normalizing, and extracting information from HTML markup, including grouping inline elements, filtering, normalizing, and extracting table and style information.
Example
from collective.html2blocks.utils import markup
soup = markup.parse_source('<p>Hello <b>world</b></p>')
children = markup.all_children(soup)
- collective.html2blocks.utils.markup.all_children(element: PageElement | Tag, allow_tags: list[str] | None = None) list[PageElement][source]#
Return a list of all children of an element, optionally filtered by tag names.
- collective.html2blocks.utils.markup.cleanse_url(url: str) str[source]#
Clean up a URL by decoding HTML entities and normalizing.
- collective.html2blocks.utils.markup.css_classes(element: Tag) list[str][source]#
Return a list of CSS classes from an element.
- collective.html2blocks.utils.markup.extract_plaintext(element: Tag) str[source]#
Extract plaintext from an element, handling lists specially.
- Parameters:
element (Tag) -- The element to extract text from.
- Returns:
The extracted plaintext.
- Return type:
str
- collective.html2blocks.utils.markup.extract_rows_and_possible_blocks(table_element: Tag, tags_to_extract: list[str]) tuple[list[tuple[Tag, bool]], list[Tag]][source]#
Extract rows and possible blocks from a table element.
- collective.html2blocks.utils.markup.is_empty(tag: Tag | NavigableString) bool[source]#
Check if a tag or string is empty (not allowed or has no content).
- Parameters:
tag (Tag | NavigableString) -- The tag or string to check.
- Returns:
True if empty, False otherwise.
- Return type:
bool
- collective.html2blocks.utils.markup.is_ignorable(el: PageElement) bool[source]#
Check if an element is ignorable (empty string or allowed empty tag).
- Parameters:
el (PageElement) -- The element to check.
- Returns:
True if ignorable, False otherwise.
- Return type:
bool
- collective.html2blocks.utils.markup.is_inline(element: PageElement, include_span: bool = False) bool[source]#
Check if an element is considered inline.
- collective.html2blocks.utils.markup.parse_source(source: str, filter_: bool = True, group: bool = True, normalize: bool = True, block_level_tags: Iterable[str] = ()) Tag[source]#
Parse HTML source and return a normalized soup object.
- Parameters:
source (str) -- The HTML source to parse.
filter_ (bool, optional) -- Whether to filter children. Defaults to True.
group (bool, optional) -- Whether to group inline elements. Defaults to True.
normalize (bool, optional) -- Whether to normalize HTML. Defaults to True.
block_level_tags (Iterable[str], optional) -- Block-level tags. Defaults to ().
- Returns:
The parsed and normalized soup object.
- Return type:
Tag
Example
soup = parse_source("<p>Hello <b>world</b></p>")
- collective.html2blocks.utils.markup.styles(element: Tag) dict[source]#
Parse style attributes from an element into a dictionary.
- Parameters:
element (Tag) -- The element to parse styles from.
- Returns:
Dictionary of style properties.
- Return type:
dict
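As a rough illustration of turning a style attribute into a dictionary, the sketch below parses a raw `style` string. It is an assumption-laden stand-in: the library function takes a `Tag`, whereas the hypothetical `parse_style` takes the attribute value directly, and it does not handle semicolons inside values such as `url(...)`.

```python
def parse_style(style: str) -> dict[str, str]:
    """Hypothetical sketch: split 'name: value; ...' declarations into a dict."""
    result: dict[str, str] = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if not sep:
            continue  # no colon: empty or malformed declaration
        name = name.strip()
        if name:
            result[name] = value.strip()
    return result


parsed = parse_style("color: red; font-weight:bold;")
```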
collective.html2blocks.utils.slate#
Slate block utilities for collective.html2blocks.
This module provides functions for manipulating Slate block items, including wrapping, flattening, grouping, and normalizing block structures for Volto.
Example
from collective.html2blocks.utils import slate
block = slate.wrap_text('Hello world')
paragraph = slate.wrap_paragraph([block])
- collective.html2blocks.utils.slate.flatten_children(raw_block_children: list[SlateBlockItem | list]) list[SlateBlockItem][source]#
Flatten nested children lists into a single list of SlateBlockItems.
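One way to picture the flattening is a recursive walk that splices nested lists into their parent. `flatten` below is a hypothetical sketch, not the library's implementation, and it assumes lists may be nested to any depth.

```python
def flatten(children: list) -> list[dict]:
    """Hypothetical sketch of flatten_children: splice nested lists in order."""
    flat: list[dict] = []
    for child in children:
        if isinstance(child, list):
            flat.extend(flatten(child))  # recurse into nested lists
        else:
            flat.append(child)
    return flat


nested = [{"text": "a"}, [{"text": "b"}, [{"text": "c"}]], {"text": "d"}]
flat = flatten(nested)
```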
- collective.html2blocks.utils.slate.group_inline_nodes(block_children: list, tag_name: str = 'span') list[source]#
Group inline nodes together under a common tag.
- collective.html2blocks.utils.slate.group_text_blocks(block_children: list[SlateBlockItem]) list[SlateBlockItem][source]#
Group consecutive text blocks, preserving whitespace.
- collective.html2blocks.utils.slate.has_internal_block(block_children: list[SlateBlockItem]) bool[source]#
Check if any child is an inline block.
- collective.html2blocks.utils.slate.invalid_subblock(block: SlateBlockItem | VoltoBlock) bool[source]#
Check if a block should not be a child of a Slate block.
- Parameters:
block (SlateBlockItem | VoltoBlock) -- The block to check.
- Returns:
True if invalid, False otherwise.
- Return type:
bool
- collective.html2blocks.utils.slate.is_inline(value: SlateBlockItem | str) bool[source]#
Check if a block or string is considered inline.
- collective.html2blocks.utils.slate.is_simple_text(data: SlateBlockItem) bool[source]#
Check if a SlateBlockItem is simple text (only has a text key).
- Parameters:
data (SlateBlockItem) -- The block to check.
- Returns:
True if simple text, False otherwise.
- Return type:
bool
- collective.html2blocks.utils.slate.normalize_block_nodes(block_children: list, tag_name: str = 'span') list[source]#
Normalize block nodes, avoiding nested similar tags.
- collective.html2blocks.utils.slate.process_children(block: SlateBlockItem) SlateBlockItem[source]#
Ensure block children are not empty; add empty text if needed.
- Parameters:
block (SlateBlockItem) -- The block to process.
- Returns:
The processed block.
- Return type:
SlateBlockItem
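Slate expects every element node to have at least one child, which is presumably why an empty text node is added. The sketch below illustrates the idea. `ensure_children` is hypothetical, and whether the library mutates the block in place or returns a copy is not stated here, so the copy is an assumption.

```python
def ensure_children(block: dict) -> dict:
    """Hypothetical sketch of process_children: guarantee a non-empty children list."""
    if not block.get("children"):
        # Assumption: return a copy rather than mutating the input.
        block = {**block, "children": [{"text": ""}]}
    return block


empty = ensure_children({"type": "p", "children": []})
filled = ensure_children({"type": "p", "children": [{"text": "Hi"}]})
```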
- collective.html2blocks.utils.slate.process_top_level_items(raw_value: list[SlateBlockItem]) list[SlateBlockItem][source]#
Process and wrap top-level items as paragraphs where needed.
- collective.html2blocks.utils.slate.remove_empty_text(value: list[SlateBlockItem]) list[SlateBlockItem][source]#
Remove empty text blocks from a list of SlateBlockItems.
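A minimal sketch of the filtering, assuming "empty" means a node whose only key is `text` and whose value is the empty string. Whether whitespace-only text also counts as empty is not stated here. `drop_empty_text` is a hypothetical name.

```python
def drop_empty_text(value: list[dict]) -> list[dict]:
    """Hypothetical sketch of remove_empty_text under the assumption above."""
    return [
        item
        for item in value
        if not (set(item) == {"text"} and item["text"] == "")
    ]


cleaned = drop_empty_text([{"text": ""}, {"text": "Hello"}, {"text": " "}])
```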
- collective.html2blocks.utils.slate.table(rows: list[dict | str], css_classes: list[str], hide_headers: bool = False) dict[source]#
Construct a table block from rows and CSS classes.
- collective.html2blocks.utils.slate.table_cell(cell_type: str, value: SlateBlockItem) SlateBlockItem[source]#
Construct a table cell block.
- Parameters:
cell_type (str) -- The cell type (header or data).
value (SlateBlockItem) -- The cell value.
- Returns:
The table cell block.
- Return type:
SlateBlockItem
- collective.html2blocks.utils.slate.table_row(cells: list[SlateBlockItem]) SlateBlockItem[source]#
Construct a table row block from cells.
- Parameters:
cells (list[SlateBlockItem]) -- The row cells.
- Returns:
The table row block.
- Return type:
SlateBlockItem
- collective.html2blocks.utils.slate.wrap_paragraph(value: list[SlateBlockItem]) SlateBlockItem[source]#
Wrap a list of SlateBlockItems into a paragraph block.
- Parameters:
value (list[SlateBlockItem]) -- The children to wrap.
- Returns:
The paragraph block.
- Return type:
SlateBlockItem
Example
>>> wrap_paragraph([{'text': 'Hello'}])
{'type': 'p', 'children': [{'text': 'Hello'}]}
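The output above suggests a plain dictionary with a `type` of `p` and the given children. The hypothetical `paragraph` helper below reproduces that shape for experimentation without the package installed. It is a sketch, not the library's code.

```python
def paragraph(children: list[dict]) -> dict:
    """Hypothetical stand-in for wrap_paragraph, matching the documented output shape."""
    return {"type": "p", "children": children}


wrapped = paragraph([{"text": "Hello"}])
```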