collective.html2blocks.utils

collective.html2blocks.utils.blocks

Utility functions for Volto block detection and layout construction.

This module provides helpers for identifying Volto blocks and assembling block layout information for use in Volto-based content management systems.

Example

from collective.html2blocks.utils import blocks

block = {"@type": "slate", "value": [...]}  # other keys omitted
blocks.info_from_blocks([block])
collective.html2blocks.utils.blocks.info_from_blocks(raw_blocks: list[VoltoBlock]) -> VoltoBlocksInfo

Construct Volto blocks info and layout from a list of blocks.

This function generates unique IDs for each block and assembles them into the Volto blocks structure, including the layout order.

Parameters:

raw_blocks (list[VoltoBlock]) -- List of Volto blocks to include.

Returns:

Dictionary with 'blocks' and 'blocks_layout' keys.

Return type:

VoltoBlocksInfo

Example

>>> blocks = [{"@type": "slate", "value": []}]
>>> info = info_from_blocks(blocks)
>>> print(info)
{'blocks': {'...uuid...': {...}}, 'blocks_layout': {'items': ['...uuid...']}}
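
The assembly step described above can be sketched in a few lines. This is a minimal stand-in, not the library's implementation: the name sketch_info_from_blocks is hypothetical, and the use of uuid4 strings as block IDs is an assumption suggested by the '...uuid...' placeholders in the example output.

```python
from uuid import uuid4


def sketch_info_from_blocks(raw_blocks: list[dict]) -> dict:
    """Hypothetical stand-in for info_from_blocks (assumes uuid4 string IDs)."""
    blocks = {}
    layout = []
    for block in raw_blocks:
        block_id = str(uuid4())  # assumption: one fresh UUID per block
        blocks[block_id] = block
        layout.append(block_id)  # the layout keeps the input order
    return {"blocks": blocks, "blocks_layout": {"items": layout}}


info = sketch_info_from_blocks([{"@type": "slate", "value": []}])
```

Every ID listed under 'blocks_layout' is also a key of 'blocks', so the layout can be used to look blocks up in display order.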
collective.html2blocks.utils.blocks.is_volto_block(block: VoltoBlock | SlateBlockItem) -> bool

Check if the given block is a Volto block.

A Volto block is identified by the presence of the @type key.

Parameters:

block (VoltoBlock | SlateBlockItem) -- The block to check.

Returns:

True if the block is a Volto block, False otherwise.

Return type:

bool

Example

>>> is_volto_block({"@type": "slate", "value": []})
True
>>> is_volto_block({"type": "p", "children": []})
False
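
The @type rule makes it easy to split a mixed list into Volto blocks and Slate items. The sketch below assumes the check is a plain key lookup; sketch_is_volto_block is a hypothetical stand-in rather than the library function.

```python
def sketch_is_volto_block(block: dict) -> bool:
    """Hypothetical stand-in: a Volto block is any mapping with an '@type' key."""
    return "@type" in block


items = [
    {"@type": "slate", "value": []},
    {"type": "p", "children": []},
]
# Partition the mixed list using the '@type' rule.
volto_blocks = [item for item in items if sketch_is_volto_block(item)]
slate_items = [item for item in items if not sketch_is_volto_block(item)]
```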

collective.html2blocks.utils.generator

Generator utility for iterating over Slate item generators.

This module provides helpers for consuming generator-based block conversion functions, filtering out None values, and returning final results.

Example

from collective.html2blocks.utils.generator import item_generator

def my_gen():
    yield {'type': 'p', 'children': []}
    yield None
    yield {'type': 'h1', 'children': []}
    return 'done'

result = list(item_generator(my_gen(), filter_none=True))
collective.html2blocks.utils.generator.item_generator(gen: Generator[VoltoBlock | None, None, SlateBlockItem | None], filter_none: bool = True) -> Generator[VoltoBlock | None, None, SlateBlockItem | None]

Yield items from a SlateItemGenerator, optionally filtering out None values.

This function consumes a generator and yields each item it produces. If filter_none is True, None values are skipped. When the generator is exhausted, its return value (carried by the StopIteration exception) becomes the return value of item_generator.

Parameters:
  • gen (SlateItemGenerator) -- The generator to consume.

  • filter_none (bool, optional) -- If True, skip None values. Defaults to True.

Yields:

SlateBlockItem -- Each item produced by the generator.

Returns:

The value returned by the generator when exhausted.

Return type:

SlateBlockItem | None

Example

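from collective.html2blocks.utils.generator import item_generator

def my_gen():
    yield {'type': 'p', 'children': []}
    yield None  # skipped when filter_none=True
    yield {'type': 'h1', 'children': []}
    return 'done'  # carried by StopIteration, not yielded

# list() collects the yielded items and discards the return value
result = list(item_generator(my_gen(), filter_none=True))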
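
Because list() discards a generator's return value, reading it requires catching StopIteration. The sketch below shows one way to do that. It uses a hypothetical stand-in, sketch_item_generator, written only from the behavior described above, so it is an illustration and not the library's code.

```python
from collections.abc import Generator
from typing import Any


def sketch_item_generator(gen: Generator, filter_none: bool = True) -> Generator[Any, None, Any]:
    """Hypothetical stand-in for item_generator, based on its documented behavior."""
    while True:
        try:
            item = next(gen)
        except StopIteration as exc:
            # A generator's `return x` surfaces as StopIteration.value.
            return exc.value
        if item is None and filter_none:
            continue  # skip None values when filtering is enabled
        yield item


def my_gen():
    yield {"type": "p", "children": []}
    yield None
    yield {"type": "h1", "children": []}
    return "done"


items = []
wrapped = sketch_item_generator(my_gen())
while True:
    try:
        items.append(next(wrapped))
    except StopIteration as exc:
        final = exc.value  # the wrapped generator's return value
        break
```

Inside another generator, `final = yield from wrapped` would capture the same value without the explicit loop.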

collective.html2blocks.utils.inline

Constants for inline and empty HTML elements in collective.html2blocks.

This module defines tuples of tag names that are considered inline elements or allowed to be empty when converting HTML to Volto blocks.

Example

from collective.html2blocks.utils.inline import ALLOW_EMPTY_ELEMENTS
from collective.html2blocks.utils.inline import INLINE_ELEMENTS

tag = "br"  # example tag name
if tag in INLINE_ELEMENTS:
    pass  # do something...
if tag in ALLOW_EMPTY_ELEMENTS:
    pass  # do something...
collective.html2blocks.utils.inline.ALLOW_EMPTY_ELEMENTS = ('br', 'hr')

Tuple of tag names allowed to be empty elements.

These elements are permitted to have no content when converting HTML.

Example

tag = "hr"  # example tag name
if tag in ALLOW_EMPTY_ELEMENTS:
    pass  # do something...
collective.html2blocks.utils.inline.INLINE_ELEMENTS = ('b', 'br', 'code', 'em', 'i', 'link', 's', 'strong', 'sub', 'sup', 'u')

Tuple of tag names considered inline elements.

These elements are treated as inline when converting HTML to Volto blocks.

Example

tag = "strong"  # example tag name
if tag in INLINE_ELEMENTS:
    pass  # do something...
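
A membership test against these tuples is enough to classify tag names. The sketch below copies the two tuples from the values documented above so that it runs without the package installed; the tags list is made-up sample input.

```python
# Values copied from the constants documented above.
INLINE_ELEMENTS = ("b", "br", "code", "em", "i", "link", "s", "strong", "sub", "sup", "u")
ALLOW_EMPTY_ELEMENTS = ("br", "hr")

tags = ["p", "strong", "hr", "em", "div"]  # hypothetical sample input
inline_tags = [t for t in tags if t in INLINE_ELEMENTS]
may_be_empty = [t for t in tags if t in ALLOW_EMPTY_ELEMENTS]
```

Note that br appears in both tuples, so it is both inline and allowed to be empty.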

collective.html2blocks.utils.markup

HTML markup utilities for collective.html2blocks.

This module provides functions for parsing, normalizing, and extracting information from HTML markup, including grouping inline elements, filtering, normalizing, and extracting table and style information.

Example

from collective.html2blocks.utils import markup
soup = markup.parse_source('<p>Hello <b>world</b></p>')
children = markup.all_children(soup)
collective.html2blocks.utils.markup.all_children(element: PageElement | Tag, allow_tags: list[str] | None = None) -> list[PageElement]

Return a list of all children of an element, optionally filtered by tag names.

Parameters:
  • element (PageElement | Tag) -- The element to get children from.

  • allow_tags (list[str], optional) -- List of tag names to include. Defaults to None.

Returns:

List of child elements.

Return type:

list[PageElement]

collective.html2blocks.utils.markup.cleanse_url(url: str) -> str

Clean up a URL by decoding HTML entities and normalizing.

Parameters:

url (str) -- The URL to clean.

Returns:

The cleansed URL.

Return type:

str
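
The entity-decoding part of this step can be illustrated with the standard library. The sketch below is a hypothetical stand-in: the documentation does not say what "normalizing" covers, so trimming surrounding whitespace is an assumption made for the example.

```python
from html import unescape


def sketch_cleanse_url(url: str) -> str:
    """Hypothetical stand-in: decode HTML entities, then trim whitespace."""
    # Assumption: normalization here means stripping surrounding whitespace only.
    return unescape(url).strip()


cleaned = sketch_cleanse_url(" https://example.com/?a=1&amp;b=2 ")
```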

collective.html2blocks.utils.markup.css_classes(element: Tag) -> list[str]

Return a list of CSS classes from an element.

Parameters:

element (Tag) -- The element to get classes from.

Returns:

List of CSS class names.

Return type:

list[str]

collective.html2blocks.utils.markup.extract_plaintext(element: Tag) -> str

Extract plaintext from an element, handling lists specially.

Parameters:

element (Tag) -- The element to extract text from.

Returns:

The extracted plaintext.

Return type:

str

collective.html2blocks.utils.markup.extract_rows_and_possible_blocks(table_element: Tag, tags_to_extract: list[str]) -> tuple[list[tuple[Tag, bool]], list[Tag]]

Extract rows and possible blocks from a table element.

Parameters:
  • table_element (Tag) -- The table element to process.

  • tags_to_extract (list[str]) -- List of tag names to extract.

Returns:

Rows and extracted blocks.

Return type:

tuple[list[tuple[Tag, bool]], list[Tag]]

collective.html2blocks.utils.markup.is_empty(tag: Tag | NavigableString) -> bool

Check if a tag or string is empty (not allowed or has no content).

Parameters:

tag (Tag | NavigableString) -- The tag or string to check.

Returns:

True if empty, False otherwise.

Return type:

bool

collective.html2blocks.utils.markup.is_ignorable(el: PageElement) -> bool

Check if an element is ignorable (empty string or allowed empty tag).

Parameters:

el (PageElement) -- The element to check.

Returns:

True if ignorable, False otherwise.

Return type:

bool

collective.html2blocks.utils.markup.is_inline(element: PageElement, include_span: bool = False) -> bool

Check if an element is considered inline.

Parameters:
  • element (PageElement) -- The element to check.

  • include_span (bool, optional) -- Whether to treat span as inline. Defaults to False.

Returns:

True if inline, False otherwise.

Return type:

bool

collective.html2blocks.utils.markup.parse_source(source: str, filter_: bool = True, group: bool = True, normalize: bool = True, block_level_tags: Iterable[str] = ()) -> Tag

Parse HTML source and return a normalized soup object.

Parameters:
  • source (str) -- The HTML source to parse.

  • filter_ (bool, optional) -- Whether to filter children. Defaults to True.

  • group (bool, optional) -- Whether to group inline elements. Defaults to True.

  • normalize (bool, optional) -- Whether to normalize HTML. Defaults to True.

  • block_level_tags (Iterable[str], optional) -- Block-level tags. Defaults to ().

Returns:

The parsed and normalized soup object.

Return type:

Tag

Example

soup = parse_source("<p>Hello <b>world</b></p>")
collective.html2blocks.utils.markup.styles(element: Tag) -> dict

Parse style attributes from an element into a dictionary.

Parameters:

element (Tag) -- The element to parse styles from.

Returns:

Dictionary of style properties.

Return type:

dict
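
As an illustration of the kind of parsing involved, the sketch below turns a raw style attribute string into a dictionary. It is a hypothetical helper that works on a plain string instead of a Tag, and lowercasing property names is an assumption made for the example.

```python
def sketch_parse_style(style: str) -> dict[str, str]:
    """Hypothetical helper: split 'prop: value; ...' into a dictionary."""
    result = {}
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue  # skip empty or malformed declarations
        prop, _, value = declaration.partition(":")
        result[prop.strip().lower()] = value.strip()
    return result


parsed = sketch_parse_style("color: red; font-weight: bold;")
```

Splitting on ';' is naive: a value that itself contains a semicolon, such as a data URL, would be cut short.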

collective.html2blocks.utils.markup.table_cell_type(cell: Tag, is_header: bool = False) -> str

Get the type of a table cell (header or data).

Parameters:
  • cell (Tag) -- The table cell element.

  • is_header (bool, optional) -- Whether the cell is a header. Defaults to False.

Returns:

The cell type: header or data.

Return type:

str

collective.html2blocks.utils.markup.url_from_iframe(element: Tag) -> str

Parse an iframe element and return its src URL.

Parameters:

element (Tag) -- The iframe element.

Returns:

The src URL of the iframe.

Return type:

str

collective.html2blocks.utils.slate

Slate block utilities for collective.html2blocks.

This module provides functions for manipulating Slate block items, including wrapping, flattening, grouping, and normalizing block structures for Volto.

Example

from collective.html2blocks.utils import slate
block = slate.wrap_text('Hello world')
paragraph = slate.wrap_paragraph([block])
collective.html2blocks.utils.slate.flatten_children(raw_block_children: list[SlateBlockItem | list]) -> list[SlateBlockItem]

Flatten nested children lists into a single list of SlateBlockItems.

Parameters:

raw_block_children (list[SlateBlockItem | list]) -- The children to flatten.

Returns:

The flattened list.

Return type:

list[SlateBlockItem]
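
Flattening of this kind is usually a short recursion. The sketch below is a hypothetical stand-in that assumes nesting can be arbitrarily deep and that item order is preserved; the library function may differ in both respects.

```python
def sketch_flatten(children: list) -> list[dict]:
    """Hypothetical stand-in: flatten nested lists of Slate items, keeping order."""
    flat = []
    for child in children:
        if isinstance(child, list):
            flat.extend(sketch_flatten(child))  # recurse into nested lists
        else:
            flat.append(child)
    return flat


nested = [{"text": "a"}, [{"text": "b"}, [{"text": "c"}]]]
flat = sketch_flatten(nested)
```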

collective.html2blocks.utils.slate.group_inline_nodes(block_children: list, tag_name: str = 'span') -> list

Group inline nodes together under a common tag.

Parameters:
  • block_children (list) -- The nodes to group.

  • tag_name (str, optional) -- The tag name to use. Defaults to 'span'.

Returns:

The grouped nodes.

Return type:

list

collective.html2blocks.utils.slate.group_text_blocks(block_children: list[SlateBlockItem]) -> list[SlateBlockItem]

Group consecutive text blocks, preserving whitespace.

Parameters:

block_children (list[SlateBlockItem]) -- The blocks to group.

Returns:

The grouped blocks.

Return type:

list[SlateBlockItem]

collective.html2blocks.utils.slate.has_internal_block(block_children: list[SlateBlockItem]) -> bool

Check if any child is an inline block.

Parameters:

block_children (list[SlateBlockItem]) -- The children to check.

Returns:

True if any child is inline, False otherwise.

Return type:

bool

collective.html2blocks.utils.slate.invalid_subblock(block: SlateBlockItem | VoltoBlock) -> bool

Check if a block should not be a child of a Slate block.

Parameters:

block (SlateBlockItem | VoltoBlock) -- The block to check.

Returns:

True if invalid, False otherwise.

Return type:

bool

collective.html2blocks.utils.slate.is_inline(value: SlateBlockItem | str) -> bool

Check if a block or string is considered inline.

Parameters:

value (SlateBlockItem | str) -- The value to check.

Returns:

True if inline, False otherwise.

Return type:

bool

collective.html2blocks.utils.slate.is_simple_text(data: SlateBlockItem) -> bool

Check if a SlateBlockItem is simple text (only has text key).

Parameters:

data (SlateBlockItem) -- The block to check.

Returns:

True if simple text, False otherwise.

Return type:

bool
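
The "only has text key" rule can be written as a comparison of key sets. The sketch below is a hypothetical stand-in that follows the description literally; it is not taken from the library source.

```python
def sketch_is_simple_text(data: dict) -> bool:
    """Hypothetical stand-in: True when 'text' is the item's only key."""
    return set(data) == {"text"}


plain = sketch_is_simple_text({"text": "Hello"})
styled = sketch_is_simple_text({"text": "Hello", "bold": True})
element = sketch_is_simple_text({"type": "p", "children": []})
```

Under this reading, a text node that carries formatting marks does not count as simple text.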

collective.html2blocks.utils.slate.normalize_block_nodes(block_children: list, tag_name: str = 'span') -> list

Normalize block nodes, avoiding nested similar tags.

Parameters:
  • block_children (list) -- The block nodes to normalize.

  • tag_name (str, optional) -- The tag name to use. Defaults to 'span'.

Returns:

The normalized nodes.

Return type:

list

collective.html2blocks.utils.slate.process_children(block: SlateBlockItem) -> SlateBlockItem

Ensure block children are not empty; add empty text if needed.

Parameters:

block (SlateBlockItem) -- The block to process.

Returns:

The processed block.

Return type:

SlateBlockItem
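
A minimal sketch of this guarantee follows. It is a hypothetical stand-in with two assumptions not confirmed by the documentation: the placeholder child is {"text": ""}, and the block is modified in place before being returned.

```python
def sketch_process_children(block: dict) -> dict:
    """Hypothetical stand-in: guarantee the block has at least one child."""
    if not block.get("children"):
        # Assumption: an empty text node is the placeholder child.
        block["children"] = [{"text": ""}]
    return block


empty = sketch_process_children({"type": "p", "children": []})
filled = sketch_process_children({"type": "p", "children": [{"text": "Hi"}]})
```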

collective.html2blocks.utils.slate.process_top_level_items(raw_value: list[SlateBlockItem]) -> list[SlateBlockItem]

Process and wrap top-level items as paragraphs where needed.

Parameters:

raw_value (list[SlateBlockItem]) -- The items to process.

Returns:

The processed items.

Return type:

list[SlateBlockItem]

collective.html2blocks.utils.slate.random() -> x in the interval [0, 1).

collective.html2blocks.utils.slate.remove_empty_text(value: list[SlateBlockItem]) -> list[SlateBlockItem]

Remove empty text blocks from a list of SlateBlockItems.

Parameters:

value (list[SlateBlockItem]) -- The items to filter.

Returns:

The filtered items.

Return type:

list[SlateBlockItem]
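
The filtering can be sketched as a list comprehension. This hypothetical stand-in assumes that "empty" means exactly {"text": ""}, so whitespace-only text is kept; the library may draw the line differently.

```python
def sketch_remove_empty_text(value: list[dict]) -> list[dict]:
    """Hypothetical stand-in: drop items that are exactly {'text': ''}."""
    return [item for item in value if item != {"text": ""}]


filtered = sketch_remove_empty_text(
    [{"text": ""}, {"text": "Hello"}, {"text": " "}, {"type": "p", "children": []}]
)
```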

collective.html2blocks.utils.slate.table(rows: list[dict | str], css_classes: list[str], hide_headers: bool = False) -> dict

Construct a table block from rows and CSS classes.

Parameters:
  • rows (list[dict | str]) -- The table rows.

  • css_classes (list[str]) -- CSS classes for styling.

  • hide_headers (bool, optional) -- Whether to hide headers. Defaults to False.

Returns:

The table block.

Return type:

dict

collective.html2blocks.utils.slate.table_cell(cell_type: str, value: SlateBlockItem) -> SlateBlockItem

Construct a table cell block.

Parameters:
  • cell_type (str) -- The cell type (header or data).

  • value (SlateBlockItem) -- The cell value.

Returns:

The table cell block.

Return type:

SlateBlockItem

collective.html2blocks.utils.slate.table_row(cells: list[SlateBlockItem]) -> SlateBlockItem

Construct a table row block from cells.

Parameters:

cells (list[SlateBlockItem]) -- The row cells.

Returns:

The table row block.

Return type:

SlateBlockItem

collective.html2blocks.utils.slate.wrap_paragraph(value: list[SlateBlockItem]) -> SlateBlockItem

Wrap a list of SlateBlockItems into a paragraph block.

Parameters:

value (list[SlateBlockItem]) -- The children to wrap.

Returns:

The paragraph block.

Return type:

SlateBlockItem

Example

>>> wrap_paragraph([{'text': 'Hello'}])
{'type': 'p', 'children': [{'text': 'Hello'}]}
collective.html2blocks.utils.slate.wrap_text(value: str) -> SlateBlockItem

Wrap a string value into a SlateBlockItem with text.

Parameters:

value (str) -- The string to wrap.

Returns:

The wrapped text block.

Return type:

SlateBlockItem

Example

>>> wrap_text('Hello')
{'text': 'Hello'}
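
The two wrappers compose naturally: wrap a string as a text node, then wrap that node in a paragraph. The sketch below uses hypothetical stand-ins written only to reproduce the outputs shown in the wrap_text and wrap_paragraph examples, so it runs without the package installed.

```python
def sketch_wrap_text(value: str) -> dict:
    """Hypothetical stand-in mirroring the documented wrap_text output."""
    return {"text": value}


def sketch_wrap_paragraph(value: list[dict]) -> dict:
    """Hypothetical stand-in mirroring the documented wrap_paragraph output."""
    return {"type": "p", "children": value}


paragraph = sketch_wrap_paragraph([sketch_wrap_text("Hello world")])
```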