API

All callable SQL functions live in the pageindex schema. The source of truth is sql/pg_pageindex--1.0.sql.

Ingest

| Function | Returns | Notes |
| --- | --- | --- |
| build_pdf(path text, options jsonb default '{}') | jsonb | C, strict |
| build_markdown(path text, options jsonb default '{}') | jsonb | C, strict |
| build_from_pdf(...) | jsonb | SQL wrapper → build_pdf |
| build_from_markdown(...) | jsonb | SQL wrapper → build_markdown |

The options argument is passed through to the PageIndex bridge; its shape is defined by upstream PageIndex (see the PageIndex docs).
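A minimal sketch of calling the ingest functions, assuming the extension is installed in the current database. The file paths are hypothetical, and the shape of the returned JSON is not specified in this document, so the first query only inspects its top-level type.

```sql
-- Hypothetical path. Assumption: the path is resolved by the PostgreSQL
-- server process, so the file must be readable on the server host.
SELECT jsonb_typeof(pageindex.build_pdf('/srv/docs/example.pdf')) AS result_type;

-- Same idea for Markdown, with options spelled out; '{}' is the declared default.
SELECT pageindex.build_markdown('/srv/docs/example.md', '{}'::jsonb);

-- Both functions are declared STRICT, so a NULL argument yields NULL
-- without the function body being executed (standard PostgreSQL behaviour).
SELECT pageindex.build_pdf(NULL::text) IS NULL AS returns_null;
```

Any keys placed in options would be interpreted by upstream PageIndex, so none are shown here.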

Search

| Function | Returns |
| --- | --- |
| tree_search(doc jsonb, query text, options jsonb default '{}') | jsonb |
| tree_search_with_preference(doc jsonb, query text, preference text, options jsonb default '{}') | jsonb |
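The search functions could be used roughly as follows. This is a sketch under stated assumptions: the table indexed_docs, the file path, the query text, and the preference string are all hypothetical, and it assumes that the doc argument accepts the jsonb returned by build_pdf.

```sql
-- Hypothetical table holding one built document per row.
CREATE TABLE IF NOT EXISTS indexed_docs (
    id   bigserial PRIMARY KEY,
    doc  jsonb NOT NULL
);

-- Assumption: the jsonb returned by build_pdf is what tree_search expects as doc.
INSERT INTO indexed_docs (doc)
SELECT pageindex.build_pdf('/srv/docs/example.pdf');

-- Search every stored document, relying on the default options ('{}').
SELECT d.id,
       pageindex.tree_search(d.doc, 'What is the refund policy?') AS result
FROM indexed_docs AS d;

-- Variant with a free-text preference; the expected wording of the
-- preference argument is an assumption, not documented here.
SELECT d.id,
       pageindex.tree_search_with_preference(
           d.doc,
           'What is the refund policy?',
           'Prefer sections from the appendix',
           '{}'::jsonb) AS result
FROM indexed_docs AS d;
```

Storing the built document once and searching it many times avoids re-running ingest on every query.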

Structure helpers

These operate on JSON shaped like PageIndex Document.structure (array of root nodes).

| Function | Returns |
| --- | --- |
| structure_to_list(structure jsonb) | jsonb |
| create_clean_structure_for_description(structure jsonb) | jsonb |
| remove_structure_text(structure jsonb) | jsonb |
| post_process_tree(items jsonb, end_physical_index integer) | jsonb |
| add_node_text(structure jsonb, page_texts jsonb) | jsonb |
| add_node_text_with_labels(structure jsonb, page_texts jsonb) | jsonb |
| generate_summaries_for_structure(structure jsonb, model text) | jsonb |
| generate_doc_description(structure jsonb, model text) | text |

page_texts is a JSON array of strings aligned to 1-based page order (index 0 → page 1), per comments in the SQL script.
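The index-to-page alignment can be checked with plain PostgreSQL before passing page_texts to a helper. In the sketch below the page strings and the file path are made up, and the last query assumes that the built document exposes its tree under a top-level structure key, which this document does not confirm.

```sql
-- WITH ORDINALITY numbers elements from 1, so ord matches the page number.
SELECT t.ord AS page_number, t.page_text
FROM jsonb_array_elements_text(
         '["text of page 1", "text of page 2", "text of page 3"]'::jsonb
     ) WITH ORDINALITY AS t(page_text, ord);

-- jsonb array subscripts are 0-based, so index 0 holds page 1.
SELECT '["text of page 1", "text of page 2"]'::jsonb ->> 0 AS page_1_text;

-- Hypothetical call: attach page text to a structure.
-- Assumption: the built document has a top-level "structure" key.
SELECT pageindex.add_node_text(s.structure, p.page_texts) AS structure_with_text
FROM (SELECT pageindex.build_pdf('/srv/docs/example.pdf') -> 'structure' AS structure) AS s,
     (SELECT jsonb_build_array('text of page 1', 'text of page 2') AS page_texts) AS p;
```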

Meta

| Function | Returns |
| --- | --- |
| version() | text |

version() is immutable and parallel-safe. It returns the extension version string from C (pg_pageindex_version).
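A quick way to confirm what is installed is to compare the function's result with the catalog entry. The extension name pg_pageindex is an assumption inferred from the script file name sql/pg_pageindex--1.0.sql.

```sql
-- Version string reported by the C function.
SELECT pageindex.version() AS reported_version;

-- Version recorded by CREATE EXTENSION (assumed extension name: pg_pageindex).
SELECT extversion AS catalog_version
FROM pg_extension
WHERE extname = 'pg_pageindex';
```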