describegpt

Infer a "neuro-symbolic" Data Dictionary, Description & Tags or ask questions about a CSV with a configurable, Mini Jinja prompt file, using any OpenAI API-compatible LLM, including local LLMs. (e.g. Markdown, JSON, TOON, JSON Schema, Semantic Markdown, OKF, Everything, Spanish, Mandarin, Controlled Tags; --prompt "What are the top 10 complaint types by community board & borough by year?" - deterministic, hallucination-free SQL RAG result; iterative, session-based SQL RAG refinement - refined SQL RAG result)

Table of Contents | Source: src/cmd/describegpt.rs | 📇🗃️🤖🌐🪄📚⛩️

Description ↩

Create a "neuro-symbolic" Data Dictionary and/or infer Description & Tags about a Dataset using an OpenAI API-compatible Large Language Model (LLM).

It does this by compiling Summary Statistics & a Frequency Distribution of the Dataset, and then prompting the LLM with detailed, configurable, Mini Jinja-templated prompts with these extended statistical context.

The Data Dictionary is "neuro-symbolic" as it uses a hybrid approach. It's primarily populated deterministically using Summary Statistics & Frequency Distribution, and only the human-friendly Label and Description (plus Content Type when --infer-content-type is set) are populated by the "neural network" LLM using the same statistical context.

CHAT MODE:
You can also use the --prompt option to ask a natural language question about the Dataset.

If the question can be answered by solely using the Dataset's Summary Statistics and Frequency Distribution data, the LLM will return the answer directly.

CHAT SQL RETRIEVAL-AUGMENTED GENERATION (RAG) SUB-MODE:
If the question cannot be answered using the Dataset's Summary Statistics & Frequency Distribution, it will first create a Data Dictionary and a small random sample (default: 100 rows) of the Dataset and provide it to the LLM as additional context to help it generate a SQL query that DETERMINISTICALLY answers the natural language question.

Two SQL dialects are currently supported - DuckDB (highly recommended) & Polars. If the QSV_DUCKDB_PATH environment variable is set to the absolute path of the DuckDB binary, DuckDB will be used to answer the question. Otherwise, if the "polars" feature is enabled, Polars SQL will be used.

If neither DuckDB nor Polars is available, the SQL query will be returned in a Markdown code block, along with the reasoning behind the query.

Even in "SQL RAG" mode, though the SQL query is guaranteed to be deterministic, the query itself may not be correct. In the event of a SQL query execution failure, run the same --prompt with the --fresh option to request the LLM to generate a new SQL query.

When using DuckDB, all loaded DuckDB extensions will be sent as additional context to the LLM to let it know what functions (even UDFs!) it can use in the SQL queries it generates. If you want a specific function or technique to be used in the SQL query, mention it in the prompt.

SUPPORTED MODELS & LLM PROVIDERS:
OpenAI's open-weights gpt-oss model (both 20b and 120b variants) was used during development & is recommended for most use cases. It was also tested with OpenAI, TogetherAI, OpenRouter and Google Gemini cloud providers. For Gemini, use the base URL "https://generativelanguage.googleapis.com/v1beta/openai". Local LLMs tested include Ollama, Jan and LM Studio.

Note

LLMs are prone to inaccurate information being produced. Verify output results before using them.

CACHING:
As LLM inferencing takes time and can be expensive, describegpt caches the LLM inferencing results in a either a disk cache (default) or a Redis cache. It does so by calculating the BLAKE3 hash of the input file and using it as the primary cache key along with the prompt type, model and every flag that influences the rendered prompt (including prompt-file, context-file, language, tag-vocab, num-tags, enum-threshold, infer-content-type, sample-size, fewshot-examples, the QSV_DUCKDB_PATH toggle and the generated Data Dictionary), so changing any of them produces a fresh LLM call rather than stale cached output.

The default disk cache is stored in the ~/.qsv-cache/describegpt directory with a default TTL of 28 days and cache hits NOT refreshing an existing cached value's TTL. Adjust the QSV_DISKCACHE_TTL_SECS & QSV_DISKCACHE_TTL_REFRESH env vars to change disk cache settings.

Alternatively a Redis cache can be used instead of the disk cache. This is especially useful if you want to share the cache across the network with other users or computers. The Redis cache is stored in database 3 by default with a TTL of 28 days and cache hits NOT refreshing an existing cached value's TTL. Adjust the QSV_DG_REDIS_CONNSTR, QSV_REDIS_MAX_POOL_SIZE, QSV_REDIS_TTL_SECS & QSV_REDIS_TTL_REFRESH env vars to change Redis cache settings.

Examples ↩

Generate a Data Dictionary, Description & Tags of data.csv using default OpenAI gpt-oss-20b model (replace <API_KEY> with your OpenAI API key)

qsv describegpt data.csv --api-key <API_KEY> --all

Generate a Data Dictionary of data.csv using the DeepSeek R1:14b model on a local Ollama instance

qsv describegpt data.csv -u http://localhost:11434/v1 --model deepseek-r1:14b --dictionary

Generate a Data Dictionary that also infers a semantic Content Type for each field (e.g. email, city, latitude) so the dictionary can later drive synthetic data generation

qsv describegpt data.csv --dictionary --infer-content-type

Ask questions about the sample NYC 311 dataset using LM Studio with the default gpt-oss-20b model. Questions that can be answered using the Summary Statistics & Frequency Distribution of the dataset.

qsv describegpt NYC_311.csv --prompt "What is the most common complaint?"

Ask detailed natural language questions that require SQL queries and auto-invoke SQL RAG mode Generate a DuckDB SQL query to answer the question

QSV_DUCKDB_PATH=/path/to/duckdb \
qsv describegpt NYC_311.csv -p "What's the breakdown of complaint types by borough descending order?"

Prompt requires a natural language query. Convert query to SQL using the LLM and save results to a file with the --sql-results option. If generated SQL query runs successfully, the file is "results.csv". Otherwise, it is "results.sql".

qsv describegpt NYC_311.csv -p "Aggregate complaint types by community board" --sql-results results

Cache Dictionary, Description & Tags inference results using the Redis cache instead of the disk cache

qsv describegpt data.csv --all --redis-cache

Get fresh Description & Tags inference results from the LLM and refresh disk cache entries for both

qsv describegpt data.csv --description --tags --fresh

Get fresh inference results from the LLM and refresh the Redis cache entries for all three

qsv describegpt data.csv --all --redis-cache --fresh

Forget a cached response for data.csv's data dictionary if it exists and then exit

qsv describegpt data.csv --dictionary --forget

Flush/Remove ALL cached entries in the disk cache

qsv describegpt --flush-cache

Flush/Remove ALL cached entries in the Redis cache

qsv describegpt --redis-cache --flush-cache

Generate Data Dictionary but exclude ID columns from frequency analysis to reduce overhead

qsv describegpt data.csv --dictionary --freq-options "--select '!id,!uuid' --limit 20"

Generate Data Dictionary, Description & Tags but reduce frequency context by showing only top 5 values per field

qsv describegpt data.csv --all --freq-options "--limit 5"

Generate Description using weighted frequencies with ascending sort

qsv describegpt data.csv --description --freq-options "--limit 50 --asc --weight count_column"

Generate a Data Dictionary, Description & Tags using a previously compiled stats CSV file and frequency CSV file instead of running the stats and frequency commands

qsv describegpt data.csv --all --stats-options "file:my_stats.csv" --freq-options "file:my_freq.csv"

For more examples, see tests.

For more detailed info on how describegpt works and how to prepare a prompt file, see https://github.com/dathere/qsv/blob/master/docs/Describegpt.md and https://github.com/dathere/qsv/wiki/AI-and-Documentation#describegpt

Usage ↩

qsv describegpt [options] [<input>]
qsv describegpt --prepare-context [options] [<input>]
qsv describegpt --process-response [options]
qsv describegpt (--redis-cache) (--flush-cache)
qsv describegpt --help

Data Analysis/Inferencing Options ↩

Option	Type	Description
`‑‑dictionary`	flag	Create a Data Dictionary using a hybrid "neuro-symbolic" pipeline - i.e. the Dictionary is populated deterministically using Summary Statistics and Frequency Distribution data, and only the human-friendly Label and Description (and Content Type when --infer-content-type is set) are populated by the LLM using the same statistical context.
`‑‑description`	flag	Infer a general Description of the dataset based on detailed statistical context. An Attribution signature is embedded in the Description.
`‑‑tags`	flag	Infer Tags that categorize the dataset based on detailed statistical context. Useful for grouping datasets and filtering.
`‑A,` `‑‑all`	flag	Shortcut for --dictionary --description --tags.

Dictionary Options ↩

Option	Type	Description	Default
`‑‑num‑examples`	integer	The number of Example values to include in the dictionary.	`5`
`‑‑truncate‑str`	integer	The maximum length of an Example value in the dictionary. An ellipsis is appended to the truncated value. If zero, no truncation is performed.	`25`
`‑‑infer‑content‑type`	flag	Also have the LLM classify each field's semantic "Content Type", mapped to a curated, documented vocabulary (e.g. email, city, category, name, credit card, etc.) see https://github.com/dathere/qsv/blob/master/src/cmd/synthesize/faker_map.rs. Adds a "Content Type" column/field to the Data Dictionary output. Fields where cardinality equals the row count (i.e. every row has a distinct non-null value - primary keys, surrogate keys, sequence numbers) are deterministically classified as "unique_id", overriding any token the LLM returned for that field. For Date/DateTime fields, the LLM also infers the column's strftime date format (e.g. "date:%m/%d/%Y"); the Markdown, JSON & JSON Schema dictionaries then render Min/Max AND Examples in that inferred format so they match how the dates actually appear in the data, instead of qsv's normalized form. (TSV output keeps Min/Max & Examples in qsv's raw normalized form.)
`‑‑two‑pass`	flag	Run a second LLM call that takes the full first-pass Data Dictionary as JSON context and refines each field's Label, Description and (when --infer-content-type is set) Content Type using cross-field awareness. The LLM can then relate fields that belong together (e.g. street_no + street_name + city + state + zip describing a single mailing address; first_name + last_name naming a single person; lat + lng forming a coordinate pair). The refined dictionary becomes the emitted output and is also what downstream Description, Tags and Prompt inference phases see as dictionary context. Roughly doubles dictionary LLM cost and latency, so opt-in. Most useful when combined with --infer-content-type. Allowed with the --dictionary, --all and --prompt inference flags. Mutually exclusive with --prepare-context and --process-response (MCP sampling is single-turn per inference phase).
`‑‑addl‑cols`	flag	Add additional columns to the dictionary from the Summary Statistics.
`‑‑addl‑cols‑list`	string	A comma-separated list of additional stats columns to add to the dictionary. The columns must be present in the Summary Statistics. If the columns are not present in the Summary Statistics or already in the dictionary, they will be ignored. These values are case-insensitive and automatically set the --addl-cols option to true. "everything" can be used to add all 45 "available" statistics columns. You can adjust the available columns with --stats-options. "everything!" automatically sets --stats-options to compute "all" 51 supported stats. The 6 addl cols are the mode/s & antimode/s stats with each having counts & occurrences. "moar" gets you even moar stats, with detailed outliers info. "moar!" gets you even moar with --advanced stats (Kurtosis, Gini Coefficient & Shannon Entropy)	`sort_order, sortiness, mean, median, mad, stddev, variance, cv`

Tag Options ↩

Option	Type	Description	Default
`‑‑num‑tags`	integer	The maximum number of tags to infer when the --tags option is used. Maximum allowed value is 50.	`10`
`‑‑tag‑vocab`	string	The CSV file containing the tag vocabulary to use for inferring tags. If no tag vocabulary file is provided, the model will use free-form tags. Supports local files, remote URLs (http/https), CKAN resources (ckan://), and dathere:// scheme. Remote resources are cached locally. The CSV file must have two columns with headers: first column is the tag, second column is the description. Note that qsvlite only supports local files.
`‑‑cache‑dir`	string	The directory to use for caching downloaded tag vocabulary resources. If the directory does not exist, qsv will attempt to create it. If the QSV_CACHE_DIR envvar is set, it will be used instead.	`~/.qsv-cache`
`‑‑ckan‑api`	string	The URL of the CKAN API to use for downloading tag vocabulary resources with the "ckan://" scheme. If the QSV_CKAN_API envvar is set, it will be used instead.	`https://data.dathere.com/api/3/action`
`‑‑ckan‑token`	string	The CKAN API token to use. Only required if downloading private resources. If the QSV_CKAN_TOKEN envvar is set, it will be used instead.

Stats/Frequency Options ↩

Option	Type	Description	Default
`‑‑stats‑options`	string	Options for the stats command used to generate summary statistics. If it starts with "file:" prefix, the statistics are read from the specified CSV file instead of running the stats command. e.g. "file:my_custom_stats.csv"	`--infer-dates --infer-boolean --mad --quartiles --percentiles --force --stats-jsonl`
`‑‑freq‑options`	string	Options for the frequency command used to generate frequency distributions. You can use this to exclude certain variable types from frequency analysis (e.g., --select '!id,!uuid'), limit results differently per use case, or control output format. If --limit is specified here, it takes precedence over --enum-threshold. If it starts with "file:" prefix, the frequency data is read from the specified CSV file instead of running the frequency command. e.g. "file:my_custom_frequency.csv" A "file:"-backed CSV is assumed to use frequency's default "(NULL)" null text; a custom --null-text in a file-supplied CSV is not recognized when validating inferred date/datetime formats.	`--rank-strategy dense`
`‑‑enum‑threshold`	integer	The threshold for compiling Enumerations with the frequency command before bucketing other unique values into the "Other" category. This is a convenience shortcut for --freq-options --limit . If --freq-options contains --limit, this flag is ignored.	`10`

Custom Prompt Options ↩

Option	Type	Description	Default
`‑p,` `‑‑prompt`	string	Custom prompt to answer questions about the dataset. The prompt will be answered based on the dataset's Summary Statistics, Frequency data & Data Dictionary. If the prompt CANNOT be answered by looking at these metadata, a SQL query will be generated to answer the question. If the "polars" or the "QSV_DUCKDB_PATH" environment variable is set & the `--sql-results` option is used, the SQL query will be automatically executed and its results returned. Otherwise, the SQL query will be returned along with the reasoning behind it. If it starts with "file:" prefix, the prompt is read from the file specified. e.g. "file:my_long_prompt.txt"
`‑‑sql‑results`	string	The file to save the SQL query results to. Only valid if the --prompt option is used & the "polars" or the "QSV_DUCKDB_PATH" environment variable is set. If the SQL query executes successfully, the results will be saved with a ".csv" extension. Otherwise, it will be saved with a ".sql" extension so the user can inspect why it failed and modify it.
`‑‑prompt‑file`	string	The configurable TOML file containing prompts to use for inferencing. If no file is provided, default prompts will be used. The prompt file uses the Mini Jinja template engine (https://docs.rs/minijinja) See https://github.com/dathere/qsv/blob/master/resources/describegpt_defaults.toml
`‑‑context‑file`	string	Path to a file with additional context about the dataset - e.g. variable/code labels, provenance & domain notes - injected into the prompts as the {{ context }} Mini Jinja variable. The file TYPE is sniffed from its contents (not its extension). Supported types: plain text, Markdown, CSV, Excel/ODS spreadsheets (extracted to CSV), and PDF or image files (JPEG/PNG/WebP/GIF) sent to the LLM as a multimodal attachment (needs a multimodal model & endpoint; max ~32 MB). Word/PowerPoint (docx/pptx) are NOT supported - convert to PDF or text. By default qsv injects this context into the USER message; custom prompt file templates may reference {{ context }} anywhere to place it instead. If the option is unset or the file is empty, {{ context }} renders as an empty string and the prompts are unaffected. The file's contents are part of the cache key, so editing it produces a fresh LLM call.
`‑‑markdown‑template`	string	TOML file with Mini Jinja templates for Markdown output. The TOML contains four wrapper templates - one per inference kind: dictionary_md_template, description_md_template, tags_md_template and custom_prompt_md_template - plus a dictionary_md_body_template that drives the per-field dictionary table that fills the dictionary wrapper's {{ llm_response }}. All template fields are optional; any omitted field falls back to the embedded default, so a minimal TOML can override just the templates you want to change. Custom Mini Jinja filters (pipe_escape, br_replace, human_count, dict_cell, humanize_examples) and template variables are documented inline in the default TOML referenced below. If no file is provided, built-in defaults are used (matching legacy output). See https://github.com/dathere/qsv/blob/master/resources/describegpt_md_defaults.toml
`‑‑sample‑size`	integer	The number of rows to randomly sample from the input file for the sample data. Uses the INDEXED sampling method with the qsv sample command.	`100`
`‑‑fewshot‑examples`	flag	By default, few-shot examples are NOT included in the LLM prompt when generating SQL queries. When this option is set, few-shot examples in the default prompt file are included. Though this will increase the quality of the generated SQL, it comes at a cost - increased LLM API call cost in terms of tokens and execution time. See https://en.wikipedia.org/wiki/Prompt_engineering for more info.
`‑‑session`	string	Enable stateful session mode for iterative SQL RAG refinement. The session name is the file path of the markdown file where session messages will be stored. When used with --prompt, subsequent queries in the same session will refine the baseline SQL query. SQL query results (10-row sample) and errors are automatically included in subsequent messages for context.
`‑‑session‑len`	integer	Maximum number of recent messages to keep in session context before summarizing older messages. Only used when --session is specified.	`10`
`‑‑no‑score‑sql`	flag	Disable scoresql validation of generated SQL queries before execution. By default, when --prompt generates a SQL query and --sql-results is set, the query is scored and iteratively improved if below threshold.
`‑‑score‑threshold`	integer	Minimum scoresql score for a SQL query to be accepted. Typical range is 0-100; values >100 will always trigger retries and the below-threshold warning.	`50`
`‑‑score‑max‑retries`	integer	Max LLM re-prompts to improve a low-scoring SQL query.	`3`

LLM API Options ↩

Option	Type	Description	Default
`‑u,` `‑‑base‑url`	string	The LLM API URL. Supports APIs & local LLMs compatible with the OpenAI API specification. Some common base URLs: OpenAI: https://api.openai.com/v1 Gemini: https://generativelanguage.googleapis.com/v1beta/openai TogetherAI: https://api.together.ai/v1
`‑m,` `‑‑model`	string	The model to use for inferencing. This model must be compatible with OpenAI API spec. Works with both cloud LLM providers and local LLMs. Tested open weights models include OpenAI's gpt-oss-20b and gpt-oss-120b; Google's Gemma family of open models; and Mistral's Magistral reasoning models. Precedence: explicit CLI flag > QSV_LLM_MODEL env var > prompt file model > built-in default (openai/gpt-oss-20b). No docopt default — same rationale as --base-url above.
`‑‑language`	string	The output language/dialect/tone to use for the response. (e.g., "Spanish", "French", "Hindi", "Mandarin", "Italian", "Castilian", "Franglais", "Taglish", "Pig Latin", "Valley Girl", "Pirate", "Shakespearean English", "Chavacano", "Gen Z", "Yoda", etc.)
`‑‑addl‑props`	string	Additional model properties to pass to the LLM chat/completion API. Various models support different properties beyond the standard ones. For instance, gpt-oss-20b supports the "reasoning_effort" property. e.g. to set the "reasoning_effort" property to "high" & "temperature" to 0.5, use '{"reasoning_effort": "high", "temperature": 0.5}'
`‑k,` `‑‑api‑key`	string	The API key to use. If set, takes precedence over the QSV_LLM_APIKEY envvar. Required when the base URL is not localhost. Set to NONE to suppress sending the API key.
`‑t,` `‑‑max‑tokens`	integer	Limits the number of generated tokens in the output. Set to 0 to disable token limits. If the --base-url is localhost, indicating a local LLM, the default is automatically set to 0.	`10000`
`‑‑timeout`	integer	Timeout for completions in seconds. If 0, no timeout is used. Defaults to 300 if not set. If the --base-url is localhost, indicating a local LLM, the timeout is automatically disabled unless you set --timeout (or the QSV_TIMEOUT envvar) explicitly.
`‑‑user‑agent`	string	Specify custom user agent. It supports the following variables - $QSV_VERSION, $QSV_TARGET, $QSV_BIN_NAME, $QSV_KIND and $QSV_COMMAND. Try to follow the syntax here - https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent
`‑‑export‑prompt`	string	Export the default prompts to the specified file that can be used with the --prompt-file option. The file will be saved with a .toml extension. If the file already exists, it will be overwritten. It will exit after exporting the prompts.

Caching Options ↩

Option	Type	Description	Default
`‑‑no‑cache`	flag	Disable default disk cache.
`‑‑disk‑cache‑dir`	string	The directory to store the disk cache. Note that if the directory does not exist, it will be created. If the directory exists, it will be used as is, and will not be flushed. This option allows you to maintain several disk caches for different describegpt jobs (e.g. one for a data portal, another for internal data exchange).	`~/.qsv-cache/describegpt`
`‑‑redis‑cache`	flag	Use Redis instead of the default disk cache to cache LLM completions. It connects to "redis://127.0.0.1:6379/3" by default, with a connection pool size of 20, with a TTL of 28 days, and cache hits NOT refreshing an existing cached value's TTL. This option automatically disables the disk cache.
`‑‑fresh`	flag	Send a fresh request to the LLM API, refreshing a cached response if it exists. When a --prompt SQL query fails, you can also use this option to request the LLM to generate a new SQL query.
`‑‑forget`	flag	Remove a cached response if it exists and then exit.
`‑‑flush‑cache`	flag	Flush the current cache entries on startup. WARNING: This operation is irreversible.

MCP Sampling Options ↩

Option	Type	Description	Default
`‑‑prepare‑context`	flag	Output the prompt context as JSON to stdout without calling the LLM. JSON includes system/user prompts, cache state, and analysis results for each inference phase. Useful for inspecting prompts or piping to custom LLM integrations. Used by the MCP server for sampling mode.
`‑‑process‑response`	flag	Process LLM responses provided as JSON via stdin. Takes the output format from --prepare-context with LLM responses filled in, and produces the final output (dictionary, description, tags, or prompt results). Used by the MCP server for sampling mode.

Common Options ↩

Option	Type	Description	Default
`‑h,` `‑‑help`	flag	Display this message
`‑‑format`	string	Output format: Markdown, TSV, JSON, TOON, JSONSchema, SemanticMd, or OKF. TOON is a compact, human-readable encoding of the JSON data model for LLM prompts. See https://toonformat.dev/ for more info. JSONSchema emits the Data Dictionary as a JSON Schema (draft 2020-12) document, enriched with LLM-inferred Label, Description and Content Type (the latter only when the infer-content-type flag is set). qsv- and LLM- specific metadata not modeled by the JSON Schema spec (cardinality, null_count, weighted example counts, content_type, addl stats columns) is preserved via a single x-qsv annotation object per property; unknown keywords are ignored by validators per the 2020-12 spec. The JSONSchema format requires the dictionary inference phase (the dictionary or all flag). The description inference, when also run, becomes the schema's top-level description; tags, when also run, are embedded at x-qsv.tags. The prompt inference is not supported. SemanticMd emits the Data Dictionary as a Semantic Markdown document (https://semanticmd.org/) - human-readable markdown with light, agent-parseable conventions that a companion converter turns into JSON. It enriches each column with a catalog-wide Concept ID (for cross-dataset join discovery), an analytical Role (dimension/measure/identifier/timestamp), join keys & cardinality, data-quality flags, and a richer per-column statistics block; it also emits a dataset grain and a temporal/spatial envelope. To populate Concept/Role/grain, SemanticMd implies --infer-content-type. Like JSONSchema, it requires the dictionary inference phase (the dictionary or all flag). The description inference, when also run, becomes the '# Dataset' description; tags, when also run, are embedded in the document frontmatter. The prompt inference is not supported. OKF emits the Data Dictionary as an Open Knowledge Format document (https://github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/okf) - a leaner, vendor-neutral plain-markdown-plus-YAML-frontmatter format. It emits frontmatter (type/title/description/resource/timestamp/tags) and a Schema section with a Column/Type/Content Type/Description/Enumeration table. The type frontmatter key is set via the okf-type flag (default "CSV Table"); ds-source maps to resource and ds-updated maps to timestamp. Like SemanticMd, OKF implies infer-content-type, requires the dictionary inference phase (the dictionary or all flag), folds the description inference into the body prose and a single-sentence frontmatter description, and embeds tags in the frontmatter. The prompt inference is not supported.	`Markdown`
`‑‑allow‑extra‑cols`	flag	When the format is JSONSchema, emit additionalProperties as true at the schema root (default is false, strict). Only meaningful with the JSONSchema format; ignored otherwise.
`‑‑strict‑dates`	flag	When the format is JSONSchema, emit format date or date-time for columns that stats infers as Date or DateTime. Off by default because qsv's --infer-dates is permissive (accepts strings like "June 27, 1968") and JSON Schema's date formats require RFC 3339, so the validate roundtrip would otherwise fail. Set this only when your source columns are guaranteed to be RFC 3339 full-date / date-time. Mirrors the same flag on the schema command.
`‑‑ds‑source`	string	For the SemanticMd & OKF formats only: the dataset source/provenance recorded in the document frontmatter (e.g. a source URL or publisher). For OKF this populates the `resource` key. Optional; the frontmatter key is omitted when unset. Ignored by other formats.
`‑‑ds‑updated`	string	For the SemanticMd & OKF formats only: the dataset's last-updated date recorded in the document frontmatter. For OKF this populates the `timestamp` key. Optional. Ignored by other formats.
`‑‑ds‑license`	string	For the SemanticMd format only: the dataset license recorded in the document frontmatter. Optional. Ignored by other formats.
`‑‑okf‑type`	string	For the OKF format only: the value of the required `type` frontmatter key (e.g. "CSV Table", "BigQuery Table"). Optional; defaults to "CSV Table". Ignored by other formats.
`‑o,` `‑‑output`	string	Write output to instead of stdout. If --format is set to TSV, separate files will be created for each prompt type with the pattern {filestem}.{kind}.tsv (e.g., output.dictionary.tsv, output.tags.tsv).
`‑q,` `‑‑quiet`	flag	Do not print status messages to stderr.

Source: src/cmd/describegpt.rs | Table of Contents | README

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

describegpt

Description ↩

Examples ↩

Usage ↩

Data Analysis/Inferencing Options ↩

Dictionary Options ↩

Tag Options ↩

Stats/Frequency Options ↩

Custom Prompt Options ↩

LLM API Options ↩

Caching Options ↩

MCP Sampling Options ↩

Common Options ↩

Uh oh!

FilesExpand file tree

describegpt.md

Latest commit

History

describegpt.md

File metadata and controls

describegpt

Description ↩

Examples ↩

Usage ↩

Data Analysis/Inferencing Options ↩

Dictionary Options ↩

Tag Options ↩

Stats/Frequency Options ↩

Custom Prompt Options ↩

LLM API Options ↩

Caching Options ↩

MCP Sampling Options ↩

Common Options ↩