Skip to content

Latest commit

 

History

History
206 lines (169 loc) · 26.6 KB

File metadata and controls

206 lines (169 loc) · 26.6 KB

Environment Variables

Variable Description
QSV_DOTENV_PATH The full pathname of the dotenv file to load, OVERRIDING existing environment variables. This takes precedence over any other dotenv files in the filesystem. Set to "" to disable dotenv processing.
QSV_DEFAULT_DELIMITER single ascii character to use as delimiter. Overrides --delimiter option. Defaults to "," (comma) for CSV files & "\t" (tab) for TSV files when not set. Note that this will also set the delimiter for qsv's output to stdout.
However, using the --output option, regardless of this environment variable, will automatically change the delimiter used in the generated file based on the file extension - i.e. comma for .csv; tab for .tsv & .tab ; and semicolon for .ssv files
QSV_SNIFF_DELIMITER if set, the delimiter is automatically detected. Overrides QSV_DEFAULT_DELIMITER & --delimiter option. Note that this does not work with stdin.
QSV_SNIFF_PREAMBLE if set, qsv will attempt to sniff the number of preamble rows.
QSV_NO_HEADERS if set, the first row will NOT be interpreted as headers. Supersedes QSV_TOGGLE_HEADERS.
QSV_TOGGLE_HEADERS if set to 1, toggles header setting - i.e. inverts qsv header behavior, with no headers being the default, & setting --no-headers will actually mean headers will not be ignored.
QSV_ANTIMODES_LEN set to the maximum number of characters when listing "antimodes" in stats. Otherwise, the default is 100. Set to 0 to disable length limiting.
QSV_AUTOINDEX_SIZE if set, specifies the minimum file size (in bytes) of a CSV file before an index is automatically created. Note that stale indices are automatically updated regardless of this setting.
QSV_STATSCACHE_MODE Specifies how the stats cache is used by "smart" commands. Valid values are:
* auto - use the stats cache if it's valid (the stats-jsonl file exists and is current) - default.
* force - if the cache does not exist, create it by running stats.
* none - do not use the stats cache, even if it exists.
QSV_STATS_STRING_MAX_LENGTH Specifies the maximum string length for the "min"/"max" stats column. Some CSVs can have very long string columns that can cause other parsers to fail (e.g. Python's CSV reader can only accommodate 128kb strings by default) and when converting spatial formats like GeoJSON or Shapefile to CSV, the geometry column can easily be larger than this. When set, truncates the "min"/"max" columns of type String at the specified length and then appends an ellipsis (...).
QSV_STATS_CHUNK_MEMORY_MB Controls memory-aware chunk sizing for parallel statistics processing. When set to a positive number, limits the maximum memory per chunk (in MB). When set to 0, dynamically estimates chunk size by sampling records and available system memory. When unset, automatically enables dynamic sizing for non-streaming statistics (median, quartiles, modes, cardinality) and uses CPU-based chunking (dividing work by number of CPU cores) for streaming statistics only. Dynamic sizing means chunk sizes are determined based on available memory and sampled record sizes, allowing efficient processing of large files without exceeding system resources, while CPU-based chunking divides work evenly among CPU cores for streaming statistics. When set to -1, forces CPU-based chunking. This allows processing arbitrarily large files by creating smaller chunks that fit in available memory.
QSV_FREQ_CHUNK_MEMORY_MB Controls memory-aware chunk sizing for frequency distribution processing. Set to 0 for dynamic sizing, or a positive number for a fixed memory limit per chunk, or -1 for CPU-based chunking (chunk size = num records / number of CPUs). This allows processing arbitrarily large files by creating smaller chunks that fit in available memory.
QSV_FREQ_HIGH_CARD_THRESHOLD Absolute cardinality threshold for HIGH_CARDINALITY classification in the frequency cache (--frequency-jsonl). Columns with cardinality exceeding this value (or QSV_FREQ_HIGH_CARD_PCT of rowcount, whichever is smaller) get a single HIGH_CARDINALITY sentinel entry instead of full frequency data. Only used as a fallback when the --high-card-threshold CLI value still equals its default (100). Precedence caveat: because docopt does not distinguish a flag left at its default from one explicitly passed at the default value, running with --high-card-threshold 100 is indistinguishable from omitting the flag, so this env var will still override. To force the value 100 and ignore the env var, unset QSV_FREQ_HIGH_CARD_THRESHOLD (or set it to 100).
QSV_FREQ_HIGH_CARD_PCT Percentage of rowcount threshold for HIGH_CARDINALITY classification in the frequency cache (--frequency-jsonl). Columns with cardinality exceeding this percentage of the total row count (or QSV_FREQ_HIGH_CARD_THRESHOLD, whichever is smaller) get a single HIGH_CARDINALITY sentinel entry. Only used as a fallback when the --high-card-pct CLI value still equals its default (90). Precedence caveat: because docopt does not distinguish a flag left at its default from one explicitly passed at the default value, running with --high-card-pct 90 is indistinguishable from omitting the flag, so this env var will still override. To force the value 90 and ignore the env var, unset QSV_FREQ_HIGH_CARD_PCT (or set it to 90).
QSV_CACHE_DIR The directory to use for caching downloaded lookup_table resources using the luau qsv_register_lookup() helper function.
QSV_CKAN_API The CKAN Action API endpoint to use with the luau qsv_register_lookup() helper function when using the "ckan://" scheme.
QSV_CKAN_TOKEN The CKAN token to use with the luau qsv_register_lookup() helper function when using the "ckan://" scheme. Only required to access private resources.
QSV_GET_PART_SIZE the byte size of each part when the get command downloads a remote object — over HTTP(s) or from cloud storage (s3://, gs://, az://). Objects larger than this are fetched as parallel byte-ranges and streamed into the cache; objects this size or smaller are fetched in a single request (no overhead). (default (bytes): 8388608 (8 MiB)). Requires the get feature.
QSV_GET_CONCURRENCY the maximum number of concurrent range GETs for a single get download, whether over HTTP(s) or from cloud storage (clamped to 1-64). Peak extra memory is roughly QSV_GET_CONCURRENCY × QSV_GET_PART_SIZE, independent of the total object size. (default: 4). Requires the get feature.
QSV_COMMENT_CHAR set to an ascii character. If set, any lines(including the header) that start with this character are ignored.
QSV_MAX_JOBS number of jobs to use for multithreaded commands (currently apply, applydp, blake3, datefmt, dedup, diff, excel, extsort, frequency, geocode, joinp, jsonl, moarstats, pivotp, pragmastat, replace, sample, schema, search, searchset, snappy, sort, split, sqlp, stats, template, to, tojsonl & validate). If not set, max_jobs is set to the detected number of logical processors. See Multithreading for more info.
QSV_NO_UPDATE if set, prohibit self-update version check for the latest qsv release published on GitHub.
QSV_LLM_BASE_URL The LLM API URL to use with the describegpt command.
QSV_LLM_APIKEY The API key of the supported LLM service to use with the describegpt command.
QSV_LLM_MODEL The LLM Model to use with the describegpt command (e.g. openai/gpt-oss-20b, openai/gpt-oss-120b, google/gemma-4-31b, google/gemma-4-e4b).
QSV_DUCKDB_PATH The fully qualified path to the DuckDB binary. In describegpt, when set, DuckDB is used instead of the default Polars SQL engine and all loaded DuckDB extensions are sent as additional context to the LLM. In scoresql, the --duckdb flag is required to use DuckDB; the env var only supplies the binary path (if unset, scoresql looks for duckdb in PATH).
QSV_TEST_DESCRIBEGPT If set, enables describegpt command tests. Requires LM Studio with openai/gpt-oss-20b model loaded.
QSV_OUTPUT_BOM if set, the output will have a Byte Order Mark (BOM) at the beginning. This is used to generate Excel-friendly CSVs on Windows.
QSV_FORCE_COLOR if set, forces colorized output even when redirecting or running in CI. Used by the color command to override automatic color detection.
QSV_THEME sets the color theme for the color command. Valid values are DARK or LIGHT (case-insensitive). If not set, the theme is automatically detected based on the terminal background color.
QSV_TERMWIDTH overrides the detected terminal width for the color command. Must be a value between 1 and 1000. If not set, the terminal width is automatically detected or defaults to 80 when output is redirected.
QSV_POLARS_FLOAT_PRECISION The precision to use when converting Polars-enabled formats (Avro,Arrow,Parquet,JSON,JSONL and gz,zlib & zst compressed files) to CSV. If set, this will also override the --float-precision option of the sqlp command.
QSV_POLARS_DECIMAL_SCALE The scale to use when using the Polars Decimal type. If not set, this defaults to 5.
QSV_PREFER_DMY if set, date parsing will use DMY format. Otherwise, use MDY format (used with datefmt, frequency, joinp, moarstats, pivotp, sample, schema, sniff, stats & tojsonl commands).
QSV_REGEX_UNICODE if set, makes search, searchset & replace commands unicode-aware. For increased performance, these commands are not unicode-aware by default & will ignore unicode values when matching & will abort when unicode characters are used in the regex. Note that the apply operations regex_replace operation is always unicode-aware.
QSV_RDR_BUFFER_CAPACITY reader buffer size (default - 128k (bytes): 131072)
QSV_SKIP_FORMAT_CHECK if set, skips mime-type checking of input files. Set this when optimizing for performance and when encountering false positives as a format check involves scanning the input file to infer the mime-type/format.
QSV_STATS_SEPARATOR the separator to use to delimit multiple MODE/ANTIMODE and PERCENTILE values.
QSV_WTR_BUFFER_CAPACITY writer buffer size (default - 512k (bytes): 524288)
QSV_FREEMEMORY_HEADROOM_PCT the percentage of free available memory required when running qsv in "non-streaming" mode (i.e. the entire file needs to be loaded into memory). If the incoming file is greater than the available memory after the headroom is subtracted, qsv will not proceed. Set to 0 to skip memory check. See Memory Management for more info. (default: (percent) 20 )
QSV_MEMORY_CHECK if set, enables CONSERVATIVE memory check mode when running in "non-streaming" mode. In CONSERVATIVE mode, qsv computes total available memory by adding the current available memory and free swap space, applies a platform-specific multiplier (1.3x on macOS, 1.15x on Linux, 1.0x on Windows), then subtracts the headroom percentage. If the input file size exceeds this adjusted value, qsv will abort with an error. Otherwise (NORMAL mode), qsv will only check if the input file size < TOTAL memory - HEADROOM. This is done to prevent Out-of-Memory errors. See Memory Management for more info.
QSV_NO_ALLOC_TUNING if set, disables qsv's automatic jemalloc tuning (jemalloc builds only). By default qsv enables jemalloc background_thread purging where supported (Linux) at no extra memory cost, and on platforms without it (e.g. macOS) retains freed pages during parallel, hashmap-heavy commands (stats, frequency, schema, moarstats) to cut madvise syscall overhead — trading higher peak RSS (~+16%) for speed (~3-9% faster). Set this to opt out (e.g. in RSS-constrained environments or for diagnostics). Power users can still override jemalloc directly via MALLOC_CONF/_RJEM_MALLOC_CONF.
QSV_LOG_LEVEL desired level for qsv_rCURRENT.log (error, warn, info, trace, debug). Default: off for qsv/qsvlite/qsvdp; info for qsvmcp to capture process-level START/END entries. Note: this is separate from the MCP audit trail (qsvmcp.log), which is controlled by QSV_MCP_LOG_LEVEL.
QSV_LOG_DIR when logging is enabled, the directory where the log files will be stored. If the specified directory does not exist, qsv will attempt to create it. If not set, the log files are created in the directory where qsv was started. See Logging for more info.
QSV_LOG_UNBUFFERED if set, log messages are written directly to disk, without buffering. Otherwise, log messages are buffered before being written to the log file (8k buffer, flushing every second). See flexi_logger for details.
QSV_PROGRESSBAR if set, enable the --progressbar option on the apply, datefmt, fetch, fetchpost, foreach, geocode, luau, moarstats, py, replace, search, searchset, sniff, sortcheck, template & validate commands.
QSV_DISKCACHE_TTL_SECS set time-to-live of diskcache cached values (default (seconds): 2419200 (28 days)).
QSV_DISKCACHE_TTL_REFRESH if set, enables cache hits to refresh TTL of diskcache cached values.
QSV_REDIS_CONNSTR the fetch command can use Redis to cache responses. Set to connect to the desired Redis instance. (default: redis://127.0.0.1:6379/1). For more info on valid Redis connection string formats, click here.
QSV_FP_REDIS_CONNSTR the fetchpost command can also use Redis to cache responses (default: redis://127.0.0.1:6379/2).
QSV_DG_REDIS_CONNSTR the describegpt command can also use Redis to cache responses (default: redis://127.0.0.1:6379/3).
QSV_REDIS_MAX_POOL_SIZE the maximum Redis connection pool size. (default: 20).
QSV_REDIS_TTL_SECS set time-to-live of Redis cached values (default (seconds): 2419200 (28 days)).
QSV_REDIS_TTL_REFRESH if set, enables cache hits to refresh TTL of Redis cached values.
QSV_TIMEOUT for commands with a --timeout option (describegpt, fetch, fetchpost, geocode, luau, sample, snappy, sniff, template & validate), the number of seconds before a web request times out (default: 30).
QSV_USER_AGENT the user-agent to use for web requests. When specifying a custom user agent. It supports the following variables - $QSV_VERSION, $QSV_TARGET, $QSV_BIN_NAME, $QSV_KIND and $QSV_COMMAND. Try to conform to the IETF RFC 7231 standard. See here for examples.
(default: $QSV_BIN_NAME/$QSV_VERSION ($QSV_TARGET; $QSV_COMMAND; $QSV_KIND; https://github.com/dathere/qsv) - e.g.
qsv/21.1.0 (x86_64-unknown-linux; fetch; prebuilt; https://github.com/dathere/qsv)).
QSV_OPENCAGE_API_KEY the OpenCage API key used by the geocode opencage/opencagenow subcommands for online forward/reverse geocoding. Can also be set with the --api-key option.
QSV_GEOIP2_FILENAME the filename of the GeoIP2 database to use for the geocode iplookup* commands. (default: GeoLite2-City.mmdb)
QSV_GEOCODE_INDEX_FILENAME The filename of the Geonames index file to use for the geocode command. If not set, the default index file for that qsv version is downloaded and saved to QSV_CACHE_DIR. Set this only if you have a custom Geonames index file.
QSV_MAPBOX_TOKEN the Mapbox access token used by the viz map command for Mapbox-hosted basemap styles (basic, streets, outdoors, light, dark, satellite, satellite-streets). Can also be set with the --mapbox-token option, which takes precedence. Token-free styles (e.g. open-street-map, carto-positron) do not require it. Requires the viz feature.

Several dependencies also have environment variables that influence qsv's performance & behavior:

  • Memory Allocator
    When incorporating qsv into a data pipeline that runs in batch mode, particularly with very large CSV files using qsv commands that load entire CSV files into memory, you can fine tune qsv's memory allocator run-time behavior using the environment variables for the allocator you're using:

    • mimalloc

    • jemalloc

      qsv's automatic jemalloc tuning (see QSV_NO_ALLOC_TUNING above) covers the two levers from jemalloc's TUNING.md that consistently help qsv's batch workloads: background_thread purging and per-command dirty/muzzy page retention. Two other TUNING.md levers were evaluated and deliberately left off by default — on a high-cardinality stress workload (8M rows, multi-million-cardinality columns, Linux THP always) neither was a clear win:

      • metadata_thp:auto — put jemalloc's internal metadata on Transparent Huge Pages. Mixed: frequency ~4% faster but stats ~3% slower, at ~+5% peak RSS. Since it's a process-global, startup-only option it can't be scoped to just the command it helps, so qsv does not bake it in.
      • percpu_arena:percpu — no measurable, RSS-safe win (rayon worker threads migrate, so the locality premise doesn't hold).

      Power users can still opt into either directly via _RJEM_MALLOC_CONF / MALLOC_CONF (e.g. _RJEM_MALLOC_CONF=metadata_thp:auto,percpu_arena:percpu). The manual Bench jemalloc metadata_thp GitHub Actions workflow reproduces the A/B/C comparison on a Linux runner if you want to re-measure for your own data shapes.

  • Network Access (reqwest)
    qsv uses reqwest and will honor proxy settings set through the HTTP_PROXY, HTTPS_PROXY, ALL_PROXY & NO_PROXY environment variables.

  • Polars
    qsv uses polars for several commands - currently color, count, joinp, lens, pivotp, prompt, schema, scoresql and sqlp. Polars has its own set of environment variables that can be set to influence its behavior (see here). The most relevant ones are:

    • POLARS_VERBOSE - if set to 1, polars will output logging messages to stderr.
    • POLARS_PANIC_ON_ERR - if set to 1, panics on polars-related errors, instead of returning an error.
    • POLARS_BACKTRACE_IN_ERR - if set to 1, includes backtrace in polars-related error messages.

ℹ️ NOTE: To get a list of all active qsv-relevant environment variables, run qsv --envlist. Relevant env vars always include anything that starts with QSV_ & the proxy variables listed above. Allocator-specific env vars are build-dependent: MIMALLOC_ vars are included when qsv is built with mimalloc support, and JEMALLOC_ & MALLOC_CONF vars are included when qsv is built with jemalloc support.

MCP Server Environment Variables

The qsv MCP (Model Context Protocol) Server exposes qsv's capabilities to AI agents like Claude. It can be used as a standalone MCP server or as a Claude Desktop Extension (MCPB).

These environment variables configure the MCP server behavior:

Core Configuration

Variable Description Default
QSV_MCP_BIN_PATH Full path to the qsv binary (qsvmcp preferred over qsv). If not set, auto-detects from PATH and common installation locations. Auto-detect
QSV_MCP_WORKING_DIR Default working directory for file operations. Supports template variables like ${HOME}, ${DOWNLOADS}, ${DOCUMENTS}, ${PWD}. ${PWD} (plugin mode) / ${DOWNLOADS} (extension & legacy modes)
QSV_MCP_ALLOWED_DIRS Additional directories where qsv can access files (colon-separated on Unix, semicolon on Windows, or JSON array). File access is restricted to working directory and these directories only. Empty (working dir only)

Performance Tuning

Variable Description Default Range
QSV_MCP_OPERATION_TIMEOUT_MS Operation timeout in milliseconds for qsv command execution. 600,000 (10 min) 1,000 - 1,800,000
QSV_MCP_MAX_OUTPUT_SIZE Maximum output size in bytes before results are automatically saved to disk. 52,428,800 (50 MB) 1,048,576 - 104,857,600
QSV_MCP_CONVERTED_LIFO_SIZE_GB Maximum size for the converted file cache (Excel→CSV, JSONL→CSV) in GB. Uses LIFO eviction. 1.0 0.1 - 100.0
QSV_MCP_MAX_FILES_PER_LISTING Maximum number of files returned in a single directory listing. 1,000 1 - 100,000
QSV_MCP_MAX_CONCURRENT_OPERATIONS Maximum number of concurrent qsv operations. 3 (plugin mode) / 1 (extension & legacy modes) 1 - 100
QSV_MCP_MAX_EXAMPLES Maximum number of examples to include in MCP tool descriptions. Set to 0 to disable examples. 5 0 - 20

Advanced Configuration

Variable Description Default
QSV_MCP_LOG_LEVEL MCP audit log level controlling the qsvmcp.log audit trail. Valid values: "info" (log all tool invocations), "error" (log only failed invocations), "off" (disable audit logging). This is separate from the Rust binary's QSV_LOG_LEVEL. info
QSV_MCP_PLUGIN_MODE Override for plugin mode detection. Set to true when using non-Claude AI CLI agents (e.g., Gemini CLI) to relax directory security (since the host environment provides filesystem isolation). Auto-detect
QSV_MCP_CONCURRENCY_WAIT_TIMEOUT_MS Timeout in milliseconds when waiting for a concurrency slot. If all concurrent operation slots are in use, new requests wait up to this duration before being rejected. Set to 0 for immediate rejection. 120,000 (2 min)
QSV_MCP_SERVER_INSTRUCTIONS Custom server instructions that override built-in workflow guidance. Leave empty to use built-in defaults. "" (empty)
QSV_MCP_EXPOSE_ALL_TOOLS When set to true, expose all tools at startup instead of using deferred loading (core tools only). Useful for AI clients that don't support deferred tool loading. Auto-detect
QSV_MCP_DUCKDB_BIN_PATH Full path to the DuckDB binary. If not set, auto-detects from PATH. "" (auto-detect)
QSV_MCP_USE_DUCKDB Enable DuckDB for SQL queries when available. When false, always uses sqlp (Polars SQL). true
QSV_MCP_OUTPUT_FORMAT Output format for qsv command results. Valid values: "tsv" or "csv". tsv
QSV_MCP_ENABLE_APPS Enable MCP Apps UI features (e.g., qsv_browse_directory interactive directory browser). true
QSV_MCP_SANITIZE_ERRORS Strip absolute paths (usernames, directory layout) from command and error output before returning it to the client. Set to false to opt out (e.g., for local debugging). Sanitization protects hosted/shared deployments. true

Update Checking

Variable Description Default
QSV_MCP_CHECK_UPDATES_ON_STARTUP Check for new qsv releases on GitHub at startup. true
QSV_MCP_NOTIFY_UPDATES Display update notifications in server logs when new versions are available. true
QSV_MCP_AUTO_REGENERATE_SKILLS Automatically regenerate skill definitions when qsv version changes. Runs qsv --update-mcp-skills. false
QSV_MCP_GITHUB_REPO GitHub repository for update checks. dathere/qsv

Desktop Extension Mode

Variable Description Default
MCPB_EXTENSION_MODE Set automatically by Claude Desktop when running as an extension. Enables stricter validation (requires fully qualified qsv path, version >= 17.0.0). false

Cowork Plugin Setup

Variable Description Default
QSV_NO_COWORK_SETUP When set to 1, disables the Cowork plugin's auto-deployment of the workflow-guidance CLAUDE.md. Set in your shell before launching Claude Code. Unset (auto-deploy enabled)

Template Variables

The following template variables can be used in QSV_MCP_WORKING_DIR and QSV_MCP_ALLOWED_DIRS:

Variable Expands To
${HOME} User's home directory
${USERPROFILE} User's home directory (Windows alias)
${DESKTOP} User's Desktop folder
${DOCUMENTS} User's Documents folder
${DOWNLOADS} User's Downloads folder
${TEMP} / ${TMPDIR} System temporary directory
${PWD} Current working directory

Example Configuration

Legacy MCP mode (claude_desktop_config.json):

{
  "mcpServers": {
    "qsv": {
      "command": "node",
      "args": ["/path/to/.claude/skills/dist/mcp-server.js"],
      "env": {
        "QSV_MCP_BIN_PATH": "/usr/local/bin/qsv",
        "QSV_MCP_WORKING_DIR": "${DOWNLOADS}",
        "QSV_MCP_ALLOWED_DIRS": "${DOCUMENTS}:${DESKTOP}",
        "QSV_MCP_CHECK_UPDATES_ON_STARTUP": "true"
      }
    }
  }
}

Desktop Extension mode: Configuration is managed through Claude Desktop Settings → Extensions → qsv, which sets these environment variables automatically.

.env File Support

qsv supports the use of .env files to set environment variables. The .env file is a simple text file that contains key-value pairs, one per line.

It processes .env files as follows:

  • Upon invocation, qsv will check if the QSV_DOTENV_PATH environment variable is set. If it is, it will look for the file specified by the variable. If the file is found, it will be processed.
  • If the QSV_DOTENV_PATH environment variable is not set, qsv will look for a file named .env in the current working directory. If one is found, it will be processed.
  • If no .env file is not found in the current working directory, qsv will next look for an .env file with the same filestem as the binary in the directory where the binary is (e.g. if qsv/qsvmcp/qsvlite/qsvdp is in /usr/local/bin, it will look for /usr/local/bin/qsv.env, /usr/local/bin/qsvmcp.env, /usr/local/bin/qsvlite.env or /usr/local/bin/qsvdp.env respectively).
  • If no .env files are found, qsv will proceed with its default settings and the current environment variables, which may include "QSV_" variables.

When processing .env files, qsv will:

  • overwrite any existing environment variables with the same name
  • where multiple declarations of the same variable exist, the last one will be used
  • ignore any lines that start with # (comments)

To facilitate the use of .env files, a dotenv.template file is included in the qsv distribution. This file contains all the environment variables that qsv recognizes, along with their default values. Copy the template to a file named '.env' and modify it to suit your needs.