feat: Add Spark-compatible array functions: array_except, array_intersect, array_union, array_compact, array_position #7082

### Is your feature request related to a problem?

Daft already provides a rich set of `list_*` functions, but several commonly used array operations from Apache Spark / Databricks SQL are still missing. This makes it harder for users to:

- Migrate existing Spark / PySpark workloads to Daft.
- Write portable SQL that works across both engines.
- Express common list manipulations (deduplicated set difference, intersection, and union; NULL removal; finding an element's position) without resorting to verbose UDFs or workarounds.

Specifically, the following Spark functions currently have no direct equivalent in Daft:

| Spark function   | Behavior |
|------------------|----------|
| `array_except`   | Returns elements in array1 that are not in array2 (set difference, deduped) |
| `array_intersect`| Returns the set intersection of two arrays (deduped) |
| `array_union`    | Returns the set union of two arrays (deduped) |
| `array_compact`  | Removes NULL elements from an array |
| `array_position` | Returns the 1-based position of the first occurrence of an element, 0 if not found |

These functions are listed as unimplemented in the `SPARK_FUNCTION_COMPARISON.md` matrix.

There is no related issue ID. This is a fresh proposal that follows the ongoing effort, tracked in that file, to close the Spark-compatibility gap.


### Describe the solution you'd like


Implement five new list/array functions in `daft-functions-list` and expose them through three surfaces: the Python expression API, the top-level `daft.functions` module, and the SQL layer (under Spark-compatible `array_*` aliases).

### Behavior (Spark-compatible)

- **`array_except(a, b)` / `list_except`** — set difference, deduplicated, NULLs dropped, preserves first-seen order from `a`.
- **`array_intersect(a, b)` / `list_intersect`** — set intersection, deduplicated, NULLs dropped, preserves first-seen order from `a`.
- **`array_union(a, b)` / `list_union`** — set union, deduplicated, NULLs dropped, order = elements of `a` followed by new elements of `b`.
- **`array_compact(a)` / `list_compact`** — returns the input list with all NULL elements removed.
- **`array_position(a, item)` / `list_position`** — returns the 1-based index of the first occurrence of `item` in `a`, `0` if not found, NULL if either input is NULL.
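
The behavior listed above can be sketched as a pure-Python reference model. This is not the Rust implementation: the `ref_*` names are hypothetical, `None` stands in for NULL, and the set operations assume both list arguments are themselves non-NULL (that case is not specified above).

```python
from typing import Hashable, Iterable, Iterator, List, Optional


def _dedup_non_null(values: Iterable[Optional[Hashable]]) -> Iterator[Hashable]:
    """Yield non-NULL values in first-seen order, skipping repeats."""
    seen = set()
    for v in values:
        if v is not None and v not in seen:
            seen.add(v)
            yield v


def ref_list_except(a: List, b: List) -> List:
    """Elements of `a` not present in `b`; deduplicated, NULLs dropped."""
    exclude = set(b)
    return [v for v in _dedup_non_null(a) if v not in exclude]


def ref_list_intersect(a: List, b: List) -> List:
    """Elements of `a` also present in `b`; deduplicated, NULLs dropped."""
    keep = set(b)
    return [v for v in _dedup_non_null(a) if v in keep]


def ref_list_union(a: List, b: List) -> List:
    """Elements of `a`, then new elements of `b`; deduplicated, NULLs dropped."""
    return list(_dedup_non_null(a + b))


def ref_list_compact(a: List) -> List:
    """The input list with NULL elements removed; duplicates are kept."""
    return [v for v in a if v is not None]


def ref_list_position(a: Optional[List], item) -> Optional[int]:
    """1-based index of the first match, 0 if absent, NULL if either input is NULL."""
    if a is None or item is None:
        return None
    for i, v in enumerate(a, start=1):
        if v == item:
            return i
    return 0
```

A model like this may be useful as an oracle when writing the Python and SQL tests, since it keeps the ordering and NULL rules in one place.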

### Surfaces to update

- ✅ Rust kernels in `src/daft-functions-list/src/{except,intersect,union,compact,position}.rs`
- ✅ Hash-based set semantics in `src/daft-functions-list/src/series.rs` (with proper NULL handling matching Spark)
- ✅ Python expressions: `daft.Expression.list.except_/intersect/union/compact/position`
- ✅ Top-level functions: `daft.functions.list_except / list_intersect / list_union / list_compact / list_position`
- ✅ SQL aliases: `array_except`, `array_intersect`, `array_union`, `array_compact`, `array_position` (plus `list_*` equivalents)
- ✅ Tests under `tests/recordbatch/list/test_list_set_ops.py` covering Python API + SQL paths, NULL handling, dedup, type promotion, empty arrays, and not-found cases.
- ✅ Update `SPARK_FUNCTION_COMPARISON.md` to mark these as implemented.

### Type semantics

- All five functions promote element types via the existing `try_supertype` machinery, so a call such as `array_union(int_list, float_list)` resolves to a common element type instead of failing on the mismatch.
- Output element nullability is preserved consistently with other list functions.
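
The promotion step can be illustrated with a toy model. The type names, the `_NUMERIC_RANK` table, and the helper functions below are all hypothetical and far simpler than Daft's actual supertype rules; the sketch only shows the idea of casting both inputs to a common element type before the set operation runs.

```python
from typing import List, Tuple

# Hypothetical, heavily simplified numeric promotion order (an assumption,
# not Daft's real supertype table).
_NUMERIC_RANK = {"int32": 0, "int64": 1, "float64": 2}


def toy_supertype(left: str, right: str) -> str:
    """Return the wider of two toy numeric types, or raise if incompatible."""
    if left == right:
        return left
    if left in _NUMERIC_RANK and right in _NUMERIC_RANK:
        return max(left, right, key=_NUMERIC_RANK.__getitem__)
    raise TypeError(f"no common supertype for {left} and {right}")


def cast_list(values: List, dtype: str) -> List:
    """Cast non-NULL elements to the Python type matching the toy dtype."""
    caster = float if dtype == "float64" else int
    return [None if v is None else caster(v) for v in values]


def promoted_union(a: List, a_type: str, b: List, b_type: str) -> Tuple[str, List]:
    """Union two lists after promoting both to their common element type."""
    out_type = toy_supertype(a_type, b_type)
    merged = cast_list(a, out_type) + cast_list(b, out_type)
    seen, result = set(), []
    for v in merged:
        if v is not None and v not in seen:
            seen.add(v)
            result.append(v)
    return out_type, result


out_type, values = promoted_union([1, 2], "int64", [2.0, 3.5], "float64")
```

Here the integer list is widened to floats first, so `2` and `2.0` are compared as the same element and appear only once in the result.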

### Implementation note

Set operations are implemented with a hashed probe table built over the right-hand list (mirroring the existing `list_distinct` approach). For input lists of lengths N and M, this gives expected O(N + M) time per row instead of the O(N·M) of a nested scan.
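
The difference between the two strategies can be sketched for a single row as follows. This is an illustration only: Python's built-in `set` stands in for the probe table, `None` stands in for NULL, and both function names are hypothetical.

```python
from typing import List


def intersect_nested(a: List, b: List) -> List:
    """Nested-scan baseline: each element of `a` rescans `b` and the output."""
    out = []
    for v in a:
        # `v in b` and `v not in out` are linear list scans.
        if v is not None and v in b and v not in out:
            out.append(v)
    return out


def intersect_hashed(a: List, b: List) -> List:
    """Hash-probe version: build a table over `b` once, then probe per element."""
    probe = set(b)   # built once, O(M) expected
    emitted = set()  # tracks values already written, for deduplication
    out = []
    for v in a:      # O(N) expected: each membership check is a hash lookup
        if v is not None and v in probe and v not in emitted:
            emitted.add(v)
            out.append(v)
    return out


left = [4, 1, 4, None, 2, 1]
right = [1, 4, None, 9]
nested_result = intersect_nested(left, right)
hashed_result = intersect_hashed(left, right)
```

Both functions produce the same list in the same order; only the cost of each membership check differs.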

PR: https://github.com/Eventual-Inc/Daft/pull/new/feat/spark-array-functions  
Branch: `feat/spark-array-functions`


### Describe alternatives you've considered


1. **Pure Python UDFs** — users could compose `list_distinct`, `list_value_counts`, `is_in`, `list_filter`, etc. to emulate these. Rejected because:
   - It produces verbose, error-prone code (especially for NULL semantics, which differ subtly between Spark and naive set operations).
   - It bypasses native vectorized execution, costing significant performance.
   - It does not solve the SQL-compatibility gap — `array_*` SQL identifiers still need to be resolved by the planner.

2. **Implement only the SQL-side aliases** that map to existing `list_*` functions — Rejected because Daft does not yet have native equivalents for `except` / `intersect` / `union` / `compact` / `position` semantics as Spark defines them (set-deduplicated with NULL drop). Aliasing alone is insufficient.

3. **Wait for Substrait / external library integration** to provide these — Rejected because these are simple, foundational list primitives that should live in `daft-functions-list` next to their existing siblings (`list_distinct`, `list_contains`, …) for consistency and discoverability.

The chosen approach (native Rust kernels + Python + SQL) is the same pattern Daft already uses for its existing list functions, so it integrates cleanly with no new abstractions.


### Component(s)

_No response_

### Additional Context

_No response_

### Would you like to implement a fix?

Yes
