Release list

v0.21.0 Latest

laggui released this 07 May 21:17

v0.21.0

546cacb

Summary

Burn 0.21.0 brings 4 months of improvements that make the framework significantly faster and more reliable across the board. The gains span distributed workflows for training large models all the way down to small-model inference, where the reduced framework overhead becomes especially noticeable.

We rethought our distributed computing stack around differentiable collective operations. Kernel selection is now more reliable thanks to better autotuning and a new validation layer, and a project-level burn.toml file lets you tweak those internals (and many others) without recompiling. A reworked device handle reduces framework overhead, and a new burn-dispatch crate simplifies backend selection while paving the way for faster compile times. The release also ships burn-flex, a lightweight eager CPU backend for WebAssembly and embedded targets that replaces burn-ndarray. Finally, we added early off-policy reinforcement learning support and a fresh round of kernel work on GEMV, top-k, and FFT.

For more details, check out the release post on our website.

Changelog

Breaking

We've introduced a couple of breaking changes with this release. The affected areas are detailed in the sections below.

`burn-dataset` cache directory

To respect platform conventions, we switched from a hardcoded ~/.cache directory root for downloaded artifacts to the platform-specific cache directory:

Platform	Path
Linux	`$XDG_CACHE_HOME` or `~/.cache`
macOS	`~/Library/Caches`
Windows	`{FOLDERPATH_LOCAL_APPDATA}`

For Linux users without $XDG_CACHE_HOME configured, this change has no effect. The cache directory is still ~/.cache.
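
The lookup order in the table can be sketched with the standard library alone. This is only an illustration, not Burn's implementation: `Os` and `cache_root` are hypothetical names, the environment lookup is injected so the function is easy to test, and the Windows entry is approximated by the `LOCALAPPDATA` environment variable, which is an assumption.

```rust
use std::path::PathBuf;

/// Operating systems covered by the table above (hypothetical helper type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Os {
    Linux,
    MacOs,
    Windows,
}

/// Resolve the cache root for `os`.
///
/// `home` is the user's home directory and `env` looks up an environment
/// variable; both are passed in rather than read from the process.
pub fn cache_root(os: Os, home: &str, env: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    match os {
        // Prefer $XDG_CACHE_HOME when it is set and non-empty, else ~/.cache.
        Os::Linux => Some(
            env("XDG_CACHE_HOME")
                .filter(|value| !value.is_empty())
                .map(PathBuf::from)
                .unwrap_or_else(|| PathBuf::from(home).join(".cache")),
        ),
        Os::MacOs => Some(PathBuf::from(home).join("Library").join("Caches")),
        // Assumption: the local app data folder is exposed as %LOCALAPPDATA%.
        Os::Windows => env("LOCALAPPDATA").map(PathBuf::from),
    }
}
```

On Linux with no `XDG_CACHE_HOME`, `cache_root(Os::Linux, "/home/u", |_| None)` resolves to `/home/u/.cache`, which matches the note above that such users see no change.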

Interface Changes

TensorData::shape now stores a Shape instead of a Vec<usize>. Existing binary records using BinFileRecorder or BinBytesRecorder are not forward-compatible and must be converted before upgrading.

static STATE_ENCODED: &[u8] = include_bytes!("model.bin");

let model: Model<B> = Model::new(&Default::default());

// Old format can still be loaded before upgrade, but must be re-saved in a forward-compatible format.
let record = BinBytesRecorder::<FullPrecisionSettings, &'static [u8]>::default()
    .load(STATE_ENCODED, &Default::default())
    .expect("Failed to decode state");
let model = model.load_record(record);

model.save_file("model.mpk", &NamedMpkFileRecorder::<FullPrecisionSettings>::new()).unwrap();

The module derive macro has been improved, and the Ignored<T> wrapper is now deprecated. For fields that should not be considered modules, use #[module(skip)] instead.

pub struct Conv1d<B: Backend> {
-    pub padding: Ignored<PaddingConfig1d>,
+    #[module(skip)]
+    pub padding: PaddingConfig1d,
}

We added support for explicit asymmetric padding. Explicit padding now takes one value per side, so existing code must spell out each pair; repeating the same value on both sides keeps the previous symmetric behavior. Note that PaddingConfig3d does not support asymmetric padding yet.

// Symmetric (left, right)
- PaddingConfig1d::Explicit(1)
+ PaddingConfig1d::Explicit(1, 1)
// Symmetric (top, left, bottom, right)
- PaddingConfig2d::Explicit(1, 1)
+ PaddingConfig2d::Explicit(1, 1, 1, 1)
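
To see what independent left and right padding changes, here is a small std-only sketch of the usual output-length formula for a 1D convolution. `conv1d_out_len` is a hypothetical helper written for this note, not a Burn function, and it assumes the standard floor-based formula.

```rust
/// Output length of a 1D convolution with independent left/right padding.
///
/// Returns `None` for degenerate parameters or when the padded input is
/// shorter than the dilated kernel.
pub fn conv1d_out_len(
    input: usize,
    kernel: usize,
    stride: usize,
    dilation: usize,
    pad_left: usize,
    pad_right: usize,
) -> Option<usize> {
    if kernel == 0 || stride == 0 || dilation == 0 {
        return None;
    }
    let padded = input + pad_left + pad_right;
    // Span covered by the kernel once dilation gaps are included.
    let effective_kernel = dilation * (kernel - 1) + 1;
    if padded < effective_kernel {
        return None;
    }
    Some((padded - effective_kernel) / stride + 1)
}
```

With an input of length 10 and a kernel of 3, padding `(1, 1)` and padding `(2, 0)` both give an output of length 10; only the alignment of the output relative to the input differs.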

The Gelu activation module can now be configured with tanh approximation. This only affects code that instantiated Gelu directly.

- let activation = Gelu;
+ let activation = Gelu::new(); // or Gelu::default()
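
For reference, the two GELU variants can be compared with a std-only sketch. It assumes the tanh option follows the commonly used approximation formula; `gelu_erf`, `gelu_tanh`, and `erf_approx` are hypothetical helpers, and the error function is itself approximated (Abramowitz and Stegun 7.1.26) because the stable standard library does not provide one.

```rust
use std::f64::consts::PI;

/// erf via Abramowitz & Stegun formula 7.1.26 (absolute error around 1.5e-7).
fn erf_approx(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = t
        * (0.254829592
            + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// GELU written with the error function: 0.5 * x * (1 + erf(x / sqrt(2))).
pub fn gelu_erf(x: f64) -> f64 {
    0.5 * x * (1.0 + erf_approx(x / 2.0_f64.sqrt()))
}

/// GELU with the tanh approximation:
/// 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3))).
pub fn gelu_tanh(x: f64) -> f64 {
    let inner = (2.0 / PI).sqrt() * (x + 0.044715 * x.powi(3));
    0.5 * x * (1.0 + inner.tanh())
}
```

The two functions agree closely over typical activation ranges, so the choice mostly matters when matching the numerics of a model trained with one specific variant.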

The position-wise feed-forward module now has a configurable activation function. To keep it backwards compatible with previously saved records, the field is marked as #[module(skip)].

#[derive(Module, Debug)]
pub struct PositionWiseFeedForward<B: Backend> {
    // ...
-   /// GELU activation function.
-   pub gelu: Gelu,
+   /// Activation function.
+   #[module(skip)]
+   pub activation: Activation<B>,
}

The Shape fields are now private and some methods have been renamed. ShapeError has been renamed to MetadataError.

- let b = tensor.shape().dims[0];
+ let b = tensor.shape()[0];

- if let Err(ShapeError::RankMismatch{...}) = lhs.broadcast(&rhs) {
+ if let Err(MetadataError::RankMismatch{...}) = lhs.broadcast(&rhs) {

- let shape = shape.swap(1, 2).unwrap();
+ let shape = shape.swapped(1, 2).unwrap();

- let shape = shape.permute(&[0, 2, 1, 3]).unwrap();
+ let shape = shape.permuted(&[0, 2, 1, 3]).unwrap();
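
The new names read as "return a reordered copy" rather than "reorder in place". The sketch below illustrates that reading on plain slices; it is an assumption about the semantics, and `permuted` and `swapped` here are free functions written for this note, not the methods on Burn's Shape.

```rust
/// Return new dimensions where output axis `i` takes the size of input axis `axes[i]`.
/// Fails if `axes` is not a permutation of `0..dims.len()`.
pub fn permuted(dims: &[usize], axes: &[usize]) -> Result<Vec<usize>, String> {
    if axes.len() != dims.len() {
        return Err(format!("rank mismatch: {} axes for rank {}", axes.len(), dims.len()));
    }
    let mut seen = vec![false; dims.len()];
    for &axis in axes {
        if axis >= dims.len() || seen[axis] {
            return Err(format!("invalid or repeated axis {axis}"));
        }
        seen[axis] = true;
    }
    Ok(axes.iter().map(|&axis| dims[axis]).collect())
}

/// Return new dimensions with axes `a` and `b` exchanged; the input is left untouched.
pub fn swapped(dims: &[usize], a: usize, b: usize) -> Result<Vec<usize>, String> {
    if a >= dims.len() || b >= dims.len() {
        return Err(format!("axis out of range for rank {}", dims.len()));
    }
    let mut out = dims.to_vec();
    out.swap(a, b);
    Ok(out)
}
```

For dimensions `[2, 3, 4, 5]`, `permuted(&dims, &[0, 2, 1, 3])` yields `[2, 4, 3, 5]` and leaves `dims` unchanged.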

The boolean data type was expanded to include its storage type.

match bool_tensor.dtype() {
-   DType::Bool => todo!(),
+   DType::Bool(BoolStore::Native) => todo!(),
+   DType::Bool(BoolStore::U8) => todo!(),
+   DType::Bool(BoolStore::U32) => todo!(),
    _ => unreachable!(),
}

powf is no longer supported for Int tensors, as it previously relied on incorrect implicit truncation. These operations are now only available for Float tensors.

- let tensor_i = tensor_int.powf(tensor_float);
+ let tensor_f = tensor_int.float().powf(tensor_float);

- let tensor_i = tensor_int.powf_scalar(scalar_float);
+ let tensor_f = tensor_int.float().powf_scalar(scalar_float);
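
A scalar analogy shows why truncating the result back to an integer is lossy. This std-only sketch assumes "implicit truncation" means converting the floating-point result back to an integer; `powf_truncated` and `powf_float` are hypothetical helpers, not Burn functions.

```rust
/// Integer-in, integer-out power: the fractional part of the result is dropped.
pub fn powf_truncated(base: i32, exp: f32) -> i32 {
    (base as f32).powf(exp) as i32
}

/// Convert once to floating point and keep the result there.
pub fn powf_float(base: i32, exp: f32) -> f32 {
    (base as f32).powf(exp)
}
```

For example, `powf_truncated(2, 0.5)` gives 1 while `powf_float(2, 0.5)` gives roughly 1.4142, and any negative exponent applied to a base greater than 1 collapses to 0 in the truncated version.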

Backend tensor creation and conversion ops now take an explicit output dtype. This removes backend-specific dtype inference and ensures consistent behavior across backends. (Backend implementors only.)

impl BoolTensorOps<Self> for MyBackend {
-    fn bool_empty(shape: Shape, device: &Device<Self>) -> BoolTensor<Self> {
+    fn bool_empty(shape: Shape, device: &Device<Self>, dtype: BoolDType) -> BoolTensor<Self> {
        // use `dtype` instead of inferring internally
    }
-    fn bool_into_int(tensor: BoolTensor<Self>) -> IntTensor<Self> {
+    fn bool_into_int(tensor: BoolTensor<Self>, out_dtype: IntDType) -> IntTensor<Self> {
        // use `out_dtype` instead of inferring internally
    }
}

Associated types were moved from Backend to BackendTypes. Prefer the type aliases (Device<B>, FloatTensor<B>, etc.) to avoid type resolution issues.

impl BoolTensorOps<Self> for MyBackend {
-    fn bool_empty(shape: Shape, device: &<Self as Backend>::Device, dtype: BoolDType) -> <Self as Backend>::BoolTensorPrimitive {
+    fn bool_empty(shape: Shape, device: &Device<Self>, dtype: BoolDType) -> BoolTensor<Self> {
    }
}

Module & Tensor

Feat/device policy (#4373) @laggui
Implement basic RNN module (#4460) @aditya0by0
Add deg2rad and rad2deg (#4462) @softmaximalist
Implement median tensor operation (#4454) @softmaximalist
Add Selu activation function (#4439) @antimora
Add CELU activation function (#4441) @antimora
Add Elu activation function (#4438) @antimora
Add BiGru (bidirectional GRU) module (#4442) @antimora
Add ThresholdedRelu activation function (#4440) @antimora
Add Softsign activation function (#4437) @antimora
[Breaking] Add configurable activation and layer_norm_eps to transformer layers (#4410) @antimora
[Breaking] Add asymmetric padding support for conv and pool operations (#4263) @antimora
Implement HardShrink, SoftShrink and Shrink Activations (#4556) @aditya0by0
feat: add align_corners support to InterpolateOptions (#4518) @antimora
feat: support padding on arbitrary dimensions (#4507) @antimora
feat: enhance attention() with scale, attn_bias, softcap, and is_causal (#4476) @antimora
feat: Introduce Lanczos3 interpolation method (#4601) @ovr
Add HannWindow operator to burn-tensor (#4631) @walkinggo
[Breaking] Remove int powf and make powi numeric op (#4646) @laggui
[Breaking] Add bool store dtype + remove bool elem from fusion (#4649) @laggui
[Breaking] Use device settings to provide output dtype (#4653) @laggui
feat: add categorical sampling for tensors (#4655) @majiayu000
Add HammingWindow operator to burn-tensor (#4698) @RunjiaChen
Fix: make module cloning efficient for CPU devices (#4703) @antimora
feat: support cross-kind tensor casting via .cast() (#4713) @antimora
Add FloatInfo for dtype-aware precision info (#4721) @antimora
Fix unsqueeze_dims panic (#4755) @softmaximalist
Fix unsqueeze_dims panic on duplicate sorted axes (#4764) @antimora
feat(burn-nn): add native LocalResponseNorm module (#4765) @jcwal1516
Add det (determinant) tensor operation (#4813) @softmaximalist
Add Blackman window function to signal module (#4842) @softmaximalist
Add STFT/ISTFT and thread n through FFT backend trait (#4835) @antimora
Add linear op to ModuleOps for fused matmul+bias (#4747) @antimora
Add native implementations for scatter_nd / gather_nd; provide autodiff for assign & add (#4709) @cu9hue
Fix conv x-backward padding_out bug (#4806) @antimora
Extract float math ops in a new trait (#4891) @skewballfox
linalg::lu: Improve numerical handling and small perf cleanup (#4902) @softmaximalist
Adding complex to complex FFT implementation (#4903) @RunjiaChen
add autodiff for scatter_nd min/max/mul (#4909) @cu9hue
fix: conv_transpose x-backward output size (#4916) @SAY-5
Change pwff activation to #[module(skip)] for backward compat (stateless) (#4929)

Datasets & Training

Implement SSIM vision metric (#4396) @softmaximalist
add KLDivLoss and batch_mean in reduction (#4399) @donjuanplatinum
Fix cubek matmul stage size (#4435) @laggui
Implement the PSNR vision metric (#4379) @softmaximalist
Implement Mean(L(P) Norm Error)Loss (#4341) @softmaximalist
Feature flag + Tests for RL in burn-rl and burn-train (#4470) @Charles23R
Burn rl (#4447) @Charles23R
add AMSgrad support for Adam/AdamW (#4388) @donjuanplatinum
add LBFGS optimizer (#4471) @donjuanplatinum
Add SequenceOutput struct for sequence prediction outputs (#4474) @softmaximalist
fix: OptimSharded strategy validation device mismatch (#4527) @Dreaming-Codes
Implement CTC loss (#4529) @softmaximalist
Add Smoot...

Contributors

jnamika, ovr, and 38 other contributors

v0.21.0-pre.5 Pre-release

laggui released this 05 May 20:21

v0.21.0-pre.5

7728590

What's Changed

Re-enable fusion f16 conv + bn regression tests (#4920) @laggui
Enable & fix cubecl tests w/ fusion (#4917) @laggui
Fusion tests (#4872) @nathanielsimard
Adding complex to complex FFT implementation (#4903) @RunjiaChen
Add cubecl integration to topk (#4906) @Sublime12
Extract float math ops in a new trait (#4891) @skewballfox
Add ParamId::try_deserialize() (#4881) @crutcher
Add Clone + 'static bounds to LrScheduler::Record and derive Clone for scheduler records (#4905) @crutcher
linalg::lu: Improve numerical handling and small perf cleanup (#4902) @softmaximalist
Add fusion integration for argtopk (#4904) @Sublime12
Add argtopk for Cubecl backend (#4900) @Sublime12
Update CubeK: tile matmul refactor (#4901) @louisfd
Use gather_nd in RNN-T gather_loss (#4895) @antimora
Fix cubecl cross product on non-last dimension (#4850) @dschulmeist
Fix PytorchReader bugs to load legacy files correctly (#4897) @softmaximalist
Add native implementations for scatter_nd / gather_nd; provide autodiff for assign & add (#4709) @cu9hue

Full Changelog: v0.21.0-pre.4...v0.21.0-pre.5

Contributors

antimora, laggui, and 9 other contributors

v0.21.0-pre.4 Pre-release

laggui released this 27 Apr 20:49

v0.21.0-pre.4

8e485cb

What's Changed

Update cubek: tile matmul refactor (#4888) @louisfd
Add ctc_loss backend trait hook + tch and cubecl impls (#4819) @antimora
Centralize internal burn-* deps in [workspace.dependencies] (#4876) @antimora
Update cubecl + cubek: fix matmul, reduce WASM and vector size check on strided tensors (#4874) @laggui
Split Associated Types from Backend into BackendTypes (#4868) @skewballfox
All reduce backward (#4873) @Charles23R
Update/cubecl to client (#4866) @Charles23R
Fix select_assign OOB units (#4870) @laggui
Add linear op to ModuleOps for fused matmul+bias (#4747) @antimora
Add burn-std::config runtime configuration with fusion logging and search optimization (#4864) @nathanielsimard
Fix Typo in One Hot encoding class size error (#4869) @Baseng0815
Fix fusion reduce broadcasted when multi block local might be a view (#4867) @laggui
Add STFT/ISTFT and thread n through FFT backend trait (#4835) @antimora
Fix burn-flex argmax NaN ordering; tighten expand; precise erf (#4859) @antimora
Fix burn-flex sum_dim reading contiguous storage on transposed input (#4861) @antimora
Fix rustls-webpki audit (#4863) @laggui
Add det (determinant) tensor operation (#4813) @softmaximalist
Add Blackman window function to signal module (#4842) @softmaximalist
Display FlexDevice as Cpu (#4857) @antimora
Update cubecl: refactor toml config, fix autotune priority and fix persistent memory pool reset (#4858) @nathanielsimard
Migrate default test backend from NdArray to Flex (#4854) @antimora
Use burn-flex in docs and examples (#4841) @antimora
Fix burn-flex to_contiguous fast path for prefix views (#4856) @antimora
Migrate benchmarks from burn-flex to burn-backend-tests (#4853) @antimora
Fix autotune context, remove unsafe code (#4781) @ArthurBrussee
Override float_mean in cubecl backends (#4840) @laggui
Device service usage (#4839) @nathanielsimard
Fusion all reduce + refactor collective (#4803) @Charles23R
Add missing dispatch overrides and native tch ops for softmax, layer_norm (#4834) @antimora
Fix CrossEntropyLoss with probabilities (#4829) @laggui
Move tensor tests from burn-flex to burn-backend-tests (#4812) @antimora
Remove unused M param from SimpleOptimizerMapper. (#4823) @crutcher
Forward gemm perf features and fix burn-flex SIMD flag cascade (#4826) @antimora
Add Record<(R0,)> 1-Tuple (#4825) @crutcher
Cleanup OptimizerAdaptor / GradAdaptor API. (#4822) @crutcher
Prep for Group Multi Optimizers (#4818) @crutcher
Fix clippy lints (#4820) @laggui
Matmul selection (#4773) @nathanielsimard
Fix conv x-backward padding_out bug (#4806) @antimora
burn-flex: implement softmax and layer_norm backend op (#4805) @antimora
Add FloatInfo for dtype-aware precision info (#4721) @antimora
Add softmax and layer_norm backend trait hooks (#4797) @antimora
Update bitstream-io & rustls-webpki (yanked + audit) (#4801) @laggui
feat(burn-nn): add native LocalResponseNorm module (#4765) @jcwal1516
Fix: make module cloning efficient for CPU devices (#4703) @antimora
burn-flex: enable f16 tests and fix mean overflow, grid_sample and quantization (#4769) @antimora
Seed CubeCL normal distribution test (#4791) @leohenon
Drop burn-flex I64 debug_asserts (#4780) @antimora
fix(vision): propagate backend features to burn-vision (#4753) @jcwal1516
Optimize and update LU decomposition function (#4738) @softmaximalist
Fix burn-flex attention rejecting broadcasted mask/bias (#4777) @antimora
Fix burn-flex bool binary ops to broadcast operands (#4775) @antimora
Add burn-flex CPU backend (#4761) @antimora
Fix flaky initializer_normal_init test (#4766) @leohenon
Fix unsqueeze_dims panic on duplicate sorted axes (#4764) @antimora
fix(ndarray): grouped conv SIMD clamp + regressions (#4727) @dnvt
Fix xtask CI renamed feature (#4763) @laggui
Fix/fusion autotune context (#4759) @nathanielsimard

Full Changelog: v0.21.0-pre.3...v0.21.0-pre.4

Contributors

antimora, ArthurBrussee, and 11 other contributors

v0.21.0-pre.3 Pre-release

laggui released this 08 Apr 20:11

v0.21.0-pre.3

52998ac

What's Changed

Fix select_assign OOB (#4760) @nathanielsimard
Fix unsqueeze_dims panic (#4755) @softmaximalist
Fix quantization tests and flaky tolerance (#4743) @laggui
Fix fusion scalar broadcasting in write_output_aligned (#4741) @laggui
Feat/implement fusion for irfft (#4736) @Sublime12
Fix cubecl cuda all-reduce + remove useless check in distributed server (#4720) @Charles23R
Feat/implement fusion for rfft (#4735) @Sublime12
update cubek & fix gemv autotune (#4726) @louisfd
Add more checks for quantized tensor reshape (#4704) @laggui
feat: support cross-kind tensor casting via .cast() (#4713) @antimora
chore: Fix some clippy errors, fix quant tests (#4708) @wingertge
Update cubek (#4714) @louisfd
Feat/add irfft (#4719) @Sublime12
Feat/add rfft (#4707) @Sublime12
Make Param Sync for parallel model inference (#4701) @antimora
Perf/burn fusion overhead (#4645) @nathanielsimard
Split TrainingStrategy to decouple the DistributedBackend requirement (#4710) @laggui
fix: use integer arithmetic for nearest-neighbor coordinate scaling (#4687) @wkrettek
All reduce in backward (#4650) @Charles23R
fix output in attention tuner (#4702) @louisfd
Fix attention_fallback NaN for fully-masked rows (#4697) @antimora
Add HammingWindow operator to burn-tensor (#4698) @RunjiaChen
update cubek and cubecl (#4699) @louisfd
Fix fusion consistency checks and binding estimation in burn-cubecl-fusion (#4695) @nathanielsimard
Update cubek and fix vecmat autotune (#4682) @louisfd
Ignore local tests with pre-trained weights (#4676) @laggui
Fix dispatch when only wgpu is enabled (maps to webgpu) (#4678) @laggui
update cubek (#4677) @louisfd
Fix fusion kernel vector_size mismatch on f16 output writes (#4675) @AdrianEddy
Include new vec2mat routine in matmul autotune (#4673) @louisfd
Update cubecl & cubek revs (#4672) @laggui
feat: add categorical sampling for tensors (#4655) @majiayu000
chore: Update to upstream changes in cubecl (#4670) @wingertge
Refactor backend tests to set device settings at initialization + use Dispatch (#4666) @laggui
Add HannWindow operator to burn-tensor (#4631) @walkinggo
fixup:(burn-ndarray) fix comment and tidy imports (#4668) @TsaoLun
Fix tch int_zeros dtype in sync (#4664) @laggui
[Breaking] Use device settings to provide output dtype (#4653) @laggui
feat: add FID vision metric (#4644) @cong-or
Add Adan optimizer implementation with tests (#4651) @sepcnt
[Breaking] Add bool store dtype + remove bool elem from fusion (#4649) @laggui
Selector/attention (#4648) @louisfd
fix(burn-ndarray): use owned storage for native heap allocations in from_data (#4647) @TsaoLun
add utilities fn to FusionServer (#4640) @Charles23R
Remove int powf and make powi numeric op (#4646) @laggui
refactor: View launch (#4639) @wingertge
chore: Update to cubecl changes (#4630) @wingertge
Dispatch autodiff checkpointing strategy support (#4629) @laggui
Implement RNNT loss (#4623) @cong-or
Remove named tensor (#4628) @laggui
Perf: Improve fusion score (#4511) @nathanielsimard
refactor: Vector size generic (#4624) @wingertge
Fix function arg name inconsistencies (#4626) @softmaximalist
Update building-blocks chapter (#4625) @softmaximalist
Refactor/device handle (#4593) @nathanielsimard
feat: Introduce Lanczos3 interpolation method (#4601) @ovr
Add Gram Matrix Loss for vision tasks (#4595) @softmaximalist
Fix fusion cumulative op inputs (#4621) @laggui
fix: replace ValidStep with InferenceStep in training.md (#4620) @TsaoLun
Update documentation link for burn-store (#4619) @softmaximalist
Improve module derive + add #[module(skip)] attribute (#4618) @laggui
Add HalfPrecisionAdapter for F32/F16 mixed-precision storage (#4594) @antimora
Fix cosine scheduler record in composed scheduler (#4617) @laggui
Update ONNX import docs for LoadStrategy and from_bytes (#4607) @antimora
Use shape in TensorData (#4603) @laggui
Update SSIM float types to f32 (#4602) @softmaximalist
Fix conv2d_weight_backward w/ strided channels and unit spatial dims (via conv_im2col_1x1) (#4591) @laggui
Add multi-scale SSIM for image quality assessment (#4555) @softmaximalist
Remove Clone bound from WindowsDataset item (#4597) @laggui
Add contributing guidelines with AI-assisted contributions policy (#4569) @antimora
feat: Implements DISTS metric (#4574) @koreaygj
Fix dispatch autodiff feature propagation (#4592) @laggui

Full Changelog: v0.21.0-pre.2...v0.21.0-pre.3

Contributors

ovr, antimora, and 16 other contributors

v0.21.0-pre.2 Pre-release

nathanielsimard released this 02 Mar 18:44

v0.21.0-pre.2

ab9f793

What's Changed

Fix: create multiple elemwise fused block by @nathanielsimard in #4497
Upgrade to rand 0.10 by @laggui in #4500
fix overflow in int_abs_elem for i64 min value by @Olexandr88 in #4486
Implements: LPIPS metrics for Image quality by @koreaygj in #4403
Fix quantization non-contiguous input by @laggui in #4498
Add SequenceOutput struct for sequence prediction outputs by @softmaximalist in #4474
feat: enhance attention() with scale, attn_bias, softcap, and is_causal by @antimora in #4476
Fix too many kernels by @nathanielsimard in #4505
feat: Enable 64-bit indexing for kernels by @wingertge in #4502
feat: support padding on arbitrary dimensions by @antimora in #4507
allow flash attention with causal by @louisfd in #4509
Remove getrandom w/ wasm_js backend by @laggui in #4515
Bump polars to 0.53.0 by @laggui in #4514
perf: Make backing storage of Shape more flexible by @wingertge in #4516
Combined PRs by @github-actions[bot] in #4528
feat: add align_corners support to InterpolateOptions by @antimora in #4518
fix: OptimSharded strategy validation device mismatch by @Dreaming-Codes in #4527
Add native sign unary ops for CubeCL float and int by @yash27-lab in #4513
Bump zip to 8.1.0 by @laggui in #4533
Fix image-classification-web links by @laggui in #4536
Fix zip yanked downstream dep by @laggui in #4540
add LBFGS optimizer by @donjuanplatinum in #4471
Replace Vec-based TransitionBuffer with tensor-backed storage by @arferreira in #4504
Implement CTC loss by @softmaximalist in #4529
refactor: Metadata type/strides refactor by @wingertge in #4534
Attention: remove default impl and implement for all backends by @louisfd in #4544
fix: resolve macOS build and test failures by @antimora in #4545
fix: Bool from_data_dtype panics on GPU backends by @antimora in #4551
Attention autotune by @louisfd in #4552
Attention: add autotune gate by @louisfd in #4554
Combined PRs by @github-actions[bot] in #4565
Optional Ordering for NdArrayElement by @skewballfox in #4559
Add Smooth L1 loss by @softmaximalist in #4547
Implement HardShrink, SoftShrink and Shrink Activations by @aditya0by0 in #4556
doc(notebook) : add more basic operations and some examples by @Tyooughtul in #4542
Update cubecl/cubek revs by @laggui in #4568
Fix(lpips): load ImageNet backbone weights for pretrained models by @koreaygj in #4557
[Feat] Global backend Dispatch by @laggui in #4508
fix(burn-candle): move wildcard match arm to end of dtype match by @holg in #4571
move sign back to mathOps by @skewballfox in #4573
refactor: Move from CubeOption to Option by @wingertge in #4543
update attention cubek autotune by @louisfd in #4579
Add evaluator summary by @laggui in #4578
Move burn-nn module name checks in burn-store adapter to the test section by @softmaximalist in #4580
Expose BurnpackError by @AdrianEddy in #4585
Combined PRs by @github-actions[bot] in #4588
Bump versions by @nathanielsimard in #4589
Add burn-dispatch publish by @laggui in #4590

Full Changelog: v0.21.0-pre.1...v0.21.0-pre.2

Contributors

antimora, wingertge, and 15 other contributors

v0.21.0-pre.1 Pre-release

nathanielsimard released this 09 Feb 22:13

v0.21.0-pre.1

3fa8dfa

What's Changed

Bump burn version 0.21 by @laggui in #4333
Use NodeType to point to unimplemented node by @laggui in #4334
burn-train: include GPU power draw in CudaMetric by @StanByriukov02 in #4322
Fix book guide training changes by @laggui in #4340
Combined PRs by @github-actions[bot] in #4352
ensure that tensor is owned on iter_dim call by @tzemanovic in #4309
docs: add DataframeDataset example using Polars by @SameerVers3 in #4298
Add evaluation name as_str + display by @laggui in #4354
Fix memory growth: use GraphLocator::remove_entry for orphan cleanup by @jnamika in #4342
Bump ratatui from 0.29.0 to 0.30.0 by @dependabot[bot] in #4305
Performance tweaks to the lp_norm code. by @crutcher in #4318
Add Scalar runtime literal by @laggui in #4337
Add compile errors for module derive by @laggui in #4356
Make ElementComparison optional for dtypes by @skewballfox in #4255
fix: Actually implement conv backwards ops for burn-fusion/burn-router by @wingertge in #4360
Update for cubecl try_cast_unchecked -> downcast rename by @adolago in #4335
fix: Fix interpolate with NHWC input by @wingertge in #4363
Move ONNX import to burn-onnx crate by @laggui in #4361
Update cubek by @laggui in #4365
Implement Mean(L(P) Norm Error)Loss by @softmaximalist in #4341
Fix clippy rust 1.93 by @laggui in #4371
Use cache_dir() instead of hardcoded ~/.cache path by @antimora in #4372
Combined PRs by @github-actions[bot] in #4386
Fix typo in dataset.md in Burn Book by @softmaximalist in #4380
chore: Enable macos CI by @dcvz in #4389
add AMSgrad support for Adam/AdamW by @donjuanplatinum in #4388
Implement the PSNR vision metric by @softmaximalist in #4379
Update cubecl wgpu v28 by @laggui in #4244
Bump tracel-ai/github-actions from 6 to 7 by @dependabot[bot] in #4394
Bump tracel-ai/github-actions/.github/workflows/publish-crate.yml from 6 to 7 by @dependabot[bot] in #4395
chore: enable metal backend tests on ci by @dcvz in #4390
Feat/device policy by @laggui in #4373
More explicit global dtype support by @laggui in #4400
Move ONNX crates to burn-onnx repository by @antimora in #4393
opt(burn-cubecl): Optimized tensors by default by @wingertge in #4402
chore: fix typos caught by xtask by @huahuadeliaoliao in #4406
Add field docs to generated methods by @swfsql in #4408
Make transformer layer APIs public for cross-crate usage by @antimora in #4409
Implement SSIM vision metric by @softmaximalist in #4396
Combined PRs by @github-actions[bot] in #4425
move sort functions to orderable trait by @skewballfox in #4419
[BREAKING] Add asymmetric padding support for conv and pool operations by @antimora in #4263
Update Burn Book: metrics and trig functions by @softmaximalist in #4413
Add device dtype usage by @laggui in #4404
add KLDivLoss and batch_mean in reduction by @donjuanplatinum in #4399
feat(burn-store): add ModuleAdapter chaining by @huahuadeliaoliao in #4407
Fix cubek matmul stage size by @laggui in #4435
Bump tracel-ai/github-actions/.github/workflows/publish-crate.yml from 7 to 8 by @dependabot[bot] in #4443
chore: deprecate burn-candle backend by @antimora in #4416
Add configurable activation and layer_norm_eps to transformer layers by @antimora in #4410
Add Softsign activation function by @antimora in #4437
chore: update workflows by @syl20bnr in #4446
Add ThresholdedRelu activation function by @antimora in #4440
Combined PRs by @github-actions[bot] in #4453
Add check for wasm-bindgen installation by @zhoukekestar in #4358
Add BiGru (bidirectional GRU) module by @antimora in #4442
Fix: SupervisedTraining should use the model device by default by @laggui in #4456
Add Elu activation function by @antimora in #4438
chore: update workflows to use Tracel GitHub actions v9 by @syl20bnr in #4457
Add CELU activation function by @antimora in #4441
Add Selu activation function by @antimora in #4439
Burn rl by @Charles23R in #4447
Perf/fusion/reduce broadcasted by @nathanielsimard in #4338
Implement median tensor operation by @softmaximalist in #4454
Add deg2rad and rad2deg by @softmaximalist in #4462
fix: use all dilation entries in max_pool2d_with_indices_backward by @fcasal in #4466
Update zip + time by @laggui in #4468
Implement basic RNN module by @aditya0by0 in #4460
fix: default to single device strat when only 1 device by @Charles23R in #4463
Combined PRs by @github-actions[bot] in #4485
Add module.train() to move a module back to the autodiff backend by @laggui in #3975
chore: Update cubecl to runtime config refactor by @wingertge in #4489
Feature flag + Tests for RL in burn-rl and burn-train by @Charles23R in #4470
Fix reduce line size parallel and mean accumulator precision by @laggui in #4467
Chore: Pre-Release 0.21.0-pre.1 by @nathanielsimard in #4494
Fix pre-release by @nathanielsimard in #4495

Contributors

jnamika, antimora, and 20 other contributors

v0.20.1

laggui released this 23 Jan 17:43

v0.20.1

75b7881

Bug Fixes & Improvements

Fix book guide training changes (#4340) @laggui
Fix dequantize native debug statement (tracel-ai/cubek#69) @laggui
Do not point to pinned exact versions to allow pulling patch releases @laggui

Contributors

laggui

v0.20.0

laggui released this 15 Jan 16:08

v0.20.0

3475ba8

Summary

This release marks a major turning point for the ecosystem with the introduction of CubeK. Our goal was to solve a classic challenge in deep learning: achieving peak performance on diverse hardware without maintaining fragmented codebases.

By unifying CPU and GPU kernels through CubeCL, we've managed to squeeze maximum efficiency out of everything from NVIDIA Blackwell GPUs to standard consumer CPUs.

Beyond performance, this release makes the library more robust, flexible, and significantly easier to debug.

This release also features a complete overhaul of the ONNX import system, providing broader support for a wide range of ONNX models. In addition, various bug fixes and new tensor operations enhance stability and usability.

For more details, check out the release post on our website.

Changelog

Breaking

We've introduced a couple of breaking API changes with this release. The affected interfaces are detailed in the sections below.

Training

We refactored burn-train to better support different abstractions and custom training strategies. As part of this, the LearnerBuilder has been replaced by the LearningParadigm flow:

- let learner = LearnerBuilder::new(ARTIFACT_DIR)
+ let training = SupervisedTraining::new(ARTIFACT_DIR, dataloader_train, dataloader_valid)
        .metrics((AccuracyMetric::new(), LossMetric::new()))
        .num_epochs(config.num_epochs)
-       .learning_strategy(burn::train::LearningStrategy::SingleDevice(device))
-       .build(model, config.optimizer.init(), lr_scheduler.init().unwrap());
+       .summary();
 
- let result = learner.fit(dataloader_train, dataloader_valid);
+ let result = training.launch(Learner::new(
+      model,
+      config.optimizer.init(),
+      lr_scheduler.init().unwrap(),
+ ));

Interface Changes

The scatter and select_assign operations now require an IndexingUpdateOp to specify the update behavior.

- let output = tensor.scatter(0, indices, values);
+ let output = tensor.scatter(0, indices, values, IndexingUpdateOp::Add);
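
The `Add` update mode can be pictured on a flat buffer: every value is accumulated into the slot named by its index, and repeated indices add up. This is a simplified one-dimensional sketch with a hypothetical `scatter_add_1d` helper, not Burn's tensor implementation.

```rust
/// Accumulate `values[i]` into `base[indices[i]]` and return the updated copy.
pub fn scatter_add_1d(
    base: &[f32],
    indices: &[usize],
    values: &[f32],
) -> Result<Vec<f32>, String> {
    if indices.len() != values.len() {
        return Err(format!(
            "expected one value per index, got {} indices and {} values",
            indices.len(),
            values.len()
        ));
    }
    let mut out = base.to_vec();
    for (&idx, &value) in indices.iter().zip(values) {
        let slot = out
            .get_mut(idx)
            .ok_or_else(|| format!("index {idx} out of bounds for length {}", base.len()))?;
        // Repeated indices accumulate instead of overwriting each other.
        *slot += value;
    }
    Ok(out)
}
```

Scattering values `[1.0, 2.0, 5.0]` at indices `[1, 1, 3]` into four zeros gives `[0.0, 3.0, 0.0, 5.0]`; making the update mode explicit removes any ambiguity about what happens at the repeated index.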

API calls for slice, slice_assign, and slice_fill no longer require const generics for dimensions, which cleans up the syntax quite a bit:

- let prev_slice = tensor.slice::<[Range<usize>; D]>(slices.try_into().unwrap());
+ let prev_slice = tensor.slice(slices.as_slice());

The grid_sample_2d operation now supports different options. To preserve the previous behavior, make sure to specify the matching options:

- let output = tensor.grid_sample_2d(grid, InterpolateMode::Bilinear);
+ let options = GridSampleOptions::new(InterpolateMode::Bilinear)
+     .with_padding_mode(GridSamplePaddingMode::Border)
+     .with_align_corners(true);
+ let output = tensor.grid_sample_2d(grid, options);
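
The two options affect how a normalized grid coordinate in [-1, 1] is mapped to a pixel position. The sketch below uses the convention common to grid sampling in other frameworks; whether Burn matches it in every detail is an assumption, and `unnormalize` and `clamp_to_border` are hypothetical helpers.

```rust
/// Map a normalized coordinate in [-1, 1] to a pixel position along an axis of `size` pixels.
pub fn unnormalize(coord: f32, size: usize, align_corners: bool) -> f32 {
    let size = size as f32;
    if align_corners {
        // -1 and 1 land exactly on the centers of the first and last pixels.
        (coord + 1.0) / 2.0 * (size - 1.0)
    } else {
        // -1 and 1 land on the outer edges of the first and last pixels.
        ((coord + 1.0) * size - 1.0) / 2.0
    }
}

/// Border padding: positions outside the image are clamped onto the edge pixels.
pub fn clamp_to_border(pos: f32, size: usize) -> f32 {
    pos.clamp(0.0, size.saturating_sub(1) as f32)
}
```

With five pixels, a coordinate of 1.0 maps to position 4.0 when corners are aligned and to 4.5 otherwise; border padding then clamps 4.5 back to 4.0.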

The QuantStore variants used in QuantScheme have been updated to support a packing dimension.

  pub enum QuantStore {
      /// Native quantization doesn't require packing and unpacking.
      Native,
+     /// Store packed quantized values in a natively supported packing format (i.e. e2m1x2).
+     PackedNative(usize),
      /// Store packed quantized values in a 4-byte unsigned integer.
-     U32,
+     PackedU32(usize),
 }
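
As an illustration of what packing into a 4-byte unsigned integer means, the sketch below stores eight 4-bit values in one `u32`. The nibble order (lowest bits first) is an assumption, the helpers `pack_q4` and `unpack_q4` are hypothetical, and the sketch does not model the packing dimension carried by the new variants.

```rust
/// Pack eight 4-bit values (each in 0..=15) into one u32, lowest nibble first.
pub fn pack_q4(values: [u8; 8]) -> u32 {
    values
        .iter()
        .enumerate()
        .fold(0u32, |acc, (i, &v)| acc | (((v as u32) & 0xF) << (4 * i)))
}

/// Recover the eight 4-bit values from a packed u32.
pub fn unpack_q4(packed: u32) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = ((packed >> (4 * i)) & 0xF) as u8;
    }
    out
}
```

Packing `[1, 2, 3, 4, 5, 6, 7, 8]` yields `0x87654321`, and unpacking that value returns the original eight entries.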

Finally, Shape no longer implements IntoIterator. If you need to iterate by-value over dimensions, access the dims field directly.

- for s in shape {
+ for s in shape.dims {

Module & Tensor

Generalize linalg::outer semantics; add linalg::outer_dim (#3923) @crutcher
Use square() where appropriate. (#3900) @crutcher
Add linalg matvec (#3967) @huy209vn
Add GaussianNoise layer (#4022) @kul-sudo
Make TransformerEncoderLayer fields public (#4053) @Mnwa
Workaround MPS embedding allocation error in LibTorch (#4073) @antimora
Fix Slice operation to handle empty ranges (#4083) @antimora
Handle empty tensors in cat and slice_assign ops (#4095) @antimora
[Breaking] Add IndexingUpdateOp to scatter and select_assign (#4070) @laggui
Add CrossAttention module to burn-nn (#4101) @huy209vn
Add reflect and edge padding modes to tensor.pad (#4105 #) @antimora
Fix GLU and quiet softmax activations (#4121) @laggui
Add ceil_mode support to pooling operations (MaxPool, AvgPool) (#4112) @antimora
[Breaking] Remove D2 const generic from slice / SliceArg (#4127) @crutcher
Add backend supports_dtype (#4155) @laggui
Fix repeat 0 times (#4216) @laggui
feat: add hardswish activation (#4209) @mertalev
Add more trig ops (#4282) @laggui
Add empty/zeros/ones/full TensorCreationOptions (#4285) @laggui
feat: nms op (#4246) @mertalev

Datasets & Training

Refactor metric loggers (#3895 #4017) @Charles23R
Add support for custom learning strategy (#3921) @Charles23R
Feat/optim/distributed (#4018) @nathanielsimard
Refactor MetricEntry (#4031) @Charles23R
Feature muon (#3925) @NewBornRustacean
Add warmup epochs to MetricEarlyStoppingStrategy (#4041) @crutcher
Log running values (#4199) @Charles23R
Fix checkpoint and summary log level (#4201) @J-F-Liu
[Breaking] Burn train api refactor (#4223 #4283) @Charles23R
Fix checkpointer interrupt (#4268) @Charles23R

Backends

Add candle device seeding (#3959) @laggui
feat: Enable tuning for MMA matmul (#3961) @wingertge
feat: TMA autotuning (#3986) @wingertge
feat: Enable tuning specialized matmul (#4026) @wingertge
Add CubeCL Flash Attention module (#4089 #4192) @louisfd
Zero-copy tensor loading for NdArray backend (#4178) @antimora
feat: Implicit GEMM weight gradients for convolution (#4182) @wingertge
Perf/reduce cpu + Fix OOB (#4197 #4204) @nathanielsimard
feat: Accelerated convolution data gradient (#4220) @wingertge
Remove linux-only constraint for cpu (#4233) @louisfd
Perf/into contiguous (#4257) @nathanielsimard
fix: grid sample using excessive memory (#4236 #4242) @mertalev
Add fast-path for batched vector–matrix matmul (#4300) @louisfd

Bug Fixes

Fix async barrier & TMA checks (#4007) @nathanielsimard
Fix fusion reduce local already registered as output (#4014) @laggui
Fix remainder int (#4015) @laggui
Fix cuda mem error (#4020) @nathanielsimard
Cleanup autodiff unused roots (#4039) @laggui
Fix autotuner (#4049) @nathanielsimard
Fix scatter values backward (#4064) @khoek
More correctness fixes in autodiff ops (#4069) @khoek
Fix transaction read (#4074) @laggui
Fix tch bf16 kind (#4088 #4142 #4203) @laggui
Fix cubecl cuda compilation error/typo (#4092) @BjornTheProgrammer
Fix output dtype for argmin / argmax (#4195) @tzemanovic
Return slice for each dimension in shape (#4152) @laggui

Documentation & Examples

Update raspberry pi pico example (#4034 #4132) @BjornTheProgrammer
Contributor Book: Update the "ONNX to Burn" Page (#4229) @softmaximalist
docs: add examples for bool tensor operations (#4248) @qburke
Update the "Adding New Operation" guide in the contributor book (#4284) @softmaximalist
Refactor dop_timer for multiple trials (for warmup). (#4288) @crutcher
Added documentation examples for more boolean tensor operations in burn-tensor (#4289) @qburke

Fixes

Fix book (#3942) @laggui
remove repetitive words in comment (#4029) @black5box
Include katex header as symlink (#4118) @laggui
Fix quantization docs (make it clear that only PTQ is currently supported) (#4316) @laggui

ONNX Support

ONNX IR and import refactor to better support complex graphs (#3872 #4019 #4033 #4094) @antimora
Add ONNX control flow operators: If, Loop, and Scan (#3936) @antimora
Silero VAD ONNX model verification (#3999) @antimora
Add support for yolo12x model variant (#4048) @antimora
Remove burn-import abstraction layer and use onnx-ir types directly (#4033) @antimora
Fix ConstantOfShape output size determination (#4085) @antimora
Specify output rank in squeeze_dims for type inference (#4086) @antimora
Fix Expand operation to use ONNX max-semantics (#4082) @antimora
[Breaking] Add ONNX GridSample op support and tests (#4084) @antimora
Add RF-DETR model check for burn-import (#4087) @antimora
Add LSTM operator support with configurable activations (#4106) @antimora
Add memory-mapped ONNX loading with tensor data ref (#4097) @antimora
Fix outer-scope variable references in ONNX subgraphs (If/Loop/Scan) (#4119) @antimora
Add Reshape scalar optimization and Gather scalar input support (#4146) @antimora
Update GELU ONNX test to use native op and fix expected values (#4161) @antimora
Add ONNX CumSum operator support (#4162) @antimora
Remove global ONNX opset version restriction, recommend opset 16 (#4168) @antimora
Handle 1D slope when importing prelu from onnx (#4205) @mertalev
Fix handling scalar scan outputs in ONNX loop nodes (#4210) @antimora
Add ONNX external data support for models >2GB (#4158) @antimora
fix: handle negative indices in onnx gather op (#4207) @mertalev
Split backend tensor ops tests (#4232) @laggui
Do not use alloc import in burn-import codegen (#4286) @laggui
Fix ONNX where broadcasted dims (#4315) @laggui

Enhancements

Feat/pinned memory staging (#4016) @nathanielsimard
burn-store enhancements for troubleshooting and new enum skip flag (#4051) @antimora
Feat/runtime error (#4079 #4110) @nathanielsimard
Perf/improve reduce autotuning + plane non uniform control flow check (#4208) @nathanielsimard
Packed quantized matmul with QuantStore changes (#4310 #4323) @wingertge

Refactoring

chore: Update to batch caching PR for cubecl (#3948) @wingertge
Refactor IR to define outputs as a function of the operation (#3877) ...

Contributors

khoek, antimora, and 19 other contributors

v0.20.0-pre.6 Pre-release

nathanielsimard released this 18 Dec 21:27

v0.20.0-pre.6

91dd62c

What's Changed

doc warning fix by @crutcher in #4130
Fix tch bf16 into_data by @laggui in #4142
Update raspberry-pi-pico example to use the Pico 2, and burnpack by @BjornTheProgrammer in #4132
Unify all_reduce LocalCollectiveClient operation handling. by @crutcher in #4125
Add direct tensor snapshot retrieval API to ModuleStore by @antimora in #4131
Fix outer-scope variable references in ONNX subgraphs (If/Loop/Scan) by @antimora in #4119
Add removed docs for tensor equal_elem by @laggui in #4145
Add ceil_mode support to pooling operations (MaxPool, AvgPool) by @antimora in #4112
chore: Update cubecl by @wingertge in #4134
Implement Slice iterator and utility methods. by @crutcher in #4042
Bump peter-evans/create-pull-request from 7 to 8 by @dependabot[bot] in #4148
Add slice_dyn, slice_assign_dyn, and slice_fill_dyn variants. by @crutcher in #4127
Add Reshape scalar optimization and Gather scalar input support by @antimora in #4146
Shape FromStr/ToString by @crutcher in #4143
Add contiguous reindexing for non-contiguous layer indices by @antimora in #4150
Add warmup epochs to MetricEarlyStoppingStrategy. (#3970) by @crutcher in #4041
fix(onnx): Use activation function for GELU codegen instead of non-existent tensor method by @antimora in #4161
Refactor more basic ops by @laggui in #4156
Refactor LocalCollectiveServer for improved clarity and error handling by @crutcher in #4126
Fix typo in comment for logger_task function by @crutcher in #4159
Refactor configurable backend tests (no more testgen macros) by @laggui in #4129
Zero-copy loading for embedded burnpack weights by @antimora in #4154
Fix candle cuda imports by @laggui in #4171
Backends no longer depend on burn-tensor, but strictly burn-backend by @laggui in #4169
Chore/update cubek cubecl by @nathanielsimard in #4172
Add ONNX CumSum operator support by @antimora in #4162
Add backend supports_dtype by @laggui in #4155
Fix attention shapes and out rank by @laggui in #4192
Fix matmul & reduce execute fuse no autotune by @laggui in #4193
Fix output dtype for argmin / argmax by @laggui in #4195
Add flatten_dims method to Shape and refactor tensor flattening API by @crutcher in #4189
Return slice for each dimension in shape by @laggui in #4152
Make xtask validate run no-std checks first. by @crutcher in #4198
Fix: CubeCL Reduce by @nathanielsimard in #4197
Reorganize and tracing::instrument collective operations. by @crutcher in #4157
Log running values by @Charles23R in #4199
Remove global ONNX opset version restriction, recommend opset 16 by @antimora in #4168
Fix dtype preservation when loading tensors in burn-store by @antimora in #4194
Fix TchTensor::from_data bf16 by @laggui in #4203
Perf/reduce cpu + Fix OOB by @nathanielsimard in #4204
feat: Implicit GEMM weight gradients for convolution by @wingertge in #4182
Fix checkpoint and summary log level by @J-F-Liu in #4201
fix: handle 1D slope when importing prelu from onnx by @mertalev in #4205
Zero-copy tensor loading for NdArray backend by @antimora in #4178
Fix quantized tensor storage data length calculation by @antimora in #4180
Fix handling scalar scan outputs in ONNX loop nodes by @antimora in #4210
Perf/improve reduce autotuning + plane non uniform control flow check by @nathanielsimard in #4208
Add ONNX external data support for models >2GB by @antimora in #4158
Update/cubek by @louisfd in #4214
Refactor: Replace canonicalize_dim with expect_dim by @crutcher in #4196
fix: handle negative indices in onnx gather op by @mertalev in #4207
Refactor/cube dim by @nathanielsimard in #4217
Refactor: Consolidate shape and slice error handling into ExpressionError by @crutcher in #4218
Update: CubeK by @louisfd in #4222
feat: Accelerated convolution data gradient by @wingertge in #4220
Fix repeat 0 times by @laggui in #4216
Burn train api refactor by @Charles23R in #4223
Chore/pre release 6 by @nathanielsimard in #4224

Contributors

antimora, wingertge, and 9 other contributors

v0.20.0-pre.5 Pre-release

nathanielsimard released this 08 Dec 14:53

v0.20.0-pre.5

42edc63

What's Changed

Bump version by @nathanielsimard in #4102
Handle empty tensors in cat and slice_assign ops by @antimora in #4095
Add network utilities to burn-std by @laggui in #4104
Remove RefCell from onnx-ir Arguments by @antimora in #4094
Fix raspberry pi pico example not compiling by @BjornTheProgrammer in #4034
Flash Attention module by @louisfd in #4089
[Breaking] Add IndexingUpdateOp to scatter and select_assign by @laggui in #4070
Feat/improve errors by @nathanielsimard in #4110
Add 256-byte tensor alignment to burnpack format for mmap zero-copy support by @antimora in #4100
Add CrossAttention module to burn-nn by @huy209vn in #4101
Add reflect and edge padding modes to tensor.pad by @antimora in #4105
Add LSTM operator support with configurable activations by @antimora in #4106
Add memory-mapped ONNX loading with lazy tensor data by @antimora in #4097
Refactor RemoteDevice to use a thread-safe global address registry. by @crutcher in #4113
Partial cleanup of RemoteSender api. by @crutcher in #4108
Move backend traits and types to burn-backend by @laggui in #4111
Fix remote sync error by @laggui in #4117
Small LSTM clean up of unused variable by @antimora in #4116
Fix/autotune checks by @nathanielsimard in #4114
Include katex header as symlink by @laggui in #4118
chore: Update cubecl by @wingertge in #4120
Fix GLU and quiet softmax activations by @laggui in #4121
Migrate ONNX import to burnpack format (removing Record type) by @antimora in #4122
Combined PRs by @github-actions[bot] in #4140
Chore/pre release 5 by @nathanielsimard in #4141

Contributors

antimora, wingertge, and 6 other contributors

Releases: tracel-ai/burn
