Handling categorical variables on new data #1101

Description

@krolikowskib

Hey,
I have two questions about how FLAML handles categorical variables in new data that differs from the original training dataset (for example, during inference after model deployment).

  1. Does it handle new categories in categorical features (unseen during training)?
  2. The SKLearn and XGBoost estimators use ordinal encodings of categorical features, but the categorical codes appear to be extracted again at inference time (code). Doesn't that mean the encodings can differ when running on a different dataset, so that the categories passed to the model get mixed up? If so, sklearn's OrdinalEncoder would be a better choice here, since it persists the correct category codes.
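To make the concern in question 2 concrete, here is a minimal sketch. It does not reproduce FLAML's actual preprocessing; the `color` column and its values are made up for illustration. It only shows that pandas category codes derived independently from two datasets can disagree, while an OrdinalEncoder fitted on the training data keeps one mapping and can flag unseen categories.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data: "yellow" is unseen at training time, "blue" is absent at inference.
train = pd.DataFrame({"color": ["blue", "green", "red"]})
new = pd.DataFrame({"color": ["green", "red", "yellow"]})

# Codes derived independently from each dataset: the same integer
# can refer to different categories ("green" is 1 in train, 0 in new).
train_codes = train["color"].astype("category").cat.codes.tolist()
new_codes = new["color"].astype("category").cat.codes.tolist()

# Mapping fitted once on the training data and reused at inference.
# Unseen categories are mapped to -1 instead of silently reusing a code.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
encoder.fit(train[["color"]])
persisted_codes = encoder.transform(new[["color"]]).ravel().tolist()

print(train_codes)      # codes for blue, green, red
print(new_codes)        # codes for green, red, yellow, derived from new data only
print(persisted_codes)  # codes for green, red, yellow under the training mapping
```

With the independently derived codes, "green" maps to 1 in the training data but to 0 in the new data, which is the mixing described above. With the fitted encoder, "green" and "red" keep their training codes and "yellow" is marked as unknown.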

Labels

documentation (Improvements or additions to documentation), question (Further information is requested)
