Extractors

Base Extractors

class udao.data.extractors.base_extractors.StaticFeatureExtractor

Bases: ABC, Generic[T]

class udao.data.extractors.base_extractors.TrainedFeatureExtractor

Bases: ABC, Generic[T]
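
The two base classes differ in whether the extractor holds fitted state. The following is a minimal sketch of that split, not the library's code: the class names, the `Rows` alias (a plain-Python stand-in for pd.DataFrame) and the toy `RowCounter` are assumptions made for illustration. The method signatures mirror those documented for the concrete extractors in this section.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Generic, List, TypeVar

T = TypeVar("T")  # container type returned by the extractor
Rows = List[Dict[str, Any]]  # assumption: plain-Python stand-in for pd.DataFrame


class StaticExtractorSketch(ABC, Generic[T]):
    """Hypothetical stand-in: stateless, so no split argument is needed."""

    @abstractmethod
    def extract_features(self, df: Rows) -> T: ...


class TrainedExtractorSketch(ABC, Generic[T]):
    """Hypothetical stand-in: fitted state depends on the dataset split."""

    @abstractmethod
    def extract_features(self, df: Rows, split: str) -> T: ...


class RowCounter(StaticExtractorSketch[int]):
    """Toy concrete extractor whose container type T is int."""

    def extract_features(self, df: Rows) -> int:
        return len(df)


row_count = RowCounter().extract_features([{"plan": "a"}, {"plan": "b"}])
```

Parameterising the base class with T lets a type checker verify that each concrete extractor returns the container it declares.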

Query Plan Extractors

class udao.data.extractors.query_structure_extractor.QueryStructureExtractor

Bases: StaticFeatureExtractor[QueryStructureContainer]

Extracts the features of the operations in the logical plan, and the tree structure of the logical plan. Keeps track of the different query plans seen so far and their template ids.

extract_features(df: DataFrame) → QueryStructureContainer

Extract the features of the operations in the logical plan, and the tree structure of the logical plan for each query plan in the dataframe.

Parameters:

df (pd.DataFrame) – Dataframe with a column “plan” containing the query plans.

Returns:

Container holding a dataframe with one row per operation in the query plans, and one column per feature of the operations.

Return type:

QueryStructureContainer
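
The bookkeeping of query plans and template ids described above can be pictured with a small sketch. It is an illustration under stated assumptions, not the extractor's implementation: `TemplateTracker` is a hypothetical name, and the ordered list of operation names is used as a crude signature where the real extractor works on the plan tree.

```python
from typing import Dict, List, Tuple


class TemplateTracker:
    """Hypothetical helper: one stable integer id per distinct plan structure."""

    def __init__(self) -> None:
        self.template_ids: Dict[Tuple[str, ...], int] = {}

    def template_id(self, operations: List[str]) -> int:
        # Assumption: the ordered operation names identify the structure.
        key = tuple(operations)
        # Reuse the id of a structure seen before, else allocate the next one.
        if key not in self.template_ids:
            self.template_ids[key] = len(self.template_ids)
        return self.template_ids[key]


tracker = TemplateTracker()
plans = [["Scan", "Filter"], ["Scan", "Join", "Scan"], ["Scan", "Filter"]]
ids = [tracker.template_id(ops) for ops in plans]
```

Here the first and third plans share a structure, so they receive the same template id.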

class udao.data.extractors.predicate_embedding_extractor.PredicateEmbeddingExtractor(embedder: BaseEmbedder, op_preprocessing: Callable[[str], str] = prepare_operation)

Bases: TrainedFeatureExtractor[TabularContainer]

Class to extract embeddings from a DataFrame of query plans.

Parameters:

embedder (BaseEmbedder) – Embedder to use to extract the embeddings, e.g. an instance of Word2VecEmbedder.

extract_features(df: DataFrame, split: str) → TabularContainer

Extract embeddings from a DataFrame of query plans.

Parameters:
  • df (pd.DataFrame) – DataFrame containing the query plans and their ids.

  • split (str) – Split of the dataset, either “train”, “test” or “validation”. The embedder is fitted when the split is “train”; for any other split it is only used to transform.

Returns:

Container holding a DataFrame with the embeddings of each operation of the query plans.

Return type:

TabularContainer
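
The split-dependent behaviour described for extract_features can be sketched as follows. This is a hedged illustration only: `ToyEmbedder`, `extract_embeddings`, the `fit_transform`/`transform` method names and the use of `str.lower` as the preprocessing step are all assumptions, and lists of strings stand in for the DataFrame of plans.

```python
from typing import Callable, Dict, List


class ToyEmbedder:
    """Hypothetical embedder: maps each token to its index in a fitted vocabulary."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def fit_transform(self, ops: List[str]) -> List[List[int]]:
        # Fitting: grow the vocabulary from the training operations.
        for op in ops:
            for token in op.split():
                self.vocab.setdefault(token, len(self.vocab))
        return self.transform(ops)

    def transform(self, ops: List[str]) -> List[List[int]]:
        # Tokens never seen during fitting map to -1.
        return [[self.vocab.get(tok, -1) for tok in op.split()] for op in ops]


def extract_embeddings(
    embedder: ToyEmbedder,
    ops: List[str],
    split: str,
    op_preprocessing: Callable[[str], str] = str.lower,
) -> List[List[int]]:
    cleaned = [op_preprocessing(op) for op in ops]
    # Fit only on the training split; other splits reuse the fitted state.
    if split == "train":
        return embedder.fit_transform(cleaned)
    return embedder.transform(cleaned)


embedder = ToyEmbedder()
train_emb = extract_embeddings(embedder, ["Filter a", "Scan b"], "train")
test_emb = extract_embeddings(embedder, ["Scan a", "Sort b"], "test")
```

Fitting only on “train” keeps information from the test and validation splits out of the learned vocabulary.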

Tabular Extractor

class udao.data.extractors.tabular_extractor.TabularFeatureExtractor(feature_func: Callable)

Bases: StaticFeatureExtractor[TabularContainer]
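
The constructor takes a single feature_func callable. A plausible reading, sketched below, is that the extractor delegates feature selection to that function. The source does not document the callable's signature, so the table-in, table-out shape used here is an assumption, as are the names `TabularExtractorSketch`, `select_numeric` and the `Rows` alias standing in for pd.DataFrame.

```python
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]  # assumption: plain-Python stand-in for pd.DataFrame


class TabularExtractorSketch:
    """Hypothetical stand-in: delegates feature selection to feature_func."""

    def __init__(self, feature_func: Callable[[Rows], Rows]) -> None:
        self.feature_func = feature_func

    def extract_features(self, df: Rows) -> Rows:
        # Static extractor: no split argument and no fitted state.
        return self.feature_func(df)


def select_numeric(rows: Rows) -> Rows:
    """Example feature_func: keep only the numeric columns of each row."""
    return [
        {key: value for key, value in row.items() if isinstance(value, (int, float))}
        for row in rows
    ]


extractor = TabularExtractorSketch(select_numeric)
features = extractor.extract_features([{"plan": "Scan t", "rows": 10, "cost": 1.5}])
```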