Extractors
Base Extractors
- class udao.data.extractors.base_extractors.StaticFeatureExtractor
Bases:
ABC, Generic[T]
- class udao.data.extractors.base_extractors.TrainedFeatureExtractor
Bases:
ABC, Generic[T]
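The two base classes differ in whether extraction depends on state learned from the data. The sketch below illustrates that distinction using only the standard library. The class names `StaticExtractorSketch`, `TrainedExtractorSketch` and `PlanLengthExtractor` are hypothetical stand-ins, not udao classes, and the method signatures are assumed from the subclasses documented in this section (`extract_features(df)` and `extract_features(df, split)`). Plain lists of dictionaries stand in for DataFrames.

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, List, TypeVar

T = TypeVar("T")  # type of the container returned by the extractor

Rows = List[Dict[str, str]]  # stand-in for a DataFrame (assumption)


class StaticExtractorSketch(ABC, Generic[T]):
    """Hypothetical stateless extractor: output depends only on the input."""

    @abstractmethod
    def extract_features(self, rows: Rows) -> T:
        ...


class TrainedExtractorSketch(ABC, Generic[T]):
    """Hypothetical stateful extractor: the split tells it whether to fit."""

    @abstractmethod
    def extract_features(self, rows: Rows, split: str) -> T:
        ...


class PlanLengthExtractor(StaticExtractorSketch[List[int]]):
    """Toy concrete extractor: one feature per row, the plan's length."""

    def extract_features(self, rows: Rows) -> List[int]:
        return [len(row["plan"]) for row in rows]
```

Parameterising the base class with `T` lets each concrete extractor declare the container type it returns, as `StaticFeatureExtractor[QueryStructureContainer]` and `TrainedFeatureExtractor[TabularContainer]` do in the classes documented here.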
Query Plan Extractors
- class udao.data.extractors.query_structure_extractor.QueryStructureExtractor
Bases:
StaticFeatureExtractor[QueryStructureContainer]
Extracts the features of the operations in the logical plan, and the tree structure of the logical plan. Keeps track of the different query plans seen so far and their template ids.
- extract_features(df: DataFrame) → QueryStructureContainer
Extract the features of the operations in the logical plan, and the tree structure of the logical plan for each query plan in the dataframe.
- Parameters:
df (pd.DataFrame) – DataFrame with a column “plan” containing the query plans.
- Returns:
DataFrame with one row per operation in the query plans, and one column per feature of the operations.
- Return type:
pd.DataFrame
- class udao.data.extractors.predicate_embedding_extractor.PredicateEmbeddingExtractor(embedder: BaseEmbedder, op_preprocessing: Callable[[str], str] = prepare_operation)
Bases:
TrainedFeatureExtractor[TabularContainer]
Class to extract embeddings from a DataFrame of query plans.
- Parameters:
embedder (BaseEmbedder) – Embedder used to extract the embeddings, e.g. an instance of Word2VecEmbedder.
- extract_features(df: DataFrame, split: str) → TabularContainer
Extract embeddings from a DataFrame of query plans.
- Parameters:
df (pd.DataFrame) – DataFrame containing the query plans and their ids.
split (str) – Split of the dataset, either “train”, “test” or “validation”. The embedder is fitted when the split is “train”; for any other split, the already fitted embedder is only used to transform.
- Returns:
DataFrame containing the embeddings of each operation of the query plans.
- Return type:
pd.DataFrame
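The split-dependent behaviour described above (fit on “train”, transform otherwise) can be sketched as follows. This is a minimal illustration, not the udao implementation. `VocabularyEmbedderSketch` and `SplitAwareExtractorSketch` are hypothetical names, the embedder's `fit_transform`/`transform` methods are an assumed interface, and plain lists of operation strings stand in for the DataFrame. The default `op_preprocessing=str.lower` is likewise a placeholder for whatever `prepare_operation` does.

```python
from typing import Callable, Dict, List


class VocabularyEmbedderSketch:
    """Toy embedder (hypothetical): maps each token to an index learned on fit."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def fit_transform(self, operations: List[str]) -> List[List[int]]:
        # Learn the vocabulary from the training operations, then embed them.
        tokens = sorted({tok for op in operations for tok in op.split()})
        self.vocab = {tok: i for i, tok in enumerate(tokens)}
        return self.transform(operations)

    def transform(self, operations: List[str]) -> List[List[int]]:
        # Tokens unseen during fitting are mapped to -1.
        return [[self.vocab.get(tok, -1) for tok in op.split()] for op in operations]


class SplitAwareExtractorSketch:
    """Hypothetical extractor that fits only on the training split."""

    def __init__(
        self,
        embedder: VocabularyEmbedderSketch,
        op_preprocessing: Callable[[str], str] = str.lower,  # placeholder default
    ) -> None:
        self.embedder = embedder
        self.op_preprocessing = op_preprocessing

    def extract_features(self, operations: List[str], split: str) -> List[List[int]]:
        cleaned = [self.op_preprocessing(op) for op in operations]
        if split == "train":
            return self.embedder.fit_transform(cleaned)
        # "test" and "validation" reuse the state learned on "train".
        return self.embedder.transform(cleaned)


extractor = SplitAwareExtractorSketch(VocabularyEmbedderSketch())
train_embeddings = extractor.extract_features(["Filter a", "Join a b"], "train")
test_embeddings = extractor.extract_features(["Filter c"], "test")
```

Fitting only on the training split keeps information from the test and validation data out of the learned embedding, which is why the training split should be processed first.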
Tabular Extractor
- class udao.data.extractors.tabular_extractor.TabularFeatureExtractor(feature_func: Callable)