Data handling

Reports

VaRA-TS manages experiment result data in the form of reports. A report file contains all data generated during the experiment, and the report class gives the user an interface to interact with that data. To simplify report handling and storage management, the report base classes provide functionality to automatically create customized filenames. In each filename, the framework encodes information such as the report type, project, revision, and a UUID that identifies the run that created the file. Furthermore, report implementers have the option to customize the filename even further.
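The idea of encoding run information into a filename can be sketched as follows. The layout used here and the helper names compose_name and parse_name are assumptions made for illustration only; the actual format is defined by ReportFilename.get_file_name.

```python
import re
import uuid
from typing import Dict

# Hypothetical layout, chosen for this sketch only:
#   <experiment>-<report>-<project>-<binary>-<revision>_<uuid>_<status>.<ext>
NAME_PATTERN = re.compile(
    r"(?P<experiment>[^-]+)-(?P<report>[^-]+)-(?P<project>[^-]+)-"
    r"(?P<binary>[^-]+)-(?P<revision>[0-9a-f]+)_"
    r"(?P<uuid>[0-9a-f-]{36})_(?P<status>[^_.]+)\.(?P<ext>\w+)"
)


def compose_name(
    experiment: str, report: str, project: str, binary: str,
    revision: str, run_uuid: str, status: str, ext: str = "txt"
) -> str:
    """Joins all parts into a single file name."""
    return (
        f"{experiment}-{report}-{project}-{binary}-"
        f"{revision}_{run_uuid}_{status}.{ext}"
    )


def parse_name(file_name: str) -> Dict[str, str]:
    """Recovers the encoded parts, or raises ValueError for foreign files."""
    match = NAME_PATTERN.fullmatch(file_name)
    if match is None:
        raise ValueError(f"{file_name!r} is not formatted like a result file")
    return match.groupdict()


if __name__ == "__main__":
    name = compose_name(
        "EX", "EMPTY", "gzip", "gzip", "7620a0b", str(uuid.uuid4()), "success"
    )
    print(name)
    print(parse_name(name)["revision"])
```

Because every part can be recovered from the name alone, files can be filtered by project, revision, or status without opening them.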

For a simple example that can serve as a starting point for your own report, take a look at the EmptyReport.

List of provided report classes

Report module

The Report module implements basic report functionality and provides a minimal interface, BaseReport, for implementing your own reports.

class varats.report.report.FileStatusExtension(*values)[source]

Bases: Enum

Enum to abstract the status of a file.

Specific report files can map these to their own specific representation.

value: Tuple[str, Color]
SUCCESS = ('success', <ANSIStyle: Green>)
PARTIAL = ('partial', <ANSIStyle: Full: DarkTurquoise>)
INCOMPLETE = ('incomplete', <ANSIStyle: Full: OrangeRed1>)
FAILED = ('failed', <ANSIStyle: LightRed>)
COMPILE_ERROR = ('cerror', <ANSIStyle: Red>)
MISSING = ('###', <ANSIStyle: Full: Yellow3A>)
BLOCKED = ('blocked', <ANSIStyle: Blue>)
get_status_extension()[source]

Returns the corresponding file ending to the status.

Return type:

str

nice_name()[source]

Returns a nicely formatted name.

Return type:

str

property status_color: Color

Returns the corresponding color to the status.

get_colored_status()[source]

Returns the corresponding file status, colored in the specific status color.

Return type:

str

num_color_characters()[source]

Returns the number of non printable color characters.

Return type:

int

static get_physical_file_statuses()[source]

Returns the set of file status extensions that are associated with real result files.

Return type:

Set[FileStatusExtension]

static get_virtual_file_statuses()[source]

Returns the set of file status extensions that are not associated with real result files.

Return type:

Set[FileStatusExtension]

static get_regex_grp()[source]

Returns a regex group that can match all file statuses.

Return type:

str

static get_file_status_from_str(status_name)[source]

Converts the name of a status extension to the specific enum value.

Parameters:

status_name (str) – name of the status extension

Return type:

FileStatusExtension

Returns:

FileStatusExtension enum with the specified name

Test:

>>> FileStatusExtension.get_file_status_from_str('success')
<FileStatusExtension.SUCCESS: ('success', <ANSIStyle: Green>)>

>>> FileStatusExtension.get_file_status_from_str('SUCCESS')
<FileStatusExtension.SUCCESS: ('success', <ANSIStyle: Green>)>
>>> FileStatusExtension.get_file_status_from_str('###')
<FileStatusExtension.MISSING: ('###', <ANSIStyle: Full: Yellow3A>)>
>>> FileStatusExtension.get_file_status_from_str('CompileError')
<FileStatusExtension.COMPILE_ERROR: ('cerror', <ANSIStyle: Red>)>
static combine(lhs, rhs)[source]

Combines two FileStatusExtension into one.

Should no specific combination rule apply, the lhs is used as a default.

Return type:

FileStatusExtension
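The doctests of get_file_status_from_str accept the extension string ('success', '###'), the upper-case enum name ('SUCCESS'), and a CamelCase name ('CompileError'). A minimal sketch of a lookup with that behaviour is shown below. It uses a stand-in enum Status without the color component; the matching rules in from_str are an assumption derived from the doctests, not the VaRA-TS implementation.

```python
from enum import Enum


class Status(Enum):
    """Stand-in for FileStatusExtension, reduced to the extension string."""
    SUCCESS = "success"
    COMPILE_ERROR = "cerror"
    MISSING = "###"

    def nice_name(self) -> str:
        # COMPILE_ERROR -> CompileError
        return "".join(part.capitalize() for part in self.name.split("_"))

    @staticmethod
    def from_str(status_name: str) -> "Status":
        """Accepts the extension string, the enum name, or the nice name."""
        for status in Status:
            if (
                status_name == status.value
                or status_name.upper() == status.name
                or status_name == status.nice_name()
            ):
                return status
        raise ValueError(f"Unknown file status {status_name!r}")
```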

class varats.report.report.ReportFilename(file_name)[source]

Bases: object

ReportFilename wraps special semantics about our report filenames around strings and paths.

static construct(filepath, base_folder)[source]

Constructs a ReportFilename from a given path and a base folder.

The base folder can be omitted should the filepath only contain the file.

Return type:

ReportFilename

property filename: str

Literal file name.

property project_name: str

Name of the analyzed project.

property binary_name: str

Name of the analyzed binary.

has_status_success()[source]

Checks if the file name is a (Success) result file.

Return type:

bool

Returns:

True, if the file name is for a success file

has_status_failed()[source]

Check if the file name is a (Failed) result file.

Return type:

bool

Returns:

True, if the file name is for a failed file

has_status_compileerror()[source]

Check if the filename is a (CompileError) result file.

Return type:

bool

Returns:

True, if the file name is for a compile error file

has_status_missing()[source]

Check if the filename is a (Missing) result file.

Return type:

bool

Returns:

True, if the file name is for a missing file

has_status_blocked()[source]

Check if the filename is a (Blocked) result file.

Return type:

bool

Returns:

True, if the file name is for a blocked file

static result_file_has_status(file_name, extension_type)[source]

Check if the passed file name is of the expected file status.

Parameters:
  • file_name (str) – name of the file to check

  • extension_type (FileStatusExtension) – expected file status extension

Return type:

bool

Returns:

True, if the file name is for a file with the specified extension_type

is_result_file()[source]

Check if the file name is formatted like a result file.

Return type:

bool

Returns:

True, if the file name is correctly formatted

property commit_hash: ShortCommitHash

Commit hash of the result file.

Returns:

the commit hash from a result file name

property experiment_shorthand: str

Experiment shorthand of the result file.

Returns:

the experiment shorthand from a result file

property report_shorthand: str

Report shorthand of the result file.

Returns:

the report shorthand from a result file

property file_status: FileStatusExtension

Get the FileStatusExtension from a result file.

Returns:

the FileStatusExtension of the result file

property config_id: int | None

Configuration ID of the result file. A configuration ID is only present in configuration-specific reports; for all other reports, no ID exists.

Returns:

the configuration ID from a result file

is_configuration_specific_file()[source]

Check if the file name contains configuration specific information.

Return type:

bool

Returns:

True, if the file name is configuration specific

property uuid: str

Report UUID of the result file, generated by BenchBuild during the experiment.

property file_suffix: str

File suffix, commonly known as file ending/type (in the codebase referred to as file_ext).

static get_file_name(experiment_shorthand, report_shorthand, project_name, binary_name, project_revision, project_uuid, extension_type, file_ext='.txt', config_id=None)[source]

Generates a filename for a report file out of its different parts.

Parameters:
  • experiment_shorthand (str) – unique shorthand of the experiment

  • report_shorthand (str) – unique shorthand of the report

  • project_name (str) – name of the project for which the report was generated

  • binary_name (str) – name of the binary for which the report was generated

  • project_revision (ShortCommitHash) – revision of the project, i.e., commit hash

  • project_uuid (str) – benchbuild uuid for the experiment run

  • extension_type (FileStatusExtension) – to specify the status of the generated report

  • file_ext (str) – file extension of the report file

Return type:

ReportFilename

Returns:

name for the report file that can later be uniquely identified

with_status(new_status)[source]

Returns a new report filename, adapted with the new file extension new_status.

Return type:

ReportFilename

class varats.report.report.ReportFilepath(base_path, report_filename)[source]

Bases: object

ReportFilepath combines report filenames with path semantics and presents the file as a full path.

static construct(full_filepath, base_folder)[source]

Constructs a ReportFilepath from a given full path and a base folder. The full path should ideally be fully qualified, but this is not strictly required.

Return type:

ReportFilepath

property base_path: Path
property report_filename: ReportFilename
full_path()[source]
Return type:

Path

with_status(new_status)[source]
Return type:

ReportFilepath

stat()[source]
Return type:

stat_result

class varats.report.report.BaseReport(path)[source]

Bases: object

Report base class to add general report properties and helper functions.

REPORT_TYPES: Dict[str, Type[BaseReport]] = {'BlameAnnotations': <class 'varats.data.reports.blame_annotations.BlameAnnotations'>, 'BlameComparisonReport': <class 'varats.data.reports.blame_annotations.BlameComparisonReport'>, 'BlameReport': <class 'varats.data.reports.blame_report.BlameReport'>, 'BlameVerifierReportNoOptTBAA': <class 'varats.data.reports.blame_verifier_report.BlameVerifierReportNoOptTBAA'>, 'BlameVerifierReportOpt': <class 'varats.data.reports.blame_verifier_report.BlameVerifierReportOpt'>, 'CommitReport': <class 'varats.data.reports.commit_report.CommitReport'>, 'CompiledBinaryReport': <class 'varats.data.reports.compiled_binary_report.CompiledBinaryReport'>, 'EmptyReport': <class 'varats.data.reports.empty_report.EmptyReport'>, 'EnvTraceReport': <class 'varats.data.reports.env_trace_report.EnvTraceReport'>, 'FeatureAnalysisReport': <class 'varats.data.reports.feature_analysis_report.FeatureAnalysisReport'>, 'FeatureInstrumentationPointsReport': <class 'varats.data.reports.feature_instrumentation_points_report.FeatureInstrumentationPointsReport'>, 'FeatureTracingStatsReport': <class 'varats.data.reports.feature_tracing_stats_report.FeatureTracingStatsReport'>, 'GlobalsReportWith': <class 'varats.data.reports.globals_report.GlobalsReportWith'>, 'GlobalsReportWithout': <class 'varats.data.reports.globals_report.GlobalsReportWithout'>, 'HotFunctionReport': <class 'varats.report.hot_functions_report.HotFunctionReport'>, 'InstrVerifierReport': <class 'varats.data.reports.instrumentation_verifier_report.InstrVerifierReport'>, 'KeyedReportAggregate': <class 'varats.report.report.KeyedReportAggregate'>, 'LinuxPerfReport': <class 'varats.report.linux_perf_report.LinuxPerfReport'>, 'LinuxPerfReportAggregate': <class 'varats.report.linux_perf_report.LinuxPerfReportAggregate'>, 'MPRPIMAggregate': <class 'varats.experiments.vara.feature_perf_precision.MPRPIMAggregate'>, 'MPRTEFAggregate': <class 
'varats.experiments.vara.feature_perf_precision.MPRTEFAggregate'>, 'MPRTimeReportAggregate': <class 'varats.experiments.vara.feature_perf_precision.MPRTimeReportAggregate'>, 'MultiPatchReport': <class 'varats.report.multi_patch_report.MultiPatchReport'>, 'PerfInfluenceTraceReport': <class 'varats.data.reports.performance_influence_trace_report.PerfInfluenceTraceReport'>, 'PerfInfluenceTraceReportAggregate': <class 'varats.data.reports.performance_influence_trace_report.PerfInfluenceTraceReportAggregate'>, 'PerfProfileReport': <class 'varats.data.reports.perf_profile_report.PerfProfileReport'>, 'PerfProfileReportAggregate': <class 'varats.data.reports.perf_profile_report.PerfProfileReportAggregate'>, 'PyDrillerSZZReport': <class 'varats.data.reports.szz_report.PyDrillerSZZReport'>, 'RegionVerificationReport': <class 'varats.data.reports.region_verification_report.RegionVerificationReport'>, 'ReportAggregate': <class 'varats.report.report.ReportAggregate'>, 'SZZReport': <class 'varats.data.reports.szz_report.SZZReport'>, 'SZZUnleashedReport': <class 'varats.data.reports.szz_report.SZZUnleashedReport'>, 'TEFReport': <class 'varats.report.tef_report.TEFReport'>, 'TEFReportAggregate': <class 'varats.report.tef_report.TEFReportAggregate'>, 'TaintPropagationReport': <class 'varats.data.reports.taint_report.TaintPropagationReport'>, 'TimeReport': <class 'varats.report.gnu_time_report.TimeReport'>, 'TimeReportAggregate': <class 'varats.report.gnu_time_report.TimeReportAggregate'>, 'WLHotFunctionAggregate': <class 'varats.report.hot_functions_report.WLHotFunctionAggregate'>, 'WLTimeReportAggregate': <class 'varats.report.gnu_time_report.WLTimeReportAggregate'>, 'WorkloadSpecificPITReportAggregate': <class 'varats.data.reports.performance_influence_trace_report.WorkloadSpecificPITReportAggregate'>, 'WorkloadSpecificReportAggregate': <class 'varats.experiment.workload_util.WorkloadSpecificReportAggregate'>, 'WorkloadSpecificTEFReportAggregate': <class 
'varats.report.tef_report.WorkloadSpecificTEFReportAggregate'>}
SHORTHAND: str
FILE_TYPE: str
static lookup_report_type_from_file_name(file_name)[source]

Looks up the correct report class from a given file_name.

Parameters:

file_name (str) – of the report file

Return type:

Optional[Type[BaseReport]]

Returns:

corresponding report class

static lookup_report_type_by_shorthand(shorthand)[source]

Looks up the correct report class from a given report shorthand.

Parameters:

shorthand (str) – of the report file

Return type:

Optional[Type[BaseReport]]

Returns:

corresponding report class

classmethod get_file_name(experiment_shorthand, project_name, binary_name, project_revision, project_uuid, extension_type, config_id=None)[source]

Generates a filename for a report file.

Parameters:
  • experiment_shorthand (str) – unique shorthand of the experiment

  • project_name (str) – name of the project for which the report was generated

  • binary_name (str) – name of the binary for which the report was generated

  • project_revision (ShortCommitHash) – version of the analyzed project, i.e., commit hash

  • project_uuid (str) – benchbuild uuid for the experiment run

  • extension_type (FileStatusExtension) – to specify the status of the generated report

Return type:

ReportFilename

Returns:

name for the report file that can later be uniquely identified

property path: Path

Path to the report file.

property filename: ReportFilename

Filename of the report.

classmethod shorthand()[source]

Shorthand for this report.

Return type:

str

classmethod file_type()[source]

File type of this report.

Return type:

str

classmethod is_correct_report_type(file_name)[source]

Check if the passed file belongs to this report type.

Parameters:

file_name (str) – name of the file to check

Return type:

bool

Returns:

True, if the file belongs to this report type

class varats.report.report.ReportSpecification(*report_types)[source]

Bases: object

Groups together multiple report types into a specification that can be used, e.g., by experiments, to request multiple reports.

property report_types: List[Type[BaseReport]]

Report types in this report specification.

property main_report: Type[BaseReport]

Main report of this specification.

in_spec(report_type)[source]

Checks if a report type is specified in this spec.

Return type:

bool

get_report_type(shorthand)[source]

Look up a report type by its shorthand.

Parameters:

shorthand (str) – notation for the report

Return type:

Type[BaseReport]

Returns:

the report if it is part of this spec
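The grouping behaviour of a report specification can be sketched with plain classes. Everything below is a hypothetical stand-in: Report, TimeLike, TraceLike, and Spec are not VaRA-TS classes, and treating the first report type as the main report is an assumption made for this sketch.

```python
from typing import List, Type


class Report:
    """Stand-in for BaseReport with only a shorthand."""
    SHORTHAND = ""


class TimeLike(Report):
    SHORTHAND = "T"


class TraceLike(Report):
    SHORTHAND = "TR"


class Spec:
    """Groups several report types, similar in spirit to ReportSpecification."""

    def __init__(self, *report_types: Type[Report]) -> None:
        if not report_types:
            raise ValueError("A specification needs at least one report type")
        self._types: List[Type[Report]] = list(report_types)

    @property
    def main_report(self) -> Type[Report]:
        # Assumption: the first type passed in acts as the main report.
        return self._types[0]

    def in_spec(self, report_type: Type[Report]) -> bool:
        return report_type in self._types

    def get_report_type(self, shorthand: str) -> Type[Report]:
        for report_type in self._types:
            if report_type.SHORTHAND == shorthand:
                return report_type
        raise LookupError(f"No report with shorthand {shorthand!r} in this spec")
```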

class varats.report.report.KeyedReportAggregate(path, report_type, key_func, default_key=None)[source]

Bases: BaseReport, Generic[KeyTy, ReportTy]

Parses and categorizes multiple reports of the same type stored inside a zip file.

The key_func is used to divide the parsed reports into different categories/buckets.
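The bucketing idea can be illustrated with the standard library alone. This is a sketch under stated assumptions: bucket_zip_members and demo are hypothetical helpers, the "reports" are plain text, and the key function receives the member name, which may differ from what the real key_func is given.

```python
import tempfile
import zipfile
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, List


def bucket_zip_members(
    zip_path: Path, key_func: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Reads every member of the archive and groups the contents by key."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    with zipfile.ZipFile(zip_path) as archive:
        for member in sorted(archive.namelist()):
            content = archive.read(member).decode("utf-8")
            buckets[key_func(member)].append(content)
    return dict(buckets)


def demo() -> Dict[str, List[str]]:
    """Builds a small archive and buckets its members by name prefix."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        zip_path = Path(tmp_dir) / "aggregate.zip"
        with zipfile.ZipFile(zip_path, "w") as archive:
            archive.writestr("small_run0.txt", "1.0")
            archive.writestr("small_run1.txt", "1.2")
            archive.writestr("large_run0.txt", "9.5")
        return bucket_zip_members(zip_path, lambda name: name.split("_")[0])
```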

remove()[source]
Return type:

None

property removed: bool
keys()[source]
Return type:

Collection[TypeVar(KeyTy)]

reports(key=None)[source]

Returns the list of parsed reports.

Return type:

List[TypeVar(ReportTy, bound= BaseReport)]

FILE_TYPE: str = 'zip'
SHORTHAND: str = 'Agg'
class varats.report.report.ReportAggregate(path, report_type)[source]

Bases: KeyedReportAggregate[int, ReportTy], Generic[ReportTy]

FILE_TYPE: str = 'zip'
SHORTHAND: str = 'Agg'

Handling utilities for generated report files

Module for handling revision specific files.

When analyzing a project, result files are generated for specific project revisions. This module provides functionality to manage and access these revision specific files, e.g., to get all files of a specific report that have been processed successfully.

varats.revision.revisions.is_revision_blocked(revision, project_cls)[source]

Checks if a revision is blocked on a given project.

Parameters:
  • revision (CommitHash) – the revision

  • project_cls (Type[Project]) – the project class the revision belongs to

Return type:

bool

Returns:

True, if the revision is blocked

varats.revision.revisions.filter_blocked_revisions(revisions, project_cls)[source]

Filter out all blocked revisions.

Parameters:
  • revisions (List[TypeVar(CommitHashTy, bound= CommitHash)]) – list of revisions

  • project_cls (Type[Project]) – the project class the revisions belong to

Return type:

List[TypeVar(CommitHashTy, bound= CommitHash)]

Returns:

filtered revision list

varats.revision.revisions.get_all_revisions_files(project_name, experiment_type=None, report_type=None, file_name_filter=<function <lambda>>, only_newest=True, config_id=None)[source]

Find all file paths to revision files.

Parameters:
  • project_name (str) – target project

  • file_name_filter (Callable[[str], bool]) – optional filter to exclude certain files; returns true if the file_name should not be checked

  • experiment_type (Optional[Type[VersionExperiment]]) – the experiment type that created the result files

  • report_type (Optional[Type[BaseReport]]) – the report type of the result files; defaults to experiment’s main report

  • only_newest (bool) – whether to include all result files, or only the newest; if False, result files for the same revision are sorted descending by the file’s mtime

Return type:

List[ReportFilepath]

Returns:

a list of file paths to revision files
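The effect of only_newest can be sketched as follows: files are grouped per revision and sorted descending by modification time, and either only the first file or all files of each group are kept. select_files, revision_of, and demo are hypothetical names for this sketch, not part of varats.revision.revisions.

```python
import os
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, List


def select_files(
    files: List[Path],
    revision_of: Callable[[Path], str],
    only_newest: bool = True,
) -> List[Path]:
    """Keeps the newest file per revision, or all files sorted newest first."""
    by_revision: Dict[str, List[Path]] = defaultdict(list)
    for file in files:
        by_revision[revision_of(file)].append(file)

    selected: List[Path] = []
    for revision_files in by_revision.values():
        # Newest first: sort descending by modification time.
        revision_files.sort(key=lambda f: f.stat().st_mtime, reverse=True)
        selected.extend(revision_files[:1] if only_newest else revision_files)
    return selected


def demo(only_newest: bool) -> List[str]:
    """Creates three files with fixed mtimes and returns the selected names."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        files = []
        for name, mtime in [
            ("abc123_old.txt", 1_000),
            ("abc123_new.txt", 2_000),
            ("def456_only.txt", 1_500),
        ]:
            path = Path(tmp_dir) / name
            path.touch()
            os.utime(path, (mtime, mtime))
            files.append(path)
        chosen = select_files(files, lambda p: p.name.split("_")[0], only_newest)
        return [p.name for p in chosen]
```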

varats.revision.revisions.get_processed_revisions_files(project_name, experiment_type=None, report_type=None, file_name_filter=<function <lambda>>, only_newest=True, config_id=None)[source]

Find all file paths to correctly processed revision files.

Parameters:
  • project_name (str) – target project

  • file_name_filter (Callable[[str], bool]) – optional filter to exclude certain files; returns true if the file_name should not be checked

  • experiment_type (Optional[Type[VersionExperiment]]) – the experiment type that created the result files

  • report_type (Optional[Type[BaseReport]]) – the report type of the result files; defaults to experiment’s main report

  • only_newest (bool) – whether to include all result files, or only the newest; if False, result files for the same revision are sorted descending by the file’s mtime

Return type:

List[ReportFilepath]

Returns:

a list of file paths to correctly processed revision files

varats.revision.revisions.get_failed_revisions_files(project_name, experiment_type=None, report_type=None, file_name_filter=<function <lambda>>, only_newest=True, config_id=None)[source]

Find all file paths to failed revision files.

Parameters:
  • project_name (str) – target project

  • file_name_filter (Callable[[str], bool]) – optional filter to exclude certain files; returns True if the file_name should not be included

  • experiment_type (Optional[Type[VersionExperiment]]) – the experiment type that created the result files

  • report_type (Optional[Type[BaseReport]]) – the report type of the result files; defaults to experiment’s main report

  • only_newest (bool) – whether to include all result files, or only the newest; if False, result files for the same revision are sorted descending by the file’s mtime

Return type:

List[ReportFilepath]

Returns:

a list of file paths to failed revision files

varats.revision.revisions.get_processed_revisions(project_name, experiment_type, report_type=None)[source]

Calculates a list of revisions of a project that have already been processed successfully.

Parameters:
  • project_name (str) – target project

  • experiment_type (Type[VersionExperiment]) – the experiment type that created the result files

  • report_type (Optional[Type[BaseReport]]) – the report type of the result files; defaults to experiment’s main report

Return type:

List[ShortCommitHash]

Returns:

list of correctly processed revisions

varats.revision.revisions.get_failed_revisions(project_name, experiment_type, report_type=None)[source]

Calculates a list of revisions of a project that have failed.

Parameters:
  • project_name (str) – target project

  • experiment_type (Type[VersionExperiment]) – the experiment type that created the result files

  • report_type (Optional[Type[BaseReport]]) – the report type of the result files; defaults to experiment’s main report

Return type:

List[ShortCommitHash]

Returns:

list of failed revisions

varats.revision.revisions.get_tagged_revisions(project_cls, experiment_type, report_type=None, tag_blocked=True, revision_filter=None)[source]

Calculates a list of revisions of a project tagged with the file status. If two files exist, the newest is considered for detecting the status.

Parameters:
  • project_cls (Type[Project]) – target project

  • experiment_type (Type[VersionExperiment]) – the experiment type that created the result files

  • report_type (Optional[Type[BaseReport]]) – the report type of the result files; defaults to experiment’s main report

  • tag_blocked (bool) – whether to tag blocked revisions as blocked

  • revision_filter (Optional[Callable[[ReportFilepath], bool]]) – to select a specific subset of revisions

Return type:

Dict[ShortCommitHash, Dict[Optional[int], FileStatusExtension]]

Returns:

a dict that maps each revision to its file statuses (see the return type)

varats.revision.revisions.get_tagged_revision(revision, project_name, experiment_type, report_type=None)[source]

Calculates the file status for a revision. If two files exist, the newest is considered for detecting the status.

Parameters:
  • revision (ShortCommitHash) – the revision to get the status for

  • project_name (str) – target project

  • experiment_type (Type[VersionExperiment]) – the experiment type that created the result files

  • report_type (Optional[Type[BaseReport]]) – the report type of the result files; defaults to experiment’s main report

Return type:

FileStatusExtension

Returns:

the status for the revision


Data management

Report data can be accessed via different Database classes. Each concrete database class offers its data in the form of a pandas dataframe with a specific layout. Clients can query a database for the data of a specific project or case study via the function get_data_for_project. The database class then takes care of loading and caching the relevant result files.

You can add new database classes by creating a subclass of Database in a separate module in the directory varats/data/databases.

Module: database

Module for the base Database class.

class varats.data.databases.evaluationdatabase.EvaluationDatabase[source]

Bases: ABC

Base class for accessing report data.

Subclasses have to provide the following:
  • a list of available columns in the variable COLUMNS; this list must start with Database.COLUMNS!

  • an identifier for cache files CACHE_ID

  • a function _load_dataframe() that loads and transparently caches report data
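The subclass contract listed above can be sketched without pandas. In this illustration, rows are plain dicts standing in for dataframe rows, and SketchDatabase, RuntimeDatabase, and _load_rows are hypothetical names; the real base class works on pandas dataframes and takes further arguments such as the commit map and case studies.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Row = Dict[str, Any]  # stand-in for one dataframe row


class SketchDatabase(ABC):
    """Simplified stand-in for the database base class."""
    CACHE_ID: str
    COLUMNS: List[str] = ["revision", "time_id"]

    @classmethod
    @abstractmethod
    def _load_rows(cls, project_name: str) -> List[Row]:
        """Loads (and, in a real implementation, caches) the report data."""

    @classmethod
    def get_data_for_project(cls, project_name: str, columns: List[str]) -> List[Row]:
        unknown = [column for column in columns if column not in cls.COLUMNS]
        if unknown:
            raise ValueError(f"Unknown columns requested: {unknown}")
        return [
            {column: row[column] for column in columns}
            for row in cls._load_rows(project_name)
        ]


class RuntimeDatabase(SketchDatabase):
    CACHE_ID = "runtime_data"
    # The column list starts with the columns of the base class.
    COLUMNS = SketchDatabase.COLUMNS + ["seconds"]

    @classmethod
    def _load_rows(cls, project_name: str) -> List[Row]:
        # Made-up data; a real subclass would parse report files here.
        return [{"revision": "abc123", "time_id": 0, "seconds": 1.5}]
```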

CACHE_ID: str
COLUMN_TYPES = {'revision': 'str', 'time_id': 'int32'}
COLUMNS: List[str]
classmethod get_data_for_project(project_name, columns, commit_map, *case_studies, **kwargs)[source]

Retrieve data for a given project and case study.

Parameters:
  • project_name (str) – the project to retrieve data for

  • columns (List[str]) – the columns the resulting dataframe should have; all column names must occur in the COLUMNS class variable

  • commit_map (CommitMap) – the commit map to use

  • case_studies (CaseStudy) – the case studies to retrieve data for

  • kwargs (Any) – additional arguments that are passed to _load_dataframe()

Return type:

DataFrame

Returns:

a pandas dataframe with the given columns and the


Module: cache_helper

Utility functions and class to allow easier caching of pandas dataframes and other data.

varats.data.cache_helper.get_data_file_path(data_id, project_name)[source]

Compose the identifier and project into a file path that points to the corresponding cache file in the cache directory.

Parameters:
  • data_id (str) – identifier or identifier_name of the dataframe

  • project_name (str) – name of the project

Return type:

Path

varats.data.cache_helper.load_cached_df_or_none(data_id, project_name, data_types)[source]

Load cached dataframe from disk, otherwise return None.

Parameters:
  • data_id (str) – identifier or identifier_name of the dataframe

  • project_name (str) – name of the project

  • data_types (Dict[str, str]) – dict of columns and types to pass to the dataframe loading

Return type:

Optional[DataFrame]

varats.data.cache_helper.cache_dataframe(data_id, project_name, dataframe)[source]

Cache a dataframe by persisting it to disk.

Parameters:
  • data_id (str) – identifier or identifier_name of the dataframe

  • project_name (str) – name of the project

  • dataframe (DataFrame) – pandas dataframe to store

Return type:

None

varats.data.cache_helper.build_cached_report_table(data_id, project_name, data_to_load, data_to_drop, create_empty_df, create_cache_entry_data, get_entry_id, get_entry_timestamp, is_newer_timestamp)[source]

Build up an automatically cached dataframe.

Parameters:
  • data_id (str) – graph cache identifier

  • project_name (str) – name of the project to work with

  • data_to_load (List[TypeVar(InDataTy)]) – list of data items to be loaded

  • data_to_drop (List[TypeVar(InDataTy)]) – list of data items to be discarded

  • create_empty_df (Callable[[], DataFrame]) – creates an empty layout of the dataframe

  • create_cache_entry_data (Callable[[TypeVar(InDataTy)], Tuple[DataFrame, str, str]]) – creates a dataframe from a data item

  • get_entry_id (Callable[[TypeVar(InDataTy)], str]) – returns a unique identifier for one data item

  • get_entry_timestamp (Callable[[TypeVar(InDataTy)], str]) – returns a string with information that can be used to determine which of two data items is newer

  • is_newer_timestamp (Callable[[str, str], bool]) – checks whether one data item is newer than another based on their timestamps

Return type:

DataFrame
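How the callbacks of build_cached_report_table could cooperate is sketched below with a plain dict as the cache. This is an illustration under assumptions, not the VaRA-TS implementation: update_cache and demo are hypothetical, the cache is not persisted, and the argument order of is_newer_timestamp (new value first) is assumed.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

InDataTy = TypeVar("InDataTy")
CacheEntry = Tuple[str, dict]  # (timestamp, data created from the item)


def update_cache(
    cache: Dict[str, CacheEntry],
    data_to_load: List[InDataTy],
    data_to_drop: List[InDataTy],
    create_entry: Callable[[InDataTy], dict],
    get_entry_id: Callable[[InDataTy], str],
    get_entry_timestamp: Callable[[InDataTy], str],
    is_newer_timestamp: Callable[[str, str], bool],
) -> int:
    """Updates the cache in place and returns how many items were (re)loaded."""
    for item in data_to_drop:
        cache.pop(get_entry_id(item), None)

    loaded = 0
    for item in data_to_load:
        entry_id = get_entry_id(item)
        timestamp = get_entry_timestamp(item)
        cached = cache.get(entry_id)
        # Only create entry data for unknown items or items that got newer.
        if cached is None or is_newer_timestamp(timestamp, cached[0]):
            cache[entry_id] = (timestamp, create_entry(item))
            loaded += 1
    return loaded


def demo() -> List[int]:
    """Items are (id, timestamp, value) tuples; returns the load counts."""
    cache: Dict[str, CacheEntry] = {}

    def run(to_load, to_drop):
        return update_cache(
            cache, to_load, to_drop,
            create_entry=lambda item: {"value": item[2]},
            get_entry_id=lambda item: item[0],
            get_entry_timestamp=lambda item: item[1],
            is_newer_timestamp=lambda new, old: int(new) > int(old),
        )

    first = run([("a", "10", 1), ("b", "10", 2)], [])
    second = run([("a", "10", 1), ("b", "10", 2)], [])  # all entries cached
    third = run([("a", "20", 3)], [("b", "10", 2)])     # "a" is newer, "b" dropped
    return [first, second, third, len(cache)]
```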

varats.data.cache_helper.build_cached_graph(graph_id, create_graph)[source]

Create an automatically cached networkx graph.

Parameters:
  • graph_id (str) – graph cache identifier

  • create_graph (Callable[[], TypeVar(GraphTy, bound= Graph)]) – function that creates the graph

Return type:

TypeVar(GraphTy, bound= Graph)

Returns:

the cached or created graph


Module: data_manager

The DataManager module handles the loading, creation, and caching of data classes.

With the DataManager in the background, we can load files from multiple locations within the tool suite without loading the same file twice. In addition, this speeds up reloading of files, for example, in interactive settings such as Jupyter notebooks, where re-executing a cell can trigger another file load.

varats.data.data_manager.sha256_checksum(file_path, block_size=65536)[source]

Compute sha256 checksum of file.

Parameters:
  • file_path (Path) – path to the file

  • block_size (int) – amount of bytes read per cycle

Return type:

str

Returns:

sha256 hash of the file
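A block-wise checksum of this kind can be written with hashlib. The following is a minimal sketch of the technique, not a copy of the VaRA-TS function; sha256_of_file and checksum_of_bytes are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(file_path: Path, block_size: int = 65536) -> str:
    """Hashes the file in blocks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as file:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for block in iter(lambda: file.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def checksum_of_bytes(data: bytes, block_size: int = 65536) -> str:
    """Writes data to a temporary file and returns its block-wise checksum."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        file_path = Path(tmp_dir) / "data.bin"
        file_path.write_bytes(data)
        return sha256_of_file(file_path, block_size)
```

The block size only affects how much is read per cycle; the resulting hash is the same for any block size.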

class varats.data.data_manager.FileBlob(key, file_path, data)[source]

Bases: Generic[LoadableTy]

A FileBlob is a keyed data blob for everything that is loadable from a file and can be converted to a VaRA DataClass.

Parameters:
  • key (str) – identifier for the file

  • file_path (Path) – path to the file

  • data (TypeVar(LoadableTy, bound= BaseReport)) – a blob of data in memory

property key: str

The key used as an index to the blob.

property file_path: Path

File path to the loaded file.

property data: LoadableTy

The loaded DataClass from the file.

class varats.data.data_manager.FileSignal[source]

Bases: QObject

Emit signals after the file was loaded.

finished

Type:

pyqtSignal(*types, name: str = …, revision: int = …, arguments: Sequence = …) -> PYQT_SIGNAL

types is normally a sequence of individual types. Each type is either a type object or a string that is the name of a C++ type. Alternatively each type could itself be a sequence of types each describing a different overloaded signal. name is the optional C++ name of the signal. If it is not specified then the name of the class attribute that is bound to the signal is used. revision is the optional revision of the signal that is exported to QML. If it is not specified then 0 is used. arguments is the optional sequence of the names of the signal’s arguments.

clean

Type:

pyqtSignal(*types, name: str = …, revision: int = …, arguments: Sequence = …) -> PYQT_SIGNAL

types is normally a sequence of individual types. Each type is either a type object or a string that is the name of a C++ type. Alternatively each type could itself be a sequence of types each describing a different overloaded signal. name is the optional C++ name of the signal. If it is not specified then the name of the class attribute that is bound to the signal is used. revision is the optional revision of the signal that is exported to QML. If it is not specified then 0 is used. arguments is the optional sequence of the names of the signal’s arguments.

class varats.data.data_manager.FileLoader(func, file_path, class_type)[source]

Bases: QRunnable

Manages concurrent file loading in the background of the application.

run()[source]

Run the file loading method.

Return type:

None

class varats.data.data_manager.DataManager[source]

Bases: object

Manages data over the lifetime of the tool suite.

The DataManager handles the concurrent file loading, creation of DataClasses and caching of loaded files.

file_map: Dict[str, FileBlob[Any]]
load_data_class(file_path, DataClassTy, loaded_callback)[source]

Load a DataClass of type <DataClassTy> from a file asynchronously.

Parameters:
  • file_path (TypeVar(PathLikeTy, Path, ReportFilepath)) – to the file

  • DataClassTy (Type[TypeVar(LoadableTy, bound= BaseReport)]) – type of the report class to be loaded

  • loaded_callback (Callable[[TypeVar(LoadableTy, bound= BaseReport)], None]) – that gets called after loading has finished

Return type:

None

load_data_class_sync(file_path, DataClassTy)[source]

Load a DataClass of type <DataClassTy> from a file synchronously.

Parameters:
  • file_path (TypeVar(PathLikeTy, Path, ReportFilepath)) – to the file

  • DataClassTy (Type[TypeVar(LoadableTy, bound= BaseReport)]) – type of the report class to be loaded

Return type:

TypeVar(LoadableTy, bound= BaseReport)

Returns:

the loaded report file
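Loading the same file only once can be sketched with a small in-memory cache. The sketch below keys the cache by a checksum of the file content, which is an assumption suggested by the presence of sha256_checksum in this module; LoadCache and demo are hypothetical names and the sketch is synchronous only.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")


class LoadCache(Generic[T]):
    """Caches loaded objects so identical files are parsed only once."""

    def __init__(self, loader: Callable[[Path], T]) -> None:
        self._loader = loader
        self._blobs: Dict[str, T] = {}
        self.load_count = 0

    def load(self, file_path: Path) -> T:
        # Assumption: the content checksum serves as the cache key.
        key = hashlib.sha256(file_path.read_bytes()).hexdigest()
        if key not in self._blobs:
            self._blobs[key] = self._loader(file_path)
            self.load_count += 1
        return self._blobs[key]

    def clean_cache(self) -> None:
        self._blobs.clear()


def demo() -> Tuple[int, bool, int]:
    """Loads one file twice, cleans the cache, and loads it again."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        file_path = Path(tmp_dir) / "report.txt"
        file_path.write_text("42\n")
        cache: LoadCache[list] = LoadCache(lambda p: p.read_text().split())
        first = cache.load(file_path)
        second = cache.load(file_path)
        loads_before_clean = cache.load_count
        cache.clean_cache()
        cache.load(file_path)
        return loads_before_clean, first is second, cache.load_count
```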

clean_cache()[source]
Return type:

None

varats.data.data_manager.load_multiple_reports(file_paths, report_type)[source]
Parameters:
  • file_paths (List[Path]) – list of files to load

  • report_type (Type[BaseReport]) – type of the report class to be loaded

Return type:

List[Any]

Returns:

a list of loaded reports


Module: version_header

This module provides a reusable version header for all yaml reports generated by VaRA.

The version header specifies the type of the following yaml file and the version.

exception varats.base.version_header.WrongYamlFileType(expected_type, actual_type)[source]

Bases: Exception

Exception raised for mismatches of the file type.

exception varats.base.version_header.WrongYamlFileVersion(expected_version, actual_version)[source]

Bases: Exception

Exception raised for mismatches of the file version.

exception varats.base.version_header.NoVersionHeader[source]

Bases: Exception

Exception raised for wrong yaml documents.

class varats.base.version_header.VersionHeader(yaml_doc)[source]

Bases: object

VersionHeader describing the type and version of the following yaml file.

classmethod from_yaml_doc(yaml_doc)[source]

Creates a VersionHeader object from a yaml dict.

Parameters:

yaml_doc (Dict[str, Any]) – version header yaml document

Return type:

VersionHeader

classmethod from_version_number(doc_type, version)[source]

Creates a new VersionHeader object from a doc_type string and a version number.

Parameters:
  • doc_type (str) – type of the document that should follow the version header

  • version (int) – the current version number

Return type:

VersionHeader

property doc_type: str

Type of the following yaml file.

is_type(type_name)[source]

Checks if the type of the following yaml file is type_name.

Parameters:

type_name (str) – of the possible following yaml document

Return type:

bool

raise_if_not_type(type_name)[source]

Checks if the type of the following yaml file is type_name, otherwise, raises an exception.

Parameters:

type_name (str) – of the possible following yaml document

Return type:

None

property version: int

Document version number.

raise_if_version_is_less_than(version_bound)[source]

Checks if the current version is equal to or greater than version_bound, otherwise, raises an exception.

Parameters:

version_bound (int) – minimal version that is expected

Return type:

None

get_dict()[source]

Returns the version header as a dict.

Return type:

Dict[str, Union[str, int]]
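
The checks described above can be sketched without VaRA-TS installed. The block below is a simplified stand-in, not the actual varats implementation: the class name SketchVersionHeader, the dictionary keys DocType and Version, and the document type ExampleReport are assumptions made purely for illustration.

```python
from typing import Any, Dict, Union


class NoVersionHeader(Exception):
    """Raised when the document lacks the (assumed) header keys."""


class WrongYamlFileType(Exception):
    """Raised when the document type does not match the expected one."""


class WrongYamlFileVersion(Exception):
    """Raised when the document version is below the expected bound."""


class SketchVersionHeader:
    """Hypothetical stand-in that mirrors the documented interface."""

    def __init__(self, yaml_doc: Dict[str, Any]) -> None:
        # Assumed key names; the real header layout may differ.
        if "DocType" not in yaml_doc or "Version" not in yaml_doc:
            raise NoVersionHeader()
        self.doc_type = str(yaml_doc["DocType"])
        self.version = int(yaml_doc["Version"])

    def is_type(self, type_name: str) -> bool:
        return self.doc_type == type_name

    def raise_if_not_type(self, type_name: str) -> None:
        if not self.is_type(type_name):
            raise WrongYamlFileType(type_name, self.doc_type)

    def raise_if_version_is_less_than(self, version_bound: int) -> None:
        if self.version < version_bound:
            raise WrongYamlFileVersion(version_bound, self.version)

    def get_dict(self) -> Dict[str, Union[str, int]]:
        return {"DocType": self.doc_type, "Version": self.version}


header = SketchVersionHeader({"DocType": "ExampleReport", "Version": 2})
header.raise_if_not_type("ExampleReport")  # matching type: no exception
header.raise_if_version_is_less_than(1)    # version 2 >= 1: no exception

try:
    header.raise_if_version_is_less_than(3)  # version 2 < 3: raises
    too_old = False
except WrongYamlFileVersion:
    too_old = True
```

A report loader would typically run both raise_if_* checks right after parsing the first yaml document and before reading the report body.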


Data providers

Providers are a means to supply additional data for a project. For example, the CVE Provider allows access to all CVEs that are related to a project.

You can implement your own provider by creating a subclass of Provider in its own subdirectory of provider in varats-core. There is no restriction on the format in which data has to be provided. The Provider abstract class only requires you to specify how to create an instance of your provider for a specific project, as well as a fallback implementation (that most likely returns no data). If your provider needs some project-specific implementation, create a class with the name <your_provider_class>Hook and make the projects inherit from it, similar to the CVEProviderHook. If a project does not inherit from that hook, your provider’s create_provider_for_project() should return None. In that case, the provider factory method falls back to your default provider implementation and issues a warning. For an example provider implementation take a look at the CVE Provider.

List of supported providers

Provider module

Provider interface module for projects.

Providers are a means to supply additional data for a project.

class varats.provider.provider.Provider(project)[source]

Bases: ABC

A provider allows access to additional information about a project, e.g., which revisions of a project are releases, or which CVEs are related to a project.

Parameters:

project (Type[Project]) – the project this provider is associated with

property project: Type[Project]

The project this provider is associated with.

abstractmethod classmethod create_provider_for_project(project)[source]

Creates a provider instance for the given project if possible.

Return type:

Optional[TypeVar(ProviderType, bound= Provider)]

Returns:

a provider instance for the given project if possible, otherwise, None

abstractmethod classmethod create_default_provider(project)[source]

Creates a default provider instance that can be used with any project.

Return type:

TypeVar(ProviderType, bound= Provider)

Returns:

a default provider instance

classmethod get_provider_for_project(project)[source]

Factory function for creating providers.

This function is guaranteed to return a valid instance of the requested provider by falling back to a default provider if necessary. A warning is issued in the latter case.

Parameters:

project (Type[Project]) – the project to create the provider for

Return type:

TypeVar(ProviderType, bound= Provider)

Returns:

an instance of this provider
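
The hook and fallback mechanism described in this section can be sketched as follows. This is a self-contained illustration that does not import varats: SketchProvider, ExampleProvider, ExampleDefaultProvider, ExampleHook, and both project classes are hypothetical names, and the wording of the warning is an assumption.

```python
import warnings
from abc import ABC, abstractmethod
from typing import List, Optional


class ExampleHook:
    """Hypothetical hook; projects inherit from it to supply data."""

    @classmethod
    def get_example_data(cls) -> List[str]:
        return ["entry"]


class PlainProject:
    """Hypothetical project that does not inherit from the hook."""
    NAME = "plain_project"


class HookedProject(PlainProject, ExampleHook):
    """Hypothetical project that opts in by inheriting from the hook."""
    NAME = "hooked_project"


class SketchProvider(ABC):
    """Stand-in for the documented Provider interface."""

    def __init__(self, project: type) -> None:
        self.project = project

    @classmethod
    @abstractmethod
    def create_provider_for_project(cls, project) -> Optional["SketchProvider"]:
        """Return a provider, or None if the project is unsupported."""

    @classmethod
    @abstractmethod
    def create_default_provider(cls, project) -> "SketchProvider":
        """Return a fallback provider that works for any project."""

    @classmethod
    def get_provider_for_project(cls, project) -> "SketchProvider":
        # Factory: try the specific provider first, then fall back.
        provider = cls.create_provider_for_project(project)
        if provider is None:
            warnings.warn(f"Using default provider for {project.NAME}.")
            return cls.create_default_provider(project)
        return provider


class ExampleProvider(SketchProvider):
    @classmethod
    def create_provider_for_project(cls, project):
        if issubclass(project, ExampleHook):
            return cls(project)
        return None

    @classmethod
    def create_default_provider(cls, project):
        return ExampleDefaultProvider(project)

    def get_data(self) -> List[str]:
        return self.project.get_example_data()


class ExampleDefaultProvider(ExampleProvider):
    def get_data(self) -> List[str]:
        return []  # fallback: no data available


hooked = ExampleProvider.get_provider_for_project(HookedProject)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fallback = ExampleProvider.get_provider_for_project(PlainProject)
```

Here the hooked project receives the specific provider, while the plain project receives the default provider and triggers exactly one warning.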

Metrics

During data evaluation, one might wish to calculate different metrics for the data at hand. We collect the code for such metrics in a separate module to make these metrics reusable, e.g., in different plots.

Metrics module

This module contains functions that calculate various metrics on data.

varats.data.metrics.lorenz_curve(data)[source]

Calculates the values for the Lorenz curve of the data.

For more information, see the Lorenz curve article online.

Parameters:

data (Series) – sorted series to calculate the Lorenz curve for

Return type:

Series

Returns:

the values of the Lorenz curve as a series
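
As a rough illustration of what such a curve contains, the sketch below computes the cumulative share of the total for an ascending list of values. It uses plain Python lists instead of a pandas Series, and lorenz_curve_sketch is a hypothetical name; it assumes the common definition of the Lorenz curve and is not necessarily how varats computes it.

```python
from itertools import accumulate
from typing import List, Sequence


def lorenz_curve_sketch(sorted_values: Sequence[float]) -> List[float]:
    """Cumulative share of the total for values sorted in ascending order."""
    total = sum(sorted_values)
    # Each point is the running sum divided by the overall sum.
    return [prefix / total for prefix in accumulate(sorted_values)]


curve = lorenz_curve_sketch([1, 1, 2, 4])
# → [0.125, 0.25, 0.5, 1.0]
```

The last point is always 1.0, and the further the curve sags below the diagonal, the more unequal the distribution.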

varats.data.metrics.gini_coefficient(distribution)[source]

Calculates the Gini coefficient of the data.

For more information see online Gini coefficient.

Parameters:

distribution (Series) – sorted series to calculate the Gini coefficient for

Return type:

float

Returns:

the Gini coefficient for the data

varats.data.metrics.normalized_gini_coefficient(distribution)[source]

Calculates the normalized Gini coefficient of the given data, i.e., gini(data) * (n / (n - 1)), where n is the length of the data.

Parameters:

distribution (Series) – sorted series to calculate the normalized Gini coefficient for

Return type:

float

Returns:

the normalized Gini coefficient for the data
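
The relation between the two coefficients can be illustrated with a small sketch. It assumes the mean-absolute-difference formulation of the Gini coefficient and works on plain Python lists; gini_sketch and normalized_gini_sketch are hypothetical names and not part of varats.

```python
from typing import Sequence


def gini_sketch(values: Sequence[float]) -> float:
    """Half the relative mean absolute difference of the values."""
    n = len(values)
    mean = sum(values) / n
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    return abs_diff_sum / (2 * n * n * mean)


def normalized_gini_sketch(values: Sequence[float]) -> float:
    """Scale by n / (n - 1) so maximal inequality yields 1."""
    n = len(values)
    return gini_sketch(values) * n / (n - 1)


equal = gini_sketch([1, 1, 1, 1])                  # no inequality
unequal = gini_sketch([0, 0, 0, 1])                # one value holds everything
unequal_norm = normalized_gini_sketch([0, 0, 0, 1])
```

For a sample of size n, the plain coefficient cannot exceed (n - 1) / n, which is why the normalization factor makes values comparable across samples of different sizes. The sketch does not guard against an empty input, a single value, or a zero mean.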

varats.data.metrics.apply_tukeys_fence(data, column, k)[source]

Removes rows which are outliers in the given column using Tukey’s fence.

Tukey’s fence defines all values to be outliers that are outside the range [q1 - k * (q3 - q1), q3 + k * (q3 - q1)], i.e., values that are further than k times the interquartile range away from the first or third quartile.

Common values for k:
  • 2.2 – “Fine-Tuning Some Resistant Rules for Outlier Labeling”, Hoaglin and Iglewicz (1987)

  • 1.5 – outliers, “Exploratory Data Analysis”, John W. Tukey (1977)

  • 3.0 – far out outliers, “Exploratory Data Analysis”, John W. Tukey (1977)

Parameters:
  • data (DataFrame) – data to remove outliers from

  • column (str) – column to use for outlier detection

  • k (float) – multiplicative factor on the inter-quartile-range

Return type:

DataFrame

Returns:

the data without outliers

Test:

>>> apply_tukeys_fence(pd.DataFrame({'foo': [1,1,2,2,10]})
...     .rename_axis('cols', axis=1), 'foo', 3)
cols  foo
0       1
1       1
2       2
3       2
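
The fence computation can be sketched with the standard library alone. The block below is a stand-in that filters a list of dictionaries rather than a DataFrame; tukeys_fence_sketch is a hypothetical name, and the quartiles are computed with linear interpolation (the "inclusive" method), which is an assumption about how the quartiles are defined.

```python
from statistics import quantiles
from typing import Any, Dict, List


def tukeys_fence_sketch(
    rows: List[Dict[str, Any]], column: str, k: float
) -> List[Dict[str, Any]]:
    """Keep rows whose value lies within [q1 - k*iqr, q3 + k*iqr]."""
    values = sorted(row[column] for row in rows)
    # Quartiles with linear interpolation between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if lower <= row[column] <= upper]


rows = [{"foo": v} for v in [1, 1, 2, 2, 10]]
kept = tukeys_fence_sketch(rows, "foo", 3)
# q1 = 1, q3 = 2, so the fence is [-2, 5] and the row with 10 is dropped
```

With the same input as the test above, only the row containing 10 falls outside the fence.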

varats.data.metrics.min_max_normalize(values)[source]

Min-Max normalize a series.

Parameters:

values (Series) – the series to normalize

Return type:

Series

Returns:

the normalized series

Test:

>>> min_max_normalize(pd.Series([1,2,3]))
0    0.0
1    0.5
2    1.0
dtype: float64
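
For reference, min-max normalization maps each value v to (v - min) / (max - min). A minimal list-based sketch follows; min_max_normalize_sketch is a hypothetical name, and rejecting constant input is a choice made for this sketch rather than documented behavior.

```python
from typing import List, Sequence


def min_max_normalize_sketch(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        # Assumption of this sketch: refuse input where the range is zero.
        raise ValueError("cannot normalize a constant sequence")
    return [(v - low) / (high - low) for v in values]


normalized = min_max_normalize_sketch([1, 2, 3])
# → [0.0, 0.5, 1.0]
```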

class varats.data.metrics.ConfusionMatrix(actual_positive_values, actual_negative_values, predicted_positive_values, predicted_negative_values)[source]

Bases: Generic[T]

Helper class to automatically calculate classification results.

                     | Predicted Positive (PP) | Predicted Negative (PN)
---------------------|-------------------------|------------------------
Actual Positive (P)  | True Positive (TP)      | False Negative (FN)
Actual Negative (N)  | False Positive (FP)     | True Negative (TN)

Reference: https://en.wikipedia.org/wiki/Precision_and_recall

property P: int
property N: int
property PP: int
property PN: int
property TP: int
property TN: int
property FP: int
property FN: int
getTPs()[source]
Return type:

Set[TypeVar(T)]

getTNs()[source]
Return type:

Set[TypeVar(T)]

getFPs()[source]
Return type:

Set[TypeVar(T)]

getFNs()[source]
Return type:

Set[TypeVar(T)]

precision()[source]

Positive predictive value (PPV)

Return type:

float

recall()[source]

True positive rate (TPR)

Return type:

float

specificity()[source]

True negative rate (TNR)

Return type:

float

accuracy()[source]

Accuracy (ACC)

Return type:

float

balanced_accuracy()[source]

Balanced accuracy (BA)/(bACC)

Balanced accuracy can serve as an overall performance metric for a model, whether or not the true labels are imbalanced in the data, assuming the cost of FN is the same as FP.

Return type:

float

f1_score()[source]

In statistical analysis of binary classification, the F-score or F-measure is a measure of a test’s accuracy. The F1 score is the harmonic mean of precision and recall.

Return type:

float
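
The metrics above follow directly from the four cells of the confusion matrix. The sketch below derives them from sets of items using the standard textbook formulas; SketchConfusionMatrix is a hypothetical stand-in, not the varats class, and it does not guard against empty denominators.

```python
from typing import Generic, Set, TypeVar

T = TypeVar("T")


class SketchConfusionMatrix(Generic[T]):
    """Derives TP/FN/FP/TN by intersecting actual and predicted sets."""

    def __init__(
        self,
        actual_positive: Set[T],
        actual_negative: Set[T],
        predicted_positive: Set[T],
        predicted_negative: Set[T],
    ) -> None:
        self.tp = len(actual_positive & predicted_positive)
        self.fn = len(actual_positive & predicted_negative)
        self.fp = len(actual_negative & predicted_positive)
        self.tn = len(actual_negative & predicted_negative)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)  # TP / PP

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)  # TP / P

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)  # TN / N

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def balanced_accuracy(self) -> float:
        return (self.recall() + self.specificity()) / 2

    def f1_score(self) -> float:
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


matrix = SketchConfusionMatrix(
    actual_positive={1, 2, 3},
    actual_negative={4, 5, 6, 7},
    predicted_positive={1, 2, 4},
    predicted_negative={3, 5, 6, 7},
)
# Cells: TP = 2 (items 1, 2), FN = 1 (item 3), FP = 1 (item 4), TN = 3
```

In this example, precision and recall are both 2/3, specificity is 3/4, and accuracy is 5/7.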