Paper and case study handling

Whenever we design experiments that look at specific revisions of a project, we need to preserve the set of analyzed revisions so that we can re-evaluate our experiment data later. For exactly this purpose, the tool suite provides case studies, which record which revisions of a project were analyzed. To also preserve the set of analyzed projects, we designed paper configs as collections of case studies. Paper configs and case studies not only let us re-evaluate our own experiments, they also allow others to reproduce our data or to design their own experiments based on our selection of projects and revisions.

How to use case studies

If we want to analyze a particular set of revisions or re-evaluate the same revisions over and over again, we can pin down the revisions of interest by creating a CaseStudy. First, create a folder in which your config should be saved. Then, create a case study that specifies the revisions to be analyzed. To ease the creation of case studies, the tool suite offers different selection strategies for choosing revisions from a project's history, e.g., based on a probability distribution.

For example, to get the latest revision of a project use:

vara-cs gen -p PROJECT_NAME select_latest

As another example, we can generate a new case study for the project gzip, drawing 10 revisions from the project's history based on a half-normal distribution, with:

vara-cs gen -p gzip select_sample HalfNormalSamplingMethod --num-rev 10
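To make the sampling idea more concrete, the following Python sketch draws distinct revisions using a half-normal distribution. It is not the tool suite's HalfNormalSamplingMethod: the helper sample_half_normal, the choice of sigma, and the assumption that the history list is ordered newest first (so that recent revisions are drawn more often) are all illustrative assumptions.

```python
import random
from typing import List


def sample_half_normal(history: List[str], num_rev: int, seed: int = 0) -> List[str]:
    """Hypothetical helper: draw num_rev distinct revisions from history.

    Assumes history[0] is the newest commit, so small indices (recent
    commits) are drawn more often than old ones.
    """
    if num_rev > len(history):
        raise ValueError("cannot draw more revisions than the history contains")
    rng = random.Random(seed)
    # Assumed scaling: most of the distribution's mass covers the history.
    sigma = len(history) / 3
    picked: List[str] = []
    while len(picked) < num_rev:
        # The absolute value of a normal sample is half-normally distributed.
        idx = int(abs(rng.gauss(0.0, sigma)))
        # Reject indices outside the history and revisions drawn before.
        if idx < len(history) and history[idx] not in picked:
            picked.append(history[idx])
    return picked


history = [f"rev{i:03d}" for i in range(100)]  # rev000 is the newest
print(sample_half_normal(history, num_rev=10))
```

Rejection sampling keeps the drawn revisions distinct; a fixed seed makes the selection repeatable, which matches the goal of case studies.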

Multiple case studies, e.g., a set of case studies used for a paper, can be grouped into a paper config so that they can be managed together more easily. For more information, see How to use paper configs.

Extending an existing case study is easy: just select more revisions, and they will be added automatically. Should you wish to drop the old revisions, pass --override; this removes the old revisions and afterwards adds the newly selected ones.

Warning

The specified distribution only applies to the newly added revisions and does not take previously added revisions into account. If one wants all revisions to be drawn according to the same distribution, the old case study needs to be overwritten.
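The difference between extending and overriding can be modeled with a small hypothetical helper, update_revisions, which is not part of vara-cs. It treats a case study as a flat list of revisions and ignores stages, which is a simplification.

```python
from typing import List


def update_revisions(existing: List[str], new: List[str], override: bool = False) -> List[str]:
    """Hypothetical model of adding revisions to a case study (stages ignored)."""
    if override:
        # --override: discard the old revisions and keep only the new selection.
        return list(dict.fromkeys(new))
    # Default: keep the old revisions and append new ones not yet present.
    merged = list(existing)
    for rev in new:
        if rev not in merged:
            merged.append(rev)
    return merged


old = ["a1", "b2", "c3"]
print(update_revisions(old, ["c3", "d4"]))                 # ['a1', 'b2', 'c3', 'd4']
print(update_revisions(old, ["c3", "d4"], override=True))  # ['c3', 'd4']
```

In the default mode, the merged list mixes revisions drawn at different times, possibly from different distributions, which is exactly the situation the warning describes.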

How to use paper configs

Paper configs are used to group different case studies together. Take, for example, the case where one wants to analyze the projects gzip and git for the evaluation of a paper that gets submitted to ase-17. First, one creates case studies for each project, selecting the revisions that should be analyzed. Second, all case studies related to the evaluation for ase-17 are grouped into a folder, the paper config, to relate them to the paper. Now we can design and run our experiment for ase-17 on all revisions added through the case studies in the paper config and generate our experiment results.

The paper config now allows us to reproduce all the results for our paper with a single call to the tool suite. This also helps other researchers, who are now able to reproduce our results.

In more detail, our specified paper config allows the tool suite to tell BenchBuild which revisions should be analyzed to evaluate a set of case studies. For example, a setup could look like this:

paper_configs
    ├── ase-17
    │       ├── gzip_0.case_study
    │       ├── gzip_1.case_study
    │       └── git_0.case_study
    └── icse-18
            ├── gzip_0.case_study
            └── git_0.case_study

In this example, we have two paper configs, one for ase-17 and another for icse-18. Both contain case studies for gzip and git; note that a paper config can hold multiple case studies for the same project (gzip_0 and gzip_1 in ase-17). If we now want to evaluate our set for icse-18, we set the paper config folder to the root of our config tree and select the icse-18 folder as our current config in the settings file .vara.yaml, like this:

paper_config:
  current_config:
    value: icse-18
  folder:
    value: /home/foo/vara/paper_configs/
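These settings combine a root folder with the name of the active config. The sketch below assumes a plain dictionary that mirrors the YAML, instead of the tool suite's own settings object, and relies on the <project>_<version>.case_study naming seen in the paper_configs tree. Both current_paper_config_dir and group_case_study_files are hypothetical helpers.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Hypothetical dictionary mirroring the .vara.yaml snippet.
settings = {
    "paper_config": {
        "current_config": {"value": "icse-18"},
        "folder": {"value": "/home/foo/vara/paper_configs/"},
    }
}

# Assumed file naming convention: <project>_<version>.case_study
CS_FILE = re.compile(r"^(?P<project>.+)_(?P<version>\d+)\.case_study$")


def current_paper_config_dir(settings: dict) -> Path:
    """Join the paper config root folder with the active config name."""
    pc = settings["paper_config"]
    return Path(pc["folder"]["value"]) / pc["current_config"]["value"]


def group_case_study_files(file_names: List[str]) -> Dict[str, List[int]]:
    """Map each project name to the sorted versions of its case studies."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for name in file_names:
        match = CS_FILE.match(name)
        if match:
            groups[match["project"]].append(int(match["version"]))
    return {project: sorted(versions) for project, versions in groups.items()}


print(current_paper_config_dir(settings))  # /home/foo/vara/paper_configs/icse-18 (on POSIX)
print(group_case_study_files(["gzip_0.case_study", "gzip_1.case_study", "git_0.case_study"]))
# {'gzip': [0, 1], 'git': [0]}
```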

Next, we can run our experiment with BenchBuild as usual. During experiment execution, BenchBuild will load our config and only evaluate the needed revisions.

The current status of all case studies belonging to the current paper config can be visualized with vara-cs status:

>>> vara-cs status -s $EXPERIMENT_NAME
CS: gzip_0: (0/5) processed
CS: gzip_1: (2/5) processed
CS: gzip_2: (5/5) processed
CS: libvpx_0: (0/5) processed

The tool vara-pc provides a simple command line interface for creating and managing paper configs.

Artefacts

The artefacts module provides an easy way to attach descriptions of artefacts, like plots or result tables, to a paper config. This way, reproducing the exact same plots for a paper config over and over again becomes as easy as invoking a single command.

For more information about how to create and manage artefacts, refer to the documentation of the vara-art tool.

Paper and case study modules

Module: paper_config

The PaperConfig pins down a specific set of case studies, one or more for each project, where each encapsulates a fixed set of revisions to evaluate.

This allows users to specify which revisions of which projects have to be analyzed. Furthermore, it allows other users to reproduce the exact same set of projects and revisions, either with the old experiment automatically or with a new experiment to compare the results.

class varats.paper.paper_config.PaperConfig(folder_path)

Bases: object

Paper config, a specific set of case studies, e.g., for a publication.

The paper config allows easy re-evaluation of a set of case studies.

Parameters:

folder_path (Path) – path to the paper config folder

property path: Path

Path to the paper config folder.

get_case_studies(cs_name)

Look up all case studies with a given name.

Parameters:

cs_name (str) – name of the case study

Return type:

List[CaseStudy]

Returns:

case studies with project name cs_name.

get_all_case_studies()

Generate a list of all case studies in the paper config.

Return type:

List[CaseStudy]

Returns:

full list of all case studies with all different versions.

has_case_study(cs_name)

Checks if a case study with cs_name was loaded.

Parameters:

cs_name (str) – name of the case study

Return type:

bool

Returns:

True, if a case study with cs_name was loaded

get_filter_for_case_study(cs_name)

Return a case-study-specific revision filter. If any loaded case study with the name cs_name includes a revision, the filter function returns True. This can be used to automatically filter out revisions that are not part of a case study loaded by this paper config.

Parameters:

cs_name (str) – name of the case study

Return type:

Callable[[str], bool]

Returns:

a filter function that checks if a given revision is part of a case study with name cs_name and returns True if it was
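As a rough model of get_filter_for_case_study, the sketch below represents a paper config as a plain dictionary from project name to one revision set per case study. This representation and the free function are assumptions for illustration; the real method works on loaded CaseStudy objects.

```python
from typing import Callable, Dict, List, Set

# Hypothetical, simplified paper config: project name -> one revision
# set per case study (e.g., gzip_0 and gzip_1).
PaperConfigModel = Dict[str, List[Set[str]]]


def get_filter_for_case_study(config: PaperConfigModel, cs_name: str) -> Callable[[str], bool]:
    case_studies = config.get(cs_name, [])

    def revision_filter(revision: str) -> bool:
        # True as soon as any case study with this name contains the revision.
        return any(revision in revisions for revisions in case_studies)

    return revision_filter


config = {"gzip": [{"a1b2c3", "d4e5f6"}, {"0badc0de"}], "git": [{"cafe42"}]}
gzip_filter = get_filter_for_case_study(config, "gzip")
print(gzip_filter("0badc0de"), gzip_filter("cafe42"))  # True False
```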

add_case_study(case_study)

Add a new case study to this paper config.

Parameters:

case_study (CaseStudy) – to be added

Return type:

None

store()

Persist the current state of the paper config saving all case studies to their corresponding files in the paper config path.

Return type:

None

varats.paper.paper_config.project_filter_generator(project_name)

Generate project-specific revision filters.

  • if no paper config is loaded, we allow all revisions

  • otherwise the paper config generates a specific revision filter

Parameters:

project_name (str) – corresponding project name

Return type:

Callable[[str], bool]

Returns:

a filter function that returns True if a revision of the specified project is included in one of the related case studies.
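The two cases of project_filter_generator can be sketched as follows. Instead of looking up the globally loaded paper config by project_name, this hypothetical version receives the project's revision set directly, with None standing in for "no paper config loaded".

```python
from typing import Callable, Optional, Set


def project_filter_generator(loaded_revisions: Optional[Set[str]]) -> Callable[[str], bool]:
    """Hypothetical stand-in; None means that no paper config is loaded."""
    if loaded_revisions is None:
        # No paper config: every revision passes.
        return lambda revision: True
    # Paper config loaded: only revisions from the project's case studies pass.
    return lambda revision: revision in loaded_revisions


history = ["r3", "r2", "r1"]
print([r for r in history if project_filter_generator(None)(r)])    # ['r3', 'r2', 'r1']
print([r for r in history if project_filter_generator({"r2"})(r)])  # ['r2']
```

Conceptually, PaperConfigSpecificGit.versions() applies the same kind of filter to the versions offered by the repository.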

varats.paper.paper_config.get_loaded_paper_config()

Returns the currently active paper config; a config must have been loaded before calling this function.

Return type:

PaperConfig

Returns:

currently active paper config

varats.paper.paper_config.is_paper_config_loaded()

Check if a paper config is currently loaded.

Return type:

bool

Returns:

True, if a paper config has been loaded

varats.paper.paper_config.load_paper_config(config_path=None)

Loads a paper config from a yaml file, initializes the paper config and sets it to the currently active paper config. If no config path is provided, the paper config set in the vara settings yaml is loaded.

Note

Only one paper config can be active at a time

Parameters:

config_path (Optional[Path]) – path to a paper config folder

Return type:

None

varats.paper.paper_config.get_paper_config()

Returns the current paper config and loads one if there is currently no active paper config.

Return type:

PaperConfig

Returns:

currently active paper config
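The interplay of load_paper_config, is_paper_config_loaded, get_loaded_paper_config, and get_paper_config follows a module-level singleton pattern. The sketch below is a hypothetical reimplementation: PaperConfigStub replaces PaperConfig, DEFAULT_CONFIG stands in for the value from the settings file, and raising RuntimeError in get_loaded_paper_config is an assumption of the sketch.

```python
from pathlib import Path
from typing import Optional

# Hypothetical default, standing in for the folder configured in the settings file.
DEFAULT_CONFIG = Path("/home/foo/vara/paper_configs/icse-18")


class PaperConfigStub:
    """Hypothetical placeholder for PaperConfig; only remembers its folder."""

    def __init__(self, folder_path: Path) -> None:
        self.path = folder_path


# Module-level state: at most one paper config is active at a time.
_ACTIVE: Optional[PaperConfigStub] = None


def load_paper_config(config_path: Optional[Path] = None) -> None:
    global _ACTIVE
    # Without an explicit path, fall back to the configured default.
    _ACTIVE = PaperConfigStub(config_path or DEFAULT_CONFIG)


def is_paper_config_loaded() -> bool:
    return _ACTIVE is not None


def get_loaded_paper_config() -> PaperConfigStub:
    if _ACTIVE is None:
        raise RuntimeError("no paper config loaded")
    return _ACTIVE


def get_paper_config() -> PaperConfigStub:
    # Lazily load the default config on first access.
    if not is_paper_config_loaded():
        load_paper_config()
    return get_loaded_paper_config()
```

Loading a second config simply replaces the active one, which matches the note that only one paper config can be active at a time.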

class varats.paper.paper_config.PaperConfigSpecificGit(project_name, *args, **kwargs)

Bases: Git

Paper config specific git to reduce the available versions.

The paper-config git filters out all revisions that are not specified in one of the case studies.

versions()

List all available versions of this source.

Returns:

The list of all available versions.

Return type:

List[str]


Module: case_study

A case study is used to pin down the exact set of revisions that should be analyzed for a project.

class varats.paper.case_study.CSEntry(commit_hash, commit_id, config_ids=None)

Bases: object

Combining a commit hash with a unique and ordered id, starting with 0 for the first commit in the repository.

property commit_hash: FullCommitHash

A commit hash from the git repository.

property commit_id: int

The order ID of the commit hash.

property config_ids: List[int]

The IDs of the configurations.

get_dict()

Get a dict representation of this commit and id.

Return type:

Dict[str, Union[str, int, List[int]]]

class varats.paper.case_study.CSStage(name=None, sampling_method=None, release_type=None, revisions=None)

Bases: object

A stage in a case-study, i.e., a collection of revisions.

Stages are used to separate revisions into groups.

property revisions: List[FullCommitHash]

Project revisions that are part of this stage.

property name: str | None

Name of the stage.

property sampling_method: SamplingMethodBase[Any] | None

The sampling method used for this stage.

property release_type: ReleaseType | None

The release type used for this stage.

has_revision(revision)

Check if a revision is part of this stage.

Parameters:

revision (CommitHash) – project revision to check

Return type:

bool

Returns:

True, in case the revision is part of the stage, False otherwise.

add_revision(revision, commit_id, config_ids=None)

Add a new revision to this stage.

Parameters:
  • revision (FullCommitHash) – to add

  • commit_id (int) – unique ID for ordering of commits

  • config_ids (Optional[List[int]]) – list of configuration IDs

Return type:

None

get_config_ids_for_revision(revision)

Returns a list of all configuration IDs specified for this revision.

Parameters:

revision (CommitHash) – i.e., a commit hash registered in this CSStage

Return type:

List[int]

Returns: list of config IDs

sort(reverse=True)

Sort the revisions of this stage by commit ID in place.

Return type:

None

get_dict()

Get a dict representation of this stage.

Return type:

Dict[str, Union[str, List[Dict[str, Union[str, int, List[int]]]]]]
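The ordering semantics of a stage can be illustrated with a simplified model. StageSketch is hypothetical: it stores plain (commit_hash, commit_id) tuples instead of CSEntry objects and skips configuration IDs, but it follows the documented convention that commit IDs start with 0 for the first commit, so sort(reverse=True) places the newest revision first.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StageSketch:
    """Hypothetical stand-in for CSStage holding (commit_hash, commit_id) pairs."""

    name: Optional[str] = None
    entries: List[Tuple[str, int]] = field(default_factory=list)

    @property
    def revisions(self) -> List[str]:
        return [rev for rev, _ in self.entries]

    def add_revision(self, revision: str, commit_id: int) -> None:
        self.entries.append((revision, commit_id))

    def has_revision(self, revision: str) -> bool:
        return any(rev == revision for rev, _ in self.entries)

    def sort(self, reverse: bool = True) -> None:
        # Higher commit IDs are newer commits, so reverse=True puts the newest first.
        self.entries.sort(key=lambda entry: entry[1], reverse=reverse)


stage = StageSketch(name="baseline")
for rev, commit_id in [("aaa", 3), ("bbb", 10), ("ccc", 7)]:
    stage.add_revision(rev, commit_id)
stage.sort()
print(stage.revisions)  # ['bbb', 'ccc', 'aaa']
```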

class varats.paper.case_study.CaseStudy(project_name, version, stages=None)

Bases: object

A case study persists a set of revisions of a project to allow easy re-evaluation.

Stored values:
  • name of the related benchbuild.project

  • a set of revisions

property project_name: str

Name of the related project.

!! This name must match the name of the BenchBuild project !!

property project_cls: Type[Project]

Look up the BenchBuild project for this case study.

Returns:

project class

property version: int

Version ID for this case study.

The version differentiates case studies of the same project.

property revisions: List[FullCommitHash]

Project revisions that are part of this case study.

property stages: List[CSStage]

Get a list with all stages.

property num_stages: int

Get the number of stages.

get_stage_by_name(stage_name)

Get a stage by its name. Since multiple stages can have the same name, the first matching stage is returned.

Parameters:

stage_name (str) – name of the stage to lookup

Return type:

Optional[CSStage]

Returns:

the stage, corresponding with the ‘stage_name’, or None

get_stage_index_by_name(stage_name)

Get a stage’s index by its name. Since multiple stages can have the same name, the index of the first matching stage is returned.

Parameters:

stage_name (str) – name of the stage to lookup

Return type:

Optional[int]

Returns:

the stage index, corresponding with the ‘stage_name’, or None

has_revision(revision)

Check if a revision is part of this case study.

Return type:

bool

Returns:

True, if the revision was found in one of the stages, False otherwise

has_revision_in_stage(revision, num_stage)

Checks if a revision is in a specific stage.

Return type:

bool

Returns:

True, if the revision was found in the specified stage, False otherwise

has_revision_configs_specified(revision)

Checks whether a revision specifies different configurations.

Parameters:

revision (CommitHash) – i.e., a commit hash registered in this case study

Return type:

bool

Returns: True, if configurations have been specified for this revision

get_config_ids_for_revision(revision)

Returns a list of all configuration IDs specified for this revision.

Parameters:

revision (CommitHash) – i.e., a commit hash registered in this case study

Return type:

List[int]

Returns: list of config IDs

get_config_ids_for_revision_in_stage(revision, num_stage)

Returns a list of all configuration IDs specified for this revision in the given stage.

Parameters:
  • revision (CommitHash) – i.e., a commit hash registered in this case study

  • num_stage (int) – number of the stage to search in

Return type:

List[int]

Returns: list of config IDs

shift_stage(from_index, offset)

Shift a stage in the case study’s stage list by an offset. Beware that shifts to the left (offset<0) will destroy stages.

Parameters:
  • from_index (int) – index of the first stage to shift

  • offset (int) – amount by which the stages should be shifted

Return type:

None

insert_empty_stage(pos)

Insert a new stage at the given index, shifting the list elements to the right. The newly created stage is returned.

Parameters:

pos (int) – index position to insert an empty stage

Return type:

CSStage
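The warning that left shifts destroy stages becomes clearer on a plain list. The following sketch of shift_stage and insert_empty_stage is an assumption about their behavior: it models each stage as a list of revision strings and is not the CaseStudy code itself.

```python
from typing import List

Stage = List[str]  # hypothetical: a stage is just a list of revisions


def shift_stage(stages: List[Stage], from_index: int, offset: int) -> None:
    """Shift stages[from_index:] by offset positions, in place."""
    if from_index + offset < 0:
        raise ValueError("cannot shift stages before the start of the list")
    tail = stages[from_index:]
    if offset >= 0:
        # Open a gap of `offset` empty stages in front of the shifted tail.
        stages[from_index:] = [[] for _ in range(offset)] + tail
    else:
        # Shifting left overwrites the stages directly in front of from_index.
        stages[from_index + offset:] = tail


def insert_empty_stage(stages: List[Stage], pos: int) -> Stage:
    # Inserting is a shift by one that leaves an empty stage at pos.
    shift_stage(stages, pos, 1)
    return stages[pos]


stages = [["a"], ["b"], ["c"]]
shift_stage(stages, 2, -1)  # ["b"] is overwritten by ["c"]
print(stages)  # [['a'], ['c']]
new_stage = insert_empty_stage(stages, 1)
print(stages)  # [['a'], [], ['c']]
```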

include_revision(revision, commit_id, stage_num=0, sort_revs=True)

Add a revision to this case study.

Parameters:
  • revision (FullCommitHash) – to add

  • commit_id (int) – unique ID for ordering of commits

  • stage_num (int) – index number of the stage to add the revision to

  • sort_revs (bool) – if True, the modified stage will be sorted afterwards

Return type:

None

include_revisions(revisions, stage_num=0, sort_revs=True)

Add multiple revisions to this case study.

Parameters:
  • revisions (List[Tuple[FullCommitHash, int]]) – List of tuples with (commit_hash, id) to be inserted

  • stage_num (int) – The stage to insert the revisions

  • sort_revs (bool) – True if the stage should be kept sorted

Return type:

None

name_stage(stage_num, name)

Names an already existing stage.

Parameters:
  • stage_num (int) – The number of the stage to name

  • name (str) – The new name of the stage

Return type:

None

get_revision_filter()

Generate a case-study-specific revision filter that only allows revisions that are part of the case study.

Return type:

Callable[[CommitHash], bool]

Returns:

a callable filter function

get_dict()

Get a dict representation of this case study.

Return type:

Dict[str, Union[str, int, List[Dict[str, Union[str, List[Dict[str, Union[str, int, List[int]]]]]]]]]

varats.paper.case_study.load_case_study_from_file(file_path)

Load a case study from a file.

Parameters:

file_path (Path) – path to the case study file

Return type:

CaseStudy

varats.paper.case_study.load_configuration_map_from_case_study_file(file_path, concrete_config_type)

Load a configuration map from a case-study file.

Parameters:
  • file_path (Path) – to the configuration map file

  • concrete_config_type (Type[Configuration]) – type of the configuration objects that should be created

Return type:

ConfigurationMap

Returns: a new ConfigurationMap based on the parsed file

varats.paper.case_study.store_case_study(case_study, case_study_location)

Store case study to file in the specified paper_config.

Parameters:
  • case_study (CaseStudy) – the case study to store

  • case_study_location (Path) – can be either a path to a paper_config or a direct path to a .case_study file

Return type:

None
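store_case_study accepts two kinds of locations. A hypothetical helper, resolve_case_study_file, that turns both into a target file could look like this; it assumes the <project>_<version>.case_study naming used in the paper_configs example and is not the tool suite's implementation.

```python
from pathlib import Path


def resolve_case_study_file(project_name: str, version: int, location: Path) -> Path:
    """Hypothetical helper mirroring the two accepted forms of case_study_location."""
    if location.suffix == ".case_study":
        # Direct path to a .case_study file: use it unchanged.
        return location
    # Otherwise treat the location as a paper config folder and derive
    # the file name from the assumed <project>_<version> convention.
    return location / f"{project_name}_{version}.case_study"


print(resolve_case_study_file("gzip", 1, Path("paper_configs/ase-17")))
print(resolve_case_study_file("gzip", 1, Path("backup/custom.case_study")))
```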


Module: artefacts

This module allows attaching artefact definitions to a paper config. This way, the artefacts, like plots or result tables, can be generated from result files automatically.

Typically, a paper config has a file artefacts.yaml that manages artefact definitions.

class varats.paper_mgmt.artefacts.ArtefactFileInfo(file_name, case_study=None)

Bases: object

Class containing metadata about a file generated by an artefact.

property file_name: str

The name of the generated file.

property case_study: CaseStudy | None

The used case study if available.

class varats.paper_mgmt.artefacts.Artefact(name, output_dir)

Bases: ABC

An Artefact contains all information that is necessary to generate a certain artefact. Subclasses of this class specify concrete artefact types, like plots, that require additional attributes.

Parameters:
  • name (str) – name of this artefact

  • output_dir (Path) – output dir relative to config value ‘artefacts/artefacts_dir’

ARTEFACT_TYPE = 'Artefact'

ARTEFACT_TYPE_VERSION = 0

ARTEFACT_TYPES: Dict[str, Type[Artefact]] = {'plot': <class 'varats.plot.plots.PlotArtefact'>, 'table': <class 'varats.table.tables.TableArtefact'>}

static base_output_dir()

Base output dir for artefacts.

Return type:

Path

property name: str

The name of this artefact.

This uniquely identifies an artefact in an Artefacts collection.

property output_dir: Path

Absolute path to the artefact’s output directory.

get_dict()

Construct a dict from this artefact for easy export to yaml.

Subclasses should first call this function on super() and then extend the returned dict with their own properties.

Return type:

Dict[str, Any]

Returns:

A dict representation of this artefact.

abstract static create_artefact(name, output_dir, **kwargs)

Instantiate an artefact from its dict representation.

Parameters:
  • name (str) – name of this artefact

  • output_dir (Path) – output dir relative to config value ‘artefacts/artefacts_dir’

  • **kwargs (Any) – artefact-specific arguments

Return type:

Artefact

Returns:

an instantiated artefact

abstract generate_artefact(progress=None)

Generate the specified artefact.

Return type:

None

abstract get_artefact_file_infos()

Retrieve information about files generated by this artefact.

Return type:

List[ArtefactFileInfo]

Returns:

a list of file info objects
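The Artefact base class combines a type registry (ARTEFACT_TYPES) with a factory method (create_artefact). The sketch below illustrates that pattern with hypothetical classes ArtefactSketch and PlotSketch and a hypothetical "artefact_type" dictionary key. Registering subclasses through __init_subclass__ is a swapped-in technique; in the tool suite, initialize_artefact_types() populates the registry by importing the plots and tables modules.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, Type


class ArtefactSketch(ABC):
    """Hypothetical, trimmed-down model of the Artefact base class."""

    ARTEFACT_TYPE = "Artefact"
    # Registry from type name to subclass, analogous to ARTEFACT_TYPES.
    ARTEFACT_TYPES: Dict[str, Type["ArtefactSketch"]] = {}

    def __init__(self, name: str, output_dir: Path) -> None:
        self.name = name
        self.output_dir = output_dir

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Register every subclass under its type name.
        ArtefactSketch.ARTEFACT_TYPES[cls.ARTEFACT_TYPE] = cls

    def get_dict(self) -> Dict[str, Any]:
        return {"artefact_type": self.ARTEFACT_TYPE, "name": self.name,
                "output_dir": str(self.output_dir)}

    @staticmethod
    @abstractmethod
    def create_artefact(name: str, output_dir: Path, **kwargs: Any) -> "ArtefactSketch":
        ...


class PlotSketch(ArtefactSketch):
    ARTEFACT_TYPE = "plot"

    def __init__(self, name: str, output_dir: Path, plot_type: str) -> None:
        super().__init__(name, output_dir)
        self.plot_type = plot_type

    def get_dict(self) -> Dict[str, Any]:
        # Extend the base dict, as the get_dict() docstring recommends.
        result = super().get_dict()
        result["plot_type"] = self.plot_type
        return result

    @staticmethod
    def create_artefact(name: str, output_dir: Path, **kwargs: Any) -> "PlotSketch":
        return PlotSketch(name, output_dir, **kwargs)


def artefact_from_dict(data: Dict[str, Any]) -> ArtefactSketch:
    """Look up the class by type name and let it build itself from the rest."""
    data = dict(data)
    cls = ArtefactSketch.ARTEFACT_TYPES[data.pop("artefact_type")]
    return cls.create_artefact(data.pop("name"), Path(data.pop("output_dir")), **data)


plot = PlotSketch("overview", Path("plots"), plot_type="my_plot")
restored = artefact_from_dict(plot.get_dict())
print(type(restored).__name__, restored.get_dict() == plot.get_dict())  # PlotSketch True
```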

class varats.paper_mgmt.artefacts.Artefacts(file_path, artefacts)

Bases: object

A collection of Artefacts.

property artefacts: Iterable[Artefact]

An iterator of the Artefacts in this collection.

get_artefact(name)

Look up an artefact by its name.

Parameters:

name (str) – the name of the artefact to retrieve

Return type:

Optional[Artefact]

Returns:

the artefact with the given name if available, else None

add_artefact(artefact)

Add an Artefact to this collection.

If there already exists an artefact with the same name it is overridden.

Parameters:

artefact (Artefact) – the artefact to add

Return type:

None

store()

Store artefacts in their artefacts file.

Return type:

None

get_dict()

Construct a dict from these artefacts for easy export to yaml.

Return type:

Dict[str, List[Dict[str, Union[str, int]]]]
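Because an artefact's name uniquely identifies it, an Artefacts collection behaves like a dictionary keyed by name. This hypothetical sketch, with NamedArtefact and ArtefactsSketch as illustrative stand-ins, shows the documented lookup and override rules.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class NamedArtefact:
    """Hypothetical stand-in for an Artefact; only the name matters here."""

    name: str
    kind: str


class ArtefactsSketch:
    """Hypothetical collection mirroring the lookup rules of Artefacts."""

    def __init__(self) -> None:
        # Keyed by name, since a name uniquely identifies an artefact.
        self._by_name: Dict[str, NamedArtefact] = {}

    @property
    def artefacts(self) -> Iterable[NamedArtefact]:
        return iter(self._by_name.values())

    def add_artefact(self, artefact: NamedArtefact) -> None:
        # An existing artefact with the same name is overridden.
        self._by_name[artefact.name] = artefact

    def get_artefact(self, name: str) -> Optional[NamedArtefact]:
        return self._by_name.get(name)


collection = ArtefactsSketch()
collection.add_artefact(NamedArtefact("overview", "plot"))
collection.add_artefact(NamedArtefact("overview", "table"))
print([a.kind for a in collection.artefacts])  # ['table']
print(collection.get_artefact("missing"))      # None
```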

varats.paper_mgmt.artefacts.load_artefacts(paper_config)

Load the artefacts for a paper config.

Parameters:

paper_config (PaperConfig) – the paper config to load the artefacts for

Return type:

Artefacts

Returns:

the artefacts object for the given paper config

varats.paper_mgmt.artefacts.load_artefacts_from_file(file_path)

Load an artefacts file.

Parameters:

file_path (Path) – path to the artefacts file

Return type:

Artefacts

Returns:

the artefacts created from the given file

varats.paper_mgmt.artefacts.initialize_artefact_types()

Import the plots and tables modules to register artefact types.

Return type:

None


Module: paper_config_manager

Module for interacting with and managing paper configs and case studies, e.g., this module provides functionality to visualize the status of case studies or to package a whole paper config into a zip folder.

varats.paper_mgmt.paper_config_manager.show_status_of_case_studies(experiment_type, filter_regex, short_status, sort, print_rev_list, sep_stages, print_legend)

Prints the status of all matching case studies to the console.

Parameters:
  • experiment_type (Type[VersionExperiment]) – experiment type whose files will be considered

  • filter_regex (str) – applied to a name_version string for filtering the case studies to be shown

  • short_status (bool) – print only a short version of the status information

  • sort (bool) – sort the output order of the case studies

  • print_rev_list (bool) – print a list of revisions for every case study

  • sep_stages (bool) – print each stage separated

  • print_legend (bool) – print a legend for the different types

Return type:

None
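Both show_status_of_case_studies and package_paper_config filter case studies by matching a regex against a name_version string. A minimal sketch, assuming re.match semantics (anchored at the start of the string) and a hypothetical helper filter_case_studies that receives (project, version) pairs:

```python
import re
from typing import List, Tuple


def filter_case_studies(case_studies: List[Tuple[str, int]], filter_regex: str) -> List[str]:
    """Hypothetical helper: keep case studies whose "<project>_<version>" matches."""
    pattern = re.compile(filter_regex)
    names = [f"{project}_{version}" for project, version in case_studies]
    # re.match anchors at the start of the string (an assumption of this sketch).
    return [name for name in names if pattern.match(name)]


studies = [("gzip", 0), ("gzip", 1), ("gzip", 2), ("libvpx", 0)]
print(filter_case_studies(studies, r"gzip_[12]"))  # ['gzip_1', 'gzip_2']
print(filter_case_studies(studies, r".*_0"))       # ['gzip_0', 'libvpx_0']
```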

varats.paper_mgmt.paper_config_manager.get_revision_list(case_study)

Returns a string with a list of revisions from the case study, grouped by case-study stages.

Parameters:

case_study (CaseStudy) – to print revisions for

Return type:

str

Returns:

formatted string that lists all revisions

varats.paper_mgmt.paper_config_manager.get_result_files(project_name, experiment_type, report_type, commit_hash, only_newest)

Returns a list of result files that (partially) match the given commit hash.

Parameters:
  • project_name (str) – target project

  • experiment_type (Type[VersionExperiment]) – the experiment type that created the result files

  • report_type (Optional[Type[BaseReport]]) – the report type of the result files; defaults to experiment’s main report

  • commit_hash (ShortCommitHash) – the commit hash to search result files for

  • only_newest (bool) – whether to include all result files, or only the newest; if False, result files for the same revision are sorted descending by the file’s mtime

Return type:

List[ReportFilepath]

Returns:

a list of matching result file paths; result files for the same revision are sorted descending by the file’s mtime
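The selection rules of get_result_files can be sketched independently of the report infrastructure. ResultFile and select_result_files are hypothetical stand-ins, and the prefix match on the commit hash is an assumption about what "(partially) match" means.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResultFile:
    """Hypothetical stand-in for a result file path plus its metadata."""

    name: str
    revision: str  # full commit hash the result belongs to
    mtime: float   # modification time, e.g., from os.stat().st_mtime


def select_result_files(files: List[ResultFile], commit_hash: str, only_newest: bool) -> List[ResultFile]:
    # Keep files whose full revision starts with the (short) commit hash.
    matching = [f for f in files if f.revision.startswith(commit_hash)]
    # Newest first, so files of the same revision are sorted descending by mtime.
    matching.sort(key=lambda f: f.mtime, reverse=True)
    if not only_newest:
        return matching
    newest: Dict[str, ResultFile] = {}
    for f in matching:
        # The first file seen per revision is its newest one.
        newest.setdefault(f.revision, f)
    return list(newest.values())


files = [
    ResultFile("gzip_old.yaml", "abc123ff", 1.0),
    ResultFile("gzip_new.yaml", "abc123ff", 5.0),
    ResultFile("gzip_other.yaml", "def456aa", 3.0),
]
print([f.name for f in select_result_files(files, "abc123", only_newest=False)])
# ['gzip_new.yaml', 'gzip_old.yaml']
```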

varats.paper_mgmt.paper_config_manager.get_occurrences(status_occurrences, use_color=False)

Returns a string with all status occurrences of a case study.

Parameters:
  • status_occurrences (DefaultDict[FileStatusExtension, Set[ShortCommitHash]]) – mapping from all occurred statuses to a set of revisions

  • use_color (bool) – add color escape sequences for highlighting

Return type:

str

Returns:

a string with all status occurrences of a case study

varats.paper_mgmt.paper_config_manager.get_total_status(total_status_occurrences, longest_cs_name, use_color=False)

Returns a status string showing the total amount of occurrences.

Parameters:
  • total_status_occurrences (DefaultDict[FileStatusExtension, Set[ShortCommitHash]]) – mapping from all occurred statuses to a set of all revisions (total amount of revisions)

  • longest_cs_name (int) – amount of chars that should be considered for offsetting to allow case study name alignment

  • use_color (bool) – add color escape sequences for highlighting

Return type:

str

Returns:

a string with all status occurrences of all case studies

varats.paper_mgmt.paper_config_manager.get_short_status(case_study, experiment_type, longest_cs_name, use_color=False, total_status_occurrences=None)

Return a short string representation that describes the current status of the case study.

Parameters:
  • case_study (CaseStudy) – to print

  • experiment_type (Type[VersionExperiment]) – experiment type to print files for

  • longest_cs_name (int) – amount of chars that should be considered for offsetting to allow case study name alignment

  • use_color (bool) – add color escape sequences for highlighting

  • total_status_occurrences (Optional[DefaultDict[FileStatusExtension, Set[ShortCommitHash]]]) – mapping from all occurred statuses to a set of all revisions (total amount of revisions)

Return type:

str

Returns:

a short string representation of a case study

varats.paper_mgmt.paper_config_manager.get_status(case_study, experiment_type, longest_cs_name, sep_stages, sort, use_color=False, total_status_occurrences=None)

Return a string representation that describes the current status of the case study.

Parameters:
  • case_study (CaseStudy) – to print the status for

  • experiment_type (Type[VersionExperiment]) – experiment type to print files for

  • longest_cs_name (int) – amount of chars that should be considered for offsetting to allow case study name alignment

  • sep_stages (bool) – print each stage separated

  • sort (bool) – sort the output order of the case studies

  • use_color (bool) – add color escape sequences for highlighting

  • total_status_occurrences (Optional[DefaultDict[FileStatusExtension, Set[ShortCommitHash]]]) – mapping from all occurred status to a set of all revisions (total amount of revisions)

Return type:

str

Returns:

a full string representation of all case studies

varats.paper_mgmt.paper_config_manager.get_legend(use_color=False)

Builds up a complete legend that explains all status numbers and their colors.

Parameters:

use_color (bool) – add color escape sequences for highlighting

Return type:

str

Returns:

a legend to explain the different statuses

varats.paper_mgmt.paper_config_manager.package_paper_config(output_file, cs_filter_regex, experiment_types)

Package all files from a paper config into a zip folder.

Parameters:
  • output_file (Path) – file to write to

  • cs_filter_regex (Pattern[str]) – applied to a name_version string for filtering the case studies to be included in the zip archive

  • experiment_types (List[Type[VersionExperiment]]) – list of experiment types whose result files should be added

Return type:

None
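A minimal sketch of the packaging step, restricted to case study files and built on Python's zipfile module. The helper package_case_studies is hypothetical; the real package_paper_config also adds result files for the given experiment types, which this sketch omits.

```python
import re
import tempfile
import zipfile
from pathlib import Path


def package_case_studies(paper_config_dir: Path, output_file: Path, cs_filter_regex: "re.Pattern[str]") -> None:
    """Hypothetical sketch: zip every .case_study file whose stem matches the regex."""
    with zipfile.ZipFile(output_file, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for cs_file in sorted(paper_config_dir.glob("*.case_study")):
            # The stem of "gzip_0.case_study" is the name_version string "gzip_0".
            if cs_filter_regex.match(cs_file.stem):
                archive.write(cs_file, arcname=cs_file.name)


# Usage: build a throwaway paper config and package only the gzip case studies.
with tempfile.TemporaryDirectory() as tmp:
    pc_dir = Path(tmp) / "ase-17"
    pc_dir.mkdir()
    for stem in ("gzip_0", "gzip_1", "git_0"):
        (pc_dir / f"{stem}.case_study").write_text("placeholder\n")
    archive_path = Path(tmp) / "ase-17.zip"
    package_case_studies(pc_dir, archive_path, re.compile(r"gzip_\d+"))
    with zipfile.ZipFile(archive_path) as zf:
        packaged = sorted(zf.namelist())

print(packaged)  # ['gzip_0.case_study', 'gzip_1.case_study']
```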