Paper and case study handling
Whenever we design experiments that look at specific revisions of a project, we run into the problem that, to re-evaluate our experiment data, we need to preserve the set of revisions. For exactly this purpose, the tool suite provides case studies, which preserve the information about which revisions of a project were analyzed. In addition, to also preserve the set of projects that were analyzed, we designed paper configs as a collection of different case studies. Furthermore, paper configs and case studies can be used not only to re-evaluate our own experiments but also to allow others to reproduce our data or to design their own experiments based on our project and revision selection.
How to use case studies
If we want to analyze a particular set of revisions or re-evaluate the same revision over and over again, we can pin down the revisions we are interested in by creating a CaseStudy.
First, create a folder where your config should be saved.
Then, create a case study that specifies the revisions to be analyzed.
To ease the creation of case studies, the tool suite offers different selection strategies to choose revisions from the project's history, e.g., based on a probability distribution.
For example, to get the latest revision of a project use:
vara-cs gen -p PROJECT_NAME select_latest
As another example, we can generate a new case study for the project gzip, drawing 10 revisions from the project's history based on a half-normal distribution, with:
vara-cs gen -p gzip select_sample HalfNormalSamplingMethod --num-rev 10
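The idea behind a half-normal selection can be sketched in a few lines of Python. This is a minimal sketch, not the tool suite's HalfNormalSamplingMethod: the function name sample_half_normal, the newest-first ordering of the history, and the chosen scale are assumptions made for illustration.

```python
import random
from typing import List, Sequence


def sample_half_normal(
    history: Sequence[str], num_revs: int, seed: int = 0
) -> List[str]:
    """Draw num_revs distinct revisions, preferring the start of history.

    Hypothetical sketch: assumes `history` is ordered newest first, so a
    half-normal distribution over the index favors recent commits.
    """
    if num_revs > len(history):
        raise ValueError("cannot draw more revisions than the history holds")
    rng = random.Random(seed)
    # Assumed scale: roughly 99.7% of samples fall inside the history.
    sigma = max(len(history) / 3, 1.0)
    chosen: List[str] = []
    while len(chosen) < num_revs:
        # The absolute value of a zero-mean normal sample is half-normal.
        idx = int(abs(rng.gauss(0.0, sigma)))
        if idx < len(history) and history[idx] not in chosen:
            chosen.append(history[idx])
    return chosen


history = [f"rev{i}" for i in range(100)]
picked = sample_half_normal(history, 10)
```

Drawing indices instead of dates keeps the sketch independent of commit timestamps; a real sampling method may weight revisions differently.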
Multiple case studies, e.g., a set of case studies used for a paper, can be grouped into a paper config so that they can be managed together more easily. For more information see How to use paper configs.
Extending an existing case study is easy: just select more revisions and they will be added automatically. Should you wish to drop the old revisions, pass --override; this removes the old ones and afterwards adds the newly selected revisions.
Warning
The specified distribution only relates to the newly added revisions and does not include previously added revisions. If one wants to draw all revisions according to the same distribution, the old case study needs to be overwritten.
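The warning can be illustrated with a small model of how revisions accumulate. The helper add_revisions is hypothetical and does not mirror vara-cs internals; it only assumes that extending keeps the old revisions (skipping duplicates) while override starts from an empty set.

```python
from typing import List, Sequence


def add_revisions(
    existing: Sequence[str], selected: Sequence[str], override: bool = False
) -> List[str]:
    """Hypothetical model of extending a case study.

    Without override, old revisions are kept and newly selected ones are
    appended (duplicates skipped); with override, old ones are dropped first.
    """
    result: List[str] = [] if override else list(existing)
    for rev in selected:
        if rev not in result:
            result.append(rev)
    return result


old = ["a", "b"]  # drawn earlier with some distribution
new = ["b", "c"]  # drawn now with a different distribution
print(add_revisions(old, new))                 # ['a', 'b', 'c']
print(add_revisions(old, new, override=True))  # ['b', 'c']
```

Without override, the result still contains "a", which stems from the earlier selection, so the combined set follows neither distribution on its own.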
How to use paper configs
Paper configs are used to group different case studies together. Take, for example, the case where one wants to analyze the projects gzip and git for the evaluation of a paper that gets submitted to ase-17. First, one creates case studies for each project, selecting the revisions that should be analyzed. Second, all case studies related to the evaluation for ase-17 are grouped into a folder – the paper config – to relate them to the paper. Now we can design and run our experiment for ase-17 on all revisions added through case studies in the paper config and generate our experiment results.
The paper config now allows us to reproduce all the results for our paper with a single call to the tool suite. Furthermore, this is also helpful for other researchers, who are then able to reproduce our results.
In more detail, our specified paper config allows the tool suite to tell BenchBuild which revisions should be analyzed to evaluate a set of case studies. For example, a setup could look like this:
paper_configs
├── ase-17
│   ├── gzip_0.case_study
│   ├── gzip_1.case_study
│   └── git_0.case_study
└── icse-18
    ├── gzip_0.case_study
    └── git_0.case_study
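The folder layout above is simple enough to inspect with pathlib. The following sketch uses a hypothetical scan_paper_configs function and assumes the <project>_<version>.case_study naming seen in the example; it lists which case studies each paper config contains.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple


def scan_paper_configs(root: Path) -> Dict[str, List[Tuple[str, int]]]:
    """Map each paper config folder to its (project, version) pairs.

    Assumes case study files are named <project>_<version>.case_study,
    as in the example tree; this parsing is for illustration only.
    """
    configs: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for cs_file in sorted(root.glob("*/*.case_study")):
        # rpartition keeps project names that themselves contain "_".
        project, _, version = cs_file.stem.rpartition("_")
        configs[cs_file.parent.name].append((project, int(version)))
    return dict(configs)
```

Applied to the example tree, this would report two case studies for gzip in ase-17, which shows that one project can have several case studies in the same paper config.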
In this example, we have two paper configs, one for ase-17 and another for icse-18. We see different case studies for gzip and git; notice that we can create multiple case studies for one project.
If we now want to evaluate our set for icse-18, we set the paper-config folder to the root of our config tree and select the icse-18 folder as our current config in the settings file .vara.yaml, like this:
paper_config:
  current_config:
    value: icse-18
  folder:
    value: /home/foo/vara/paper_configs/
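Conceptually, the two settings combine into the folder of the active paper config. A minimal sketch, assuming the YAML above has already been parsed into nested dictionaries; the helper current_paper_config_path is hypothetical and not part of the tool suite.

```python
from pathlib import Path
from typing import Any, Dict


def current_paper_config_path(settings: Dict[str, Any]) -> Path:
    """Join the paper config root folder with the selected config name.

    Hypothetical helper: `settings` mirrors the parsed .vara.yaml above,
    where each option stores its content under a `value` key.
    """
    section = settings["paper_config"]
    return Path(section["folder"]["value"]) / section["current_config"]["value"]


settings = {
    "paper_config": {
        "current_config": {"value": "icse-18"},
        "folder": {"value": "/home/foo/vara/paper_configs/"},
    }
}
print(current_paper_config_path(settings))  # /home/foo/vara/paper_configs/icse-18 on POSIX
```

Switching to another paper config then only requires changing current_config, while folder stays the same.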
Next, we can run our experiment with BenchBuild as usual. During experiment execution, BenchBuild will load our config and only evaluate the needed revisions.
The current status of all case studies belonging to the current paper config can be visualized with vara-cs status:
>>> vara-cs status -s $EXPERIMENT_NAME
CS: gzip_0: (0/5) processed
CS: gzip_1: (2/5) processed
CS: gzip_2: (5/5) processed
CS: libvpx_0: (0/5) processed
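Each status line can be read as the number of case study revisions that already have results, out of all revisions in the case study. A hypothetical sketch of that computation, assuming revisions and finished results are plain sets of commit-hash strings (short_status is an illustrative name, not the tool suite's implementation):

```python
from typing import Set


def short_status(cs_name: str, revisions: Set[str], finished: Set[str]) -> str:
    """Hypothetical sketch of one status line.

    Only results for revisions that belong to the case study count, so
    results for unrelated revisions do not inflate the processed number.
    """
    done = len(revisions & finished)
    return f"CS: {cs_name}: ({done}/{len(revisions)}) processed"


revs = {"r1", "r2", "r3", "r4", "r5"}
print(short_status("gzip_1", revs, {"r1", "r2", "r9"}))  # CS: gzip_1: (2/5) processed
```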
The tool vara-pc provides a simple command line interface for creating and managing paper configs.
Artefacts
The artefacts module provides an easy way to attach descriptions of artefacts, like plots or result tables, to a paper config. This way, reproducing the exact same plots for a paper config over and over again becomes as easy as invoking a single command.
For more information about how to create and manage artefacts, refer to the documentation of the vara-art tool.
Paper and case study modules
Module: paper_config
The PaperConfig pins down a specific set of case studies, one or more for each project, where each encapsulates a fixed set of revisions to evaluate.
This allows users to specify which revisions of what project have to be analyzed. Furthermore, it allows other users to reproduce the exact same set of projects and revisions, either with the old experiment automatically or with a new experiment to compare the results.
- class varats.paper.paper_config.PaperConfig(folder_path)
Bases: object
Paper config, a specific set of case studies, e.g., for a publication.
The paper config allows easy reevaluation of a set of case studies.
- Parameters: folder_path (Path) – path to the paper config folder
- property path: Path
Path to the paper config folder.
- get_case_studies(cs_name)
Look up all case studies with a given name.
- Parameters: cs_name (str) – name of the case study
- Return type: List[CaseStudy]
- Returns: case studies with project name cs_name.
- get_all_case_studies()
Generate a list of all case studies in the paper config.
- Return type: List[CaseStudy]
- Returns: full list of all case studies with all different versions.
- has_case_study(cs_name)
Checks if a case study with cs_name was loaded.
- Parameters: cs_name (str) – name of the case study
- Return type: bool
- Returns: True, if a case study with cs_name was loaded
- get_filter_for_case_study(cs_name)
Return a case study specific revision filter. If one case study includes a revision, the filter function will return True. This can be used to automatically filter out revisions that are not part of a case study loaded by this paper config.
- Parameters: cs_name (str) – name of the case study
- Return type: Callable[[str], bool]
- Returns: a filter function that checks if a given revision is part of a case study with name cs_name and returns True if it was
- varats.paper.paper_config.project_filter_generator(project_name)
Generate project specific revision filters. If no paper config is loaded, all revisions are allowed; otherwise, the paper config generates a specific revision filter.
- Parameters: project_name (str) – corresponding project name
- Return type: Callable[[str], bool]
- Returns: a filter function that returns True if a revision of the specified project is included in one of the related case studies.
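The behavior of project_filter_generator described above amounts to choosing between an allow-all filter and a membership test. A minimal sketch, in which a plain dictionary of revision sets stands in for the loaded paper config; make_project_filter is a hypothetical name, and rejecting every revision of a project without case studies is an assumption.

```python
from typing import Callable, Dict, Optional, Set

RevisionFilter = Callable[[str], bool]


def make_project_filter(
    project_name: str, loaded_config: Optional[Dict[str, Set[str]]]
) -> RevisionFilter:
    """Hypothetical model of project_filter_generator.

    `loaded_config` stands in for the active paper config and maps a
    project name to the revisions of all its case studies; None means
    that no paper config is loaded.
    """
    if loaded_config is None:
        # Without a paper config, every revision is allowed.
        return lambda revision: True
    # Assumption: projects without case studies get an empty allow-list.
    allowed = loaded_config.get(project_name, set())
    return lambda revision: revision in allowed
```

Returning a closure lets callers apply the same filter to many candidate revisions without consulting the paper config each time.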
- varats.paper.paper_config.get_loaded_paper_config()
Returns the currently active paper config; this requires a config to be loaded before use.
- Return type: PaperConfig
- Returns: currently active paper config
- varats.paper.paper_config.is_paper_config_loaded()
Check if a paper config is currently loaded.
- Return type: bool
- Returns: True, if a paper config has been loaded
- varats.paper.paper_config.load_paper_config(config_path=None)
Loads a paper config from a yaml file, initializes the paper config and sets it to the currently active paper config. If no config path is provided, the paper config set in the VaRA settings yaml is loaded.
Note
Only one paper config can be active at a time.
- Parameters: config_path (Optional[Path]) – path to a paper config folder
- Return type: None
- varats.paper.paper_config.get_paper_config()
Returns the current paper config and loads one if there is currently no active paper config.
- Return type: PaperConfig
- Returns: currently active paper config
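The four functions above describe a single, module-level active paper config with a strict accessor and a lazy one. The sketch below models that pattern with hypothetical names (ActiveConfig, load_config, get_loaded_config, get_config) and an assumed default folder; raising RuntimeError when nothing is loaded is also an assumption, not documented behavior.

```python
from pathlib import Path
from typing import Optional


class ActiveConfig:
    """Placeholder for a loaded paper config (hypothetical)."""

    def __init__(self, folder: Path) -> None:
        self.folder = folder


_ACTIVE: Optional[ActiveConfig] = None
# Assumed stand-in for the paper config selected in the settings file.
DEFAULT_FOLDER = Path("/home/foo/vara/paper_configs/icse-18")


def load_config(config_path: Optional[Path] = None) -> None:
    # Loading replaces any previous config: only one is active at a time.
    global _ACTIVE
    _ACTIVE = ActiveConfig(config_path or DEFAULT_FOLDER)


def is_config_loaded() -> bool:
    return _ACTIVE is not None


def get_loaded_config() -> ActiveConfig:
    # Strict accessor: fails if nothing has been loaded yet.
    if _ACTIVE is None:
        raise RuntimeError("no paper config loaded")
    return _ACTIVE


def get_config() -> ActiveConfig:
    # Lazy accessor: loads the default config on first use.
    if _ACTIVE is None:
        load_config()
    return get_loaded_config()
```

The strict accessor suits code that must not silently pick a config, while the lazy one is convenient for command line tools.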
Module: case_study
A case study is used to pin down the exact set of revisions that should be analyzed for a project.
- class varats.paper.case_study.CSEntry(commit_hash, commit_id, config_ids=None)
Bases: object
Combines a commit hash with a unique and ordered ID, starting with 0 for the first commit in the repository.
- property commit_hash: FullCommitHash
A commit hash from the git repository.
- property commit_id: int
The order ID of the commit hash.
- property config_ids: List[int]
The IDs of the configurations associated with this commit.
- class varats.paper.case_study.CSStage(name=None, sampling_method=None, release_type=None, revisions=None)
Bases: object
A stage in a case study, i.e., a collection of revisions.
Stages are used to separate revisions into groups.
- property revisions: List[FullCommitHash]
Project revisions that are part of this stage.
- property name: str | None
Name of the stage.
- property sampling_method: SamplingMethodBase[Any] | None
The sampling method used for this stage.
- property release_type: ReleaseType | None
The release type used for this stage.
- has_revision(revision)
Check if a revision is part of this stage.
- Parameters: revision (CommitHash) – project revision to check
- Return type: bool
- Returns: True, in case the revision is part of the stage, False otherwise.
- add_revision(revision, commit_id, config_ids=None)
Add a new revision to this stage.
- Parameters:
  revision (FullCommitHash) – revision to add
  commit_id (int) – unique ID for ordering of commits
  config_ids (Optional[List[int]]) – list of configuration IDs
- Return type: None
- get_config_ids_for_revision(revision)
Returns a list of all configuration IDs specified for this revision.
- Parameters: revision (CommitHash) – i.e., a commit hash registered in this CSStage
- Return type: List[int]
- Returns: list of config IDs
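To make the relationship between CSEntry and CSStage concrete, here is a simplified model using dataclasses. Entry and Stage are hypothetical stand-ins that use plain strings for commit hashes; skipping duplicate revisions and returning an empty list for unknown revisions are assumptions of this sketch, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for CSEntry: a commit plus its order ID."""
    commit_hash: str
    commit_id: int
    config_ids: List[int] = field(default_factory=list)


@dataclass
class Stage:
    """Hypothetical stand-in for CSStage: a named group of entries."""
    name: Optional[str] = None
    entries: List[Entry] = field(default_factory=list)

    @property
    def revisions(self) -> List[str]:
        return [entry.commit_hash for entry in self.entries]

    def has_revision(self, revision: str) -> bool:
        return revision in self.revisions

    def add_revision(self, revision: str, commit_id: int,
                     config_ids: Optional[List[int]] = None) -> None:
        # Assumption: a revision is stored at most once per stage.
        if not self.has_revision(revision):
            self.entries.append(Entry(revision, commit_id, list(config_ids or [])))

    def get_config_ids_for_revision(self, revision: str) -> List[int]:
        # Unknown revisions yield an empty list in this sketch.
        for entry in self.entries:
            if entry.commit_hash == revision:
                return list(entry.config_ids)
        return []
```

Keeping the commit_id next to each hash allows a stage to be sorted in repository order without querying git again.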
- class varats.paper.case_study.CaseStudy(project_name, version, stages=None)
Bases: object
A case study persists a set of revisions of a project to allow easy reevaluation.
- Stored values: name of the related benchbuild.project; a set of revisions
- property project_name: str
Name of the related project.
Note: this name must match the name of the BenchBuild project.
- property project_cls: Type[Project]
Look up the BenchBuild project for this case study.
- Returns: project class
- property version: int
Version ID for this case study.
The version differentiates case studies of the same project.
- property revisions: List[FullCommitHash]
Project revisions that are part of this case study.
- property num_stages: int
Get the number of stages.
- get_stage_by_name(stage_name)
Get a stage by its name. Since multiple stages can have the same name, the first matching stage is returned.
- Parameters: stage_name (str) – name of the stage to look up
- Return type: Optional[CSStage]
- Returns: the stage corresponding to ‘stage_name’, or None
- get_stage_index_by_name(stage_name)
Get a stage’s index by its name. Since multiple stages can have the same name, the index of the first matching stage is returned.
- Parameters: stage_name (str) – name of the stage to look up
- Return type: Optional[int]
- Returns: the stage index corresponding to ‘stage_name’, or None
- has_revision(revision)
Check if a revision is part of this case study.
- Return type: bool
- Returns: True, if the revision was found in one of the stages, False otherwise
- has_revision_in_stage(revision, num_stage)
Checks if a revision is in a specific stage.
- Return type: bool
- Returns: True, if the revision was found in the specified stage, False otherwise
- has_revision_configs_specified(revision)
Checks whether a revision specifies different configurations.
- Parameters: revision (CommitHash) – i.e., a commit hash registered in this case study
- Return type: bool
- Returns: True, if configurations have been specified for this revision
- get_config_ids_for_revision(revision)
Returns a list of all configuration IDs specified for this revision.
- Parameters: revision (CommitHash) – i.e., a commit hash registered in this case study
- Return type: List[int]
- Returns: list of config IDs
- get_config_ids_for_revision_in_stage(revision, num_stage)
Returns a list of all configuration IDs specified for this revision in the given stage.
- Parameters:
  revision (CommitHash) – i.e., a commit hash registered in this case study
  num_stage (int) – number of the stage to search in
- Return type: List[int]
- Returns: list of config IDs
- shift_stage(from_index, offset)
Shift a stage in the case study’s stage list by an offset. Beware that shifts to the left (offset < 0) will destroy stages.
- Parameters:
  from_index (int) – index of the first stage to shift
  offset (int) – number of positions by which the stages should be shifted
- Return type: None
- insert_empty_stage(pos)
Insert a new stage at the given index, shifting the list elements to the right. The newly created stage is returned.
- Parameters: pos (int) – index position to insert an empty stage
- Return type: CSStage
- include_revision(revision, commit_id, stage_num=0, sort_revs=True)
Add a revision to this case study.
- Parameters:
  revision (FullCommitHash) – revision to add
  commit_id (int) – unique ID for ordering of commits
  stage_num (int) – index number of the stage to add the revision to
  sort_revs (bool) – if True, the modified stage will be sorted afterwards
- Return type: None
- include_revisions(revisions, stage_num=0, sort_revs=True)
Add multiple revisions to this case study.
- Parameters:
  revisions (List[Tuple[FullCommitHash, int]]) – list of tuples with (commit_hash, id) to be inserted
  stage_num (int) – the stage to insert the revisions into
  sort_revs (bool) – True if the stage should be kept sorted
- Return type: None
- name_stage(stage_num, name)
Names an already existing stage.
- Parameters:
  stage_num (int) – the number of the stage to name
  name (str) – the new name of the stage
- Return type: None
- get_revision_filter()
Generate a case study specific revision filter that only allows revisions that are part of the case study.
- Return type: Callable[[CommitHash], bool]
- Returns: a callable filter function
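The stage lookup rules and the case study revision filter can be modeled over a plain list of stages. In this hypothetical sketch, each stage is a (name, revisions) pair and the functions only mirror the documented semantics: duplicate names resolve to the first match, and the filter accepts revisions from any stage.

```python
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical representation: each stage is a (name, revisions) pair.
StageSpec = Tuple[Optional[str], Set[str]]


def get_stage_index_by_name(stages: List[StageSpec], stage_name: str) -> Optional[int]:
    # Names are not unique, so the first matching stage wins.
    for index, (name, _) in enumerate(stages):
        if name == stage_name:
            return index
    return None


def has_revision(stages: List[StageSpec], revision: str) -> bool:
    # A revision belongs to the case study if any stage contains it.
    return any(revision in revs for _, revs in stages)


def get_revision_filter(stages: List[StageSpec]) -> Callable[[str], bool]:
    # The filter accepts exactly the revisions contained in some stage.
    return lambda revision: has_revision(stages, revision)
```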
- varats.paper.case_study.load_case_study_from_file(file_path)
Load a case study from a file.
- Parameters: file_path (Path) – path to the case study file
- Return type: CaseStudy
- varats.paper.case_study.load_configuration_map_from_case_study_file(file_path, concrete_config_type)
Load a configuration map from a case study file.
- Parameters:
  file_path (Path) – path to the configuration map file
  concrete_config_type (Type[Configuration]) – type of the configuration objects that should be created
- Return type: ConfigurationMap
- Returns: a new ConfigurationMap based on the parsed file
- varats.paper.case_study.store_case_study(case_study, case_study_location)
Store a case study to a file in the specified paper config.
- Parameters:
  case_study (CaseStudy) – the case study to store
  case_study_location (Path) – either a path to a paper config or a direct path to a .case_study file
- Return type: None
Module: artefacts
This module allows attaching artefact definitions to a paper config. This way, the artefacts, like plots or result tables, can be generated from result files automatically.
Typically, a paper config has a file artefacts.yaml that manages artefact definitions.
- class varats.paper_mgmt.artefacts.ArtefactFileInfo(file_name, case_study=None)
Bases: object
Class containing metadata about a file generated by an artefact.
- property file_name: str
The name of the generated file.
- class varats.paper_mgmt.artefacts.Artefact(name, output_dir)
Bases: ABC
An Artefact contains all information that is necessary to generate a certain artefact. Subclasses of this class specify concrete artefact types, like plots, that require additional attributes.
- Parameters:
  name (str) – name of this artefact
  output_dir (Path) – output dir relative to config value ‘artefacts/artefacts_dir’
- ARTEFACT_TYPE = 'Artefact'
- ARTEFACT_TYPE_VERSION = 0
- ARTEFACT_TYPES: Dict[str, Type[Artefact]] = {'plot': <class 'varats.plot.plots.PlotArtefact'>, 'table': <class 'varats.table.tables.TableArtefact'>}
- property name: str
The name of this artefact.
This uniquely identifies an artefact in an Artefacts collection.
- property output_dir: Path
Absolute path to the artefact’s output directory.
- get_dict()
Construct a dict from this artefact for easy export to yaml.
Subclasses should first call this function on super() and then extend the returned dict with their own properties.
- Return type: Dict[str, Any]
- Returns: a dict representation of this artefact.
- abstract static create_artefact(name, output_dir, **kwargs)
Instantiate an artefact from its dict representation.
- Parameters:
  name (str) – name of this artefact
  output_dir (Path) – output dir relative to config value ‘artefacts/artefacts_dir’
  **kwargs (Any) – artefact-specific arguments
- Return type: Artefact
- Returns: an instantiated artefact
- abstract generate_artefact(progress=None)
Generate the specified artefact.
- Return type: None
- abstract get_artefact_file_infos()
Retrieve information about files generated by this artefact.
- Return type: List[ArtefactFileInfo]
- Returns: a list of file info objects
- class varats.paper_mgmt.artefacts.Artefacts(file_path, artefacts)
Bases: object
A collection of Artefact objects.
- get_artefact(name)
Look up an artefact by its name.
- Parameters: name (str) – the name of the artefact to retrieve
- Return type: Optional[Artefact]
- Returns: the artefact named name if available, else None
- varats.paper_mgmt.artefacts.load_artefacts(paper_config)
Load the artefacts for a paper config.
- Parameters: paper_config (PaperConfig) – the paper config to load the artefacts for
- Return type: Artefacts
- Returns: the artefacts object for the given paper config
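The ARTEFACT_TYPES mapping and the get_dict contract suggest a registry pattern: each entry in artefacts.yaml names a type, which selects the class used to rebuild the artefact. The following sketch illustrates that pattern with hypothetical classes and dictionary keys (artefact_type, plot_type); it does not reproduce the tool suite's actual file format or the create_artefact signature.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Type


class Artefact(ABC):
    """Minimal stand-in for an artefact base class (hypothetical)."""

    ARTEFACT_TYPE = "Artefact"

    def __init__(self, name: str) -> None:
        self.name = name

    def get_dict(self) -> Dict[str, Any]:
        return {"artefact_type": self.ARTEFACT_TYPE, "name": self.name}

    @abstractmethod
    def generate_artefact(self) -> None:
        ...


class PlotArtefact(Artefact):
    ARTEFACT_TYPE = "plot"

    def __init__(self, name: str, plot_type: str) -> None:
        super().__init__(name)
        self.plot_type = plot_type

    def get_dict(self) -> Dict[str, Any]:
        # Extend the base dict, as subclasses are expected to do.
        artefact_dict = super().get_dict()
        artefact_dict["plot_type"] = self.plot_type
        return artefact_dict

    def generate_artefact(self) -> None:
        print(f"would generate {self.plot_type} plot '{self.name}'")


# Registry from type names (as stored in the yaml file) to classes.
ARTEFACT_TYPES: Dict[str, Type[Artefact]] = {"plot": PlotArtefact}


def artefact_from_dict(data: Dict[str, Any]) -> Artefact:
    fields = dict(data)
    cls = ARTEFACT_TYPES[fields.pop("artefact_type")]
    # Remaining keys become constructor arguments of the selected class.
    return cls(**fields)


def get_artefact(artefacts: List[Artefact], name: str) -> Optional[Artefact]:
    return next((a for a in artefacts if a.name == name), None)
```

A round trip through get_dict and artefact_from_dict reproduces an equivalent artefact, which is what allows the same plots to be regenerated from the stored definitions.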
Module: paper_config_manager
Module for interacting with and managing paper configs and case studies; e.g., this module provides functionality to visualize the status of case studies or to package a whole paper config into a zip archive.
- varats.paper_mgmt.paper_config_manager.show_status_of_case_studies(experiment_type, filter_regex, short_status, sort, print_rev_list, sep_stages, print_legend)
Prints the status of all matching case studies to the console.
- Parameters:
  experiment_type (Type[VersionExperiment]) – experiment type whose files will be considered
  filter_regex (str) – applied to a name_version string for filtering the case studies to be shown
  short_status (bool) – print only a short version of the status information
  sort (bool) – sort the output order of the case studies
  print_rev_list (bool) – print a list of revisions for every case study
  sep_stages (bool) – print each stage separately
  print_legend (bool) – print a legend for the different types
- Return type: None
- varats.paper_mgmt.paper_config_manager.get_revision_list(case_study)
Returns a string with a list of revisions from the case study, grouped by case study stages.
- Parameters: case_study (CaseStudy) – case study to print revisions for
- Return type: str
- Returns: formatted string that lists all revisions
- varats.paper_mgmt.paper_config_manager.get_result_files(project_name, experiment_type, report_type, commit_hash, only_newest)
Returns a list of result files that (partially) match the given commit hash.
- Parameters:
  project_name (str) – target project
  experiment_type (Type[VersionExperiment]) – the experiment type that created the result files
  report_type (Optional[Type[BaseReport]]) – the report type of the result files; defaults to the experiment’s main report
  commit_hash (ShortCommitHash) – the commit hash to search result files for
  only_newest (bool) – whether to include all result files, or only the newest; if False, result files for the same revision are sorted descending by the file’s mtime
- Return type: List[ReportFilepath]
- Returns: a list of matching result file paths; result files for the same revision are sorted descending by the file’s mtime
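The matching and ordering rules of get_result_files can be sketched with pathlib and file modification times. This is a hypothetical model: it assumes result files are named <revision>_<run>.json in a single folder, which is not the tool suite's real naming scheme, and it interprets only_newest as keeping the newest file per revision.

```python
import os
from pathlib import Path
from typing import Dict, List


def result_files_for(folder: Path, commit_hash: str, only_newest: bool) -> List[Path]:
    """Hypothetical sketch of partial hash matching with mtime ordering."""
    # Assumed naming scheme: "<revision>_<run id>.json"; a prefix match
    # on the revision emulates matching a (short) commit hash.
    matches = [p for p in folder.glob("*.json") if p.name.startswith(commit_hash)]
    # Newest first, mirroring the descending mtime order described above.
    matches.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    if not only_newest:
        return matches
    newest: Dict[str, Path] = {}
    for path in matches:
        revision = path.stem.split("_")[0]
        # The first file seen per revision is the newest one.
        newest.setdefault(revision, path)
    return list(newest.values())
```

Because a short hash can match several full revisions, the newest-only variant keeps one file per matched revision rather than a single file overall.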
- varats.paper_mgmt.paper_config_manager.get_occurrences(status_occurrences, use_color=False)
Returns a string with all status occurrences of a case study.
- Parameters:
  status_occurrences (DefaultDict[FileStatusExtension, Set[ShortCommitHash]]) – mapping from all occurred statuses to a set of revisions
  use_color (bool) – add color escape sequences for highlighting
- Return type: str
- Returns: a string with all status occurrences of a case study
- varats.paper_mgmt.paper_config_manager.get_total_status(total_status_occurrences, longest_cs_name, use_color=False)
Returns a status string showing the total amount of occurrences.
- Parameters:
  total_status_occurrences (DefaultDict[FileStatusExtension, Set[ShortCommitHash]]) – mapping from all occurred statuses to a set of all revisions (total amount of revisions)
  longest_cs_name (int) – amount of chars that should be considered for offsetting to allow case study name alignment
  use_color (bool) – add color escape sequences for highlighting
- Return type: str
- Returns: a string with all status occurrences of all case studies
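Since status occurrences map each status to a set of revisions, totals across case studies can be accumulated by set union. This sketch assumes that union semantics (so a revision shared by two case studies is counted once) and uses a hypothetical FileStatus enum in place of FileStatusExtension, whose real members may differ.

```python
from collections import defaultdict
from enum import Enum
from typing import DefaultDict, Set


class FileStatus(Enum):
    # Hypothetical subset of statuses; FileStatusExtension may differ.
    SUCCESS = "success"
    FAILED = "failed"
    MISSING = "missing"


Occurrences = DefaultDict[FileStatus, Set[str]]


def merge_occurrences(total: Occurrences, case_study: Occurrences) -> None:
    # Union the revision sets so each revision counts once per status.
    for status, revisions in case_study.items():
        total[status] |= revisions


def format_total(total: Occurrences) -> str:
    # Statuses without any revision are reported with a count of 0.
    return ", ".join(f"{status.value}: {len(total[status])}" for status in FileStatus)


total: Occurrences = defaultdict(set)
merge_occurrences(total, defaultdict(set, {FileStatus.SUCCESS: {"a", "b"}, FileStatus.FAILED: {"c"}}))
merge_occurrences(total, defaultdict(set, {FileStatus.SUCCESS: {"b", "d"}}))
print(format_total(total))  # success: 3, failed: 1, missing: 0
```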
- varats.paper_mgmt.paper_config_manager.get_short_status(case_study, experiment_type, longest_cs_name, use_color=False, total_status_occurrences=None)
Return a short string representation that describes the current status of the case study.
- Parameters:
  case_study (CaseStudy) – case study to print
  experiment_type (Type[VersionExperiment]) – experiment type to print files for
  longest_cs_name (int) – amount of chars that should be considered for offsetting to allow case study name alignment
  use_color (bool) – add color escape sequences for highlighting
  total_status_occurrences (Optional[DefaultDict[FileStatusExtension, Set[ShortCommitHash]]]) – mapping from all occurred statuses to a set of all revisions (total amount of revisions)
- Return type: str
- Returns: a short string representation of a case study
- varats.paper_mgmt.paper_config_manager.get_status(case_study, experiment_type, longest_cs_name, sep_stages, sort, use_color=False, total_status_occurrences=None)
Return a string representation that describes the current status of the case study.
- Parameters:
  case_study (CaseStudy) – case study to print the status for
  experiment_type (Type[VersionExperiment]) – experiment type to print files for
  longest_cs_name (int) – amount of chars that should be considered for offsetting to allow case study name alignment
  sep_stages (bool) – print each stage separately
  sort (bool) – sort the output order of the case studies
  use_color (bool) – add color escape sequences for highlighting
  total_status_occurrences (Optional[DefaultDict[FileStatusExtension, Set[ShortCommitHash]]]) – mapping from all occurred statuses to a set of all revisions (total amount of revisions)
- Return type: str
- Returns: a full string representation of all case studies
- varats.paper_mgmt.paper_config_manager.get_legend(use_color=False)
Builds up a complete legend that explains all status numbers and their colors.
- Parameters: use_color (bool) – add color escape sequences for highlighting
- Return type: str
- Returns: a legend to explain the different statuses
- varats.paper_mgmt.paper_config_manager.package_paper_config(output_file, cs_filter_regex, experiment_types)
Package all files from a paper config into a zip archive.
- Parameters:
  output_file (Path) – file to write to
  cs_filter_regex (Pattern[str]) – applied to a name_version string for filtering the case studies to be included in the zip archive
  experiment_types (List[Type[VersionExperiment]]) – list of report names that should be added
- Return type: None
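Filtering case studies by a name_version regex before archiving can be sketched with the standard zipfile module. This hypothetical helper only packages the .case_study files of one paper config folder, whereas package_paper_config, judging by its parameters, also takes experiment types into account; the archive layout is an assumption.

```python
from __future__ import annotations

import re
import zipfile
from pathlib import Path
from typing import List


def package_case_studies(config_dir: Path, output_file: Path,
                         cs_filter_regex: re.Pattern[str]) -> List[str]:
    """Hypothetical sketch: zip every case study file of a paper config
    whose name_version stem (e.g. gzip_0) matches the regex."""
    added: List[str] = []
    with zipfile.ZipFile(output_file, "w", zipfile.ZIP_DEFLATED) as archive:
        for cs_file in sorted(config_dir.glob("*.case_study")):
            if cs_filter_regex.match(cs_file.stem):
                # Store paths relative to the config folder's parent so the
                # archive keeps the paper config folder name.
                arcname = cs_file.relative_to(config_dir.parent).as_posix()
                archive.write(cs_file, arcname)
                added.append(cs_file.name)
    return added
```

For example, passing re.compile(r"gzip_\d+") for the ase-17 folder would package gzip_0 and gzip_1 but skip git_0.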