Paper and case study handling
Whenever we design experiments that look at specific revisions of a project, we run into the problem that we need to preserve the set of revisions in order to re-evaluate our experiment data. For exactly this problem, the tool suite provides case studies, which preserve the information about which revisions of a project were analyzed. In addition, to also preserve the set of projects that were analyzed, we designed paper configs as collections of case studies. Paper configs and case studies not only let us re-evaluate our own experiments, they also allow others to reproduce our data or to design their own experiments based on our project and revision selection.
How to use case studies
If we want to analyze a particular set of revisions, or re-evaluate the same revisions over and over again, we can fix the revisions we are interested in by creating a CaseStudy.
First, create a folder in which your paper config should be saved.
Then, create a case study that specifies the revisions to be analyzed.
To ease the creation of case studies, the tool suite offers different sampling methods that choose revisions from the project's history based on a probability distribution.
For example, we can generate a new case study for the project gzip, drawing 10 revisions from the project's history based on a half-normal distribution, with:
vara-cs gen PATH_TO_PAPER_CONF_DIR/ HalfNormalSamplingMethod PATH_TO_REPO/ --num-rev 10
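The idea behind drawing revisions from a half-normal distribution can be sketched in a few lines of plain Python. This is only an illustration and does not reproduce the tool suite's HalfNormalSamplingMethod: the function name `sample_half_normal`, the scaling of the distribution, and the assumption that the preferred end of the history comes first in the list are all hypothetical.

```python
import random


def sample_half_normal(revisions, num_rev, seed=None):
    """Pick num_rev distinct revisions, favouring the start of the list."""
    if num_rev > len(revisions):
        raise ValueError("cannot sample more revisions than available")
    rng = random.Random(seed)
    # Hypothetical scaling: most draws should fall inside the list.
    sigma = len(revisions) / 3
    chosen = set()
    while len(chosen) < num_rev:
        # abs() folds a normal distribution into a half-normal one.
        index = int(abs(rng.gauss(0, sigma)))
        if index < len(revisions):
            chosen.add(index)
    # Return the picks in their original history order.
    return [revisions[i] for i in sorted(chosen)]


# Placeholder history; real revisions would be commit hashes.
history = [f"rev{i:03d}" for i in range(100)]
picked = sample_half_normal(history, 10, seed=42)
print(picked)
```

Indices near the start of the list are drawn far more often than indices near the end, which is what distinguishes this from uniform sampling.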
To ease the handling of multiple projects, case studies should be grouped into folders, e.g., the set of case studies used for a paper, which is called a paper config. For more information, see How to use paper configs.
Extending case studies
Case studies group revisions together, but sometimes these groups need to be changed or extended, e.g., when we want to sample a few more revisions to gather data for a specific revision range. To simplify this, the tool suite provides vara-cs ext, a tool for extending and changing case studies.
For example:
vara-cs ext paper_configs/ase-17/gzip_0.case_study distrib_add gzip/ --distribution UniformSamplingMethod --num-rev 5
will add 5 new revisions, sampled uniformly from all revisions, to the case study.
Warning
The specified distribution only applies to the newly added revisions, not to revisions added previously. If one wants to draw all revisions according to the same distribution, a new case study has to be created.
In more detail, case studies have different stages that are separated from each other. This allows us, for example, to extend a case study with a specific revision without changing the initial set of revisions, e.g., stage 0.
For example:
vara-cs ext paper_configs/ase-17/gzip_0.case_study simple_add gzip/ --extra-revs 0dd8313ea7bce --merge-stage 3
will add revision 0dd8313ea7bce to stage 3 of the gzip case study, allowing us to analyze it and draw different plots, e.g., one containing only stage 0 data and another with all stages included.
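How stages keep the initial revision set untouched can be illustrated with a small stand-alone model. The classes `Stage` and `MiniCaseStudy` below are hypothetical stand-ins, not the tool suite's CSStage and CaseStudy, and creating empty intermediate stages on demand is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stage:  # hypothetical stand-in for CSStage
    revisions: List[str] = field(default_factory=list)


@dataclass
class MiniCaseStudy:  # hypothetical stand-in for CaseStudy
    stages: List[Stage] = field(default_factory=list)

    def add_to_stage(self, revision: str, stage_num: int) -> None:
        # Assumption: create missing stages so stage_num is a valid index.
        while len(self.stages) <= stage_num:
            self.stages.append(Stage())
        if revision not in self.stages[stage_num].revisions:
            self.stages[stage_num].revisions.append(revision)

    def revisions_up_to(self, stage_num: int) -> List[str]:
        # Collect revisions of stages 0..stage_num without duplicates.
        seen: List[str] = []
        for stage in self.stages[:stage_num + 1]:
            for rev in stage.revisions:
                if rev not in seen:
                    seen.append(rev)
        return seen


# Placeholder hashes for stage 0; the added revision goes to stage 3.
cs = MiniCaseStudy([Stage(["a1b2c3d", "e4f5a6b"])])
cs.add_to_stage("0dd8313ea7bce", 3)
print(cs.revisions_up_to(0))  # data for a plot with stage 0 only
print(cs.revisions_up_to(3))  # data for a plot with all stages
```

Because the new revision lives in its own stage, a plot restricted to stage 0 sees exactly the revisions that were sampled initially.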
How to use paper configs
Paper configs are used to group different case studies together. Take, for example, the case where one wants to analyze the projects gzip and git for the evaluation of a paper that gets submitted to ase-17. First, one creates case studies for each project, selecting the revisions that should be analyzed. Second, all case studies related to the evaluation for ase-17 are grouped into a folder (the paper config) to relate them to the paper. Now we can design and run our experiment for ase-17 on all revisions added through case studies in the paper config and generate our experiment results.
The paper config now allows us to reproduce all the results for our paper with a single call to the tool suite. Furthermore, this is also helpful for other researchers, who are now able to reproduce our results.
In more detail, our specified paper config allows the tool suite to tell BenchBuild which revisions should be analyzed to evaluate a set of case studies. For example, a setup could look like this:
paper_configs
├── ase-17
│   ├── gzip_0.case_study
│   ├── gzip_1.case_study
│   └── git_0.case_study
└── icse-18
    ├── gzip_0.case_study
    └── git_0.case_study
In this example, we have two paper configs, one for ase-17 and another for icse-18.
We see different case studies for gzip and git. Note that we can create multiple case studies for one project.
If we now want to evaluate our set for icse-18, we set the paper config folder to the root of our config tree and select the icse-18 folder as our current config in the settings file .vara.yaml, like this:
paper_config:
    current_config:
        value: icse-18
    folder:
        value: /home/foo/vara/paper_configs/
Next, we can run our experiment with BenchBuild as usual. During experiment execution, BenchBuild will load our config and only evaluate the needed revisions.
The current status of all case studies belonging to the current paper config can be visualized with vara-cs status:
>>> vara-cs status -s
CS: gzip_0: (0/5) processed
CS: gzip_1: (2/5) processed
CS: gzip_2: (5/5) processed
CS: libvpx_0: (0/5) processed
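The counts in this short status output can be read as "revisions with results / revisions in the case study". A minimal sketch of how such a line could be derived, assuming a hypothetical helper `short_status` and a plain set of already processed revisions (the tool suite itself derives the status from result files):

```python
def short_status(name, revisions, processed):
    """Format a one-line status in the style shown above."""
    # Count the revisions of the case study that already have a result.
    done = sum(1 for rev in revisions if rev in processed)
    return f"CS: {name}: ({done}/{len(revisions)}) processed"


# Placeholder revision names and result set.
gzip_1_revisions = ["r1", "r2", "r3", "r4", "r5"]
status_line = short_status("gzip_1", gzip_1_revisions, {"r1", "r3"})
print(status_line)
```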
The tool vara-pc provides a simple command line interface for creating and managing paper configs.
Artefacts
The artefacts module provides an easy way to attach descriptions of artefacts, like plots or result tables, to a paper config. This way, reproducing the exact same plots for a paper config over and over again becomes as easy as invoking a single command.
For more information about how to create and manage artefacts, refer to the documentation of the vara-art tool.
Paper and case study modules
Module: paper_config
The PaperConfig pins down a specific set of case studies, one or more for each project, where each encapsulates a fixed set of revisions to evaluate.
This allows users to specify which revisions of what project have to be analyzed. Furthermore, it allows other users to reproduce the exact same set of projects and revisions, either with the old experiment automatically or with a new experiment to compare the results.
- class varats.paper_mgmt.paper_config.PaperConfig(folder_path)[source]¶
Bases: object
Paper config, a specific set of case studies, e.g., for a publication.
The paper config allows easy reevaluation of a set of case studies.
- Parameters
folder_path (Path) – path to the paper config folder
- property path: pathlib.Path¶
Path to the paper config folder.
- Return type
Path
- property artefacts: varats.paper_mgmt.artefacts.Artefacts¶
The artefacts of this paper config.
- Return type
Artefacts
- get_case_studies(cs_name)[source]¶
Lookup all case studies with a given name.
- Parameters
cs_name (str) – name of the case study
- Return type
List[CaseStudy]
- Returns
case studies with project name cs_name.
- get_all_case_studies()[source]¶
Generate a list of all case studies in the paper config.
- Return type
List[CaseStudy]
- Returns
full list of all case studies, including all of their different versions.
- get_all_artefacts()[source]¶
Returns an iterable of the artefacts of this paper config.
- Return type
Iterable[Artefact]
- has_case_study(cs_name)[source]¶
Checks if a case study with cs_name was loaded.
- Parameters
cs_name (str) – name of the case study
- Return type
bool
- Returns
True, if a case study with cs_name was loaded
- get_filter_for_case_study(cs_name)[source]¶
Return a case-study-specific revision filter. If one case study includes a revision, the filter function will return True. This can be used to automatically filter out revisions that are not part of a case study loaded by this paper config.
- Parameters
cs_name (str) – name of the case study
- Return type
Callable[[str], bool]
- Returns
a filter function that checks if a given revision is part of a case study with name cs_name and returns True if it is
- add_case_study(case_study)[source]¶
Add a new case study to this paper config.
- Parameters
case_study (CaseStudy) – the case study to be added
- Return type
None
- varats.paper_mgmt.paper_config.project_filter_generator(project_name)[source]¶
Generate project specific revision filters.
If no paper config is loaded, all revisions are allowed; otherwise, the paper config generates a project-specific revision filter.
- Parameters
project_name (str) – corresponding project name
- Return type
Callable[[str], bool]
- Returns
a filter function that returns True if a revision of the specified project is included in one of the related case studies.
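The behaviour described for project_filter_generator can be sketched without the tool suite. The function `make_revision_filter` below is hypothetical: it models a loaded paper config as a plain dictionary from project names to pinned hashes, and matching full hashes against pinned short hashes by prefix is an assumption of this sketch, not documented behaviour.

```python
from typing import Callable, Dict, List, Optional


def make_revision_filter(
    case_studies: Optional[Dict[str, List[str]]], project_name: str
) -> Callable[[str], bool]:
    """Build a filter; case_studies maps project names to pinned hashes."""
    if case_studies is None:
        # No paper config loaded: every revision is allowed.
        return lambda revision: True

    pinned = case_studies.get(project_name, [])

    def revision_filter(revision: str) -> bool:
        # Assumption: a pinned hash may be a prefix of the full hash.
        return any(revision.startswith(short) for short in pinned)

    return revision_filter


loaded = {"gzip": ["0dd8313ea7bce"]}
gzip_filter = make_revision_filter(loaded, "gzip")
print(gzip_filter("0dd8313ea7bce0123456789"))
print(gzip_filter("ffffffffffff"))
```

Returning a closure keeps the experiment code independent of whether a paper config is loaded: it always receives a callable that takes a revision string and returns a bool.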
- varats.paper_mgmt.paper_config.get_loaded_paper_config()[source]¶
Returns the currently active paper config. This requires a paper config to be loaded before use.
- Return type
PaperConfig
- Returns
currently active paper config
- varats.paper_mgmt.paper_config.is_paper_config_loaded()[source]¶
Check if a paper config is currently loaded.
- Return type
bool
- Returns
True, if a paper config has been loaded
- varats.paper_mgmt.paper_config.load_paper_config(config_path=None)[source]¶
Loads a paper config from a yaml file, initializes the paper config and sets it to the currently active paper config. If no config path is provided, the paper config set in the vara settings yaml is loaded.
Note
Only one paper config can be active at a time
- Parameters
config_path (Optional[Path]) – path to a paper config folder
- Return type
None
Module: case_study
A case study is used to pin down the exact set of revisions that should be analysed for a project.
- class varats.paper.case_study.CSEntry(commit_hash, commit_id, config_ids=None)[source]¶
Bases: object
Combines a commit hash with a unique and ordered ID, starting with 0 for the first commit in the repository.
- property commit_hash: str¶
A commit hash from the git repository.
- Return type
str
- property commit_id: int¶
The order ID of the commit hash.
- Return type
int
- property config_ids: List[int]¶
The order ID of the configuration.
- Return type
List[int]
- class varats.paper.case_study.CSStage(name=None, sampling_method=None, release_type=None, revisions=None)[source]¶
Bases: object
A stage in a case study, i.e., a collection of revisions.
Stages are used to separate revisions into groups.
- property revisions: List[str]¶
Project revisions that are part of this case study.
- Return type
List[str]
- property name: Optional[str]¶
Name of the stage.
- Return type
Optional[str]
- property sampling_method: Optional[varats.base.sampling_method.SamplingMethodBase[Any]]¶
The sampling method used for this stage.
- Return type
Optional[SamplingMethodBase[Any]]
- property release_type: Optional[varats.provider.release.release_provider.ReleaseType]¶
The release type used for this stage.
- Return type
Optional[ReleaseType]
- has_revision(revision)[source]¶
Check if a revision is part of this case study.
- Parameters
revision (str) – project revision to check
- Return type
bool
- Returns
True, in case the revision is part of the case study, False otherwise.
- add_revision(revision, commit_id, config_ids=None)[source]¶
Add a new revision to this stage.
- Parameters
revision (str) – revision to add
commit_id (int) – unique ID for ordering of commits
config_ids (Optional[List[int]]) – list of configuration IDs
- Return type
None
- get_config_ids_for_revision(revision)[source]¶
Returns a list of all configuration IDs specified for this revision.
- Parameters
revision (str) – i.e., a commit hash registered in this CSStage
Returns: list of config IDs
- Return type
List[int]
- class varats.paper.case_study.CaseStudy(project_name, version, stages=None)[source]¶
Bases: object
A case study persists a set of revisions of a project to allow easy reevaluation.
- Stored values:
name of the related benchbuild.project
a set of revisions
- property project_name: str¶
Name of the related project.
!! This name must match the name of the BB project !!
- Return type
str
- property version: int¶
Version ID for this case study.
The version differentiates case studies of the same project.
- Return type
int
- property revisions: List[str]¶
Project revisions that are part of this case study.
- Return type
List[str]
- property stages: List[varats.paper.case_study.CSStage]¶
Get a list with all stages.
- Return type
List[CSStage]
- property num_stages: int¶
Get the number of stages.
- Return type
int
- get_stage_by_name(stage_name)[source]¶
Get a stage by its name. Since multiple stages can have the same name, the first matching stage is returned.
- Parameters
stage_name (str) – name of the stage to look up
- Return type
Optional[CSStage]
- Returns
the stage corresponding to stage_name, or None
- get_stage_index_by_name(stage_name)[source]¶
Get a stage’s index by its name. Since multiple stages can have the same name, the first matching stage is returned.
- Parameters
stage_name (str) – name of the stage to look up
- Return type
Optional[int]
- Returns
the stage index corresponding to stage_name, or None
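The "first matching stage wins" rule shared by get_stage_by_name and get_stage_index_by_name can be illustrated on a plain list of stage names. The helper `first_stage_index` is hypothetical and only models the lookup rule described above.

```python
from typing import List, Optional


def first_stage_index(stage_names: List[Optional[str]],
                      wanted: str) -> Optional[int]:
    # Stages may be unnamed (None) and names need not be unique.
    for index, name in enumerate(stage_names):
        if name == wanted:
            return index  # stop at the first match
    return None


# Placeholder stage names; "release" occurs twice on purpose.
names = ["initial", None, "release", "release"]
print(first_stage_index(names, "release"))
print(first_stage_index(names, "missing"))
```

Callers that rely on a name lookup should therefore keep stage names unique, or address stages by index instead.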
- has_revision(revision)[source]¶
Check if a revision is part of this case study.
- Return type
bool
- Returns
True, if the revision was found in one of the stages, False otherwise
- has_revision_in_stage(revision, num_stage)[source]¶
Checks if a revision is in a specific stage.
- Return type
bool
- Returns
True, if the revision was found in the specified stage, False otherwise
- get_config_ids_for_revision(revision)[source]¶
Returns a list of all configuration IDs specified for this revision.
- Parameters
revision (str) – i.e., a commit hash registered in this case study
Returns: list of config IDs
- Return type
List[int]
- get_config_ids_for_revision_in_stage(revision, num_stage)[source]¶
Returns a list of all configuration IDs specified for this revision.
- Parameters
revision (str) – i.e., a commit hash registered in this case study
num_stage (int) – number of the stage to search in
Returns: list of config IDs
- Return type
List[int]
- shift_stage(from_index, offset)[source]¶
Shift a stage in the case study's stage list by an offset. Beware that shifts to the left (offset<0) will destroy stages.
- Parameters
from_index (int) – index of the first stage to shift
offset (int) – amount by which the stages should be shifted
- Return type
None
- insert_empty_stage(pos)[source]¶
Insert a new stage at the given index, shifting the list elements to the right. The newly created stage is returned.
- Parameters
pos (int) – index position to insert an empty stage
- Return type
CSStage
- include_revision(revision, commit_id, stage_num=0, sort_revs=True)[source]¶
Add a revision to this case study.
- Parameters
revision (str) – revision to add
commit_id (int) – unique ID for ordering of commits
stage_num (int) – index number of the stage to add the revision to
sort_revs (bool) – if True, the modified stage will be sorted afterwards
- Return type
None
- include_revisions(revisions, stage_num=0, sort_revs=True)[source]¶
Add multiple revisions to this case study.
- Parameters
revisions (List[Tuple[str, int]]) – list of tuples with (commit_hash, id) to be inserted
stage_num (int) – the stage to insert the revisions into
sort_revs (bool) – True if the stage should be kept sorted
- Return type
None
- name_stage(stage_num, name)[source]¶
Names an already existing stage.
- Parameters
stage_num (int) – the number of the stage to name
name (str) – the new name of the stage
- Return type
None
- varats.paper.case_study.load_case_study_from_file(file_path)[source]¶
Load a case study from a file.
- Parameters
file_path (Path) – path to the case study file
- Return type
CaseStudy
- varats.paper.case_study.load_configuration_map_from_case_study_file(file_path, concrete_config_type)[source]¶
Load a configuration map from a case-study file.
- Parameters
file_path (Path) – path to the configuration map file
concrete_config_type (Type[Configuration]) – type of the configuration objects that should be created
Returns: a new ConfigurationMap based on the parsed file
- Return type
ConfigurationMap
- varats.paper.case_study.store_case_study(case_study, case_study_location)[source]¶
Store case study to file in the specified paper_config.
- Parameters
case_study (CaseStudy) – the case study to store
case_study_location (Path) – can be either a path to a paper_config or a direct path to a .case_study file
- Return type
None
Module: artefacts
This module allows attaching artefact definitions to a paper config. This way, the artefacts, like plots or result tables, can be generated from result files automatically.
Typically, a paper config has a file artefacts.yaml that manages artefact definitions.
- class varats.paper_mgmt.artefacts.Artefact(artefact_type, name, output_path)[source]¶
Bases: abc.ABC
An Artefact contains all information that is necessary to generate a certain artefact. Subclasses of this class specify concrete artefact types, like plots, that require additional attributes.
- Parameters
artefact_type (ArtefactType) – The type of this artefact.
name (str) – The name of this artefact.
output_path (Path) – The output path for this artefact.
- property artefact_type: varats.paper_mgmt.artefacts.ArtefactType¶
The type of this artefact.
- Return type
ArtefactType
- property name: str¶
The name of this artefact.
This uniquely identifies an artefact in an Artefacts collection.
- Return type
str
- property output_path: pathlib.Path¶
The output path for this artefact.
The output path is relative to the directory specified as artefacts.artefacts_dir in the current varats config.
- Return type
Path
- class varats.paper_mgmt.artefacts.PlotArtefact(name, output_path, plot_type, file_format, **kwargs)[source]¶
Bases: varats.paper_mgmt.artefacts.Artefact
An artefact defining a plot.
- Parameters
name (str) – The name of this artefact.
output_path (Path) – the path where the plot this artefact produces will be stored
plot_type (str) – the type of plot that will be generated
file_format (str) – the file format of the generated plot
kwargs (Any) – additional arguments that will be passed to the plot class
- property plot_type: str¶
The type of plot that will be generated.
- Return type
str
- property plot_type_class: Type[varats.plot.plot.Plot]¶
The class associated with plot_type().
- Return type
Type[Plot]
- property file_format: str¶
The file format of the generated plot.
- Return type
str
- property plot_kwargs: Any¶
Additional arguments that will be passed to the plot_type_class.
- Return type
Any
- class varats.paper_mgmt.artefacts.TableArtefact(name, output_path, table_type, table_format, **kwargs)[source]¶
Bases: varats.paper_mgmt.artefacts.Artefact
An artefact defining a table.
- Parameters
name (str) – The name of this artefact.
output_path (Path) – the path where the table this artefact produces will be stored
table_type (str) – the type of table that will be generated
table_format (TableFormat) – the format of the generated table
kwargs (Any) – additional arguments that will be passed to the table class
- property table_type: str¶
The type of table that will be generated.
- Return type
str
- property table_type_class: Type[varats.table.table.Table]¶
The class associated with table_type().
- Return type
Type[Table]
- property file_format: varats.table.table.TableFormat¶
The file format of the generated table.
- Return type
TableFormat
- property table_kwargs: Any¶
Additional arguments that will be passed to the table_type_class.
- Return type
Any
- class varats.paper_mgmt.artefacts.ArtefactType(value)[source]¶
Bases: enum.Enum
Enum for the different artefact types.
The name is used in the artefacts.yaml to identify what kind of artefact is described. The values are tuples (artefact_class, version) consisting of the class responsible for that kind of artefact and a version number to allow evolution of artefacts.
- value: Tuple[varats.paper_mgmt.artefacts.Artefact, int]¶
- plot = (<class 'varats.paper_mgmt.artefacts.PlotArtefact'>, 1)¶
- table = (<class 'varats.paper_mgmt.artefacts.TableArtefact'>, 1)¶
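The pattern of an enum whose values are (class, version) tuples can be shown with a stand-alone sketch. `PlotSpec`, `TableSpec`, `Kind`, and `build` are hypothetical names that only mirror the structure described above; they are not part of the artefacts module.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass
class PlotSpec:  # hypothetical stand-in for PlotArtefact
    name: str
    file_format: str


@dataclass
class TableSpec:  # hypothetical stand-in for TableArtefact
    name: str
    table_format: str


class Kind(Enum):  # hypothetical stand-in for ArtefactType
    plot = (PlotSpec, 1)
    table = (TableSpec, 1)


def build(kind_name: str, **kwargs):
    # Look the member up by name, as an entry in a config file would.
    spec_class, version = Kind[kind_name].value
    return spec_class(**kwargs), version


artefact, version = build("plot", name="overview", file_format="png")
print(artefact, version)
```

Storing the responsible class in the enum value means a textual type name can be mapped to a constructor with a single lookup, and the version number travels with it.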
- class varats.paper_mgmt.artefacts.Artefacts(artefacts)[source]¶
Bases: object
A collection of Artefacts.
- property artefacts: Iterable[varats.paper_mgmt.artefacts.Artefact]¶
An iterator of the Artefacts in this collection.
- Return type
Iterable[Artefact]
- get_artefact(name)[source]¶
Lookup an artefact by its name.
- Parameters
name (str) – the name of the artefact to retrieve
- Return type
Optional[Artefact]
- Returns
the artefact with the name name if available, else None
- varats.paper_mgmt.artefacts.create_artefact(artefact_type, name, output_path, **kwargs)[source]¶
Create a new Artefact from the provided parameters.
- Parameters
artefact_type (ArtefactType) – the type for the artefact
name (str) – the name of the artefact
output_path (Path) – the output path for the artefact
**kwargs – additional arguments that are passed to the class selected by artefact_type
- Return type
Artefact
- Returns
the created artefact
- varats.paper_mgmt.artefacts.load_artefacts_from_file(file_path)[source]¶
Load an artefacts file.
- Parameters
file_path (Path) – the path to the artefacts file
- Return type
Artefacts
- Returns
the artefacts created from the given file
- varats.paper_mgmt.artefacts.store_artefacts(artefacts, artefacts_location)[source]¶
Store artefacts to file in the specified paper_config.
- Parameters
artefacts (Artefacts) – the artefacts to store
artefacts_location (Path) – the location for the artefacts file. Can be either a path to a paper_config or a direct path to an artefacts.yaml file.
- Return type
None
- varats.paper_mgmt.artefacts.filter_plot_artefacts(artefacts)[source]¶
Filter all plot artefacts from a list of artefacts.
- Parameters
artefacts (Iterable[Artefact]) – the artefacts to filter
- Return type
Iterable[PlotArtefact]
- Returns
all plot artefacts
- varats.paper_mgmt.artefacts.filter_table_artefacts(artefacts)[source]¶
Filter all table artefacts from a list of artefacts.
- Parameters
artefacts (Iterable[Artefact]) – the artefacts to filter
- Return type
Iterable[TableArtefact]
- Returns
all table artefacts
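Filtering a mixed collection by concrete type, as the two filter functions above do, usually comes down to an isinstance check. The classes and the helper `only_plots` below are hypothetical placeholders, so this is a sketch of the technique rather than the module's implementation.

```python
from typing import Iterable, Iterator


class Item:  # hypothetical stand-in for Artefact
    def __init__(self, name: str) -> None:
        self.name = name


class PlotItem(Item):  # hypothetical stand-in for PlotArtefact
    pass


class TableItem(Item):  # hypothetical stand-in for TableArtefact
    pass


def only_plots(items: Iterable[Item]) -> Iterator[PlotItem]:
    # Lazily yield the items that are plots, preserving their order.
    return (item for item in items if isinstance(item, PlotItem))


mixed = [PlotItem("overview"), TableItem("summary"), PlotItem("timeline")]
plot_names = [item.name for item in only_plots(mixed)]
print(plot_names)
```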
Module: paper_config_manager
Module for interacting with and managing paper configs and case studies, e.g., this module provides functionality to visualize the status of case studies or to package a whole paper config into a zip folder.
- varats.paper_mgmt.paper_config_manager.show_status_of_case_studies(report_name, filter_regex, short_status, sort, print_rev_list, sep_stages, print_legend)[source]¶
Prints the status of all matching case studies to the console.
- Parameters
report_name (str) – name of the report whose files will be considered
filter_regex (str) – applied to a name_version string for filtering the case studies to be shown
short_status (bool) – print only a short version of the status information
sort (bool) – sort the output order of the case studies
print_rev_list (bool) – print a list of revisions for every case study
sep_stages (bool) – print each stage separated
print_legend (bool) – print a legend for the different types
- Return type
None
- varats.paper_mgmt.paper_config_manager.get_revision_list(case_study)[source]¶
Returns a string with a list of revisions from the case study, grouped by case study stages.
- Parameters
case_study (CaseStudy) – the case study to print revisions for
- Return type
str
- Returns
formatted string that lists all revisions
- varats.paper_mgmt.paper_config_manager.get_result_files(result_file_type, project_name, commit_hash, only_newest)[source]¶
Returns a list of result files that (partially) match the given commit hash.
- Parameters
result_file_type (MetaReport) – the type of the result file
project_name (str) – target project
commit_hash (str) – the commit hash to search result files for
only_newest (bool) – whether to include all result files, or only the newest; if False, result files for the same revision are sorted descending by the file's mtime
- Return type
List[Path]
- Returns
a list of matching result file paths; result files for the same revision are sorted descending by the file's mtime
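The combination of partial hash matching and "newest first" ordering can be sketched with the standard library alone. The helper `matching_results`, the directory layout, and the file names are hypothetical; the real function additionally takes the report type and project name into account.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def matching_results(result_dir: Path, commit_hash: str,
                     only_newest: bool) -> List[Path]:
    # Partial match: the (possibly shortened) hash must occur in the name.
    matches = [p for p in result_dir.iterdir() if commit_hash in p.name]
    # Newest file first, judged by modification time.
    matches.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return matches[:1] if only_newest else matches


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    # Made-up file names with explicit modification times.
    for name, mtime in [("report-0dd8313-old.txt", 1_000),
                        ("report-0dd8313-new.txt", 2_000),
                        ("report-abc1234.txt", 3_000)]:
        path = tmp_dir / name
        path.write_text("dummy result")
        os.utime(path, (mtime, mtime))  # set access and modification time
    all_names = [p.name for p in matching_results(tmp_dir, "0dd8313", False)]
    newest_names = [p.name for p in matching_results(tmp_dir, "0dd8313", True)]

print(all_names)
print(newest_names)
```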
- varats.paper_mgmt.paper_config_manager.get_occurrences(status_occurrences, use_color=False)[source]¶
Returns a string with all status occurrences of a case study.
- Parameters
status_occurrences (DefaultDict[FileStatusExtension, Set[str]]) – mapping from all occurred status to a set of revisions
use_color (bool) – add color escape sequences for highlighting
- Return type
str
- Returns
a string with all status occurrences of a case study
- varats.paper_mgmt.paper_config_manager.get_total_status(total_status_occurrences, longest_cs_name, use_color=False)[source]¶
Returns a status string showing the total amount of occurrences.
- Parameters
total_status_occurrences (DefaultDict[FileStatusExtension, Set[str]]) – mapping from all occurred status to a set of all revisions (total amount of revisions)
longest_cs_name (int) – amount of chars that should be considered for
use_color (bool) – add color escape sequences for highlighting
- Return type
str
- Returns
a string with all status occurrences of all case studies
- varats.paper_mgmt.paper_config_manager.get_short_status(case_study, result_file_type, longest_cs_name, use_color=False, total_status_occurrences=None)[source]¶
Return a short string representation that describes the current status of the case study.
- Parameters
case_study (CaseStudy) – the case study to print
result_file_type (MetaReport) – report type to print
longest_cs_name (int) – amount of chars that should be considered for offsetting to allow case study name alignment
use_color (bool) – add color escape sequences for highlighting
total_status_occurrences (Optional[DefaultDict[FileStatusExtension, Set[str]]]) – mapping from all occurred status to a set of all revisions (total amount of revisions)
- Return type
str
- Returns
a short string representation of a case study
- varats.paper_mgmt.paper_config_manager.get_status(case_study, result_file_type, longest_cs_name, sep_stages, sort, use_color=False, total_status_occurrences=None)[source]¶
Return a string representation that describes the current status of the case study.
- Parameters
case_study (CaseStudy) – the case study to print the status for
result_file_type (MetaReport) – report type to print
longest_cs_name (int) – amount of chars that should be considered for
sep_stages (bool) – print each stage separated
sort (bool) – sort the output order of the case studies
use_color (bool) – add color escape sequences for highlighting
total_status_occurrences (Optional[DefaultDict[FileStatusExtension, Set[str]]]) – mapping from all occurred status to a set of all revisions (total amount of revisions)
- Return type
str
- Returns
a full string representation of all case studies
- varats.paper_mgmt.paper_config_manager.get_legend(use_color=False)[source]¶
Builds up a complete legend that explains all status numbers and their colors.
- Parameters
use_color (bool) – add color escape sequences for highlighting
- Return type
str
- Returns
a legend to explain the different status
- varats.paper_mgmt.paper_config_manager.package_paper_config(output_file, cs_filter_regex, report_names)[source]¶
Package all files from a paper config into a zip folder.
- Parameters
output_file (Path) – file to write to
cs_filter_regex (Pattern[str]) – applied to a name_version string for filtering the case studies to be included in the zip archive
report_names (List[str]) – list of report names that should be added
- Return type
None
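Packaging with a name_version filter can be sketched with zipfile and re from the standard library. The helper `package_case_studies` is hypothetical and deliberately simpler than package_paper_config: it only archives the .case_study files and ignores result files and report names.

```python
import re
import tempfile
import zipfile
from pathlib import Path
from typing import List


def package_case_studies(paper_config_dir: Path, output_file: Path,
                         cs_filter: "re.Pattern[str]") -> List[str]:
    packaged = []
    with zipfile.ZipFile(output_file, "w") as archive:
        for cs_file in sorted(paper_config_dir.glob("*.case_study")):
            # The stem of "gzip_0.case_study" is the name_version "gzip_0".
            if cs_filter.match(cs_file.stem):
                archive.write(cs_file, arcname=cs_file.name)
                packaged.append(cs_file.name)
    return packaged


with tempfile.TemporaryDirectory() as tmp:
    # Temporary stand-in for a paper config folder.
    config_dir = Path(tmp) / "icse-18"
    config_dir.mkdir()
    for name in ("gzip_0.case_study", "git_0.case_study"):
        (config_dir / name).write_text("placeholder")
    archive_path = Path(tmp) / "icse-18.zip"
    packaged = package_case_studies(config_dir, archive_path,
                                    re.compile("gzip.*"))
    with zipfile.ZipFile(archive_path) as archive:
        archive_names = archive.namelist()

print(packaged)
print(archive_names)
```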