Comparing Energy Consumption of Popular RegEx Engines in IDEs and Text Editors

This repository contains the Project 1 source code of Group 1 for the 2025 edition of the Sustainable Software Engineering course at TU Delft. Its purpose is to analyse the energy consumption of Regular Expression (RegEx) engines commonly used by modern IDEs and text editors.

Using this repository, you should be able to replicate our study.

Setup Environment

Clone this repository:

git clone https://github.com/ianjoshi/regex-energy.git
cd regex-energy

Set up Python environment (choose one method):

Using venv (Virtual Environment)

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Using Conda

# Create and activate conda environment
conda env create -f environment.yml
conda activate regex_energy_experiment

Environment Variables

To run the experiment, you need to set the following environment variables:

ENERGIBRIDGE_DRIVER_PATH: Path to the Energibridge driver. This is required for the energy measurements.
BOOST_PATH: Path to the Boost library. This is required for the C++ Boost RegEx engine. Details on how to install Boost-Regex are provided in the "Verify the different RegEx Engines work" section.

You can set these environment variables by renaming the .env.template file to .env in the root directory of the project and filling in the following lines:

ENERGIBRIDGE_DRIVER_PATH="<PROJECT_ROOT>\energibridge\LibreHardwareMonitor.sys"
BOOST_PATH="<BOOST_PATH>"

NOTE: The BOOST_PATH should contain forward slashes in its path (i.e. / instead of \).
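
The two requirements above can be checked before starting a run. The following is a minimal sketch, not part of the repository: the helper name check_env is hypothetical, and it assumes the variables are already present in the process environment (for example after calling load_dotenv(), as test_regex_engines.py does).

```python
import os

# Variables named in this README as required for the experiment.
REQUIRED_VARS = ("ENERGIBRIDGE_DRIVER_PATH", "BOOST_PATH")


def check_env(env=None):
    """Return a list of human-readable problems with the required variables.

    `env` is any mapping of variable names to values; it defaults to the
    process environment. This helper is hypothetical and only illustrates
    the checks described above.
    """
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")
    # The note above asks for forward slashes in BOOST_PATH.
    if "\\" in env.get("BOOST_PATH", ""):
        problems.append("BOOST_PATH contains backslashes; use forward slashes")
    return problems
```

An empty list means both variables are set and BOOST_PATH uses forward slashes; otherwise each entry names one thing to fix in the .env file.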

Corpus Data

The corpus is the text that each engine runs the regex pattern against. To download it, corpus_generator.py has a variable that takes URLs and downloads the raw code contents. By default, the URLs point to the NumPy multiarray test file. The downloaded file is then amplified into a larger corpus of 100MB, so that the RegEx engines take longer to run and their energy consumption can be measured more reliably.

Note: The corpus data is not included in the repository, due to the large size of the file (100MB). To generate it, run the corpus_generator.py script. The corpus data will be stored in the data/ directory.
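
How corpus_generator.py amplifies the downloaded file is not shown in this README. As an illustration of the idea only, the sketch below repeats the content until a target size is reached; the function name amplify and its parameters are assumptions, not the repository's API.

```python
def amplify(content: bytes, target_size: int) -> bytes:
    """Repeat `content` until the result is at least `target_size` bytes long.

    Hypothetical helper: the repository's own amplification logic may differ.
    """
    if not content:
        raise ValueError("content must not be empty")
    # Ceiling division: number of whole copies needed to reach the target.
    repeats = -(-target_size // len(content))
    # Always keep at least one copy, even for a target of zero.
    return content * max(repeats, 1)
```

For a 100MB corpus, target_size would be 100 * 1024 * 1024 under the assumption that "100MB" means mebibytes; whole copies are used so that no source line is cut in half.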

Verify the different RegEx Engines work

Ensure you are able to run Java, Node.js, C++, and .NET files from this directory on your computer using the following steps.

Find what is missing

Run the test file to check if all engines are working:

python test_regex_engines.py

Note: In test_regex_engines.py, the following code snippet contains the comments 'Depends on compiler' and 'Depends on vcpkg installation path'.

    def test_boost_engine_pipe_interaction(self):
        # Get Boost path from environment
        load_dotenv()
        boost_path = os.getenv("BOOST_PATH") # Depends on vcpkg installation path
        if not boost_path:
            raise ValueError("BOOST_PATH environment variable not set")
        
        # Print the directory contents to debug
        print("Checking library directory:")
        subprocess.run(["dir", f"{boost_path}/lib"], shell=True)
        
        compile_result = subprocess.run([
            "g++", # Depends on compiler
            f"{self.factory.directory_to_store_engines}/regex_matcher.cpp",
            "-o", f"{self.factory.directory_to_store_engines}/regex_matcher.exe",
            f"-I{boost_path}/include",
            f"-L{boost_path}/lib",
            "-Wl,-rpath," + boost_path + "/bin",
            "-lboost_regex-vc143-mt-x64-1_86",  # Depends on compiler
            "--verbose"
        ], capture_output=True, text=True)

        # Rest of function

You may have to adjust these lines for the system you are running the script on.
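
One way to keep the system-dependent parts in a single place is to build the compile command from parameters. This is a sketch based on the snippet above, not code from the repository: build_compile_command and its parameter names are hypothetical, and the defaults simply mirror the values shown in the test file.

```python
def build_compile_command(boost_path, engine_dir,
                          compiler="g++",
                          boost_lib="boost_regex-vc143-mt-x64-1_86"):
    """Return the argument list for compiling regex_matcher.cpp.

    `compiler` and `boost_lib` are the values marked 'Depends on compiler'
    in the test file; `boost_path` depends on the vcpkg installation path.
    """
    return [
        compiler,
        f"{engine_dir}/regex_matcher.cpp",
        "-o", f"{engine_dir}/regex_matcher.exe",
        f"-I{boost_path}/include",       # Boost headers
        f"-L{boost_path}/lib",           # Boost libraries
        f"-Wl,-rpath,{boost_path}/bin",  # runtime search path for shared libraries
        f"-l{boost_lib}",                # library name varies per toolchain
        "--verbose",
    ]
```

The returned list can be passed to subprocess.run(..., capture_output=True, text=True) in the same way as in the test; only the two keyword arguments need to change when switching compiler or Boost build.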

Install Java Development Kit (JDK)

Install C++ compiler & Boost-Regex

You have a few options in terms of the compiler you choose to install here. This will affect the exact installation of the boost-regex library.

Compilers:

VCPKG:

Install:

cd C:\dev
git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
.\bootstrap-vcpkg.bat

For Windows: Add vcpkg to the PATH by running the following in PowerShell as Administrator:

[Environment]::SetEnvironmentVariable(
    "Path",
    [Environment]::GetEnvironmentVariable("Path", "Machine") + ";C:\path\to\vcpkg",
    "Machine"
)

For Linux/macOS: Add to path by adding this line to your shell configuration file (~/.bashrc, ~/.zshrc, etc.):

export PATH=$PATH:/path/to/vcpkg

Then reload your shell configuration:

source ~/.bashrc  # If using bash
# OR
source ~/.zshrc   # If using zsh

Test if it works in a new terminal window:

vcpkg version

Boost-Regex:

If TDM-GCC is your compiler, run:

vcpkg install boost-regex:x64-mingw-dynamic

Install Node.js

Go to: https://nodejs.org/en/download

Install .NET

Go to: https://dotnet.microsoft.com/en-us/download
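
Before running test_regex_engines.py, a quick look at the PATH can show which toolchains are still missing. The sketch below is an illustration only: find_missing_tools is a hypothetical helper, and the executable names (in particular g++) are assumptions that depend on the compiler you installed.

```python
import shutil

# Executables assumed to back the Java, Node.js, .NET and C++ engines.
TOOLS = ("java", "node", "dotnet", "g++")


def find_missing_tools(tools=TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH.

    `which` is injectable so the lookup can be replaced in tests; by default
    it is shutil.which, which returns None when an executable is not found.
    """
    return [tool for tool in tools if which(tool) is None]
```

An empty result only means the executables were found; it does not confirm that Boost-Regex is installed or that the versions are compatible, so the test file remains the real check.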

Run the experiment

First, run the corpus_generator.py script to generate the corpus data:

python corpus_generator.py

Then, after completing all the steps above, run your IDE as an Administrator and put your computer in "Zen Mode" by following these steps:

1. Close all unnecessary applications.
2. Kill unnecessary services.
3. Turn off notifications.
4. Disconnect any unnecessary hardware.
5. Disconnect Wi-Fi.
6. Switch off auto-brightness on your display.
7. Set the room temperature to 25°C if possible; otherwise, keep it as stable as you can.

Finally, you can run the experiment by running:

python main.py

When running main.py, the results and visualisations will be generated in the results/ directory.

Visualisation of Results

[Figures: energy and time plots for the low, medium, and high complexity patterns]

Authors

Marina Escribano Esteban
Kevin Hoxha
Inaesh Joshi
Todor Mladenović
