This repository is home to the Project 1 source code of Group 1, for the 2025 edition of the Sustainable Software Engineering course at TU Delft. Its core purpose is to analyse the energy consumption of Regular Expression (RegEx) engines commonly used by modern IDEs and text editors.
Using this repository, you should be able to replicate our study.
- Setup Environment
- Environment Variables
- Corpus Data
- Verify the different RegEx Engines work
- Run the experiment
- Visualisation of Results
- Authors
- Clone this repository:
git clone https://github.com/ianjoshi/regex-energy.git
cd regex-energy
- Set up the Python environment (choose one method):
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

# Create and activate conda environment
conda env create -f environment.yml
conda activate regex_energy_experiment
To run the experiment, you need to set the following environment variables:
- ENERGIBRIDGE_DRIVER_PATH: Path to the Energibridge driver. This is required for the energy measurements.
- BOOST_PATH: Path to the Boost library. This is required for the C++ Boost RegEx engine. Details on how to download Boost-Regex are provided in the "Verify the different RegEx Engines work" section.
You can set these environment variables by renaming the .env.template file to .env in the root directory of the project and filling in the following lines:
ENERGIBRIDGE_DRIVER_PATH="<PROJECT_ROOT>\energibridge\LibreHardwareMonitor.sys"
BOOST_PATH="<BOOST_PATH>"
NOTE: The BOOST_PATH value should use forward slashes in its path (i.e. / instead of \).
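As a minimal sketch of how these settings could be sanity-checked before a run, the snippet below looks for the two variables and flags backslashes in BOOST_PATH. The helper name `find_env_problems` is hypothetical and not part of this repository; it assumes the values have already been loaded into the environment (for example via `load_dotenv()`).

```python
import os

REQUIRED_VARS = ("ENERGIBRIDGE_DRIVER_PATH", "BOOST_PATH")


def find_env_problems(env):
    """Return a list of human-readable problems found in `env` (a dict-like mapping).

    Hypothetical helper for illustration; not part of the repository.
    """
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")
    # The README asks for forward slashes in BOOST_PATH.
    boost_path = env.get("BOOST_PATH") or ""
    if "\\" in boost_path:
        problems.append("BOOST_PATH contains backslashes; use forward slashes")
    return problems


if __name__ == "__main__":
    # os.environ only reflects .env values after they have been loaded,
    # e.g. with load_dotenv() from python-dotenv.
    for problem in find_env_problems(os.environ):
        print(problem)
```

An empty list means both variables are present and BOOST_PATH uses forward slashes; the check does not verify that the paths actually exist.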
The corpus is the text on which each engine runs the regex patterns. To download it, corpus_generator.py has a variable that takes URLs and downloads the raw code contents. By default, the URLs are set to download the code from the NumPy multiarray test. This file is then amplified to create a larger corpus of 100MB, so that the RegEx engines take longer to run and the energy consumption can be measured more reliably.
Note: The corpus data is not included in the repository because of its size (100MB). To generate it, run the corpus_generator.py script. The corpus data will be stored in the data/ directory.
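The amplification step can be pictured as repeating the downloaded content until a target size is reached. The sketch below is only an illustration under that assumption; the function name `amplify` is hypothetical, and the actual logic in corpus_generator.py may differ.

```python
def amplify(content: bytes, target_bytes: int) -> bytes:
    """Repeat `content` until it covers `target_bytes`, then trim to that size.

    Hypothetical illustration; not the repository's implementation.
    """
    if not content:
        raise ValueError("content must not be empty")
    # Ceiling division: number of copies needed to reach the target size.
    repeats = -(-target_bytes // len(content))
    return (content * repeats)[:target_bytes]


if __name__ == "__main__":
    sample = b"import numpy as np\n"
    corpus = amplify(sample, 100)
    print(len(corpus))
```

Trimming to an exact size can cut the final line short, which is usually harmless for a benchmark corpus but worth keeping in mind if patterns are anchored to line ends.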
Ensure you are able to run Java, Node.js, C++, and .NET files from this directory on your computer using the following steps.
Run the test file to check if all engines are working:
python test_regex_engines.py
Note: You may notice in test_regex_engines.py that the following code snippet has the comments 'Depends on compiler' and 'Depends on vcpkg installation path'.
def test_boost_engine_pipe_interaction(self):
    # Get Boost path from environment
    load_dotenv()
    boost_path = os.getenv("BOOST_PATH")  # Depends on vcpkg installation path
    if not boost_path:
        raise ValueError("BOOST_PATH environment variable not set")

    # Print the directory contents to debug
    print("Checking library directory:")
    subprocess.run(["dir", f"{boost_path}/lib"], shell=True)

    compile_result = subprocess.run([
        "g++",  # Depends on compiler
        f"{self.factory.directory_to_store_engines}/regex_matcher.cpp",
        "-o", f"{self.factory.directory_to_store_engines}/regex_matcher.exe",
        f"-I{boost_path}/include",
        f"-L{boost_path}/lib",
        "-Wl,-rpath," + boost_path + "/bin",
        "-lboost_regex-vc143-mt-x64-1_86",  # Depends on compiler
        "--verbose"
    ], capture_output=True, text=True)
    # Rest of function
You may need to adjust these lines for the system on which you run the script.
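One way to make those adjustments easier is to pull the system-specific parts of the compile step in the snippet above into parameters. The following is a sketch only: `build_boost_compile_command` and its parameter names are hypothetical, and the defaults simply mirror the values shown in the snippet.

```python
def build_boost_compile_command(
    boost_path,
    engine_dir,
    compiler="g++",  # Depends on compiler
    boost_lib="boost_regex-vc143-mt-x64-1_86",  # Depends on compiler / Boost build
):
    """Assemble the compile command with the system-specific parts as parameters.

    Hypothetical helper for illustration; not part of the repository.
    """
    return [
        compiler,
        f"{engine_dir}/regex_matcher.cpp",
        "-o", f"{engine_dir}/regex_matcher.exe",
        f"-I{boost_path}/include",
        f"-L{boost_path}/lib",
        f"-Wl,-rpath,{boost_path}/bin",
        f"-l{boost_lib}",
        "--verbose",
    ]


if __name__ == "__main__":
    # Example values are placeholders, not paths from the repository.
    print(build_boost_compile_command("C:/path/to/boost", "engines"))
```

The returned list can be passed to `subprocess.run(..., capture_output=True, text=True)` as in the test, so only the two keyword arguments need to change between machines.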
Java: install the JDK using the guide for your platform:
- Windows: https://docs.oracle.com/en/java/javase/22/install/installation-jdk-microsoft-windows-platforms.html
- macOS: https://docs.oracle.com/en/java/javase/22/install/installation-jdk-macos.html
- Linux: https://docs.oracle.com/en/java/javase/22/install/installation-jdk-linux-platforms.html
For C++, you have a few options for which compiler to install. Your choice affects exactly how the boost-regex library must be installed.
Compilers:
VCPKG:
Install:
cd C:\dev
git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
.\bootstrap-vcpkg.bat
For Windows: add vcpkg to the PATH by running PowerShell as Administrator:
[Environment]::SetEnvironmentVariable(
"Path",
[Environment]::GetEnvironmentVariable("Path", "Machine") + ";C:\path\to\vcpkg",
"Machine"
)
For Linux/macOS: add vcpkg to the PATH by adding this line to your shell configuration file (~/.bashrc, ~/.zshrc, etc.):
export PATH=$PATH:/path/to/vcpkg
Then reload your shell configuration:
source ~/.bashrc # If using bash
# OR
source ~/.zshrc # If using zsh
Test if it works in a new terminal window:
vcpkg version
Boost-Regex:
If TDM-GCC is your compiler, run:
vcpkg install boost-regex:x64-mingw-dynamic
Node.js: download from https://nodejs.org/en/download
.NET: download from https://dotnet.microsoft.com/en-us/download
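Once the toolchains listed above are installed, a quick way to confirm they are reachable from your terminal is to look them up on the PATH. The sketch below uses `shutil.which` from the standard library; the helper name `locate_tools` is hypothetical, and the executable names (`java`, `node`, `g++`, `dotnet`) are assumptions that may differ on your system.

```python
import shutil

# Executable names are assumptions; adjust them to your installation.
DEFAULT_TOOLS = ("java", "node", "g++", "dotnet")


def locate_tools(tools=DEFAULT_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in locate_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```

This only checks that an executable can be found, not that its version is suitable; running test_regex_engines.py remains the actual verification.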
First, run the corpus_generator.py script to generate the corpus data:
python corpus_generator.py
Then, once you have completed all the steps above, run your IDE as Administrator and put your computer in "Zen Mode" by following these steps:
1. Close all unnecessary applications.
2. Kill unnecessary services.
3. Turn off notifications.
4. Disconnect any unnecessary hardware.
5. Disconnect Wi-Fi.
6. Switch off auto-brightness on your display.
7. Set the room temperature to 25°C if possible; otherwise, keep it as stable as you can.
Finally, you can run the experiment by running:
python main.py
When running main.py, the results and visualisations will be generated in the results/ directory.
[Figure: Low Complexity Energy | Low Complexity Time]
[Figure: Medium Complexity Energy | Medium Complexity Time]
[Figure: High Complexity Energy | High Complexity Time]
- Marina Escribano Esteban
- Kevin Hoxha
- Inaesh Joshi
- Todor Mladenović