At Scalytics, we’re excited to introduce LST-E (LSTEnergy), a Long Short-Term Memory (LSTM) model designed specifically for energy consumption forecasting. Built using real-world smart meter data from 2020 and trained on Scalytics Connect, LST-E is now available for download on GitHub and Hugging Face.
Why we created an industry-specific model
AI is reshaping industries, but for many enterprises, adopting AI can feel overwhelming, especially when starting with Large Language Models (LLMs). While LLMs are trained on vast general-purpose corpora and excel in conversation-based applications, they often fall short of delivering the specialized insights needed for process improvement.
This is where LSTM-based neural networks come in. Unlike LLMs, which focus on language, LSTMs are designed for tasks like time-series forecasting, making them a perfect fit for enterprise environments. Neural networks like LST-E excel at working with decentralized data silos or distributed data lakes, processing historical data to generate actionable predictions without moving or duplicating data.
With LST-E, our goal is clear:
- Help enterprises understand and predict their energy consumption.
- Enable businesses to optimize energy usage, reduce costs, and minimize CO2 emissions.
- Provide an AI solution that works securely within enterprise environments while ensuring compliance with data regulations.
By using federated learning techniques on Scalytics Connect, LST-E works across independent data stores, enabling enterprises to make accurate energy predictions while maintaining full control over their data.
A short overview of how LST-E works
LST-E is a time-series forecasting model that analyzes historical energy consumption data to predict future usage. Here's a quick breakdown of how it works, with a code sketch after the list:
- Model Initialization: LST-E is initialized with 50 hidden units and a 0.2 dropout rate.
- Training: The model is trained for 100 epochs with a batch size of 32 and validated on a testing set.
- Performance Monitoring: Using tools like Matplotlib, training and validation loss curves are plotted to monitor convergence and catch overfitting.
- Prediction: Predictions are generated for the testing set and compared against actual values.
- Error Metrics: The model calculates the Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE) for accuracy evaluation.
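Pulling those steps together, here is a minimal sketch of what such a pipeline could look like, assuming a Keras implementation; the synthetic data, layer arrangement, and variable names are illustrative, not LST-E's actual source.

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import layers, models

# Synthetic stand-in data: (samples, lookback, features) windows and targets.
rng = np.random.default_rng(42)
X_train, y_train = rng.random((500, 10, 1)), rng.random(500)
X_test, y_test = rng.random((100, 10, 1)), rng.random(100)

# Model initialization: 50 hidden units and a 0.2 dropout rate.
model = models.Sequential([
    layers.Input(shape=(10, 1)),
    layers.LSTM(50),
    layers.Dropout(0.2),
    layers.Dense(1),  # one-step-ahead consumption forecast
])
model.compile(optimizer="adam", loss="mse")

# Training: 100 epochs, batch size 32, validated on the testing set.
history = model.fit(X_train, y_train, epochs=100, batch_size=32,
                    validation_data=(X_test, y_test), verbose=0)

# Performance monitoring: plot training and validation loss curves.
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.legend()
plt.show()

# Prediction and error metrics: RMSE and MAPE on the testing set.
y_pred = model.predict(X_test).ravel()
rmse = np.sqrt(np.mean((y_test - y_pred) ** 2))
mape = np.mean(np.abs((y_test - y_pred) / np.maximum(np.abs(y_test), 1e-8))) * 100
print(f"RMSE: {rmse:.3f}, MAPE: {mape:.2f}%")
```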
In practice, LST-E reaches high accuracy within about 20 training epochs, depending on the dataset. For example, in a typical enterprise scenario, the model runs weekly for each smart meter.
Imagine an enterprise with a dataset of smart meter energy consumption over several months. Using LST-E, the model learns to map past consumption patterns to predict future usage. For instance, by analyzing the last 10 days of consumption, LST-E can predict the energy demand for the 11th day. This provides companies with a proactive approach to energy optimization and cost reduction, enabling smarter resource allocation and better decision-making.
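As a concrete illustration, a sliding-window setup for that 10-day lookback might look like the following sketch; the helper name and the synthetic series are hypothetical.

```python
import numpy as np

def make_windows(series: np.ndarray, lookback: int = 10):
    """Turn a 1-D consumption series into (samples, lookback, 1) model
    inputs and next-day targets."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])  # days i .. i+lookback-1
        y.append(series[i + lookback])    # the following day, to be predicted
    return np.asarray(X)[..., np.newaxis], np.asarray(y)

# A few months of daily smart-meter readings (synthetic, with a weekly cycle)
daily_kwh = 20 + 5 * np.sin(np.arange(120) * 2 * np.pi / 7)
X, y = make_windows(daily_kwh, lookback=10)
print(X.shape, y.shape)  # (110, 10, 1) (110,)
```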
Why we used an LSTM approach
LSTMs and other neural networks are better suited for distributed data silos than traditional AI models because:
- They Process Decentralized Data: Neural networks can work across multiple data sources without requiring centralization, ensuring compliance with data privacy regulations.
- Dynamic Learning: Unlike static algorithms, LSTMs adapt to new patterns in historical data, making them ideal for real-time predictions in dynamic environments.
- Time-Series Expertise: LSTMs are specifically designed to handle time-series data, such as energy consumption, providing highly accurate forecasts.
- Regulation-First Design: With federated learning on Scalytics Connect, LST-E ensures data remains local, minimizing compliance risks and infrastructure costs. (A conceptual sketch of this idea follows below.)
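Scalytics Connect's actual federated API is not shown here; the following is only a conceptual sketch of the federated-averaging idea (FedAvg), in which each silo trains a local copy of the model and shares only weights, never raw records, with a coordinator.

```python
import numpy as np

def federated_average(weight_sets, sample_counts):
    """Weighted average of per-silo model weights (lists of numpy arrays),
    weighted by how many samples each silo trained on."""
    total = sum(sample_counts)
    return [
        sum(w[i] * (n / total) for w, n in zip(weight_sets, sample_counts))
        for i in range(len(weight_sets[0]))
    ]

# Two silos with toy "weights" (e.g., one kernel and one bias array each)
silo_a = [np.array([[0.2, 0.4]]), np.array([0.1])]
silo_b = [np.array([[0.6, 0.0]]), np.array([0.3])]
global_weights = federated_average([silo_a, silo_b], sample_counts=[300, 100])
print(global_weights)  # silo_a dominates: it holds three times the data
```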
LSTM (Long Short-Term Memory) is a special type of recurrent neural network (RNN) that is capable of learning long-term dependencies. LSTM models have a special architecture: they use memory cells and gates to regulate the flow of information, which allows them to remember important information from the past while forgetting irrelevant information. This makes them extremely useful for time-series forecasting, where the goal is to predict future values based on past events.
In the context of time-series forecasting, an LSTM model takes as input a sequence of past observations and outputs a prediction for the next value in the sequence. The model is trained on historical data to learn the underlying patterns and relationships between the input features and the target variable. Once the model is successfully trained, it can generate predictions on new data sets without being retrained. To improve the accuracy of LST-E, a user can tune the number of layers or the number of neurons per layer.
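For instance, exposing those two knobs in a Keras-style builder might look like this sketch; the function and its defaults are hypothetical, not part of LST-E.

```python
from tensorflow.keras import layers, models

def build_tunable(lookback: int, n_layers: int = 2, units: int = 64,
                  dropout: float = 0.2):
    """Hypothetical builder exposing layer count and neurons per layer."""
    model = models.Sequential([layers.Input(shape=(lookback, 1))])
    for i in range(n_layers):
        # Stacked LSTM layers must return full sequences, except the last one.
        model.add(layers.LSTM(units, return_sequences=(i < n_layers - 1)))
        model.add(layers.Dropout(dropout))
    model.add(layers.Dense(1))
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_tunable(lookback=10, n_layers=3, units=32)
model.summary()
```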
Now, LSTM belongs to the family of recurrent neural networks, but plain RNNs tend to forget information that lies too far back in the past. This is because the hidden state vector gets diluted by repeated multiplications and additions as it passes through the network. This problem is known as the "vanishing gradient" problem, and it limits the ability of RNNs to learn long-term dependencies.
LSTM solves this problem by introducing a new component: a cell state vector c_t. The cell state acts as a memory that can store and retrieve information over long time spans. It is regulated by three gates: an input gate i_t, an output gate o_t, and a forget gate f_t. These gates are neural networks that learn to control what information to keep or discard from the cell state and the hidden state.
The input gate decides what new information to add to the cell state based on the current input x_t and the previous hidden state h_{t-1}. The forget gate decides what old information to erase from the cell state based on the same inputs. The output gate decides what information to expose as the new hidden state, based on the updated cell state c_t and the same inputs.
The following equations describe how these gates work mathematically (⊙ denotes element-wise multiplication):

i_t = sigmoid(W_i · [h_{t-1}, x_t] + b_i)
f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f)
o_t = sigmoid(W_o · [h_{t-1}, x_t] + b_o)
g_t = tanh(W_g · [h_{t-1}, x_t] + b_g)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
h_t = o_t ⊙ tanh(c_t)
y_t = softmax(W_y · h_t + b_y)

Here W_i, W_f, W_o, W_g, and W_y are weight matrices; b_i, b_f, b_o, b_g, and b_y are bias vectors; sigmoid is a logistic function that squashes values between 0 and 1; tanh is a hyperbolic tangent function that squashes values between -1 and 1; and softmax is a function that normalizes values into a probability distribution. By using these gates, LSTM can learn to selectively store and retrieve relevant information from the cell state over long time spans. This allows it to capture long-term dependencies and avoid vanishing gradients.
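To make the equations concrete, here is a direct NumPy transcription of a single LSTM time step; the dictionary layout and toy dimensions are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W and b hold the per-gate parameters
    W_i, W_f, W_o, W_g and b_i, b_f, b_o, b_g."""
    z = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])  # input gate: what to write
    f_t = sigmoid(W["f"] @ z + b["f"])  # forget gate: what to erase
    o_t = sigmoid(W["o"] @ z + b["o"])  # output gate: what to expose
    g_t = np.tanh(W["g"] @ z + b["g"])  # candidate cell values
    c_t = f_t * c_prev + i_t * g_t      # updated cell state
    h_t = o_t * np.tanh(c_t)            # new hidden state
    return h_t, c_t

# Toy dimensions: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
n_in, n_h = 3, 4
W = {k: rng.standard_normal((n_h, n_h + n_in)) * 0.1 for k in "ifog"}
b = {k: np.zeros(n_h) for k in "ifog"}
h_t, c_t = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), W, b)
print(h_t, c_t)
```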
About Scalytics
Scalytics Connect is a next-generation Federated Learning Framework built for enterprises. It bridges the gap between decentralized data and scalable AI, enabling seamless integration across diverse sources while prioritizing compliance, data privacy, and transparency.
Our mission is to empower developers and decision-makers with a framework that removes the barriers of traditional infrastructure. With Scalytics Connect, you can build scalable, explainable AI systems that keep your organization ahead of the curve. Break free from limitations and unlock the full potential of your AI projects.
Apache Wayang: The Leading Java-Based Federated Learning Framework
Scalytics is powered by Apache Wayang, and we're proud to support the project. You can check out their public GitHub repo right here. If you're enjoying our software, show your love and support - a star ⭐ would mean a lot!
If you need professional support from our team of industry-leading experts, you can always reach out to us via Slack or email.