In Part 1 of this mini-series, we discussed why data silos fail AI. You probably know this already, but it bears repeating: many organizations absolutely do not want their sensitive data to leave their secure network environment. Cloud providers address this data privacy issue with deep integration at the network level and dedicated services located within the secure customer network. While this setup can meet GDPR requirements, it often results in data being moved between silos, duplicated, and made less controllable, because the data still travels from the point of origin or collection to the point of provision in analysis and reporting systems.
A current trend in data system architecture is the so-called “Shift-Left Paradigm.” The core idea is that you don't need to move the data; instead, you bring the algorithms for analysis and training tasks directly to the data. This can happen within the operational database, the storage cluster, or the stream processing environment. As a result, only intermediate information, rather than sensitive raw data, is shared for collaborative use in the data plane. This paradigm aligns well with the trend of creating “data products.” Such an architecture looks like this:
Scalytics Connect enables data collaboration zones, which are secure, connected areas containing sensitive data that cannot leave their jurisdiction. Data products also define these jurisdictions, often referred to as governance zones. In this context, Scalytics Connect serves as a bridge between your data products.
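To make the shift-left idea concrete, here is a minimal Python sketch of two zones that each compute a local aggregate and share only that intermediate result. It is an illustration under our own assumptions, not the Scalytics Connect or Apache Wayang API: `PartialAggregate`, `local_aggregate`, and `combine` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PartialAggregate:
    """Intermediate result that is allowed to leave a zone (hypothetical type)."""
    total: float
    count: int


def local_aggregate(raw_values: Iterable[float]) -> PartialAggregate:
    """Runs inside the governance zone; only the sum and count are returned."""
    values = list(raw_values)
    return PartialAggregate(total=sum(values), count=len(values))


def combine(partials: List[PartialAggregate]) -> float:
    """Runs in the shared data plane and only ever sees partial aggregates."""
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    if count == 0:
        raise ValueError("no records contributed to the aggregate")
    return total / count


# Two hypothetical zones, each holding its sensitive values locally.
zone_a = local_aggregate([10.0, 20.0, 30.0])
zone_b = local_aggregate([40.0])

# The global mean is computed without any raw value crossing a zone boundary.
global_mean = combine([zone_a, zone_b])
print(global_mean)
```

The same pattern extends to other decomposable computations, such as counts, histograms, or model gradient updates, as long as the intermediate result itself does not reveal the underlying records.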
The Role of the Data Firewall
Scalytics Connect comprises edge nodes situated within the organization’s secure network, allowing direct access to operational systems such as SAP, Oracle, or Salesforce. A data plane is established using open protocols, including HTTP(S), MQTT, or the Kafka protocol. The data plane only accepts data that may be used collaboratively with other zones; the sensitive data never leaves the secure network of the source system.
This setup means that the Scalytics Connect server effectively functions as a data firewall. Application firewalls already exist, but Scalytics Connect takes the concept further. We enforce at the technical level that the data stays where it is, which is critical for avoiding unnecessary data movements and uncontrolled copies. To accomplish this, we bring the relevant parts of the algorithm directly to the data. The data firewall operates like a permeable membrane in a biological organism: it lets specific requests pass through while blocking others, and in the other direction it lets certain types of information through while holding others back.
Scalytics Connect thus creates a secure, easily controllable connection between the internally protected and collaborative data networks. If necessary, additional layers such as a public data network or supplementary cooperation networks can be established. With a data firewall in place, the data remains in its original location, and access is only granted to approved processing operations. Processing context and algorithm details can now be used to monitor data usage, all of which occurs transparently through an established open-source API utilizing Apache Wayang.
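The membrane behavior described above can be sketched as an allow-list of approved processing operations. The following Python example is a simplified illustration under our own assumptions, not the actual Scalytics Connect implementation; the `DataFirewall` class, the `RequestBlocked` exception, and the operation names are hypothetical.

```python
from typing import Callable, Dict, List


class RequestBlocked(Exception):
    """Raised when a requested operation is not on the allow-list."""


class DataFirewall:
    """Keeps raw records private and runs only approved operations on them."""

    def __init__(
        self,
        records: List[dict],
        approved: Dict[str, Callable[[List[dict]], object]],
    ) -> None:
        self._records = list(records)    # stands in for the secure source system
        self._approved = dict(approved)  # operations approved by the data owner

    def execute(self, operation: str) -> object:
        handler = self._approved.get(operation)
        if handler is None:
            # Anything not explicitly approved is rejected.
            raise RequestBlocked(f"operation not approved: {operation}")
        # The algorithm runs next to the data; only its result is returned.
        return handler(self._records)


firewall = DataFirewall(
    records=[
        {"customer": "A", "revenue": 120.0},
        {"customer": "B", "revenue": 80.0},
    ],
    approved={"total_revenue": lambda rows: sum(r["revenue"] for r in rows)},
)

allowed_result = firewall.execute("total_revenue")

try:
    firewall.execute("export_raw_rows")
    blocked = False
except RequestBlocked:
    blocked = True

print(allowed_result, blocked)
```

The default-deny design choice matters here: a request passes only if the data owner has registered it, so new kinds of access require an explicit decision rather than an oversight.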
A Turn-Key Solution for Data Management
Scalytics Connect is offered as a turn-key solution. We establish the data network within your existing infrastructure, assess compliance status, and ensure an audit-ready solution from the outset. Data ownership and data sovereignty are central to our compliance-first approach. We empower data owners to define usage rules within the data firewall, making these rules immediately auditable and the compliance level always visible.
We are creating a new framework for decentralized data-centered collaboration (DDZ). Customers maintain complete control over their data; unnecessary and risky data movements and copies are eliminated. We ensure a connection to the data plane within the infrastructure provided by the customer. This is achieved through the Scalytics Connect data firewall, offering data-sharing capabilities directly at the business level via a scalable, robust API and an intuitive UI.
Transparency and Compliance
To comply with data protection regulations, it is crucial to quickly consider all relevant aspects in a specific data usage context. The requirements stemming from GDPR, the EU Data Act, and the EU AI Act must be implemented. However, we can only confirm that everything is functioning as expected if the compliance status is immediately visible: we need to know which data is being used for what purpose and by whom.
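One way to picture "which data is being used for what purpose and by whom" is a usage ledger that checks every request against owner-defined rules and records the outcome. The sketch below is a hypothetical Python illustration, not a description of how Scalytics Connect stores or evaluates rules; `UsageLedger`, `UsageRecord`, and the dataset and purpose names are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Set, Tuple


@dataclass(frozen=True)
class UsageRecord:
    """One audit entry: who asked for which data, for what purpose."""
    timestamp: str
    requester: str
    dataset: str
    purpose: str
    allowed: bool


class UsageLedger:
    def __init__(self, rules: Set[Tuple[str, str]]) -> None:
        # Each rule is a (dataset, purpose) pair approved by the data owner.
        self._rules = set(rules)
        self.records: List[UsageRecord] = []

    def request(self, requester: str, dataset: str, purpose: str) -> bool:
        """Decide on a request and log it, whether it is allowed or not."""
        allowed = (dataset, purpose) in self._rules
        self.records.append(
            UsageRecord(
                timestamp=datetime.now(timezone.utc).isoformat(),
                requester=requester,
                dataset=dataset,
                purpose=purpose,
                allowed=allowed,
            )
        )
        return allowed

    def violations(self) -> List[UsageRecord]:
        """Denied requests, i.e. attempted uses outside the approved purposes."""
        return [r for r in self.records if not r.allowed]


ledger = UsageLedger(rules={("crm_contacts", "churn_analysis")})
ok = ledger.request("analytics-agent", "crm_contacts", "churn_analysis")
denied = ledger.request("marketing-agent", "crm_contacts", "ad_targeting")

for record in ledger.records:
    print(record.requester, record.dataset, record.purpose, record.allowed)
```

Because denied requests are logged alongside approved ones, the ledger doubles as an audit trail: the compliance status can be read directly from the records instead of being reconstructed after the fact.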
Summary
Scalytics' mission is clear: “We are creating a standardized approach to collaborative data use.” By drastically reducing the effort involved in data movements, we minimize costs and risks. Cooperative data use must be approached from a completely new perspective, particularly regarding the introduction of agent systems in organizations. It’s not merely about which agent can access which database, data warehouse, or pipeline.
Instead, the crucial question is what information a research team, manager, assistant, or agent can extract from the data plane. Under the GDPR's purpose limitation principle (Article 5(1)(b)), the context in which data is used is a fundamental element of data protection, and this is precisely what the Scalytics Connect data firewall prioritizes. By establishing decentralized data sovereignty over raw data and data products, Scalytics Connect enables cooperative data use.
Recommended Next Steps for Our Customers
In light of the growing complexities of data management in an increasingly regulated environment, we advise our customers to take a proactive approach to enhancing their data governance and collaboration capabilities. The first step is to assess your current data architecture to identify potential silos and areas where the shift-left architecture can be implemented. By evaluating how algorithms can be brought to the data rather than moving the data itself, you can mitigate risks associated with data transfers and ensure compliance with regulations like GDPR and the EU Data Act.
Next, consider leveraging Scalytics Connect to establish data collaboration zones within your existing infrastructure. This provides a robust framework for maintaining data sovereignty while enabling secure, collaborative access to sensitive data. Implementing a data firewall, as outlined above, can help you maintain control over your data usage and streamline compliance monitoring.
Finally, engage in regular training and workshops to ensure your team is well-versed in the principles of decentralized data-centered collaboration (DDZ) and the functionalities of Scalytics Connect. By fostering a culture of compliance and transparency, you position your organization to leverage data as a strategic asset while adhering to evolving legal requirements.
By following these recommendations, organizations can effectively navigate the challenges of modern data management while maximizing their operational efficiencies and safeguarding sensitive information.
If you’d like more tailored guidance or have specific questions about implementing these strategies, feel free to reach out to our team.
About Scalytics
Experience seamless integration across diverse data sources, enabling true AI scalability and removing the roadblocks to data compliance and data privacy in your machine learning and AI projects. Break free from the limitations of the past and accelerate innovation with Scalytics Connect, paving the way for a distributed computing framework that empowers your data-driven strategies.
Apache Wayang: The Leading Java-Based Federated Learning Framework
Scalytics is powered by Apache Wayang, and we're proud to support the project. You can check out their public GitHub repo right here. If you're enjoying our software, show your love and support - a star ⭐ would mean a lot!
If you need professional support from our team of industry-leading experts, you can always reach out to us via Slack or email.