The Data Mesh - should you adopt it?

In practice, not every organization is a good fit for a Data Mesh. Its primary audience is larger enterprises that face uncertainty and change in their operations and environment; if your organization's data requirements are modest and stable over time, a Data Mesh is likely an unnecessary expense. So what is a "Data Mesh"? Data Mesh is a strategic approach to modern data management, and a strategy to support an organization's journey toward digital transformation, that focuses on delivering useful and safe data products. Its main goal is to move beyond established centralized data management techniques built around data warehouses and data lakes. By letting data producers and data consumers access and handle data without the hassle of going through a central data lake or data warehouse team, Data Mesh emphasizes organizational agility. …

Regulation-Compliant Federated Data Processing

Federated data processing has long been a standard model for the virtual integration of disparate data sources, where each source retains a certain amount of autonomy. While early federated technologies grew out of mergers, acquisitions, and specialized corporate applications, recent demand for decentralized data storage and computation in information marketplaces and in geo-distributed data analytics has made federated data services an indispensable component of the data systems market. At the same time, growing concerns about data privacy, propelled by regulations across the world, have brought federated data processing under the purview of regulatory bodies. This series of blog posts will discuss the challenges of building regulation-compliant federated data processing systems, and our initiatives at Databloom that strive to make compliance a first-class citizen in our Blossom data platform. …
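One compliance-friendly pattern in federated analytics is to push computation to each source and let only aggregates cross the source's trust boundary. The following toy sketch (plain Python, not Blossom's actual API; all names are invented for illustration) shows the idea:

```python
# Toy illustration: each "source" answers a query locally and releases
# only an aggregate, so raw records never leave the source.

def local_aggregate(records, predicate):
    """Run the filter/aggregate inside the source's trust boundary."""
    matching = [r["amount"] for r in records if predicate(r)]
    return {"count": len(matching), "sum": sum(matching)}

def federated_sum(sources, predicate):
    """Combine per-source partial aggregates into a global answer."""
    partials = [local_aggregate(records, predicate) for records in sources]
    return {
        "count": sum(p["count"] for p in partials),
        "sum": sum(p["sum"] for p in partials),
    }

# Two autonomous sources, e.g. an EU and a US deployment.
eu = [{"region": "EU", "amount": 10}, {"region": "EU", "amount": 5}]
us = [{"region": "US", "amount": 7}]

result = federated_sum([eu, us], lambda r: r["amount"] > 4)
print(result)  # {'count': 3, 'sum': 22}
```

A real system additionally has to decide, per regulation, which aggregates are even allowed to leave a jurisdiction; the sketch only captures the data-stays-local execution shape.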

Internationalization: The challenges of building multilingual web applications

At Databloom, we value diversity. We are a multicultural company with team members from different parts of the world, speaking a wide variety of languages such as English, French, German, Greek, Hindi, Korean, and Spanish. Data science teams are just as diverse! For that reason, in Databloom's Blossom Studio we plan to introduce internationalization and localization features to make our application multilingual. In that context, we want to discuss several aspects we found relevant when implementing a multilingual application, and to share some resources that helped us put these concepts into practice. 1. Translation methods. There are two main options: machine translation, where an external service performs the translation for us (e.g., the Google Translate API or Amazon Translate), and human translation, where we provide the translated texts ourselves. …
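To make the human-translation option concrete, here is a minimal sketch of a message catalog with a locale fallback. This is an invented example, not Blossom Studio's implementation; real applications typically use a library such as gettext or ICU message formats instead of plain dictionaries.

```python
# Human-translated message catalogs keyed by locale. A key missing from
# a locale (e.g. a partially translated catalog) falls back to English.
CATALOGS = {
    "en": {"greeting": "Welcome", "logout": "Log out"},
    "es": {"greeting": "Bienvenido", "logout": "Cerrar sesión"},
    "de": {"greeting": "Willkommen"},  # partially translated
}

def translate(key, locale, default_locale="en"):
    """Look up a message key, falling back to the default locale."""
    catalog = CATALOGS.get(locale, {})
    if key in catalog:
        return catalog[key]
    # Fall back to the default locale; as a last resort, show the key itself.
    return CATALOGS[default_locale].get(key, key)

print(translate("greeting", "es"))  # Bienvenido
print(translate("logout", "de"))   # falls back to English: Log out
```

The fallback chain matters in practice: it lets a team ship a partially translated locale without breaking the UI.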

Apache Wayang: More than a Big Data Abstraction

Recently, Paul King (V.P. and Chair of the Groovy PMC) highlighted the big data abstraction [1] that Apache Wayang [2] provides. He showed that users specify an application as a platform-agnostic logical plan (a Wayang plan); Apache Wayang, in turn, transforms that logical plan into a set of execution (physical) operators to be run by specific underlying processing platforms, such as Apache Flink and Apache Spark. In this post, we elaborate on the cross-platform optimizer that ships with Apache Wayang and decides how to generate execution plans. …
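To give a feel for what "deciding how to generate execution plans" means, here is a deliberately simplified cost-based platform choice in Python. It is only loosely inspired by the idea behind Wayang's cross-platform optimizer: the operators, platforms, and cost formulas below are invented, and the real optimizer also accounts for things like data-movement costs between platforms.

```python
# Estimated cost of running each logical operator on each platform,
# as a function of input cardinality (all numbers invented).
COST = {
    ("map",    "java"):  lambda n: 1.0 * n,
    ("map",    "spark"): lambda n: 0.2 * n + 5000,   # Spark pays startup overhead
    ("reduce", "java"):  lambda n: 2.0 * n,
    ("reduce", "spark"): lambda n: 0.3 * n + 5000,
}

def plan(logical_ops, cardinality):
    """Pick, per logical operator, the platform with the lowest estimated cost."""
    execution_plan = []
    for op in logical_ops:
        platform = min(("java", "spark"), key=lambda p: COST[(op, p)](cardinality))
        execution_plan.append((op, platform))
    return execution_plan

print(plan(["map", "reduce"], 100))        # small input: single-node Java wins
print(plan(["map", "reduce"], 1_000_000))  # large input: Spark wins
```

Even this toy shows the key payoff: the same logical plan yields different execution plans depending on the (estimated) input size, without the user changing a line of application code.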

Integrating new plugins into Blossom (Part 1)

Databloom Blossom is a federated data lakehouse analytics framework that provides a solution for federated learning. Blossom supports the entire generation of distributed ML pipelines, from data extraction to consumption, covering access to data sources, data preparation, feature engineering, and federated learning. This post is part of a series that shows users how to create their own Blossom plugin, which contains custom logical operators, mappings that execute these operators on different execution platforms, and conversion channels that transform output data types so they can be processed by any available platform. In this first part, we explain several Blossom concepts necessary for implementing new features. Keep in mind that Blossom is a cross-platform processing framework, so the computations involved are not trivial. …
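As a mental model for what such a plugin bundles, consider the following hypothetical sketch. These are not Blossom's real classes; the point is only the three ingredients named above: logical operators, per-platform mappings, and conversion channels between data types.

```python
# Hypothetical plugin structure (invented for illustration):
# - mappings:    (logical operator, platform) -> executable implementation
# - conversions: (from_type, to_type)         -> converter between data formats

class Plugin:
    def __init__(self, name):
        self.name = name
        self.mappings = {}
        self.conversions = {}

    def register_mapping(self, logical_op, platform, fn):
        self.mappings[(logical_op, platform)] = fn

    def register_conversion(self, from_type, to_type, fn):
        self.conversions[(from_type, to_type)] = fn

plugin = Plugin("my-plugin")

# Map the logical "filter" operator onto a plain-Python execution operator.
plugin.register_mapping("filter", "python",
                        lambda data, pred: [x for x in data if pred(x)])

# A conversion channel: adapt a list output to the set input a
# downstream platform might expect.
plugin.register_conversion("list", "set", set)

run_filter = plugin.mappings[("filter", "python")]
evens = run_filter([1, 2, 3, 4], lambda x: x % 2 == 0)
as_set = plugin.conversions[("list", "set")](evens)
print(as_set)  # {2, 4}
```

The separation is the point: the logical operator stays platform-agnostic, while mappings and conversion channels let the framework route it to whichever execution platform is available.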

The 3 pillars for effectively and efficiently leading remote work teams

We live in a time when distance is no longer a problem. Many communication and meeting-management tools let us interact with people in very different geographical locations. But although these tools are a great help for staying in touch with friends and family, one might wonder: can they coordinate, manage, and monitor our work teams in the same way? Probably not by themselves. At Databloom, we have taken on this challenge from a perspective of cultural plurality and the richness that comes from having developers and engineers with diverse skills and knowledge but a common purpose: achieving organizational goals on time, with quality and innovation. The foundations for meeting this challenge are diverse, but at least the following three pillars of successful team management should be highlighted. Pillar 1: Form teams with high standards of quality and commitment. This is undoubtedly the first step toward good management. …

Machine Learning for X and X for Machine Learning

After the recent advances in Artificial Intelligence (AI), and especially in Machine Learning (ML) and Deep Learning (DL), various other computer science fields have entered a race to "blend" their existing methods with ML/DL. There are two directions for such a blend: using ML to advance the field, or using the methods developed in the field to improve ML. A common slogan for combining ML with a computer science field is ML for X or X for ML, where X can be, for instance, any of {databases, systems, reasoning, semantic web}. In this blog post, we focus on the case X = big data management. Works on ML for data management and on data management for ML have been appearing for several years now. Both directions have had a great impact in academia, with dedicated new conferences popping up, as well as in industry, with several companies working on either improving their technology with ML or providing scalable and efficient solutions. …