Posts

Showing posts from June, 2022

The Data Mesh - should you adopt it?

In practice, not every organization is a good fit for a Data Mesh. The primary audience is larger enterprises that face uncertainty and change in their operations and environment; if your organization's data requirements are modest and remain constant over time, a Data Mesh is simply an unnecessary expense. What is a "Data Mesh"? Data Mesh is a strategic approach to modern data management that focuses on delivering useful and safe data products, and a strategy to support an organization's journey toward digital transformation. Its major goal is to move beyond the established centralized data management techniques built around data warehouses and data lakes. By giving data producers and data consumers the ability to access and handle data without the hassle of involving the data lake or data warehouse team, Data Mesh emphasizes organizational agility. …

Apache Wayang: More than a Big Data Abstraction

Recently, Paul King (V.P. and Chair of the Groovy PMC) highlighted the big data abstraction [1] that Apache Wayang [2] provides. He mainly showed that users specify an application as a logical plan (a Wayang Plan) that is platform-agnostic; Apache Wayang, in turn, transforms the logical plan into a set of execution (physical) operators to be executed by specific underlying processing platforms, such as Apache Flink and Apache Spark. In this post, we elaborate on the cross-platform optimizer that comes with Apache Wayang and decides how to generate such execution plans. …
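To make the abstraction concrete, below is a minimal sketch of such a platform-agnostic Wayang plan in Java, loosely adapted from the project's WordCount example. The input path and job name are placeholders, and exact package names and builder methods may vary across Wayang versions; the point is that no operator in the plan names a processing platform, leaving that choice to the cross-platform optimizer.

```java
import java.util.Arrays;
import java.util.Collection;

import org.apache.wayang.api.JavaPlanBuilder;
import org.apache.wayang.basic.data.Tuple2;
import org.apache.wayang.core.api.Configuration;
import org.apache.wayang.core.api.WayangContext;
import org.apache.wayang.java.Java;
import org.apache.wayang.spark.Spark;

public class WordCountSketch {
    public static void main(String[] args) {
        // Register two candidate processing platforms; the optimizer decides which to use.
        WayangContext context = new WayangContext(new Configuration())
                .withPlugin(Java.basicPlugin())
                .withPlugin(Spark.basicPlugin());

        JavaPlanBuilder planBuilder = new JavaPlanBuilder(context)
                .withJobName("WordCount sketch")
                .withUdfJarOf(WordCountSketch.class);

        // The logical (Wayang) plan: no operator refers to Spark, Flink, or Java streams.
        Collection<Tuple2<String, Integer>> wordCounts = planBuilder
                .readTextFile("file:///tmp/input.txt").withName("Load file")
                .flatMap(line -> Arrays.asList(line.split("\\W+"))).withName("Split words")
                .filter(token -> !token.isEmpty()).withName("Filter empty words")
                .map(word -> new Tuple2<>(word.toLowerCase(), 1)).withName("Attach counter")
                .reduceByKey(
                        Tuple2::getField0,
                        (t1, t2) -> new Tuple2<>(t1.getField0(), t1.getField1() + t2.getField1()))
                .withName("Sum counters")
                .collect();  // Triggers optimization and execution on the chosen platform(s).

        System.out.println(wordCounts);
    }
}
```

Because both the Java and Spark plugins are registered, the optimizer has more than one candidate execution plan for the same logical plan and can, in principle, even split the work across platforms.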

Integrating new plugins into Blossom (Part 1)

Databloom Blossom is a federated data lakehouse analytics framework that provides a solution for federated learning. Blossom supports the entire generation of distributed ML pipelines, from data extraction to consumption, covering access to data sources, data preparation, feature engineering, and federated learning. This post is part of a series that encourages users to create their own Blossom plugin, which contains custom logical operators, mappings to process those operators on different execution platforms, and conversion channels that transform output data types so they can be processed by any available platform. In this first part, we explain several Blossom concepts necessary to implement new features; please keep in mind that Blossom is a cross-platform processing framework, and the computations are not trivial. We will go deep into the abstractions that support the integration of new technologies…
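As a teaser for the rest of the series, here is a purely illustrative plugin skeleton in Java. Since Blossom builds on Apache Wayang's cross-platform core, the sketch assumes Wayang's Plugin interface and its notions of mappings, channel conversions, and required platforms; the class name is a placeholder, package paths and method names may differ across versions, and Blossom's actual extension points are the subject of the upcoming parts.

```java
import java.util.Collection;
import java.util.Collections;

import org.apache.wayang.core.api.Configuration;
import org.apache.wayang.core.mapping.Mapping;
import org.apache.wayang.core.optimizer.channels.ChannelConversion;
import org.apache.wayang.core.platform.Platform;
import org.apache.wayang.core.plugin.Plugin;
import org.apache.wayang.java.platform.JavaPlatform;

// Hypothetical plugin skeleton; names and wiring are illustrative only.
public class MyCustomPlugin implements Plugin {

    @Override
    public Collection<Mapping> getMappings() {
        // Map custom logical operators to execution operators of a target platform.
        return Collections.emptyList();
    }

    @Override
    public Collection<ChannelConversion> getChannelConversions() {
        // Declare how this plugin's output channels can be converted for other platforms.
        return Collections.emptyList();
    }

    @Override
    public Collection<Platform> getRequiredPlatforms() {
        // Platforms this plugin needs at execution time.
        return Collections.singleton(JavaPlatform.getInstance());
    }

    @Override
    public void setProperties(Configuration configuration) {
        // Optionally register default configuration values for the plugin.
    }
}
```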

The 3 pillars for effectively and efficiently leading remote work teams

We live in a time when distance is no longer a problem. Many communication and meeting-management tools allow us to interact with people in very diverse geographical locations. But although these tools are a great help for keeping in touch with friends and family, the question one might ask is: can they coordinate, manage, and monitor our work teams in the same way? Maybe not by themselves. At Databloom, we have taken on this challenge from a perspective of cultural plurality and the richness that comes from having developers and engineers with diverse skills and knowledge but a common purpose: achieving organizational goals in time, quality, and innovation. The foundations for meeting this challenge are diverse, but at least the following three fundamental pillars for managing successful teams should be highlighted. Pillar 1. Form teams with high standards of quality and commitment: this is undoubtedly the first step toward good management. …

Machine Learning for X and X for Machine Learning

After the recent advances in Artificial Intelligence (AI), and especially in Machine Learning (ML) and Deep Learning (DL)*, various other computer science fields have entered a race to "blend" their existing methods with ML/DL. There are two directions for such a blend: either using ML to advance the field, or using the methods developed in the field to improve ML. A commonly used slogan when combining ML with a computer science field is ML for X or X for ML, where X can be, for instance, any of {databases, systems, reasoning, semantic web}. In this blog post, we focus on the case where X = big data management. We have been observing work on ML for data management and on data management for ML for several years now. Both directions have had a great impact in academia, with dedicated new conferences popping up, as well as in industry, with several companies working on either improving their technology with ML or providing scalable and efficient solutions…