24 January 2022

Combined Federated Data Services with Blossom and Flower


In a previous post in November, Jorge wrote about what Federated Learning (FL) is and why top data scientists increasingly rely on FL to train and productionize ML and AI models. The rising use of FL is also reflected in its high adoption in China [1].

When it comes to Federated Learning frameworks, we typically find two leading open source projects: Apache Wayang [2] (maintained by databloom) and Flower [3] (maintained by Adap). At first glance, both frameworks seem to do the same thing. But, as usual, a second look tells another story.

How does Flower differ from Wayang?

Flower is a federated learning framework written in Python that supports a large number of training and AI frameworks. The beauty of Flower is its strategy concept [4]: the data scientist defines how each training round is configured and how the results returned by the clients are aggregated, while the clients train with whichever supported ML framework fits. Flower delivers the model to the clients, lets them train it with the chosen framework, collects the results and starts the next round. That makes Federated Learning in Python easy, but at the same time limits its use to platforms that Python supports.
Flower has, as far as I can see, no data query optimizer. Such an optimizer understands the submitted code and splits the model into smaller pieces so that multiple frameworks can be used at the same time (model parallelism).
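To make the strategy concept more concrete, here is a minimal, framework-free sketch of what a FedAvg-style strategy does in every round: send the current global parameters to the clients, collect their locally trained parameters together with their number of training examples, and average them weighted by that count. FedAvg is Flower's default strategy, but the functions below (`local_train`, `fedavg`, `run_rounds`) and the toy one-parameter "training" step are illustrative assumptions, not Flower's API.

```python
from typing import List, Tuple

Params = List[float]


def fedavg(results: List[Tuple[Params, int]]) -> Params:
    """Average client parameters, weighted by each client's number of examples."""
    total = sum(n for _, n in results)
    dim = len(results[0][0])
    return [sum(p[i] * n for p, n in results) / total for i in range(dim)]


def local_train(global_params: Params, data: List[float], lr: float = 0.5) -> Tuple[Params, int]:
    """Hypothetical client step: move a single parameter toward the local data mean."""
    mean = sum(data) / len(data)
    w = global_params[0]
    return [w + lr * (mean - w)], len(data)


def run_rounds(client_data: List[List[float]], num_rounds: int = 3) -> Params:
    """Simulate the server loop: distribute, train locally, aggregate, repeat."""
    params: Params = [0.0]
    for _ in range(num_rounds):
        # 1. every client trains the current global model on its own data
        results = [local_train(params, data) for data in client_data]
        # 2. the strategy aggregates the returned updates into the next global model
        params = fedavg(results)
    return params
```

Weighting by example count keeps clients with large local datasets from being outvoted by small ones. A real Flower strategy additionally handles client sampling, failed clients and evaluation rounds.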

And here we have the ideal touchpoint between Blossom and Flower.

Combine Blossom and Flower and build a Federated Data Science NLP Stack

How do you build a chatbot system that serves multiple functions and customers across the world, as a bank needs to? A chatbot stack typically combines NLP with multiple data sources to provide natural communication between humans and machines. The demand for human-machine interaction and natural, human-style communication has increased considerably, and Gartner's forecasts are a testament to it.

"Natural language processing is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data" (Wikipedia).

The typical infrastructure we have to take into account looks like an overgrown forest: multiple data sources, ranging from data warehouses and RDBMS systems to fairly closed sources such as financial transaction stores, customer bank data and credit scores. Most of these sources are not the most modern, and some do not even offer connection points, like DWH systems that typically run at 90+% utilization and leave little headroom for additional workloads.

This is where Blossom comes into play. With Blossom we can connect to each of those systems (if desired and needed) and use already available data processing frameworks and engines like Spark, Kafka or Flink (and their commercial counterparts) without blowing up the engineering team.
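The idea behind using several engines at once, and behind the query optimizer mentioned earlier, can be illustrated with a toy cost model: each step of a linear pipeline has an estimated cost on each engine, moving data between engines costs extra, and dynamic programming picks the cheapest assignment. This is a simplified sketch under assumed costs, not the actual Blossom or Apache Wayang optimizer; the engine names and cost figures are hypothetical.

```python
from typing import Dict, List, Tuple


def choose_platforms(costs: List[Dict[str, float]], transfer_cost: float) -> Tuple[float, List[str]]:
    """Pick one engine per pipeline step, minimizing compute plus data-movement cost.

    costs[i][engine] is the (hypothetical) estimated cost of step i on that engine;
    transfer_cost is charged whenever consecutive steps run on different engines.
    """
    # best[engine] = (total cost, plan) of the cheapest plan whose last step runs there
    best: Dict[str, Tuple[float, List[str]]] = {
        engine: (c, [engine]) for engine, c in costs[0].items()
    }
    for step in costs[1:]:
        new_best: Dict[str, Tuple[float, List[str]]] = {}
        for engine, c in step.items():
            candidates = []
            for prev_engine, (prev_cost, prev_plan) in best.items():
                move = 0.0 if prev_engine == engine else transfer_cost
                candidates.append((prev_cost + move + c, prev_plan + [engine]))
            new_best[engine] = min(candidates, key=lambda t: t[0])
        best = new_best
    return min(best.values(), key=lambda t: t[0])


# Hypothetical estimates: filter, large join, map to vectors
pipeline = [
    {"rdbms": 1.0, "spark": 5.0},
    {"rdbms": 20.0, "spark": 4.0},
    {"rdbms": 8.0, "spark": 2.0},
]
print(choose_platforms(pipeline, transfer_cost=3.0))  # -> (10.0, ['rdbms', 'spark', 'spark'])
```

The transfer penalty keeps the toy optimizer from bouncing between engines for tiny gains: with these numbers the filter stays next to the data in the RDBMS, while the heavy join and the vectorization move to Spark.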

Now the fun part with Flower: we plug Flower into Blossom, and voilà, problem solved! The architecture could look like this:




To connect Blossom with Flower we just need a few lines of code:

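import blossom as bls
import flwr as fl
import tensorflow as tf

# Blossom context for federated execution
context = bls.context(env="federated")

# Read and filter the transactions (transactionFilter is user-defined)
transactions = (
    context.read("url to transaction")
    .filter(transactionFilter)
)

# Join customers with their transactions and convert the result into
# NumPy input for Flower (customerFilter and convertToVector are user-defined)
input_flower = (
    context.read("url to customer table")
    .filter(customerFilter)
    .join(transactions)
    .map(convertToVector)
    .toNumpy()
)

# start_server and start_numpy_client block until training finishes,
# so they are passed as callables instead of being called inline.
# FlowerImplementedClient is a user-defined fl.client.NumPyClient subclass.
context.runFlower(
    input_flower,
    server=lambda: fl.server.start_server(
        server_address="0.0.0.0:8080", config={"num_rounds": 3}
    ),
    client=lambda: fl.client.start_numpy_client(
        server_address="127.0.0.1:8080",  # replace with the server's address
        client=FlowerImplementedClient(),
    ),
    flowerEngine=tf,
)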
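For readers who want to see what the Blossom pipeline above computes, here is the same filter, join and vectorize dataflow written in plain Python over in-memory records. The record fields (`customer_id`, `amount`, `age`), the filter rules, the inner join on `customer_id` and the chosen features are all assumptions for illustration, since the snippet above leaves `transactionFilter`, `customerFilter` and `convertToVector` undefined.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def transaction_filter(t: Record) -> bool:
    # hypothetical rule: keep only positive amounts
    return t["amount"] > 0


def customer_filter(c: Record) -> bool:
    # hypothetical rule: keep adult customers
    return c["age"] >= 18


def convert_to_vector(customer: Record, txs: List[Record]) -> List[float]:
    # hypothetical features: age, number of transactions, total amount
    return [float(customer["age"]), float(len(txs)), sum(t["amount"] for t in txs)]


def build_features(customers: List[Record], transactions: List[Record]) -> List[List[float]]:
    # filter transactions and group them by the join key
    by_customer: Dict[Any, List[Record]] = {}
    for t in filter(transaction_filter, transactions):
        by_customer.setdefault(t["customer_id"], []).append(t)
    # filter customers, inner-join with their transactions, map to feature vectors
    return [
        convert_to_vector(c, by_customer[c["customer_id"]])
        for c in filter(customer_filter, customers)
        if c["customer_id"] in by_customer
    ]
```

The inner join drops customers without any remaining transactions; the original pipeline does not state which join type it uses, so treat that as one more assumption.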


We call this stack Combined NLP Federated Data Services. Flower takes care of the chatbot communication, the ML model and its execution with TensorFlow (TF) or any other supported ML framework, and delivers the outcome to Blossom. Blossom then enriches the model with information from deeper backend systems and hands the output back to Flower, which runs the next iteration with TF.
This architecture is the backbone for an extensive NLP system that uses the best tools available for Federated Learning. The stack is future-proof because both frameworks were built with pluggable extension support from the beginning: whatever comes next can be added as a plugin. Even quantum computing AI training could be adopted this way.

Conclusion:
Building cutting-edge AI, machine learning and NLP stacks is no longer an area that only the biggest data companies in the world can handle. With this approach we support data sustainability and a high level of data privacy, and enable digital transformation on a completely new level.

[1] https://cacm.acm.org/magazines/2020/12/248796-federated-learning-for-privacy-preserving-ai/fulltext
[2] https://wayang.apache.org/documentation.html
[3] https://github.com/adap/flower
[4] https://flower.dev/docs/implementing-strategies.html
