# Proactive Communication

Inter-site communication is a major performance bottleneck in a decentralized system. Because the interconnect bandwidth between two sites can be multiple orders of magnitude lower than the bandwidth inside a centralized system, reducing communication overhead is key to Yotta's performance.

Using pipeline parallelism, Yotta overlaps GPU computation with communication to hide the communication overhead. The following figure shows an example of traditional pipeline parallelism running a transformer block of an LLM on two sites (Site A and Site B). The communication between the two sites can be partially overlapped with the GPU computation on each site. However, when the GPU computation on a site is short, there is too little work to overlap with, and the communication overhead cannot be effectively hidden. To address this problem, Yotta uses a technique called proactive communication.

<figure><img src="/files/mYKHQFQLXmaRySULTTLv" alt=""><figcaption></figcaption></figure>

This technique decomposes the original computation and communication operations into finer-grained ones and transfers each data shard as soon as it is ready, instead of waiting until all data shards are ready. The transferred data can be either operands or computation results. The following figure depicts the resulting pipeline, with the computation decomposed into two finer-grained operations, running a transformer block of an LLM on two sites (Site A and Site B).

<figure><img src="/files/Iem4Hci6u5kFz4JSzYHF" alt=""><figcaption></figcaption></figure>
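The idea of sending each shard as soon as it is ready can be sketched in a few lines of Python. This is a minimal illustration, not Yotta's implementation: `proactive_pipeline`, `compute`, and `send` are hypothetical names, a background thread stands in for the inter-site link, and a list append stands in for the network transfer.

```python
import queue
import threading


def proactive_pipeline(inputs, compute, send):
    """Compute shards one by one and hand each to a sender as soon as it is ready.

    `compute` and `send` are hypothetical callables supplied by the caller:
    `compute` produces one result shard, `send` transfers one shard.
    """
    ready = queue.Queue()
    done = object()  # sentinel telling the sender thread to stop

    def sender():
        # Runs concurrently with the compute loop below, so the transfer of
        # shard i can overlap with the computation of shard i + 1.
        while True:
            shard = ready.get()
            if shard is done:
                break
            send(shard)

    worker = threading.Thread(target=sender)
    worker.start()
    try:
        for x in inputs:
            # The shard is queued for transfer immediately, without waiting
            # for the remaining shards to be computed.
            ready.put(compute(x))
    finally:
        ready.put(done)
        worker.join()


# Usage: two shards, mirroring the two fine-grained operations in the figure.
received = []
proactive_pipeline([1, 2], compute=lambda x: x * x, send=received.append)
print(received)  # [1, 4]
```

A single sender thread and a FIFO queue keep the shards in order, which matters when the receiving site consumes them in sequence.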

The result $$Z\_{1}$$ is transferred as soon as it is ready, so its transfer overlaps with the computation that generates $$Z\_{2}$$, as shown below. Because this data transfer is hidden behind computation, proactive communication reduces the overall execution time.

<figure><img src="/files/ZDexU3ABHf6Hk22KfelQ" alt=""><figcaption></figcaption></figure>
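The time saving can be made concrete with a simple timing model. The sketch below assumes one compute stream and one link that transfers one shard at a time; the function names and the time units are illustrative assumptions, not measurements from Yotta.

```python
def baseline_time(compute, transfer):
    """All shards are computed first, then all of them are transferred."""
    return sum(compute) + sum(transfer)


def proactive_time(compute, transfer):
    """Each shard is transferred as soon as it is computed and the link is free."""
    compute_done = 0  # time at which the current shard finishes computing
    link_free = 0     # time at which the link finishes its current transfer
    for c, t in zip(compute, transfer):
        compute_done += c
        # A transfer starts once its shard exists and the link is idle.
        link_free = max(compute_done, link_free) + t
    return link_free


# Hypothetical costs for Z1 and Z2, in arbitrary time units.
compute = [4, 4]
transfer = [3, 3]
print(baseline_time(compute, transfer))   # 14
print(proactive_time(compute, transfer))  # 11
```

In this example the transfer of $$Z\_{1}$$ (3 units) is fully hidden behind the computation of $$Z\_{2}$$, so only the last transfer remains exposed. With shorter computation, for example `compute = [1, 1]` and the same transfers, the model gives 7 instead of 8: there is less computation to hide the transfer behind, so the benefit shrinks.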

