Kafka

  • Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
  • REST APIs are request/response based and comparatively slow, so they are a poor fit for continuous streaming.
  • Database throughput (IOPS) is low, so a database can't handle a large volume of streaming data that is continuously updating.
  • Kafka follows an event-driven architecture and the publish-subscribe (pub-sub) model.
  • Because of its high throughput, Kafka is used for live tracking and streaming.
  • By default, a single consumer consumes data from all partitions of a topic. If a new consumer joins the same consumer group, Kafka divides the load between them.
  • A consumer can consume data from more than one partition.
  • Within a consumer group, a partition cannot have more than one consumer. When a new consumer joins, Kafka rebalances automatically: it takes some partitions away from the existing consumers and assigns them to the new consumer.
  • So if a topic has four partitions, a consumer group cannot have more than four active consumers for it; any extra consumer sits idle.
    • To overcome this, that is, to let a fifth consumer read from the topic, we need to create another consumer group.
  • A partition can therefore have more than one consumer, as long as each of those consumers belongs to a different consumer group.
  • ZooKeeper manages various coordination operations in Kafka.
  • A consumer must be given a consumer group (its group.id) when it is created in order to subscribe to topics.
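
The consumer group rules above can be illustrated with a small simulation. This is only a sketch in plain Python, not the Kafka client: the function name assign_partitions and the consumer and group names are hypothetical, and the round-robin split is an assumption made for simplicity (real Kafka assignors such as range or sticky may distribute partitions differently).

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of ONE group, round-robin.

    Each partition goes to exactly one consumer of the group; consumers
    beyond the partition count receive nothing and stay idle.
    (Hypothetical helper, not part of any Kafka library.)
    """
    if not consumers:
        return {}
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# A single consumer reads every partition.
one = assign_partitions(partitions, ["c1"])

# A second consumer joins the same group: the partitions are split.
two = assign_partitions(partitions, ["c1", "c2"])

# Five consumers but only four partitions: the fifth gets nothing.
five = assign_partitions(partitions, ["c1", "c2", "c3", "c4", "c5"])

# A separate group receives its own independent assignment of all partitions.
groups = {
    "group-a": assign_partitions(partitions, ["a1", "a2"]),
    "group-b": assign_partitions(partitions, ["b1"]),
}

print(one, two, five, groups, sep="\n")
```

In this sketch every partition has exactly one owner per group, while the same partition can still be read by consumers from different groups.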
  • Zookeeper
    • ZooKeeper is a coordination service that lets us store data as key-value pairs in a reliable way.
    • It solves the problem of handling master failure.
    • It monitors the health of partitions.
    • Internally, ZooKeeper maintains a hash map, which is of two types:
      • Ephemeral HashMap
        • The value cannot be changed by a non-owner machine.
        • A key-value pair is added by its owner machine only.
        • The owner machine has to send a heartbeat every 5 or 10 seconds. If the heartbeats stop, its key-value pair is deleted.
        • Other machines set up a watch on the key-value pair to be notified of changes.
        • Once the master dies, the slaves are notified; one of them becomes the new master and takes control of the key in the hash map again.
      • Regular HashMap
    • ZooKeeper runs on a number of machines. One of them is the leader, through which all writes are performed.
    • To solve the split-brain problem, we always run an odd number of machines in ZooKeeper so that a majority can always be formed. For example, with three machines, any two form a majority.
    • Leader election is not time sensitive.
    • A write is considered successful once a majority of the machines acknowledge it.
    • Reads can be served by any machine; they are master independent and do not need a majority acknowledgement.
    • Generally, ZooKeeper clusters start with seven machines.
    • ZooKeeper tracks which machine is the master and which are slaves.
    • ZooKeeper manages the list of brokers for our cluster.
    • ZooKeeper helps perform leader election for partitions.
    • ZooKeeper sends notifications to Kafka when something changes, for example a new topic, a broker dying, a broker coming up, or a topic being deleted.
    • Kafka 2.x can't work without ZooKeeper.
    • Kafka 3.x can work without ZooKeeper (KIP-500).
      • It uses Kafka Raft (KRaft) instead.
    • Kafka 4.x will not have ZooKeeper.
    • ZooKeeper by design operates on an odd number of servers (1, 3, 5, 7).
    • ZooKeeper has one leader that handles writes; the rest of the servers are followers that handle reads.
    • With Kafka > 0.10, ZooKeeper does not store consumer offsets.
    • Conduktor and ZooKeeper
      • Conduktor uses ZooKeeper for administrative operations on older clusters.
      • These operations are detected automatically.
      • This is mainly needed for reassigning partitions in Kafka.
        • Some of these admin operations have since been added as Kafka APIs.
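
The majority rule from the ZooKeeper notes above (odd cluster sizes, writes acknowledged by a majority, no split brain) comes down to a little arithmetic. The following is a minimal sketch in plain Python; the function names quorum_size, tolerated_failures, write_succeeds and sides_with_quorum are hypothetical helpers written for illustration and are not part of ZooKeeper or any client library.

```python
def quorum_size(ensemble_size):
    """Smallest number of machines that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many machines can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


def write_succeeds(ensemble_size, acks):
    """A write counts as successful once a majority has acknowledged it."""
    return acks >= quorum_size(ensemble_size)


def sides_with_quorum(ensemble_size, side_a_size):
    """After a network split into sides A and B, list the sides that still
    hold a majority. At most one side can, which is what rules out split brain."""
    side_b_size = ensemble_size - side_a_size
    needed = quorum_size(ensemble_size)
    sides = (("A", side_a_size), ("B", side_b_size))
    return [name for name, size in sides if size >= needed]


for n in (1, 3, 4, 5, 7):
    print(f"machines={n} quorum={quorum_size(n)} "
          f"tolerated_failures={tolerated_failures(n)}")

print(sides_with_quorum(5, 3))  # split of 5 machines into 3 and 2
print(sides_with_quorum(4, 2))  # split of 4 machines into 2 and 2
```

Under this arithmetic a cluster of four machines tolerates no more failures than a cluster of three, which is one reason odd sizes are preferred.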