- Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
- REST APIs are comparatively slow and are not well suited to streaming.
- A database's IOPS throughput is limited, so it can't handle a large volume of continuously updating streaming data.
- Kafka follows an event-driven architecture and the publish-subscribe (pub-sub) model.
- Kafka is used for live tracking and streaming because of its high throughput.
- By default, a consumer consumes data from all partitions. If a new consumer joins, Kafka divides the load among the consumers.
- A consumer can consume data from more than one partition.
- Within a consumer group, a partition cannot have more than one consumer. If a new consumer arrives, Kafka automatically rebalances the partitions: it takes partitions away from the existing consumers and assigns them to the new one.
- So if a topic has four partitions, a single consumer group cannot have more than four active consumers; any additional consumer stays idle.
- To overcome this limit, i.e. to let a fifth consumer read the topic, we need to create another consumer group.
- A partition can therefore be read by more than one consumer, as long as each of those consumers belongs to a different consumer group.
- ZooKeeper manages various operations in Kafka.
- A consumer must be added to a consumer group when it is created.
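The partition rules above can be sketched as a small simulation. This is a simplified round-robin assignment written only for illustration: `assign_partitions` and the consumer names are hypothetical, and Kafka's own assignors (range, round-robin, sticky) handle many more cases than this.

```python
def assign_partitions(partitions, consumers):
    """Simplified round-robin assignment inside ONE consumer group (illustrative only).

    Every partition goes to exactly one consumer of the group; when there are
    more consumers than partitions, the extra consumers get nothing and sit idle.
    """
    assignment = {consumer: [] for consumer in consumers}
    if not consumers:
        return assignment
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]  # a topic with four partitions

# A single consumer in the group reads all partitions.
print(assign_partitions(partitions, ["c1"]))
# {'c1': [0, 1, 2, 3]}

# A second consumer joins: the partitions are rebalanced between the two.
print(assign_partitions(partitions, ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1, 3]}

# Five consumers in one group: c5 receives no partition.
print(assign_partitions(partitions, ["c1", "c2", "c3", "c4", "c5"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [3], 'c5': []}

# Every consumer group receives all partitions independently, so partition 0
# is read by c1 in group-a and by c5 in group-b at the same time.
groups = {"group-a": ["c1", "c2", "c3", "c4"], "group-b": ["c5"]}
for group, members in groups.items():
    print(group, assign_partitions(partitions, members))
```

The last loop shows why a second group is the way to add a fifth reader: inside one group each partition has a single owner, but across groups the same partition is delivered to each group separately.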
- ZooKeeper
- ZooKeeper is a coordination service that lets us store data as key-value pairs in a reliable way.
- It solves the problem of handling master failure.
- It monitors the health of partitions.
- Internally, ZooKeeper maintains a hash map, which is of two types:
- Ephemeral HashMap
- A value cannot be changed by a non-owner machine.
- A key-value pair can be added only by its owner machine.
- The owner machine has to send a heartbeat every 5 or 10 seconds. If the heartbeats stop, its key-value pair is deleted.
- Other machines set up watches on changes to the key-value pair.
- Once the master dies, the slaves are notified; one of them becomes the new master and takes control of the HashMap.
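The ephemeral HashMap behaviour listed above can be modelled in a few lines. This is a toy sketch, not the ZooKeeper client API: `EphemeralStore`, its methods and the node names are hypothetical, and time is passed in explicitly so the run is deterministic. It follows the rules above: only the owner writes its pair, missed heartbeats delete the pair, and a watcher reacts by promoting a slave to master.

```python
class EphemeralStore:
    """Toy model of the 'ephemeral HashMap' (hypothetical class, not ZooKeeper's API)."""

    def __init__(self, session_timeout=10.0):
        self.session_timeout = session_timeout  # seconds without a heartbeat before expiry
        self.data = {}            # key -> (owner, value)
        self.last_heartbeat = {}  # owner -> time of last heartbeat
        self.watchers = []        # callbacks notified when a key is deleted

    def put(self, key, value, owner, now):
        current = self.data.get(key)
        if current is not None and current[0] != owner:
            # Only the owner machine may change its key-value pair.
            raise PermissionError(f"{key!r} is owned by {current[0]}")
        self.data[key] = (owner, value)
        self.last_heartbeat[owner] = now

    def heartbeat(self, owner, now):
        self.last_heartbeat[owner] = now

    def watch(self, callback):
        self.watchers.append(callback)

    def expire_sessions(self, now):
        # Owners whose last heartbeat is older than the timeout are considered dead.
        dead = {o for o, t in self.last_heartbeat.items() if now - t > self.session_timeout}
        for o in dead:
            del self.last_heartbeat[o]
        for key, (owner, _) in list(self.data.items()):
            if owner in dead:
                del self.data[key]             # the dead owner's pair disappears
                for callback in self.watchers:
                    callback(key, owner, now)  # watchers are notified


store = EphemeralStore(session_timeout=10.0)
standby = ["node-b", "node-c"]  # hypothetical slave machines, in promotion order


def on_delete(key, owner, now):
    # When the master's key disappears, the next slave claims it.
    if key == "/master" and standby:
        new_master = standby.pop(0)
        store.put("/master", new_master, owner=new_master, now=now)


store.watch(on_delete)
store.put("/master", "node-a", owner="node-a", now=0.0)
store.heartbeat("node-a", now=5.0)
store.expire_sessions(now=12.0)  # 7 s since last heartbeat: node-a is still master
print(store.data["/master"])     # ('node-a', 'node-a')
store.expire_sessions(now=20.0)  # 15 s of silence: node-a expires, node-b takes over
print(store.data["/master"])     # ('node-b', 'node-b')
```

In real ZooKeeper the same idea is expressed with ephemeral znodes tied to a client session and one-time watches; the class above only mirrors the rules described in these notes.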
- Regular HashMap
- ZooKeeper runs on multiple machines. One of them is the leader, through which writes are performed.
- To solve the split-brain problem, ZooKeeper always runs on an odd number of machines so that there is always a majority. For example, with three machines, two always form a majority.
- Leader selection is not time-sensitive.
- Writes are considered successful if a majority of the machines acknowledge them.
- Reads can go to any machine and are master-independent. Because a read is answered by a single server, it may return slightly stale data unless the client calls sync first.
- Generally, ZooKeeper clusters start with seven machines.
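The majority rule explains both the split-brain protection and the preference for odd ensemble sizes. The helper names below are illustrative, not part of ZooKeeper; they only encode "a strict majority of n servers".

```python
def majority(n):
    """Smallest number of servers that is a strict majority of an ensemble of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """Servers that can fail while the survivors still form a majority."""
    return n - majority(n)


def write_committed(acks, n):
    """A write is successful once a majority of the n servers acknowledge it."""
    return acks >= majority(n)


for n in range(1, 8):
    print(f"{n} servers: majority = {majority(n)}, tolerates {tolerated_failures(n)} failure(s)")

# Split brain: when the network cuts an ensemble into two disjoint groups,
# at most one side can still reach a majority, so at most one side keeps a leader.
n = 5
for left in range(n + 1):
    right = n - left
    assert not (left >= majority(n) and right >= majority(n))

print(write_committed(acks=2, n=3))  # True: 2 of 3 is a majority
print(write_committed(acks=1, n=3))  # False
```

The printed table shows that 4 servers tolerate one failure, exactly like 3, and 6 tolerate two, like 5: adding a server to reach an even count costs a machine without adding fault tolerance.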
- ZooKeeper tracks which machine is the master and which are the slaves.
- ZooKeeper manages the list of brokers for our cluster.
- ZooKeeper helps perform leader election for partitions.
- ZooKeeper sends notifications to Kafka in case of changes, for example a new topic, a broker dying, a broker coming up, deleted topics, etc.
- Kafka 2.x can't work without ZooKeeper.
- Kafka 3.x can work without ZooKeeper (KIP-500).
- It uses Kafka Raft (KRaft) instead.
- Kafka 4.x does not include ZooKeeper.
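- By design, ZooKeeper operates on an odd number of servers (1, 3, 5, 7).
- ZooKeeper has a leader that handles writes; the rest of the servers are followers that handle reads.
- ZooKeeper does not store consumer offsets with Kafka > 0.10; they are stored in Kafka's internal __consumer_offsets topic.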
- Conduktor and ZooKeeper
- Conduktor uses ZooKeeper for administrative operations on older clusters.
- We detect these operations automatically.
- This is mainly for reassigning partitions in Kafka.
- Some of these admin operations have been added as Kafka APIs.
Example
- https://github.com/gauravmatta/springmvc/tree/1e8408dcd02e53b54c3cbae3dd1cd20f9874e5f9/kafkacluster
- Producer Example
- Consumer Example
- Check Kafka Service Health