Apache Kafka

Apache Kafka is an open source, distributed publish-subscribe messaging system,
and its main characteristics include:

-Persistent messaging
-High throughput
-Distributed: supports message partitioning over Kafka servers and
distributes consumption over a cluster of consumer machines
while maintaining per-partition ordering
        Producer(frontend, services, proxies, adapters, other producers)
           |
         Kafka ---------> Zookeeper
           |
        Consumer(realtime, NoSQL, Hadoop, Warehouses)

Install Kafka

**Download Kafka**
[root@localhost opt]# wget https://www.apache.org/dyn/closer.cgi/incubator/kafka/kafka-0.7.2-incubating/kafka-0.7.2-incubating-src.tgz
**Extract the downloaded Kafka package** (the remaining commands assume a 0.8.0-beta1 build, so download the release that matches the archive extracted here)
[root@localhost opt]# tar xzf kafka-0.8.0-beta1-src.tgz

Set Up a Kafka Cluster

                         ZooKeeper
               ------------------------------
               |                            |
               |                            |
             Producer --> Kafka Broker --> Consumer

ZooKeeper
Kafka provides a default, simple ZooKeeper configuration file used for launching a single local ZooKeeper instance. ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace of data registers, known as znodes (a short sketch for inspecting this namespace follows the setup commands below).

**Start the ZooKeeper server**
[root@localhost kafka-0.8]# bin/zookeeper-server-start.sh config/zookeeper.properties
**Data directory where the ZooKeeper snapshot is stored (config/zookeeper.properties)**
  ## dataDir=/tmp/zookeeper
**The port on which ZooKeeper listens for client requests (config/zookeeper.properties)**
  ## clientPort=2181
**Consumer group id (config/consumer.properties)**
  ## group.id=test-consumer-group
**ZooKeeper connection string (config/consumer.properties)**
  ## zookeeper.connect=localhost:2181
**Start the Kafka broker**
[root@localhost kafka-0.8]# bin/kafka-server-start.sh config/server.properties
**Create a Kafka Topic**
[root@localhost kafka-0.8]# bin/kafka-create-topic.sh --zookeeper localhost:2181 --replica 1 --partition 1 --topic kafkatopic
**Start a producer for sending messages**
[root@localhost kafka-0.8]# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kafkatopic
**Start a consumer for consuming messages**
[root@localhost kafka-0.8]# bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic kafkatopic --from-beginning
**Create a replicated Kafka topic** (a replication factor of 2 needs a second broker; typically one is started on port 9093 from a copy of server.properties with a different broker.id, port, and log.dir)
[root@localhost kafka-0.8]# bin/kafka-create-topic.sh --zookeeper localhost:2181 --replica 2 --partition 2 --topic othertopic
**Start a producer for sending messages**
[root@localhost kafka-0.8]# bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093 --topic othertopic
**Start a consumer for consuming messages**
[root@localhost kafka-0.8]# bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic othertopic --from-beginning
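
At this point, ZooKeeper holds the cluster's coordination data in the hierarchical namespace described earlier. The sketch below uses the plain ZooKeeper Java client to peek at it; the znode paths (/brokers/ids for registered brokers, /brokers/topics for topics) are the ones a 0.8 broker is understood to register, and the class name ZkPeek is only illustrative.

    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ZkPeek {
        public static void main(String[] args) throws Exception {
            // Wait until the session to the local ZooKeeper instance is established
            final CountDownLatch connected = new CountDownLatch(1);
            ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, new Watcher() {
                public void process(WatchedEvent event) {
                    if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                        connected.countDown();
                    }
                }
            });
            connected.await();

            // Brokers register under /brokers/ids and topics under /brokers/topics
            System.out.println("brokers: " + zk.getChildren("/brokers/ids", false));
            System.out.println("topics:  " + zk.getChildren("/brokers/topics", false));
            zk.close();
        }
    }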

Kafka Design Fundamentals

  • Message Compression in Kafka
  • Cluster mirroring in Kafka
  • Replication in Kafka

The fundamental backbone of Kafka is caching messages and storing them on the file system. In Kafka, data is immediately written to the OS page cache; caching and flushing of data to disk are configurable.
By default, producers and consumers work on the traditional push-and-pull model: producers push messages to a Kafka broker, and consumers pull messages from the broker.
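
As an illustration of the pull side, the sketch below shows a minimal high-level consumer loop; it assumes the 0.8 Java consumer API and reuses the topic and consumer group from the setup section (the class name SimpleHLConsumer is only illustrative).

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class SimpleHLConsumer {
        public static void main(String[] args) {
            // Consumers locate brokers and track offsets through ZooKeeper in 0.8
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181");
            props.put("group.id", "test-consumer-group");
            ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // Ask for a single stream (one consumer thread) for the topic created earlier
            Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
            topicCountMap.put("kafkatopic", 1);
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                consumer.createMessageStreams(topicCountMap);

            // Pull messages from the broker as they arrive and print their payloads
            ConsumerIterator<byte[], byte[]> it = streams.get("kafkatopic").get(0).iterator();
            while (it.hasNext()) {
                System.out.println(new String(it.next().message()));
            }
        }
    }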

Kafka Partitioning Strategy
The decision about how a message is partitioned is made by the producer (see the SimplePartitioner example later in this section), and the broker stores messages in the same order in which they arrive. The number of partitions can be configured per topic on the Kafka broker.

Kafka Replication
-Synchronous replication: a producer first identifies the lead replica from ZooKeeper and publishes the message. As soon as the message is published, it is written to the log of the lead replica, and all the followers of the lead start pulling the message; by using a single channel, the order of messages is ensured. Each follower replica sends an acknowledgement to the lead replica once the message is written to its respective log. Once replication is complete and all expected acknowledgements have been received, the lead replica sends an acknowledgement to the producer.
-Asynchronous replication: the only difference in this mode is that, as soon as the lead replica writes the message to its local log, it sends the acknowledgement to the message client and does not wait for acknowledgements from the follower replicas.
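
On the producer side, the choice between these two behaviours is requested through the request.required.acks property that also appears in the producer examples below. A minimal sketch, assuming the 0.8 producer's documented values:

    java.util.Properties props = new java.util.Properties();
    //  "0" - do not wait for any broker acknowledgement
    //  "1" - asynchronous replication: acknowledged once the lead replica has written the message to its log
    // "-1" - synchronous replication: acknowledged only after all in-sync follower replicas have the message
    props.put("request.required.acks", "-1");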

Writing Producers

Import the required classes:
    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    //Define properties:
    Properties props = new Properties();
    //Specify the broker <node:port> list that the producer needs to connect to
    props.put("metadata.broker.list", "localhost:9092");
    //Specify the serializer class used while preparing the message for transmission from the producer to the broker
    props.put("serializer.class", "kafka.serializer.StringEncoder");
    //Wait for an acknowledgement from the lead replica before treating the send as complete
    props.put("request.required.acks", "1");
    ProducerConfig config = new ProducerConfig(props);
    Producer<Integer, String> producer = new Producer<Integer, String>(config);

    //Build the message and send it to the topic created earlier
    String topic = "kafkatopic";
    String messageStr = "Hello from Java Producer";
    KeyedMessage<Integer, String> data = new KeyedMessage<Integer, String>(topic, messageStr);
    producer.send(data);
    producer.close();
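
With the console consumer from the setup section still attached to kafkatopic, the "Hello from Java Producer" message should show up there once this code runs against the broker on localhost:9092.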

Implement the Partitioner Class

package test.kafka;

import kafka.producer.Partitioner;

public class SimplePartitioner implements Partitioner<Integer> {
    // Route each message to (key % numPartitions); non-positive keys fall back to partition 0
    public int partition(Integer key, int numPartitions) {
        int partition = 0;
        int iKey = key;
        if (iKey > 0) {
            partition = iKey % numPartitions;
        }
        return partition;
    }
}

Building and sending the message

package test.kafka;

import java.util.Properties;
import java.util.Random;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class MultiBrokerProducer {
    private static Producer<Integer, String> producer;
    private static Properties props = new Properties();

    public MultiBrokerProducer() {
        // Connect to both brokers and route keyed messages with the custom partitioner
        props.put("metadata.broker.list", "localhost:9092,localhost:9093");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("partitioner.class", "test.kafka.SimplePartitioner");
        props.put("request.required.acks", "1");
        ProducerConfig config = new ProducerConfig(props);
        producer = new Producer<Integer, String>(config);
    }

    // Expects the topic name (for example, othertopic created above) as the only command-line argument
    public static void main(String[] args) {
        MultiBrokerProducer sp = new MultiBrokerProducer();
        Random rnd = new Random();
        String topic = args[0];
        for (long messCount = 0; messCount < 10; messCount++) {
            Integer key = rnd.nextInt(255);
            String msg = "This message is for key - " + key;
            KeyedMessage<Integer, String> data1 =
                new KeyedMessage<Integer, String>(topic, key, msg);
            producer.send(data1);
        }
        producer.close();
    }
}
