Posted in: Bigdata, Other

Running a Word Count with the CDH Example Programs

WordCount is the classic "Hello World" program for Hadoop. CDH ships with a wordcount example that can be used to check whether a deployment succeeded.

# Decompress the complete works of Shakespeare, prepared in advance
[sujx@elephant ~]$ gzip -d shakespeare.txt.gz

# Upload the file to the Hadoop file system
[sujx@elephant ~]$ hdfs dfs -mkdir /user/sujx/input
[sujx@elephant ~]$ hdfs dfs -put shakespeare.txt /user/sujx/input

# List the example programs that are available
[sujx@elephant ~]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

# Run the MapReduce job; the output directory is created automatically
[sujx@elephant ~]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /user/sujx/input/shakespeare.txt /user/sujx/output/

# List the output files
[sujx@elephant ~]$ hdfs dfs -ls /user/sujx/output
Found 4 items
-rw-r--r--   3 sujx supergroup          0 2020-03-09 02:47 /user/sujx/output/_SUCCESS
-rw-r--r--   3 sujx supergroup     238211 2020-03-09 02:47 /user/sujx/output/part-r-00000
-rw-r--r--   3 sujx supergroup     236617 2020-03-09 02:47 /user/sujx/output/part-r-00001
-rw-r--r--   3 sujx supergroup     238668 2020-03-09 02:47 /user/sujx/output/part-r-00002
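
The listing shows one part-r-* file per reduce task, so the counts are spread over three files. Below is a minimal sketch for merging the parts and printing the most frequent tokens. It assumes the output directory has first been copied to local disk (for example with hdfs dfs -get /user/sujx/output ./output) and that each line is a word, a tab, and a count. The function name top_words is hypothetical.

```shell
# top_words DIR N
# Merge all part-r-* files in a local directory and print the N most
# frequent tokens. Assumes each line is "<word><TAB><count>".
top_words() {
  tab=$(printf '\t')
  cat "$1"/part-r-* |
    LC_ALL=C sort -t "$tab" -k2,2nr -k1,1 |
    head -n "$2"
}

# Example (hypothetical local copy of the job output):
# top_words ./output 10
```

The _SUCCESS marker is not matched by the part-r-* pattern, so it does not need special handling.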

# View the end of one output file
[sujx@elephant ~]$ hdfs dfs -tail /user/sujx/output/part-r-00000
.       3
writhled        1
writing,        4
writings.       1
writs   1
written,        3
wrong   112
wrong'd-        1
wrong-should    1
wrong.  39
wrong:  1
wronged 11
wronged.        3
wronger,        1
wronger;        1
wrongfully?     1
wrongs  40
wrongs, 9
wrongs; 9
wrote?  1
wrought,        4
…………
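
The tail output counts wrong, wrong. and wrong: as separate keys, which suggests that the example splits lines on whitespace only and keeps punctuation attached to the word. A rough local equivalent can be handy for spot-checking a small file without a cluster. This is only a sketch under that assumption; count_words is a hypothetical helper and not part of CDH.

```shell
# count_words: read text on stdin, split it on whitespace, and print
# "<token><TAB><count>" lines sorted by token in byte order.
count_words() {
  tr -s '[:space:]' '\n' |
    awk 'NF { n[$0]++ } END { for (w in n) printf "%s\t%d\n", w, n[w] }' |
    LC_ALL=C sort
}

# Example:
# count_words < shakespeare.txt | tail
```

Stripping punctuation or lowercasing before counting would merge these variants, but that changes what is being measured, so it is left out here.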
Posted in: Linux, Other, Something, System

A Quick Docker Practice Session

Set up a three-node network environment for practicing Docker operations; the master node stores the images of the private registry.

Node     IP                 Role
master   192.168.174.181    Management node and private registry
node1    192.168.174.180    Node 1
node2    192.168.174.180    Node 2

Installing the Management Node

Installing Docker

[root@master ~]# yum install -y docker
[root@master ~]# fdisk -l
Disk /dev/sda: 10.7 GB, 10737418240 bytes, 20971520 sectors
Disk /dev/sdb: 10.7 GB, 10737418240 bytes, 20971520 sectors

# Here we use the newly added /dev/sdb disk as Docker's storage

[root@master ~]# vim /etc/sysconfig/docker-storage-setup # use :r to read in the storage template file /usr/share/container-storage-setup/container-storage-setup
#STORAGE_DRIVER=overlay2   # comment out the overlay2 storage driver
STORAGE_DRIVER=devicemapper # use the devicemapper storage driver instead
EXTRA_STORAGE_OPTIONS="--storage-opt dm.fs=xfs" # format the storage as xfs
DEVS=/dev/sdb   # use the /dev/sdb disk
CONTAINER_THINPOOL=container-thinpool   # thin-pool storage for containers; this is also the LV name
VG=docker_VG    # name of the VG that holds the storage
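
The settings above are plain shell variable assignments, so the trailing # notes are ordinary shell comments. Below is a small sketch that prints the values that would take effect, which can be checked before running container-storage-setup. It assumes the file can be sourced by a POSIX shell; show_storage_settings is a hypothetical helper name.

```shell
# show_storage_settings FILE
# Source a docker-storage-setup style file in a subshell, so the calling
# shell is not polluted, and print the settings used in this walkthrough.
show_storage_settings() {
  (
    . "$1" || exit 1
    printf 'STORAGE_DRIVER=%s\n' "${STORAGE_DRIVER:-}"
    printf 'DEVS=%s\n' "${DEVS:-}"
    printf 'VG=%s\n' "${VG:-}"
    printf 'CONTAINER_THINPOOL=%s\n' "${CONTAINER_THINPOOL:-}"
  )
}

# Example:
# show_storage_settings /etc/sysconfig/docker-storage-setup
```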

[root@master ~]# container-storage-setup 
INFO: Writing zeros to first 4MB of device /dev/sdb
4+0 records in
4+0 records out
4194304 bytes (4.2 MB) copied, 0.00600853 s, 698 MB/s
INFO: Device node /dev/sdb1 exists.
  Physical volume "/dev/sdb1" successfully created.
  Volume group "docker_VG" successfully created
  Rounding up size to full physical extent 12.00 MiB
  Thin pool volume with chunk size 512.00 KiB can address at most 126.50 TiB of data.
  Logical volume "container-thinpool" created.
  Logical volume docker_VG/container-thinpool changed.
[root@master ~]# vgs
  VG        #PV #LV #SN Attr   VSize   VFree
  centos      1   2   0 wz--n-  <9.00g    0 
  docker_VG   1   1   0 wz--n- <10.00g 6.00g
[root@master ~]# lvs
  LV                 VG        Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root               centos    -wi-ao---- <8.00g                                                    
  swap               centos    -wi-ao----  1.00g                                                    
  container-thinpool docker_VG twi-a-t---  3.97g             0.00   10.29   
# The thin pool was created successfully; start the docker service
[root@master ~]# systemctl enable docker --now

System Configuration

Configuring several registry mirrors allows fast deployment in different network environments.

# Configure the Docker registry mirrors
cat>/etc/docker/daemon.json<<EOF
{
  "registry-mirrors": ["https://dockerhub.azk8s.cn","http://f1361db2.m.daocloud.io","https://d1a0f2854f4b44c2a3b3af4f5425db1a.mirror.swr.myhuaweicloud.com","https://hub-mirror.c.163.com","https://registry.docker-cn.com"],
  "insecure-registries": ["registry:5000"]
}
EOF
[root@master ~]# systemctl daemon-reload && systemctl restart docker
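
The daemon.json above refers to the private registry as registry:5000, so every node has to be able to resolve the name registry. One way is an /etc/hosts entry pointing at the master node (192.168.174.181 in the node table), although the original does not show how this was done. The sketch below adds such an entry only when the name is not already present; ensure_host_entry is a hypothetical helper.

```shell
# ensure_host_entry FILE IP NAME
# Append "IP NAME" to an existing hosts file unless NAME already appears
# as a hostname on a non-comment line.
ensure_host_entry() {
  if ! awk -v h="$3" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }' "$1"; then
    printf '%s %s\n' "$2" "$3" >> "$1"
  fi
}

# Example (assumed mapping, run as root on each node):
# ensure_host_entry /etc/hosts 192.168.174.181 registry
```

Running it twice leaves a single entry, which makes it safe to include in a node setup script.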

# Disable the firewall
[root@master ~]# systemctl disable firewalld.service --now

# Turn off SELinux enforcement (setenforce 0 switches to permissive mode until the next reboot)
setenforce 0

Preparing the Docker Environment

Preparing Images

# Pull a test image
[root@master ~]# docker pull docker.io/centos
Using default tag: latest
Trying to pull repository docker.io/library/centos ... 
latest: Pulling from docker.io/library/centos
8a29a15cefae: Pull complete 
Digest: sha256:fe8d824220415eed5477b63addf40fb06c3b049404242b31982106ac204f6700
Status: Downloaded newer image for docker.io/centos:latest

# List the images
[root@master ~]# docker images
REPOSITORY           TAG                 IMAGE ID            CREATED             SIZE
docker.io/nginx      latest              2073e0bcb60e        2 days ago          127 MB
docker.io/httpd      latest              c562eeace183        3 days ago          165 MB
docker.io/php        latest              7dc31b4f3403        3 days ago          405 MB
docker.io/mysql      latest              791b6e40940c        3 days ago          465 MB
docker.io/debian     latest              a8797652cfd9        3 days ago          114 MB
docker.io/registry   latest              708bc6af7e5e        12 days ago         25.7 MB
docker.io/centos     latest              470671670cac        2 weeks ago         237 MB
docker.io/mysql      5.5                 d404d78aa797        9 months ago        205 MB
docker.io/centos     6.10                48650444e419        10 months ago       194 MB

# Remove an image
[root@master ~]# docker rmi docker.io/mysql
Untagged: docker.io/mysql:latest
Untagged: docker.io/mysql@sha256:6d0741319b6a2ae22c384a97f4bbee411b01e75f6284af0cce339fee83d7e314
Deleted: sha256:791b6e40940cd550af522eb4ffe995226798204504fe495743445b900e417a51
Deleted: sha256:a3c92ad464abbee6d08856efd404df8c43e9d991b9253bed8281e452d8021dfa
Deleted: sha256:3eb0379ecdc39f86da90c491765187e40dda381e57f319dd21afd0b1e2c40158
Deleted: sha256:fe814f19102e93fd9e2c12b4c864d110bbe4884ff4c5c34e2e1d96341ec17778
Deleted: sha256:f973fa93f201d11a3a6ccf900614fa6e25f4cf899da69f163510560263642d0e
Deleted: sha256:db53286cf6b77826bd35675098bfa76863ace9a04b4e28f4d8340d53c23821e8
Deleted: sha256:477e19600de637164faac8d2e39d4552fac8fbf3c4a9f29efe34072c0fd156e9
Deleted: sha256:2c109aa38ef35164d5adcabac202bde92420867a5839deb75f5ce034aacc00b4
Deleted: sha256:0de337169373e6779cb3ca09485e95fedd4ac98abee19b839cd46e294a64f363
Deleted: sha256:73f1cb0f35d3377b825488e38241d0e12c63e7d30946362402dd8ab2e9467d81
Deleted: sha256:5807022bbb80a63e78831d4dff1ac497a450287ce43fbb0381623b19f5d45c8a
Deleted: sha256:1aaef8d601e09d40fc66f3531268e837f4ae3eedf84f94359fa33177f0be4c6e
Deleted: sha256:e0db3ba0aaea8a01d5cb000aeb449c153be0a47a369cafc4e912b85fb18192cf

# Export an image
[root@master ~]# docker save docker.io/centos:6.10 > /tmp/sujxcentos.tar

# Import the image (after the tarball has been copied to /root on node2)
[root@node2 ~]# docker load < /root/sujxcentos.tar
[root@node2 ~]# docker images
REPOSITORY                        TAG                 IMAGE ID            CREATED             SIZE
docker.io/centos                  6.10                48650444e419        10 months ago       194 MB
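
The tarball is saved under /tmp on master and loaded from /root on node2, so it has to be copied between the hosts in between. Here is a minimal sketch for checking that the copy arrived intact: run it on each host and compare the printed digests. It assumes sha256sum from GNU coreutils is available; sha256_of is a hypothetical helper name.

```shell
# sha256_of FILE
# Print only the SHA-256 digest of FILE, without the file name, so the
# output from two hosts can be compared directly.
sha256_of() {
  sha256sum < "$1" | cut -d ' ' -f 1
}

# Example:
# on master: sha256_of /tmp/sujxcentos.tar
# on node2:  sha256_of /root/sujxcentos.tar
```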

# Search for images
[root@master ~]# docker search oracle
INDEX       NAME                                            DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
docker.io   docker.io/oraclelinux                           Official Docker builds of Oracle Linux.         629       [OK]       
docker.io   docker.io/jaspeen/oracle-11g                    Docker image for Oracle 11g database            144                  [OK]
docker.io   docker.io/oracleinanutshell/oracle-xe-11g                                                       82                   
docker.io   docker.io/oracle/openjdk                        Docker images containing OpenJDK Oracle Linux   60                   [OK]
docker.io   docker.io/oracle/graalvm-ce                     GraalVM Community Edition Official Image        56                   [OK]
docker.io   docker.io/absolutapps/oracle-12c-ee             Oracle 12c EE image with web management co...   38                   
docker.io   docker.io/araczkowski/oracle-apex-ords          Oracle Express Edition 11g Release 2 on Ub...   27                   [OK]
docker.io   docker.io/bofm/oracle12c                        Docker image for Oracle Database                23                   [OK]
docker.io   docker.io/oracle/nosql                          Oracle NoSQL on a Docker Image with Oracle...   22                   [OK]
docker.io   docker.io/datagrip/oracle                       Oracle 11.2 & 12.1.0.2-se2 & 11.2.0.2-xe        14                   [OK]
docker.io   docker.io/oracle/weblogic-kubernetes-operator   Docker images containing the Oracle WebLog...   10                   
docker.io   docker.io/openweb/oracle-tomcat                 A fork off of Official tomcat image with O...   8                    [OK]
docker.io   docker.io/truevoly/oracle-12c                   Copy of sath89/oracle-12c image (https://g...   8                    
docker.io   docker.io/18fgsa/oracle-client                  Hosted version of the Oracle Container Ima...   2                    

Setting Up a Local Private Registry

[root@master ~]# docker run -d -p 5000:5000 --name=registry --restart=always docker.io/registry
345e05f68235687b47d2917fd0a86620ac2d6b40fbe7647063b817e0d690cf6b

# Tag the image for the private registry
[root@master ~]# docker tag docker.io/mysql:5.5 registry:5000/sujx_images/mysql:5.5

# Push the image
[root@master ~]# docker push registry:5000/sujx_images/mysql:5.5
The push refers to a repository [registry:5000/sujx_images/mysql]
c9f3545812c8: Pushed 
f49eaacc87a0: Pushed 
a9c5a24e943f: Pushed 
90b4ae8695b5: Pushed 
4054cc666efd: Pushed 
f83622e85376: Pushed 
af84b063c827: Pushed 
ddc265b679cf: Pushed 
647245c554e4: Pushed 
432b5f62e513: Pushed 
6270adb5794c: Pushed 
5.5: digest: sha256:c9c671d0c959183154313d6830d46f9a00d5937f97415c15ebd3c6844f6f1467 size: 2619

# Pull the image from another client on the local network
[root@node2 ~]# docker pull registry:5000/sujx_images/mysql:5.5
Trying to pull repository registry:5000/sujx_images/mysql ... 
5.5: Pulling from registry:5000/sujx_images/mysql
743f2d6c1f65: Pull complete 
3f0c413ee255: Pull complete 
aef1ef8f1aac: Pull complete 
f9ee573e34cb: Pull complete 
3f237e01f153: Pull complete 
03da1e065b16: Pull complete 
04087a801070: Pull complete 
7efd5395ab31: Pull complete 
1b5cc03aaac8: Pull complete 
2b7adaec9998: Pull complete 
385b8f96a9ba: Pull complete 
Digest: sha256:c9c671d0c959183154313d6830d46f9a00d5937f97415c15ebd3c6844f6f1467
Status: Downloaded newer image for registry:5000/sujx_images/mysql:5.5

# Push an image from another node on the local network
[root@node2 ~]# docker tag docker.io/centos:6.10 registry:5000/sujx_images/centos:6.10
[root@node2 ~]# docker push registry:5000/sujx_images/centos:6.10
The push refers to a repository [registry:5000/sujx_images/centos]
8088cb617267: Pushed 
6.10: digest: sha256:7e53308393264c34359fbdf6d15d5c8c4985b8c2a58ee0ad4f7d5cc2e3c1577a size: 529
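
Both pushes above follow the same naming pattern: the registry address, the sujx_images namespace, then the image name and tag with the docker.io/ prefix removed. This pattern can be sketched as a small helper that builds the target reference from a source image. registry_ref is a hypothetical name, and the sketch only manipulates strings; it does not call docker itself.

```shell
# registry_ref REGISTRY NAMESPACE SOURCE_IMAGE
# Build "<registry>/<namespace>/<name>:<tag>" by keeping only the part of
# SOURCE_IMAGE after its last slash.
registry_ref() {
  printf '%s/%s/%s\n' "$1" "$2" "${3##*/}"
}

# Example:
# docker tag docker.io/mysql:5.5 "$(registry_ref registry:5000 sujx_images docker.io/mysql:5.5)"
```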

Posted in: Other

How to Explain the Cloud to Non-techie Friends

Your friend has no technical background, yet she is curious about technology. Recently she asked you what Cloud Computing is. What would you tell her? How would you explain its concept and various models to someone with little or no technical background? Here's a suggestion. Compare Cloud Computing to everyone's favorite food, pizza!

That's exactly what HP does. In a YouTube video titled "Cloud pizza", HP Norway's Technology Director, Stig Alstedt, explains Cloud Computing by comparing it with pizza. Let's see how.

Homemade Pizza = Traditional IT

Back in the old days, if you wanted pizza for dinner, you had to make your own. In that case, you buy all the ingredients, knead the dough, make the crust, chop the vegetables, and bake the pizza. You also need proper cookware. The process requires expertise, but you get exactly what you want, how you want it, when you want it. Homemade Pizza is equal to building your own IT infrastructure with your own resources.

Frozen Pizza = Private Cloud

Nowadays supermarkets offer customers a variety of frozen pizzas. Pizza manufacturers do almost all the work. You just select one from the frozen pizza aisle, bring it home, and bake it in your own oven. It's convenient, but the flavors and prices are fixed. Frozen Pizza is similar to Private Cloud.

Delivered Pizza = Managed Cloud

Delivered pizza is a bit more convenient than frozen pizza. You order what you want and how you want it, then the pizza is delivered to your doorstep. No baking is necessary. However, menu items are limited and you still need your own dishes. Delivered Pizza is close to Managed Cloud.

Restaurant Pizza = Public Cloud

Restaurant pizza requires the least expertise because the restaurant provides you with everything. You simply show up, order, eat and pay. You get one invoice at the end of your meal. However, other customers also dine in, so the service can be slow when the place is crowded. Furthermore, everyone orders from the same menu, which leaves little room for individual customization. Restaurant Pizza is comparable to Public Cloud, except that Public Cloud is often the least expensive option while restaurant pizza is not.

Your Own Topping = Hybrid Cloud

Many pizza delivery places and restaurants offer an option to choose your own toppings. Hybrid Cloud is like choosing your own toppings to make your favorite pizza.

