Counting Words with the CDH Example Programs

WordCount is the classic "Hello World" program on Hadoop. CDH ships with a wordcount example program that can be used to check whether a deployment succeeded.

# Decompress the complete works of Shakespeare prepared in advance
[sujx@elephant ~]$ gzip -d shakespeare.txt.gz

# Upload it to the Hadoop file system
[sujx@elephant ~]$ hdfs dfs -mkdir /user/sujx/input
[sujx@elephant ~]$ hdfs dfs -put shakespeare.txt /user/sujx/input

# List the example programs that are available
[sujx@elephant ~]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

# Run the MapReduce job; the output directory is created automatically
[sujx@elephant ~]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /user/sujx/input/shakespeare.txt /user/sujx/output/
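
MapReduce refuses to start a job whose output directory already exists, so rerunning the command above fails until /user/sujx/output is removed. A minimal sketch of a wrapper that checks first, assuming `hdfs` and `hadoop` are on the PATH; the function name `run_wordcount` and the variable `EXAMPLES_JAR` are made up for illustration.

```shell
# Path of the examples jar, as used in the transcript above.
EXAMPLES_JAR=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

# run_wordcount INPUT OUTPUT: run the wordcount example unless OUTPUT already exists.
run_wordcount() {
    input=$1
    output=$2
    # `hdfs dfs -test -e` exits with status 0 when the path exists.
    if hdfs dfs -test -e "$output"; then
        echo "refusing to run: $output already exists" >&2
        return 1
    fi
    hadoop jar "$EXAMPLES_JAR" wordcount "$input" "$output"
}
```

It would be called as `run_wordcount /user/sujx/input/shakespeare.txt /user/sujx/output`. Deleting a stale output directory is deliberately left to the user.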

# List the output files
[sujx@elephant ~]$ hdfs dfs -ls /user/sujx/output
Found 4 items
-rw-r--r--   3 sujx supergroup          0 2020-03-09 02:47 /user/sujx/output/_SUCCESS
-rw-r--r--   3 sujx supergroup     238211 2020-03-09 02:47 /user/sujx/output/part-r-00000
-rw-r--r--   3 sujx supergroup     236617 2020-03-09 02:47 /user/sujx/output/part-r-00001
-rw-r--r--   3 sujx supergroup     238668 2020-03-09 02:47 /user/sujx/output/part-r-00002
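
Each part-r-* file is written by one reduce task and holds tab-separated `word<TAB>count` lines for its share of the words. A minimal sketch for ranking the most frequent words once the files have been copied to a local directory; the helper name `top_words` and the local directory are assumptions for illustration.

```shell
# top_words DIR [N]: print the N most frequent words across all part-r-* files in DIR.
# Assumes every line has the form "word<TAB>count".
top_words() {
    dir=$1
    n=${2:-10}
    # Sort numerically on the count (field 2), largest first;
    # ties are broken by the word itself so the order is deterministic.
    cat "$dir"/part-r-* | LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "$n"
}
```

For example, after `hdfs dfs -get /user/sujx/output ./output`, running `top_words ./output 20` lists the twenty most common tokens.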

# View the end of one output file
[sujx@elephant ~]$ hdfs dfs -tail /user/sujx/output/part-r-00000
.       3
writhled        1
writing,        4
writings.       1
writs   1
written,        3
wrong   112
wrong'd-        1
wrong-should    1
wrong.  39
wrong:  1
wronged 11
wronged.        3
wronger,        1
wronger;        1
wrongfully?     1
wrongs  40
wrongs, 9
wrongs; 9
wrote?  1
wrought,        4
…………
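
The output above shows that the example splits on whitespace only, so `wrong`, `wrong.` and `wrong:` are counted as separate words. A rough local equivalent can be handy for spot-checking a small file without a cluster. This is a sketch that approximates the job rather than reproducing it, and `local_wordcount` is a hypothetical helper name.

```shell
# local_wordcount FILE: print "word<TAB>count" for every whitespace-separated token.
local_wordcount() {
    # Put one token per line, drop empty lines, then count identical tokens.
    tr -s '[:space:]' '\n' < "$1" | grep -v '^$' | LC_ALL=C sort | uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

Running `local_wordcount shakespeare.txt | grep '^wrong'` should give lines comparable to the ones in the listing.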

[English Practice] Everything is Already in the Cloud

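Everything Is in the Cloud

These days you hear a lot about cloud computing. If you have no technical background, the term can be a bit intimidating. It sounds complicated, but most of us are already using the cloud. Though you may not realize it, all your information is most likely in the cloud. How? Let's take a look.

Lately you have surely heard a lot of talk about cloud computing. If you have no technical background, the terminology may leave you puzzled. It all looks very complicated, but most of us are already using cloud computing. Even if you have not realized it, all of your information is already in the cloud. How did that happen? Let's find out together.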

Email

Mail

All web-based email services like Yahoo, Google (Gmail) and Microsoft (Hotmail) are cloud-based. Unless you run your own email server, you are using cloud-based services. That is, all of your contacts and emails are in the cloud.

All webmail services, such as Yahoo, Google's Gmail and Microsoft's Hotmail, are built on cloud computing. Unless you maintain a mail server of your own, you are using a cloud-based service. So all of your contacts and mail are in the cloud.

Blogs & Websites

Blogs and Sites

Blogs and websites such as Medium, Tumblr, Flickr, Instagram, and Pinterest are also cloud-based services. Again, all of your postings and photos are in the cloud.

Blogs and sites like Medium, Tumblr, Flickr, Instagram and Pinterest are cloud-based services too. Once again, all of your posts and photos are stored in the cloud.

Social Networking Sites

Social Networks

Facebook, Twitter, LinkedIn, and many other social media sites are hosted in the cloud. Your birthday, educational background, work history, and latest whereabouts are all in the cloud. Ah, your friends' information, too!

Facebook, Twitter, LinkedIn and other social networking sites are all hosted in the cloud. Your birthday, educational background, work history and everyday whereabouts are all in the cloud. The same goes for your friends' information.

Mobile App Stores & Apps

App Stores and Apps

Major mobile app stores, such as Google Play, Apple's App Store, and Windows Marketplace, keep their apps and the account information in the cloud. If you buy apps from these major app stores, your purchase history and account information are in the cloud.

The major mobile app stores, such as Google Play, Apple's App Store and Microsoft's Windows Marketplace, keep their apps and account information in the cloud. If you have bought apps from these stores, your purchase records and account information are all in the cloud.

Gaming

Games

Cloud gaming, according to an article in VentureBeat, could reach a turning point in 2015. Xbox Live, World of Warcraft, Steam and hundreds of game platforms are hosted in the cloud. Now with cloud-gaming capable devices, a massive number of players can interact with each other online. Most of these games store your game saves and comments in the cloud.

According to a VentureBeat report, cloud gaming could reach a turning point in 2015. Xbox Live, World of Warcraft, Steam and hundreds of other game platforms run in the cloud. With devices capable of cloud gaming, a huge number of players can interact with one another online. Most of these games keep your saved games and comments in the cloud.

Productivity Tools

Tools for Productivity

Productivity tools like word processors, spreadsheets, presentation programs, flowcharting applications, image editors, formula editors, and graph tools are available online as cloud-based services and applications. Google Docs and Microsoft Office 365 are good examples. If you are using any of these tools, either at work or personally, your documents are stored in the cloud.

Productivity tools such as word processors, spreadsheets, presentation software, flowchart software, image editors, formula editors and charting software are all available as cloud-based services. Google Docs and Microsoft Office 365 are good examples. If you use these tools, whether for work or for personal use, your documents are stored in the cloud.

Online Storage

Cloud Drives

Nowadays most people own multiple devices, from laptops to smartphones and tablets, and they want to access their data from anywhere, at any time, from any connected device. Online storage services are used for that reason, and Microsoft OneDrive, Apple iCloud Drive, Google Drive, and Dropbox are popular ones. Needless to say, these services store your data in the cloud.

Today most people use several devices, from laptops to smartphones and tablets. They want to reach their data at any time, from any place, on any connected device. Cloud drives exist for exactly this purpose. Microsoft OneDrive, Apple iCloud Drive, Google Drive and Dropbox are all popular cloud drives. It goes without saying that these services keep your data in the cloud.


Sentences:

  1. These days you hear a lot about cloud computing.
  2. If you have no technical background, the term can be a bit intimidating.
  3. It sounds complicated, but most of us are already using the cloud.
  4. How? Let's take a look.
  5. Online Storage.
  6. Though you may not realize it, all your information is most likely in the cloud.
  7. Nowadays most people own multiple devices.
  8. From laptops to smartphones and tablets
  9. and they want to access their data from anywhere, at any time, from any connected device.

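A Quick Docker Exercise

Set up a three-node network environment for practicing Docker operations; the master node stores the image files of the private registry.

Node     IP                Purpose
master   192.168.174.181   Management node and private registry
node1    192.168.174.180   Node 1
node2    192.168.174.180   Node 2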

Installing the Management Node

Installing Docker

[root@master ~]# yum install -y docker
[root@master ~]# fdisk -l
Disk /dev/sda: 10.7 GB, 10737418240 bytes, 20971520 sectors
Disk /dev/sdb: 10.7 GB, 10737418240 bytes, 20971520 sectors

# Here we use the newly added /dev/sdb disk as Docker's storage

[root@master ~]# vim /etc/sysconfig/docker-storage-setup # use :r to read in the storage template file /usr/share/container-storage-setup/container-storage-setup
#STORAGE_DRIVER=overlay2   # comment out the overlay2 storage driver
STORAGE_DRIVER=devicemapper # use the devicemapper storage driver
EXTRA_STORAGE_OPTIONS="--storage-opt dm.fs=xfs" # use xfs as the filesystem
DEVS=/dev/sdb   # use the /dev/sdb disk
CONTAINER_THINPOOL=container-thinpool   # thin-pool storage for containers; also the name of the LV
VG=docker_VG    # name of the VG used for storage

[root@master ~]# container-storage-setup 
INFO: Writing zeros to first 4MB of device /dev/sdb
4+0 records in
4+0 records out
4194304 bytes (4.2 MB) copied, 0.00600853 s, 698 MB/s
INFO: Device node /dev/sdb1 exists.
  Physical volume "/dev/sdb1" successfully created.
  Volume group "docker_VG" successfully created
  Rounding up size to full physical extent 12.00 MiB
  Thin pool volume with chunk size 512.00 KiB can address at most 126.50 TiB of data.
  Logical volume "container-thinpool" created.
  Logical volume docker_VG/container-thinpool changed.
[root@master ~]# vgs
  VG        #PV #LV #SN Attr   VSize   VFree
  centos      1   2   0 wz--n-  <9.00g    0 
  docker_VG   1   1   0 wz--n- <10.00g 6.00g
[root@master ~]# lvs
  LV                 VG        Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root               centos    -wi-ao---- <8.00g                                                    
  swap               centos    -wi-ao----  1.00g                                                    
  container-thinpool docker_VG twi-a-t---  3.97g             0.00   10.29   
# Created successfully; start the docker service
[root@master ~]# systemctl enable docker --now
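
With the devicemapper driver, images and containers consume space in the thin pool, so the Data% column of `lvs` is worth watching as images accumulate. A minimal sketch of a threshold check; the helper name `thinpool_over_threshold` and the default limit of 80 percent are assumptions, not values from the setup above.

```shell
# thinpool_over_threshold PERCENT [LIMIT]: succeed (exit 0) when PERCENT exceeds LIMIT.
thinpool_over_threshold() {
    used=$1
    limit=${2:-80}
    # awk handles the decimal comparison that the shell's `[` cannot do.
    awk -v u="$used" -v l="$limit" 'BEGIN { exit !(u + 0 > l + 0) }'
}

# Possible use on the master node (not executed here):
#   used=$(lvs --noheadings -o data_percent docker_VG/container-thinpool | tr -d ' ')
#   thinpool_over_threshold "$used" && echo "thin pool above 80%" >&2
```

The check is kept separate from the `lvs` call so it can be exercised with plain numbers.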

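System Configuration

Configuring several registry mirrors allows fast deployment under different network conditions.

# Configure Docker registry mirrors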
cat>/etc/docker/daemon.json<<EOF
{
  "registry-mirrors": ["https://dockerhub.azk8s.cn","http://f1361db2.m.daocloud.io","https://d1a0f2854f4b44c2a3b3af4f5425db1a.mirror.swr.myhuaweicloud.com","https://hub-mirror.c.163.com","https://registry.docker-cn.com"],
  "insecure-registries": ["registry:5000"]
}
EOF
[root@master ~]# systemctl daemon-reload && systemctl restart docker
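
The `cat >` redirection above overwrites any existing /etc/docker/daemon.json. A minimal sketch that keeps a backup first; the helper name `write_daemon_json` and the `.bak` suffix are assumptions, and the mirror list is shortened to two of the URLs from the text.

```shell
# write_daemon_json FILE: back up FILE if it exists, then write the new configuration.
write_daemon_json() {
    target=$1
    mkdir -p "$(dirname "$target")"
    if [ -f "$target" ]; then
        # Keep the previous configuration next to the new one.
        cp -p "$target" "$target.bak"
    fi
    cat > "$target" <<'EOF'
{
  "registry-mirrors": [
    "https://hub-mirror.c.163.com",
    "https://registry.docker-cn.com"
  ],
  "insecure-registries": ["registry:5000"]
}
EOF
}
```

After `write_daemon_json /etc/docker/daemon.json` and a restart of the docker service, `docker info` should list the configured mirrors and insecure registries.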

# Stop and disable the firewall
[root@master ~]# systemctl disable firewalld.service --now

# Put SELinux into permissive mode (lasts until the next reboot)
[root@master ~]# setenforce 0
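
`setenforce 0` changes only the running system; after a reboot the mode comes from the `SELINUX=` line in /etc/selinux/config. A minimal sketch for making the change persistent, with a hypothetical helper `set_selinux_mode` that takes the config path as a parameter so it can be tried on a copy first.

```shell
# set_selinux_mode FILE MODE: rewrite the SELINUX= line in FILE.
set_selinux_mode() {
    cfg=$1
    mode=$2
    case $mode in
        enforcing|permissive|disabled) ;;
        *) echo "invalid mode: $mode" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    # "^SELINUX=" does not match the SELINUXTYPE= line, which is left alone.
    if sed "s/^SELINUX=.*/SELINUX=$mode/" "$cfg" > "$tmp"; then
        # Write back through the existing file so its permissions are kept.
        cat "$tmp" > "$cfg"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}
```

On the lab nodes this would be `set_selinux_mode /etc/selinux/config permissive`, run as root.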

Preparing the Docker Environment

Preparing Images

# Pull the lab image
[root@master ~]# docker pull docker.io/centos
Using default tag: latest
Trying to pull repository docker.io/library/centos ... 
latest: Pulling from docker.io/library/centos
8a29a15cefae: Pull complete 
Digest: sha256:fe8d824220415eed5477b63addf40fb06c3b049404242b31982106ac204f6700
Status: Downloaded newer image for docker.io/centos:latest

# List images
[root@master ~]# docker images
REPOSITORY           TAG                 IMAGE ID            CREATED             SIZE
docker.io/nginx      latest              2073e0bcb60e        2 days ago          127 MB
docker.io/httpd      latest              c562eeace183        3 days ago          165 MB
docker.io/php        latest              7dc31b4f3403        3 days ago          405 MB
docker.io/mysql      latest              791b6e40940c        3 days ago          465 MB
docker.io/debian     latest              a8797652cfd9        3 days ago          114 MB
docker.io/registry   latest              708bc6af7e5e        12 days ago         25.7 MB
docker.io/centos     latest              470671670cac        2 weeks ago         237 MB
docker.io/mysql      5.5                 d404d78aa797        9 months ago        205 MB
docker.io/centos     6.10                48650444e419        10 months ago       194 MB

# Remove an image
[root@master ~]# docker rmi docker.io/mysql
Untagged: docker.io/mysql:latest
Untagged: docker.io/mysql@sha256:6d0741319b6a2ae22c384a97f4bbee411b01e75f6284af0cce339fee83d7e314
Deleted: sha256:791b6e40940cd550af522eb4ffe995226798204504fe495743445b900e417a51
Deleted: sha256:a3c92ad464abbee6d08856efd404df8c43e9d991b9253bed8281e452d8021dfa
Deleted: sha256:3eb0379ecdc39f86da90c491765187e40dda381e57f319dd21afd0b1e2c40158
Deleted: sha256:fe814f19102e93fd9e2c12b4c864d110bbe4884ff4c5c34e2e1d96341ec17778
Deleted: sha256:f973fa93f201d11a3a6ccf900614fa6e25f4cf899da69f163510560263642d0e
Deleted: sha256:db53286cf6b77826bd35675098bfa76863ace9a04b4e28f4d8340d53c23821e8
Deleted: sha256:477e19600de637164faac8d2e39d4552fac8fbf3c4a9f29efe34072c0fd156e9
Deleted: sha256:2c109aa38ef35164d5adcabac202bde92420867a5839deb75f5ce034aacc00b4
Deleted: sha256:0de337169373e6779cb3ca09485e95fedd4ac98abee19b839cd46e294a64f363
Deleted: sha256:73f1cb0f35d3377b825488e38241d0e12c63e7d30946362402dd8ab2e9467d81
Deleted: sha256:5807022bbb80a63e78831d4dff1ac497a450287ce43fbb0381623b19f5d45c8a
Deleted: sha256:1aaef8d601e09d40fc66f3531268e837f4ae3eedf84f94359fa33177f0be4c6e
Deleted: sha256:e0db3ba0aaea8a01d5cb000aeb449c153be0a47a369cafc4e912b85fb18192cf

# Export an image
[root@master ~]# docker save docker.io/centos:6.10 > /tmp/sujxcentos.tar

# Import the image (after the tar file has been copied to /root on node2)
[root@node2 ~]# docker load < /root/sujxcentos.tar
[root@node2 ~]# docker images
REPOSITORY                        TAG                 IMAGE ID            CREATED             SIZE
docker.io/centos                  6.10                48650444e419        10 months ago       194 MB

# Search for images
[root@master ~]# docker search oracle
INDEX       NAME                                            DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
docker.io   docker.io/oraclelinux                           Official Docker builds of Oracle Linux.         629       [OK]       
docker.io   docker.io/jaspeen/oracle-11g                    Docker image for Oracle 11g database            144                  [OK]
docker.io   docker.io/oracleinanutshell/oracle-xe-11g                                                       82                   
docker.io   docker.io/oracle/openjdk                        Docker images containing OpenJDK Oracle Linux   60                   [OK]
docker.io   docker.io/oracle/graalvm-ce                     GraalVM Community Edition Official Image        56                   [OK]
docker.io   docker.io/absolutapps/oracle-12c-ee             Oracle 12c EE image with web management co...   38                   
docker.io   docker.io/araczkowski/oracle-apex-ords          Oracle Express Edition 11g Release 2 on Ub...   27                   [OK]
docker.io   docker.io/bofm/oracle12c                        Docker image for Oracle Database                23                   [OK]
docker.io   docker.io/oracle/nosql                          Oracle NoSQL on a Docker Image with Oracle...   22                   [OK]
docker.io   docker.io/datagrip/oracle                       Oracle 11.2 & 12.1.0.2-se2 & 11.2.0.2-xe        14                   [OK]
docker.io   docker.io/oracle/weblogic-kubernetes-operator   Docker images containing the Oracle WebLog...   10                   
docker.io   docker.io/openweb/oracle-tomcat                 A fork off of Official tomcat image with O...   8                    [OK]
docker.io   docker.io/truevoly/oracle-12c                   Copy of sath89/oracle-12c image (https://g...   8                    
docker.io   docker.io/18fgsa/oracle-client                  Hosted version of the Oracle Container Ima...   2                    

Setting Up a Local Private Registry

[root@master ~]# docker run -d -p 5000:5000 --name=registry --restart=always docker.io/registry
345e05f68235687b47d2917fd0a86620ac2d6b40fbe7647063b817e0d690cf6b

# Tag the image
[root@master ~]# docker tag docker.io/mysql:5.5 registry:5000/sujx_images/mysql:5.5

# Push
[root@master ~]# docker push registry:5000/sujx_images/mysql:5.5
The push refers to a repository [registry:5000/sujx_images/mysql]
c9f3545812c8: Pushed 
f49eaacc87a0: Pushed 
a9c5a24e943f: Pushed 
90b4ae8695b5: Pushed 
4054cc666efd: Pushed 
f83622e85376: Pushed 
af84b063c827: Pushed 
ddc265b679cf: Pushed 
647245c554e4: Pushed 
432b5f62e513: Pushed 
6270adb5794c: Pushed 
5.5: digest: sha256:c9c671d0c959183154313d6830d46f9a00d5937f97415c15ebd3c6844f6f1467 size: 2619

# Pull from another client on the local network
[root@node2 ~]# docker pull registry:5000/sujx_images/mysql:5.5
Trying to pull repository registry:5000/sujx_images/mysql ... 
5.5: Pulling from registry:5000/sujx_images/mysql
743f2d6c1f65: Pull complete 
3f0c413ee255: Pull complete 
aef1ef8f1aac: Pull complete 
f9ee573e34cb: Pull complete 
3f237e01f153: Pull complete 
03da1e065b16: Pull complete 
04087a801070: Pull complete 
7efd5395ab31: Pull complete 
1b5cc03aaac8: Pull complete 
2b7adaec9998: Pull complete 
385b8f96a9ba: Pull complete 
Digest: sha256:c9c671d0c959183154313d6830d46f9a00d5937f97415c15ebd3c6844f6f1467
Status: Downloaded newer image for registry:5000/sujx_images/mysql:5.5

# Push an image from another local node
[root@node2 ~]# docker tag docker.io/centos:6.10 registry:5000/sujx_images/centos:6.10
[root@node2 ~]# docker push registry:5000/sujx_images/centos:6.10
The push refers to a repository [registry:5000/sujx_images/centos]
8088cb617267: Pushed 
6.10: digest: sha256:7e53308393264c34359fbdf6d15d5c8c4985b8c2a58ee0ad4f7d5cc2e3c1577a size: 529
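
Both pushes above follow the same pattern: keep the image name and tag, and replace the prefix with registry:5000/sujx_images. A minimal sketch that derives the target name; `private_tag`, `push_private` and the `REGISTRY`/`NAMESPACE` variables are hypothetical names introduced here, with defaults taken from the transcript.

```shell
REGISTRY=${REGISTRY:-registry:5000}
NAMESPACE=${NAMESPACE:-sujx_images}

# private_tag IMAGE: map e.g. docker.io/mysql:5.5 to registry:5000/sujx_images/mysql:5.5.
# Everything up to the last "/" is dropped, so an organisation prefix
# (docker.io/oracle/openjdk -> openjdk) is lost; that is a simplification.
private_tag() {
    printf '%s/%s/%s\n' "$REGISTRY" "$NAMESPACE" "${1##*/}"
}

# push_private IMAGE: tag IMAGE for the private registry and push it.
push_private() {
    target=$(private_tag "$1")
    docker tag "$1" "$target" && docker push "$target"
}
```

`push_private docker.io/centos:6.10` would then repeat the last two commands of the transcript. What the registry holds can be checked through its HTTP API, for example with `curl http://registry:5000/v2/_catalog`.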

How to Explain the Cloud to Non-techie Friends

Your friend has no technical background, yet she is curious about technology. Recently she asked you what Cloud Computing is. What would you tell her? How would you explain its concept and various models to someone with little or no technical background? Here's a suggestion. Compare Cloud Computing to everyone's favorite food, pizza!

That's exactly what HP does. In a YouTube video titled Cloud pizza, HP Norway's Technology Director, Stig Alstedt, explains Cloud Computing by comparing it with pizza. Let's see how.

Homemade Pizza = Traditional IT

Back in the old days, if you wanted pizza for dinner, you had to make your own. In that case, you buy all the ingredients, knead the dough, make the crust, chop the vegetables, and bake the pizza. You also need proper cookware. The process requires expertise, but you get exactly what you want, how you want it, when you want it. Homemade Pizza is equal to building your own IT infrastructure with your own resources.

Frozen Pizza = Private Cloud

Nowadays supermarkets offer customers a variety of frozen pizzas. Pizza manufacturers do almost all the work. You just select one from the frozen pizza aisle, bring it home, and bake it in your own oven. It's convenient, but the flavors and prices are fixed. Frozen Pizza is similar to Private Cloud.

Delivered Pizza = Managed Cloud

Delivered pizza is a bit more convenient than frozen pizza. You order what you want and how you want it, then the pizza is delivered to your doorstep. No baking is necessary. However, menu items are limited and you still need your own dishes. Delivered Pizza is close to Managed Cloud.

Restaurant Pizza = Public Cloud

Restaurant pizza requires the least expertise because the restaurant provides you with everything. You simply show up, order, eat and pay. You get one invoice at the end of your meal. However, other customers also dine in, so the service can be slow when the place is crowded. Furthermore, everyone orders from the same menu. It means there is not much room for individual customization. Restaurant Pizza is comparable to Public Cloud, except that Public Cloud is often the most inexpensive option while restaurant pizza is not.

Your Own Topping = Hybrid Cloud

Many pizza delivery places and restaurants offer an option to choose your own toppings. Hybrid Cloud is like choosing your own toppings to make your favorite pizza.


How do you explain cloud computing to friends without a technical background?