Reading Parquet Data from the Spark Command Line
Viewing HDFS data
[root@node-master]# hadoop fs -ls /
Found 12 items
drwxrwxrwx - hdfs hadoop 0 2020-11-24 17:59 /app-logs
drwxrwxrwx - hdfs hadoop 0 2020-11-24 17:59 /ats
drwxr-xr-x - hdfs hadoop 0 2020-11-24 17:59 /datasets
drwxrwxrwx - flink hadoop 0 2020-11-24 18:00 /flink
drwxrwxrwx - mapred hadoop 0 2020-11-24 17:59 /mr-history
drwxrwxrwx - hdfs hadoop 0 2020-11-24 17:59 /mrs
drwxrwxrwx - hdfs hadoop 0 2020-11-24 18:03 /tmp
drwxr-xr-x - root ficommon 0 2020-12-07 17:41 /aka
drwxrwxrwx - hdfs hadoop 0 2020-12-07 17:40 /user
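Before loading anything in Spark, it can help to confirm the Parquet files are actually present. A minimal check, assuming the `/aka/test` path used in the spark-shell example in this article (the `part-00000.parquet` filename is a made-up illustration):

```shell
# List the Parquet files under the directory we are about to read.
hadoop fs -ls /aka/test

# Optional sanity check: a valid Parquet file ends with the magic
# bytes "PAR1". (part-00000.parquet is a hypothetical filename.)
hadoop fs -tail /aka/test/part-00000.parquet | tail -c 4
```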
Viewing the table
val db = spark.read.parquet("/aka/test")
db: org.apache.spark.sql.DataFrame = [value: string]
db.show(false)
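Once the DataFrame is loaded, its schema and contents can be explored beyond `show()`. A brief sketch, assuming the same `/aka/test` path (the temp-view name `test_tbl` is made up for illustration):

```scala
// Load the Parquet directory; the schema is read from the Parquet
// footer, so no manual column definitions are needed.
val db = spark.read.parquet("/aka/test")
db.printSchema()   // e.g. a single string column: value

// Register a temporary view so the data can be queried with plain SQL.
db.createOrReplaceTempView("test_tbl")
spark.sql("SELECT * FROM test_tbl LIMIT 10").show(false)
```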
Viewing the data
# Copy the files to HDFS (I have already copied everything under /train_data/)
# Launch spark-shell
# Enter the following
# Note: sqlContext.parquetFile was removed in Spark 2.x; use read.parquet instead
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val parquetFile = sqlContext.read.parquet("/data/test/*.parquet")
# Print 150 rows
parquetFile.take(150).foreach(println)
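Note that `take(n)` collects the rows to the driver as a local array, so it is only safe for small `n`. For a quick look, `show(n)` prints without materializing everything, and `count()` gives the total. A sketch using the same wildcard path (the `/data/test_sample` output path is a hypothetical example):

```scala
val parquetFile = spark.read.parquet("/data/test/*.parquet")

// show() prints at most n rows; the second argument disables
// column truncation, as with db.show(false) above.
parquetFile.show(150, false)
println(s"total rows: ${parquetFile.count()}")

// To keep a sample around, write it back to HDFS as Parquet.
// (/data/test_sample is a made-up illustration path.)
parquetFile.limit(150).write.mode("overwrite").parquet("/data/test_sample")
```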
Copyright notice:
Author: Akiraka
Link: https://www.akiraka.net/hadoop/865.html
Source: Akiraka
The copyright of this article belongs to the author. Please do not reproduce it without permission.