Articles in category "Bigdata"

Problem: uploading a file to HDFS fails with an error saying no DataNode was found.

Cause: the data and logs directories were not deleted before the NameNode was re-formatted, so the DataNode's clusterID no longer matches the NameNode's.

Fix:

1. In /opt/module/hadoop-3.1.3/logs, open the log file hadoop-zhitu-datanode-hadoop102.log and find the NameNode's clusterID.
2. In /opt/module/hadoop-3.1.3/data/dfs/data/current/VERSION, change the clusterID to the NameNode clusterID found above.
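The two steps can be sketched in the shell. Only the paths come from the post; the grep pattern and the sample VERSION file below are illustrative.

```shell
# Step 1 (illustrative): pull the clusterID out of the DataNode log
# grep -o 'namenode clusterID = CID-[0-9a-f-]*' \
#   /opt/module/hadoop-3.1.3/logs/hadoop-zhitu-datanode-hadoop102.log

# Step 2, demonstrated on a sample VERSION file: rewrite the clusterID line
printf 'clusterID=CID-old-datanode-id\nlayoutVersion=-57\n' > /tmp/VERSION
sed -i 's/^clusterID=.*/clusterID=CID-from-namenode-log/' /tmp/VERSION
cat /tmp/VERSION
```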


Problem: error:

return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Job failed with org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 22.0 failed 4 times, most recent failure: Lost task 0.3 in stage 22.0 (TID 57, hadoop03, executor 2): UnknownReason

Solution: the argument order in the collect_set(named_struct(...)) call in the statement was wrong.


Problem: a Hive column is of type array<struct>; how do you convert the data to a string?

Solution: break it down step by step.

Method:

The Hive column:

`page_stats` ARRAY<STRUCT<page_id:STRING,page_count:BIGINT,during_time:BIGINT>> COMMENT 'page visit statistics'

The query that generates the data:

select
    mid_id,
    collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats
from
(
    select
        mid_id,
        page_id,
        count(*) page_count,
        sum(during_time) during_time
    from dwd_page_log
    where dt='2020-06-14'
    group by mid_id,page_id
)t2
group by mid_id
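To get a plain string rather than the array<struct>, one option (a sketch, not from the post; concat, concat_ws, and collect_set are standard Hive built-ins) is to format each row as text before collecting:

```sql
select
    mid_id,
    concat_ws(';',
        collect_set(concat(page_id, ':', page_count, ':', during_time))) page_stats_str
from
(
    select
        mid_id,
        page_id,
        count(*) page_count,
        sum(during_time) during_time
    from dwd_page_log
    where dt='2020-06-14'
    group by mid_id,page_id
)t2
group by mid_id
```

This yields one delimited string per mid_id instead of a nested type; the ':' and ';' delimiters are arbitrary choices.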


Problem: how do you configure Logstash as a system service?

Method:

In the Logstash installation directory, run:

bin/system-install

Start the service:

sudo su
systemctl daemon-reload
systemctl start logstash.service

If it fails with logstash.service: Failed with result 'exit-code'., check the error log:

vim /var/log/messages

If you see:

logstash.service: Failed at step USER spawning

the user configured for the service does not exist. Edit the unit file:

vim /etc/systemd/system/logstash.service

Change User and Group to the user and group that own the Logstash directory, then:

systemctl daemon-reload
systemctl restart logstash.service
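The User/Group edit can also be done non-interactively; below is a minimal sketch on a sample unit file (the /tmp path and the owner name are placeholders; in practice the owner comes from e.g. stat -c '%U' on your Logstash directory):

```shell
# Sample unit file standing in for /etc/systemd/system/logstash.service
printf '[Service]\nUser=logstash\nGroup=logstash\n' > /tmp/logstash.service

# Placeholder owner; in practice: owner=$(stat -c '%U' /path/to/logstash)
owner=zhitu
sed -i "s/^User=.*/User=$owner/; s/^Group=.*/Group=$owner/" /tmp/logstash.service
cat /tmp/logstash.service
```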
