I have uploaded a directory to a Hadoop cluster that has "," in its name, like "MyDir, Name". When I try to delete this directory using the rmr Hadoop shell command, as follows:
hadoop dfs -rmr hdfs://host:port/Navi/MyDir, Name
I'm getting the following messages:
rmr: cannot remove hdfs://host:port/Navi/MyDir,: No such file or directory.
rmr: cannot remove Name: No such file or directory.
However, I have successfully deleted other directories from the same location using the same command, i.e.
hadoop dfs -rmr hdfs://host:port/dir_path
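What most likely splits the argument here is the space after the comma, not the comma itself: the shell passes "hdfs://host:port/Navi/MyDir," and "Name" as two separate arguments, which matches the two error messages. Quoting the whole path (hadoop dfs -rmr "hdfs://host:port/Navi/MyDir, Name") should keep it as a single argument. As an alternative, a minimal Java sketch using the FileSystem API, which never goes through shell word-splitting (host:port is the same placeholder as in the question):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteDirWithComma {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The full path is one Java string, so the space cannot split it here.
        Path dir = new Path("hdfs://host:port/Navi/MyDir, Name");
        FileSystem fs = dir.getFileSystem(conf);
        boolean deleted = fs.delete(dir, true);   // true = recursive, like -rmr
        System.out.println("deleted: " + deleted);
    }
}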
I am migrating my application from Hadoop 1.0.3 to Hadoop 2.2.0, and the Maven build had hadoop-core marked as a dependency. Since hadoop-core is not present for Hadoop 2.2.0, I tried replacing it with hadoop-client and hadoop-common, but I am still getting this error for ant.filter. Can anybody please suggest which artifact to use?
Previous config:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.0.3</version>
</dependency>
Error:
Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project event: Compilation failure: Compilation failure:
/opt/teamcity/buildAgent/work/c670ebea1992ec2f/event/src/main/java/com/intel/event/EventContext.java: package org.apache.tools.ant.filters does not exist
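The compiler error is about org.apache.tools.ant.filters, i.e. an Ant class that the project's own source imports; hadoop-core 1.x presumably brought Ant onto the compile classpath transitively, and hadoop-client/hadoop-common 2.2.0 do not. A hedged sketch of the dependency block, with hadoop-client for the Hadoop 2.2.0 client classes plus an explicit Ant dependency (the Ant version shown is an assumption, not taken from the project):
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.ant</groupId>
<artifactId>ant</artifactId>
<version>1.9.4</version>
</dependency>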
I am installing Hadoop on my laptop. SSH works fine, but I cannot start Hadoop.
munichong@GrindPad:~$ ssh localhost
Welcome to Ubuntu 12.10 (GNU/Linux 3.5.0-25-generic x86_64)
* Documentation: https://help.ubuntu.com/
0 packages can be updated.
0 updates are security updates.
Last login: Mon Mar 4 00:01:36 2013 from localhost
munichong@GrindPad:~$ /usr/sbin/start-dfs.sh
chown: changing ownership of `/var/log/hadoop/root': Operation not permitted
starting namenode, logging to /var/log/hadoop/root/hadoop-munichong-namenode-GrindPad.out
/usr/sbin/hadoop-daemon.sh: line 136: /var/run/hadoop/hadoop-munichong-namenode.pid: Permission denied
usr/sbin/hadoop-daemon.sh: line 135: /var/log/hadoop/root/hadoop-munichong-namenode-GrindPad.out: Permission denied
head: cannot open `/var/log/hadoop/root/hadoop-munichong-namenode-GrindPad.out' for reading: No such file or directory
localhost: chown: changing ownership of `/var/log/hadoop/root': Operation not permitted
localhost: starting datanode, logging to...
In Hadoop, when do reduce tasks start? Do they start after a certain percentage (threshold) of mappers complete? If so, is this threshold fixed? What kind of threshold is typically used?
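For context on what "start" means here: reducers can be scheduled before all maps finish, but they only begin the copy/shuffle phase early; the reduce() calls themselves run only once every map task has completed. The fraction of finished maps that triggers reducer launch is configurable and defaults to about 0.05. A minimal sketch of setting it from a driver (0.80 is just an example value; the older property name was mapred.reduce.slowstart.completed.maps):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SlowstartExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Fraction of map tasks that must finish before reduce tasks are scheduled.
        conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.80f);
        Job job = Job.getInstance(conf, "slowstart-demo");
        // ... mapper, reducer and input/output paths would be set here (omitted) ...
        System.out.println(job.getConfiguration().get("mapreduce.job.reduce.slowstart.completedmaps"));
    }
}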
Hi, I can't resolve my problem when running Hadoop with start-all.sh:
rochdi@127:~$ start-all.sh
/usr/local/hadoop/bin/hadoop-daemon.sh: line 62: [: localhost: integer expression expected
starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-rochdi-namenode-127.0.0.1
localhost: /usr/local/hadoop/bin/hadoop-daemon.sh: line 62: [: localhost: integer expression expected
localhost: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-rochdi-datanode-127.0.0.1
localhost: /usr/local/hadoop/bin/hadoop-daemon.sh: line 62: [: localhost: integer expression expected
localhost: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-rochdi-secondarynamenode-127.0.0.1
/usr/local/hadoop/bin/hadoop-daemon.sh: line 62: [: localhost: integer expression expected
starting jobtracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-rochdi-jobtracker-127.0.0.1
localhost: /usr/local/hadoop/bin/hadoop-daemon.sh: line 62: [: localhost: integer expression...
I've recently started getting into data analysis and I've learned quite a bit over the last year (at the moment, pretty much exclusively using Python). I feel the next step is to begin training myself in MapReduce/Hadoop. I have no formal computer science training, however, and so I often don't quite understand the jargon that is used when people write about Hadoop, hence my question here.
What I am hoping for is a top-level overview of Hadoop (unless there is something else I should be using?) and perhaps a recommendation for some sort of tutorial/textbook.
If, for example, I want to parallelise a neural network which I have written in Python, where would I start? Is there a relatively standard method for implementing Hadoop with an algorithm or is each solution very problem specific?
The Apache wiki page describes Hadoop as "a framework for running applications on large cluster built of commodity hardware". But what does that mean? I've heard the term "Hadoop Cluster" and I know that Hadoop is Java...
I have an option of using Sqoop or Informatica Big Data Edition to source data into HDFS. The source systems are Teradata and Oracle.
I would like to know which one is better and the reasoning behind it.
Note: My current utility is able to pull data into HDFS using Sqoop, create a Hive staging table, and archive an external table.
Informatica is the ETL tool used in the organization.
Regards Sanjeeb
I want to set up a Hadoop cluster in pseudo-distributed mode. I managed to perform all the setup steps, including starting a NameNode, DataNode, JobTracker and a TaskTracker on my machine.
Then I tried to run some example programs and hit the java.net.ConnectException: Connection refused error. I stepped back to the very first steps of running some operations in standalone mode and faced the same problem.
I even triple-checked all the installation steps and have no idea how to fix it. (I am new to Hadoop and a beginner Ubuntu user, so I kindly ask you to take that into account when providing any guide or tip.)
This is the error output I keep receiving:
hduser@marta-komputer:/usr/local/hadoop$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs+'
15/02/22 18:23:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/22 18:23:04 INFO client.RMProxy:...
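Not the poster's code, but a small diagnostic sketch that can help narrow this down: "Connection refused" here normally means the client resolved an address for the NameNode (or ResourceManager) and nothing was listening on it, typically because the daemon is not running or fs.defaultFS / /etc/hosts point at the wrong host or port. This assumes core-site.xml is on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class CheckDefaultFs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // loads core-site.xml from the classpath
        // The URI printed here must match the address the NameNode actually listens on
        // (compare with the output of `jps` and `netstat -ltnp` on the same machine).
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
        FileSystem fs = FileSystem.get(conf);       // throws ConnectException if nothing is listening
        System.out.println("connected to: " + fs.getUri());
    }
}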
I am new to Hadoop, and want to know what the differences are between hadoop-common, hadoop-core and hadoop-client.
By the way, for a given class, how do I know which Maven artifact contains it? For example, which one contains org.apache.hadoop.io.Text?
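As a hedged pointer rather than a definitive answer: in the Hadoop 2.x artifacts, org.apache.hadoop.io.Text ships in hadoop-common, and hadoop-client is essentially an aggregator that depends on hadoop-common, hadoop-hdfs and the MapReduce client jars, so either dependency makes Text available. A general trick for the "which artifact contains class X" question is to search the fully qualified class name on search.maven.org, or simply ask the JVM where a class was loaded from:

import org.apache.hadoop.io.Text;

public class TextCheck {
    public static void main(String[] args) {
        Text t = new Text("hello");
        // Prints the jar this class was loaded from, which identifies the artifact.
        System.out.println(Text.class.getProtectionDomain().getCodeSource().getLocation());
        System.out.println(t);
    }
}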
I'm trying to understand the relationship between the number of cores and the number of executors when running a Spark job on YARN.
The test environment is as follows:
Number of data nodes: 3
Data node machine spec:
CPU: Core i7-4790 (# of cores: 4, # of threads: 8)
RAM: 32GB (8GB x 4)
HDD: 8TB (2TB x 4)
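Given the hardware above, a common (hedged) rule of thumb is to leave a core and some memory on each node for the OS and Hadoop daemons, give each executor roughly 4-5 cores, and size executor memory to what remains after YARN and off-heap overheads; the concrete numbers below are illustrative assumptions, not a definitive recommendation. The same settings map to the --num-executors, --executor-cores and --executor-memory flags of spark-submit.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ExecutorSizingSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("executor-sizing-sketch")
                .set("spark.executor.instances", "3")   // e.g. one executor per data node
                .set("spark.executor.cores", "5")       // cores each executor may use
                .set("spark.executor.memory", "19g");   // heap per executor
        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println("defaultParallelism = " + sc.defaultParallelism());
        sc.stop();
    }
}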
I am trying to implement a sample word count program using Hadoop. I have downloaded and installed Hadoop 2.0.0. I want to do this sample program using Eclipse, because I think later in my real project I will have to use Eclipse.
I am not able to find Hadoop-related jar files like hadoop-core.jar and the other required jar files. I searched in all the folders of Hadoop 2.0 but couldn't find those files. The same files are available in the 1.0 version of Hadoop but not in the 2.0 version. I would like to know where I can get these files.
I am not able to find much information about the 2.0 version. Please help.
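Two pointers, hedged rather than definitive: in the Hadoop 2.x tarballs the single hadoop-core.jar no longer exists; the jars are split across the share/hadoop/common, share/hadoop/hdfs, share/hadoop/mapreduce and share/hadoop/yarn directories (plus their lib subfolders), and when building in Eclipse it is usually easier to depend on org.apache.hadoop:hadoop-client from Maven instead of adding jars by hand. Against those libraries, a word count mapper is a small sketch like this (not the poster's code):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);   // emit (word, 1) for every token in the line
            }
        }
    }
}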
I am getting the following error while starting the NameNode for the latest hadoop-2.2 release. I didn't find a winutils exe file in the Hadoop bin folder. I tried the commands below:
$ bin/hdfs namenode -format
$ sbin/yarn-daemon.sh start resourcemanager
ERROR util.Shell (Shell.java:getWinUtilsPath(303)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:863)
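For what it's worth (a hedged note, not a guaranteed fix): the Apache Hadoop 2.2 download does not ship winutils.exe, and the "Could not locate executable null\bin\winutils.exe" message means neither the HADOOP_HOME environment variable nor the hadoop.home.dir system property is set, so the path is literally built from null. The usual workaround is to obtain or build winutils.exe for your Hadoop version, put it under %HADOOP_HOME%\bin, and set HADOOP_HOME before starting the daemons. When launching a Hadoop client from your own JVM, the property can also be set in code (C:\hadoop below is an assumed location):

public class WinutilsHomeCheck {
    public static void main(String[] args) {
        // Point hadoop.home.dir at a directory that actually contains bin\winutils.exe.
        System.setProperty("hadoop.home.dir", "C:\\hadoop");
        System.out.println("hadoop.home.dir = " + System.getProperty("hadoop.home.dir"));
        System.out.println("HADOOP_HOME     = " + System.getenv("HADOOP_HOME"));
    }
}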
In many real-life situations where you apply MapReduce, the final algorithm ends up being several MapReduce steps, i.e. Map1, Reduce1, Map2, Reduce2, and so on.
So the output of the last reduce is needed as the input for the next map.
The intermediate data is something you (in general) do not want to keep once the pipeline has completed successfully. Also, because this intermediate data is in general some data structure (like a 'map' or a 'set'), you don't want to put too much effort into writing and reading these key-value pairs.
What is the recommended way of doing that in Hadoop?
Is there a (simple) example that shows how to handle this intermediate data in the correct way, including the cleanup afterward?
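One straightforward pattern (a sketch with assumed paths and omitted job classes, not the only recommended way) is to write each step's output to a dedicated intermediate directory, feed that directory to the next job, and delete it once the final job has succeeded. SequenceFileOutputFormat/SequenceFileInputFormat is a convenient intermediate format because it stores the key-value pairs in binary form without any hand-written parsing.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwoStepDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path("/data/in");            // assumed paths
        Path intermediate = new Path("/data/tmp-step1");
        Path output = new Path("/data/out");

        Job job1 = Job.getInstance(conf, "step1");
        // job1.setMapperClass(Map1.class); job1.setReducerClass(Reduce1.class); ... (omitted)
        FileInputFormat.addInputPath(job1, input);
        FileOutputFormat.setOutputPath(job1, intermediate);
        if (!job1.waitForCompletion(true)) System.exit(1);

        Job job2 = Job.getInstance(conf, "step2");
        // job2.setMapperClass(Map2.class); job2.setReducerClass(Reduce2.class); ... (omitted)
        FileInputFormat.addInputPath(job2, intermediate);
        FileOutputFormat.setOutputPath(job2, output);
        boolean ok = job2.waitForCompletion(true);

        // Clean up the intermediate data only after the whole pipeline has finished.
        FileSystem.get(conf).delete(intermediate, true);
        System.exit(ok ? 0 : 1);
    }
}

For longer pipelines, the JobControl class or an external workflow tool such as Oozie can manage the same dependency chain instead of a hand-written driver.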
I would like to read a CSV in Spark, convert it to a DataFrame, and store it in HDFS with df.registerTempTable("table_name"). I have tried:
scala> val df = sqlContext.load("hdfs:///csv/file/dir/file.csv")
Error which I got:
java.lang.RuntimeException: hdfs:///csv/file/dir/file.csv is not a Parquet file. expected magic number at tail but found
at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:277)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:276)
at scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at...
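A hedged explanation and workaround: sqlContext.load(path) with no explicit source falls back to the default data source, which is Parquet, hence the "expected magic number" error on a CSV file. Two common ways around it are the spark-csv package (com.databricks:spark-csv, loaded with format "com.databricks.spark.csv") or parsing the text file by hand and applying a schema, as in this Spark 1.x Java sketch (the two string columns are an assumption about the file):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class CsvToDataFrame {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("csv-to-df"));
        SQLContext sqlContext = new SQLContext(sc);

        // Split each line on commas; real CSVs with quoting need a proper parser or spark-csv.
        JavaRDD<Row> rows = sc.textFile("hdfs:///csv/file/dir/file.csv")
                .map(line -> {
                    String[] fields = line.split(",", -1);
                    return RowFactory.create(fields[0], fields[1]);
                });
        StructType schema = DataTypes.createStructType(new StructField[] {
                DataTypes.createStructField("col1", DataTypes.StringType, true),
                DataTypes.createStructField("col2", DataTypes.StringType, true)
        });
        DataFrame df = sqlContext.createDataFrame(rows, schema);
        df.registerTempTable("table_name");   // same temp-table name as in the question
        sc.stop();
    }
}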
I tried installing Hadoop following this http://hadoop.apache.org/common/docs/stable/single_node_setup.html document. When I tried executing this:
bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs+'
I am getting the following exception:
java.lang.OutOfMemoryError: Java heap space
Please suggest a solution so that I can try out the example. The entire exception is listed below. I am new to Hadoop and might have done something dumb. Any suggestion will be highly appreciated.
anuj@anuj-VPCEA13EN:~/hadoop$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs+'
11/12/11 17:38:22 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/12/11 17:38:22 INFO mapred.FileInputFormat: Total input paths to process : 7
11/12/11 17:38:22 INFO mapred.JobClient: Running job: job_local_0001
11/12/11 17:38:22 INFO util.ProcessTree: setsid exited with exit code 0
11/12/11 17:38:22 INFO mapred.Task: Using ResourceCalculatorPlugin :...
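A hedged note on where the heap is set, since two different JVMs can be involved: the log shows job_local_0001, i.e. the LocalJobRunner, where map and reduce tasks run inside the client JVM, so the relevant knob is the client heap (HADOOP_HEAPSIZE, or HADOOP_CLIENT_OPTS with an -Xmx value, in conf/hadoop-env.sh). On a real or pseudo-distributed cluster, the child task JVMs take their flags from mapred.child.java.opts instead; a minimal Hadoop 1.x sketch of setting that (1024m is an illustrative value):

import org.apache.hadoop.mapred.JobConf;

public class TaskHeapExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Heap for each spawned map/reduce task JVM (not used by the LocalJobRunner,
        // which runs tasks in-process).
        conf.set("mapred.child.java.opts", "-Xmx1024m");
        System.out.println(conf.get("mapred.child.java.opts"));
    }
}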
I am trying to run a simple NaiveBayesClassifier using Hadoop and am getting this error:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: file
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:180)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.mahout.classifier.naivebayes.NaiveBayesModel.materialize(NaiveBayesModel.java:100)
Code :
Configuration configuration = new Configuration();
NaiveBayesModel model = NaiveBayesModel.materialize(new Path(modelPath), configuration);// error in this line..
modelPath is pointing to the NaiveBayes.bin file, and the configuration object is printing...
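A hedged diagnosis that fits this stack trace: when hadoop-commons and hadoop-hdfs are merged into one fat jar, their META-INF/services/org.apache.hadoop.fs.FileSystem files can overwrite each other, so the registration for the "file" scheme is lost and FileSystem.get() fails exactly like this. Two usual remedies are merging the service files when shading (for Maven, the shade plugin's ServicesResourceTransformer) or naming the implementations explicitly on the Configuration, as in this sketch:

import org.apache.hadoop.conf.Configuration;

public class FileSystemSchemeWorkaround {
    public static Configuration withExplicitFileSystems() {
        Configuration configuration = new Configuration();
        // Re-register the schemes by hand in case the service files were clobbered.
        configuration.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
        configuration.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        return configuration;
    }
}

Passing such a Configuration to NaiveBayesModel.materialize(new Path(modelPath), configuration) is then the same call as in the question; if hadoop-hdfs is not on the classpath at all, the fs.hdfs.impl line will not resolve, and that missing dependency is the thing to fix instead.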
I am looking for some guidance and tips on understanding what it would take to do a reasonable Hadoop proof of concept in the cloud. I am a complete noob to the Big Data analytics world and would be more than happy for any suggestions you might have based on your experience.
Are "hadoop fs" and "hdfs dfs" supposed to be equal?
But why do the "hadoop fs" commands show the HDFS files, while the "hdfs dfs" commands show the local files?
Here is the Hadoop version information:
Hadoop 2.0.0-mr1-cdh4.2.1 Subversion git://ubuntu-slave07.jenkins.cloudera.com/var/lib/jenkins/workspace/CDH4.2.1-Packaging-MR1/build/cdh4/mr1/2.0.0-mr1-cdh4.2.1/source -r Compiled by jenkins on Mon Apr 22 10:48:26 PDT 2013
Nathan Marz in his book "Big Data" describes how to maintain files of data in HDFS and how to optimize file sizes to be as near the native HDFS block size as possible, using his Pail library running on top of MapReduce.
Is it possible to achieve the same result in Google Cloud Storage?
Can I use Google Cloud Dataflow instead of MapReduce for this purpose?
I installed Spark using the AWS EC2 guide and I can launch the program fine using the bin/pyspark script to get to the Spark prompt, and can also do the Quick Start guide successfully.
However, I cannot for the life of me figure out how to stop all of the verbose INFO logging after each command. I have tried nearly every possible scenario in the code below (commenting out, setting to OFF) in my log4j.properties file in the conf folder where I launch the application from, as well as on each node, and nothing does anything. I still get the INFO statements printing after executing each statement.
I am very confused about how this is supposed to work.
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout...
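A hedged suggestion: raising the root threshold from INFO to WARN is usually all that is needed, and it must be done in the conf directory of the Spark installation that actually launches the driver (copy log4j.properties.template to log4j.properties if only the template exists). The appender lines below repeat the standard console appender settings from the question; the ConversionPattern is one common console layout, not something taken from the poster's file.

# Raise the root threshold so routine INFO chatter is suppressed (WARN and ERROR still show)
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n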
I'm currently configuring Hadoop on a server running CentOS. When I run start-dfs.sh or stop-dfs.sh, I get the following error:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I'm running Hadoop 2.2.0. Doing a search online brought up this link: http://balanceandbreath.blogspot.ca/2013/01/utilnativecodeloader-unable-to-load.html
However, the contents of the /native/ directory on Hadoop 2.x appear to be different, so I am not sure what to do. I've also added these two environment variables in hadoop-env.sh:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop/lib/"
export HADOOP_COMMON_LIB_NATIVE_DIR="/usr/local/hadoop/lib/native/"
Any ideas?
I am currently starting a project titled "Cloud computing for time series mining algorithms using Hadoop". The data I have is HDF files totalling over a terabyte. As far as I know, Hadoop expects text files as input for further processing (map-reduce tasks). So one option is to convert all my .hdf files to text files, which is going to take a lot of time.
The other option is to find a way to use raw HDF files in map-reduce programs. So far I have not been successful in finding any Java code which reads HDF files and extracts data from them. If somebody has a better idea of how to work with HDF files, I would really appreciate such help.
Thanks Ayush
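Hadoop is not limited to text input; any format works as long as there is an InputFormat for it. Since .hdf is a binary container that should not be split mid-file, one common pattern (a sketch, not a complete solution) is a whole-file input format: each file becomes a single record of raw bytes, and the mapper parses those bytes with an HDF library (for example the HDF-Java bindings, not shown here). The driver would wire it in with job.setInputFormatClass(WholeFileInputFormat.class).

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;                       // never split an .hdf file across mappers
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new RecordReader<NullWritable, BytesWritable>() {
            private final BytesWritable value = new BytesWritable();
            private FileSplit fileSplit;
            private Configuration conf;
            private boolean processed = false;

            @Override
            public void initialize(InputSplit s, TaskAttemptContext ctx) {
                fileSplit = (FileSplit) s;
                conf = ctx.getConfiguration();
            }

            @Override
            public boolean nextKeyValue() throws IOException {
                if (processed) return false;
                // Read the whole file into one value; the mapper parses it as HDF.
                byte[] contents = new byte[(int) fileSplit.getLength()];
                Path file = fileSplit.getPath();
                FileSystem fs = file.getFileSystem(conf);
                try (FSDataInputStream in = fs.open(file)) {
                    IOUtils.readFully(in, contents, 0, contents.length);
                }
                value.set(contents, 0, contents.length);
                processed = true;
                return true;
            }

            @Override public NullWritable getCurrentKey() { return NullWritable.get(); }
            @Override public BytesWritable getCurrentValue() { return value; }
            @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
            @Override public void close() { }
        };
    }
}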
I am confused as to where Talend and Apache Spark fit in the big data ecosystem, as both Apache Spark and Talend can be used for ETL.
Could someone please explain this with an example?