In this post we will learn how to set up a learning environment for PySpark on Windows. To learn Spark with Python, we will install PySpark on Windows and use Jupyter Notebook and the Spyder IDE to test and run PySpark code. Prerequisite:- Java should be installed. If Java is not installed, please install Java first, then […]
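The environment-variable part of such a setup can be scripted in Python. This is only a sketch: the install paths below are hypothetical placeholders, not the post's actual locations, so adjust them to wherever you extracted Spark and installed Java.

```python
import os

# Hypothetical install paths -- replace with your actual Spark/Java locations.
SPARK_HOME = r"C:\spark\spark-2.4.0-bin-hadoop2.7"
JAVA_HOME = r"C:\Program Files\Java\jdk1.8.0_201"

os.environ["SPARK_HOME"] = SPARK_HOME
os.environ["JAVA_HOME"] = JAVA_HOME
# Prepend Spark's bin directory so the `pyspark` launcher is found on PATH.
os.environ["PATH"] = os.path.join(SPARK_HOME, "bin") + os.pathsep + os.environ.get("PATH", "")
```

With these set for the session, launching `pyspark` (or importing it in Jupyter via findspark) can locate the Spark installation.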
In this post we will discuss handling NULL values during Sqoop import/export. If any value is NULL in the table and we sqoop that table, then Sqoop will import the NULL value as the string “null” in HDFS. That creates a problem when we use a NULL condition in our query using Hive. For […]
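The usual remedy is Sqoop's `--null-string` / `--null-non-string` import arguments, which let you store SQL NULLs as `\N` (the marker Hive reads back as NULL). As a sketch, here is a Python snippet that assembles such a command; the JDBC URL, table name, and target directory are placeholders, not values from the post.

```python
# Sketch: build a sqoop import command that writes SQL NULLs as \N so that
# Hive treats them as real NULLs. Connection details are placeholders.
args = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://localhost/testdb",
    "--table", "employees",
    "--null-string", "\\\\N",      # NULL in string columns -> \N in HDFS
    "--null-non-string", "\\\\N",  # NULL in non-string columns -> \N in HDFS
    "--target-dir", "/data/employees",
]
command = " ".join(args)
```

The doubled backslashes match the documented shell form `--null-string '\\N'`, which Sqoop unescapes to Hive's default NULL marker `\N`.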
Scala is an acronym for Scalable Language. Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant and type-safe way. Multi-paradigm means it supports object-oriented programming as well as functional programming. So Scala is a scalable programming language for component software with a focus on pattern […]
Prerequisite:- Java should be installed. Please follow the steps below to install Scala on Windows:- 1) Download Scala:- Click the link below to download Scala. https://www.scala-lang.org/download/ 2) After downloading, unzip it and set the class path. 3) After setting the class path, go to the command prompt and type scala. After that you can practice Scala programs in […]
Wow! Merge in Hive? Yes, since the release of Hive 2.2.x, MERGE is also possible in Hive. Today I will walk you through one simple example that will make the MERGE concept in Hive clear. What is the MERGE option in Hive:- With the MERGE option we can perform record-level insert, update and delete in […]
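Before looking at the Hive syntax, the record-level semantics of MERGE can be sketched in a few lines of plain Python. This is only an illustration of the matched-update / matched-delete / not-matched-insert logic, not Hive itself; the tables, keys, and the "D" delete flag are made up for the example.

```python
# Sketch of MERGE semantics: target table keyed by id, source rows carrying
# an operation flag ("U" update, "I" insert, "D" delete). Illustration only.
target = {1: "Sumit", 2: "Amit"}                 # existing target table
source = [(2, "Amit Kumar", "U"),                # key matches -> update
          (3, "Aditya", "I"),                    # no match    -> insert
          (1, None, "D")]                        # key matches -> delete

for key, value, op in source:
    if key in target and op == "D":
        del target[key]          # WHEN MATCHED AND flag = 'D' THEN DELETE
    elif key in target:
        target[key] = value      # WHEN MATCHED THEN UPDATE
    else:
        target[key] = value      # WHEN NOT MATCHED THEN INSERT
```

After the loop, `target` holds the updated row for id 2 and the inserted row for id 3, while id 1 is gone — exactly the three record-level operations MERGE performs in one statement.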
In this post, we will learn how to parse an XML file using Hive. I am using the XML file below for this example. jmdbks@hadoop:~$ cat test.xml <test><name>Sumit Kumar</name><properties><age>29</age><sex>male</sex></properties></test> <test><name>Amit Kumar</name><properties><age>30</age><sex>male</sex></properties></test> <test><name>Aditya Kumar</name><properties><age>23</age><sex>male</sex></properties></test> <test><name>Priya Kumar</name><properties><age>24</age><sex>Female</sex></properties></test> <test><name>Rohan Kumar</name><properties><age>20</age><sex>male</sex></properties></test> <test><name>Nitish Kumar</name><properties><age>29</age><sex>male</sex></properties></test> jmdbks@hadoop:~$ Below is the step-by-step procedure to parse an XML file using Hive. Step 1:- […]
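As a cross-check of what the Hive steps should extract, the same records parse cleanly with Python's standard xml.etree — each line of test.xml is a self-contained `<test>` element. This is just a sanity check on the data, not the Hive approach itself.

```python
import xml.etree.ElementTree as ET

# Two of the sample <test> records from test.xml above.
lines = [
    "<test><name>Sumit Kumar</name><properties><age>29</age><sex>male</sex></properties></test>",
    "<test><name>Priya Kumar</name><properties><age>24</age><sex>Female</sex></properties></test>",
]

rows = []
for line in lines:
    root = ET.fromstring(line)  # each line is one well-formed record
    rows.append((
        root.findtext("name"),
        int(root.findtext("properties/age")),
        root.findtext("properties/sex"),
    ))
```

Each tuple in `rows` corresponds to one row of the (name, age, sex) table the Hive parsing steps will produce.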
(I) explode() and posexplode():- explode() takes an array (or a map) as input and outputs the elements of the array (map) as separate rows. The example below will help you understand explode() better. 1) Create an example data set that has only one column, of type Array<int>. beauty2955@hadoop:~$ cat array_exm1 100,200,300,500 400,200,201 300,45 101 2) Create a table and load array_exm1 into […]
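The behavior of explode() and posexplode() on this data set can be sketched in plain Python (an analogue for intuition, not Hive itself): each input array yields one output row per element, and posexplode() additionally emits the element's position within its array.

```python
# Plain-Python analogue of Hive's explode()/posexplode() over array_exm1.
rows = [[100, 200, 300, 500], [400, 200, 201], [300, 45], [101]]

# explode(): one output row per array element.
exploded = [elem for arr in rows for elem in arr]

# posexplode(): (position-within-array, element) pairs.
pos_exploded = [(i, e) for arr in rows for i, e in enumerate(arr)]
```

So the four input rows become ten output rows, and `pos_exploded` restarts its position counter at 0 for every source row — exactly how posexplode() numbers elements per array.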
1) from_unixtime: This function converts the number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC) to a STRING that represents the TIMESTAMP of that moment in the current system time zone, in the format “1970-01-01 00:00:00”. The following example returns the current date including the time. hive> SELECT FROM_UNIXTIME(UNIX_TIMESTAMP()); OK 2015-05-18 05:43:37 Time […]
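The same conversion is easy to mimic in Python with the standard datetime module. This sketch pins the time zone to UTC for reproducibility, whereas Hive's from_unixtime uses the system time zone.

```python
from datetime import datetime, timezone

def from_unixtime(seconds, fmt="%Y-%m-%d %H:%M:%S"):
    # Mimics Hive's from_unixtime(): seconds since the Unix epoch -> formatted
    # string. Fixed to UTC here; Hive formats in the system time zone.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)
```

For example, `from_unixtime(0)` gives "1970-01-01 00:00:00", the epoch itself.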
Hive installation: 1.) Search for apache hive-2.2.0 bin in Google and download the tar file (the latest bin.tar.gz file) http://www-eu.apache.org/dist/hive/hive-2.2.0/ e.g.:- apache-hive-2.2.0-bin.tar.gz or download Hive from the Linux command line as below:- wget http://www-eu.apache.org/dist/hive/hive-2.2.0/apache-hive-2.2.0-bin.tar.gz 2.) Extract the file: tar -xvf <filename> e.g.:- tar -xvf apache-hive-2.2.0-bin.tar.gz mv apache-hive-2.2.0-bin hive2 3.) Download the MySQL connector using the command below wget https://la-mirrors.evowise.com/mysql/Downloads/Connector-J/mysql-connector-java-5.1.45.tar.gz Extract the file: […]
HDFS commands 1) mkdir (create a directory) hadoop fs -mkdir /data 2) copyFromLocal (copy a file or directory from local to HDFS) If we want to copy file1 from local to HDFS inside the directory /data then we have to use the command below hadoop fs -copyFromLocal file1 /data/ Note: Can be used for copying multiple files, similar pattern files, […]
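If you drive these commands from scripts, a thin Python helper can assemble the same `hadoop fs` argument lists. This is only a sketch that builds the commands; actually running them (e.g. via subprocess) requires a working Hadoop client on PATH.

```python
# Sketch: construct the `hadoop fs` commands shown above as argument lists,
# ready to pass to subprocess.run(). Construction only -- no Hadoop needed.
def hdfs_cmd(subcommand, *args):
    return ["hadoop", "fs", "-" + subcommand] + list(args)

mkdir_cmd = hdfs_cmd("mkdir", "/data")
copy_cmd = hdfs_cmd("copyFromLocal", "file1", "/data/")
```

Building commands as argument lists (rather than one shell string) avoids quoting problems when file names contain spaces.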