What is Spark?
Spark provides a general data processing
platform. Spark runs programs up to 100 times faster in memory, or 10 times
faster on disk, than Hadoop MapReduce. In 2014, Spark won the Daytona GraySort
100 TB benchmark, sorting the data three times faster than the previous
Hadoop-based record while using ten times fewer machines, which made it the
fastest open source engine of its kind.
CODE SAMPLE:
sparkContext.textFile("hdfs://...")
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .saveAsTextFile("hdfs://...")
Spark Core:
Spark Core is the engine for large-scale distributed and
parallel data processing. It is responsible for:
· memory management
· monitoring jobs on a cluster
Spark introduces the concept of an RDD (Resilient Distributed Dataset), an
immutable, fault-tolerant, distributed collection of objects that can be
operated on in parallel. An RDD can contain any type of object and is created
by loading an external dataset or by distributing a collection from the driver
program.
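As a plain-Python sketch of the second creation path (this is not real Spark; `parallelize` below is a toy stand-in for `sparkContext.parallelize`, which splits a driver-side collection into partitions):

```python
# Toy stand-in for sc.parallelize (illustrative only, not real Spark):
# a list of partitions plays the role of a distributed RDD.

def parallelize(collection, num_partitions=2):
    """Split a driver-side collection into roughly equal partitions."""
    size = -(-len(collection) // num_partitions)  # ceiling division
    return [collection[i:i + size] for i in range(0, len(collection), size)]

rdd_like = parallelize([1, 2, 3, 4, 5], num_partitions=2)
# In real Spark, each partition could live on a different cluster node.
print(rdd_like)  # [[1, 2, 3], [4, 5]]
```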
RDDs support two types of operations:
· Transformations are operations performed on an RDD that yield a new RDD containing the result.
· Actions are operations that return a value to the driver program after running a computation on the RDD.
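The distinction can be sketched in plain Python (an analogue, not real Spark): generators behave like transformations, recording work without doing it, while consuming them plays the role of an action:

```python
# Plain-Python analogue of transformations vs. actions (not real Spark).

log = []

def traced_double(x):
    log.append(x)       # record that the computation actually ran
    return x * 2

nums = map(traced_double, [1, 2, 3])  # "transformation": lazy, nothing runs
assert log == []                       # no computation has happened yet

result = list(nums)                    # "action": forces the computation
assert result == [2, 4, 6]
assert log == [1, 2, 3]                # only now has traced_double run
```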
Transformations in Spark are “lazy”, meaning that they do not
compute their results right away. Instead, they just “remember” the operation
to be performed and the dataset (e.g., a file) to which the operation is to be
applied. The transformations are only actually computed when an action is
called, and the result is returned to the driver program. This design enables
Spark to run more efficiently. For example, if a big file is transformed in
various ways and then passed to an action such as first(), Spark only processes
and returns the first line, rather than doing the work for the entire file.
By default, each transformed RDD is recomputed each time you run
an action on it. However, you may also persist an RDD in memory using the cache
method, in which case Spark will keep the elements around on the cluster for
much faster access the next time you query it.
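A rough Python analogue of caching (again, not real Spark; `expensive` is a hypothetical costly function): without caching, every action re-runs the pipeline, while materializing the results once mimics keeping them in memory:

```python
# Rough analogue of rdd.cache() (illustrative only, not real Spark).

calls = 0

def expensive(x):
    global calls
    calls += 1          # count how often the "transformation" really runs
    return x * x

def pipeline():
    return (expensive(x) for x in range(3))  # rebuilt lazily per "job"

# Without cache: each action recomputes the whole pipeline.
sum(pipeline())
sum(pipeline())
assert calls == 6        # 3 elements computed twice

# "cache": materialize once, then reuse for every later action.
cached = list(pipeline())
assert calls == 9
assert sum(cached) == 5  # 0 + 1 + 4
assert sum(cached) == 5  # no recomputation this time
assert calls == 9
```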
SPARK SQL:
Spark SQL is a Spark component that supports querying data either via SQL or
via the Hive Query Language. It originated as a port of Apache Hive to run on
top of Spark (in place of MapReduce) and is now integrated with the Spark
stack. In addition to providing support for various data sources, it makes it
possible to combine SQL queries with code transformations, which results in a
very powerful tool. Below is an example of a Hive-compatible query.
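A minimal sketch of such a query (the table and column names `word_counts`, `word`, and `cnt` are assumptions, and a real application would submit the string through a running SparkSession, e.g. `spark.sql(query)`):

```python
# Hypothetical Hive-compatible query for Spark SQL (sketch; the table
# word_counts and a running SparkSession named `spark` are assumptions).
query = """
SELECT word, cnt
FROM word_counts
WHERE cnt > 1
ORDER BY cnt DESC
"""

# In a real Spark application one would run:
# results = spark.sql(query).collect()
```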
Training:
Peopleclick is one of the leading IT training institutes providing Spark
training in Bangalore. The trainers at Peopleclick are all working
professionals who deliver the Spark training. After completing the course,
candidates get placed in MNCs. The trainers also provide live project training
in Bangalore, are very supportive, and guide candidates throughout the course.
For more information please visit:
www.hadooptrainingbangalore.com/spark-training-bangalore
