# Introduction

Big Data, or data integration, is fundamentally concerned with the interoperability of data. Big Data deals with diverse tasks such as:

1. Substantial data movement
2. Replication of data
3. Synchronization of data
4. Transformation of data

Geoscience is the exploration and application of Earth's mineral, soil, water and energy resources. Variability in the Earth sciences in any area can be expressed as both spatial and temporal variation.

# Analysis of Big Data

Prior to 2012, the U.S. was the largest single contributor to global data. Emerging markets are now showing the largest increases in data growth. In 2012, the amount of information stored worldwide exceeded 2.8 zettabytes. By 2020, the total amount of data stored is expected to be 50 times greater than today.

What good is all of this data? Data are raw, unprocessed facts that are in and of themselves worthless. Information is potentially valuable concepts based on data. Knowledge is what we understand based on information. Wisdom is the effective use of knowledge in decision making.

# Literature Review

In many studies, scientists have investigated Big Data by developing customized tools in various scripting languages. An overview of such studies is given in this section.

Azza Abouzeid et al. presented the paper "HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads". This paper elaborates on how HadoopDB is able to approach the performance of parallel database systems and how Hadoop works in heterogeneous environments.

Jerome Boulon et al. discussed "Chukwa: A large-scale monitoring system", which is used for monitoring and analysing large distributed systems.

Jeffrey Dean et al. elaborated on "MapReduce: Simplified Data Processing on Large Clusters", a programming model used for processing and generating large data sets. It is built on two functions: a map function and a reduce function.
Tom Narock and Pascal Hitzler discussed "Crowdsourcing Semantics for Big Data in Geoscience Applications", i.e. how semantic algorithms have been used to obtain accurate data.

Sanjay Ghemawat et al. discussed "The Google File System", a scalable distributed file system for large distributed data-intensive applications. It improves performance when analysing large clusters of data and performs well when serving a large number of clients.

a) Technique used

Big Data applications can be written in many different languages, such as C, C++ and Python. However, Python is considered the most suitable language. Python is a multi-paradigm programming language: it allows programmers to adopt several styles of programming, and both object-oriented and structured programming are fully supported. Python also offers language features that support functional programming and aspect-oriented programming.

Many factors favour Python as a language for Big Data. In recent years, plenty of APIs and libraries have been developed for Python. In research, too, Python has much to offer, ranging from networking to GUI development. As a result, interaction among systems has been greatly enriched, even though it remains a formidable task in many programming languages.

Pydoop offers features that are usually not found in other Python libraries for Hadoop, such as a MapReduce library that enables users to combine and partition data sets. It is easy to install and free to use. SciPy is an open-source library for Python aimed at users who do scientific computation. It provides modules such as ODE (ordinary differential equation) solvers, FFT (fast Fourier transform) and optimization, which find application in science and engineering. MapReduce provides a framework in which large volumes of data can be analysed.
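The map and reduce steps can be illustrated with a minimal sketch in plain Python. This is not Pydoop or Hadoop code, and it runs on a single machine. The word-count task, the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the sample text are assumptions chosen for illustration only.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: merge all values emitted for one key into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(pairs)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; changing the text changes the counts.
    sample = ["big data in geoscience", "big data tools"]
    print(word_count(sample))
```

In a real cluster the framework distributes the map calls across machines and performs the grouping itself; the sketch only shows how the two user-supplied functions fit together.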
The tool can be extended further by increasing the volume of data supplied, and scientists can adopt other scripting languages to enhance the power of Big Data and thus make new discoveries in this discipline. Big Data has emerged as a powerful technique in recent years and provides solutions to the challenges of merging data, making a mark in many fields, such as banking, health care and education, that involve the world at large.

![Figure 1: Analysis of Big Data](image-2.png "Figure 1: Analysis of Big Data")

![Figure: Snapshot of Program 1](image-3.png "Figure: Snapshot of Program 1")

![Figure: Snapshot of Execution 1](image-4.png "Figure: Snapshot of Execution 1")

If the user manipulates the text, then the output is modified accordingly, as indicated.

![Figure: Snapshot of Execution 2](image-5.png "Figure: Snapshot of Execution 2")

Global Journal of Computer Science and Technology, Volume XVI, Issue II, Version I. © 2016 Global Journals Inc. (US)

# References

* C. Lynch. Big data: How do your data grow? Nature 455, 2008. doi:10.1038/455028a
* R. Chaiken, B. Jenkins, P.-A. Larson, B. Ramsey, D. Shakib, S. Weaver, J. Zhou. Scope: Easy and efficient parallel processing of massive data sets. Proc. of VLDB, 2008.
* G. Czajkowski. Sorting 1PB with MapReduce.
* J. Dean, S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI, 2004.
* J. Dean, S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM 51, 2008.
* M. L. Massie, B. N. Chun, D. E. Culler. The Ganglia Distributed Monitoring System: Design, Implementation, and Experience. Parallel Computing 30(7), 2004.
* L. A. Barroso, J. Dean, U. Hölzle. Web search for a planet: The Google cluster architecture. IEEE Micro 23(2), pp. 22-28, April 2003.
* H. G. Miller, P. Mork. From Data to Decisions: A Value Chain for Big Data. IT Professional 15(1), 2013. doi:10.1109/MITP.2013.11