Big Data Analysis using Spark for Collision Rate near CalStateLA

Authors

  • Shubhra Wahi

  • Manik Katyal

  • Jongwook Woo

Keywords:

spark, collision data, gender analysis, geo spatial analysis, big data

Abstract

Police say that alcohol, drugs, and speed are the three major factors that cause collisions. We analyzed collision data to test this conclusion and to obtain further information, such as which age groups were involved, where the accidents occurred, and what the reasons behind the collisions were. These insights could make the general public more aware of the causes of collisions. To analyze more than a hundred thousand records, we adopted Spark for faster processing of this massive data set. In this paper, we present facts based on data and analytics that lead to conclusions such as: the number of collisions decreased between 2009 and 2013, and far fewer females than males were involved in collisions. Moving ahead in our research, we addressed more complex analytics, such as which areas near CalStateLA are more prone to collisions, which brands of cars are more often involved in collisions, and which specific type of collision was most often observed.

How to Cite

Shubhra Wahi, Manik Katyal, & Jongwook Woo. (2016). Big Data Analysis using Spark for Collision Rate near CalStateLA. Global Journal of Computer Science and Technology, 16(H4), 1–8. Retrieved from https://computerresearch.org/index.php/computer/article/view/1501

Published

2016-10-15