The analytical world has evolved from standalone Business Intelligence (BI) projects to Big Data projects. The need has shifted from simply analyzing information to analyzing it in real time. As companies ran into analytical needs that no existing role was defined to cover, new roles began to appear. Among them is the Data Scientist, the role responsible for analyzing large volumes of information.
The Data Scientist role requires people who are familiar with the whole process of capturing, analyzing, and presenting business data. Let's look at each of these areas in more detail:
Skills of a Data Scientist
Because of the nature of the role, the Data Scientist needs knowledge of programming and databases. Some know-how is essential to carry out the tasks of this role, such as Hadoop, Java, Python, SQL, Hive, and Pig, among others.
But the requirements do not end there. Extraction, transformation, and loading (ETL) of data is also needed to continue the process. This role demands knowledge of data warehousing and unstructured data modeling techniques to design the structure that supports data loading. The Data Scientist also needs solid business experience, since it is essential to understand the information being processed.
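To make the ETL idea more concrete, here is a minimal Python sketch of a single extract-transform-load pass. It assumes a small hypothetical CSV export (the columns `order_id`, `customer`, `amount`, `country` and the table `fact_orders` are illustrative, not taken from any real system) and uses an in-memory SQLite database as a stand-in for a data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: column names and values are illustrative only.
RAW_CSV = """order_id,customer,amount,country
1, Alice ,120.50,us
2,Bob,,UK
3,Carol,75.00,us
"""

def extract(text):
    """Extract: read the raw CSV rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim names, normalize country codes, drop rows missing an amount."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records instead of loading bad data
        clean.append((
            int(row["order_id"]),
            row["customer"].strip(),
            float(row["amount"]),
            row["country"].strip().upper(),
        ))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into a warehouse-style fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute(
    "SELECT country, SUM(amount) FROM fact_orders GROUP BY country"
).fetchall())
```

Keeping the three steps as separate functions mirrors the ETL stages, so each one can be tested or replaced (for example, swapping SQLite for a real warehouse) without touching the others.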
Second, the Data Scientist must know R, Excel, SAS, and other tools for working with the data. With these tools, they look for patterns and correlations using statistics. This capability is vital for managing and working with the data, which is why these are the main skills required of a Data Scientist. However, the proper use of these technologies must be accompanied by mathematics and statistics. This combination makes it possible to understand correlations, regressions, and the other functions needed to examine the data from different angles.
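As a rough illustration of the statistics mentioned above, the sketch below computes a Pearson correlation coefficient and an ordinary least-squares regression line by hand in Python. The data (`ad_spend`, `sales`) is made up for the example; in practice a Data Scientist would usually rely on R, SAS, or a statistics library rather than hand-written formulas.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical data: advertising spend vs. sales (arbitrary units).
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 6, 9, 12]

r = pearson(ad_spend, sales)
slope, intercept = linear_fit(ad_spend, sales)
print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"predicted sales at spend 6: {slope * 6 + intercept:.1f}")
```

A value of r close to 1 indicates a strong positive linear relationship, and the fitted line can then be used for rough predictions; correlation alone, of course, does not establish causation.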
This role should also be able to use visualization tools such as Flare, Highcharts, amCharts, and D3.js to present the results visually. A good presentation alone is not enough, however. The Data Scientist should be good at explaining the patterns found in the analysis, the results obtained, and why those results are reliable. In short, the Data Scientist needs storytelling skills.