Junior Data Engineer
Creditsafe is the most used business data provider in the world, reducing risk and maximising opportunities for our 110,000 business customers. Our journey began in Oslo, Norway, in 1997, with a dream of using the then-revolutionary internet to deliver instant access to company credit reports for small and medium-sized businesses. Creditsafe realised this dream and changed the market for the better for businesses of all sizes. From there, we opened 15 more offices throughout Europe, the USA and Asia. We provide data on more than 365 million companies and send customers notifications of billions of changes annually. We are a high-growth company that combines the freedom and flexibility of a start-up culture, driven by the continuous innovation and new product development that have made us famous for disrupting the market, with the stability of a profitable business! With such a large customer base and such breadth of data and analytics technology, you will have real opportunities to help companies survive and thrive in challenging times by reducing business risk and helping them choose trustworthy customers and suppliers.
If you enjoy working with billions of time-series data points and creating new business insights that solve real problems for a diverse set of customers, and you have an interest in big data platforms, machine learning and predictive analytics, you will enjoy working at Creditsafe.
You will work closely with the Data Vault team, building systems that facilitate the transition from traditional data processing approaches to a Data Vault-based approach. In this role you will define and build data pipelines that improve data-informed decision-making across the business. This is an opportunity to work with large volumes of data and gain exposure to big data architectures.
• Experience of reading and writing data using Python and SQL
• Understanding of Agile development methodologies
• Experience with Python unit testing frameworks such as pytest and nose
• Familiarity with cloud technology, preferably AWS
• Good understanding of Git
• Knowledge of automated delivery processes
• Experience implementing data pipelines using Apache Airflow
• Experience executing data transformations in SQL via dbt
• Understanding of MPP data platforms such as Apache Hive, Presto, Spark and Redshift
• Experience of working with large datasets
• Play a hands-on role as part of an Agile team to develop, test and maintain high-quality systems that fulfil business needs
• Extract data from various files, systems, cloud sources, databases and APIs by writing and executing code (SQL, Python and similar)
• Clean and combine offline, online or mixed sources into datasets, building in manual or automatic validation and accuracy checks, using Python, SQL or specialist big data frameworks
• Support the team in maintaining existing software and data infrastructure
• Maintain a strong focus on quality, applying practices such as continuous integration and test-driven development to enable the rapid delivery of working code
• Write documentation for the new processes and products you develop so that knowledge is shared
• Create pattern-based data pipelines in Python and SQL, using industry-standard loading patterns in accordance with guidelines set by the Senior Data Engineers
• Help to design, build and launch new data models
• 2+ years of development experience in a commercial environment
• Knowledge of Agile development methodologies
• Some experience of working with data sources and Python
• Knowledge of SQL programming and code optimisation
• Awareness of cloud technology, particularly AWS
• Knowledge of automated delivery processes
• Some experience designing and building data pipelines
• Understanding of engineering best practices (handling and logging errors, system monitoring and building applications that tolerate human error)
• Able to write efficient code and comfortable undertaking system optimisation and performance tuning tasks
• Experience working within a Unix-based environment
• Comfortable working with relational databases such as PostgreSQL, MySQL, MariaDB or Redshift
• Teamwork – Encourages cooperation, collaboration and partnerships
• Quality Improvement – Strives for high-quality performance.
• Problem Solving – Identifies problems and seeks the best solutions by being creative and innovative
• Autonomy – Works under direction of the Senior Data Engineers within a clear framework of accountability. Exercises personal responsibility and autonomy. Plans own work to meet given objectives and processes.
• Influence – Participates in external activities related to own specialism. Contributes to decisions which influence the success of projects and team objectives.
• Complexity – Performs a range of work, sometimes complex and non-routine, in a variety of environments. Applies a methodical approach to issue definition and resolution.
• Business skills – Selects appropriately from applicable standards, methods, tools and applications. Communicates fluently, orally and in writing, and can present information to both technical and non-technical audiences. Plans, schedules and monitors work to meet time and quality targets. Absorbs new information and applies it effectively. Maintains an awareness of developing technologies and how they could be applied to improve their solution.