Data Engineer II - Big Data
Location:
Chesterbrook, Pennsylvania
Posted:
February 03, 2017
Reference:
00001G5N
POSITION SUMMARY:


Individuals in this role understand how information is turned into knowledge and how that knowledge supports and enables key business processes. They must have a solid understanding of logical data warehouse design principles and of data access requirements for business analytics and exploration. Also required are strong analytical skills, the ability to establish and maintain effective working relationships with team members, and an innate curiosity about business processes, business strategy, and strategic business initiatives, in order to help drive incremental business value from enterprise data assets.

PRIMARY DUTIES AND RESPONSIBILITIES:
  • Works on moderate to complex tasks in support of one or more projects as a project team member, or independently on small projects.

  • Demonstrates increasing skill in multiple technical environments and possesses knowledge of a specific business area.
  • May participate in project planning processes.
  • May identify project tasks and assist with task effort estimation.
  • Works with business analytics team members to develop business requirements.
  • Leverages knowledge of underlying data sources and builds up specific business data domain expertise.
  • Helps translate business analytics needs into semantic data access requirements.
  • Translates business requirements into conceptual, logical, and physical data models.
  • Develops and maintains an integrated logical and physical data model.
  • Assists in creating a framework for representing the data elements including the entities, relationships and attributes.
  • Works with Delivery Management team to understand potential impacts to physical/virtual infrastructure and assists in remediation plans as needed.
  • Recognizes and resolves conflicts between models, ensuring that data models are consistent with the enterprise model (e.g. entity names, relationships and definitions).
  • Works closely with other IT groups to use the data models and/or HDFS data assets throughout the whole life cycle.
  • Develops technical design of data sourcing, transformation and aggregation logic.
  • Works with Information Delivery team to determine data sourcing options and recommend implementation approach.
  • Leverages enterprise standard tools and platforms to develop data transformation and aggregation logic, typically working with and/or leading a small team.
  • Works with business stakeholders, data visualization specialists and/or data scientists to determine analytics data consumption alternatives and recommend optimal approach based on analytics requirements.
  • Verifies that data access mechanisms satisfy analytics consumption needs (may include testing various approaches).
  • Transfers knowledge of data access/consumption mechanisms to business stakeholders, data visualization specialists and/or data scientists.
  • Collects, analyzes and summarizes data to support business decisions.
  • Identifies patterns and trends regarding data quality.
  • Helps to maintain the data dictionary.
  • Ensures that the information is named properly and defined consistently across the organization.
  • Provides data that is congruent and reliable and is easily accessible by the user.
  • Identifies opportunities for reuse of data across the enterprise.
  • Validates data sources.
  • Manages the flow of information between departments.
  • Leverages master data as needed by business processes.
  • Consults with the Delivery Management team on batch/bulk data load scheduling to optimize performance.
  • Works with data stewards to gather requirements on merging, de-duplication, and cleansing rules.
  • Ensures and maintains a high level of data integrity by using tools to monitor and mass update data changes.
  • Develops data profiling and preventative procedures to improve data quality.
  • Monitors and resolves daily exception reports.
  • Analyzes data inaccuracies and recommends process improvements or system changes to enhance overall quality of the data.
  • Communicates data integrity status to the business, and escalates issues when necessary.
  • Identifies opportunities and supports the development of automated solutions to enhance the quality of enterprise data.
  • Analyzes data issues and works with development teams for problem resolutions.
  • Monitors data dictionary statistics and analyzes reports of data duplicates or other errors to provide appropriate ongoing data reports.
  • Identifies problematic areas and conducts research to determine the best course of action to correct the data; identifies, analyzes, and interprets trends and patterns in complex datasets.
  • May define test plans and system documentation to monitor testing for the implementation of business analytics solution enhancements, modifications and new releases.
  • Conducts unit testing to ensure business analytics solutions meet user specifications.
  • May participate in integration testing.
  • Interfaces with testing teams to incorporate plans into the testing process.
  • May analyze the existing data platform to identify weaknesses and develop opportunities for improvement, when assigned.
  • Provides technical coaching and mentoring to less-experienced team members.

Qualifications:

EXPERIENCE AND EDUCATIONAL REQUIREMENTS:
  • Bachelor's degree in Programming/Systems, Computer Science, Statistics, Electrical Engineering or equivalent work experience.
  • More than 3 years of related technical experience.
  • Experience with cloud-based analytics, data management, and visualization technologies.

MINIMUM SKILLS, KNOWLEDGE AND ABILITY REQUIREMENTS:
  • Knowledge of and exposure to Big Data technologies including Hadoop/HDFS, MapReduce, YARN, Hive, and Spark.
  • Knowledge of and exposure to cloud or on-premises MPP data warehousing systems (e.g. Microsoft APS, Teradata, Snowflake, Azure SQL Data Warehouse).
  • Knowledge and experience with Informatica or Talend as an ETL environment.
  • Knowledge of cloud resource provisioning and management on Azure and/or AWS is a plus.
  • Familiarity with Lambda and Kappa architecture implementations.
  • Familiarity with Data Lake implementations and design patterns.
  • Familiarity with Storm/Spark Streaming and with streaming concepts and patterns is a plus.
  • Experience with analytics model management and analytics workflow tools (e.g. SAS Model Manager, Knime, ML Studio, Alteryx) is a plus.
  • Demonstrated knowledge of data management concepts as well as an outstanding command of the SQL standard.
  • Experience in using and tuning relational databases (e.g. Microsoft SQL Server, Oracle, MySQL) and understanding of columnar database technology (e.g. Amazon Redshift, Microsoft SQL Data Warehouse, Oracle Exadata).
  • Object-oriented programming proficiency using the Java EE and/or .NET technology stack.
  • Knowledge of Test-Driven Development, Continuous Integration, and Agile/Scrum.
  • Experience with code versioning tools and a command of configuration management concepts and tools.
  • Demonstrated ability to quickly learn and adapt to new technologies and coding techniques.
  • Experience designing, developing, and testing applications using proven or emerging technologies across a variety of environments.
  • Scientific computing experience (e.g. R, Matlab, Python) is a plus.
  • Knowledge of NoSQL data management systems (e.g. MongoDB, HBase, Cassandra) is a plus.
  • Theoretical and practical background in data mining and machine learning is a plus.
  • Self-driven, with the ability to work on multiple tasks and adapt to change.
  • Outstanding interpersonal and communication (written & verbal) skills.
  • Strong collaboration skills and ability to thrive in a fast-paced environment.
  • Successful candidate must be able to work with controlled technology in accordance with US export control law.

A little about us:
Where knowledge, reach and partnership shape healthcare delivery.
