We are seeking a highly skilled Data Software Engineer to join our team and contribute to the development of a secure and innovative document flow solution hosted on AWS.
As part of our mission, you will collaborate with a team of experienced professionals to evolve an end-to-end information lifecycle solution, leveraging modern technologies like AWS Glue, Athena, and Apache Spark.
Your role will focus on ensuring the scalability, efficiency, and reliability of a cutting-edge system that simplifies digital document management for our global clientele.
Responsibilities
- Design, develop, and implement data pipelines and workflows using AWS Glue and related technologies
- Build and optimize scalable and efficient data models using Athena and S3 to support reporting and analytics
- Develop and maintain ETL processes with tools like Apache Spark to process high-volume data workloads
- Collaborate with BI analysts and architects to enhance processes for Business Intelligence and analytics
- Optimize the cost and performance of cloud solutions by adopting fully managed AWS services
- Maintain and improve CI/CD pipelines to ensure seamless integration and deployment
- Monitor the solution for performance, reliability, and cost efficiency using modern observability tools
- Support the development of reporting dashboards by providing accurate and timely data models
- Deliver high-quality code while following best practices for testing and documentation
- Troubleshoot and resolve issues with data workflows, ensuring system uptime and reliability
Requirements
- 2+ years of experience in data engineering or software development, with a strong focus on AWS services
- Proficiency in AWS Glue, Amazon Athena, and core Amazon Web Services tools like S3 and Lambda
- Expertise in Apache Spark, with a strong background in developing large-scale data processing systems
- Competency in BI process analysis, with the ability to work with analytics teams to optimize reporting workflows
- Familiarity with SQL, including the ability to build complex queries for data extraction and transformation
- Understanding of data lake and ETL architecture concepts for scalable data storage and processing
- Knowledge of CI/CD pipelines and competency in integrating data workflows into deployment frameworks
- Flexibility to use additional tools such as Amazon Kinesis, Apache Hive, or Elastic Kubernetes Service
- Excellent communication skills in English, with a minimum proficiency level of B2
Nice to have
- Experience with Amazon Elastic Kubernetes Service (EKS) for containerized application orchestration
- Familiarity with Amazon Kinesis for real-time data streaming and event processing
- Understanding of Apache Hive and its applications in data warehousing
- Background in operating BI toolsets and improving the efficiency of Business Intelligence platforms
- Proficiency in Java for extending data processing capabilities
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn