Datadog Monitoring Systems Engineer
Job Description
Job Title: Lead Monitoring Systems Engineer
Location: Washington, DC (100% Remote)
Reports To: Manager, Systems Monitoring Team
Job Category/Level: Systems/Monitoring/Lead
Background: We are seeking a Lead Monitoring Systems Engineer to support systems monitoring initiatives for several upcoming projects in 2024 and beyond. This role focuses on administering systems and application monitoring tools, with a strong emphasis on Datadog.
Key Responsibilities:
- Administer and maintain Datadog monitoring on Linux platforms, including application performance monitoring (APM), log management, and network monitoring.
- Instrument Java applications running on Tomcat with Datadog.
- Configure centralized logging from multiple sources and integrate it with Datadog.
- Create dashboards and data visualizations in Datadog to track key performance metrics.
- Develop and implement end-user and synthetic monitoring solutions using CloudBeat and Selenium scripts.
- Analyze monitoring data, prepare weekly status reports, and communicate potential issues to management.
- Collaborate with the Systems and Application Architecture teams to ensure monitoring requirements are addressed during development.
- Provide training and documentation for monitoring tools and procedures.
Qualifications:
- Education: Bachelor of Science in Computer Science or related field, or equivalent experience.
- Experience: 5-8 years in IT with a focus on monitoring tools, including at least 3 years of hands-on experience with Datadog.
- Strong experience with Linux platforms (preferably Red Hat) and Java application instrumentation.
- Familiarity with the ELK Stack (Elasticsearch, Logstash, Kibana) is a plus.
- Proficiency in scripting (Python, shell) and automation tools such as Ansible.
- Understanding of SSL/TLS configuration, encryption methods, and network components (e.g., routers, switches, load balancers).
- Experience with systems monitoring strategies and service level management in large-scale environments.
Competencies:
- Excellent organizational, interpersonal, and analytical skills.
- Self-motivated with the ability to adapt to changing priorities and tight deadlines.
- Strong problem-solving skills, initiative, and technical proficiency.
- Effective verbal and written communication skills.
- Familiarity with Agile methodologies and the software development life cycle (SDLC).
Preferred Qualifications:
- ITIL v3 Foundation certification (to be obtained within 180 days).
- SAFe certification.
- Experience integrating cloud monitoring solutions (e.g., AWS CloudWatch) with Datadog.