Systems Engineer - Req. 1900745
The primary purpose of the Site Reliability Engineering (SRE) Systems Engineer at Raymond James is to improve and sustain the reliability of Raymond James’ critical IT systems. The role helps establish, shape, and measure service level objectives for critical systems. In addition, SREs continuously identify engineering and automation opportunities to help manage enterprise production systems effectively at scale. The SRE team focuses on minimizing the impact of issues by identifying them in non-production environments and by watching system trends to find issues before they become outages.
Performs analysis and design tasks related to Application Performance Monitoring. Executes on strategic direction and develops tactical plans for monitoring mission-critical applications. The position requires extensive contact with development, QA, and admin/operational staff. Is expected to communicate, analyze, schedule, and implement instrumentation and monitoring for critical business-facing systems. Effectively identifies opportunities for change, implements change, and introduces new concepts, procedures, policies, and tools while providing a clear explanation of their benefits and purpose.
- Collaborate with development teams to understand the most critical paths for applications under development.
- Use monitoring tools to uncover the backend dependencies for these applications and make sure that they are instrumented appropriately.
- Determine whether all of the endpoints identified above are adequately covered by synthetic transactions.
- Engage development teams to ensure that operationally significant events are being emitted by each sub-system.
- Conduct negative testing to make sure that the notification mechanisms set up within the various APM products are working and alerting the correct teams.
- Develop dashboards which show the overall health of a complex application. This will likely be accompanied by other dashboards showing the health of dependent systems.
- Work with the QA Performance team to identify bottlenecks in applications where stress tests are being performed.
- Triage degradation and outage situations in the production environment in order to restore system health.
Experience and Skills
Minimum of a B.S. in Computer Science, MIS, or a related degree and five (5) years of related experience, or a combination of education, experience, and training.
Technical Skillset highly preferred:
- Experience with monitoring, alerting, and enterprise tools such as Dynatrace, Splunk, Datadog, ServiceNow, MS PowerShell, etc.
- Event management and integrations (tools like CA Service Operations Insight and ServiceNow, leveraging REST)
- Understanding of standard protocols/technologies such as Linux, DNS/WINS, TCP/IP, FTP, SSH, RDP, Active Directory, HTTP/S, IIS, JBoss, F5, etc.
- Experience creating dashboards and relevant visualizations.
- Strong team and collaboration skills
- Strong troubleshooting skills and experience developing/architecting enterprise applications
- Demonstrated experience influencing and selling new ideas or mindsets to peers, leadership and senior leadership.
- Proficient with Dynatrace, including:
  - Building custom measures
  - Building Business Transactions
  - Creating incident rules that avoid false positives
  - Building dashboards to show application health/KPIs
  - Using Dynatrace to triage a performance problem in any environment
- Proficient with Splunk, including:
  - Understanding queries, searches, and alerts
  - Building dashboards that will be packaged into Splunk applications
- Analysis: Identify and understand issues, problems and opportunities; compare data from different sources to draw conclusions.
- Communication: Clearly convey information and ideas through a variety of media to individuals or groups in a manner that engages the audience and helps them understand and retain the message.
- Exercising Judgment and Decision Making: Use effective approaches for choosing a course of action or developing appropriate solutions; recommend or take action that is consistent with available facts, constraints and probable consequences.
- Technical and Professional Knowledge: Demonstrate a satisfactory level of technical and professional skill or knowledge in position-related areas; remain current with developments and trends in areas of expertise.
- Building Effective Relationships: Develop and use collaborative relationships to facilitate the accomplishment of work goals.
- Client Focus: Make internal and external clients and their needs a primary focus of actions; develop and sustain productive client relationships.