About Us
Acceldata is the market leader in Enterprise Data Observability. Founded in 2018, Silicon Valley-based Acceldata has developed the world's first Enterprise Data Observability Platform to help build and operate great data products. Enterprise Data Observability is at the intersection of today’s hottest and most crucial technologies such as AI, LLMs, Analytics, and DataOps. Acceldata provides mission-critical capabilities that deliver highly trusted and reliable data to power enterprise data products. Delivered as a SaaS product, Acceldata's solutions have been embraced by global customers, such as HPE, HSBC, Visa, Freddie Mac, Manulife, Workday, Oracle, PubMatic, PhonePe (Walmart), Hersheys, Dun & Bradstreet, and many more. Acceldata is a Series-C funded company and its investors include Insight Partners, March Capital, Lightspeed, Sorenson Ventures, Industry Ventures, and Emergent Ventures.
As a Site Reliability Engineer, you'll join a team responsible for delivering support services to gold and enterprise customers. You’ll use your expertise to enhance the reliability and performance of Hadoop Data Lake clusters and data management services. Our SREs are expected to be platform- and vendor-agnostic while implementing and stabilizing Hadoop ecosystems, and they will develop strong business, interpersonal, and technical skills to provide high-quality service to our customers.
What makes you the right fit for this position?
Proven experience as a product SME, Support Engineer, or in a similar role assisting customers, with exceptional skill in troubleshooting and optimizing services, performance, security, and deployments, and the ability to describe this work effectively both verbally and in writing.
Solid expertise in Hadoop operations, configuration, implementation, and service and performance tuning across Hadoop components, including HDFS, YARN, Spark, Hive, HBase, Ranger, and Kafka. Additionally, expertise in Linux distributions such as RHEL, Ubuntu, and SUSE, so that you can support our products with confidence and help our customers succeed.
Soft Skills: Excellent communication skills, both written and verbal, with the ability to explain complex technical concepts clearly.
Experience: 4+ years of related experience in customer-facing support, post-sales, or consulting roles, focused on supporting mission-critical production systems.
Collaboration: Willingness to work with customers to understand their use cases, identify pain points, perform solid root cause analysis (RCA), and deliver tailored solutions within the agreed service levels.
Act as a designated or dedicated engineer for specific customers, which involves building long-term successful relationships with customers and leading weekly status calls on customer sites. Must be willing to work flexible shifts and participate in a rotational weekend on-call roster for critical support needs.
We’re looking for someone who can:
Use your expertise to improve the reliability and performance of Hadoop Data Lake clusters and data management services. Like our products, our SREs are expected to be platform- and vendor-agnostic when implementing, stabilizing, and tuning Hadoop ecosystems.
Improve the availability, scalability, performance, and reliability of enterprise production services for our products as well as our customers’ data lake environments.
Provide implementation guidance, best-practice frameworks, and technical thought leadership to our customers for their Hadoop Data Lake implementation and migration initiatives.
Be 100% hands-on and, as required, test, monitor, administer, and operate multiple Data Lake clusters across data centres.
Troubleshoot issues across the entire stack - hardware, software, application, and network.
Dive into problems with an eye to both immediate remediation and the follow-through changes and automation that will prevent future occurrences.
Demonstrate exceptional troubleshooting and strong architectural skills, and describe them clearly and effectively both verbally and in writing.