What's next in DevOps and what on EARTH is Platform Engineering? Nati Shalom, Cloudify (Dell) & Eran Bibi, Firefly

בפרק הזה של פיתוח בהפרעה, ישי בארי, CTO ב LinearB מארח את נתי שלום, CTO ו Co-Founder ב Cloudify ואת ערן ביבי CPO ו Co-Founder ב Firefly. הם צוללים לעומק על כל עולם ה DevOps שעבר אבולוציה מעניינת ב 15 שנים האחרונות - מ SRE ל Platform Engineering ו Production Engineering וכל מה שביניהם. איך ייראה עולם ה DevOps בעתיד הקרוב, ואפילו היותר רחוק, וגם למה אנשי DevOps צריכים להיערך ואיזה skills לפתח כדי לעמוד באתגרים המשמעותיים הבאים.

In our next episode of Dev Interrupted - the Hebrew Edition, Yishai Beeri, CTO at LinearB, hosts Nati Shalom CTO & Co-Founder at Cloudify, and Eran Bibi CPO and Co-Founder at Firefly. Together they take a deep dive on the evolution in the world of DevOps over the last 15 years - from SRE to Platform and Production Engineering, and everything in between. What the world of DevOps will look like in the immediate future and farther along, and even the things DevOps engineers need to prepare themselves for and which skills they need to evolve to overcome the next big challenges coming their way.

Episode Transcript תמליל הפרק

Hebrew, then English* בעברית ואז אנגלית:

(*Translated with Google Translate - so there may be some errors)

(מוסיקת פתיח)

ישי: בפרק הזה אני שמח לארח את נתי שלום, מייסד ושותף ו-CTO ב-Cloudify, שממש לאחרונה נרכשה ע"י Dell, מברוק.

נתי: תודה רבה.

ישי: ואת ערן ביבי, מייסד ושותף ו-Chief Product Officer ב-Firefly.

ערן: אהלן.

ישי: אהלן, איזה כיף שבאתם, אנחנו ממש מתרגשים, אנחנו בפתיחת העונה השנייה שלנו, והיום אנחנו נדבר על developer experience, על DevOps, על platform engineering, אבל קודם כל אני אשמח אם כל אחד יספר לי בכמה מילים על המסלול שלו, איך הגעת עד הלום, ככה נקודות בקריירה. נתי, בוא נתחיל איתך.

נתי: סבבה, אז ערן אני מתנצל, זה יהיה טיפה ארוך כי אני קצת יותר מבוגר.

ערן: זה בסדר גמור.

נתי: אז אני התחלתי בעצם בכלל בתחום הטכנולוגיה בכיתה ח', מכון ויצמן, היה שם איזה קורס שאין לי מושג למה זרקו אותי אליו אבל זה היה להרים טרנזיסטורים וכל מיני דברים כאלה ואז התחלתי לגלות עולם וזה המשיך בהמשך עם מתנת בר מצווה שזה גם היה לבנות קיטים של רדיו וכל מיני דברים כאלה. ואז הבנתי שאוקיי, זה מה שאני רוצה לעשות, המשיך בתיכון במגמת אלקטרוניקה, בצבא עשיתי איזה פיבוט. אמרתי אוקיי, עכשיו אני רוצה לגוון קצת, לעשות דברים שאני לא אעשה בחיים, שזה היה בעצם ללכת ליחידה קרבית, בשונה מהמסלול הרגיל של האנשים מהסוג הזה, וזה באמת היה אחד הדברים היותר טובים שעשיתי אני חושב בחיים. ואז חזרתי למסלול, סיימתי לימודים באוניברסיטת קובנטרי באנגליה, כחלק ממסלול של החלפת סטודנטים, שזו גם הייתה חוויה בפני עצמה. ומאז אני בעצם, אני חושב שאני מביא בעיקר הרבה מאוד ידע מולטי-דיסציפלינארי, גם מצד אחד hardware, גם מצד אחד software, DevOps, היום בעולמות ה-cloud. אבל בזמנו זה היה גם distributed computing, HPC, מערכות מידע וגם אפליקציות, hardware ו-software כבר תיארתי. ובעולמות של ה-cloud אני חושב שהידע המולטי-דיסציפלינארי הזה כמעט הכרחי כי בעצם ה-cloud הופך את הכל לקוד ובעצם מערכות הופכות להיות משהו שהוא הרבה יותר הוליסטי ולא רק פונקציה מסוימת של אפליקציה או פונקציה מסוימת של data center. זה in a nutshell, היום מן הסתם יזם סדרתי, הקמתי חברה שנקראת GigaSpaces שהתעסקה באמת במערכות מבוזרות לפתרון של בעיות דאטה, נקרא NoSQL, והתרכזה בעיקר בלעשות את זה על בסיס של פלטפורמה של זיכרון. משם צמחה Cloudify שבעצם נבנתה כתשתית לניהול של מערכות מבוזרות. ו-Cloudify כמו שאמרת, נקנתה לא מזמן ע"י חברת Dell, כשאנחנו אומרים לא מזמן זה החודש האחרון. מן הסתם התהליך הרבה יותר ארוך, ואנחנו אולי נדבר על זה קצת יותר בהמשך, טרי מאוד ואני מתרגש לחזור לדבר על דברים אחרים בוא נקרא לזה ככה.

ישי: כן, אז שוב מזל טוב ואנחנו קצת נצלול למה זה אומר ואיך זה נראה חודש אחרי ה-acquisition. ערן, ספר גם קצת עליך.

ערן: אז אני טיפה יותר צעיר מנתי, בן 38, אבל עדיין רלטיבי לחבר'ה שאני עובד איתם היום אני נחשב המבוגר שבצוות.

ישי: דינוזאור.

ערן: ממש, וקשה לי לחשוב על עצמי ככה אבל זה נהיה ככה לאט לאט. אז באמת אני התחלתי את החיבה למחשבים בגיל יחסית צעיר, שהיה לי את ה-PC הראשון אז מאוד נהניתי לפרק אותו ולהבין מה כל רכיב עושה והיה לי איזשהו passion לא מוסבר לדבר, היה לי את הזכות להשתתף בניסוי של האינטרנט המהיר, לפני שזה היה מה שנקרא mainstream. ואז זה באמת חשף אותי מאוד מוקדם לגישה לאינפורמציה 24/7 בזמן שאנשים, התחברו לאינטרנט רק בימי שבת כשהיה זול, עם הקו המאוד איטי. ואז ממש העמקתי את התשוקה הזאתי ואת הסקרנות בעולמות האלה. מאוד התחברתי לנושא התשתיות והנטוורקינג ו-system administration. עסקתי בזה בצבא ואחרי שהשתחררתי אפילו העמקתי קצת את הידע שלי בעולמות של מערכות הפעלה לינוקס, עשיתי איזשהו מסלול של Red Hat, בזמן שכולם כזה למדו מייקרוסופט אני דווקא הלכתי לצד השני להעמיק את הידע שלי בעולמות של היוניקס ולינוקס, מאוד אהבתי את זה. כמו כל אחד בן גילי עבדתי בחברת קומברס, זה איזשהו צ'ק כזה שצריך לעשות בקריירה, ומשם התגלגלתי גם לחברות סטארטאפ וגם לחברות אנטרפרייז קצת יותר גדולות. שהגיע הענן בשבילי זה היה חיבור מאוד טבעי כי זה שילב גם את האהבה שלי לתשתיות ומערכות הפעלה וגם פיתוח וסקריפטים, אז נכנסתי לנישה של ה-DevOps בשלב יחסית מאוד מוקדם, זה היה בערך ב-2012, הייתי hands on הרבה מאוד שנים ואז עברתי לתפקידי הובלה ו-leadership. לפני שייסדתי את Firefly הייתי אחד העובדים הראשונים ב-Aqua Security, היה לי שמה את הזכות להקים צוות DevOps, מאוד משמעותי, עם כל הדיסציפלינות הפופולריות SRE, platform engineering, וקיבלתי שם הרבה מאוד תיאבון לסטארטאפים ויזמות. והחלטתי לצאת למסע משל עצמי וחברתי עם שותפי והקמנו לפני שנה וחצי את Firefly, מוצר שבאמת נותן מענה בעולמות האלה שאני מרגיש מאוד בנוח איתם, עולמות ה-DevOps.

ישי: וכשהלכת לייסד את Firefly, אז זה פעם ראשונה שאתה פורמלית בתפקיד product? בניגוד ל-hands on והובלה של DevOps וקצת יותר ברזלים?

ערן: בדיוק ככה, אבל לי זה הרגיש מאוד טבעי בגלל שהרבה מאוד מהעשייה, כמישהו שמוביל DevOps היה לייצר מוצרים פנימיים ללקוחות הפנימיים שלנו בתוך הארגון, אז לאפיין מוצר ולדעת איך לעשות אותו זה היה משהו שהרגיש לי מאוד טבעי אבל כן, נכון, זה התפקיד product הראשון שלי, מה שנקרא במקצוע, בטייטל.

ישי: הלכת עד הסוף, ל-Chief. פעם ראשונה ואני ב-Chief.

ערן: כן, קיצרתי, עשיתי כמה קיצורי דרך בהקשר הזה, אבל בסוף אני בונה היום מוצר לקהל שאני בעצם הקהל יעד שלו. אז אני הכי מכיר את הפרסונה, אני הכי מכיר בעצם איך לייצר מוצר ל-target audience הזה. אז זה מצד אחד קיצור דרך, מצד שני אני חושב שזה אחד התכונות הכי חשובות למנהל product זה באמת להכיר למי אתה מוכר.

נתי: האמת שזה אגב נקודה מאוד חשובה, כשאני מסביר נניח איך ב-Elasticsearch ואיך אפילו מארק צוקרברג הצליח בפייסבוק, זה הרבה פעמים הנקודה הזאתי. שהרבה פעמים קל לראות את הדברים כמו גוונים של אפור, זאת אומרת מישהו מבחוץ שלא חי את ה-domain, יראה את הניואנסים הקטנים האלה של הפיצ'ר, נניח במקרה של שי, אז הוא ראה את הצורך ב-API פשוט. לא רק דאטה בייס שיודע לעשות סקייל, אלא גם שיהיה מאוד נגיש למפתחים, פחות UI, יותר API, שזה מה שמפתחים רואים. אותו דבר גם Terraform, Terraform הבינו שהמשתמש שלהם זה איש DevOps שחי בתוך פייפליין של CI/CD, רוב הפתרונות, כולל Cloudify, חיו מאוד טוב בעולמות של אופרטורים אבל פחות חיו טוב בעולמות של developers. הזיהוי הזה והיכולת לזהות את החדות הזאת היא הרבה פעמים יכולה לבוא מאנשים שבאמת חוו את הבעיה, והרבה פעמים כשפונים כל מיני מנג'מנטים שבאים ואומרים אוקיי, אז יש לו ניסיון קודם בזה, אבל הם לא מכירים את המשתמש בקצה, הם לא היו המשתמש בקצה, אז זה נקודה סופר חשובה, הם יראו את הגוונים של האפור, והגוונים של האפור היום הם שמבדילים בין הצלחה לאי הצלחה. וזה לדעתי נקודה שהרבה מפספסים אותה אבל גם בעיקר בעולם של DevOps שכמעט כל הכלים נראים דומים, מדברים כמעט את אותה שפה, עושים דברים מאוד דומים, וגם בעולמות של security, וזה הרבה פעמים מה שאני רואה שמבדיל בין מוצר שמצליח בסוף למוצר שלא מצליח. כולם מנסים לפתור בעיות דומות אבל אלה שמצליחים לזהות את ה-user בקצה ולהבין בדיוק את הבעיה שלו, הם אלה שבסוף מנצחים.

ישי: אני מסכים, אני חושב שהחברות, הסטארטאפים שעובדים על מוצרים שהם בעצמם הלקוח, ושהמפתחים, בסוף מי שפונה למפתחים, לצוותי DevOps והאנשים שבונים את המוצר הם אלה שמכירים firsthand את השימוש וזה עולם אחר. אני רואה אצלנו ב-LinearB, אנחנו מוכרים מוצרים לארגוני פיתוח, וכל אחד מהמפתחים אצלנו יודע, מבין בעצמות, מה זה המוצר הזה. והוא גם משתמש בו והוא לא צריך תרגום מאנשי ה-product או מי שמדבר עם הלקוחות למה הם צריכים, וזה עושה הבדל של שמיים וארץ ביכולת.

ערן: אני חושב שאפילו ב-domain הזה של DevOps זה אפילו עוד יותר קיצוני בגלל ש-DevOps הוא תמיד בארגון מין קפסולה כזאת שגם ה-IR management לא באמת מבין למה, בוא נגיד את השאלה הכי חשובה, למה צריך כל כך הרבה אנשי DevOps. זה אנשים מאוד מוכשרים אבל מאוד יקרים מבחינת העלות של הארגון, וכל הזמן יש דרישה לעוד וגם הם יושבים בדרך כלל באיזשהו separation מה-developers, מה-day2day שלהם, אז להכיר באמת איך נראה היום יום של האיש DevOps או של המנהל DevOps, זה מאוד קשה להבין את זה מבחוץ, אבל הרבה יותר קל אם עשית קדנציה באחד התפקידים האלה.

ישי: כן, והם גם לרוב לא מקושרים לפיצ'רים או ליכולות, אז המנהלים אומרים רגע, כל כך הרבה מאמץ הולך שמה אבל למה זה הולך? במה זה תומך לי או איך, הם לא בונים פיצ'רים, ברוב המקרים, אז זה עוד רמה של איך אני כמנהל מבין שצריך להשקיע פה. כי רוב ה-value נתפס כפיצ'ר, אוקיי? יש עוד capability, אני הולך ומוכר אותו.

נתי: אני חושב שדווקא היום במשבר שיש עכשיו בתעשייה, יש לדעתי כבר יותר הבנה של ה-value של הדבר הזה. כי מן הסתם כשאתה צריך לעשות יותר בפחות, אתה חייב אוטומציה, שאתה צריך אוטומציה אתה צריך אנשים מהסוג הזה. ואני חושב שההבנה הזאתי כבר יותר קיימת גם ב-management layers. זאת אומרת למעשה, פודקאסט קודם שעשיתי עם Wix, ה-initiative של platform engineering הגיעה מהמנכ"ל, שזה היה מאוד מעניים לשמוע אותו, הוא לא הגיע מהמפתחים, הוא הגיע מהמנכ"ל, למה? כי הוא הבין שהוא צריך velocity, למה הוא הבין שהוא צריך velocity? כי התחרות שלו רצה קצת יותר מהר. אז אני חושב שזה משתנה בהקשר הזה וזה מאוד נכון, הם לא יודעים לתרגם את זה בסוף לאופרציה. זה הם צריכים עדיין את האנשים עצמם, אבל אני חושב שדווקא המגמה הזאת בשוק שבו עכשיו כולם צריכים להסתכל על efficiency ולא רק כמה שיותר משתמשים, לא משנה מה ה-cost. מחזק את ההבנה של הערך, אני חושב של אוטומציה ושל אנשי DevOps, ופתאום הם הופכים להיות אנשים שיותר מעריכים את הערך שלהם בתוך הארגון ובאמת כשאנחנו נדבר על platform engineering זה באמת מקרה שונה בגלל שהוא כבר פחות צומח מלמטה אלא יותר גם צומח מלמעלה.

(מוסיקת מעבר)

ישי: מעולה, אז אנחנו מדברים על DevOps, זאת מילה טעונה, מונח שהשתנה, ואם אני חושב על מה עושה איש DevOps או מה זה אומר תפקיד או מה התפקיד של DevOps בחברה, זה עבר אבולוציה, זה השתנה מאוד ב-10, 15 שנים האחרונות. עם כל מיני תתי התמחויות ויש SRE, יש כל מיני עולמות שונים, איך אתם מסתכלים על השינויים האלה שעברו ועוברים על הסל הזה שנקרא DevOps?

נתי: אני, דווקא בגלל שעכשיו עברתי ל-Dell, מצאתי את עצמי פתאום מסביר את זה להדיוטות מה שנקרא, לאנשים שפחות באים מהעולם, אז ניסיתי למצוא דרך להסביר את זה בצורה נורא פשוטה, את האבולוציה הזאתי שקרתה. ואז בעצם אם מסתכלים על זה היו פה 3 גלים עיקריים, זאת אומרת היה את הגל שפה באו ואמרו אוקיי, עברנו לענן, אנחנו צריכים אוטומציה, רוב המפתחים שלנו לא יודעים לעשות אוטומציה, אנחנו צריכים אנשים שמבינים באוטומציה, קראנו לזה קבוצה, וזה בד"כ הייתה קבוצה מאוד מרכזית בארגון שהייתי אומר תעשו לי אוטומציה ל-RDS, תעשו לי אוטומציה ל-VM, תעשו לי אוטומציה לזה, והם היו כותבים סקריפטים שעושים את האוטומציה. זה עבר למצב שבו רוב המפתחים כבר הבינו מה זה ענן ואז הצוות המרכזי הזה נהיה bottleneck, אמרו אוקיי, אני לא יכול לחכות לצוות המרכזי הזה שיעשה לי כל דבר ושאני אבקש ממנו ואני אקבל תוצר וזה לא יהיה בדיוק מה שאני רוצה, אז אני כצוות פיתוח אקח אחריות על הדבר הזה ואנשים שלי, זאת אומרת המפתחים, יתחילו להתעסק באוטומציה. ואז נוצר מצב של ביזור האוטומציה בתוך הצוותי פיתוח וה-infrastructure as code השתלב פה בצורה מאוד מאוד יפה. הגענו לנקודה שבה נוצרה בעיה, כי הביזור הזה היה ביזור גדול מידי ושיצר בעיה של קונסיסטינטיות ושל governance, זאת אומרת אוקיי, עכשיו כל אחד עושה מה שהוא רוצה, דמוקרטי מידי, המילה דמוקרטי היום טעונה, לא ניכנס לזה עכשיו, (צוחק)

ישי: (צוחק) …מי מעביר ??? את אנשי הדבאופס?

נתי: אבל זה נהיה דמוקרטי מידי...צריך רפורמה. אז בכל מקרה הבינו שצריך פה סדר. עכשיו חזרה לצוות מרכזי גם לא מסתדר כי אז אנחנו נחזור לבעיה של ה-bottleneck, שהצוות המרכזי יהיה bottleneck. ואז נכנס העולם של platform engineering שבעצם אומר במקום אנשים שיהיו דמות מרכזית, תהיה פלטפורמה מרכזית שהיא תהיה ה-self service, שאנשים יבנו את הפלטפורמה אבל הם לא יהיו ה-interface של המפתחים. ה-interface של המפתחים יהיה API, service שבעיקר ייקח את הדברים שהם חוזרים על עצמם וברוב הארגונים בסוף יש איזה cookie cutter, יש איזה משהו שהוא חוזר על עצמו והוא לא, לא כל ארגון ממציא את ה-cloud שלו כל פעם מחדש. ואז זה הרבה יותר סקיילבילי, זה כאילו איזשהו אבולוציה, הייתי אומר למקום שבו עכשיו חזרתי כאילו לצוות מרכזי, אבל הצוות מרכזי הוא לא ה-interface שלי אלא ה-API הוא ה-interface שלי ולכן לא תהיה לי בעיה סקיילינג.

ישי: אז הצוות המרכזי ירד קומה והוא בונה את התשתית.

נתי: בונה את הפלטפורמה, בדיוק.

ישי: ואם ניקח רק דוגמה סופר פשוטה, אם פעם הייתי צריך לפתוח ticket כדי שירימו לי bucket, אח"כ אני פתחתי את ה-bucket לבד, בתור צוות, אבל זה יצר בעיות של אחידות או של policy, ועכשיו יש לי API או שירות שמייצר לי באקטים לפי ה-policy הארגוני אבל שאני עושה את זה ב-self-service ולא מחכה לבנאדם.
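
The bucket example above can be sketched in code. This is a minimal illustration, not any real platform's API: the `BucketRequest` type, the policy constants, and the injected `provision` callable are all hypothetical names. The idea it shows is that the developer calls a self-service function, and the organizational policy is enforced by code rather than by a person reviewing a ticket.

```python
from dataclasses import dataclass

# Hypothetical organizational policy; the rules and values are illustrative only.
ALLOWED_REGIONS = {"eu-west-1", "us-east-1"}
REQUIRED_TAGS = {"team", "cost-center"}


@dataclass(frozen=True)
class BucketRequest:
    name: str
    region: str
    public: bool
    tags: dict


def policy_violations(req: BucketRequest) -> list[str]:
    """Return the policy violations for a request (an empty list means compliant)."""
    problems = []
    if req.region not in ALLOWED_REGIONS:
        problems.append(f"region {req.region!r} is not allowed")
    if req.public:
        problems.append("public buckets are not allowed")
    missing = REQUIRED_TAGS - set(req.tags)
    if missing:
        problems.append(f"missing tags: {sorted(missing)}")
    return problems


def create_bucket(req: BucketRequest, provision) -> str:
    """Self-service entry point: validate against policy, then provision.

    `provision` stands in for whatever actually creates the bucket
    (a cloud SDK call, an infrastructure-as-code run, ...). It is injected
    so the sketch does not depend on any real provider API.
    """
    problems = policy_violations(req)
    if problems:
        raise ValueError("; ".join(problems))
    return provision(req)
```

A team would call `create_bucket(...)` directly and get either a bucket or an immediate, readable rejection, which is the difference between the ticket model and the platform model described in the conversation.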

ערן: נכון, בדיוק, אני חושב שנתי תיאר את זה בצורה פנטסטית, מה שקרה בעשור האחרון זה כל מיני ניסוי וטעיה של מודלים של איזה מודל עובד הכי טוב. בשלב מסוים ראינו שבאמת צוותי ה-DevOps הפכו להיות bottleneck לא הגיוני, במקום להגביר את ה-velocity הם האטו את הפיתוח בגלל שכולם היו תלויים ב-DevOps ואף פעם לא היה מספיק אנשים. וזה בעצם פגע ביכולת של הפיתוח לפתח בצורה מהירה. הם גם שמרו על איזשהו expertise שכל הנושא הזה של הענן לעצמם, ואני לא חושב שה-DevOps אשמים בזה, אני חושב שיש גם לצד הפיתוח, הפיתוח לא תמיד הראה נכונות לבוא וללמוד את כל הנושאים האלה לעומק, דברים שקשורים לתשתיות או ל-infrastructure as code, לא תמיד ראינו את הנכונות מהצד של הפיתוח להיות expert בזה ובסופו של דבר אני יכול להבין את זה. הם רצו להתעסק ב-business logic ובדברים שהם באו לעשות, שזה בעצם לפתח פיצ'רים ולהביא את המוצר עוד שלב קדימה ופחות להתעסק ב-operations, והנושא הזה של platform engineering. אני חושב שזה איזשהו מודל שהוא נותן מענה בעצם לשני עולמות. מצד אחד יש לך צוות אחד שהוא באמת expert בנושא הזה של תשתיות, מצד השני הוא לא bottle neck בגלל שהוא מספק את אותו framework שמאפשר חופש פעולה מבוקר לעומת, אתה יודע, שכל אחד עושה מה שהוא רוצה, זה באמת מבוקר בעזרת הכלים ששומרים על אחידות ו-governance ו-security וכל האספקטים גם של עלויות, זה עכשיו נושא מאוד פופולרי, אני פותח סוגריים, לפני כמה שנים למנכ"ל לא היה כל כך חשוב כמה אתה משלם ענן כי כולם רצו לרוץ מהר ואף אחד לא היה בפוקוס של optimization, היית בפוקוס של בוא נרוץ מהר ואל תשקיע את הזמן שמה. והיום כולם מסתכלים על העלות ענן, פותחים עיניים, אומרים וואו, אני מוציא שמה מלא כסף, חייב לעשות optimization, והרבה מזה נובע בגלל שאנשים רצו ללא בקרות.

ישי: כן, זה פשוט, המשקיעים חיפשו growth בכל מחיר, אז אני הולך על growth בכל מחיר. עכשיו הם מסתכלים לי על margin, אני צריך לטפל ב-margin.

ערן: בדיוק, ה-platform engineering בעצם מאפשר את השליטה, השליטה בכל האספקטים, חלק מזה גם השליטה ב-cost. לדאוג שאותם מכונות שהמפתחים מרימים בבוקר יהיו סגורים בלילה וזה סתם דוגמה מאוד קלאסית למשהו שאפשר לעשות בקלות.
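
The "machines started in the morning are shut down at night" rule can be sketched as a small selection function. Everything here is an assumption for illustration: the `Instance` type, the `"dev"` environment label, and the working-hours window are hypothetical, and the actual stop call to a cloud provider is deliberately left out.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Instance:
    instance_id: str
    environment: str  # hypothetical tag value, e.g. "dev" or "prod"
    running: bool


# Assumed working hours; outside this window dev machines are candidates to stop.
WORK_START = time(8, 0)
WORK_END = time(20, 0)


def instances_to_stop(instances, now: time):
    """Return ids of running dev instances when `now` is outside working hours."""
    if WORK_START <= now < WORK_END:
        return []
    return [
        i.instance_id
        for i in instances
        if i.running and i.environment == "dev"
    ]
```

A scheduler on the platform side could run this periodically and pass the result to whatever mechanism stops machines, so the cost control does not depend on each developer remembering to clean up.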

נתי: עכשיו הרבה פעמים כשיש גלים כאלה אז אומרים רגע, זה נורא, זה כל כך obvious, איך לא חשבנו על זה קודם? ופה נכנס הנקודה של maturity, זאת אומרת לקח זמן להגיע למצב שיש איזושהי סטנדרטיזציה מסוימת בשוק. והסטנדרטיזציה נניח סביב Kubernetes, סביב Terraform, בלי זה לא היינו מדברים על platform engineering, וזה גם חשוב להבין. כי כשכל אחד היה לו את הכלים שלו ודברים מאוד שונים בין ארגון לארגון, אז מן הסתם אתה לא יכול לבנות פלטפורמה גנרית שתפתור את הבעיה הזאתי ואתה לא יכול לייצר כלים כמו Firefly ועוד דברים אחרים שהיום צומחים בגלל שיש את הסטנדרטיזציה הזאתי. יש תעשיה שלמה שצמחה סביב ה-ecosystem של קוברנטיס שלא הייתה קיימת אם לא היה דבר כזה שנקרא קוברנטיס וכל הארגונים היו משתמשים בזה. ואז נוצר מצב שפתאום כולם מוצאים את עצמם באיזשהו, עושים את אותו דבר, כולם מרימים קוברנטיס קלאסטר namespace ל-development קלאסטר ל-production, קלאסטר נפרד ל-development, נהיה משהו שהוא מאוד קאנוני ולמה אני צריך לבזבז את הזמן כל פעם מחדש?

ישי: 95 אחוז עושים בדיוק את אותו דבר.

נתי: בדיוק את אותו דבר עם ניואנסים שונים, אני צריך את זה בריג'נים כאלה ולא ריג'נים כאלה, עם אופטימיזציה כזאת, אבל לא הרבה הבדלים וזה נהיה מאוד חוזר על עצמו. ואז זה מקרה קלאסי שבו, כשהבעיה הופכת להיות גנרית, אז באה הפלטפורמה. זה לא היה קורה לפני, ב-2015-16, אז אני רק שם את זה בפרספקטיבה שלא נחשוב בואנה, היינו טיפשים אז, עכשיו אנחנו חכמים היום, איך לא היינו חכמים אז, מה זה התעשייה המוזרה הזאתי שקוראים לה הייטק? שכל הזמן ממציאה את הגלגל.

ישי: כן, טוב, לפני 20 שנה אנחנו, כדי להרים שרת היינו צריכים לשלם 15 אלף דולר ולחכות חודשיים שהוא יגיע לכלוב, ומישהו היה צריך לנסוע לשם ולעשות לו install.

ערן: המעבר הזה בכלל מאנשי תשתיות קלאסיים, system administrator ל-DevOps, ל-platform, הוא באמת נבע מטכנולוגיות שמאפשרות לעשות את הדברים האלה. אם לא היה ענן, עדיין היינו מתעסקים בדאטה סנטרים, וזה לגמרי נכון, האחידות שפלטפורמות קונטיינרים נותנת היום היא באמת מייצרת איזשהו יישור קו ומכנה משותף מאוד גדול בין הארגונים.

ישי: מה לגבי השיפט בפוקוס של פונקציית ה-DevOps משאלות של uptime ו-runtime, בסוף בקצה אנשים לפעמים מותחים את ה-DevOps ל-runtime, ל-SRE, ל-Ops ממש מי קם בלילה או איך אני יודע שמשהו נפל בלילה, ובצד השני, אני חושב שזה שיפט שחלק מהתהליך הזה של לעבור ל-platform engineering והשאלות של הכמעט כל מה שאתה תיארת עכשיו בשלושה שלבים, נדבר על ה-velocity של הפיתוח. אז יש מצד אחד velocity ו-enablement לפיתוח, ואיך אני משפר את ה-developer experience, ובקצה השני uptime. איך זה השתנה הפוקוס?

נתי: אני חושב שיש פה שתי נקודות, זאת אומרת אני דווקא אגע ב-developer experience שבעצם נזנח הרבה זמן. למה נזנח הרבה זמן? שוב פעם, עניין של אבולוציה, כי רוב הזמן התעסקנו באיך עושים אוטומציה לתשתיות, אז מן הסתם ה-developer experience קיבל תוקף משני לצורך העניין. היום, כשהפלטפורמות טיפה התייצבו אז פתאום developer experience נהיה מאוד מאוד חשוב. developers מוצאים את עצמם עושים הרבה מאוד דברים 90 אחוז מהזמן שלהם בחלק מהסטטיסטיקות, שהם לא ה-core business שלהם, לא יודעים לנהל תשתיות באמת, ובמקום לפתח קוד, אז הנושא של developer experience פתאום מקבל עכשיו חשיבות. ולכן האנשי platform לצורך העניין היום צריכים לתת הרבה יותר דגש על זה, הם נמדדים על זה והם צריכים לתת את זה כי זה הציפייה של המפתחים בארגון מהדבר הזה. ה-uptime נוצר מעצם זה שאתה עובר לפלטפורמה, זאת אומרת אם עד היום הייתי צריך להחזיק את ה-database ולהחזיק את הקלאסטרים ולהחזיק את זה ולשים בן אדם שמקבל אלרטים ואז הוא צריך לקום בלילה ולעשות את זה, אני הופך להיות יותר product, זאת אומרת ה-DevOps בעצמם הופכים להיות אנשי product, שזה גם שינוי מאוד גדול ה-DevOps. בתוך הארגון הוא product, ואני מנהל אותו כמו product ואני לא מנהל אותו כמו פייפליינים וסקריפטים שכל הזמן זזים ונעים וזה שינוי מאוד גדול, כי אלף זה אומר שאנשי DevOps הם כבר לא רק אנשי אופרציה, הם מפתחים לכל דבר בעצמם. זה אומר שהם צריכים לחשוב כמו product והרבה פעמים אתה רואה שחלק מהשינוי הארגוני הוא לשים איש product לצוות עצמו, ולתכנן ריליסים ולתכנן את כל מה שאנחנו מכירים מעולם ה-product.

ערן: אגב, אתה רואה את זה?

נתי: אני רואה את זה קורה בארגונים כמו מן הסתם Wix ו-AppsFlyer ועוד ארגונים אחרים שהם גדולים יחסית, שיש בהם הרבה מאוד צוותים, אז זה קורה הלכה למעשה. ארגונים יותר קטנים זה עוד לא קורה, וזה בד"כ גם עניין של אבולוציה, אני חושב שככל שהפלטפורמות יתייצבו, נראה את זה גם בארגונים קטנים. בארגונים גדולים זה בוודאות.

ישי: אתה בעצם אומר נתי שהחלק מה-maturity, חלק מהבשלות של platforms ו-platform engineering הופך את ה-uptime למשהו שהוא פטור? אם יש לי service, הוא בנוי טוב, אז ה-uptime שלו יהיה בסדר, ועכשיו אני יכול להתמקד ב-velocity?

נתי: אז אני אומר כזה דבר, בוא נסתכל שנייה מה זה היה ה-uptime בעבר ומה זה uptime היום. היום הענן עצמו נותן לך הרבה מאוד תשתיות שהן, נניח אם אנחנו מדברים על EKS ועל קוברנטיס ועל כל מיני דברים, יש המון פתרונות שמבטיחים לך יחסית תקרה מאוד גבוהה למה ה-cloud מבטיח לך כ-uptime הרבה יותר גבוה בסטאק ממה שהיה קודם. מקודם זה היה VM-ים ובסטוראג'ים ואת כל ה-layer העליון היית צריך לדאוג לו. אז יש לך כבר כיסוי הרבה יותר גבוה שהוא granted מה-cloud providers. נשאר לך עדיין ה-uptime של האפליקציה וזה עוד layer שיש לך אחריות עליו אבל הוא הצטמצם. הדבר השני שקשור ל-uptime זה באמת הפרוצדורות. זאת אומרת איך אני בעצם דואג שכל התהליך עצמו של ה-continuous updates וכל הדברים האלה לא ישבור לי תהליכים? וגם פה נוצרו הרבה מאוד כלים שכבר יודעים לעשות את העבודה הזאת בצורה יותר יציבה. אז פשוט הדלתא שהיום אנשי DevOps צריכים להתרכז בה הצטמצמה. ומצד שני החלק שהם לא נגעו בו כמעט, שזה פיתוח ממש של מוצר, ושזה התמודדות עם UX של מפתחים, זה הופך להיות חלק יותר גדול שהם נמדדים עליו וזה transition. זה חלק קשה אגב, לא הרבה יעברו אותו, יהיו הרבה אנשי DevOps שלא יתאימו ל-transition הזה. וזה קורה עכשיו וזה ב-transition והוא לא קל, להרבה ארגונים הוא מאוד קשה אפילו.

ישי: ערן, איך אתה רואה את ה-transition הזה מפוקוס על uptime ובנייה של infrastructure לפוקוס על developer experience ו-velocity?

ערן: בסופו של דבר, איך שאני רואה את זה–פעם כל מי שהוא לא היה בפיצ'ר teams או ב- scrum team שכותב את הפיתוח, זה היה אנשי ה-DevOps, ואז זה לא משנה אם בספקטרום אתה מסתכל על developer experience מצד אחד, או על production engineering, זה כאילו אלה האנשים שעושים את זה. אבל אם אתה באמת מבין את הדינמיקה, מדובר בעצם במקצוע אחר קצת. כלומר אותם אנשים שאחראים על ה-reliability של ה-service וה-scalability וכל הדברים של מה שנקרא non-functional testing, האם זה עומד ב-scale, האם זה bulletproof, כל הנושא הזה של ריג'נים ומה קורה ב- disaster recovery. זה בעצם אותו תת קטגוריה של DevOps, ה-SRE או ה-production engineering, אנחנו רואים את זה בהרבה מאוד ארגונים, שזה צוותים ממש אחרים שזה ה-expertise שלהם. מה שכן רואים ואיך זה מתחבר לפלטפורמה, שהם גם אחראים באיך לייצר את אותו service של מפתח עם אותן מטריקות, עם אותם ה-guardrails שקשורים ל-monitoring, ואני רואה את זה מחלחל לתוך הצוותי פיתוח. כלומר העולם הזה הוא לא באמת מנוהל בצד ע"י צוות ייעודי, אלא אותו צוות ייעודי שהוא ה-subject matter expert בעולמות האלה של monitoring ו-engineering, הוא בעצם נותן את התורה ודואג באמת ש-service שהוא מונגש ע"י הפלטפורמה, יכיל גם את אותם אלמנטים שקשורים ל-monitoring של האפליקציה ולא רק ה-business logic. כלומר האם יש לך את ה-health check שצריך? האם זה יודע לעשות scale ומה קורה במצבים אם ה-service למטה? וכמו שנתי אמר גם הרבה מאוד מהטכנולוגיות החדשות כמו קוברניטיס, הם עושים אבסטרקציה לזה. קוברניטיס הוא בגישה ש pod, שבעצם מריץ את האפליקציה, הוא יכול למות בכל רגע וזה לא משנה, אין עם זה בעיה ש pod קורס, כי קוברניטיס הוא יעשה את האורקסטרקציה ויעלה pod אחר או שתמיד יש לך ב deployment יותר מ pod אחד.

ישי: תחת ההנחה שהאפליקציה מכירה את זה ויודעת להסתדר.

ערן: נכון, וזה התפקיד באמת של אותם אנשי פלטפורמה, לדאוג שמה שמתאר את האפליקציה יכיל את אותם רכיבים כדי שקוברניטיס יעשה את העבודה שלו כמו שצריך.

נתי: פה נכנס גם הנושא של ה shift left, זאת אומרת בעבר אותם SRE מה שהם היו עושים זה מתקנים בדיעבד. זאת אומרת מישהו היה עושה משהו, היה עושה אותו משהו לא נכון, בודקים, אה, לא עשית את הקונפיגורציה נכון? ואומרים 95 אחוז פחות או יותר מה-downtime הם נובעים מטעויות אנוש. ואחד ה-downsides של אוטומציה זה שזה עושה, אני מנסה לחשוב על המילה בעברית, איזושהי אמפליפיקציה של הבעיה.

ישי: הטעות מגיעה על production.

נתי: בדיוק, יש לך בסופו של דבר בלאסט ספקטרום הרבה יותר גדול לכל טעות. וה shift left נועד בעצם להעביר את האחריות מ-, לצורך העניין לזהות את הבעיה ב-production, לזהות אותה בשלבי הפיתוח. אז התפקיד של אנשי ה-DevOps היה משתנה מהתפקיד של מוניטורינג ולזהות בעיות שאחרי, לייצר policy, סקריפטים שעושים ולידציה, זאת אומרת לא להיות הבנאדם שהוא ה-bottleneck לצורך העניין, אלא לייצר את ה-policy שמראש ידאג שכל מה שעובר דרך הפייפליין הזה כנראה יהיה production grade. וזה תפיסה אחרת לאיך אני, לתפקיד שלי לצורך העניין כ-SRE או מה שזה לא יהיה, אני עכשיו מייצר policies. אני לא מדבר עם המפתח ומבקר אותו, אני מייצר policy שמבקר אותו כדי שהמפתח לבד יראה שהוא טעה והוא הכניס פה משהו לא נכון ויתקן את זה בעצמו.
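
A shift-left policy of the kind described here could look roughly like the following check, run in the pipeline before anything reaches production. It is a sketch under stated assumptions: the manifest is a plain dictionary shaped like a Kubernetes Deployment (`spec.replicas`, `livenessProbe` and `readinessProbe` are real Kubernetes field names), while the rule "at least two replicas" and the function name are illustrative choices, not a standard.

```python
def manifest_violations(deployment: dict) -> list[str]:
    """Return policy violations for a Deployment-shaped manifest dict."""
    problems = []
    spec = deployment.get("spec", {})

    # Assumed policy: more than one pod, so a single crash is not an outage.
    if spec.get("replicas", 1) < 2:
        problems.append("replicas must be at least 2")

    containers = spec.get("template", {}).get("spec", {}).get("containers", [])
    if not containers:
        problems.append("no containers defined")

    # Every container must declare the probes the orchestrator relies on.
    for container in containers:
        name = container.get("name", "<unnamed>")
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                problems.append(f"container {name}: missing {probe}")
    return problems
```

A pipeline step would fail the build when the returned list is non-empty, so the developer sees the problem directly and fixes it without an SRE reviewing the change by hand.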

ישי: מה לדעתכם הדבר הבא בעולם של DevOps, מה האתגר שמעבר לפינה, שלשם DevOps ו-platform engineering יתמקדו בשנה-שנתיים הקרובות, מה ה-frontier הבא?

ערן: אני חושב שהרבה מאוד מהשירותי ענן יעברו לתת עוד יותר אבסטרקציות, כלומר אם היום אני צריך בעצם לדאוג לקלאסטר קוברניטיס וכל מה שקשור לאופרציה שלו, אז אנחנו נראה עוד אבסטרקציות on top of that ואז החיים של המפתח יהיו הרבה יותר קלים בהקשר של איך הוא מביא את הקוד שלו לרוץ וכמה תהליכים בדרך צריך להגיע כדי שהדבר הזה יהיה reliable וה-service יהיה למעלה. אז הרבה זה, הרבה מהאבולוציה שאנחנו הולכים לראות קשורות באמת ל-offering חדש שיהיה של שירותי ענן.

ישי: בעצם חלקים מה-platform engineering יהפכו להיות managed service כחלק מה-cloud service.

ערן: כן, אני רואה נגיד עכשיו הם הוציאו ב-Re:Invent האחרון, AWS הכריזו על שירות שנקרא Code Catalyst, שזה בדיוק זה. הוא לוקח את כל האלמנטים של platform engineering, ומביא פלטפורמה אחת שמאפשרת, כלומר מנגיש blueprints, עושה CI, כלומר all in one, אז המפתח שהוא ניגש ל AWS זה כבר לא איזשהו חנות עם מלא סרביסים ומלא קונפיגורציות - אלא הכל מאוד מופשט ומאוד מונגש. אנחנו נראה את זה בא מהוונדורים הגדולים, לא רק מסטארטאפים, ולאט לאט זה ממש יהפוך את ה-platform engineering למיינסטרים ולא לנישה מגניבה שרק ה-early adopters מאמצים.

נתי: אני חושב שפה נכנס, אתה יודע, היום כל דבר ששואלים מה יהיה בעתיד? אז ChatGPT וכאלה, כבר נמאס לי לשמוע את זה אפילו או לשמוע את עצמי מדבר על זה,

ישי: הפודקאסט הזה יוצר ע"י אנשים אמיתיים.

נתי: בדיוק, כן, אבל האמת היה לי איזה ניואנס מעניין לספר על זה, אבל אני לא בטוח שזה ישתלב בנושא היום. אבל בכל מקרה אני חושב שאם אני מסתכל על הדבר הכי קרוב בעתיד של ChatGPT וההשפעה שלו על DevOps, זה Co-Pilot למיניהם שמבוסס על מנוע שנקרא Codex שבעצם זה code generation הרבה יותר חכם כי בסוף הזמן שלוקח לאנשי DevOps הכי הרבה זמן, זה לכתוב את הטמפלייטים ולדאוג שהם עובדים כמו שצריך. התנסיתי בזה לאחרונה, אמרתי אפילו ב-Cloudify תייצר לי טמפלייט של Cloudify מגרסה כזו וכזו. בום, קיבלתי אותו בלי לקרוא שורת תיעוד אחת, משמה להרים בסוף את הקוברנטיס קלאסטר או מה שזה לא יהיה, לקח עוד קצת זמן. זה קיצר לי את הזמן באופן מאוד משמעותי, זה ממש פה, זה לא משהו בעתיד, זה כבר קורה.

ערן: אחד הדברים שעשינו לאחרונה, ממש כל הנושא של generative AI התפוצץ לפני איזה חודשיים. הוצאנו ממש CLI tool, קראנו לו AIaC, שזה שילוב של IaC, infrastructure as code, ו-AI, שבעצם עושה את זה מהטרמינל, כל אחד יכול לבקש טמפלייט לאיזה קונפיגורציה שהוא רוצה Terraform, Pulumi, Dockerfile, לא במפתיע זה צבר הרבה מאוד פופולריות, מן הסתם זה offering שהוא open source בחינם. לא קשור למוצר ה-commercial ש-Firefly מוכרים, אבל זה בדיוק נוגע בזה. כי העולם הולך לשמה. אין ספק שהיכולות של ה-AI ליצור קוד הן, לטעמי, מאוד מרשימות. ואני לגמרי מסכים עם נתי שאנחנו נראה עוד ועוד מוצרים משלבים AI. מן הסתם גם אנחנו ב-Firefly עשינו את זה, וונדורים אחרים יעשו את זה וכבר עושים את זה, אני רואה, כמעט כל פעם שאני פותח את הלינקדאין אני רואה איזה וונדור ש-,

נתי: כן, אני אומר, זה גם יהיה הטכנולוגיה אני חושב שעוברת הכי מהר מ hype ל commodity. אנחנו נראה את זה גם, זאת אומרת מן הסתם מיליון משתמשים תוך 5 ימים, שזה שבר כל שיא עולמי. אגב, גם בגלל נושאים של UX, כי מן הסתם ChatGPT היה קיים כבר הרבה קודם אבל מה שהקפיץ אותו זה שהם הפכו אותו לגוגל,

ישי: זה ה-chat.

נתי: בדיוק, שהם הפכו אותו לגוגל כזה, תכתוב טקסט, תרשום, תקבל תוצאה. וכבר אתה לא צריך להיות מדען או scientist כדי לעבוד עם זה, כל בנאדם, כולל הילדים שלי ואשתי כותבים טקסט. ואתה רואה, שדרנים ברדיו, בקיצור זה נהיה מנוע חיפוש. והדבר השני שבאמת אני חושב קורה, זה באמת האינטגרציה שבתוך מוצרים. זאת אומרת זה נהיה יותר ויותר אינטגרטיבי מה-IDE שלך, עם כל מיני כלים אחרים, ופתאום זה נהיה spread all over the place ולכן אני חושב שזה יהיה די commodity, אוקיי, ברור, זה כמו search, זה עכשיו חלק מהעניין.

ערן: אנחנו מדברים פנימית בתוך החברה כל מיני גם על עתידנות ומנסים לחשוב מה יהיה. ואחד הדברים שמאוד ברור לנו שיקרה בגלל הנושא של AI, זה שכמו שעכשיו ענן הוא משהו שמאוד קל לצרוך ומאוד קל גם לבזבז הרבה מאוד כסף, זה הפך להיות cost center מאוד משמעותי לארגון, AI הולך להיות משהו אותו דבר, כלומר אנשים יצרכו AI לשימוש פנימי בארגונים, זה כמובן יהיה AI as a service כמו ש-API של OpenAI ומייקרוסופט מנגישים את זה כ-AI service וזה הולך להיות ממש, לדעתי בשנים הקרובות, משקל מאוד גדול מההוצאה הכספית של ארגונים יהיה על שירותי AI ואז אפילו דמיינו שיצא חברות סטארטאפ שעושות cost reduction או לטוקנים של AI.

ישי: בוא נעשה שכבת caching ו...

ערן: תחשוב שיש לך כמה ספקים, אז אתה יודע, תצטרך איזה מוצר צד שלישי שיגיד לך אוקיי, אם אתה תיקח ה-AI מהוונדור השני אתה תחסוך ככה וככה.

ישי: עכשיו, ל-query הזה.
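
The two ideas floated in this exchange, a caching layer and choosing the cheaper vendor per query, can be combined in a toy router. This is speculative by nature, as the conversation itself is: the vendor table, the per-token prices, and the word-count token estimate are all assumptions, and the vendor callables are stand-ins rather than any real AI provider's client.

```python
class CachedCompletionRouter:
    """Route each new prompt to the cheapest vendor and cache the answers."""

    def __init__(self, vendors):
        # vendors: name -> (price_per_token, callable(prompt) -> str); hypothetical.
        self._vendors = vendors
        self._cache = {}
        self.spent = 0.0

    def complete(self, prompt: str) -> str:
        # Repeated prompts are served from the cache and cost nothing.
        if prompt in self._cache:
            return self._cache[prompt]

        # Pick the vendor with the lowest price per token.
        name = min(self._vendors, key=lambda n: self._vendors[n][0])
        price, call = self._vendors[name]
        answer = call(prompt)

        # Crude token estimate: whitespace-separated words (an assumption).
        self.spent += price * len(prompt.split())
        self._cache[prompt] = answer
        return answer
```

A real product would also have to weigh answer quality and latency, not only price, which is what makes the "third-party cost optimizer for AI" idea a product rather than a few lines of code.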

ערן: או saving plans ל-AI. האמת שזה מעלה שאלה שאני זוכר ששנים ניסו לענות עליה ולא הצליחו והיא הופכת להיות הפוכה או מעניינת במובן הזה של העיבוד דאטה. זאת אומרת עד היום הצורה שבה מערכות מוניטורינג וכאלה דברים, אתה אוסף את הדאטה אליך, הוא שלך, ואתה מייצר ממנו תובנות. ובעבר זה היה כאילו הנכס שלך גם. היום אנחנו רואים שככל שהמידע הוא לא שלך, זאת אומרת המידע שהוא לא שלך כנראה אפילו יותר איכותי מהמידע שלך, לגבי טרנדים, לגבי cost analysis, לגבי כל הדברים האלה. אז ChatGPT הוא דוגמה מצוינת כי בעצם הוא עושה אגרגציה של כל הקוד בעולם שנצבר ב Github ובעוד מקורות אחרים או של Quora או של כל מיני דברים כאלה, ומייצר תובנות הרבה יותר טובות מעצם זה שהמידע הופך להיות שיתופי. ועכשיו אם אתה רוצה שהמערכת שלך תהיה מערכת לומדת, אז אם אתה תצמצם אותו רק למידע שאתה צובר, אז מן הסתם יש לך פחות דאטה לעבוד עליו. אז פתאום זה משנה גם, וזה משהו שאני עוד לא חושב שאנשים הפנימו לגמרי - את כל התפיסה שלנו על עיבוד מידע, על דאטה אנליטיקס, על אינסייטס ואיך מייצרים אינסייטס. אנחנו פתאום הופכים להיות מן משהו שצריך לעשות supervised learning ולא משהו שצובר את המידע ועושה את האנליזות אצלנו. וזה שינוי מאוד גדול לדעתי.

ישי: וכבר נפתחה תיבת הפנדורה של בעלות על הדאטה, של copyrights.

נתי: בדיוק, אני חושב ששמה יש עוד הרבה מאוד שאלות שפתוחות, אבל אני חושב שמה ש-ChatGPT ממחיש זה שה-ownership על הדאטה הוא כבר לא הופך להיות הנכס. אבל עדיין אתה צריך לשמור עליו באיזושהי צורה, אבל להנגיש את המידע הזה החוצה כנראה ייתן לך יותר ערך. מה שיהפוך להיות יותר ערך זה ה-supervised learning זאת אומרת איך אני יכול להוסיף domain-specific knowledge שהוא ידע להגיד כמו במקרה שלכם, אני לא רואה חיפוש על כל Terraform בעולם, אני רוצה חיפוש על Terraform Modules ב-AWS שעושים ככה והנה יש מישהו שיודע לפלטר לי את זה ולהביא לי את זה בצורה יותר מונגשת.

ישי: אוקיי, אז אם אנחנו קצת נחזור ונחזיר את הדיון לעולם של DevOps. אז אוקיי, generative AI בתור עוד כלי, אולי אפילו משהו שיעשה שיפט ב skillset שאיש DevOps צריך כדי לבנות את הפלטפורמה או כדי לשפר אותה. מה עומד להשתנות לדעתכם מבחינת סוג הבעיות, או במה יהיה ההתמקדות של התפקיד של ה-DevOps? אם אני מסתכל קדימה, איזה בעיות עיקריות הם יפתרו או מתחילים לפתור היום ויפתרו יותר בשביל הארגון?

נתי: אני יכול להעיד עכשיו ממקום מושבי ב-Dell, אז יש פה עולם שלם שאני חושב שפחות מונגש היום לעולמות של DevOps, זה באמת הנושא של IoT ונושאים של דיבייסים ובאמת להסתכל על ה-cloud מעבר לדאטה סנטר המרכזיים שיושבים עכשיו בענן. ואיפה אנחנו רואים את זה? בכל מקום, המצלמה מחוברת ל Wifi ומחוברת לזה ומן הסתם מקבלת continuous updates בלי שאנחנו אפילו שמים לב. המכונית, אם יש לכם טסלה, אז אתם רואים את זה מן הסתם, כל אחד רואה את זה במכונית שלו, המקרר בבית, הטלוויזיה, הטלפון כמובן, כל דבר שאנחנו זזים איתו היום, השעון שלי מן הסתם, הכל כבר מחובר לאינטרנט, הכל מחובר לענן, אבל כשאנחנו מדברים על עולמות של DevOps, אתה רואה שהתשתיות DevOps לא בנויות להתמודד עם זה בצורה גנרית, זאת אומרת כל אחד תפר לעצמו את ה-update.

ערן: כלומר פייפליין של איך אני עושה פיתוח וריליס ופריסה לאותו, לאותם מכשירי קצה,

נתי: בצורה גנרית, בצורה שהיא לא תפורה לכל device ולכל ארגון ואני לא צריך שכל אחד יעשה את זה בעצמו.

ישי: אני חושב שה-public clouds עוד לא פתרו את השאלות האלה ברמה שמתקרבת בכלל לאיך שהם פתרו את ה centralized compute layer ל-b2b SaaS קלאסי.

נתי: נכון, וזה פותח לדעתי עולם שלם של פידבק לופ, כי מן הסתם הרכיבים האלה כבר ממש דוגמים את הבנאדם ומחברים את הבן אדם לתוך הענן ומייצרים איזה אינטראקציה קצת מפחידה, גם מעניינת, עם הרבה מאוד פוטנציאל. אני חושב שזה uncharted territory עדיין. אפרופו מה שקרה לנו ב-Cloudify זה שחשבנו על זה קצת מוקדם מידי, זה עשה לנו איזה disruption בדרך, אבל בסוף מישהו זיהה שזה הדבר שהוא מחפש וככה פחות או יותר הגענו גם ל-Dell. אז אני חושב, ממה שאני רואה, שזה הולך להיות משהו שכנראה הרבה יצטרכו אותו בעתיד אפילו קרוב.

ישי: זאת אומרת סוג אחד של פרונטיר חדש ל-DevOps זה לפתור את השאלות של חברות שה-footprint שלהם הוא לא רק מוצר אחד SaaS בענן web application,

נתי: כן, זה כל ה point of sales מצד אחד, זה כל ה-5G שאנחנו רואים, זה כל הנושא של ה-manufacturing שיש בחברות שמתעסקות בזה. אבל בעיקר בצד האנליטיקס, זאת אומרת יש הרבה מאוד חברות תעשייתיות, יצרניות שיש להן מן הסתם מיחשוב שהוא מבוזר, האנשים עצמם מבוזרים, לפעמים הרבה מאוד רכיבים, אני אתן את הדוגמה של ה Garmin שיש לי, אז זה לדעתי הפרונטיר הבא.

ישי: מעניין.

ערן: אני הייתי לוקח צעד אחד מה שנקרא יותר מוקדם בזמן, ואני חושב שתהליכים שאנחנו רואים היום שיותר צוברים תאוצה כמו GitOps, שכל העולם בעצם של איך נראה הקוד ב-production ואיך נראה המניפסט ב-production הם בעצם ע"י source of truth שהוא בעצם הגיט. אז היום זה מאומץ באופן חלקי, זה עדיין נחשב מה שנקרא זה ה-cool kids, ההיפסטרים עושים את זה,

ישי: הם עושים את זה ו-trunk-based ביחד.

ערן: נכון. אז אני חושב שכאלה מתודולוגיות כאלה שיותר עושים centralized לאותו מקום. ובפחות, אוקיי, יש לי את הקוד, יש לי את הקיד, אחרי זה יש לי את ה-CI ואחרי זה יש לי את ה ArgoCD שפורס. אנחנו נראה שהכל מתרכז לפחות מערכות, יהיה הרבה יותר קל להבין מה רץ ב-production, גם למפתח הבודד, הוא יסתכל על אותו ב branch של אותו main, יראה את ה manifest, יבין אוקיי, זה הגרסה שרצה. זה בעצם המהות של ה-GitOps. אני חושב שזה יצבור יותר פופולריות, אני רואה את זה איכשהו לאט לאט נכנס גם לארגונים קצת יותר מיושנים בתפיסה שלהם.
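
The GitOps idea, that Git is the source of truth for what runs in production, implies a comparison between the state declared in the repository and the state actually running. A minimal sketch of that comparison, assuming both states have already been loaded into flat dictionaries (the keys and values in the usage note are made up for illustration):

```python
def drift(desired: dict, actual: dict) -> dict:
    """Compare desired state (from Git) with actual state (from the cluster).

    Returns a report keyed by every setting that differs, including
    settings present on only one side (the missing side is None).
    """
    report = {}
    for key in sorted(desired.keys() | actual.keys()):
        want, have = desired.get(key), actual.get(key)
        if want != have:
            report[key] = {"git": want, "running": have}
    return report
```

For example, `drift({"image": "api:1.4.2"}, {"image": "api:1.4.1"})` reports the `image` key with both values, and an empty report means production matches the main branch. Real GitOps tooling does this continuously and reconciles the difference; the sketch only shows the comparison step.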

(מוסיקת מעבר)

ישי: לקראת סיום אני רוצה לעשות דאבל קליק על המונח הזה של developer experience. ובמשקפיים של צוות DevOps או platform engineering, לנסות לעבור קצת על מה הם ה-, איפה האתגרים הכי גדולים של ה-developer experience? איפה ה-developers נעצרים היום או מרגישים או יש להם experience שבסוף מוביל גם ל-velocity יותר נמוך או לתחושה של, עד כדי מיאוס / churn? איפה האתגרים הגדולים שבהם DevOps יכול ומתחיל לעזור או לפתור שאלות של developer experience.

ערן: אני חושב שהנושא הכי קלאסי זה באמת אזורים שבהם אין להם את הידע או את הניסיון להיות בהם מצוינים. אז אם אתה לוקח את אותו developer הטיפוסי שמאוד טוב ב-Java, ב-Go או בכל שפה אחרת ב Python, ואז מכניס לו layer של אוקיי, אתה אחראי על התשתיות של אותו service, זה יכול להכניס אותם למקום של קצת אי נוחות. רק בגלל הפערים, מצד אחד הם בקונטקסט של ספרינט, חייבים לדלוור, הכל מדוד, ומצד שני הם צריכים לעשות משהו שהם לאו דווקא מומחים בהם. אני חושב שזו באמת באמת הנקודה שבה אתה רואה developers קצת לוקחים צעד אחורה או זה האזור שהם לא מרגישים הכי בנוח.

"אז אם אתה לוקח את אותו developer הטיפוסי שמאוד טוב ב-Java, ב-Go או בכל שפה אחרת ב Python, ואז מכניס לו layer של אוקיי, אתה אחראי על התשתיות של אותו service, זה יכול להכניס אותם למקום של קצת אי נוחות."

ישי: אז כאילו זה הצעד השלישי שאתה תיארת נתי, המעבר הזה מה-shift left המלא למצב שאוקיי, חלק מהדברים זה, אני יכול לכתוב את הפונקציה וזה ירוץ.

נתי: אני חושב ש-Backstage הוא דוגמה מצוינת להיום ל-open source platform engineering, מגיע מהעולם של Spotify. Spotify נחשבים סוג של רוקסטאר בעולמות של DevOps, כמעט כולם מאמצים את התורה שלהם לאיך בונים צוותים וסקוואדים,

ישי: זה הנטפליקס החדש?

נתי: אבל הם יותר מושקעים מנטפליקס ברמה של פיתוח, ופחות זורקים קוד לקומיוניטי ואומרים קחו, הם ממש הופכים את זה למוצר ויש להם אספרציות לדעתי דומות של אמאזון. זאת אומרת מי שככה לא עקב אחרי אמאזון, אמאזון הייתה חנות ספרים פעם, היא לא הייתה חברת תשתיות ואף אחד לא צפה שזה יקרה. לספוטיפיי יש איזה אספרציות דומות כבר ממה שדיברתי איתם, זה שמה, בתודעה, אני לא יודע כמה הם יצליחו, ו-backstage זו איזושהי התחלה, מהבחינה הזאתי שמתחילה להראות שהם נכנסים להשקעה הרבה יותר גדולה, כולל זה שהם הכריזו על מוצר מסחרי של ה-platform engineering. אז הם חוו, ויצא לי לראיין אנשים מהם, החווית משתמש שמה, החווית מפתח הייתה כזאתי שעכשיו אני בונה microservice, אני רוצה לפתח microservice, אוקיי? אני נכנס לסביבה שיש בה עוד microservices, איך אני יודע מה ה-interface שלהם, מה ה-API שלהם, איפה הם נמצאים, איך אני יודע עכשיו מה הסטטוס שלהם, זה בשל, זה לא בשל, איזה גרסה אני עובד איתם.

ערן: גם ה-ownership, מי ה-owner?

נתי: מי ה-owner, בדיוק, זאת אומרת המון דברים שעד היום היו מבוזרים לך בין CI/CD לגיט ל-monitoring system לדוקומנטציה,

ישי: ל-organization chart.

נתי: ל-organization chart, והיו זורקים מפתחים, אומרים להם: ״קח, תתחיל לפתח״. אז מן הסתם זה לא experience טוב. והדבר הראשון הוא בכלל מאוד פשוט, בוא תרכז לי את המידע, אני רוצה להגיע לפרויקט, אני צריך לפתח microservices, תביא לי את כל המידע שאני צריך כדי לפתח למקום אחד. אחרי זה נכנס בזה שאני לא רוצה לעשות אופרציה במקומות שבהם אני לא צריך. זאת אומרת, אם זה משהו שחוזר על עצמו, אני כנראה לא צריך להתעסק עם זה, אני מבזבז זמן על דברים שלא צריך. אם זה משהו חדש שהוא רק עכשיו יצא, פיצ'ר חדש באמאזון, כנראה הוא עוד לא בפלטפורמה, אבל הוא קריטי לי לפיתוח, אז אני גם ארצה אולי כי זה מעניין אותי. אני ארצה ללמוד אותו, אני אתעסק איתו, אבל צמצם את זה לזה, אל תכניס אותי עכשיו להרים קוברנטיקס קלאסטרס 100 פעם על אותו דבר ולהכיר Helm Chart שאני לא צריך להכיר.

ישי: קפקא.

נתי: או קפקא, בדיוק.

ישי: או. איזה כיף. כיף-קא. (צוחק)

נתי: אז אני חושב ששני הדברים האלה ברמה של developer experience, ההנגשה של המידע ויכולת כתוצאה מזה לפתח בצורה יותר קלה ופשוטה. והדבר השני זה באמת לאפשר לי את הגמישות איפה שאני צריך כדי שלא להיתקע על ב sysadmin של ה-cloud. את הגמישות רק בדברים שאני צריך, לא בדברים שאני לא צריך, דברים שהם חדשים, דברים שהם קריטיים יותר.

ישי: אוקיי, אז דיברנו על discovery ובעצם,

ערן: קטלוג,

ישי: תן לי לדעת איפה הסרביסים, מי ה-owner, מי יעשה לי code review, מי יעשה לי design review, עם מי אני צריך לדבר בכלל? וגם איפה API-ים, מה בכלל קיים, אני חושב זה גם developer experience משתנה לאורך החיים של המפתח בארגון.

נתי: נכון.

ישי: ורוב הארגונים, אולי קצת פחות עכשיו ב-down economy, אבל רוב המפתחים הם מפתחים חדשים. הם עכשיו הצטרפו, גייסנו אותם, כולם צומחים, כולם צריכים לחפש כל הזמן מידע. אח"כ כשאני כבר סניור אני מכיר את כל הארגון אז…

נתי: אגב, גם אם הם לא חדשים, הרבה פעמים אתה מצניח אותם כל פעם לאיזה פרויקט אחר, במיוחד בעולמות של microservice, זה כבר לא איזה monolith שאתה חי בתוכו הרבה מאוד זמן, אתה כל פעם נזרק מסוג של פרויקט לפרויקט, אפילו באותו פרויקט, ואז ההנגשה הזאת הופכת להיות… ונניח כשדיברתי אפילו עם המפתח הזה של ספוטיפיי על החוויה שלו הוא אומר אני הגעתי ל-Backstage בגלל שבעצם מצאתי את עצמי משקיע כל פעם הרבה מאוד זמן רק בללמוד להכיר עם מי אני צריך לדבר, מי זה האנשים שכן יודעים ויש להם את ה-domain expertise, ועד שאני מתחיל לכתוב שורת קוד לקח לי הרבה מאוד זמן. ומשם צמח ה-Backstage.

ערן: וגם אתה רואה ארגונים שמאמצים את זה, AppsFlyer זה דוגמה שהעלית, הם מאמצים platform engineering ברמה מאוד גבוהה וגם משתמשים ב-backstage, המערכת הזאת הופכת להיות business critical לכל דבר. כלומר זה לא איזה nice to have, יש לי איזה מערכת קטלוג של סרביסים, אלא זה מערכת שיש צוות שיושב ומתחזק אותה וזה כל כך crucial ל-velocity של הארגון, כי אתה שם דגש על ה-developer experience. אתה לא יכול לחיות בלי זה ברגע שאתה מתניע את זה.

ישי: כלומר היא תיפול הלילה אז ה-production של AppsFlyer לא יקרה לו כלום, אבל ה-developers קשה להם לזוז.

ערן: היא תיפול בלילה, יהיה מישהו שיתעורר,

ישי: כי ה-developers ל-velocity זאת אומרת,

ערן: כי זה הפך להיות חתיכה מאוד מרכזית באיך הם מפתחים מוצר, ככה שזה כבר לא nice to have, זה business critical system.

נתי: אני רוצה לתת אנלוגיה בשביל שזה יהיה מובן כי הרבה פעמים, גם בשאלה שלך אני חושב שזה הראה את הטעות שהרבה פעמים אנשים עושים. זאת אומרת צריך לחשוב על software delivery pipeline כמו קו ייצור. זאת אומרת עכשיו אם הייתי אומר לך אוקיי, הקו ייצור בטסלה נעצר, היית אומר לי ברור שמישהו צריך לקום ולדאוג שזה יקרה ואין מצב שה-pipeline הזה נעצר. אומנם המכונית עוד לא ב-production אבל ה-pipeline נעצר אז זה עוד יותר קריטי כי ברור שכל החברה יושבת על ה-pipeline הזה. הבגרות, אני חושב, של ההייטק בכלל, זה שעוד לא לגמרי מבינים ש-pipeline של פיתוח הוא לא שונה מקו ייצור של טסלה, זה בדיוק אותו דבר. זה pipeline שבסופו של דבר נכנס קוד מצד אחד, יוצא פיצ'ר מצד שני, בקצה יש לקוח שצורך את זה ומשלם כסף לחברה. ואיכשהו התובנה הזאתי, ההפנמה הזאתי קיימת אבל לא עד כדי כך. ולכן הרבה פעמים אתה שומע את השאלה אז אוקיי, אז נעצר הפיתוח, הוא יקום בבוקר הוא יעשה את זה, ואני אומר לו לא, זה לא מפתח אחד, אלף זה הרבה מפתחים, ובית זה מה שבסוף ימכור לך ויהיה לך delay ב-revenue בסוף וזה יפגע לך בשורה התחתונה.

"צריך לחשוב על software delivery pipeline כמו קו ייצור. זאת אומרת עכשיו אם הייתי אומר לך אוקיי, הקו ייצור בטסלה נעצר, היית אומר לי ברור שמישהו צריך לקום ולדאוג שזה יקרה..."

ישי: אני חושב שבהרבה מקרים לא רק דיליי ברבניו ופיצ'רים חדשים, אם הארגונים שעברו למהלכים מודרניים של CI/CD וכו', אם אין לי pipeline אמין בכל רגע נתון, אני לא יכול לטפל בתקלות. באמצע הלילה עכשיו כן יש תקלה ב-production, בלי pipeline אני,

נתי: ולא היית אפילו שואל את השאלה הזאת אם זה היה קו יצור של מכוניות, נכון? או של מקררים או של משהו אחר.

ישי: אמת.

נתי: אני אומר זה לא שונה, אנחנו פשוט עוד לא הפנמנו שזה לא שונה.

ערן: כן, הארגונים האלה ממש נמדדים, יש להם KPIs שהם נגזרת של כמה זמן חסכתי בזמן של פיתוח, כמה מפתחים חסכתי, כמה אוטומציות עשיתי שבסופו של דבר גרמו לי להיות הרבה יותר יעיל? זה דברים שהם סופר מדידים.

ישי: איך אני מודד כמה זמן חסכתי? מה הגישה לפחות להתחיל לענות על השאלה הזאת, אוקיי, הרמתי Backstage, כמה זמן חסכתי?

נתי: אז אני חושב שהדבר שלצורך העניין מגשר על זה מעולמות של מרקטינג לצורך העניין, זה PLG - product-led growth, שבעצם הוא באופן עקיף עונה על השאלה הזאתי אבל הוא עונה על השאלה הזאת. זאת אומרת אתה יכול לראות היום, זאת אומרת היום חלק מהצורה שבה מרקטינג עובד, מאוד מחובר ל-development. איך הוא עומד? אם אנחנו מסתכלים על Slack דוגמה ל-PLG שכולם משתמשים בה, וזה דיון אחר PLG כן עובד לא עובד…אבל בגדול אני יודע היום, בהרבה מאוד מקרים, למדוד איזה פיצ'ר בסוף מביא לי מוניטיזציה. כן הביא לי מוניטיזציה, שיפר לי את ה-UX, לא שיפר לי את ה-UX של המפתח עצמו. עכשיו מן הסתם, ככל שיש לי דיליי בפיצ'ר הזה, אני יודע שעכשיו נופלים לי 20 אחוז מהמשתמשים ב-login, למה? כי הם לא מוצאים את הכפתור הנכון, או מה שזה לא יהיה. אז די ברור לי שדיליי בדבר הזה יפגע לי בלקוח. יש לי פיצ'ר שמתחרה פיתח ולי אין אותו, למשל אינטגרציה עם ChatGPT, אז אני יודע להגיד שהוא יפגע לי במכירות. אז יש חלק מהפיצ'רים שאנחנו יודעים בוודאות להגיד איך הם מתורגמים ל-business value וכתוצאה מזה למה הם קריטיים. ויש הרבה non-functional שזה soft, הייתי אומר metrics. זה לא משהו שאני יכול למדוד ממש בשורה התחתונה אבל אנחנו יודעים שאם המערכת שלנו לא תהיה highly available בצורה טובה, זה גם יפגע בחוויה של הלקוחות שלנו אבל המדדים הם כבר לא קשיחים או לא, אי אפשר לתרגם אותם בצורה שהיא quantified, זאת אומרת שהיא, לא מוצא את המילה בעברית, היא לא יושבת ישר על הדולרים, אבל יודעים שהיא חשובה וזה יותר סופט מטריקס. אבל יש היום כבר לא מעט דברים, בגלל שיש חיבור בין ה-, ב-SaaS, בין החווית משתמש לבין הפיצ'רים לבין conversion, יש הרבה מאוד מהפיצ'רים שאתה יכול כן להגיד את זה, ממש כמעט בצורה קשיחה.

ישי: אז אני חושב שבאמת בעולם של, אם אני שואל את עצמי איך אני מודד כמה זמן חסכתי או האם הפיתוח שלי יותר יעיל בגלל ששמתי כלי, אפשר להסתכל על שאלות כמו אוקיי, איך ה cycle time שלי התקצר, meantime to restore לתקלות, האם אני צריך לחפש בלי קטלוג את ה-service ומי ה-owner שלו כדי לטפל בבעיה, אז אני יכול להגיד אוקיי, שמתי backstage או שיפרתי איזשהו יכולת או נתתי cookie cutter, במקום שכל מפתח יצטרך לעשות לבד. הקטנתי עכשיו את הסייקל שנדרש בשביל להרים פיצ'ר end to end, וזה דברים שכבר מתקבעים היום כי הדרך למדוד velocity ולמדוד productivity של ארגוני פיתוח. אז אולי אני לא תמיד יודע לקשור בין ההשקעה שעשיתי לתוצאה, אבל אני כן יכול לראות השקעה ושיפורים במטריקות, ולהצדיק עוד השקעה ב-platform engineering ובכלים כדי לעזור למפתחים לרוץ יותר מהר.

נתי: כן, זה מטריקות שאני חושב שהרבה ארגונים נמדדים עליהם, זה velocity שזה בד"כ כמה מהר אתה מפתח פיצ'רים ל-production, תוסיף לזה באמת את ה-business value, זאת אומרת את היכולת לראות באמת איך זה בא לידי ביטוי בשורה התחתונה, אני חושב נותן לך תמונה טובה לכמה זה קריטי לארגונים.

ישי: אז דיברנו על developer experience, הזכרת את הנושא של דיסקברי ולמצוא את הבעיות, דיברנו על לעשות פחות או פחות נקרא לזה toil של לעשות שוב ושוב את אותו טיפול ב-infrastructure שאולי זה גם לא ה-comfort zone שלי. מה עוד? איפה אתם עוד רואים כאבים או שאלות של developer experience ש-DevOps ו-platform engineering יכולים לפתור?

ערן: אני חושב שזה גם בנושא של האחידות והאם מה שאתה מייצר הוא compliant עם ה-policy הארגוני. אז אחד מהדברים באמת ש-platform engineering מאפשר או ה-tooling באקוסיסטם הזה, זה שקובעים פעם אחת את אותו policy, ואז יש בעצם enforcement כבר בשלב הפיתוח. האם אתה עושה את זה מיושר או לא מיושר? וזה לא חייב להיות security, זה יכול להיות כל דבר, יכול להיות אפילו ברמת העלות. היום אנחנו רואים פרויקטים שמאפשרים לך לעשות cost projection עוד בשלב ה-CI. מפתח ידע כמה עולה ה-service עוד לפני שהוא פרס אותו והסתכל על החשבונית החודשית ב-retrospective.

ישי: כן, אני חושב ש-, אם אני משליך את זה בחזרה ל-experience, אז כמובן היכולת שלי לא לחטוף את הבום בראש אח"כ ולתקן,

ערן: אתה פחות טועה, יש לך פחות מקום לטעויות, אני חושב שזה דווקא משפר משמעותית. אף אחד לא רוצה לדעת שהוא עשה איזושהי עבודה שהוא השקיע בה הרבה מאוד זמן ואז בסוף יש איזשהו, לא יודע, security vulnerability לצורך העניין, או שזה עולה הרבה יותר ממה שזה היה יכול לעלות אם הוא היה משנה משהו.

ישי: ועכשיו הוא צריך לתקן תחת אש, זה כבר ב-production, זה experience לא טוב. אני חושב שמה שקראת לו compliance גם אומר כשאני ניגש לתחזק קוד של מישהו אחר או ניגש להצטרף לאיזשהו פרויקט, זה יהיה לי מוכר, זה נראה דומה, זה נראה אותו דבר, אני לא צריך פתאום ללמוד את, אה, ככה הם עשו את ה-infrastructure. ככה הם תפרו את הדברים, כי אני מכיר את זה מהפרויקט הקודם שלי וזה שלפניו שכולם נראים דומים במובנים האלה, ואני יכול להתמקד בלוגיקה ולא בללמוד את ה-,

נתי: נכון, יש המון דברים שהם לא streamlined בחיים של מפתח. אפילו אנחנו מתחילים לייצר לי dev environment היא רוצה עכשיו לייצר dev ו testing. אז גם לוקח לי זמן לקבל את הדבר הזה, בטח אם אני רוצה לעשות sandbox ולשחק עם איזה משהו לפני שבכלל הוא נכנס לאיזושהי מערכת מסודרת. יש את ה-, כמעט בכל ארגון תמצא יותר מ-pipeline אחד, זאת אומרת יש לך ArgoCD לקוברנטיס ויש לך Jenkins או CircleCI לצורך העניין לדברים אחרים. ויש עוד הרבה כאלה ואיפשהו אפילו לראות מה קורה אם הקוד שלי באמת לוקח לי יחסית הרבה מאוד זמן. אז אם אנחנו מסתכלים על כל החוויה של מפתח היום, בכל אחד מהשלבים של הפיתוח, יש הרבה מאוד הייתי אומר ג'ונגל בתוך העולם הזה. Backstage עוזר בזה שהוא מרכז את המידע, אבל עדיין יש הרבה דברים שאני צריך להפוך אותם ופה נכנס הנושא של platform engineering להפוך אותם ל-self service. אוקיי, אתה רוצה לייצר סביבה, קליק, הנה, נוצרה לך סביבה, אתה יכול לעשות מה שאתה רוצה. זה סביבה שהיא sandbox אז זה סביבה שהיא כבר semi-production, אתה יכול להפוך את הדברים להרבה יותר self-service מהבחינה הזאתי. pipeline באמת GitOps מאפשר לנרמל את כל הנושא הזה של ה-pipelines. אז אתה מדבר רק עם GitOps ו-GitOps כבר מתחבר ל-ArgoCD או למה שצריך שמה, אתה לא צריך להיות זה שמתעסק עם זה. הדבר שנשאר, אני חושב ויישאר עדיין מורכב זה כל הנושא של ה-troubleshooting. זאת אומרת אוקיי, אז פיתחתי משהו, משהו שמה קרה והתקלקל וכאלה דברים, זה עדיין תהליך שאני חושב עדיין לוקח זמן. ואני לא רואה הרבה פתרונות שלגמרי פותרים אותו. יש כל מיני ניסיונות, אני חושב שהוא עדיין rough edge בהרבה מאוד מובנים.

(מוסיקת מעבר)

ישי: לסיום, נשמח אם כל אחד מכם ייתן ככה טיפ אחד למישהו שחי את עולם ה-DevOps ורוצה להתפתח או להסתכל מה הדבר הבא שהוא צריך לשים אליו לב, הוא או היא. אז ככה נקודה אחת לשים לה אם אני practitioner או leader בעולם של DevOps ו-platform engineering, על מה לקרוא מחר או על מה לחשוב בשבועות הקרובים.

ערן: אני אגיד משהו ש- זה קצת פילוסופי אבל אני אומר את זה מפוזיציה, אני באמת מאמין בזה היום שיש לי פרספקטיבה קצת אחרת לא רק כמנהל DevOps אלא כוונדור, תמיד לחשוב האם מוצר שתקנה אותו מהמדף הוא יהיה שווה לך יותר מאשר לעשות in-house. גם אני חטאתי בזה, אני חושב שהרבה מאוד מנהלי DevOps התמכרו ל-build your own solution.

ישי: NIH, כן.

ערן: כן. וזה נחמד וזה מגניב, יש המון open source היום וקהילה מטורפת בכל מה שקשור באקוסיסטם של ה-DevOps engineering ו-platform engineering וזה מאוד קל להראות ערך מבלי להוציא כסף מהכיס. זה לא תמיד נכון, שוב, אומר את זה מפוזיציה אני מוכר לאנשי DevOps, אז זה נוח לי להגיד את זה, אבל אני אומר את זה ממש בהבנה שלהסתכל בצורה בוגרת על ROI והאם שווה יותר לשלם על מוצר מדף מאשר לפתח, זה משהו שכל אחד, כל מי שהוא לידר באזורים האלה חייב לאמץ לעצמו את היכולת לעשות את זה.

ישי: זה שאני מדבר מפוזיציה לא אומר שאני טועה.

ערן: אני בוודאות לא טועה.

נתי: אני חושב שאני אחזור לשני דברים שאני חושב שאני אגע בהם, אחד מהם חצי בפוזיציה, אחד מהם בכלל לא. אז אני אתחיל דווקא מהזה שלא בפוזיציה, שזה כל הנושא של באמת אם אני ג'וניור ואני נכנס לעולמות האלה, החוויה שלי גם עם הבן שלי וגם עם חברים, ללמוד היום איך להיכנס לעולמות של DevOps מאוד קל עם העולמות שקיימים היום של ChatGPT ו-Copilot, זה לא ייאמן כמה אפילו זה מקצר את תהליך הלימוד. כי זה מסנן לך בדיוק את מה שאתה רוצה לאותה נקודת שאלה, איך אני כותב קוד ב-Python לדבר הזה, איך אני מוצא טמפלייט שמרים לי RDS, תוך שנייה אתה מוצא, תוך שנייה אתה יכול למצוא את הפתרון, זה חוויה שממש מקצרת את תהליך הלימוד, אפילו בתהליך של code review, זאת אומרת כתבתי קוד, אני רוצה לדעת אם הוא מספיק טוב או לא, הסיכוי שאני אמצא מישהו בארגון שבאמת יעשה review וייתן לי פידבק, הרבה יותר נמוך מזה שאני אריץ את הקוד הזה עם Copilot ואני אקבל על זה פידבק הרבה יותר איכותי, לפני שאני מגיע בסוף ל-production, וזה מאוד מומלץ, זה מאוד יכול לקצר, אני רואה שעדיין הרבה מאוד אנשים עוד לא יודעים לנצל את היכולת הזאתי, זה חייב לקרות ומי שיידע לעשות את זה אני חושב יעשה קפיצת מדרגה ב-skill שלו.

ישי: זאת אומרת תשקיעו בללמוד לעשות פרומפטים וללמוד לדבר engineering.

ערן: זה נקרא היום prompt engineering.

נתי: כן, וזה דורש סקיל בפני עצמו איך לעבוד עם הכלים האלה ואיך ללמד אותם ואיך להביא אותם לתת לך את התוצרים הנכונים וזה לא מה שנקרא ישר קדימה.

ישי: מי מבינינו שמספיק זקן כדי לזכור את הימים הראשונים של גוגל, ועוד ה-search engine לפני שהיה, אמנות לדעת איך לבנות את ה-search query שלי ככה שאני אקבל תוצאות שהן הגיוניות.

נתי: זה בדיוק ככה, ופה זה עוד יותר אמנות. פה זה עוד יותר אמנות כי זה ממש אינטראקטיבי וזה ממש. אתה יודע, לא כל אחד שישאל שאלה יקבל את אותה תשובה, אז אתה צריך לדעת גם לשאול את השאלה, אתה גם צריך לדעת לבדוק את התשובה ואתה צריך, זאת אומרת אתה צריך להבין מאוד בדומיין כדי להבין שהתשובה שלך, לשאול את השאלה הנכונה וגם לקבל את התשובה הנכונה. וזה סקיל שצריך לדעת ללמוד אותו ויש פה הזדמנות לדעתי למי שרוצה לפתח את הקריירה שלו, להיכנס לעולמות האלה עכשיו ובעולמות האלה. הנקודה השנייה זה להבין באמת שבסוף, בטח בתעשייה של היום, אנחנו נמדדים על כמה מהר אנחנו מביאים ערך לארגון, לכתוב את ה Terraform טמפלייט או מה שזה לא יהיה, זה לא בהכרח הדבר שמביא לי את הערך לארגון, יש היום הרבה מאוד טמפלייטים קיימים שכבר כתבו אותם, היכולת לעשות את ה-assembly הזה, זה יביא הרבה יותר ערך לארגון. ולדעת למצוא את הדברים האלה או להשתמש בכלים שיודעים לייצר את זה, יביא ערך הרבה יותר מהר לארגון. אז פחות להתאהב בלעשות את כל הדברים בעצמך וה-not invented here שאני חושב שכולנו, במיוחד בעולמות ה-DevOps חוטאים בו. ולחשוב דווקא מכיוונים של ערך לארגון ויש המון innovation בתוך הדבר הזה, פשוט צריך לעשות את הסוויץ' בראש, להבין אוקיי, עברנו קומה, אנחנו עולים קומה אחרת, ועכשיו אנחנו בונים גורד שחקים ולא בניין מבלוקים, וגם בגורד שחקים יש אתגר הנדסי מאוד משמעותי, אבל הוא מסתכל,

ישי: ב-level אחר.

נתי: בדיוק.

ישי: נתי, ערן, תודה רבה, היה כיף, שמחתי לדבר איתכם על DevOps, על platform engineering ועל developer experience.

ערן: תודה רבה.

נתי: תודה רבה, נהניתי מאוד.

(מוסיקת מעבר)

** לינקים לקהילות שהוזכרו בפרק - כאן.**

Yishai: Welcome to Season 2 of Dev Interrupted - the Hebrew Edition, LinearB's podcast about everything that matters in the daily work of engineering managers. This season we will host industry leaders and talk with them about everything that interests engineering managers, those who work with them, and those who one day would like to manage an engineering organization. This season we will place special emphasis on Developer Experience. I'm Yishai Beeri, CTO at LinearB. Let's get started.

Yishai: In this episode I am happy to host Nati Shalom, Co-Founder and CTO at Cloudify, which was recently acquired by Dell. Congrats!

Nati: Thank you.

Yishai: And Eran Bibi, Co-Founder and Chief Product Officer at Firefly.

Eran: Hello.

Yishai: Hello, it's great to have you here. We are really excited, we are at the start of our second season, and today we will talk about developer experience, DevOps, and platform engineering. But first of all, I would be happy if each of you could tell me in a few words about your path: how did you get here, a few key points in your career. Nati, let's start with you.

Nati: Okay, so Eran, I apologize, it will be a bit long because I'm a little older.

Eran: It's fine.

Nati: So I actually got started in technology in the 8th grade, at the Weizmann Institute. There was a course there that I have no idea why I was thrown into, but it involved working with transistors and all kinds of things like that, and I started to discover a whole new world. It continued with a Bar Mitzvah gift, which was also about building radio kits and things like that. Then I realized that, OK, this is what I want to do. I continued in high school, majoring in electronics, and in the army I made a pivot. I said, OK, now I want to diversify a bit and do things I would otherwise never do in my life, which meant going to a combat unit, unlike the usual path for people of this type, and it was really one of the best things I've done in my life. Then I got back on track and graduated from Coventry University in England as part of a student exchange program, which was an experience in itself. Since then, I think I mainly bring a lot of multidisciplinary knowledge: hardware on the one hand, software on the other, DevOps, and today the world of the cloud. Back then it was also distributed computing, HPC, information systems, and applications; hardware and software I have already mentioned. In the world of the cloud, I think this multidisciplinary knowledge is almost essential, because the cloud turns everything into code, and systems become something much more holistic, not just a certain function of an application or a certain function of a data center. That's it in a nutshell. Today I'm obviously a serial entrepreneur. I founded a company called GigaSpaces, which dealt with distributed systems for solving data problems, what is known as NoSQL, and focused mainly on doing that on top of an in-memory platform. From there grew Cloudify, which was built as an infrastructure for managing distributed systems. And Cloudify, as you said, was acquired not long ago by Dell; when we say not long ago, it's within the last month. Obviously the process took much longer, and we might talk about that a little more later. It's very fresh, and I'm excited to get back to talking about other things, let's put it that way.

Yishai: Yes, so congratulations again and we'll dive a little into what that means and what it looks like a month after the acquisition. Eran, tell us a little about yourself.

Eran: So I'm a bit younger, 38 years old, but relative to the people I work with today I'm still considered the oldest on the team.

Yishai: Dinosaur.

Eran: Really, and it's hard for me to think of myself that way, but it's slowly becoming true. My love for computers started at a relatively young age. When I got my first PC, I really enjoyed taking it apart and understanding what each component does, and I had a kind of inexplicable passion for it. I had the privilege of taking part in a high-speed internet trial before it was what you'd call mainstream. That exposed me very early to 24/7 access to information, at a time when people would connect to the Internet only on Saturdays, when it was cheap, over a very slow line. From there I deepened that passion and curiosity. I really connected with infrastructure, networking, and system administration. I did that in the army, and after I was discharged I deepened my knowledge of Linux operating systems: I took a Red Hat course, and while everyone else was learning Microsoft, I went the other way, into the worlds of Unix and Linux, and I really liked it. Like everyone my age, I worked at Comverse; it's a kind of box you have to tick on your career path. From there I moved on to both startups and somewhat larger enterprise companies. When the cloud arrived, it was a very natural fit for me, because it combined my love for infrastructure and operating systems with development and scripting, so I entered the DevOps niche at a relatively early stage, around 2012. I was hands-on for many years and then moved into leadership roles. Before I founded Firefly, I was one of the first employees at Aqua Security, where I had the privilege of building a leading DevOps team, with all the popular disciplines, SRE and platform engineering, and I got a big appetite for startups and entrepreneurship there. I decided to set out on my own journey, got together with my partners, and a year and a half ago we founded Firefly, a product that provides a solution in the worlds I feel very comfortable in, the DevOps worlds.

Yishai: And when you founded Firefly, was that the first time you were formally in a product position, as opposed to being hands-on, leading DevOps, and working closer to the metal?

Eran: Exactly like that, but it felt very natural to me because a lot of what I did, as someone who leads DevOps, was to produce internal products for our internal customers within the organization, so defining a product and knowing how to build it was something that felt very natural to me, but yes, that's right, it's the first product role, in the profession, with the title.

Yishai: You went all the way to Chief. First time in product, and straight to Chief.

Eran: Yes, I cut corners, I took some shortcuts in this regard, but in the end, today I am building a product for an audience that I am actually the target audience for. So I know the persona best, I actually know best how to produce a product for this target audience. So on the one hand it's a shortcut, on the other hand I think it's one of the most important qualities for a product manager is to really know who you're selling to.

Nati: By the way, this is a very important point. When I explain, say, how Shay succeeded with Elasticsearch, or even how Mark Zuckerberg succeeded with Facebook, it often comes down to this. It is easy to miss the shades of gray: someone from the outside, who does not live and breathe the domain, will not see the small nuances of a feature. In Shay's case, he saw the need for a simple API: not only a database that knows how to run a query or a search, but one that speaks to developers, with less UI and more API, which is what developers look at. The same goes for Terraform. Terraform realized that its user is a DevOps person who lives inside a CI/CD pipeline, while most of the solutions, including Cloudify, lived very well in the world of operators but less well in the world of developers. The ability to recognize this sharply often comes from people who have actually experienced the problem. Many times you see managers brought in because they have previous experience in the field, but they don't know the end user, because they were never the end user. So this is a super important point: the people who lived the problem see the shades of gray, and the shades of gray today are what separate success from failure. I think this is a point a lot of people miss, especially in the world of DevOps, where almost all the tools look similar, speak almost the same language, and do very similar things, and in the world of security too. This is often what I see differentiating a product that succeeds from one that doesn't. Everyone tries to solve similar problems, but those who manage to identify the end user and understand exactly what their problem is are the ones who win in the end.

Yishai: I agree. I think that in startups whose product targets developers, DevOps teams, and so on, the people who build the product are themselves the customer. They know its use firsthand, and that's a different world. I see it at LinearB: we sell products to development organizations, and every one of our developers knows by heart what this product is. They also use it themselves, so they don't need a translation from the product people, or from whoever talks to customers, about what is needed, and that makes a world of difference in what they can do.

Eran: I think that in the DevOps domain it is even more extreme, because DevOps always sits in a kind of capsule inside the organization, to the point that even management does not really understand, let's say, the most important question: why do we need so many DevOps people? These are very talented people, but they are very expensive for the organization, and there is always demand for more. They also usually sit somewhat apart from the developers and from their day-to-day. So really understanding what the day-to-day life of a DevOps engineer or a DevOps manager looks like is very hard from the outside, but much easier if you've done a stint in one of those positions.

Yishai: Yes, and they are also often not tied to features or capabilities, so managers say: wait, so much effort is going there, but what is it going into? What does it support, and how? They don't build features, in most cases, so it's another level of how I as a manager come to understand that we need to invest here. Because most value is perceived as a feature, okay? There's another capability, and I'm going to sell it.

Nati: I think that in the current crisis in the industry there is already more understanding of the value of this. Obviously, when you need to do more with less, you need automation, and when you need automation you need people of this type. I think this understanding is now more present in the management layers as well. In a previous podcast I did with Wix, the platform engineering initiative came from the CEO, which was very painful to hear: it didn't come from the developers, it came from the CEO. Why? Because he realized he needed velocity. And why did he realize he needed velocity? Because his competition was running a little faster. So I think things are changing in this respect. It's still very true that management doesn't know how to translate it into actual operations in the end; they still need the people themselves. But the current market trend, where everyone has to look at efficiency and not just at getting as many users as possible at any cost, strengthens the understanding of the value of automation and of DevOps people. Suddenly their value is appreciated more within the organization. And platform engineering is really a different case, because it no longer grows only from the bottom up but also from the top down.

(transitional music)

Yishai: Great, so we're talking about DevOps. It's a loaded word, a term that has changed. If I think about what a DevOps person does, or what the role of DevOps is in a company, it has evolved and changed a lot over the last 10 to 15 years, with all kinds of sub-specialties: there is SRE, there are all kinds of different worlds. How do you look at the changes that this basket called DevOps has gone through and is still going through?

Nati: Precisely because I've now moved to Dell, I suddenly found myself explaining this to laymen, so to speak, people who are less familiar with this world, so I tried to find a very simple way to explain the evolution that took place. If you look at it, there were three main waves. The first wave was when people said: OK, we've moved to the cloud, we need automation, most of our developers don't know how to do automation, so we need people who understand automation. That became a group, a very central group in the organization, to which I would say: automate RDS for me, automate a VM for me, automate this for me, and they would write the scripts that do the automation. Then we moved to a situation where most developers already understood what cloud is, and this central team became a bottleneck. Developers said: OK, I can't wait for this central team to do everything for me, where I ask them for something, get a product, and it isn't exactly what I want. So I, as a development team, will take responsibility for this, and my people, meaning the developers, will start dealing with automation. That led to a decentralization of automation into the development teams, and infrastructure as code fit in here very, very nicely. Then we reached a point where a problem arose, because this decentralization went too far and created a problem of consistency and governance. I mean, OK, now everyone does what they want, it's too democratic. The word democratic is loaded today, we won't get into that now. (laughs)

Yishai: (laughs) ...Who's pushing it through? The DevOps people?

Nati: But it has become too democratic... a reform is needed. In any case, they understood that some order was needed here. Going back to a central team doesn't work either, because then we return to the bottleneck problem, with the central team as the bottleneck. That's where platform engineering came in, which basically means that instead of people being the central point of contact, there is a central platform that provides self-service. People build the platform, but they are not the developers' interface. The developers' interface is an API, a service that mainly takes care of the things that repeat themselves. In most organizations there is some cookie cutter in the end, something that repeats itself; not every organization reinvents its cloud every time. That is much more scalable. It's a kind of evolution, I would say, to a place where I'm seemingly back to a central team, but the central team is not my interface, the API is my interface, so I won't have a scaling problem.

Yishai: So the central team came down and is building the infrastructure.

Nati: Building the platform, exactly.

Yishai: And if we just take a super simple example: I once had to open a ticket to get a bucket spun up, then I created the bucket on my own, as a team, but that created problems of uniformity or policy, and now I have an API or service that creates it for me and acts according to the corporate policy, and I do it in self-service without waiting for a person.
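To make the bucket example concrete, here is a minimal sketch of what such a self-service entry point might look like. Everything in it is an assumption for illustration: the function name `request_bucket`, the allowed regions, the required tags, and the naming convention are hypothetical, and no real cloud API is called. The point is only that the developer asks for a bucket and the platform, not a person, applies the corporate policy.

```python
from dataclasses import dataclass

# Hypothetical corporate policy; all names and values are illustrative assumptions.
ALLOWED_REGIONS = {"eu-west-1", "us-east-1"}
REQUIRED_TAGS = {"team", "cost-center"}


@dataclass
class BucketSpec:
    name: str
    region: str
    tags: dict
    encrypted: bool = True       # policy default: always encrypted
    public_access: bool = False  # policy default: never public


class PolicyViolation(Exception):
    """Raised when a request cannot be made compliant automatically."""


def request_bucket(team: str, purpose: str, region: str, tags: dict) -> BucketSpec:
    """Self-service entry point: validate the request, then apply policy defaults."""
    if region not in ALLOWED_REGIONS:
        raise PolicyViolation(f"region {region!r} is not allowed")
    all_tags = {"team": team, **tags}
    missing = REQUIRED_TAGS - set(all_tags)
    if missing:
        raise PolicyViolation(f"missing required tags: {sorted(missing)}")
    # The naming convention is enforced by the platform, not chosen by the developer.
    return BucketSpec(name=f"{team}-{purpose}".lower(), region=region, tags=all_tags)
```

In this sketch a compliant request returns a spec immediately, and a non-compliant one fails with a message the developer can act on, so nobody waits on a ticket.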

Eran: Right, exactly, I think Nati described it fantastically. What has happened in the last decade is all kinds of trial and error to find which model works best. At some point we saw that the DevOps teams really became an unreasonable bottleneck: instead of increasing velocity they slowed development down, because everyone depended on DevOps and there were never enough people. It actually hurt the developers' ability to move quickly. They also kept some of the expertise on this whole subject of the cloud to themselves, and I don't think the DevOps people are to blame for this. On the development side, too, we have not always seen a willingness to come and learn these subjects in depth, things related to infrastructure or infrastructure as code, and to become experts in them, and in the end I can understand it. They wanted to deal with business logic and the things they came to do, which is to develop features and move the product another step forward, and less to deal with operations. And this topic of platform engineering, I think, is a model that basically serves both worlds. On the one hand, you have one team that is truly expert in infrastructure. On the other hand, it is not a bottleneck, because it provides a framework that allows controlled freedom of action, as opposed to everyone doing what they want. It is really controlled with the help of tools that maintain uniformity and governance and security and all the aspects of cost. Cost is now a very popular topic. I'll open a parenthesis: a few years ago the CEO did not care so much how much you paid for cloud, because everyone wanted to run fast and no one was focused on optimization. You were focused on running fast and not wasting time on it. And today everyone looks at the cloud, opens their eyes, and says: wow, I'm spending a lot of money on this, I have to optimize, and a lot of that is because people ran without controls.

Yishai: Yes, it's simple, the investors were looking for growth at any price, so I'm going for growth at any price. Now they look at my margin, I have to take care of the margin.

Eran: Exactly, platform engineering actually enables control, control of all aspects, part of which is also cost control. Making sure that the machines the developers spin up in the morning are shut down at night is just a very classic example of something that can be done easily.
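The night shutdown example can be sketched in a few lines. This is only an illustration under stated assumptions: the working-hours window, the `env` and `keep_alive` fields, and the function name `machines_to_stop` are all hypothetical, and the sketch only selects machines; actually stopping them would go through whatever cloud API the platform uses.

```python
from datetime import time

# Assumed working hours; outside this window dev machines are stopped.
WORK_START = time(7, 0)
WORK_END = time(20, 0)


def machines_to_stop(machines: list[dict], now: time) -> list[str]:
    """Return ids of running dev machines that should be stopped at `now`."""
    if WORK_START <= now < WORK_END:
        return []  # inside working hours: leave everything running
    return [
        m["id"]
        for m in machines
        if m["state"] == "running"
        and m["env"] == "dev"
        and not m.get("keep_alive", False)  # explicit opt-out for long jobs
    ]
```

A scheduled job could call this once an hour; production machines and anything explicitly marked `keep_alive` are never touched.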

Nati: Now, many times when there are waves like this, people say: wait, it's so obvious, how come we didn't think of it before? And this is where maturity comes in. It took time to reach a situation where there is some kind of standardization in the market. The standardization around, let's say, Kubernetes and Terraform: without that we wouldn't be talking about platform engineering, and that's important to understand. When everyone had their own tools, and things were very different from organization to organization, you obviously couldn't build a generic platform that solves this problem, and you couldn't produce tools like Firefly and others that are growing today because this standardization exists. There is an entire industry that has grown up around the Kubernetes ecosystem that wouldn't exist if there weren't such a thing as Kubernetes with all the organizations using it. Then a situation arises where suddenly everyone finds themselves doing the same thing: everyone spins up a Kubernetes cluster, a namespace for development, a cluster for production, a separate cluster for development. It becomes something very canonical, so why do I have to waste time on it every time?

Yishai: 95 percent do exactly the same thing.

Nati: Exactly the same thing with different nuances: I need it in these regions and not those regions, with this or that optimization, but there are not many differences and it becomes very repetitive. And then it's a classic case: when the problem becomes generic, the platform comes. This couldn't have happened earlier, in 2015-16. I'm just putting it in perspective so that we don't think we were stupid then and are smart now, and ask how come we weren't smart then, and what is this weird industry called hi-tech that keeps reinventing the wheel.

Yishai: Yes, well, 20 years ago, in order to get a server, we had to pay 15 thousand dollars and wait two months for it to arrive in the cage, and someone had to go there and install it.

Eran: This transition in general, from classic infrastructure people and system administrators to DevOps and to the platform, really happened thanks to technologies that make these things possible. If there were no cloud, we would still be dealing with data centers. And it is completely true that the uniformity container platforms provide today creates a kind of alignment and a very large common denominator between organizations.

Yishai: What about the shift in the focus of the DevOps function away from questions of uptime and runtime? In the end, people sometimes stretch DevOps to runtime, to SRE, to Ops, really: who gets up at night, or how do I know that something fell over at night. On the other hand, I think there is a shift that is part of this move to platform engineering, and almost everything you've just described in three stages is about the velocity of development. So on one end there is velocity and enablement for development, and how do I improve the developer experience, and on the other end uptime. How did the focus change?

Nati: I think there are two points here. I will first touch on the developer experience, which has actually been neglected for a long time. Why was it neglected for a long time? Once again, it's a matter of evolution: most of the time we were busy with how to automate the infrastructure, so the developer experience obviously got secondary priority. Today, when the platforms have stabilized a bit, developer experience suddenly becomes very, very important. Developers find themselves spending a lot of their time, 90 percent according to some statistics, on things that are not their core business, managing infrastructure they don't really know how to manage instead of developing code. That is why the issue of developer experience suddenly becomes important now. So the platform people today need to put much more emphasis on this. They are measured on it, and they need to deliver it, because this is what the developers in the organization expect. The uptime comes from the very fact that you move to a platform. If until today I had to maintain the database and maintain the clusters and assign a person who receives alerts and has to get up at night, now I become more of a product. DevOps people themselves become product people, which is also a very big change in DevOps. Within the organization it is a product, and I manage it like a product and not like pipelines and scripts that keep shifting. This is a very big change, because it means DevOps people are no longer just operations people; they are developers in every respect. It means they have to think like a product team, and many times you see that part of the organizational change is to put a product person on the team itself, and to plan releases and everything else we know from the product world.

Eran: By the way, do you see this?

Nati: I see it happening in organizations such as Wix and AppsFlyer and other relatively large organizations that have a lot of teams, so it happens in practice. In smaller organizations it's not happening yet, and that is usually also a matter of evolution. I think that as the platforms stabilize, we will see it in small organizations as well. In large organizations it is happening for sure.

Yishai: Are you actually saying that part of the maturity of platforms and platform engineering makes uptime something you are exempt from? If I have a service and it is well built, its uptime will be fine, and now I can focus on velocity?

Nati: I'd put it like this: let's look for a second at what uptime was in the past and what uptime is today. Today the cloud itself gives you a lot of infrastructure. If we're talking about EKS and Kubernetes and all kinds of things, there are a lot of solutions through which the cloud guarantees you uptime much higher up the stack than before. Previously it was VMs and storage, and you had to take care of the entire upper layer. So you already have much broader coverage granted by the cloud providers. You still have the uptime of the application, and this is another layer that you are responsible for, but it has been reduced. The second thing related to uptime is really the procedures. I mean, how do I make sure that the whole process of continuous updates and all these things doesn't break my processes? Here, too, many tools have been created that already know how to do this job in a more stable way. So the delta that today's DevOps people have to concentrate on has simply shrunk. On the other hand, the part they almost didn't touch, which is actually developing a product and dealing with the UX of developers, becomes a bigger part that they are measured on, and that is the transition. This is a difficult part, by the way. Not many will get through it; there will be many DevOps people who will not fit this transition. It's happening now, it's in transition, and it's not easy. For many organizations it's even very difficult.

Yishai: Eran, how do you see this transition from focusing on uptime and building infrastructure to focusing on developer experience and velocity?

Eran: In the end, the way I see it, once upon a time everyone who wasn't in the feature teams or scrum teams writing code was the DevOps people, and it didn't matter whether on the spectrum you were looking at developer experience on the one hand or production engineering on the other; these were the people who did it. But if you really understand the dynamics, these are actually slightly different professions. The people who are responsible for the reliability of the service, the scalability, and all the so-called non-functional testing (does it meet the scale, is it bulletproof, the whole issue of regions and what happens in disaster recovery) are a subcategory of DevOps: the SRE or production engineering. We see in a lot of organizations that these are completely separate teams and this is their expertise. What you do see is how it connects to the platform, which is also responsible for producing the developer's service with the same metrics and the same guardrails related to monitoring, and I see this seeping into the development teams. That is, this world is not really managed on the side by a dedicated team. Rather, the dedicated team that is the subject matter expert in monitoring and engineering sets the doctrine and makes sure that the service delivered through the platform also contains the elements related to monitoring the application, and not only the business logic. That is, does it have the necessary health check? Does it know how to scale, and what happens if the service is down? And as Nati also said, a lot of the new technologies like Kubernetes abstract this. Kubernetes takes the approach that a pod, which actually runs the application, can die at any moment and it doesn't matter. There is no problem if a pod crashes, because Kubernetes will do the orchestration and another pod will come up, or you always have more than one pod in the deployment.

Yishai: Under the assumption that the application knows this and knows how to handle it.

Eran: True, and this is really the job of those platform people: to make sure that whatever describes the application contains those components, so that Kubernetes can do its job properly.

Nati: This is also where the issue of shift left comes in. In the past, what those SREs would do was fix things retroactively. Someone would do something wrong, and they would check: oh, you didn't do the configuration right? And they say that more or less 95 percent of downtime is due to human error. And one of the downsides of automation is that it does, I'm trying to think of the Hebrew word, a kind of amplification of the problem.

Yishai: The error reaches production.

Nati: Exactly, you end up with a much larger blast radius for each mistake. Shift left is intended to move the responsibility from identifying the problem in production to identifying it in the development stages. So the role of the DevOps people changes from monitoring and identifying problems after the fact to creating policies and scripts that do validation. That is, not to be the person who is the bottleneck, but to create the policy that ensures in advance that everything that passes through this pipeline is likely to be production grade. It's a different perception of my role as an SRE or whatever it is: I now make policies. I don't talk to the developer and audit him; I create a policy that audits him, so that the developer himself sees that he made a mistake, that he put something wrong here, and fixes it himself.
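A policy that audits the developer, rather than a person doing it, might look roughly like the sketch below. It is an illustrative assumption, not any particular policy tool: the function name `audit_deployment` and the three rules are hypothetical, while the field names (`spec.replicas`, `livenessProbe`, `readinessProbe`, `resources.limits`) follow the Kubernetes Deployment schema. It ties together the earlier points about health checks and running more than one pod.

```python
def audit_deployment(manifest: dict) -> list[str]:
    """Return human-readable policy violations for a Kubernetes Deployment manifest."""
    problems = []
    spec = manifest.get("spec", {})
    # More than one replica, so a single pod crash does not take the service down.
    if spec.get("replicas", 1) < 2:
        problems.append("spec.replicas must be at least 2")
    containers = spec.get("template", {}).get("spec", {}).get("containers", [])
    if not containers:
        problems.append("no containers defined")
    for c in containers:
        name = c.get("name", "<unnamed>")
        # Health checks let Kubernetes restart or stop routing to a broken pod.
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in c:
                problems.append(f"container {name}: missing {probe}")
        if "limits" not in c.get("resources", {}):
            problems.append(f"container {name}: missing resources.limits")
    return problems
```

Run as a step in the pipeline, a check like this fails the build with messages the developer can read and fix, which is the shift left described above: the mistake is caught before it reaches production.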

Yishai: What do you think is the next thing in the world of DevOps, what is the challenge around the corner, where will DevOps and platform engineering focus in the next year or two, what is the next frontier?

Eran: I think that a great many cloud services will move to provide even more abstractions. If today I have to take care of the Kubernetes cluster and everything related to operating it, we will see more abstractions on top of that, and then the developer's life will be much easier in terms of how he gets his code to run and how many steps have to happen along the way for it to be reliable and for the service to be up. So a lot of the evolution we are going to see is really related to new offerings from the cloud services.

Yishai: Basically, parts of the platform engineering will become a managed service as part of the cloud service.

Eran: Yes. For example, at the last re:Invent, AWS announced a service called CodeCatalyst, which is exactly that. It takes all the elements of platform engineering and brings them into one platform that makes blueprints accessible, does CI, all in one. So the way you access AWS is no longer a kind of store full of services and full of configurations; everything is very abstracted and very accessible. We will see it coming from the big vendors, not only from startups, and little by little it will really make platform engineering mainstream and not a cool niche that only early adopters embrace.

Nati: I think it comes down to, you know, today whenever anyone asks what will happen in the future, the answer is ChatGPT and the like. I'm tired of even hearing it, or hearing myself talk about it,

Yishai: This podcast is created by real people.

Nati: Exactly, yes. The truth is I had an interesting nuance to share about it, but I'm not sure it fits today's topic. In any case, if I look at the nearest-term impact of ChatGPT on DevOps, it is a Copilot of sorts, based on an engine called Codex, which is really much smarter code generation, because at the end of the day, what takes DevOps people the most time is writing the templates and making sure they work properly. I've been experimenting with this recently. I even asked it to create a Cloudify template for me for such-and-such a version. Boom, I got it without reading a single line of documentation. Actually spinning up the Kubernetes cluster or whatever took a little more time. It shortened my time very significantly. It's right here, it's not something in the future, it's already happening.

Eran: One of the things we did recently, right when the whole issue of generative AI exploded about two months ago: we released a CLI tool that we called AIaC, which is a combination of IaC, infrastructure as code, and AI. It works from the terminal: anyone can request a template for any configuration they want, Terraform, Pulumi, a Dockerfile. Not surprisingly it has gained a lot of popularity. It is a free open source offering, not related to the commercial product that Firefly sells, but it touches on it, because that is where the world is going. There is no doubt that the ability of AI to create a first version is, in my opinion, very impressive. And I completely agree with Nati that we will see more and more products incorporating AI. We at Firefly have done it too, other vendors will do it and are already doing it. Almost every time I open LinkedIn I see some vendor who-,

Nati: Yes, I'd say it will also be the technology that goes the fastest from hype to commodity. We see it already: a million users in 5 days, which broke every world record. By the way, that is also a matter of UX, because ChatGPT probably existed long before, but what made it pop was that they turned it into a kind of Google,

Yishai: This is the chat.

Nati: Exactly, they turned it into a kind of Google: write some text and you get a result. You don't have to be a scientist to work with it. Everyone, including my children and my wife, can write text. You see radio broadcasters using it; in short, it becomes a search engine. And the second thing that I think is really happening is the integration within products. It becomes more and more integrated, from your IDE to all kinds of other tools, and suddenly it is spread all over the place. So I think it will be quite a commodity. It's like search: it's now just part of things.

Eran: We talk internally within the company about futurism and try to think about what will happen. One of the things that is very clear to us will happen with AI is this: just as cloud is now something that is very easy to consume and very easy to spend a lot of money on, and has become a very significant cost center for the organization, AI is going to be the same. People will consume AI for internal use in organizations, and it will of course be AI as a service, the way the OpenAI and Microsoft APIs make it accessible. In my opinion, in the coming years a very large share of organizations' spending will go to AI services, and you can even imagine startup companies that do cost reduction for AI tokens.

Yishai: Let's make a caching layer and...

Eran: Think about having several vendors. You'll need some third-party product that tells you: okay, if you take the AI from the other vendor, you'll save this much and that much.

Yishai: Now, to this query.

Eran: Or savings plans for AI. The truth is that it raises a question that I remember trying to answer for years without succeeding, and it becomes interesting in this sense of data processing. To this day, the way monitoring systems and such things work is that you collect the data yourself, it is yours, and you generate insights from it. It used to be your property, too. Today we see that the information that is not yours is probably of even better quality than your own information, regarding trends, regarding cost analysis, regarding all these things. ChatGPT is an excellent example, because it aggregates all the code in the world accumulated on GitHub and other sources, or Quora, or all kinds of things like that, and produces much better insights from the fact that the information becomes shared. And if you want your system to be a learning system and you restrict it to only the information you collect, you probably have less data to work with. So suddenly it matters, and this is something I still don't think people have fully internalized: our whole concept of information processing, of data analytics, of insights and how to generate them, suddenly becomes something that needs to be supervised learning, and not something that accumulates the information and does the analysis for us. This is a very big change in my opinion.

Yishai: And the Pandora's box of ownership of the data, of copyrights, has already been opened.

Nati: Exactly, I think there are still many, many open questions, but I think what ChatGPT illustrates is that owning the data is no longer the asset. You still have to maintain it in some form, but making that information available out there will probably give you more value. What will become more valuable is the supervised learning, meaning how I can add domain-specific knowledge. In your case, for example: I don't want a search across all the Terraform in the world, I want a search for Terraform modules in AWS that do this, and here is someone who knows how to filter it for me and bring it to me in a more accessible way.

Yishai: Okay, so if we go back a bit and bring the discussion back to the world of DevOps. So okay, generative AI as another tool, maybe even something that will make a shift in the skillset that a DevOps person needs to build the platform or to improve it. What do you think is going to change in terms of the type of problems, or what will be the focus of the DevOps role? If I look ahead, what main problems will they solve or start to solve today and solve more for the organization?

Nati: I can testify now from my place at Dell that there is a whole world here that I think is less accessible today to the worlds of DevOps: the topic of IoT and devices, and really looking at the cloud beyond the central data centers that sit in the cloud today. And where do we see it? Everywhere. The camera is connected to Wi-Fi and receives continuous updates without us even noticing. The car, if you have a Tesla, you probably see it; everyone sees it in their car. The refrigerator at home, the TV, the phone of course, everything we carry around today, my watch: everything is already connected to the Internet, everything is connected to the cloud. But when we talk about the DevOps world, you see that DevOps infrastructure is not built to deal with this in a generic way. That is to say, everyone builds the update process for themselves.

Eran: That is, a pipeline of how I do development and release and deployment for those same end devices,

Nati: In a generic way, in a way that is not tailored to every device and every organization and I don't need everyone to do it themselves.

Yishai: I think that the public clouds have not yet solved these questions at a level that even comes close to how they solved the centralized compute layer for classic b2b SaaS.

Nati: True, and in my opinion this opens up a whole world of feedback loops, because these components already really sample the person and connect the person to the cloud, and produce a kind of scary, but also interesting, interaction with a lot of potential. I think this is still uncharted territory. What happened to us at Cloudify is that we thought about it a little too early. It caused us some disruption along the way, but in the end someone recognized that it was what they were looking for, and that's more or less how we ended up at Dell. So from what I see, it's going to be something that a lot of people will need in the very near future.

Yishai: I mean, one type of new frontier for DevOps is to solve the questions of companies whose footprint is not just one SaaS product, a web application in the cloud,

Nati: Yes, it's all of point of sale on the one hand, it's all the 5G we see, it's the whole issue of manufacturing in the companies that deal with it. But mainly on the analytics side: there are a lot of industrial companies, manufacturers, that obviously have distributed computing. The people themselves are distributed, and sometimes there are a lot of components. I'll give the example of the Garmin I have. So I think this is the next frontier.

Yishai: Interesting.

Eran: I would take one step back, so to speak, earlier in time. I think that processes we see gaining momentum today, like GitOps, the whole idea that what the code looks like in production and what the manifest looks like in production are determined by the source of truth, which is Git, are only partially adopted today. It is still considered something the so-called cool kids, the hipsters, do,

Yishai: They do that and trunk-based together.

Eran: Right. So I think such methodologies converge more on the same place. At the very least: okay, I have the code, I have Git, after that I have the CI, and after that I have the deployment with ArgoCD. We will see that everything is concentrated in fewer systems, so it will be much easier to understand what is running in production, even for the individual developer. He will look at the main branch, see the manifest, and understand: ok, this is the version he wanted. This is the essence of GitOps. I think it will gain more popularity; I see it slowly entering even organizations that are a little more old-fashioned in their thinking.
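The core GitOps idea, that what Git declares is the source of truth and anything different in production is drift, can be sketched as a comparison between two dictionaries. This is a simplified illustration and not how ArgoCD or any specific tool is implemented: the function name `find_drift` and the sample fields are hypothetical, and only fields declared in Git are checked, so extra status fields in the live state are ignored.

```python
def find_drift(desired: dict, live: dict, path: str = "") -> list[str]:
    """List the fields where the live state differs from what Git declares."""
    drift = []
    for key, want in desired.items():
        here = f"{path}.{key}" if path else key
        if key not in live:
            drift.append(f"{here}: missing in live state")
        elif isinstance(want, dict) and isinstance(live[key], dict):
            # Recurse into nested sections such as `spec`.
            drift.extend(find_drift(want, live[key], here))
        elif want != live[key]:
            drift.append(f"{here}: want {want!r}, got {live[key]!r}")
    return drift
```

An empty result means production matches the manifest on the main branch, which is exactly what lets an individual developer read the manifest and trust that it describes what is running.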

(transitional music)

Yishai: Towards the end I would like to double-click on this term, developer experience. Through the eyes of a DevOps or platform engineering team, let's try to go over where the biggest challenges of the developer experience are. Where do developers get stuck today, or have an experience that in the end leads to lower velocity, or even to frustration and churn? Where are the biggest challenges where DevOps can help, and is starting to help, solve developer experience questions?

Eran: I think the most classic issue is really areas where they don't have the knowledge or the experience to be excellent. If you take that typical developer who is very good at Java, Go, Python, or any other language, and then add a layer of: OK, you are responsible for the infrastructure of that service, it can put them in a place of some discomfort. It's because of the gap: on the one hand they are in the context of a sprint, they have to deliver, everything is measured, and on the other hand they have to do something they are not necessarily experts in. I think that's really the point where you see developers take a step back; it's the area where they feel least comfortable.

"If you take that typical developer who is very good at Java, Go...and then say - you are responsible for the infrastructure of that service, it can put them in a place of some discomfort."

Yishai: So it's like the third stage that you described, Nati, this transition from full shift left to a state where, okay, for some things I can just write the function and it will run.

Nati: I think Backstage is a great example today of open source platform engineering, coming from the world of Spotify. Spotify is considered a kind of rockstar in the DevOps world; almost everyone adopts their theory of how to build teams and squads,

Yishai: Is this the new Netflix?

Nati: But they are more invested than Netflix in terms of development. They are less about throwing code to the community and saying take it; they really turn it into a product, and in my opinion they have aspirations similar to Amazon's. For anyone who didn't follow it, Amazon was once a bookstore. It wasn't an infrastructure company, and no one expected this to happen. Spotify has somewhat similar aspirations, from what I've heard talking to them. It's on their mind; I don't know how successful they'll be. Backstage is a kind of beginning, in the sense that it starts to show they are making a much larger investment, including the fact that they announced a commercial platform engineering product. I got to interview people from there, and the developer experience they described was like this: I'm building a microservice, I want to develop a microservice, okay? I enter an environment that has other microservices. How do I know what their interface is, what their API is, where they are? How do I know what their status is, whether it's mature or not, which version I'm working with?

Eran: Also the ownership, who is the owner?

Nati: Who is the owner, exactly. I mean a lot of things that until today were scattered across your CI/CD, your monitoring system, your documentation,

Yishai: The organization chart.

Nati: The organization chart, and they would throw developers in, telling them: "Take it, start developing." So it's not a good experience. The first thing is generally very simple: gather the information for me. I'm joining the project, I need to develop microservices, so bring all the information I need into one place. After that came: I don't want to dig into places where I don't need to. If it's a recurring thing, I shouldn't have to bother with it; I'm wasting time on things I don't need to. If it's something new that just came out, a new feature on Amazon, it's probably not on the platform yet but it's critical to my development, so I may well want to, because it interests me. I'll want to learn it, I'll tinker with it. But limit it to that. Don't make me spin up Kubernetes clusters 100 times for the same thing and get to know a Helm chart I don't need to know.

Yishai: Kafka.

Nati: Or Kafka, exactly.

Yishai: Oh how fun. “Fun”-ka (Kef-ka - transliterated) (laughing)

Nati: So I think these are the two things at the level of developer experience. The first is the accessibility of the information and, as a result, the ability to develop in an easier and simpler way. The second is to really allow me flexibility where I need it, so that I don't get stuck on being the sysadmin of the cloud. Flexibility only in things I need, not in things I don't need: things that are new, things that are more critical.

Yishai: Okay, so we talked about discovery and basically,

Eran: catalog,

Yishai: Let me know where the servers are, who is the owner, who will do a code review for me, who will do a design review for me, who should I talk to anyway? And where are APIs, what even exists, I think it's also developer experience that changes throughout the life of the developer in the organization.

Nati: Right.

Yishai: And in most organizations, maybe a little less now in the down economy, most of the developers are new developers. They have just joined, we recruited them, everyone is growing, everyone constantly needs to search for information. Later, when I'm already a senior, I know the whole organization. So…

Nati: By the way, even if they are not new, many times you drop them into some other project each time. Especially in microservice worlds, it is no longer some monolith that you live in for a very long time; you are always thrown from one type of project to another, even within the same project, and then this accessibility becomes... Let's say, when I talked to this Spotify developer about his experience, he said: I came to Backstage because I found myself investing a lot of time each time just learning who I need to talk to, who the people are who have the domain expertise, and it took me a very long time until I started writing a line of code. And from there Backstage grew.

Eran: And you also see organizations that adopt it. AppsFlyer is an example you brought up; they adopt platform engineering at a very high level and also use Backstage, and this system becomes business critical for everything. I mean, it's not a nice-to-have, "I have some sort of service catalog system." It's a system that has a team that sits and maintains it, and it is crucial to the velocity of the organization, because you put emphasis on the developer experience. You can't live without it once you fire it up.

Yishai: That is, if it fails tonight, nothing will happen to AppsFlyer's production, but the developers will find it difficult to move.

Eran: It will always fail at night, so there will be someone who will wake up,

Yishai: Because for the developers, velocity means,

Eran: Because it has become a very central piece of how they develop a product, so it's no longer a nice to have, it's a business critical system.

Nati: I want to give an analogy to make this understandable, because I think even your question showed the mistake that people often make. You have to think of a software delivery pipeline like a production line. If I told you the production line at Tesla had stopped, you would tell me that obviously someone has to make sure it keeps working, and there is no way this line can be allowed to stop. It is true that the car is not yet in production, but the line has stopped, and that is even more critical, because it is clear that the whole company sits on it. The maturity of high-tech in general, I think, is such that people still don't fully understand that a development pipeline is no different from a Tesla production line; it's exactly the same. It's a pipeline: code goes in on one side, a feature comes out on the other, and at the end there is a customer who consumes it and pays the company money. Somehow this insight, this internalization, exists, but not to that extent. So many times you hear: okay, so development has stopped, someone will get up in the morning and deal with it. And I tell them no, it's not one developer, it's a thousand, it's many developers, and this is what eventually sells for you. You will have a delay in revenue at the end, and that will hurt your bottom line.

Yishai: I think that in many cases it's not just delays in updates and new features. If the organization has moved to modern CI/CD practices and so on, then if I don't have a reliable pipeline at any given moment, I can't handle faults. It's the middle of the night, there is a problem in production, and without a pipeline I,

Nati: And you wouldn't even be asking that question if it was a production line of cars, would you? or of refrigerators or of something else.

Yishai: True.

Nati: I say it's not different, we just haven't internalized that it's not different yet.

Eran: Yes, these organizations really do measure, they have KPIs that are a derivative of how much time I saved in development time, how many developers I saved, how many automations I did that ultimately made me much more efficient? It's things that are super measurable.

Yishai: How do I measure how much time I have saved? What's the approach to at least begin to answer that question, okay, I picked up Backstage, how much time did I save?

Nati: So I think the thing that bridges this, coming from the worlds of marketing, is PLG - product-led growth, which answers this question, even if indirectly. Today, part of the way marketing works is very connected to development. If we look at Slack, an example of PLG that everyone uses (and whether PLG works or doesn't work is a different discussion), by and large I know today, in many cases, how to measure which feature ends up bringing me monetization: whether it brought me monetization, whether it improved the UX or not. Now obviously, if I have a delay in such a feature, I know that I am now losing 20 percent of the users at login. Why? Because they can't find the right button, or whatever. So it is quite clear to me that my delay here will hurt my customer. If a competitor has developed a feature and I don't have it, for example integration with ChatGPT, I can tell that it will hurt my sales. So there are some features where we know for sure how they translate into business value and, as a result, why they are critical. And there is a lot of non-functional work where the metrics are, I would say, soft. It's not something I can really measure on the bottom line, but we know that if our system is not highly available, it will also harm our customers' experience. The metrics there are not hard ones, they cannot be translated into something quantified; I can't find the word in Hebrew, it doesn't sit directly on the dollars, but we know it's important, and it's more like soft metrics. But today, because in SaaS there is a connection between the user experience, the features, and conversion, there are a great many features for which you can say that in an almost hard, quantified way.

Yishai: So I think that if I ask myself how I measure how much time I saved, or whether my development is more efficient because I put a tool in place, I can look at questions like: how much shorter did my cycle time get? What is my mean time to restore for faults? Do I have to search, without a catalog, for the service and its owner in order to address the problem? Then I can say, okay, I put Backstage in place, or improved some capability, or gave developers a cookie cutter instead of each one having to do it alone, and I have now reduced the cycle required to deliver an end-to-end feature. These are things that are already established today as the way to measure velocity and the productivity of development organizations. So maybe I don't always know how to link the investment I made to the result, but I can see the investment and the improvements in the tools, and justify further investment in platform engineering and in tools that help developers run faster.
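
To make the before-and-after comparison Yishai describes concrete, here is a minimal sketch in Python. It assumes cycle time is measured from first commit to production deploy; the timestamps are invented for illustration, and `median_hours` is a hypothetical helper, not part of any tool mentioned in the episode.

```python
from datetime import datetime
from statistics import median


def median_hours(intervals):
    """Median elapsed hours over (start, end) ISO-8601 timestamp pairs."""
    durations = []
    for start, end in intervals:
        delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        durations.append(delta.total_seconds() / 3600)
    return median(durations)


# Hypothetical (first commit, deployed to production) pairs,
# sampled before and after a platform improvement was introduced.
before = [
    ("2023-01-02T09:00:00", "2023-01-05T09:00:00"),  # 72 hours
    ("2023-01-03T09:00:00", "2023-01-05T09:00:00"),  # 48 hours
    ("2023-01-04T09:00:00", "2023-01-08T09:00:00"),  # 96 hours
]
after = [
    ("2023-03-01T09:00:00", "2023-03-02T09:00:00"),  # 24 hours
    ("2023-03-02T09:00:00", "2023-03-04T09:00:00"),  # 48 hours
    ("2023-03-06T09:00:00", "2023-03-07T21:00:00"),  # 36 hours
]

before_h = median_hours(before)
after_h = median_hours(after)
reduction = (before_h - after_h) / before_h

print(f"median cycle time: {before_h:.0f}h -> {after_h:.0f}h ({reduction:.0%} shorter)")
```

The same helper could be fed (fault detected, service restored) pairs to track mean time to restore. As Yishai notes, a shorter cycle time after the change does not by itself prove the tool caused it, but it gives a trend to justify further investment.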

Nati: Yes, these are metrics that I think a lot of organizations are measured on. This is velocity, which is usually how quickly you get features developed and into production. Adding the business value to it, that is, the ability to really see how it is reflected in the bottom line, I think gives you a good picture of how critical this is to organizations.

Yishai: So we talked about developer experience. You mentioned the issue of discovery and finding the problems, and we talked about what we call toil, repeatedly doing the same infrastructure work that may not be in my comfort zone either. What else? Where else do you see developer experience pains or questions that DevOps and platform engineering can solve?

Eran: I think it is also about uniformity, and whether what you produce is compliant with the corporate policy. One of the things that platform engineering, or the tooling in this ecosystem, really enables is that the policy is determined once, and then there is actual enforcement already in the development phase. Are you doing it aligned or unaligned? And it doesn't have to be security, it can be anything; it can even be at the cost level. Today we see projects that allow you to do cost projection even at the CI stage. A developer knows how much a service will cost even before deploying it, rather than looking at the monthly invoice in retrospect.
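
The idea Eran describes, one policy defined once and enforced during development, can be sketched roughly as follows. This is an illustrative Python example only: `ServiceChange`, `Policy`, and `check` are hypothetical names, and the projected cost is assumed to come from whatever cost-estimation step runs in CI.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceChange:
    """A proposed change, with a cost projection assumed to be supplied by a CI step."""
    name: str
    projected_monthly_cost_usd: float
    tags: dict = field(default_factory=dict)


@dataclass
class Policy:
    """An organization-wide policy, defined once (hypothetical fields)."""
    max_monthly_cost_usd: float
    required_tags: tuple = ("owner", "team")


def check(change, policy):
    """Return a list of human-readable policy violations (empty if compliant)."""
    violations = []
    if change.projected_monthly_cost_usd > policy.max_monthly_cost_usd:
        violations.append(
            f"{change.name}: projected ${change.projected_monthly_cost_usd:.2f}/month "
            f"exceeds budget of ${policy.max_monthly_cost_usd:.2f}"
        )
    for tag in policy.required_tags:
        if tag not in change.tags:
            violations.append(f"{change.name}: missing required tag '{tag}'")
    return violations


policy = Policy(max_monthly_cost_usd=1000.0)
change = ServiceChange("checkout-api", 1200.0, tags={"owner": "payments"})

for problem in check(change, policy):
    print(problem)
```

In a real pipeline the CI job would presumably fail the build when the list is non-empty, so the developer sees the cost or compliance problem before deploying rather than on the monthly invoice.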

Yishai: Yes, I think that, if I bring it back to experience, then of course my ability not to get hit over the head later and have to fix things,

Eran: You make fewer mistakes, you have less room for mistakes, and I think it actually improves the experience significantly. No one wants to find out that they did some work they spent a lot of time on, and then in the end there is, I don't know, a security vulnerability, or it costs much more than it would have if they had changed something.

Yishai: And now they have to fix it under fire, it's already in production, and it's not a good experience. I think what you called compliance also means that when I go to maintain someone else's code or join some project, it will be familiar to me. It looks similar, it looks the same, I don't have to suddenly learn, oh, that's how they did the infrastructure, that's how they stitched things together, because I know it from my previous project and the one before it. They all look similar in these ways, and I can focus on the logic and not on learning the,

Nati: True, there are a lot of things that are not streamlined in the life of a developer. Even just when we start to develop, there's the dev environment they have to set up, for dev and testing. It takes me a while to get that, certainly if I want to make a sandbox and play with something before it even enters some kind of real system. In almost every organization you will find more than one pipeline; you have ArgoCD for Kubernetes and you have Jenkins or CircleCI for other things, and there are many more of these. Even just seeing what is happening with my code takes me a relatively long time. So if we look at the whole experience of a developer today, at each of the stages of development there is a lot of, I would say, jungle within this world. Backstage helps in that it centralizes the information, but there are still many things I need to change, and this is where platform engineering comes in, to turn them into self-service. Okay, you want to create an environment? Click, there's your environment, you can do what you want. Whether it's a sandbox environment or one that's already semi-production, you can make things much more self-service from that point of view. On pipelines, GitOps really makes it possible to normalize this whole topic. You only talk to GitOps, and GitOps already connects to ArgoCD or whatever it is; you shouldn't be the one dealing with it. The thing that remains, and I think will still remain complex, is the whole issue of troubleshooting. I developed something, something happened and broke, and so on; that's a process that I think still takes time. I don't see many solutions that completely solve it. There are all kinds of attempts, but I think it still has rough edges in many ways.

(transitional music)

Yishai: Finally, we would be happy if each of you would give one tip to someone who lives in the world of DevOps and wants to develop, or is looking for the next thing he or she should pay attention to. So, one point to take away if I am a practitioner or leader in the world of DevOps and platform engineering: what to read tomorrow, or what to think about in the coming weeks.

Eran: I'll say something that's a bit philosophical, and I say it from an interested position, but I really believe it today, now that I have a slightly different perspective, not only as a DevOps manager but as a vendor: always think about whether a product you buy off the shelf will be worth more to you than doing it in-house. I was guilty of this too; I think a lot of DevOps managers got addicted to building their own solution.

Yishai: NIH, yes.

Eran: Yes. And it's nice and it's cool. There's a lot of open source today and a crazy community around everything related to the DevOps and platform engineering ecosystem, and it's very easy to show value without spending money out of your pocket. That's not always true. Again, I'm saying this from an interested position, and I'm familiar with DevOps people, so it's comfortable for me to say it. But I say it really understanding that looking at ROI in a mature way, and at whether it's worth paying for an off-the-shelf product rather than developing one, is something that anyone who is a leader in these areas must be able to do.

Yishai: The fact that I speak from a position does not mean that I am wrong.

Eran: I'm certainly not wrong.

Nati: I'll come back to two things, one of them half from an interested position, one of them not at all. I'll start with the one that isn't: if I am a junior entering these worlds, then from my experience both with my son and with friends, learning how to get into DevOps today is very easy with tools like ChatGPT and Copilot. It's unbelievable how much this shortens the learning process, because it filters exactly what you want for that specific question: how do I write code in Python for this thing, how do I find a template that sets up RDS for me. In a second you can find the solution. It's an experience that really shortens the learning process, even in code review. I wrote code and I want to know if it is good enough or not; the chances that I will find someone in the organization who will actually do a review and give me feedback are much lower than if I run this code through Copilot, where I will get a lot of better-quality feedback before I finally get to production. This is highly recommended, it can shorten things a lot. I see that there are still a lot of people who don't know how to take advantage of this ability. It has to happen, and whoever knows how to do it will, I think, make a leap in their skills.

Yishai: This means investing in learning to make prompts and learning to speak engineering.

Eran: Today it is called prompt engineering.

Nati: Yes, and it is a skill in itself: how to work with these tools, how to teach them, and how to get them to give you the right outputs. It's not what you'd call straightforward.

Yishai: Those of us who are old enough remember the early days of Google, and the search engines before it: the art of knowing how to structure my search query so that I get results that make sense.

Nati: It's just like that, and here it's even more of an art, because it's really interactive. Not everyone who asks a question will get the same answer, so you have to know how to ask the question, and you also have to know how to check the answer. You have to understand a lot about the domain in order to ask the right question and also to recognize the correct answer. This is a skill that needs to be learned, and in my opinion there is an opportunity here, for those who want to develop their career, to enter these worlds now. The second point is to really understand that in the end, certainly in today's industry, we are measured on how quickly we bring value to the organization. Writing the Terraform template, or whatever it is, is not necessarily what brings value to the organization. Today there are a lot of existing templates that have already been written, and the ability to do the assembly will bring much more value. Knowing how to find these things, or how to use tools that know how to produce them, will bring value to the organization much faster. So less falling in love with doing everything yourself, and less of the not-invented-here that I think all of us, especially in the DevOps worlds, are guilty of. Think specifically in terms of value to the organization. There is a lot of innovation in this; you just have to make the switch in your head and understand that, okay, we have moved up a floor, we are going up another floor, and now we are building a skyscraper and not a building made of blocks. Even in a skyscraper there is a very significant engineering challenge, but it looks

Yishai: on another level.

Nati: Exactly.

Yishai: Nati, Eran, thank you very much, it was fun, I was happy to talk with you about DevOps, platform engineering and developer experience.

Eran: Thank you.

Nati: Thank you very much, I enjoyed it very much.

(transitional music)

Go to devinterrupted.com to subscribe; you can also find all our episodes in English there. A reminder that we at LinearB are growing rapidly and are recruiting for a variety of positions in all fields. Visit linearb.io/careers to find your next challenge. I'm Yishai Beeri, talk to you in the next episode.

(Closing music)

Links to the nifty tools and resources mentioned in the episode: