Pinned · Published in DataDarvish
Unit Testing in Data Engineering: Python, PySpark, and GitHub CI Workflow
Learn how to implement unit tests for Python and PySpark, automate testing with CI, and boost data pipeline reliability.
Mar 5
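The post covers the full setup; purely as a rough sketch of the kind of test it talks about (the transformation add_greeting, the fixture, and the column names below are hypothetical, not taken from the article), a pytest-based PySpark unit test can look like this:

```python
# Minimal sketch of a pytest-based PySpark unit test.
# `add_greeting` and the column names are hypothetical examples.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_greeting(df):
    # Example transformation under test: append a constant column.
    return df.withColumn("greeting", F.lit("hello"))


@pytest.fixture(scope="session")
def spark():
    # Local SparkSession so the test runs without a cluster,
    # which also makes it easy to run in a CI workflow.
    return SparkSession.builder.master("local[1]").appName("unit-tests").getOrCreate()


def test_add_greeting(spark):
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    result = add_greeting(df)
    assert "greeting" in result.columns
    assert result.filter(F.col("greeting") == "hello").count() == 2
```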
How I Saved Money, Time, and Stress by Optimizing Databricks the Right Way
I recently wrapped up a Databricks cost optimization project where we slashed unnecessary spend, improved workload performance, and freed…
4d ago
Published in DataDarvish
Databricks Cost Optimization: Practical Tips for Performance and Savings
In a recent Databricks cost optimization project I led, I achieved significant results: reducing unnecessary compute spend, improving…
4d ago
Published in DataDarvish
Mastering Budget Policies in Databricks
Learn how to set Databricks budget policies to control costs, monitor usage, and prevent overruns, ensuring efficient and optimized spend.
Feb 16
How to cherry-pick on GitHub
Imagine you have some changes in the dev branch, and you need one or a few of them to go to the main branch, and it can't wait until the…
Jul 5, 2024
Databricks-SDK for Python
I needed to run the Databricks CLI in a workspace via a notebook. But this is what I got:
Apr 27, 2024
How I automated uploading local files to AWS S3 bucket
I have a lot of files on my local machine that need to be uploaded to an AWS S3 bucket. The files are stored in subdirectories within…
Mar 24, 2024
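As a rough illustration of that kind of automation (the bucket name and local directory below are placeholders, not the ones from the post), one way to walk a directory tree and upload each file with boto3 while preserving the subdirectory layout is:

```python
# Sketch: recursively upload a local directory tree to an S3 bucket with boto3.
# The bucket name and local path are placeholder assumptions.
from pathlib import Path

import boto3

s3 = boto3.client("s3")
local_root = Path("data")        # hypothetical local directory
bucket = "my-example-bucket"     # hypothetical bucket name

for path in local_root.rglob("*"):
    if path.is_file():
        # Use the path relative to the root as the object key
        # so the subdirectory structure is preserved in S3.
        key = path.relative_to(local_root).as_posix()
        s3.upload_file(str(path), bucket, key)
```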
Resolving Access Issues in Databricks DBFS with Unity Catalog
Are you experiencing sudden issues accessing Databricks DBFS? Unsure about the difference between path patterns like /dbfs/ and dbfs:/…
Feb 27, 2024
Managing System Schemas in Databricks: Enabling, Access Control, and Dashboard Creation
If you have recently enabled Unity Catalog in your account, you should see a system catalog along with other default catalogs included with…
Feb 21, 2024
Creating a PySpark DataFrame with Timestamp Column for a Given Range of Dates: Two Methods
This article explains two ways to create a PySpark DataFrame with a timestamp column for a given range of dates.
Feb 7, 2023
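The article presents its own two methods; as one illustrative sketch (not necessarily either of them, and with placeholder dates), a timestamp column covering a date range can be generated with Spark SQL's sequence function and exploded into rows:

```python
# Sketch: build a PySpark DataFrame with a timestamp column for a date range.
# The start date, end date, and interval are placeholder values.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

df = spark.sql("""
    SELECT explode(
        sequence(
            to_timestamp('2023-01-01'),
            to_timestamp('2023-01-07'),
            interval 1 day
        )
    ) AS ts
""")
df.show(truncate=False)
```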