Resolving Access Issues in Databricks DBFS with Unity Catalog

Dilorom
Feb 27, 2024

Are you experiencing sudden issues accessing Databricks DBFS? Unsure about the difference between path patterns like /dbfs/ and dbfs:/? Continue reading to discover the reasons behind these issues.

According to Databricks, the Databricks File System (DBFS) is a distributed file system mounted into a Databricks workspace and available on Databricks clusters. DBFS is an abstraction on top of scalable object storage that maps Unix-like filesystem calls to native cloud storage API calls.

Databricks uses two access path patterns in the platform:

  1. URI-style path: It starts with the scheme `dbfs`, followed by a colon and then the rest of the path. Example: `dbfs:/mnt/my_mount_name/my_file_name`.

  2. POSIX-style path: It is a path relative to the root (`/`) of the driver node's filesystem. It starts with a forward slash, followed by `dbfs` and the rest of the path. Example: `/dbfs/mnt/my_mount/my_file`.

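The two patterns differ only in their prefix, so a path in one style can be rewritten in the other. Below is a minimal sketch of that mapping in plain Python. The helper names `to_posix_path` and `to_uri_path` are hypothetical and are not part of any Databricks API.

```python
# Hypothetical helpers (not a Databricks API) for converting between the
# two DBFS path styles described above.
DBFS_URI_PREFIX = "dbfs:/"
DBFS_POSIX_PREFIX = "/dbfs/"


def to_posix_path(path: str) -> str:
    """Convert a URI-style path (dbfs:/...) to POSIX style (/dbfs/...)."""
    if path.startswith(DBFS_POSIX_PREFIX):
        return path  # already POSIX style
    if path.startswith(DBFS_URI_PREFIX):
        return DBFS_POSIX_PREFIX + path[len(DBFS_URI_PREFIX):]
    raise ValueError(f"Not a DBFS path: {path!r}")


def to_uri_path(path: str) -> str:
    """Convert a POSIX-style path (/dbfs/...) to URI style (dbfs:/...)."""
    if path.startswith(DBFS_URI_PREFIX):
        return path  # already URI style
    if path.startswith(DBFS_POSIX_PREFIX):
        return DBFS_URI_PREFIX + path[len(DBFS_POSIX_PREFIX):]
    raise ValueError(f"Not a DBFS path: {path!r}")


print(to_posix_path("dbfs:/mnt/my_mount_name/my_file_name"))
print(to_uri_path("/dbfs/mnt/my_mount/my_file"))
```

As a rule of thumb, the URI style is typically what Spark readers and `dbutils.fs` commands take, while the POSIX style is what local file APIs on the driver, such as Python's built-in `open()`, expect.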

If you’ve confirmed that your file is in DBFS (and you can see it in the UI) but you cannot read the file path or run the `fs ls` command against it, and your workspace is enabled for Unity Catalog, the solution is described below.

Unity Catalog introduces configurations and data governance controls that differ significantly from those of DBFS. Databricks positions Unity Catalog, and in particular its catalogs and volumes, as the modern alternative to DBFS and strongly advises moving away from DBFS.

However, DBFS remains available even after your workspace is enabled and upgraded for Unity Catalog. For new data, prefer Unity Catalog’s catalogs and volumes over DBFS.

To access data in DBFS, you have two options:

  • Obtain the ANY FILE permission, which grants users access to all data in DBFS. This permission should be used sparingly to minimize security risks.
  • Execute your code on a cluster in single user access mode, which provides full access to DBFS. For scheduled code execution, it’s advisable to use a job cluster in single user access mode, preferably run as a service principal, so that access is not tied to an individual user account.

Read more about Unity Catalog and DBFS best practices.

If you go with the option of granting ANY FILE access, run the command below to grant the ANY FILE permission to a principal (a user, service principal, or group). Note that only account-level admins or workspace admins can run this command.

GRANT SELECT ON ANY FILE TO `<user@domain-name>`
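
If you need to issue this grant for several principals from a notebook, the statement can be assembled in Python. The sketch below only builds the SQL string. The function name `build_any_file_grant` and the address `user@example.com` are hypothetical, and it assumes the statement would then be executed with `spark.sql` in a Databricks notebook, where `spark` is predefined.

```python
def build_any_file_grant(principal: str) -> str:
    """Return a GRANT statement giving `principal` SELECT on ANY FILE.

    Hypothetical helper: it only builds the SQL text and runs nothing.
    """
    # The principal is wrapped in backticks in the statement, so reject
    # values that are empty or that would break out of the quoting.
    if not principal or "`" in principal:
        raise ValueError("principal must be non-empty and contain no backticks")
    return f"GRANT SELECT ON ANY FILE TO `{principal}`"


statement = build_any_file_grant("user@example.com")  # placeholder principal
print(statement)

# Assumption: inside a Databricks notebook the statement could be run with
# spark.sql(statement)
```

Keeping the string construction separate from execution makes it easy to review exactly what will be granted, which matters for a permission this broad.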
