How to Convert Tableau Hyper for Use in a Microsoft Fabric Lakehouse


Tableau's Hyper format can be worked with from Python both through the native Hyper API (tableauhyperapi) and through pandas (pantab). With the recently announced public preview of Microsoft Fabric, you can also easily work with data sourced from Tableau Hyper files in a Fabric Lakehouse.

This post explores using the Notebook feature in Microsoft Fabric to take a Hyper file that’s been uploaded to a lakehouse as a file and convert it to parquet using pantab. From there, you can load the parquet as a Delta Lake table for use with other Fabric experiences such as Power BI.

To get started, verify that your Microsoft Fabric workspace is assigned to a Fabric, Power BI Premium, or Trial capacity in the Premium section of Workspace settings.

Select the Library management section and add the pantab library from PyPI. This is the Python package for working with Hyper files through pandas. You do not need to explicitly add the pandas library itself for this example.

After the pantab library finishes loading, create a new Lakehouse in the Fabric workspace or open an existing one.

Optionally create a new subfolder under the Files section of the lakehouse, then upload a Hyper file to any location in Files. For this example, I created two subfolders: Hyper, where I store source files, and Converted, where I store the parquet files I’ve converted.

After the Hyper file uploads, create a new Notebook or open an existing one.

In the notebook, connect to the appropriate Lakehouse and verify you can see your Hyper file. Add the following script in PySpark and modify the variables to match your own Lakehouse paths, file name, table name, etc. In my example, I use the default lakehouse, reference my Hyper source subfolder under Files, reference my Converted subfolder under Files for my destination, load a Hyper file called GeocodingData.hyper, and load the Country table from the Hyper file.

import pantab

# Paths under the default lakehouse mount point
lakehouse_path = '/lakehouse/default/'
lakehouse_source_path = lakehouse_path + 'Files/Hyper/'
lakehouse_source_file = 'GeocodingData.hyper'
lakehouse_converted_path = lakehouse_path + 'Files/Converted/'
table_name = 'Country'

# Read the Country table from the Hyper file into a pandas dataframe
df = pantab.frame_from_hyper(lakehouse_source_path + lakehouse_source_file, table=table_name)
print(df)

# Write the dataframe to parquet in the Converted folder
df.to_parquet(lakehouse_converted_path + table_name + '.parquet')

Initially, comment out the final conversion line and only view the pandas dataframe returned by the pantab.frame_from_hyper function, verifying that you’ve connected to the file without any issues.

After verifying that you’re able to get data from Hyper, uncomment the final line to convert the dataframe to parquet.

Refresh the lakehouse view and verify that the parquet file appears in your destination location.

After creating the parquet, open the lakehouse, select the ellipsis […] next to the parquet file, and select Load to Tables to load the parquet as a Delta table.

From there, you can utilize the Delta table in other Fabric experiences such as Power BI datasets.
