Adding gtViews to the existing Python pandas DataFrame and Series classes

The page Inspecting Python objects with custom inspectors explains the basics of adding custom gtViews to newly defined Python classes. In that case, the classes are Movie and MovieCollection, which model some IMDB data. Refer to that page for setup instructions (including the installation of the pandas module).

As in regular GT moldable development, you might want to add your own gtViews to existing Python classes.

What follows is an example of adding a few views to the pandas DataFrame and Series classes.

The page PythonBridge custom views for pandas DataFrame and Series describes how to load all this code at once.

Let's start by reading some data. Each of the following snippets sets fileName. The first file contains a lot of data, while the second one is much smaller (it is probably best to start with that one).

fileName := (FileLocator gtResource / 'feenkcom' / 'gtoolkit-demos' / 'data' / 'imdb' / 'Movies.csv') fullName
  
fileName := (FileLocator gtResource / 'feenkcom' / 'gtoolkit-demos' / 'data' / 'imdb' / 'Akira Kurosawa Movies.csv') fullName
  
fileName := (FileLocator temp / #data , #csv) fullName
  
import pandas

df = pandas.read_csv(fileName)
  
df
  
df.head(2).T
  
pandas.DataFrame([[1,2],[3,4]])
  
pandas.DataFrame([[1,2],[3,4]], columns=['A','B'])
  

Inspecting df will show a proxy with only the standard views (raw and print).

A data frame knows about its columns, so we start by adding a gtView that lists them. We define a function and then attach it as a method to the existing class with setattr.

from gtoolkit.gt import gtView

@gtView
def dataframe_gt_view_columns(self, builder):
    clist = builder.columnedList()
    clist.title('Columns')
    clist.priority(10)
    clist.items(lambda: list(range(0, len(self.columns))))
    clist.column('Position', lambda each: each)
    clist.column('Column', lambda each: str(self.columns.values[each]))
    clist.column('Type', lambda each: str(self.dtypes.values[each]))
    clist.set_accessor(lambda each: self[self.columns.values[each]])
    return clist

setattr(pandas.DataFrame, 'gtViewColumns', dataframe_gt_view_columns)
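Attaching a plain function to an existing class works because Python looks up methods on the class at call time. Once the function is a class attribute, every instance, including one created earlier such as df, gets it as a bound method with itself as self. The sketch below shows only that mechanism, outside Glamorous Toolkit. Table, FakeBuilder, and gt_view_columns are hypothetical stand-ins, not part of the gtoolkit API, and the @gtView decorator is left out.

```python
class Table:
    """Hypothetical stand-in for an existing class such as pandas.DataFrame."""

    def __init__(self, columns):
        self.columns = columns


class FakeBuilder:
    """Hypothetical builder that only records what a view asks for."""

    def columned_list(self):
        return {"kind": "columnedList"}


def gt_view_columns(self, builder):
    # Once attached to Table, `self` is the Table instance the method is called on.
    view = builder.columned_list()
    view["items"] = list(range(len(self.columns)))
    return view


t = Table(["title", "year"])  # created before the method exists
setattr(Table, "gtViewColumns", gt_view_columns)

# The existing instance now has the new method, bound to itself.
print(t.gtViewColumns(FakeBuilder()))  # {'kind': 'columnedList', 'items': [0, 1]}
print(t.gtViewColumns.__self__ is t)   # True
```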
  

Inspect df again to see the new Columns view. Next, we show the whole table, unless the frame is empty or has more than 100 rows.

from gtoolkit.gt import gtView

@gtView
def dataframe_gt_view_table(self, builder):
	if self.empty:
		return builder.empty()
	if self.shape[0] > 100:
		return builder.empty()
	clist = builder.columnedList()
	clist.title('Table')
	clist.priority(15)
	clist.items(lambda: list(self.index))
	clist.column('#', lambda index: index)
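	for each in self.columns:
		# The outer lambda is called immediately so that each inner lambda
		# captures its own column name (Python closures bind late).
		# str() keeps the title a string for columns with non-string names.
		(lambda col: clist.column(str(col), lambda index: str(self.at[index, col])))(each)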
	clist.set_accessor(lambda each: self.loc[list(self.index)[each]])
	return clist

setattr(pandas.DataFrame, 'gtViewTable', dataframe_gt_view_table)
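The Table view wraps each clist.column call in a lambda that is invoked immediately with the current column name. That extra layer matters because Python closures bind variables late: a lambda created in a loop reads the loop variable when it is called, not when it is created, so without the wrapper every column would display the values of the last column. The sketch below shows the difference with plain functions instead of the builder, using a small hand-written row.

```python
row = {"Title": "Ran", "Year": 1985}
columns = ["Title", "Year"]

# Late binding: every lambda looks up `col` when called, after the loop has ended.
late = [lambda r: r[col] for col in columns]
print([f(row) for f in late])   # [1985, 1985]

# Immediately invoked outer lambda: each inner lambda gets its own `col`.
bound = [(lambda col: (lambda r: r[col]))(col) for col in columns]
print([f(row) for f in bound])  # ['Ran', 1985]
```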
  

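For large tables (more than 100 rows), we show the head and the tail instead of the whole table. Both are sub-tables, which makes them an ideal case for forward views: a forward view computes another object (here self.head() or self.tail()) and displays it with one of that object's views (here gtViewTable). We don't show the head or tail for small tables, since the Table view already shows all the data.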

from gtoolkit.gt import gtView

@gtView
def dataframe_gt_view_head(self, builder):
	if self.empty:
		return builder.empty()
	if self.shape[0] <= 100:
		return builder.empty()
	forward = builder.forward()
	forward.title('Head')
	forward.priority(16)
	forward.object(lambda: self.head())
	forward.view('gtViewTable')
	return forward

setattr(pandas.DataFrame, 'gtViewHead', dataframe_gt_view_head)
  
from gtoolkit.gt import gtView

@gtView
def dataframe_gt_view_tail(self, builder):
	if self.empty:
		return builder.empty()
	if self.shape[0] <= 100:
		return builder.empty()
	forward = builder.forward()
	forward.title('Tail')
	forward.priority(17)
	forward.object(lambda: self.tail())
	forward.view('gtViewTable')
	return forward

setattr(pandas.DataFrame, 'gtViewTail', dataframe_gt_view_tail)
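The Table, Head, and Tail views are mutually exclusive: for a non-empty frame, exactly one side of the 100-row check holds. The forward views can reuse gtViewTable because head() and tail() return five rows by default, well under that threshold. The sketch below restates the conditions in plain pandas. visible_dataframe_views is a hypothetical helper that mirrors the checks in the view functions; it is not part of gtoolkit.

```python
import pandas

MAX_TABLE_ROWS = 100  # the threshold hard-coded in the view functions above


def visible_dataframe_views(df):
    """Hypothetical helper: which of Table/Head/Tail would not be empty."""
    if df.empty:
        return []
    if df.shape[0] > MAX_TABLE_ROWS:
        return ["Head", "Tail"]
    return ["Table"]


small = pandas.DataFrame({"A": range(10)})
large = pandas.DataFrame({"A": range(250)})

print(visible_dataframe_views(small))               # ['Table']
print(visible_dataframe_views(large))               # ['Head', 'Tail']
print(visible_dataframe_views(pandas.DataFrame()))  # []
# head() and tail() are small frames, so the Table view applies to them.
print(visible_dataframe_views(large.head()))        # ['Table']
print(len(large.head()), len(large.tail()))         # 5 5
```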
  

When looking at a large data frame with numerical data, you get a better idea of its contents by asking for some statistics. The Summary view forwards to the result of describe() and displays it with the Table view.

from gtoolkit.gt import gtView

@gtView
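def dataframe_gt_view_summary(self, builder):
	# Like the other views, skip empty frames (describe() raises ValueError on a frame with no columns).
	if self.empty:
		return builder.empty()
	forward = builder.forward()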
	forward.title('Summary')
	forward.priority(20)
	forward.object(lambda: self.describe())
	forward.view('gtViewTable')
	return forward

setattr(pandas.DataFrame, 'gtViewSummary', dataframe_gt_view_summary)
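By default, describe() summarizes only the numeric columns and returns a new DataFrame with eight rows: count, mean, standard deviation, minimum, the three quartiles, and maximum. Since that result is a small DataFrame, forwarding to gtViewTable works. A quick check in plain pandas, with made-up values:

```python
import pandas

df = pandas.DataFrame({
    "name": ["a", "b", "c", "d"],  # non-numeric: skipped by default
    "x": [1, 2, 3, 10],
})

summary = df.describe()
print(list(summary.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(list(summary.columns))     # ['x']
print(summary.loc["mean", "x"])  # 4.0
```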
  

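A series is a single row or column of a data frame; it is what you get when you click a row in the Table view or a column in the Columns view. The following view shows its contents as key/value pairs, as long as the series is non-empty and has at most 100 entries.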

from gtoolkit.gt import gtView

@gtView
def series_gt_view_series(self, builder):
	if self.empty:
		return builder.empty()
	if self.shape[0] > 100:
		return builder.empty()
	clist = builder.columnedList()
	clist.title('Series')
	clist.priority(10)
	clist.items(lambda: list(self.index))
	clist.column('Key', lambda each: each)
	clist.column('Value', lambda each: str(self.at[each]))
	clist.set_accessor(lambda each: self[each])
	return clist

setattr(pandas.Series, 'gtViewSeries', series_gt_view_series)
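Both accessors defined earlier produce a Series. Selecting a column, as the Columns view accessor does with self[...], gives a series keyed by the frame's row index. Selecting a row, as the Table view accessor does with self.loc[...], gives a series keyed by the column names. A small check with made-up data:

```python
import pandas

df = pandas.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["r1", "r2"])

column = df["A"]                  # what the Columns view accessor returns for column A
row = df.loc[list(df.index)[1]]   # what the Table view accessor returns for the second row

print(type(column).__name__, list(column.index))  # Series ['r1', 'r2']
print(type(row).__name__, list(row.index))        # Series ['A', 'B']
print(row.at["B"])                                # 4
```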
  

As with data frames, we add Head and Tail views to series; they only appear when the series has more than 100 entries.

from gtoolkit.gt import gtView

@gtView
def series_gt_view_head(self, builder):
	if self.empty:
		return builder.empty()
	if self.shape[0] <= 100:
		return builder.empty()
	forward = builder.forward()
	forward.title('Head')
	forward.priority(11)
	forward.object(lambda: self.head())
	forward.view('gtViewSeries')
	return forward

setattr(pandas.Series, 'gtViewHead', series_gt_view_head)
  
from gtoolkit.gt import gtView

@gtView
def series_gt_view_tail(self, builder):
	if self.empty:
		return builder.empty()
	if self.shape[0] <= 100:
		return builder.empty()
	forward = builder.forward()
	forward.title('Tail')
	forward.priority(12)
	forward.object(lambda: self.tail())
	forward.view('gtViewSeries')
	return forward

setattr(pandas.Series, 'gtViewTail', series_gt_view_tail)
  

Finally, we add a Summary view for series.

from gtoolkit.gt import gtView

@gtView
def series_gt_view_summary(self, builder):
	forward = builder.forward()
	forward.title('Summary')
	forward.priority(20)
	forward.object(lambda: self.describe())
	forward.view('gtViewSeries')
	return forward

setattr(pandas.Series, 'gtViewSummary', series_gt_view_summary)
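What the Summary view shows depends on the series' dtype. For numeric data, describe() gives count, mean, standard deviation, and quantiles. For text, it gives the count, the number of unique values, the most frequent value (top), and its frequency (freq). Both results are Series, so forwarding to gtViewSeries works either way. With made-up values:

```python
import pandas

numbers = pandas.Series([1, 2, 3, 10])
words = pandas.Series(["drama", "action", "drama"])

print(list(numbers.describe().index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(list(words.describe().index))  # ['count', 'unique', 'top', 'freq']

text_summary = words.describe()
print(text_summary["top"], text_summary["freq"])  # drama 2
```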
  

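If you ran the examples above with the small dataset, switch to the large one to see how the views react: re-evaluate the first fileName assignment and read df again. Once a frame or series has more than 100 rows, the Table and Series views are replaced by the Head and Tail views.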