Adding gtViews to the existing Python pandas DataFrame and Series classes
The page Inspecting Python objects with custom inspectors covers the basics of adding custom Python inspector views to newly defined classes. Refer to that page for setup instructions (including the installation of the pandas module).
Next is an example of adding a couple of views to Python pandas DataFrame and Series objects. The page PythonBridge custom views for pandas DataFrame and Series describes how to load all this code at once.
We start by setting up the Python application with the pandas module installed:
PBApplication isRunning ifFalse: [ PBApplication start ].
PBApplication uniqueInstance installModule: 'pandas'.
Let's start by reading some data.
fileName := (FileLocator gtResource / 'feenkcom' / 'gtoolkit-demos' / 'data' / 'imdb' / 'Akira Kurosawa Movies.csv') fullName
Now try these:
import pandas
df = pandas.read_csv(fileName)
df
df.head(2).T
pandas.DataFrame([[1,2],[3,4]])
pandas.DataFrame([[1,2],[3,4]], columns=['A','B'])
Inspecting df will show a proxy with only the standard views (raw and print).
A data frame knows about its columns, so we start by adding a gtView for that. We define a function and then attach it as a method to the existing class.
from gtoolkit_bridge import gtView

@gtView
def dataframe_gt_view_columns(self, builder):
    clist = builder.columnedList()
    clist.title('Columns')
    clist.priority(10)
    clist.items(lambda: list(range(0, len(self.columns))))
    clist.column('Position', lambda each: each)
    clist.column('Column', lambda each: str(self.columns.values[each]))
    clist.column('Type', lambda each: str(self.dtypes.values[each]))
    clist.set_accessor(lambda each: self[self.columns.values[each]])
    return clist

setattr(pandas.DataFrame, 'gtViewColumns', dataframe_gt_view_columns)
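Attaching the view with setattr relies on an ordinary Python feature: a plain function stored on a class becomes a method of every instance, including instances created earlier. A minimal sketch of that mechanism follows, using a hypothetical Point class and point_summary function that are not part of pandas or gtoolkit_bridge:

```python
class Point:
    """Hypothetical stand-in for an existing class we do not want to edit."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def point_summary(self):
    # 'self' is bound to the instance once the function lives on the class.
    return f"Point({self.x}, {self.y})"


p = Point(1, 2)  # created before the method is attached

# Attach the plain function as a method of the existing class.
setattr(Point, "summary", point_summary)

print(p.summary())  # Point(1, 2)
```

This is why df, which was created before the view was defined, picks up the new view on the next inspection.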
Inspect df again to see the addition. Now we can try showing the whole table (except when it is empty or has more than 100 rows).
from gtoolkit_bridge import gtView

@gtView
def dataframe_gt_view_table(self, builder):
    if self.empty:
        return builder.empty()
    if self.shape[0] > 100:
        return builder.empty()
    clist = builder.columnedList()
    clist.title('Table')
    clist.priority(15)
    clist.items(lambda: list(self.index))
    clist.column('#', lambda index: index)
    for each in self.columns:
        (lambda col: clist.column(col, lambda index: str(self.at[index, col])))(each)
    clist.set_accessor(lambda each: self.loc[list(self.index)[each]])
    return clist

setattr(pandas.DataFrame, 'gtViewTable', dataframe_gt_view_table)
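The loop over the columns wraps each clist.column call in an immediately invoked lambda. A likely reason is Python's late binding of closures: a lambda created in a loop looks up the loop variable when it is called, not when it is defined. The sketch below illustrates the difference with made-up column names and without any pandas or gtoolkit_bridge dependency:

```python
columns = ["title", "year", "rating"]  # illustrative names only

# Naive version: every lambda reads 'col' when called, after the loop has ended.
naive = []
for col in columns:
    naive.append(lambda: col)

# Pattern from the table view: an immediately invoked lambda gives each
# inner lambda its own 'col' parameter.
bound = []
for each in columns:
    (lambda col: bound.append(lambda: col))(each)

naive_names = [f() for f in naive]
bound_names = [f() for f in bound]
print(naive_names)  # ['rating', 'rating', 'rating']
print(bound_names)  # ['title', 'year', 'rating']
```

Without the wrapper, every table column would display the values of the last column.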
Large tables are typically shown by their head and tail. Both are subtables, which makes them an ideal case for forward views. We don't show head or tail for small tables, since the table view already shows all the data.
from gtoolkit_bridge import gtView

@gtView
def dataframe_gt_view_head(self, builder):
    if self.empty:
        return builder.empty()
    if self.shape[0] <= 100:
        return builder.empty()
    forward = builder.forward()
    forward.title('Head')
    forward.priority(16)
    forward.object(lambda: self.head())
    forward.view('gtViewTable')
    return forward

setattr(pandas.DataFrame, 'gtViewHead', dataframe_gt_view_head)
from gtoolkit_bridge import gtView

@gtView
def dataframe_gt_view_tail(self, builder):
    if self.empty:
        return builder.empty()
    if self.shape[0] <= 100:
        return builder.empty()
    forward = builder.forward()
    forward.title('Tail')
    forward.priority(17)
    forward.object(lambda: self.tail())
    forward.view('gtViewTable')
    return forward

setattr(pandas.DataFrame, 'gtViewTail', dataframe_gt_view_tail)
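The guards in the table, head and tail views are complementary, and forwarding to gtViewTable works because head() and tail() return only five rows by default, which is well under the limit. The sketch below checks this with plain pandas. The helper functions shows_full_table and shows_head_and_tail are hypothetical names that merely restate the guards, and the two frames are generated test data:

```python
import pandas

LIMIT = 100  # same threshold as in the views

small = pandas.DataFrame({"n": range(10)})
large = pandas.DataFrame({"n": range(250)})


def shows_full_table(df):
    # Restates the table view guard: non-empty and at most LIMIT rows.
    return not df.empty and df.shape[0] <= LIMIT


def shows_head_and_tail(df):
    # Restates the head and tail view guards: more than LIMIT rows.
    return not df.empty and df.shape[0] > LIMIT


print(shows_full_table(small), shows_head_and_tail(small))  # True False
print(shows_full_table(large), shows_head_and_tail(large))  # False True

# head() and tail() default to 5 rows, so the forwarded table view has content.
print(large.head().shape, large.tail().shape)  # (5, 1) (5, 1)
print(shows_full_table(large.head()))          # True
```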
When looking at a large data frame with numerical data, you get a better idea of what is there by asking for some statistics. The summary view shows the result of describe.
from gtoolkit_bridge import gtView

@gtView
def dataframe_gt_view_summary(self, builder):
    forward = builder.forward()
    forward.title('Summary')
    forward.priority(20)
    forward.object(lambda: self.describe())
    forward.view('gtViewTable')
    return forward

setattr(pandas.DataFrame, 'gtViewSummary', dataframe_gt_view_summary)
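The summary can be forwarded to gtViewTable because DataFrame.describe() itself returns a small DataFrame. The following sketch, using made-up data rather than the movie dataset, shows the shape of that result. By default, a frame with mixed column types is summarized on its numeric columns only:

```python
import pandas

# Illustrative data: one numeric column and one text column.
df = pandas.DataFrame({"a": [1, 2, 3], "label": ["x", "y", "z"]})

summary = df.describe()

print(type(summary).__name__)  # DataFrame
print(list(summary.columns))   # ['a']
print(list(summary.index))     # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(summary.loc["mean", "a"])  # 2.0
```

The result always has few rows, so it passes the size guard of the table view.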
A series is one row or one column; it is what you get when clicking on either a row or a column. The following view shows its contents.
from gtoolkit_bridge import gtView

@gtView
def series_gt_view_series(self, builder):
    if self.empty:
        return builder.empty()
    if self.shape[0] > 100:
        return builder.empty()
    clist = builder.columnedList()
    clist.title('Series')
    clist.priority(10)
    clist.items(lambda: list(self.index))
    clist.column('Key', lambda each: each)
    clist.column('Value', lambda each: str(self.at[each]))
    clist.set_accessor(lambda each: self[each])
    return clist

setattr(pandas.Series, 'gtViewSeries', series_gt_view_series)
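To see why both rows and columns end up in this view, it helps to look at what the accessors of the Columns and Table views return. A small sketch with plain pandas, reusing the two-by-two frame from earlier on this page:

```python
import pandas

df = pandas.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

column = df["A"]  # the kind of object the Columns view accessor returns
row = df.loc[0]   # the kind of object the Table view accessor returns

# A column series is keyed by the row index.
print(type(column).__name__, list(column.index), list(column))  # Series [0, 1] [1, 3]

# A row series is keyed by the column names.
print(type(row).__name__, list(row.index), list(row))  # Series ['A', 'B'] [1, 2]

# Series.at looks up a single value by key, as the 'Value' column does.
print(row.at["B"])  # 2
```

In both cases the Key column of the series view shows the index and the Value column shows the corresponding entry.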
As with data frames, we can add conditional head and tail views to series.
from gtoolkit_bridge import gtView

@gtView
def series_gt_view_head(self, builder):
    if self.empty:
        return builder.empty()
    if self.shape[0] <= 100:
        return builder.empty()
    forward = builder.forward()
    forward.title('Head')
    forward.priority(11)
    forward.object(lambda: self.head())
    forward.view('gtViewSeries')
    return forward

setattr(pandas.Series, 'gtViewHead', series_gt_view_head)
from gtoolkit_bridge import gtView

@gtView
def series_gt_view_tail(self, builder):
    if self.empty:
        return builder.empty()
    if self.shape[0] <= 100:
        return builder.empty()
    forward = builder.forward()
    forward.title('Tail')
    forward.priority(12)
    forward.object(lambda: self.tail())
    forward.view('gtViewSeries')
    return forward

setattr(pandas.Series, 'gtViewTail', series_gt_view_tail)
And finally we add a summary for a series.
from gtoolkit_bridge import gtView

@gtView
def series_gt_view_summary(self, builder):
    forward = builder.forward()
    forward.title('Summary')
    forward.priority(20)
    forward.object(lambda: self.describe())
    forward.view('gtViewSeries')
    return forward

setattr(pandas.Series, 'gtViewSummary', series_gt_view_summary)
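Here the forward targets gtViewSeries rather than gtViewTable, because Series.describe() returns another Series whose keys are the names of the statistics. A brief sketch with made-up values:

```python
import pandas

numbers = pandas.Series([1, 2, 3, 4])   # illustrative numeric data
labels = pandas.Series(["x", "y", "x"])  # illustrative text data

num_summary = numbers.describe()
label_summary = labels.describe()

# Numeric data is summarized with count, mean, spread and quartiles.
print(type(num_summary).__name__, list(num_summary.index))

# Text data gets a different set of statistics, but is still a Series.
print(type(label_summary).__name__, list(label_summary.index))
```

Either way the result is a short series, so it passes the size guard of the series view.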
Assuming you did the above tests with the small dataset, you could switch to the large dataset to see how the views react.