public class DataFrame
extends Object
implements Iterable<T>
java.lang.Object
   ↳ joinery.DataFrame<V>

Class Overview

A data frame implementation in the spirit of Pandas or R data frames.

Below is a simple motivating example. When working in Java, data operations like the following should be easy. The code below reads the S&P 500 daily market data for 2008 from Yahoo! Finance and returns the average monthly close for the top three months of the year.

 > DataFrame.readCsv(ClassLoader.getSystemResourceAsStream("gspc.csv"))
 >     .retain("Date", "Close")
 >     .groupBy(row -> Date.class.cast(row.get(0)).getMonth())
 >     .mean()
 >     .sortBy("Close")
 >     .tail(3)
 >     .apply(value -> Number.class.cast(value).intValue())
 >     .col("Close");
 [1370, 1378, 1403] 

Taking each step in turn:

  1. readCsv(String) reads CSV data from files and URLs
  2. retain(Object) is used to eliminate columns that are not needed
  3. groupBy(KeyFunction) with a key function is used to group the rows by month
  4. mean() calculates the average close for each month
  5. sortBy(Object) orders the rows according to average closing price
  6. tail(int) returns the last three rows (alternatively, sort in descending order and use head)
  7. apply(Function) is used to convert the closing prices to integers (this is purely to ease comparisons when verifying the results)
  8. finally, col(Object) is used to extract the values as a list
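
For comparison, the same steps can be sketched with plain Java streams. This is only an illustration of what each step computes, not joinery's implementation; the class `MonthlyCloseSketch`, its `Quote` type, and the method name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-in for the pipeline above, written without joinery.
final class MonthlyCloseSketch {
    /** One (month, close) observation; month is 0-based like Date.getMonth(). */
    static final class Quote {
        final int month;
        final double close;

        Quote(int month, double close) {
            this.month = month;
            this.close = close;
        }
    }

    /** Group by month, average the close, sort ascending, keep the last n, truncate to int. */
    static List<Integer> topMonthlyAverages(List<Quote> quotes, int n) {
        // groupBy + mean: month -> average close
        Map<Integer, Double> means = quotes.stream()
            .collect(Collectors.groupingBy(
                (Quote q) -> q.month,
                TreeMap::new,
                Collectors.averagingDouble((Quote q) -> q.close)));
        // sortBy: ascending by average close
        List<Double> sorted = new ArrayList<>(means.values());
        Collections.sort(sorted);
        // tail(n): the last n entries of the sorted list
        List<Double> tail = sorted.subList(Math.max(0, sorted.size() - n), sorted.size());
        // apply(intValue): truncate each average to an int
        return tail.stream().map(Double::intValue).collect(Collectors.toList());
    }
}
```

The data frame version expresses the same pipeline as one method chain and keeps the column labels along the way.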

Find more details on the GitHub project page.

Summary

Nested Classes
interface DataFrame.Aggregate<I, O>
A function that converts lists of data frame values to aggregate results.
enum DataFrame.Axis
An enumeration of data frame axes.
interface DataFrame.Function<I, O>
A function that is applied to objects (rows or values) in a data frame.
enum DataFrame.JoinType
An enumeration of join types for joining data frames together.
interface DataFrame.KeyFunction<I>
A function that converts data frame rows to index or group keys.
enum DataFrame.NumberDefault
enum DataFrame.PlotType
An enumeration of plot types for displaying data frames with charts.
interface DataFrame.Predicate<I>
An interface used to filter a data frame.
interface DataFrame.RowFunction<I, O>
enum DataFrame.SortDirection
Public Constructors
DataFrame()
Construct an empty data frame.
DataFrame(String... columns)
Construct an empty data frame with the specified columns.
DataFrame(Collection<?> columns)
Construct an empty data frame with the specified columns.
DataFrame(Collection<?> index, Collection<?> columns)
Construct a data frame containing the specified rows and columns.
DataFrame(List<? extends List<? extends V>> data)
Construct a data frame from the specified list of columns.
DataFrame(Collection<?> index, Collection<?> columns, List<? extends List<? extends V>> data)
Construct a new data frame using the specified data and indices.
Public Methods
DataFrame<V> add(Object column, List<V> values)
Add a new column to the data frame containing the values provided.
DataFrame<V> add(Object column, Function<List<V>, V> function)
Add the results of applying a row-wise function to the data frame as a new column.
DataFrame<V> add(List<V> values)
Add the list of values as a new column.
DataFrame<V> add(Object... columns)
Add new columns to the data frame.
<U> DataFrame<V> aggregate(Aggregate<V, U> function)
Apply an aggregate function to each group or the entire data frame if the data is not grouped.
DataFrame<V> append(Object name, List<? extends V> row)
Append rows indexed by the specified name to the data frame.
DataFrame<V> append(Object name, V[] row)
DataFrame<V> append(List<? extends V> row)
Append rows to the data frame.
<U> DataFrame<U> apply(Function<V, U> function)
Apply a function to each value in the data frame.
<T> DataFrame<T> cast(Class<T> cls)
Cast this data frame to the specified type.
final DataFrame<V> coalesce(DataFrame...<? extends V> others)
Update the data frame in place by overwriting any null values with any non-null values provided by the data frame arguments.
List<V> col(Object column)
Return a data frame column as a list.
List<V> col(Integer column)
Return a data frame column as a list.
DataFrame<V> collapse()
Set<Object> columns()
Return the column names for the data frame.
final static <V> DataFrame<String> compare(DataFrame<V> df1, DataFrame<V> df2)
final DataFrame<V> convert(Class...<? extends V> columnTypes)
Convert columns based on the requested types.
DataFrame<V> convert(DataFrame.NumberDefault numDefault, String naString)
DataFrame<V> convert()
Attempt to infer better types for object columns.
DataFrame<V> count()
DataFrame<Number> cov()
DataFrame<V> cummax()
DataFrame<V> cummin()
DataFrame<V> cumprod()
DataFrame<V> cumsum()
DataFrame<V> describe()
DataFrame<V> diff(int period)
DataFrame<V> diff()
final void draw(Container container, DataFrame.PlotType type)
Draw the numeric columns of this data frame as a chart in the specified Container using the specified type.
final void draw(Container container)
Draw the numeric columns of this data frame as a chart in the specified Container.
DataFrame<V> drop(Integer... cols)
Create a new data frame by leaving out the specified columns.
DataFrame<V> drop(Object... cols)
Create a new data frame by leaving out the specified columns.
DataFrame<V> dropna()
DataFrame<V> dropna(DataFrame.Axis direction)
Map<Object, DataFrame<V>> explode()
Return a map of group names to data frames for a grouped data frame.
DataFrame<V> fillna(V fill)
Return a view of the data frame with NA values replaced with fill.
List<V> flatten()
Return the values of the data frame as a flat list.
V get(Integer row, Integer col)
Return the value located by the (row, column) coordinates.
V get(Object row, Object col)
Return the value located by the (row, column) names.
DataFrame<V> groupBy(Object... cols)
Group the data frame rows by the specified column names.
DataFrame<V> groupBy(Integer... cols)
Group the data frame rows by the specified columns.
DataFrame<V> groupBy(KeyFunction<V> function)
Group the data frame rows using the specified key function.
Grouping groups()
DataFrame<V> head(int limit)
Return a data frame containing the first limit rows of this data frame.
DataFrame<V> head()
Return a data frame containing the first ten rows of this data frame.
Set<Object> index()
Return the index names for the data frame.
boolean isEmpty()
Return true if the data frame contains no data.
DataFrame<Boolean> isnull()
Create a new data frame containing boolean values such that null object references in the original data frame yield true and valid references yield false.
ListIterator<List<V>> iterator()
Return an iterator over the rows of the data frame.
ListIterator<List<V>> itercols()
ListIterator<Map<Object, V>> itermap()
ListIterator<List<V>> iterrows()
ListIterator<V> itervalues()
final DataFrame<V> join(DataFrame<V> other)
Return a new data frame created by performing a left outer join of this data frame with the argument and using the row indices as the join key.
final DataFrame<V> join(DataFrame<V> other, KeyFunction<V> on)
Return a new data frame created by performing a left outer join of this data frame with the argument using the specified key function.
final DataFrame<V> join(DataFrame<V> other, DataFrame.JoinType join)
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and using the row indices as the join key.
final DataFrame<V> join(DataFrame<V> other, DataFrame.JoinType join, KeyFunction<V> on)
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the specified key function.
final DataFrame<V> joinOn(DataFrame<V> other, DataFrame.JoinType join, Integer... cols)
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key.
final DataFrame<V> joinOn(DataFrame<V> other, Integer... cols)
Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key.
final DataFrame<V> joinOn(DataFrame<V> other, Object... cols)
Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key.
final DataFrame<V> joinOn(DataFrame<V> other, DataFrame.JoinType join, Object... cols)
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key.
DataFrame<V> kurt()
int length()
Return the length (number of rows) of the data frame.
final static void main(String[] args)
Entry point to joinery as a command line tool.
Map<V, List<V>> map(Object key, Object value)
Map<Object, List<V>> map()
Return a map of index names to rows.
Map<V, List<V>> map(Integer key, Integer value)
DataFrame<V> max()
DataFrame<V> mean()
Compute the mean of the numeric columns for each group or the entire data frame if the data is not grouped.
DataFrame<V> median()
final DataFrame<V> merge(DataFrame<V> other, DataFrame.JoinType join)
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the common, non-numeric columns from each data frame as the join key.
final DataFrame<V> merge(DataFrame<V> other)
Return a new data frame created by performing a left outer join of this data frame with the argument using the common, non-numeric columns from each data frame as the join key.
DataFrame<V> min()
DataFrame<V> nonnumeric()
Return a data frame containing only columns with non-numeric data.
DataFrame<Boolean> notnull()
Create a new data frame containing boolean values such that valid object references in the original data frame yield true and null references yield false.
DataFrame<Number> numeric()
Return a data frame containing only columns with numeric data.
DataFrame<V> percentChange()
DataFrame<V> percentChange(int period)
DataFrame<V> percentile(double quantile)
Compute the percentile of the numeric columns for each group or the entire data frame if the data is not grouped.
DataFrame<V> pivot(Object row, Object col, Object... values)
DataFrame<V> pivot(Integer row, Integer col, Integer... values)
DataFrame<V> pivot(Integer[] rows, Integer[] cols, Integer[] values)
<U> DataFrame<U> pivot(KeyFunction<V> rows, KeyFunction<V> cols, Map<Integer, Aggregate<V, U>> values)
DataFrame<V> pivot(List<Object> rows, List<Object> cols, List<Object> values)
final void plot(DataFrame.PlotType type)
Display the numeric columns of this data frame as a chart in a new swing frame using the specified type.
final void plot()
Display the numeric columns of this data frame as a line chart in a new swing frame.
DataFrame<V> prod()
Compute the product of the numeric columns for each group or the entire data frame if the data is not grouped.
final static DataFrame<Object> readCsv(String file)
Read the specified csv file and return the data as a data frame.
final static DataFrame<Object> readCsv(InputStream input, String separator, DataFrame.NumberDefault longDefault)
final static DataFrame<Object> readCsv(InputStream input, String separator)
final static DataFrame<Object> readCsv(String file, String separator, DataFrame.NumberDefault longDefault, String naString)
final static DataFrame<Object> readCsv(InputStream input)
Read csv records from an input stream and return the data as a data frame.
final static DataFrame<Object> readCsv(String file, String separator, DataFrame.NumberDefault longDefault)
final static DataFrame<Object> readCsv(String file, String separator, DataFrame.NumberDefault numberDefault, String naString, boolean hasHeader)
final static DataFrame<Object> readCsv(InputStream input, String separator, String naString, boolean hasHeader)
final static DataFrame<Object> readCsv(String file, String separator)
final static DataFrame<Object> readCsv(String file, String separator, String naString, boolean hasHeader)
final static DataFrame<Object> readCsv(InputStream input, String separator, String naString)
final static DataFrame<Object> readSql(ResultSet rs)
Read data from the provided query results into a new data frame.
final static DataFrame<Object> readSql(Connection c, String sql)
Execute the SQL query and return the results as a new data frame.
final static DataFrame<Object> readXls(String file)
Read data from the specified excel workbook into a new data frame.
final static DataFrame<Object> readXls(InputStream input)
Read data from the input stream as an excel workbook into a new data frame.
DataFrame<V> reindex(Integer col, boolean drop)
Re-index the rows of the data frame using the specified column index, optionally dropping the column from the data.
DataFrame<V> reindex(Integer... cols)
Re-index the rows of the data frame using the specified column indices and dropping the columns from the data.
DataFrame<V> reindex(Integer[] cols, boolean drop)
Re-index the rows of the data frame using the specified column indices, optionally dropping the columns from the data.
DataFrame<V> reindex(Object... cols)
Re-index the rows of the data frame using the specified column names and removing the columns from the data.
DataFrame<V> reindex(Object[] cols, boolean drop)
Re-index the rows of the data frame using the specified column names, optionally dropping the columns from the data.
DataFrame<V> reindex(Object col, boolean drop)
Re-index the rows of the data frame using the specified column name, optionally dropping the column from the data.
DataFrame<V> rename(Map<Object, Object> names)
DataFrame<V> rename(Object old, Object name)
DataFrame<V> resetIndex()
Return a new data frame with the default index; row names will be reset to the string value of their integer index.
DataFrame<V> reshape(Integer rows, Integer cols)
Reshape a data frame to the specified dimensions.
DataFrame<V> reshape(Collection<?> rows, Collection<?> cols)
Reshape a data frame to the specified indices.
DataFrame<V> retain(Integer... cols)
Create a new data frame containing only the specified columns.
DataFrame<V> retain(Object... cols)
Create a new data frame containing only the specified columns.
DataFrame<V> rollapply(Function<List<V>, V> function)
DataFrame<V> rollapply(Function<List<V>, V> function, int period)
List<V> row(Object row)
Return a data frame row as a list.
List<V> row(Integer row)
Return a data frame row as a list.
DataFrame<V> select(Predicate<V> predicate)
Select a subset of the data frame using a predicate function.
void set(Object row, Object col, V value)
Set the value located by the names (row, column).
void set(Integer row, Integer col, V value)
Set the value located by the coordinates (row, column).
final void show()
int size()
Return the size (number of columns) of the data frame.
DataFrame<V> skew()
DataFrame<V> slice(Integer rowStart, Integer rowEnd, Integer colStart, Integer colEnd)
DataFrame<V> slice(Integer rowStart, Integer rowEnd)
DataFrame<V> slice(Object rowStart, Object rowEnd)
DataFrame<V> slice(Object rowStart, Object rowEnd, Object colStart, Object colEnd)
DataFrame<V> sortBy(Comparator<List<V>> comparator)
DataFrame<V> sortBy(Integer... cols)
DataFrame<V> sortBy(Object... cols)
DataFrame<V> stddev()
Compute the standard deviation of the numeric columns for each group or the entire data frame if the data is not grouped.
DataFrame<V> sum()
Compute the sum of the numeric columns for each group or the entire data frame if the data is not grouped.
DataFrame<V> tail()
Return a data frame containing the last ten rows of this data frame.
DataFrame<V> tail(int limit)
Return a data frame containing the last limit rows of this data frame.
Object[] toArray()
Copy the values contained in the data frame into a flat array of length size() * length().
<U> U[][] toArray(U[][] array)
<U> U toArray(Class<U> cls)
Copy the values contained in the data frame into an array of the specified type.
<U> U[] toArray(U[] array)
Copy the values contained in the data frame into the specified array.
double[][] toModelMatrix(double fillValue)
Encodes the DataFrame as a model matrix, converting nominal values to dummy variables but does not add an intercept column.
DataFrame<Number> toModelMatrixDataFrame()
Encodes the DataFrame as a model matrix, converting nominal values to dummy variables but does not add an intercept column.
String toString()
final String toString(int limit)
<U> DataFrame<U> transform(RowFunction<V, U> transform)
DataFrame<V> transpose()
Transpose the rows and columns of the data frame.
List<Class<?>> types()
Return the types for each of the data frame columns.
DataFrame<V> unique(Integer... cols)
DataFrame<V> unique()
DataFrame<V> unique(Object... cols)
final DataFrame<V> update(DataFrame...<? extends V> others)
Update the data frame in place by overwriting any values with the non-null values provided by the data frame arguments.
DataFrame<V> var()
final void writeCsv(String file)
Write the data from this data frame to the specified file as comma separated values.
final void writeCsv(OutputStream output)
Write the data from this data frame to the provided output stream as comma separated values.
final void writeSql(PreparedStatement stmt)
Write the data from the data frame to a database by executing the provided prepared SQL statement.
final void writeSql(Connection c, String sql)
Write the data from the data frame to a database by executing the specified SQL statement.
final void writeXls(OutputStream output)
Write the data from the data frame to the provided output stream as an excel workbook.
final void writeXls(String file)
Write the data from the data frame to the specified file as an excel workbook.
Inherited Methods
From class java.lang.Object
From interface java.lang.Iterable

Public Constructors

public DataFrame ()

Construct an empty data frame.

 > DataFrame<Object> df = new DataFrame<>();
 > df.isEmpty();
 true 

public DataFrame (String... columns)

Construct an empty data frame with the specified columns.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.columns();
 [name, value] 

Parameters
columns the data frame column names.

public DataFrame (Collection<?> columns)

Construct an empty data frame with the specified columns.

 > List<String> columns = new ArrayList<>();
 > columns.add("name");
 > columns.add("value");
 > DataFrame<Object> df = new DataFrame<>(columns);
 > df.columns();
 [name, value] 

Parameters
columns the data frame column names.

public DataFrame (Collection<?> index, Collection<?> columns)

Construct a data frame containing the specified rows and columns.

 > List<String> rows = Arrays.asList("row1", "row2", "row3");
 > List<String> columns = Arrays.asList("col1", "col2");
 > DataFrame<Object> df = new DataFrame<>(rows, columns);
 > df.get("row1", "col1");
 null 

Parameters
index the row names
columns the column names

public DataFrame (List<? extends List<? extends V>> data)

Construct a data frame from the specified list of columns.

 > List<List<Object>> data = Arrays.asList(
 >       Arrays.<Object>asList("alpha", "bravo", "charlie"),
 >       Arrays.<Object>asList(1, 2, 3)
 > );
 > DataFrame<Object> df = new DataFrame<>(data);
 > df.row(0);
 [alpha, 1] 

Parameters
data a list of columns containing the data elements.

public DataFrame (Collection<?> index, Collection<?> columns, List<? extends List<? extends V>> data)

Construct a new data frame using the specified data and indices.

Parameters
index the row names
columns the column names
data the data

Public Methods

public DataFrame<V> add (Object column, List<V> values)

Add a new column to the data frame containing the values provided. Any existing rows with indices greater than the size of the specified column data will have null values for the new column.

 > DataFrame<Object> df = new DataFrame<>();
 > df.add("value", Arrays.<Object>asList(1));
 > df.columns();
 [value] 

Parameters
column the new column name
values the new column values
Returns
  • the data frame with the column added

public DataFrame<V> add (Object column, Function<List<V>, V> function)

Add the results of applying a row-wise function to the data frame as a new column.

Parameters
column the new column name
function the function to compute the new column values
Returns
  • the data frame with the column added
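
As a rough illustration of what "row-wise" means here, the sketch below computes one new value per row from that row's existing values. It is plain Java, not joinery code: `RowWiseColumnSketch` is a hypothetical name, and `java.util.function.Function` stands in for `DataFrame.Function`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: derive a new column by applying a function to each row.
final class RowWiseColumnSketch {
    /** Return one value per row, computed by applying fn to that row's values. */
    static <V> List<V> newColumn(List<List<V>> rows, Function<List<V>, V> fn) {
        List<V> column = new ArrayList<>(rows.size());
        for (List<V> row : rows) {
            column.add(fn.apply(row)); // the function sees the whole row
        }
        return column;
    }
}
```

For example, passing a function that sums the first two values of each row yields a "total" column with one entry per row.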

public DataFrame<V> add (List<V> values)

Add the list of values as a new column.

Parameters
values the new column values
Returns
  • the data frame with the column added

public DataFrame<V> add (Object... columns)

Add new columns to the data frame. Any existing rows will have null values for the new columns.

 > DataFrame<Object> df = new DataFrame<>();
 > df.add("value");
 > df.columns();
 [value] 

Parameters
columns the new column names
Returns
  • the data frame with the columns added

public DataFrame<V> aggregate (Aggregate<V, U> function)

Apply an aggregate function to each group or the entire data frame if the data is not grouped.

Parameters
function the aggregate function
Returns
  • the new data frame

public DataFrame<V> append (Object name, List<? extends V> row)

Append rows indexed by the specified name to the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append("row1", Arrays.asList("alpha", 1));
 > df.append("row2", Arrays.asList("bravo", 2));
 > df.index();
 [row1, row2] 

Parameters
name the row name to add to the index
row the row to append
Returns
  • the data frame with the new data appended

public DataFrame<V> append (Object name, V[] row)

public DataFrame<V> append (List<? extends V> row)

Append rows to the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append(Arrays.asList("alpha", 1));
 > df.append(Arrays.asList("bravo", 2));
 > df.length();
 2 

Parameters
row the row to append
Returns
  • the data frame with the new data appended

public DataFrame<U> apply (Function<V, U> function)

Apply a function to each value in the data frame.

 > DataFrame<Number> df = new DataFrame<>(
 >         Arrays.<List<Number>>asList(
 >                 Arrays.<Number>asList(1, 2),
 >                 Arrays.<Number>asList(3, 4)
 >             )
 >     );
 > df = df.apply(new Function<Number, Number>() {
 >         public Number apply(Number value) {
 >             return value.intValue() * value.intValue();
 >         }
 >     });
 > df.flatten();
 [1, 4, 9, 16] 

Parameters
function the function to apply
Returns
  • a new data frame with the function results

public DataFrame<T> cast (Class<T> cls)

Cast this data frame to the specified type.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append(Arrays.asList("one", "1"));
 > DataFrame<String> dfs = df.cast(String.class);
 > dfs.get(0, 0).getClass().getName();
 java.lang.String 

Returns
  • the data frame cast to the specified type

public final DataFrame<V> coalesce (DataFrame...<? extends V> others)

Update the data frame in place by overwriting any null values with any non-null values provided by the data frame arguments.

Parameters
others the other data frames
Returns
  • this data frame with the overwritten values
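
The null-filling behavior can be sketched in plain Java on lists of columns. This is an assumption-laden illustration rather than joinery's code: `CoalesceSketch` is a hypothetical name, cells are matched by position, and when several arguments hold a non-null value the sketch takes the first one (the description above does not say which one wins).

```java
import java.util.List;

// Hypothetical sketch: fill null cells in place from other grids of columns.
final class CoalesceSketch {
    /** Overwrite null cells of target with the first non-null value found at the same position. */
    @SafeVarargs
    static <V> void coalesce(List<List<V>> target, List<List<V>>... others) {
        for (int c = 0; c < target.size(); c++) {
            List<V> column = target.get(c);
            for (int r = 0; r < column.size(); r++) {
                if (column.get(r) != null) {
                    continue; // existing non-null values are never overwritten
                }
                for (List<List<V>> other : others) {
                    boolean inRange = c < other.size() && r < other.get(c).size();
                    if (inRange && other.get(c).get(r) != null) {
                        column.set(r, other.get(c).get(r));
                        break; // assumption: the first non-null candidate wins
                    }
                }
            }
        }
    }
}
```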

public List<V> col (Object column)

Return a data frame column as a list.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Arrays.asList("name", "value"),
 >         Arrays.asList(
 >             Arrays.<Object>asList("alpha", "bravo", "charlie"),
 >             Arrays.<Object>asList(1, 2, 3)
 >         )
 >     );
 > df.col("value");
 [1, 2, 3] 

Parameters
column the column name
Returns
  • the list of values

public List<V> col (Integer column)

Return a data frame column as a list.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Arrays.asList("name", "value"),
 >         Arrays.asList(
 >             Arrays.<Object>asList("alpha", "bravo", "charlie"),
 >             Arrays.<Object>asList(1, 2, 3)
 >         )
 >     );
 > df.col(1);
 [1, 2, 3] 

Parameters
column the column index
Returns
  • the list of values

public DataFrame<V> collapse ()

public Set<Object> columns ()

Return the column names for the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.columns();
 [name, value] 

Returns
  • the column names

public static final DataFrame<String> compare (DataFrame<V> df1, DataFrame<V> df2)

public final DataFrame<V> convert (Class...<? extends V> columnTypes)

Convert columns based on the requested types.

Note, the conversion process replaces existing values with values of the converted type.

 > DataFrame<Object> df = new DataFrame<>("a", "b", "c");
 > df.append(Arrays.asList("one", 1, 1.0));
 > df.append(Arrays.asList("two", 2, 2.0));
 > df.convert(
 >     null,         // leave column "a" as is
 >     Long.class,   // convert column "b" to Long
 >     Number.class  // convert column "c" to Double
 > );
 > df.types();
 [class java.lang.String, class java.lang.Long, class java.lang.Double] 

Returns
  • the data frame with the converted values

public DataFrame<V> convert (DataFrame.NumberDefault numDefault, String naString)

public DataFrame<V> convert ()

Attempt to infer better types for object columns.

The following conversions are performed where applicable:

  • Floating point numbers are converted to Double values
  • Whole numbers are converted to Long values
  • True, false, yes, and no are converted to Boolean values
  • Date strings in the following formats are converted to Date values:
    2000-01-01T00:00:00+1, 2000-01-01T00:00:00EST, 2000-01-01
  • Time strings in the following formats are converted to Date values:
    2000/01/01, 1/01/2000, 12:01:01 AM, 23:01:01, 12:01 AM, 23:01

Note, the conversion process replaces existing values with values of the converted type.

 > DataFrame<Object> df = new DataFrame<>("name", "value", "date");
 > df.append(Arrays.asList("one", "1", new Date()));
 > df.convert();
 > df.types();
 [class java.lang.String, class java.lang.Long, class java.util.Date] 

Returns
  • the data frame with the converted values
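
A simplified sketch of this kind of inference for a single string value is shown below. It is not joinery's implementation: `InferTypeSketch` is a hypothetical name, date and time parsing is left out, and the order in which the candidate types are tried is an assumption. Note also that Java's own parsers accept some inputs (such as "NaN") that a real converter might treat differently.

```java
import java.util.Locale;

// Hypothetical sketch: infer a better type for one string value.
final class InferTypeSketch {
    /** Try Long, then Double, then Boolean; otherwise return the original text. */
    static Object infer(String text) {
        if (text == null) {
            return null;
        }
        String s = text.trim();
        try {
            return Long.valueOf(s); // whole numbers
        } catch (NumberFormatException ignored) {
            // not a whole number, try the next candidate
        }
        try {
            return Double.valueOf(s); // floating point numbers
        } catch (NumberFormatException ignored) {
            // not a number at all
        }
        switch (s.toLowerCase(Locale.ROOT)) {
            case "true":
            case "yes":
                return Boolean.TRUE;
            case "false":
            case "no":
                return Boolean.FALSE;
            default:
                return text; // leave unrecognized values unchanged
        }
    }
}
```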

public DataFrame<V> count ()

public DataFrame<Number> cov ()

public DataFrame<V> cummax ()

public DataFrame<V> cummin ()

public DataFrame<V> cumprod ()

public DataFrame<V> cumsum ()

public DataFrame<V> describe ()

public DataFrame<V> diff (int period)

public DataFrame<V> diff ()

public final void draw (Container container, DataFrame.PlotType type)

Draw the numeric columns of this data frame as a chart in the specified Container using the specified type.

Parameters
container the container to use for the chart
type the type of plot to draw

public final void draw (Container container)

Draw the numeric columns of this data frame as a chart in the specified Container.

Parameters
container the container to use for the chart

public DataFrame<V> drop (Integer... cols)

Create a new data frame by leaving out the specified columns.

 > DataFrame<Object> df = new DataFrame<>("name", "value", "category");
 > df.drop(2).columns();
 [name, value] 

Parameters
cols the indices of the columns to be removed
Returns
  • a shallow copy of the data frame with the columns removed

public DataFrame<V> drop (Object... cols)

Create a new data frame by leaving out the specified columns.

 > DataFrame<Object> df = new DataFrame<>("name", "value", "category");
 > df.drop("category").columns();
 [name, value] 

Parameters
cols the names of columns to be removed
Returns
  • a shallow copy of the data frame with the columns removed

public DataFrame<V> dropna ()

public DataFrame<V> dropna (DataFrame.Axis direction)

public Map<Object, DataFrame<V>> explode ()

Return a map of group names to data frames for a grouped data frame. This method only has an effect if groupBy has been called first.

Returns
  • a map of group names to data frames

public DataFrame<V> fillna (V fill)

Return a view of the data frame with NA values replaced with fill.

Parameters
fill the value used to replace missing values
Returns
  • the new data frame
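
The replacement rule itself is simple, as this plain-Java sketch on lists of columns suggests. `FillNaSketch` is a hypothetical name, and unlike the view described above, the sketch builds a copy and treats only null as missing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: replace nulls with a fill value, leaving the input untouched.
final class FillNaSketch {
    /** Return a copy of the columns in which every null is replaced with fill. */
    static <V> List<List<V>> fillna(List<List<V>> columns, V fill) {
        List<List<V>> result = new ArrayList<>(columns.size());
        for (List<V> column : columns) {
            List<V> filled = new ArrayList<>(column.size());
            for (V value : column) {
                filled.add(value == null ? fill : value);
            }
            result.add(filled);
        }
        return result;
    }
}
```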

public List<V> flatten ()

Return the values of the data frame as a flat list.

 > DataFrame<String> df = new DataFrame<>(
 >         Arrays.asList(
 >                 Arrays.asList("one", "two"),
 >                 Arrays.asList("alpha", "bravo")
 >             )
 >     );
 > df.flatten();
 [one, two, alpha, bravo] 

Returns
  • the list of values

public V get (Integer row, Integer col)

Return the value located by the (row, column) coordinates.

 > DataFrame<Object> df = new DataFrame<Object>(
 >     Collections.emptyList(),
 >     Arrays.asList("name", "value"),
 >     Arrays.asList(
 >         Arrays.asList("alpha", "bravo", "charlie"),
 >         Arrays.asList(10, 20, 30)
 >     )
 > );
 > df.get(1, 0);
 bravo 

Parameters
row the row index
col the column index
Returns
  • the value

public V get (Object row, Object col)

Return the value located by the (row, column) names.

 > DataFrame<Object> df = new DataFrame<Object>(
 >     Arrays.asList("row1", "row2", "row3"),
 >     Arrays.asList("name", "value"),
 >     Arrays.asList(
 >         Arrays.asList("alpha", "bravo", "charlie"),
 >         Arrays.asList(10, 20, 30)
 >     )
 > );
 > df.get("row2", "name");
 bravo 

Parameters
row the row name
col the column name
Returns
  • the value

public DataFrame<V> groupBy (Object... cols)

Group the data frame rows by the specified column names.

Parameters
cols the column names
Returns
  • the grouped data frame

public DataFrame<V> groupBy (Integer... cols)

Group the data frame rows by the specified columns.

Parameters
cols the column indices
Returns
  • the grouped data frame

public DataFrame<V> groupBy (KeyFunction<V> function)

Group the data frame rows using the specified key function.

Parameters
function the function to reduce rows to grouping keys
Returns
  • the grouped data frame
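
To illustrate how a key function reduces rows to grouping keys, here is a minimal plain-Java sketch. It is not joinery's implementation: `GroupBySketch` is a hypothetical name, `java.util.function.Function` stands in for `KeyFunction`, and keeping groups in first-seen order is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: bucket rows by the key computed from each row.
final class GroupBySketch {
    /** Map each distinct key to the rows that produced it, in first-seen order. */
    static <V> Map<Object, List<List<V>>> groupBy(
            List<List<V>> rows, Function<List<V>, Object> keyFn) {
        Map<Object, List<List<V>>> groups = new LinkedHashMap<>();
        for (List<V> row : rows) {
            Object key = keyFn.apply(row);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }
}
```

A key function such as `row -> row.get(0)` groups on the first column, while a function that derives a value (for example, the month of a date) groups on something that is not stored in any column.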

public Grouping groups ()

public DataFrame<V> head (int limit)

Return a data frame containing the first limit rows of this data frame.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > for (int i = 0; i < 20; i++)
 >     df.append(Arrays.asList(i));
 > df.head(3)
 >   .col("value");
 [0, 1, 2] 

Parameters
limit the number of rows to include in the result
Returns
  • the new data frame

public DataFrame<V> head ()

Return a data frame containing the first ten rows of this data frame.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > for (int i = 0; i < 20; i++)
 >     df.append(Arrays.asList(i));
 > df.head()
 >   .col("value");
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 

Returns
  • the new data frame

public Set<Object> index ()

Return the index names for the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append("row1", Arrays.asList("one", 1));
 > df.index();
 [row1] 

Returns
  • the index names

public boolean isEmpty ()

Return true if the data frame contains no data.

 > DataFrame<Object> df = new DataFrame<>();
 > df.isEmpty();
 true 

Returns
  • true if the data frame contains no data

public DataFrame<Boolean> isnull ()

Create a new data frame containing boolean values such that null object references in the original data frame yield true and valid references yield false.

 > DataFrame<Object> df = new DataFrame<Object>(
 >     Arrays.asList(
 >         Arrays.asList("alpha", "bravo", null),
 >         Arrays.asList(null, 2, 3)
 >     )
 > );
 > df.isnull().row(0);
 [false, true] 

Returns
  • the new boolean data frame

public ListIterator<List<V>> iterator ()

Return an iterator over the rows of the data frame. Also used implicitly with foreach loops.

 > DataFrame<Integer> df = new DataFrame<>(
 >         Arrays.asList(
 >             Arrays.asList(1, 2),
 >             Arrays.asList(3, 4)
 >         )
 >     );
 > List<Integer> results = new ArrayList<>();
 > for (List<Integer> row : df)
 >     results.add(row.get(0));
 > results;
 [1, 2] 

Returns
  • an iterator over the rows of the data frame.

public ListIterator<List<V>> itercols ()

public ListIterator<Map<Object, V>> itermap ()

public ListIterator<List<V>> iterrows ()

public ListIterator<V> itervalues ()

public final DataFrame<V> join (DataFrame<V> other)

Return a new data frame created by performing a left outer join of this data frame with the argument and using the row indices as the join key.

 > DataFrame<Object> left = new DataFrame<>("a", "b");
 > left.append("one", Arrays.asList(1, 2));
 > left.append("two", Arrays.asList(3, 4));
 > left.append("three", Arrays.asList(5, 6));
 > DataFrame<Object> right = new DataFrame<>("c", "d");
 > right.append("one", Arrays.asList(10, 20));
 > right.append("two", Arrays.asList(30, 40));
 > right.append("four", Arrays.asList(50, 60));
 > left.join(right)
 >     .index();
 [one, two, three] 

Parameters
other the other data frame
Returns
  • the result of the join operation as a new data frame

public final DataFrame<V> join (DataFrame<V> other, KeyFunction<V> on)

Return a new data frame created by performing a left outer join of this data frame with the argument using the specified key function.

Parameters
other the other data frame
on the function to generate the join keys
Returns
  • the result of the join operation as a new data frame
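
To illustrate what a left outer join on computed keys means, here is a minimal plain-Java sketch. It does not use joinery and is not the library's implementation: LeftJoinSketch, leftJoin and the rightWidth parameter are hypothetical names, and the layout of the joined rows is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of joinery.
final class LeftJoinSketch {
    // Left outer join of two row lists; `on` computes the join key of a row.
    // `rightWidth` is the number of columns in a right-hand row (used for null padding).
    static List<List<Object>> leftJoin(List<List<Object>> left, List<List<Object>> right,
            int rightWidth, Function<List<Object>, Object> on) {
        // Index the right-hand rows by key (the last row wins on duplicate keys).
        Map<Object, List<Object>> byKey = new HashMap<>();
        for (List<Object> row : right) {
            byKey.put(on.apply(row), row);
        }
        List<List<Object>> result = new ArrayList<>();
        for (List<Object> row : left) {
            List<Object> joined = new ArrayList<>(row);
            List<Object> match = byKey.get(on.apply(row));
            if (match != null) {
                joined.addAll(match);
            } else {
                // Every left row is kept; a missing match is padded with nulls.
                joined.addAll(Collections.nCopies(rightWidth, null));
            }
            result.add(joined);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Object>> left = Arrays.asList(
                Arrays.<Object>asList("one", 1),
                Arrays.<Object>asList("three", 5));
        List<List<Object>> right = Arrays.asList(
                Arrays.<Object>asList("one", 10));
        // Use the first value of each row as the join key.
        System.out.println(leftJoin(left, right, 2, row -> row.get(0)));
        // prints [[one, 1, one, 10], [three, 5, null, null]]
    }
}
```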

public final DataFrame<V> join (DataFrame<V> other, DataFrame.JoinType join)

Return a new data frame created by performing a join of this data frame with the argument using the specified join type and using the row indices as the join key.

Parameters
other the other data frame
join the join type
Returns
  • the result of the join operation as a new data frame

public final DataFrame<V> join (DataFrame<V> other, DataFrame.JoinType join, KeyFunction<V> on)

Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the specified key function.

Parameters
other the other data frame
join the join type
on the function to generate the join keys
Returns
  • the result of the join operation as a new data frame
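
The effect of the join type on which row keys survive can be sketched in plain Java. The Kind enum below is a hypothetical stand-in with the usual inner, outer, left and right semantics; how it maps to the constants of DataFrame.JoinType, and the order in which joinery returns keys, are assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of joinery.
final class JoinKeysSketch {
    enum Kind { INNER, OUTER, LEFT, RIGHT }

    // Return the row keys that appear in the result of each kind of join.
    static Set<Object> keys(List<?> left, List<?> right, Kind kind) {
        Set<Object> result = new LinkedHashSet<>();
        switch (kind) {
            case INNER:
                // Keys present on both sides.
                result.addAll(left);
                result.retainAll(right);
                break;
            case OUTER:
                // Keys present on either side.
                result.addAll(left);
                result.addAll(right);
                break;
            case LEFT:
                result.addAll(left);
                break;
            case RIGHT:
                result.addAll(right);
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("one", "two", "three");
        List<String> right = Arrays.asList("one", "two", "four");
        System.out.println(keys(left, right, Kind.INNER)); // prints [one, two]
        System.out.println(keys(left, right, Kind.OUTER)); // prints [one, two, three, four]
        System.out.println(keys(left, right, Kind.LEFT));  // prints [one, two, three]
        System.out.println(keys(left, right, Kind.RIGHT)); // prints [one, two, four]
    }
}
```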

public final DataFrame<V> joinOn (DataFrame<V> other, DataFrame.JoinType join, Integer... cols)

Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key.

Parameters
other the other data frame
join the join type
cols the indices of the columns to use as the join key
Returns
  • the result of the join operation as a new data frame

public final DataFrame<V> joinOn (DataFrame<V> other, Integer... cols)

Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key.

Parameters
other the other data frame
cols the indices of the columns to use as the join key
Returns
  • the result of the join operation as a new data frame

public final DataFrame<V> joinOn (DataFrame<V> other, Object... cols)

Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key.

Parameters
other the other data frame
cols the names of the columns to use as the join key
Returns
  • the result of the join operation as a new data frame

public final DataFrame<V> joinOn (DataFrame<V> other, DataFrame.JoinType join, Object... cols)

Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key.

Parameters
other the other data frame
join the join type
cols the names of the columns to use as the join key
Returns
  • the result of the join operation as a new data frame
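
Joining on column values amounts to building a composite key from the selected columns of each row. The plain-Java sketch below shows one way such a key could be formed; ColumnKeySketch and key are hypothetical names, and this is not necessarily how joinery builds its keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of joinery.
final class ColumnKeySketch {
    // Build a composite join key from the values at the given column positions.
    static List<Object> key(List<Object> row, int... cols) {
        List<Object> key = new ArrayList<>();
        for (int col : cols) {
            key.add(row.get(col));
        }
        return key;
    }

    public static void main(String[] args) {
        List<Object> leftRow = Arrays.<Object>asList("alpha", 1, 10);
        List<Object> rightRow = Arrays.<Object>asList("alpha", 1, 99);
        // Lists compare by value, so rows that agree on columns 0 and 1 share a key.
        System.out.println(key(leftRow, 0, 1));                             // prints [alpha, 1]
        System.out.println(key(leftRow, 0, 1).equals(key(rightRow, 0, 1))); // prints true
        System.out.println(key(leftRow, 2).equals(key(rightRow, 2)));       // prints false
    }
}
```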

public DataFrame<V> kurt ()

public int length ()

Return the length (number of rows) of the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append(Arrays.asList("alpha", 1));
 > df.append(Arrays.asList("bravo", 2));
 > df.append(Arrays.asList("charlie", 3));
 > df.length();
 3 

Returns
  • the number of rows

public static final void main (String[] args)

Entry point to joinery as a command line tool. The available commands are:

show
display the specified data frame as a swing table
plot
display the specified data frame as a chart
compare
merge the specified data frames and output the result
shell
launch an interactive javascript shell for exploring data

Parameters
args file paths or urls of csv input data
Throws
IOException if an error occurs reading input

public Map<V, List<V>> map (Object key, Object value)

public Map<Object, List<V>> map ()

Return a map of index names to rows.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > df.append("alpha", Arrays.asList(1));
 > df.append("bravo", Arrays.asList(2));
 > df.map();
 {alpha=[1], bravo=[2]}

Returns
  • a map of index names to rows.

public Map<V, List<V>> map (Integer key, Integer value)

public DataFrame<V> max ()

public DataFrame<V> mean ()

Compute the mean of the numeric columns for each group or the entire data frame if the data is not grouped.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > df.append("one", Arrays.asList(1));
 > df.append("two", Arrays.asList(5));
 > df.append("three", Arrays.asList(3));
 > df.append("four",  Arrays.asList(7));
 > df.mean().col(0);
 [4.0] 

Returns
  • the new data frame

public DataFrame<V> median ()

public final DataFrame<V> merge (DataFrame<V> other, DataFrame.JoinType join)

Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the common, non-numeric columns from each data frame as the join key.

Parameters
other the other data frame
join the join type
Returns
  • the result of the merge operation as a new data frame

public final DataFrame<V> merge (DataFrame<V> other)

Return a new data frame created by performing a left outer join of this data frame with the argument using the common, non-numeric columns from each data frame as the join key.

Parameters
other the other data frame
Returns
  • the result of the merge operation as a new data frame
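
One reading of "the common, non-numeric columns" is sketched below in plain Java: a column is a merge key if its name appears in both frames and its type is not a Number on either side. MergeKeysSketch, mergeKeys and the name-to-type maps are hypothetical, and the exact rule joinery applies may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of joinery.
final class MergeKeysSketch {
    // Pick the column names shared by both frames whose types are non-numeric.
    static List<String> mergeKeys(Map<String, Class<?>> left, Map<String, Class<?>> right) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Class<?>> entry : left.entrySet()) {
            Class<?> other = right.get(entry.getKey());
            if (other != null
                    && !Number.class.isAssignableFrom(entry.getValue())
                    && !Number.class.isAssignableFrom(other)) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> left = new LinkedHashMap<>();
        left.put("name", String.class);
        left.put("category", String.class);
        left.put("value", Integer.class);
        Map<String, Class<?>> right = new LinkedHashMap<>();
        right.put("name", String.class);
        right.put("value", Double.class);
        right.put("score", Double.class);
        // "value" is shared but numeric; "category" is not shared.
        System.out.println(mergeKeys(left, right)); // prints [name]
    }
}
```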

public DataFrame<V> min ()

public DataFrame<V> nonnumeric ()

Return a data frame containing only columns with non-numeric data.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append(Arrays.asList("one", 1));
 > df.append(Arrays.asList("two", 2));
 > df.nonnumeric().columns();
 [name] 

Returns
  • a data frame containing only the non-numeric columns

public DataFrame<Boolean> notnull ()

Create a new data frame containing boolean values such that valid object references in the original data frame yield true and null references yield false.

 > DataFrame<Object> df = new DataFrame<>(
 >     Arrays.asList(
 >         Arrays.<Object>asList("alpha", "bravo", null),
 >         Arrays.<Object>asList(null, 2, 3)
 >     )
 > );
 > df.notnull().row(0);
 [true, false] 

Returns
  • the new boolean data frame

public DataFrame<Number> numeric ()

Return a data frame containing only columns with numeric data.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append(Arrays.asList("one", 1));
 > df.append(Arrays.asList("two", 2));
 > df.numeric().columns();
 [value] 

Returns
  • a data frame containing only the numeric columns

public DataFrame<V> percentChange ()

public DataFrame<V> percentChange (int period)

public DataFrame<V> percentile (double quantile)

Compute the percentile of the numeric columns for each group or the entire data frame if the data is not grouped.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > df.append("one", Arrays.asList(1));
 > df.append("two", Arrays.asList(5));
 > df.append("three", Arrays.asList(3));
 > df.append("four",  Arrays.asList(7));
 > df.mean().col(0);
 [4.0] 

Returns
  • the new data frame

public DataFrame<V> pivot (Object row, Object col, Object... values)

public DataFrame<V> pivot (Integer row, Integer col, Integer... values)

public DataFrame<V> pivot (Integer[] rows, Integer[] cols, Integer[] values)

public DataFrame<U> pivot (KeyFunction<V> rows, KeyFunction<V> cols, Map<Integer, Aggregate<V, U>> values)

public DataFrame<V> pivot (List<Object> rows, List<Object> cols, List<Object> values)

public final void plot (DataFrame.PlotType type)

Display the numeric columns of this data frame as a chart in a new swing frame using the specified type.

 > DataFrame<Object> df = new DataFrame<Object>(
 >     Collections.emptyList(),
 >     Arrays.asList("name", "value"),
 >     Arrays.asList(
 >         Arrays.asList("alpha", "bravo", "charlie"),
 >         Arrays.asList(10, 20, 30)
 >     )
 > );
 > df.plot(PlotType.AREA);
  

Parameters
type the type of plot to display

public final void plot ()

Display the numeric columns of this data frame as a line chart in a new swing frame.

 > DataFrame<Object> df = new DataFrame<Object>(
 >     Collections.emptyList(),
 >     Arrays.asList("name", "value"),
 >     Arrays.asList(
 >         Arrays.asList("alpha", "bravo", "charlie"),
 >         Arrays.asList(10, 20, 30)
 >     )
 > );
 > df.plot();
  

public DataFrame<V> prod ()

Compute the product of the numeric columns for each group or the entire data frame if the data is not grouped.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Arrays.asList("name", "value"),
 >         Arrays.asList(
 >                 Arrays.<Object>asList("alpha", "alpha", "alpha", "bravo", "bravo"),
 >                 Arrays.<Object>asList(1, 2, 3, 4, 5)
 >             )
 >     );
 > df.groupBy("name")
 >   .prod()
 >   .col("value");
 [6.0, 20.0] 

Returns
  • the new data frame

public static final DataFrame<Object> readCsv (String file)

Read the specified csv file and return the data as a data frame.

Parameters
file the csv file
Returns
  • a new data frame
Throws
IOException if an error reading the file occurs
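
For readers unfamiliar with how csv text maps onto named columns, here is a minimal plain-Java sketch. It is not joinery's parser: CsvReadSketch, read and parse are hypothetical names, and it assumes a header line, a comma separator and no quoted fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of joinery.
final class CsvReadSketch {
    // Parse simple comma-separated text (header first, no quoting)
    // into a map of column name -> column values.
    static Map<String, List<Object>> read(String text) {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        String[] lines = text.split("\\R");
        if (text.isEmpty() || lines.length == 0) {
            return columns;
        }
        String[] names = lines[0].split(",");
        for (String name : names) {
            columns.put(name, new ArrayList<>());
        }
        for (int r = 1; r < lines.length; r++) {
            String[] fields = lines[r].split(",", -1);
            for (int c = 0; c < names.length; c++) {
                // Short rows are padded with empty fields.
                String field = c < fields.length ? fields[c] : "";
                columns.get(names[c]).add(parse(field));
            }
        }
        return columns;
    }

    // Convert a field to Long or Double when possible; empty fields become null.
    static Object parse(String field) {
        if (field.isEmpty()) {
            return null;
        }
        try {
            return Long.valueOf(field);
        } catch (NumberFormatException e) {
            // not a whole number, try a double next
        }
        try {
            return Double.valueOf(field);
        } catch (NumberFormatException e) {
            // not numeric, keep the text
        }
        return field;
    }

    public static void main(String[] args) {
        System.out.println(read("name,value\nalpha,1\nbravo,2.5\n"));
        // prints {name=[alpha, bravo], value=[1, 2.5]}
    }
}
```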

public static final DataFrame<Object> readCsv (InputStream input, String separator, DataFrame.NumberDefault longDefault)

Throws
IOException

public static final DataFrame<Object> readCsv (InputStream input, String separator)

Throws
IOException

public static final DataFrame<Object> readCsv (String file, String separator, DataFrame.NumberDefault longDefault, String naString)

Throws
IOException

public static final DataFrame<Object> readCsv (InputStream input)

Read csv records from an input stream and return the data as a data frame.

Parameters
input the input stream
Returns
  • a new data frame
Throws
IOException if an error reading the stream occurs

public static final DataFrame<Object> readCsv (String file, String separator, DataFrame.NumberDefault longDefault)

Throws
IOException

public static final DataFrame<Object> readCsv (String file, String separator, DataFrame.NumberDefault numberDefault, String naString, boolean hasHeader)

Throws
IOException

public static final DataFrame<Object> readCsv (InputStream input, String separator, String naString, boolean hasHeader)

Throws
IOException

public static final DataFrame<Object> readCsv (String file, String separator)

Throws
IOException

public static final DataFrame<Object> readCsv (String file, String separator, String naString, boolean hasHeader)

Throws
IOException

public static final DataFrame<Object> readCsv (InputStream input, String separator, String naString)

Throws
IOException

public static final DataFrame<Object> readSql (ResultSet rs)

Read data from the provided query results into a new data frame.

Parameters
rs the query results
Returns
  • a new data frame
Throws
SQLException if an error occurs reading the results

public static final DataFrame<Object> readSql (Connection c, String sql)

Execute the SQL query and return the results as a new data frame.

 > Connection c = DriverManager.getConnection("jdbc:derby:memory:testdb;create=true");
 > c.createStatement().executeUpdate("create table data (a varchar(8), b int)");
 > c.createStatement().executeUpdate("insert into data values ('test', 1)");
 > DataFrame.readSql(c, "select * from data").flatten();
 [test, 1] 

Parameters
c the database connection
sql the SQL query
Returns
  • a new data frame
Throws
SQLException if an error occurs executing the query

public static final DataFrame<Object> readXls (String file)

Read data from the specified excel workbook into a new data frame.

Parameters
file the excel workbook
Returns
  • a new data frame
Throws
IOException if an error occurs reading the workbook

public static final DataFrame<Object> readXls (InputStream input)

Read data from the input stream as an excel workbook into a new data frame.

Parameters
input the input stream
Returns
  • a new data frame
Throws
IOException if an error occurs reading the input stream

public DataFrame<V> reindex (Integer col, boolean drop)

Re-index the rows of the data frame using the specified column index, optionally dropping the column from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two");
 > df.append("a", Arrays.asList("alpha", 1));
 > df.append("b", Arrays.asList("bravo", 2));
 > df.reindex(0, true)
 >   .index();
 [alpha, bravo] 

Parameters
col the column to use as the new index
drop true to remove the index column from the data, false otherwise
Returns
  • a new data frame with index specified

public DataFrame<V> reindex (Integer... cols)

Re-index the rows of the data frame using the specified column indices and dropping the columns from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two");
 > df.append("a", Arrays.asList("alpha", 1));
 > df.append("b", Arrays.asList("bravo", 2));
 > df.reindex(0)
 >   .index();
 [alpha, bravo] 

Parameters
cols the columns to use as the new index
Returns
  • a new data frame with index specified

public DataFrame<V> reindex (Integer[] cols, boolean drop)

Re-index the rows of the data frame using the specified column indices, optionally dropping the columns from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two", "three");
 > df.append("a", Arrays.asList("alpha", 1, 10));
 > df.append("b", Arrays.asList("bravo", 2, 20));
 > df.reindex(new Integer[] { 0, 1 }, true)
 >   .index();
 [[alpha, 1], [bravo, 2]] 

Parameters
cols the columns to use as the new index
drop true to remove the index columns from the data, false otherwise
Returns
  • a new data frame with index specified

public DataFrame<V> reindex (Object... cols)

Re-index the rows of the data frame using the specified column names and removing the columns from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two");
 > df.append("a", Arrays.asList("alpha", 1));
 > df.append("b", Arrays.asList("bravo", 2));
 > df.reindex("one")
 >   .index();
 [alpha, bravo] 

Parameters
cols the columns to use as the new index
Returns
  • a new data frame with index specified

public DataFrame<V> reindex (Object[] cols, boolean drop)

Re-index the rows of the data frame using the specified column names, optionally dropping the columns from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two", "three");
 > df.append("a", Arrays.asList("alpha", 1, 10));
 > df.append("b", Arrays.asList("bravo", 2, 20));
 > df.reindex(new String[] { "one", "two" }, true)
 >   .index();
 [[alpha, 1], [bravo, 2]] 

Parameters
cols the columns to use as the new index
drop true to remove the index columns from the data, false otherwise
Returns
  • a new data frame with index specified

public DataFrame<V> reindex (Object col, boolean drop)

Re-index the rows of the data frame using the specified column name, optionally dropping the column from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two");
 > df.append("a", Arrays.asList("alpha", 1));
 > df.append("b", Arrays.asList("bravo", 2));
 > df.reindex("one", true)
 >   .index();
 [alpha, bravo] 

Parameters
col the column to use as the new index
drop true to remove the index column from the data, false otherwise
Returns
  • a new data frame with index specified

public DataFrame<V> rename (Map<Object, Object> names)

public DataFrame<V> rename (Object old, Object name)

public DataFrame<V> resetIndex ()

Return a new data frame with the default index; row names are reset to the string value of their integer index.

 > DataFrame<Object> df = new DataFrame<>("one", "two");
 > df.append("a", Arrays.asList("alpha", 1));
 > df.append("b", Arrays.asList("bravo", 2));
 > df.resetIndex()
 >   .index();
 [0, 1] 

Returns
  • a new data frame with the default index.

public DataFrame<V> reshape (Integer rows, Integer cols)

Reshape a data frame to the specified dimensions.

 > DataFrame<Object> df = new DataFrame<>("0", "1", "2");
 > df.append("0", Arrays.asList(10, 20, 30));
 > df.append("1", Arrays.asList(40, 50, 60));
 > df.reshape(3, 2)
 >   .length();
 3 

Parameters
rows the number of rows the new data frame will contain
cols the number of columns the new data frame will contain
Returns
  • a new data frame with the specified dimensions

public DataFrame<V> reshape (Collection<?> rows, Collection<?> cols)

Reshape a data frame to the specified indices.

 > DataFrame<Object> df = new DataFrame<>("0", "1", "2");
 > df.append("0", Arrays.asList(10, 20, 30));
 > df.append("1", Arrays.asList(40, 50, 60));
 > df.reshape(Arrays.asList("0", "1", "2"), Arrays.asList("0", "1"))
 >   .length();
 3 

Parameters
rows the names of rows the new data frame will contain
cols the names of columns the new data frame will contain
Returns
  • a new data frame with the specified indices

public DataFrame<V> retain (Integer... cols)

Create a new data frame containing only the specified columns.

 > DataFrame<Object> df = new DataFrame<>("name", "value", "category");
 > df.retain(0, 2).columns();
 [name, category] 

Parameters
cols the columns to include in the new data frame
Returns
  • a new data frame containing only the specified columns

public DataFrame<V> retain (Object... cols)

Create a new data frame containing only the specified columns.

 > DataFrame<Object> df = new DataFrame<>("name", "value", "category");
 > df.retain("name", "category").columns();
 [name, category] 

Parameters
cols the columns to include in the new data frame
Returns
  • a new data frame containing only the specified columns

public DataFrame<V> rollapply (Function<List<V>, V> function)

public DataFrame<V> rollapply (Function<List<V>, V> function, int period)

public List<V> row (Object row)

Return a data frame row as a list.

 > DataFrame<Object> df = new DataFrame<>(
 >         Arrays.asList("row1", "row2", "row3"),
 >         Collections.emptyList(),
 >         Arrays.asList(
 >             Arrays.<Object>asList("alpha", "bravo", "charlie"),
 >             Arrays.<Object>asList(1, 2, 3)
 >         )
 >     );
 > df.row("row2");
 [bravo, 2] 

Parameters
row the row name
Returns
  • the list of values

public List<V> row (Integer row)

Return a data frame row as a list.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Collections.emptyList(),
 >         Arrays.asList(
 >             Arrays.<Object>asList("alpha", "bravo", "charlie"),
 >             Arrays.<Object>asList(1, 2, 3)
 >         )
 >     );
 > df.row(1);
 [bravo, 2] 

Parameters
row the row index
Returns
  • the list of values

public DataFrame<V> select (Predicate<V> predicate)

Select a subset of the data frame using a predicate function.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > for (int i = 0; i < 10; i++)
 >     df.append(Arrays.asList("name" + i, i));
 > df.select(new Predicate<Object>() {
 >         @Override
 >         public Boolean apply(List<Object> values) {
 >             return Integer.class.cast(values.get(1)).intValue() % 2 == 0;
 >         }
 >     })
 >   .col(1);
 [0, 2, 4, 6, 8] 

Parameters
predicate a function returning true for rows to be included in the subset
Returns
  • a subset of the data frame

public void set (Object row, Object col, V value)

Set the value located by the names (row, column).

 > DataFrame<Object> df = new DataFrame<>(
 >        Arrays.asList("row1", "row2"),
 >        Arrays.asList("col1", "col2")
 >     );
 > df.set("row1", "col2", new Integer(7));
 > df.col(1);
 [7, null] 

Parameters
row the row name
col the column name
value the new value

public void set (Integer row, Integer col, V value)

Set the value located by the coordinates (row, column).

 > DataFrame<Object> df = new DataFrame<>(
 >        Arrays.asList("row1", "row2"),
 >        Arrays.asList("col1", "col2")
 >     );
 > df.set(1, 0, new Integer(7));
 > df.col(0);
 [null, 7] 

Parameters
row the row index
col the column index
value the new value

public final void show ()

public int size ()

Return the size (number of columns) of the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.size();
 2 

Returns
  • the number of columns

public DataFrame<V> skew ()

public DataFrame<V> slice (Integer rowStart, Integer rowEnd, Integer colStart, Integer colEnd)

public DataFrame<V> slice (Integer rowStart, Integer rowEnd)

public DataFrame<V> slice (Object rowStart, Object rowEnd)

public DataFrame<V> slice (Object rowStart, Object rowEnd, Object colStart, Object colEnd)

public DataFrame<V> sortBy (Comparator<List<V>> comparator)

public DataFrame<V> sortBy (Integer... cols)

public DataFrame<V> sortBy (Object... cols)

public DataFrame<V> stddev ()

Compute the standard deviation of the numeric columns for each group or the entire data frame if the data is not grouped.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Arrays.asList("name", "value"),
 >         Arrays.asList(
 >                 Arrays.<Object>asList("alpha", "alpha", "alpha", "bravo", "bravo", "bravo"),
 >                 Arrays.<Object>asList(1, 2, 3, 4, 6, 8)
 >             )
 >     );
 > df.groupBy("name")
 >   .stddev()
 >   .col("value");
 [1.0, 2.0] 

Returns
  • the new data frame

public DataFrame<V> sum ()

Compute the sum of the numeric columns for each group or the entire data frame if the data is not grouped.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Arrays.asList("name", "value"),
 >         Arrays.asList(
 >                 Arrays.<Object>asList("alpha", "alpha", "alpha", "bravo", "bravo"),
 >                 Arrays.<Object>asList(1, 2, 3, 4, 5)
 >             )
 >     );
 > df.groupBy("name")
 >   .sum()
 >   .col("value");
 [6.0, 9.0] 

Returns
  • the new data frame

public DataFrame<V> tail ()

Return a data frame containing the last ten rows of this data frame.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > for (int i = 0; i < 20; i++)
 >     df.append(Arrays.asList(i));
 > df.tail()
 >   .col("value");
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19] 

Returns
  • the new data frame

public DataFrame<V> tail (int limit)

Return a data frame containing the last limit rows of this data frame.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > for (int i = 0; i < 20; i++)
 >     df.append(Arrays.asList(i));
 > df.tail(3)
 >   .col("value");
 [17, 18, 19] 

Parameters
limit the number of rows to include in the result
Returns
  • the new data frame

public Object[] toArray ()

Copy the values contained in the data frame into a flat array of length size() * length().

Returns
  • the array
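
The length of the flat array is the number of columns times the number of rows. The plain-Java sketch below illustrates this on column-oriented data; FlatArraySketch and flatten are hypothetical names, and the column-by-column ordering is an assumption, not a documented guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of joinery.
final class FlatArraySketch {
    // Copy column-oriented data into one flat array, column by column (assumed order).
    static Object[] flatten(List<List<Object>> columns) {
        List<Object> values = new ArrayList<>();
        for (List<Object> column : columns) {
            values.addAll(column);
        }
        return values.toArray();
    }

    public static void main(String[] args) {
        List<List<Object>> columns = Arrays.asList(
                Arrays.<Object>asList("alpha", "bravo"),
                Arrays.<Object>asList(1, 2));
        Object[] flat = flatten(columns);
        System.out.println(flat.length);           // prints 4 (2 columns * 2 rows)
        System.out.println(Arrays.toString(flat)); // prints [alpha, bravo, 1, 2]
    }
}
```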

public U[][] toArray (U[][] array)

public U toArray (Class<U> cls)

Copy the values contained in the data frame into an array of the specified type. If the specified type is a two-dimensional array, for example double[][].class, a row-wise copy is made.

Returns
  • the array
Throws
IllegalArgumentException if the values are not assignable to the specified component type
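
A row-wise copy into a two-dimensional array means that each inner array holds one row. The plain-Java sketch below shows the idea on column-oriented numeric data; RowWiseCopySketch and toRows are hypothetical names, not joinery code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of joinery.
final class RowWiseCopySketch {
    // Copy column-oriented numeric data into a double[rows][cols] array.
    static double[][] toRows(List<List<Number>> columns) {
        int cols = columns.size();
        int rows = cols == 0 ? 0 : columns.get(0).size();
        double[][] out = new double[rows][cols];
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) {
                out[r][c] = columns.get(c).get(r).doubleValue();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Number>> columns = Arrays.asList(
                Arrays.<Number>asList(1, 2, 3),
                Arrays.<Number>asList(10, 20, 30));
        System.out.println(Arrays.deepToString(toRows(columns)));
        // prints [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    }
}
```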

public U[] toArray (U[] array)

Copy the values contained in the data frame into the specified array. If the length of the provided array is less than size() * length(), a new array will be created.

Returns
  • the array

public double[][] toModelMatrix (double fillValue)

Encode the data frame as a model matrix, converting nominal values to dummy variables without adding an intercept column. More methods with additional parameters to control the conversion to the model matrix are available in the Conversion class.

Parameters
fillValue value to replace NA's with
Returns
  • a model matrix
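
As a rough illustration of dummy-variable encoding and NA filling, consider the plain-Java sketch below. ModelMatrixSketch, dummies and numeric are hypothetical names; emitting one indicator column per distinct level is an assumption, and the Conversion class may use a different scheme (for example, dropping a reference level).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of joinery.
final class ModelMatrixSketch {
    // One 0/1 indicator column per distinct level, in order of first appearance.
    static double[][] dummies(List<String> values) {
        List<String> levels = new ArrayList<>(new LinkedHashSet<>(values));
        double[][] out = new double[values.size()][levels.size()];
        for (int r = 0; r < values.size(); r++) {
            out[r][levels.indexOf(values.get(r))] = 1.0;
        }
        return out;
    }

    // Numeric values pass through unchanged; missing (null) values become fillValue.
    static double[] numeric(List<? extends Number> values, double fillValue) {
        double[] out = new double[values.size()];
        for (int r = 0; r < out.length; r++) {
            Number value = values.get(r);
            out[r] = value == null ? fillValue : value.doubleValue();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(dummies(Arrays.asList("red", "green", "red"))));
        // prints [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
        System.out.println(Arrays.toString(numeric(Arrays.asList(1, null, 3), -1.0)));
        // prints [1.0, -1.0, 3.0]
    }
}
```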

public DataFrame<Number> toModelMatrixDataFrame ()

Encode the data frame as a model matrix, converting nominal values to dummy variables without adding an intercept column. More methods with additional parameters to control the conversion to the model matrix are available in the Conversion class.

Returns
  • a model matrix

public String toString ()

public final String toString (int limit)

public DataFrame<U> transform (RowFunction<V, U> transform)

public DataFrame<V> transpose ()

Transpose the rows and columns of the data frame.

 > DataFrame<String> df = new DataFrame<>(
 >         Arrays.asList(
 >                 Arrays.asList("one", "two"),
 >                 Arrays.asList("alpha", "bravo")
 >             )
 >     );
 > df.transpose().flatten();
 [one, alpha, two, bravo] 

Returns
  • a new data frame with the rows and columns transposed

public List<Class<?>> types ()

Return the types for each of the data frame columns.

Returns
  • the list of column types

public DataFrame<V> unique (Integer... cols)

public DataFrame<V> unique ()

public DataFrame<V> unique (Object... cols)

public final DataFrame<V> update (DataFrame<? extends V>... others)

Update the data frame in place by overwriting any values with the non-null values provided by the data frame arguments.

Parameters
others the other data frames
Returns
  • this data frame with the overwritten values
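
The overwrite rule can be sketched in plain Java: a value in the target is replaced only where the other data holds a non-null value at the same position. UpdateSketch and update are hypothetical names, and matching cells by position rather than by row and column name is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of joinery.
final class UpdateSketch {
    // Overwrite cells of `target` in place with the non-null cells of `other`.
    static void update(List<List<Object>> target, List<List<Object>> other) {
        int cols = Math.min(target.size(), other.size());
        for (int c = 0; c < cols; c++) {
            int rows = Math.min(target.get(c).size(), other.get(c).size());
            for (int r = 0; r < rows; r++) {
                Object value = other.get(c).get(r);
                if (value != null) {
                    target.get(c).set(r, value);
                }
            }
        }
    }

    public static void main(String[] args) {
        // A single column in each data set.
        List<List<Object>> target = Arrays.asList(Arrays.<Object>asList(1, null, 3));
        List<List<Object>> other = Arrays.asList(Arrays.<Object>asList(null, 20, 30));
        update(target, other);
        // The null in `other` leaves 1 untouched; 20 and 30 overwrite null and 3.
        System.out.println(target); // prints [[1, 20, 30]]
    }
}
```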

public DataFrame<V> var ()

public final void writeCsv (String file)

Write the data from this data frame to the specified file as comma separated values.

Parameters
file the file to write
Throws
IOException if an error occurs writing the file
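
The shape of comma separated output can be illustrated with a plain-Java sketch. CsvWriteSketch and toCsv are hypothetical names; the sketch assumes a header line, empty fields for nulls and no quoting or escaping, which may not match the files joinery actually writes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of joinery.
final class CsvWriteSketch {
    // Render column names and rows as comma-separated text, one record per line.
    static String toCsv(List<String> names, List<List<Object>> rows) {
        StringBuilder csv = new StringBuilder();
        csv.append(String.join(",", names)).append('\n');
        for (List<Object> row : rows) {
            StringJoiner line = new StringJoiner(",");
            for (Object value : row) {
                // Null values are written as empty fields.
                line.add(value == null ? "" : String.valueOf(value));
            }
            csv.append(line.toString()).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("name", "value");
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("alpha", 1),
                Arrays.<Object>asList("bravo", null));
        System.out.print(toCsv(names, rows));
        // prints:
        // name,value
        // alpha,1
        // bravo,
    }
}
```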

public final void writeCsv (OutputStream output)

Write the data from this data frame to the provided output stream as comma separated values.

Throws
IOException

public final void writeSql (PreparedStatement stmt)

Write the data from the data frame to a database by executing the provided prepared SQL statement.

Parameters
stmt a prepared insert statement
Throws
SQLException if an error occurs executing the statement

public final void writeSql (Connection c, String sql)

Write the data from the data frame to a database by executing the specified SQL statement.

Parameters
c the database connection
sql the SQL statement
Throws
SQLException if an error occurs executing the statement

public final void writeXls (OutputStream output)

Write the data from the data frame to the provided output stream as an excel workbook.

Throws
IOException if an error occurs writing the file

public final void writeXls (String file)

Write the data from the data frame to the specified file as an excel workbook.

Parameters
file the file to write
Throws
IOException if an error occurs writing the file