HaskellGBM-0.1.0.0

Safe Haskell: None
Language: Haskell2010

LightGBM.DataSet

Data Handling

data DataSet

A set of data to use for training or prediction.

Constructors

CSVFile 

newtype HasHeader

Describes whether a CSV data file has a header row or not.

Constructors

HasHeader 

Fields

fromCSV :: HasHeader -> FilePath -> DataSet

Load data from a file.

LightGBM can read data from CSV or TSV files (or from LibSVM formatted files).

Note that, by convention, LightGBM data files put the output (the labels) in the first column and the inputs (the features) in the remaining columns. However, you can instruct LightGBM to

  • use some other column for the labels with the LabelColumn parameter, and
  • ignore some of the feature columns with the IgnoreColumns parameter.
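To make the column convention concrete, here is a minimal pure-Haskell sketch of how one already-parsed row could be split into a label and its features. It is not how LightGBM or HaskellGBM does this internally: `splitRow` is a hypothetical helper, and its zero-based `labelCol` index and `ignored` index list are assumed stand-ins for the LabelColumn and IgnoreColumns parameters.

```haskell
-- Hypothetical sketch (not part of HaskellGBM): split one already-parsed row
-- into (label, features).
--   labelCol: zero-based index of the label column (0 = traditional layout),
--             standing in for LightGBM's LabelColumn parameter.
--   ignored:  zero-based indices of columns to drop, standing in for
--             LightGBM's IgnoreColumns parameter.
splitRow :: Int -> [Int] -> [a] -> Maybe (a, [a])
splitRow labelCol ignored row =
  case lookup labelCol indexed of
    Nothing    -> Nothing  -- the row has no cell at the label index
    Just label -> Just (label, features)
  where
    -- pair every cell with its position in the original row
    indexed  = zip [0 ..] row
    -- keep every column that is neither the label nor explicitly ignored
    features = [x | (i, x) <- indexed, i /= labelCol, i `notElem` ignored]
```

With `labelCol = 0` and no ignored columns this reproduces the traditional layout: the first cell becomes the label and the rest become features. Moving the label or dropping an ID column only changes the two index arguments.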

fromFrame :: (ColumnHeaders ts, AsVinyl ts, Foldable f, RecAll Identity (UnColumn ts) Show) => f (Record ts) -> FilePath -> IO DataSet

Load data from a Frame into a DataSet.

Note that this function creates a file at the given path, and the caller is responsible for managing that file's lifetime. It is therefore typically called inside a bracket or a similar facility. For example:

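withSystemTempFile "inputFrame" $ \ inputFile inputHandle -> do
  hClose inputHandle  -- release the handle before fromFrame writes to inputFile
  dataset <- fromFrame inFrame inputFile
  toCSV "inputCopy.csv" dataset  -- use dataset here, before the temp file is deleted

where inFrame is the input Frame, and the toCSV call stands in for whatever work (training, prediction) uses the dataset. withSystemTempFile comes from System.IO.Temp and hClose from System.IO; the temporary file is removed when withSystemTempFile returns, so the DataSet must not be used after that.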

toCSV :: FilePath -> DataSet -> IO ()

Arguments

  • FilePath: output path
  • DataSet: the data to persist

Write a DataSet out to a CSV file.

toFrame :: (RecVec rs, ReadRec rs) => DataSet -> IO (FrameRec rs)

Convert a DataSet to a Frame.

If the DataSet has no header row, the Frame's column names are generated as column_i, where i is the zero-based index of the column.
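The header rule above can be sketched as a small pure function over rows that have already been split into cells. This is an illustration, not code from the HaskellGBM source: `splitHeader` is a hypothetical helper, and taking the column count from the first data row is an assumption.

```haskell
-- Hypothetical sketch (not part of HaskellGBM) of the header rule:
-- with a header row, its cells become the column names; without one,
-- names of the form column_i are generated, counting i from 0.
-- Rows are assumed to be already split into cells.
splitHeader :: Bool -> [[String]] -> ([String], [[String]])
splitHeader True (header : rows) = (header, rows)
splitHeader True []              = ([], [])
splitHeader False rows           = (generated, rows)
  where
    -- assumption: the first row determines how many columns there are
    width     = case rows of
                  (r : _) -> length r
                  []      -> 0
    generated = ["column_" ++ show i | i <- [0 .. width - 1]]
```

For a single-column file read with HasHeader False, this yields the one name column_0, which matches the column name used in the second doctest below.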

Note that this function is polymorphic in the row type, so the caller must fix that type, either with an explicit annotation or through context. (The doctests below show a simple example.)

>>> :set -XTypeOperators
>>> :set -XDataKinds
>>> import Frames ((:->))
>>> import qualified Frames as F
>>> import System.IO (hPutStrLn, hClose)
>>> import System.IO.Temp as TMP
>>> :{
  TMP.withSystemTempFile "toFrameTest" $ \ filepath handle -> do
    hPutStrLn handle "results\n1\n2\n3\n4\n5"
    hClose handle
    let ds = fromCSV (HasHeader True) filepath
    dsf <- toFrame ds :: IO (F.Frame (F.Record '["results" :-> Int]))
    return $ length dsf
:}
5
>>> :{
  TMP.withSystemTempFile "toFrameTest" $ \ filepath handle -> do
    hPutStrLn handle "1\n2\n3\n4"
    hClose handle
    let ds = fromCSV (HasHeader False) filepath
    dsf <- toFrame ds :: IO (F.Frame (F.Record '["column_0" :-> Int]))
    return $ length dsf
:}
4