| Safe Haskell | None |
|---|---|
| Language | Haskell2010 |
LightGBM.DataSet
Contents
- data DataSet = CSVFile {}
- newtype HasHeader = HasHeader {}
- fromCSV :: HasHeader -> FilePath -> DataSet
- fromFrame :: (ColumnHeaders ts, AsVinyl ts, Foldable f, RecAll Identity (UnColumn ts) Show) => f (Record ts) -> FilePath -> IO DataSet
- toCSV :: FilePath -> DataSet -> IO ()
- toFrame :: (RecVec rs, ReadRec rs) => DataSet -> IO (FrameRec rs)
Data Handling
data DataSet
A set of data to use for training or prediction.
newtype HasHeader
Describes whether a CSV data file has a header row.
fromCSV :: HasHeader -> FilePath -> DataSet
Load data from a file.
LightGBM can read data from CSV or TSV files (or from LibSVM formatted files).
Note that the LightGBM data file format traditionally puts the output (the labels) in the first column and the inputs (the features) in the subsequent columns. However, you can instruct LightGBM to
- use some other column for the labels with the LabelColumn parameter, and
- ignore some of the feature columns with the IgnoreColumns parameter.
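As a rough illustration of the traditional layout described above, the following sketch writes rows with the label in the first column and the features after it. The helper names toCsvLine and writeLabelledCsv are hypothetical and are not part of this module; the sketch uses only base and assumes numeric labels and features.

```haskell
import Data.List (intercalate)

-- | Hypothetical helper (not part of LightGBM.DataSet): render one example
-- as a CSV line with the label first and the features after it.
toCsvLine :: Double -> [Double] -> String
toCsvLine label features = intercalate "," (map show (label : features))

-- | Hypothetical helper: write labelled rows to a CSV file without a
-- header row, one example per line.
writeLabelledCsv :: FilePath -> [(Double, [Double])] -> IO ()
writeLabelledCsv path rows =
  writeFile path (unlines (map (uncurry toCsvLine) rows))
```

A file written this way has no header row, so it would be referenced with fromCSV (HasHeader False) path.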
fromFrame :: (ColumnHeaders ts, AsVinyl ts, Foldable f, RecAll Identity (UnColumn ts) Show) => f (Record ts) -> FilePath -> IO DataSet
Load data from a Frame into a DataSet.
Note that this function causes the creation of a file, and it is up
to the caller to control the lifetime of this file. This function
is typically called in a bracket or a similar
facility. For example:
    withSystemTempFile "inputFrame" $ \ inputFile inputHandle -> do
      hClose inputHandle
      dataset <- fromFrame inFrame inputFile
      -- ... use dataset here ...
where inFrame is the input Frame.
toCSV :: FilePath -> DataSet -> IO ()
Write a DataSet out to a CSV file.
toFrame :: (RecVec rs, ReadRec rs) => DataSet -> IO (FrameRec rs)
Convert a DataSet to a Frame.
If the DataSet doesn't have headers, then Frame headers are
generated with names column_i where i is the index of the
column in question (starting at 0).
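The naming scheme for generated headers can be sketched as follows. This is only an illustration of the names described above, not the library's implementation; generatedHeaders is a hypothetical helper.

```haskell
-- | Hypothetical helper (not part of LightGBM.DataSet): the header names
-- described above for a headerless data set with n columns.
generatedHeaders :: Int -> [String]
generatedHeaders n = ["column_" ++ show i | i <- [0 .. n - 1]]
```

For a three-column file this gives column_0, column_1 and column_2, which are the names the caller's row type would need to use.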
Note that this function is polymorphic in the row type - the caller will have to define that explicitly or in context. (See the doctest below for a simplistic example.)
    >>> :set -XTypeOperators
    >>> :set -XDataKinds
    >>> import Frames ((:->))
    >>> import qualified Frames as F
    >>> import System.IO (hPutStrLn, hClose)
    >>> import System.IO.Temp as TMP
    >>> :{
    TMP.withSystemTempFile "toFrameTest" $ \ filepath handle -> do
      hPutStrLn handle "results\n1\n2\n3\n4\n5"
      hClose handle
      let ds = fromCSV (HasHeader True) filepath
      dsf <- toFrame ds :: IO (F.Frame (F.Record '["results" :-> Int]))
      return $ length dsf
    :}
    5

    >>> :{
    TMP.withSystemTempFile "toFrameTest" $ \ filepath handle -> do
      hPutStrLn handle "1\n2\n3\n4"
      hClose handle
      let ds = fromCSV (HasHeader False) filepath
      dsf <- toFrame ds :: IO (F.Frame (F.Record '["column_0" :-> Int]))
      return $ length dsf
    :}
    4