Dataframes from CSV files
Suppose we have a collection of CSV files with data:
data1.csv:
time,temperature,humidity
0,22,58
1,21,57
2,25,57
3,26,55
4,22,53
5,23,59
data2.csv:
time,temperature,humidity
0,24,85
1,26,83
2,27,85
3,25,92
4,25,83
5,23,81
data3.csv:
time,temperature,humidity
0,18,51
1,15,57
2,18,55
3,19,51
4,19,52
5,19,57
and so on.
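To follow along, the three example files listed above can be written to the current directory with Python's standard library. This is a small convenience sketch. The rows are copied from the listings, and the further files implied by "and so on" are not reproduced.

```python
import csv

# Rows copied from the listings above; every file shares the same header.
HEADER = ["time", "temperature", "humidity"]
DATA = {
    "data1.csv": [(0, 22, 58), (1, 21, 57), (2, 25, 57), (3, 26, 55), (4, 22, 53), (5, 23, 59)],
    "data2.csv": [(0, 24, 85), (1, 26, 83), (2, 27, 85), (3, 25, 92), (4, 25, 83), (5, 23, 81)],
    "data3.csv": [(0, 18, 51), (1, 15, 57), (2, 18, 55), (3, 19, 51), (4, 19, 52), (5, 19, 57)],
}

for name, rows in DATA.items():
    # newline="" hands line-ending control to the csv module, and
    # lineterminator="\n" makes the files match the listings exactly.
    with open(name, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(HEADER)
        writer.writerows(rows)
```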
We can create a single Dask dataframe from all of these CSV files using dd.read_csv and a glob pattern that matches every file.
>>> import dask.dataframe as dd
>>> df = dd.read_csv('data*.csv')
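Conceptually, the pattern 'data*.csv' is expanded to the matching files, and their rows together form one dataframe. The sketch below is an eager, standard-library stand-in for that idea and is not how Dask is implemented. The helper name read_csv_glob is hypothetical. To stay self-contained, it writes the first two rows of data1.csv and data2.csv into a temporary directory.

```python
import csv
import glob
import os
import tempfile


def read_csv_glob(pattern):
    """Hypothetical helper: eagerly read every file matching `pattern`
    and concatenate their rows.

    glob.glob returns names in arbitrary order, so they are sorted to keep
    the result deterministic. Values stay strings, as the csv module returns them.
    """
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    # First two rows of data1.csv and data2.csv from the listings above.
    samples = {
        "data1.csv": "time,temperature,humidity\n0,22,58\n1,21,57\n",
        "data2.csv": "time,temperature,humidity\n0,24,85\n1,26,83\n",
    }
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w") as f:
            f.write(text)
    rows = read_csv_glob(os.path.join(tmp, "data*.csv"))

print(len(rows))               # 4
print(rows[2]["temperature"])  # 24 (first row of data2.csv)
```

Like dd.read_csv, the result spans every matching file. Unlike it, everything is read into memory immediately, which is exactly what Dask avoids for large collections.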
A Dask dataframe is made up of many pandas dataframes, and we can work with it much as we would with pandas. For example, we can list the first few rows.
>>> df.head()
   time  temperature  humidity
0     0           22        58
1     1           21        57
2     2           25        57
3     3           26        55
4     4           22        53
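By default, head() looks only at the first partition, so these rows come from data1.csv. We can also compute values over the entire dataframe. Operations such as mean() are lazy, and calling compute() runs them across all of the files and returns the result.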
>>> df.temperature.mean().compute()
22.055555555555557
>>> df.humidity.std().compute()
14.710829233324224
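Because the data are split across several pandas dataframes, a statistic over the whole collection has to be assembled from per-piece results. The plain-Python sketch below illustrates the general idea using the numbers from the three files. Each "partition" reports its count, sum, and sum of squares, and these are then combined into a mean and a sample standard deviation (ddof=1, the default used above). This illustrates the technique and is not Dask's actual code path. The helper names are hypothetical.

```python
import math

# Columns from data1.csv, data2.csv and data3.csv; each inner list plays
# the role of one partition.
temperature = [[22, 21, 25, 26, 22, 23], [24, 26, 27, 25, 25, 23], [18, 15, 18, 19, 19, 19]]
humidity = [[58, 57, 57, 55, 53, 59], [85, 83, 85, 92, 83, 81], [51, 57, 55, 51, 52, 57]]


def partial(chunk):
    """Per-partition summary: (count, sum, sum of squares)."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def combine(partials):
    """Merge per-partition summaries into overall (count, sum, sum of squares)."""
    return tuple(sum(p[i] for p in partials) for i in range(3))


def mean(parts):
    n, s, _ = combine([partial(c) for c in parts])
    return s / n


def std(parts, ddof=1):
    n, s, ss = combine([partial(c) for c in parts])
    # Sum of squared deviations rebuilt from the partial sums. This textbook
    # formula can lose precision for large values but is fine for this data.
    sq_dev = ss - s * s / n
    return math.sqrt(sq_dev / (n - ddof))


print(mean(temperature))  # about 22.0556, matching the Dask result
print(std(humidity))      # about 14.7108, matching the Dask result
```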