
Python Meteorology Study Notes (3): Xarray, Multidimensional Arrays and Datasets
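I. Introduction to Xarray
1. Xarray is well suited to working with netCDF files: the netCDF format is the origin of xarray's data model, and xarray integrates closely with dask for parallel computation.
2. Meteorological data are mostly multidimensional arrays with five "dimensions": longitude, latitude, height (pressure), time, and variable.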
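II. Key attributes
values: a numpy.ndarray (or NumPy-like array) holding the array's values
dims: the dimension name of each axis (for example, ('x', 'y', 'z'))
coords: a dict-like container of arrays (coordinates) that label each point (for example, one-dimensional arrays of numbers, datetime objects, or strings)
attrs: a dict holding arbitrary metadata (attributes)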
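The four attributes above can be inspected on a small hand-built array. The following is a minimal sketch; the variable name `temps`, the grid, and all numbers are made up for illustration and do not come from any real file.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 3 temperature field; names and numbers are invented.
temps = xr.DataArray(
    np.array([[15.0, 16.5, 14.2], [17.1, 18.3, 16.0]]),
    dims=("lat", "lon"),
    coords={"lat": [30.0, 32.5], "lon": [100.0, 102.5, 105.0]},
    attrs={"units": "degC", "long_name": "air temperature"},
)

print(type(temps.values))    # the underlying numpy.ndarray
print(temps.dims)            # names of the axes
print(list(temps.coords))    # names of the coordinate variables
print(temps.attrs["units"])  # free-form metadata

# Because the coordinates label each point, values can be looked up by label.
point = temps.sel(lat=32.5, lon=102.5).item()
```

Here `point` is the value stored at the grid cell labeled lat 32.5, lon 102.5, found without knowing its integer position.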
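III. Core data structures
- DataArray: stores a single multidimensional variable together with its coordinates
- Dataset: stores multiple variables, which usually share the same coordinates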
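As a rough illustration of how the two structures relate, the sketch below builds two DataArrays on the same grid and bundles them into a Dataset. The names `temp`, `rh`, and `grid` and the constant values are assumptions chosen only for the example.

```python
import numpy as np
import xarray as xr

lat = [10.0, 20.0]
lon = [100.0, 110.0, 120.0]

# Two hypothetical variables defined on the same lat/lon grid.
temp = xr.DataArray(np.full((2, 3), 288.0), dims=("lat", "lon"),
                    coords={"lat": lat, "lon": lon}, name="temp")
rh = xr.DataArray(np.full((2, 3), 60.0), dims=("lat", "lon"),
                  coords={"lat": lat, "lon": lon}, name="rh")

# A Dataset bundles the variables; the lat/lon coordinates are shared.
grid = xr.Dataset({"temp": temp, "rh": rh})

var_names = list(grid.data_vars)
one_var = grid["temp"]  # pulling a variable out gives back a DataArray
```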
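IV. Example
import xarray as xr
import matplotlib.pyplot as plt
import numpy as np
f = r'.../precip.mon.mean.nc'  # path to the .nc file
# open the dataset
ds = xr.open_dataset(f)
ds
After opening, the dataset is displayed as follows:
xr.set_options(display_style="text")
ds  # show xarray's description of the dataset as plain text
# print a summary in netCDF (ncdump-like) form
ds.info()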
Dataset
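An xarray Dataset can be viewed as a dict-like container of labeled arrays (DataArrays) with aligned dimensions.
Its contents can be heterogeneous: the variables may have different dtypes and even different numbers of dimensions.
The dict-like interface of the dataset itself can be used to access any DataArray it contains.
A Dataset can be understood simply as a collection of DataArrays, with the following key attributes:
Attribute | Description |
---|---|
data_vars | A dict-like mapping of the DataArray objects corresponding to the data variables. |
dims | A mapping from dimension names to the fixed length of each dimension (for example, {lat: 6, lon: 6, time: 8}). |
coords | A dict-like container of arrays (coordinates) that label each point (for example, one-dimensional arrays of numbers, datetime objects, or strings). |
attrs | A dict holding arbitrary metadata associated with the dataset. |
var_name | Access a variable by its name, where var_name stands for the actual variable name (for example, ds.precip). |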
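To make the "heterogeneous" point concrete, here is a small sketch of a Dataset whose variables differ in dtype and in number of dimensions. The dataset `sample` and the variable names `precip` and `land_mask` are hypothetical and filled with placeholder values.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01", periods=2)
sample = xr.Dataset(
    data_vars={
        # 3-D float32 field
        "precip": (("time", "lat", "lon"), np.zeros((2, 2, 3), dtype="float32")),
        # 2-D integer mask on the same horizontal grid
        "land_mask": (("lat", "lon"), np.array([[0, 1, 1], [0, 0, 1]])),
    },
    coords={"time": times, "lat": [-10.0, 10.0], "lon": [0.0, 120.0, 240.0]},
    attrs={"title": "synthetic example"},
)

print(list(sample.data_vars))  # variable names
print(list(sample.coords))     # coordinate names
print(sample.attrs)            # global metadata
print(sample["precip"].dims, sample["land_mask"].dims)
```

Both variables share the `lat` and `lon` coordinates even though only one of them has a `time` dimension.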
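# the data variables in the Dataset
ds.data_vars
# the Dataset's global attributes
ds.attrs
# the Dataset's coordinates
ds.coords  # e.g. ds.coords["time"].shape gives the shape of one coordinate
# names of all variables and coordinates
ds.keys()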
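Selecting variables (two ways)
The names "t", "r", and "z" used here and in the next snippet are variable names from a pressure-level dataset; the precipitation file opened above contains "precip" instead.
# Option 1: dict-like access
ds["t"]
# Option 2: attribute-like access
ds.t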
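The two access styles return the same DataArray. The sketch below checks this on a tiny synthetic Dataset (the name `demo` and its values are invented) and also shows one case where only the dict-like form reaches the variable: a variable whose name collides with a Dataset method.

```python
import numpy as np
import xarray as xr

demo = xr.Dataset({"t": (("x",), np.array([270.0, 280.0, 290.0]))})

by_key = demo["t"]   # dict-like access
by_attr = demo.t     # attribute-like access
same = by_key.identical(by_attr)

# A variable named like an existing method is only reachable by key,
# because demo.mean resolves to the Dataset.mean method first.
demo["mean"] = demo["t"] * 0 + 1.0
attr_is_variable = isinstance(demo.mean, xr.DataArray)
key_is_variable = isinstance(demo["mean"], xr.DataArray)
```

For this reason the dict-like form is the safer default in scripts, while attribute access is convenient for interactive exploration.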
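Dropping and renaming variables
These methods return a new Dataset and leave the original unchanged, so assign the result if you want to keep it.
# drop a variable
ds = ds.drop_vars("r")
# drop a dimension (and every variable that uses it)
ds.drop_dims("time")
# rename variables (the result must be assigned back)
ds = ds.rename({"t": "temp", "z": "hgt"})
# add a new variable derived from an existing one
ds = ds.assign(tempC=ds.temp - 273.15)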
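The sketch below runs the same operations on a small synthetic Dataset to show that each call returns a new object. The Dataset `demo` and its constant values are assumptions made up for the example.

```python
import numpy as np
import xarray as xr

demo = xr.Dataset(
    {
        "t": (("time", "x"), np.full((2, 3), 273.15)),
        "r": (("time", "x"), np.full((2, 3), 50.0)),
    }
)

dropped = demo.drop_vars("r")      # new Dataset without "r"
still_there = "r" in demo          # the original still has "r"

renamed = demo.rename({"t": "temp"})
with_celsius = renamed.assign(tempC=renamed.temp - 273.15)

# drop_dims removes the dimension and every variable defined on it
no_time = demo.drop_dims("time")
```

Note that `drop_dims("time")` leaves `no_time` with no data variables here, because both `t` and `r` depend on `time`.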
all_mean_pre = ds.precip
all_mean_pre.data  # the monthly mean precipitation values as an array
# plot the first time step as filled contours
plt.contourf(all_mean_pre.data[0])
plt.show()
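Plotting the bare array puts array indices on the axes. A possible refinement, sketched here on a synthetic field rather than the real file, is to pass the coordinate arrays to `contourf` so the axes read in degrees. The random values, the name `field`, and the `Agg` backend choice are assumptions for a self-contained example; the grid spacing mirrors the 72 x 144 grid of the file used in this post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Synthetic stand-in for one time step of precipitation (random values).
lat = np.linspace(-88.75, 88.75, 72)
lon = np.linspace(1.25, 358.75, 144)
field = xr.DataArray(
    np.random.rand(lat.size, lon.size),
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": lon},
    name="precip",
)

fig, ax = plt.subplots()
# Passing the coordinate arrays labels the axes in degrees, not array indices.
filled = ax.contourf(field["lon"].values, field["lat"].values, field.values)
fig.colorbar(filled, ax=ax, label="precip")
ax.set_xlabel("lon")
ax.set_ylabel("lat")
```

With the real data, `field` would be replaced by one time step such as `ds.precip.isel(time=0)`, and `plt.show()` would display the figure in an interactive session.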
DataArray
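Attribute | Description |
---|---|
data | The values, held in a numpy.ndarray or a dask.array |
dims | The dimension names, such as (x, y, z) or (lat, lon, time) |
coords | A dict-like container holding each of the coordinates |
attrs | The attributes of the original data, such as the variable's long name and units |
name | An arbitrary name for the array |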
all_mean_pre = ds.precip
# get the values; the result is a numpy ndarray
all_mean_pre.values
array([[[0. , 0. , 0. , ..., 0. ,
0. , 0. ],
[0. , 0. , 0. , ..., 0. ,
0. , 0. ],
[0. , 0. , 0. , ..., 0. ,
0. , 0. ],
...,
[0.9812404 , 0.9320181 , 0.8138066 , ..., 0.91615057,
0.811867 , 0.9017285 ],
[0.7499759 , 0.77453935, 0.67079806, ..., 0.6945452 ,
0.73984015, 0.7661884 ],
[0.6614727 , 0.5773457 , 0.617506 , ..., 0.5728163 ,
0.6059005 , 0.70766836]],
[[0.33657596, 0.31659493, 0.32447177, ..., 0.2847315 ,
0.29603544, 0.3030814 ],
[0.2571579 , 0.28121045, 0.28316426, ..., 0.25939476,
0.25399455, 0.27813244],
[0.08254394, 0.10533275, 0.07907505, ..., 0.17585263,
0.19220772, 0.1084968 ],
...,
[1.0672555 , 1.1468236 , 1.1544616 , ..., 1.0722239 ,
1.092354 , 1.0788598 ],
[1.0344999 , 1.0159414 , 1.0127925 , ..., 0.920662 ,
1.0546967 , 0.98493016],
[0.85120386, 0.9196123 , 0.8411766 , ..., 0.9757969 ,
0.8559053 , 0.8240325 ]],
[[0.27318272, 0.2796247 , 0.18690911, ..., 0.17092893,
0.20687902, 0.25047475],
[0.22695705, 0.2617566 , 0.22545393, ..., 0.18492775,
0.22260915, 0.21192454],
[0.17479736, 0.19792725, 0.17848302, ..., 0.15230082,
0.16818552, 0.1617133 ],
...,
[0.1808518 , 0.16238523, 0.20151271, ..., 0.13494967,
0.15055057, 0.16453111],
[0. , 0. , 0. , ..., 0. ,
0. , 0. ],
[0. , 0. , 0. , ..., 0. ,
0. , 0. ]],
...,
[[0.4533221 , 0.4402798 , 0.41549182, ..., 0.48901272,
0.47621587, 0.46599928],
[0.27381003, 0.26660118, 0.24363908, ..., 0.35271075,
0.3131813 , 0.2841748 ],
[0.28883603, 0.28670043, 0.27927038, ..., 0.30791545,
0.30115038, 0.29631212],
...,
[0.49895236, 0.46569303, 0.4533448 , ..., 0.5721229 ,
0.5297965 , 0.51515126],
[0.27511287, 0.27086392, 0.27288792, ..., 0.25866613,
0.2664522 , 0.27096364],
[0.19014314, 0.19430798, 0.20378244, ..., 0.18093485,
0.18627143, 0.18952842]],
[[0.9346109 , 0.92431957, 0.9107802 , ..., 0.9835282 ,
0.96182984, 0.94652426],
[0.75424147, 0.7374766 , 0.7250197 , ..., 0.8717252 ,
0.8284536 , 0.79627967],
[0.3659747 , 0.3257469 , 0.29567364, ..., 0.64336365,
0.5279921 , 0.43318123],
...,
[0.65148056, 0.67094034, 0.7122431 , ..., 0.6006898 ,
0.599246 , 0.6209516 ],
[0.7749112 , 0.7930069 , 0.8196056 , ..., 0.77600324,
0.7730924 , 0.7712941 ],
[1.0588437 , 1.0734682 , 1.0827916 , ..., 1.041447 ,
1.0467885 , 1.0490777 ]],
[[0.9145059 , 0.89286405, 0.87224716, ..., 1.0167563 ,
0.96425694, 0.9438042 ],
[0.41419348, 0.39934734, 0.3934382 , ..., 0.5473752 ,
0.5015972 , 0.44954816],
[0.42240566, 0.39104244, 0.36753407, ..., 0.47966093,
0.47993496, 0.43855542],
...,
[1.5263629 , 1.4504796 , 1.3817604 , ..., 1.6753782 ,
1.6673746 , 1.6095951 ],
[2.117472 , 2.1222656 , 2.1488454 , ..., 2.065093 ,
2.0847132 , 2.1085727 ],
[1.9891012 , 1.9910223 , 1.990615 , ..., 2.00997 ,
2.0054274 , 1.9869418 ]]], dtype=float32)
ds.dims  # get the dimensions
FrozenMappingWarningOnValuesAccess({'lat': 72, 'lon': 144, 'time': 547, 'nv': 2})
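The `FrozenMappingWarningOnValuesAccess` wrapper in this output reflects that recent xarray versions warn when `Dataset.dims` is used as a name-to-length mapping; `Dataset.sizes` is the attribute intended for that purpose. A minimal sketch on a synthetic Dataset (the name `demo` and its shape are invented for illustration):

```python
import numpy as np
import xarray as xr

demo = xr.Dataset({"precip": (("time", "lat", "lon"), np.zeros((4, 2, 3)))})

sizes = dict(demo.sizes)      # dimension name -> length
dim_names = list(demo.sizes)  # just the names
n_time = demo.sizes["time"]   # length of a single dimension
```

On the precipitation file, `ds.sizes["time"]` would give the number of monthly time steps directly.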
# get the coordinates
ds.coords
Coordinates:
  * lat (lat) float32 288B -88.75 -86.25 -83.75 ... 83.75 86.25 88.75
  * lon (lon) float32 576B 1.25 3.75 6.25 8.75 ... 351.2 353.8 356.2 358.8
  * time (time) datetime64[ns] 4kB 1979-01-01 1979-02-01 ... 2024-07-01
# get the global attributes
ds.attrs
{'Conventions': 'CF-1.0',
'curator': 'Dr. Jian-Jian Wang\nESSIC, University of Maryland College Park\nCollege Park, MD 20742 USA\nPhone: +1 301-405-4887',
'citation': 'Adler, R.F., G.J. Huffman, A. Chang, R. Ferraro, P. Xie, J. Janowiak, B. \nRudolf, U. Schneider, S. Curtis, D. Bolvin, A. Gruber, J. Susskind, P. \nArkin, 2003: The Version 2 Global Precipitation Climatology Project \n(GPCP) Monthly Precipitation Analysis (1979 - Present). J. Hydrometeor., \n4(6), 1147-1167.',
'title': 'GPCP Version 2.3 Combined Precipitation Dataset (Final)',
'platform': 'NOAA POES (Polar Orbiting Environmental Satellites)',
'source_obs': 'CDR RSS SSMI/SSMIS Tbs over ocean \nCDR SSMI/SSMIS rainrates over land (Ferraro) \nGeo-IR (Xie) calibrated by SSMI/SSMIS rainrates for sampling \nTOVS/AIRS empirical precipitation estimates at higher latitudes \n(ocean and land) \nGPCC gauge analysis to bias correct satellite estimates over land and \nmerge with satellite based on sampling \nOLR Precipitation Index (OPI) (Xie) used for period before 1988',
'documentation': 'http://www.esrl.noaa.gov/psd/data/gridded/data.gpcp.html',
'version': 'V2.3',
'Acknowledgement': '\n',
'contributor_name': 'Robert Adler University of Maryland \nGeorge Huffman NASA Goddard Space Flight Center \nDavid Bolvin NASA Goddard Space Flight Center/SSAI \nEric Nelkin NASA Goddard Space Flight Center/SSAI \nUdo Schneider GPCC, Deutscher Wetterdienst \nAndreas Becker GPCC, Deutscher Wetterdienst \nLong Chiu George Mason University \nMathew Sapiano University of Maryland \nPingping Xie Climate Prediction Center, NWS, NOAA \nRalph Ferraro NESDIS, NOAA \nJian-Jian Wang University of Maryland \nGuojun Gu University of Maryland',
'dataset_title': 'Global Precipitation Climatology Project (GPCP) Monthly Analysis Product',
'description': 'https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.ncdc:C00970',
'source': 'https://www.ncei.noaa.gov/data/global-precipitation-climatology-project-gpcp-monthly/access/',
'source_documentation': 'https://www.ncdc.noaa.gov/cdr/atmospheric/precipitation-gpcp-monthly',
'NCO': '4.6.9',
'history': 'Generated at NOAA/ESRL PSD Sep 9 2016 based on data from source \nand stored in single netCDF4 file',
'References': 'http://www.psl.noaa.gov/data/gridded/data.gpcp.html',
'data_comment': 'Interim data covers 2024/06 through latest.'}
Next, we build a DataArray by hand:
import pandas as pd  # needed for pd.date_range below
data = np.random.rand(4, 3)
locs = ['IA', 'IL', 'IN']
times = pd.date_range("2000-01-01", periods=4)
foo = xr.DataArray(data, coords=[times, locs], dims=["time", "space"])
foo
# Only data is required; every other argument has a default.
xr.DataArray(data)
<xarray.DataArray (dim_0: 4, dim_1: 3)> Size: 96B
array([[0.18121826, 0.0349818 , 0.16817579],
       [0.68789046, 0.00737045, 0.20960189],
       [0.44500228, 0.90834011, 0.53500474],
       [0.50612187, 0.42457244, 0.45454106]])
Dimensions without coordinates: dim_0, dim_1
foo.data
array([[0.18121826, 0.0349818 , 0.16817579],
       [0.68789046, 0.00737045, 0.20960189],
       [0.44500228, 0.90834011, 0.53500474],
       [0.50612187, 0.42457244, 0.45454106]])
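Dimension names always exist in xarray's data model: if they are not given explicitly, they are created automatically using the default pattern dim_N. Coordinates, by contrast, are always optional, and dimensions do not get automatic coordinate labels.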
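A quick way to see both behaviors is to build an array from data alone. In this sketch the name `bare` and the zero-filled data are chosen only for illustration.

```python
import numpy as np
import xarray as xr

bare = xr.DataArray(np.zeros((4, 3)))
auto_dims = bare.dims        # default names following the dim_N pattern
n_coords = len(bare.coords)  # no coordinate labels were created

# Without coordinate labels, selection is positional via isel.
first_row = bare.isel(dim_0=0)
```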
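Coordinates can be specified in the following ways:
- A list of values whose length equals the number of dimensions, providing coordinate labels for each dimension. Each value must take one of the following forms:
  - a DataArray or Variable
  - a tuple of the form (dims, data[, attrs]), which is converted into arguments for Variable
  - a pandas object or scalar value, which is converted into a DataArray
  - a one-dimensional array or list, which is interpreted as the values of a one-dimensional coordinate variable along the dimension of the same name
- A dictionary of {coord_name: coord}, where the values take the same forms as in the list. Supplying coordinates as a dictionary makes it possible to use coordinates that do not correspond to a dimension (covered in more detail later). If you supply coords as a dictionary, you must specify dims explicitly.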
xr.DataArray(
    data,
    coords={
        "time": times,
        "space": locs,
        "const": 42,
        "ranking": ("space", [1, 2, 3]),
    },
    dims=["time", "space"],
)
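To show what the extra coordinates in the dictionary form are good for, the sketch below rebuilds the array with deterministic data (an assumption, replacing the random values so the results are reproducible) and uses both a dimension coordinate and the non-dimension coordinate `ranking`.

```python
import numpy as np
import pandas as pd
import xarray as xr

data = np.arange(12.0).reshape(4, 3)  # deterministic stand-in for the random data
times = pd.date_range("2000-01-01", periods=4)
locs = ["IA", "IL", "IN"]

foo = xr.DataArray(
    data,
    coords={
        "time": times,
        "space": locs,
        "const": 42,                      # scalar coordinate with no dimension
        "ranking": ("space", [1, 2, 3]),  # extra labels along "space"
    },
    dims=["time", "space"],
)

# Dimension coordinates support label-based selection.
illinois = foo.sel(space="IL")

# "ranking" is a non-dimension coordinate: it travels with the data
# and can be used to build a mask.
top_two = foo.where(foo.ranking <= 2, drop=True)
```

After the selection, `illinois` still carries its `ranking` and `const` coordinates, which is the main practical benefit of attaching them to the array.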
For more details, see: https://blog.perillaroc.wang/post/2020/04/2020-04-07-xarray-guide-data-structure/